Getting Started
Get webclaw installed and extract your first page in under a minute. Install from crates.io, build from source, or run with Docker.
Installation
From crates.io
The fastest way to install. Requires a working Rust toolchain.
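Assuming the crate is published under the name `webclaw`, installation is one command:

```shell
# Install the webclaw CLI from crates.io (crate name assumed)
cargo install webclaw
```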
From source
Clone the repository and build all three binaries in release mode.
The binaries will be at target/release/webclaw, target/release/webclaw-server, and target/release/webclaw-mcp.
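A sketch of the build, using a placeholder repository URL (the real URL is not stated here):

```shell
# Clone the repository (URL is a placeholder) and build in release mode
git clone https://github.com/your-org/webclaw.git
cd webclaw

# Builds webclaw, webclaw-server, and webclaw-mcp into target/release/
cargo build --release
```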
The build pulls in rustls and h2 forks for Impit TLS impersonation. These are configured via [patch.crates-io] in the workspace Cargo.toml -- no manual setup needed.

Docker
Pull the official image and run the API server in a container.
The server will be available at http://localhost:3000. Add -e WEBCLAW_API_KEY=your_key to enable authentication.
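A minimal sketch, assuming the image is published as `webclaw/webclaw` (the image name is an assumption):

```shell
# Pull the image (name assumed) and run the API server on port 3000
docker pull webclaw/webclaw:latest
docker run -p 3000:3000 webclaw/webclaw:latest

# Or with authentication enabled via the API key environment variable
docker run -p 3000:3000 -e WEBCLAW_API_KEY=your_key webclaw/webclaw:latest
```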
Your first extraction
The simplest usage: pass a URL. webclaw extracts the main content and outputs clean markdown by default.
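Assuming the CLI takes the URL as its only positional argument, the simplest invocation looks like:

```shell
# Extract the main content of a page as clean markdown (the default)
webclaw https://example.com
```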
Switch to LLM-optimized output for the most token-efficient representation. This runs a 9-step pipeline that strips images, removes emphasis, deduplicates links, merges stat blocks, and collapses whitespace.
Use JSON format to get the full ExtractionResult with metadata, content, word count, and extracted URLs.
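Assuming output formats are selected with a `--format` flag (the flag name and values are assumptions, not confirmed by this page), the two variants might look like:

```shell
# LLM-optimized output: strips images and emphasis, deduplicates links,
# merges stat blocks, collapses whitespace
webclaw https://example.com --format llm

# Full ExtractionResult as JSON: metadata, content, word count, extracted URLs
webclaw https://example.com --format json
```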
Start the API server
The REST API exposes every extraction feature as an HTTP endpoint. Start the server on any port.
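A sketch, assuming the port is set with a `--port` flag (flag name is an assumption):

```shell
# Start the REST API server on port 3000
webclaw-server --port 3000
```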
Test it with a scrape request:
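A minimal request, assuming the endpoint is `POST /scrape` and accepts a JSON body with a `url` field (both are assumptions based on the tool names above):

```shell
curl -X POST http://localhost:3000/scrape \
  -H 'Content-Type: application/json' \
  -d '{"url": "https://example.com"}'
```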
Pass --api-key your_secret to require a Bearer token on all requests.

MCP server
The MCP server lets AI agents use webclaw as a tool. It communicates over stdio transport and works with Claude Desktop, Claude Code, and any MCP-compatible client.
Add webclaw to your Claude Desktop configuration at ~/Library/Application Support/Claude/claude_desktop_config.json:
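A minimal entry, following the standard `mcpServers` layout that Claude Desktop reads from this file:

```json
{
  "mcpServers": {
    "webclaw": {
      "command": "/path/to/webclaw-mcp"
    }
  }
}
```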
Replace /path/to/webclaw-mcp with the actual binary path (e.g. target/release/webclaw-mcp if built from source).
The MCP server exposes 8 tools:
| Tool | Description |
|---|---|
| scrape | Extract content from a single URL |
| crawl | BFS crawl a website with depth control |
| map | Discover URLs from sitemap.xml and robots.txt |
| batch | Extract content from multiple URLs |
| extract | LLM-powered JSON schema or prompt extraction |
| summarize | LLM-powered content summarization |
| diff | Track content changes between snapshots |
| brand | Extract brand identity (colors, fonts, logo) |
Cloud API
For managed infrastructure, sign up at webclaw.io and create an API key from the dashboard. Keys are prefixed with wc_.
The cloud API uses the same endpoints and request format as the self-hosted server. Every example in this documentation works with both -- just swap the base URL and add the Authorization header.
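For example, the same scrape request against the cloud API might look like this (the base URL `api.webclaw.io` and the `/scrape` endpoint are assumptions; the `wc_` key prefix is from the docs above):

```shell
curl -X POST https://api.webclaw.io/scrape \
  -H 'Authorization: Bearer wc_your_key' \
  -H 'Content-Type: application/json' \
  -d '{"url": "https://example.com"}'
```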