Getting Started
Install webclaw and extract your first page in under a minute. You can install with cargo, build from source, or run it with Docker.
Installation
From crates.io
The fastest way to install. Requires a working Rust toolchain.
cargo install webclaw
From source
Clone the repository and build all three binaries in release mode.
git clone https://github.com/0xMassi/webclaw
cd webclaw
cargo build --release
The binaries will be at target/release/webclaw, target/release/webclaw-server, and target/release/webclaw-mcp.
The build uses forks of rustls and h2 for browser-grade TLS impersonation. These are configured via [patch.crates-io] in the workspace Cargo.toml, so no manual setup is needed.
Docker
Pull the official image and run the API server in a container.
docker pull ghcr.io/0xmassi/webclaw:latest
docker run -p 3000:3000 ghcr.io/0xmassi/webclaw:latest
The server will be available at http://localhost:3000. Add -e WEBCLAW_API_KEY=your_key to the docker run command to enable authentication.
Your first extraction
The simplest usage: pass a URL. webclaw extracts the main content and outputs clean markdown by default.
webclaw https://example.com
Switch to LLM-optimized output for the most token-efficient representation. This runs a 9-step pipeline. Its steps include stripping images, removing emphasis, deduplicating links, merging stat blocks, and collapsing whitespace.
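To make a few of those steps concrete, here is an illustrative Rust sketch of three of them: stripping markdown images, removing bold markers (a narrow stand-in for "removing emphasis"), and collapsing whitespace. It uses plain string handling. It is not webclaw's implementation, and the function names are hypothetical.

```rust
// Illustrative only: three cleanup passes in the spirit of the `-f llm`
// pipeline. This is NOT webclaw's code; the names are hypothetical.

/// Drop markdown images of the form `![alt](url)`.
fn strip_images(md: &str) -> String {
    let mut out = String::with_capacity(md.len());
    let mut rest = md;
    while let Some(start) = rest.find("![") {
        let after = &rest[start..];
        match (after.find("]("), after.find(')')) {
            // Well-formed image: skip everything up to and including `)`.
            (Some(mid), Some(end)) if mid < end => {
                out.push_str(&rest[..start]);
                rest = &after[end + 1..];
            }
            // Not an image after all: keep the `![` and keep scanning.
            _ => {
                out.push_str(&rest[..start + 2]);
                rest = &after[2..];
            }
        }
    }
    out.push_str(rest);
    out
}

/// Remove bold markers (`**` and `__`). Italics are left alone in this sketch.
fn remove_bold(md: &str) -> String {
    md.replace("**", "").replace("__", "")
}

/// Squeeze runs of spaces or tabs, trim lines, and keep at most one blank line in a row.
/// Note: this also flattens indentation, which a real pipeline would need to respect.
fn collapse_whitespace(md: &str) -> String {
    let mut lines: Vec<String> = Vec::new();
    for line in md.lines() {
        let squeezed = line.split_whitespace().collect::<Vec<_>>().join(" ");
        // Skip leading blank lines and blank lines that follow another blank line.
        if squeezed.is_empty() && lines.last().map_or(true, |l| l.is_empty()) {
            continue;
        }
        lines.push(squeezed);
    }
    while lines.last().map_or(false, |l| l.is_empty()) {
        lines.pop();
    }
    lines.join("\n")
}

fn clean_for_llm(md: &str) -> String {
    collapse_whitespace(&remove_bold(&strip_images(md)))
}

fn main() {
    let page = "# Title\n\n![logo](logo.png)   Hello   **world**\n\n\n\nBye  ";
    println!("{}", clean_for_llm(page));
}
```

Each pass is a plain string-to-string function, so passes can be reordered or dropped independently. The real pipeline has more steps than these three and handles cases this sketch ignores.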
webclaw https://example.com -f llm
Use JSON format to get the full ExtractionResult with metadata, content, word count, and extracted URLs.
webclaw https://example.com -f json
Start the API server
The REST API exposes every extraction feature as an HTTP endpoint. Start the server on any port.
webclaw-server --port 3000
Test it with a scrape request:
curl -X POST http://localhost:3000/v1/scrape \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com"}'
Add --api-key your_secret to the webclaw-server command to require a Bearer token on all requests.
MCP server
The MCP server lets AI agents use webclaw as a tool. It communicates over the stdio transport and works with Claude Desktop, Claude Code, Cursor, Windsurf, OpenCode, Codex, Antigravity, and any MCP-compatible client.
On macOS, add webclaw to your Claude Desktop configuration at ~/Library/Application Support/Claude/claude_desktop_config.json:
{
  "mcpServers": {
    "webclaw": {
      "command": "/path/to/webclaw-mcp"
    }
  }
}
Replace /path/to/webclaw-mcp with the actual binary path (e.g. target/release/webclaw-mcp if built from source).
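Under the hood, an MCP client launches the configured binary and exchanges JSON-RPC 2.0 messages with it over stdin and stdout, one message per line. As a rough sketch that is not part of webclaw, this Rust snippet builds the tools/call request a client would send to invoke the scrape tool. Two details are assumptions: the argument name url, and the hand-rolled JSON escaping, which keeps the example dependency-free. Check the tool's input schema, returned by tools/list, for the real parameter names. A real client must also complete the initialize handshake before calling any tool.

```rust
use std::fmt::Write;

/// Escape a string for use inside a JSON string literal.
fn json_escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            // Other control characters must be written as \u00XX escapes.
            c if (c as u32) < 0x20 => {
                write!(out, "\\u{:04x}", c as u32).unwrap();
            }
            c => out.push(c),
        }
    }
    out
}

/// Build one newline-terminated JSON-RPC `tools/call` request.
/// The stdio transport frames messages as one JSON object per line.
/// The `url` argument name is an assumption, not taken from webclaw's schema.
fn tools_call(id: u64, tool: &str, url: &str) -> String {
    format!(
        "{{\"jsonrpc\":\"2.0\",\"id\":{},\"method\":\"tools/call\",\"params\":{{\"name\":\"{}\",\"arguments\":{{\"url\":\"{}\"}}}}}}\n",
        id,
        json_escape(tool),
        json_escape(url)
    )
}

fn main() {
    // A real client writes this line to webclaw-mcp's stdin after the
    // initialize handshake, then reads the JSON response line from stdout.
    print!("{}", tools_call(1, "scrape", "https://example.com"));
}
```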
The MCP server exposes 8 tools:
| Tool | Description |
|---|---|
| scrape | Extract content from a single URL |
| crawl | Crawl a website breadth-first (BFS) with depth control |
| map | Discover URLs from sitemap.xml and robots.txt |
| batch | Extract content from multiple URLs |
| extract | LLM-powered JSON schema or prompt extraction |
| summarize | LLM-powered content summarization |
| diff | Track content changes between snapshots |
| brand | Extract brand identity (colors, fonts, logo) |
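The crawl tool is described as a breadth-first (BFS) crawl with depth control. As a generic illustration of that technique, and not webclaw's implementation, the Rust sketch below walks an in-memory link graph that stands in for fetching pages. A real crawler would also fetch each URL, extract its links, and usually restrict them to the same site.

```rust
use std::collections::{HashMap, HashSet, VecDeque};

/// Breadth-first crawl from `start`, following links at most `max_depth` hops.
/// `links` stands in for "fetch the page and extract its links".
fn bfs_crawl<'a>(
    links: &HashMap<&'a str, Vec<&'a str>>,
    start: &'a str,
    max_depth: usize,
) -> Vec<(&'a str, usize)> {
    let mut visited = HashSet::new();
    let mut queue = VecDeque::new();
    let mut order = Vec::new();

    visited.insert(start);
    queue.push_back((start, 0));

    // FIFO order means every page at depth d is visited before any page at depth d + 1.
    while let Some((url, depth)) = queue.pop_front() {
        order.push((url, depth));
        if depth >= max_depth {
            continue; // at the limit: record the page but do not follow its links
        }
        for &next in links.get(url).map(Vec::as_slice).unwrap_or(&[]) {
            // Mark on enqueue so each URL is queued at most once, even with cycles.
            if visited.insert(next) {
                queue.push_back((next, depth + 1));
            }
        }
    }
    order
}

fn main() {
    let links = HashMap::from([
        ("/", vec!["/a", "/b"]),
        ("/a", vec!["/c", "/"]),
        ("/b", vec!["/c"]),
        ("/c", vec!["/d"]),
    ]);
    for (url, depth) in bfs_crawl(&links, "/", 2) {
        println!("depth {depth}: {url}");
    }
}
```

With a depth limit of 2, /c is recorded but its link to /d is never followed. The cycle from /a back to / is ignored because / is already marked as visited.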
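The map tool discovers URLs from sitemap.xml and robots.txt. A naive Rust sketch of those two lookups might look like this. It is not webclaw's implementation, and to stay dependency-free it uses plain string scanning instead of a real XML parser.

```rust
/// Collect the text of every `<loc>...</loc>` element in a sitemap.
/// Naive scan: XML entities such as `&amp;` are not decoded.
fn sitemap_locs(xml: &str) -> Vec<String> {
    let mut urls = Vec::new();
    let mut rest = xml;
    while let Some(start) = rest.find("<loc>") {
        let after = &rest[start + "<loc>".len()..];
        let Some(end) = after.find("</loc>") else { break };
        urls.push(after[..end].trim().to_string());
        rest = &after[end + "</loc>".len()..];
    }
    urls
}

/// Collect `Sitemap:` URLs declared in robots.txt.
/// The directive name is matched case-insensitively here.
fn robots_sitemaps(robots: &str) -> Vec<String> {
    robots
        .lines()
        .filter_map(|line| {
            // Split at the first colon only, so the URL's own colon survives.
            let (key, value) = line.split_once(':')?;
            if key.trim().eq_ignore_ascii_case("sitemap") {
                Some(value.trim().to_string())
            } else {
                None
            }
        })
        .collect()
}

fn main() {
    let robots = "User-agent: *\nDisallow: /private\nSitemap: https://example.com/sitemap.xml\n";
    let xml = "<urlset><url><loc>https://example.com/</loc></url>\
               <url><loc>https://example.com/about</loc></url></urlset>";
    println!("{:?}", robots_sitemaps(robots));
    println!("{:?}", sitemap_locs(xml));
}
```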
Cloud API
For managed infrastructure, sign up at webclaw.io and create an API key from the dashboard. Keys are prefixed with wc_.
curl -X POST https://api.webclaw.io/v1/scrape \
  -H "Authorization: Bearer wc_your_key" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com", "formats": ["markdown", "llm"]}'
The cloud API uses the same endpoints and request format as the self-hosted server. Every example in this documentation works with both: swap the base URL and add the Authorization header.
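Only the base URL and the Authorization header differ between the two, so client code can keep them in one place. In this hypothetical Rust sketch, the Target type, its methods, and the wc_ prefix check are illustrative and not a webclaw SDK. It computes the URL and headers for either target, and the JSON request body stays the same for both. Actually sending the request needs an HTTP client crate, which is left out to keep the sketch dependency-free.

```rust
/// Hypothetical client-side settings: where requests go and which key to send.
struct Target {
    base_url: String,
    api_key: Option<String>,
}

impl Target {
    /// A self-hosted webclaw-server. Pass a key only if the server requires one.
    fn self_hosted(base_url: &str, api_key: Option<&str>) -> Self {
        Target {
            base_url: base_url.trim_end_matches('/').to_string(),
            api_key: api_key.map(str::to_string),
        }
    }

    /// The managed cloud API. Cloud keys are prefixed with `wc_`.
    fn cloud(api_key: &str) -> Result<Self, String> {
        if !api_key.starts_with("wc_") {
            return Err(format!("expected a key starting with wc_, got {api_key:?}"));
        }
        Ok(Target {
            base_url: "https://api.webclaw.io".to_string(),
            api_key: Some(api_key.to_string()),
        })
    }

    /// Full URL and headers for a POST to an endpoint such as "/v1/scrape".
    fn post(&self, path: &str) -> (String, Vec<(&'static str, String)>) {
        let url = format!("{}{}", self.base_url, path);
        let mut headers = vec![("Content-Type", "application/json".to_string())];
        if let Some(key) = &self.api_key {
            headers.push(("Authorization", format!("Bearer {key}")));
        }
        (url, headers)
    }
}

fn main() {
    // The body is identical for both targets.
    let body = r#"{"url": "https://example.com"}"#;
    let targets = [
        Target::self_hosted("http://localhost:3000", None),
        Target::cloud("wc_your_key").expect("valid key"),
    ];
    for target in &targets {
        let (url, headers) = target.post("/v1/scrape");
        println!("POST {url} {headers:?} {body}");
    }
}
```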