webclaw

Getting Started

Get webclaw installed and extract your first page in under a minute. Install from crates.io, build from source, or use Docker.

Installation

From crates.io

The fastest way to install. Requires a working Rust toolchain.

bash
cargo install webclaw

From source

Clone the repository and build all three binaries in release mode.

bash
git clone https://github.com/0xMassi/webclaw
cd webclaw
cargo build --release

The binaries will be at target/release/webclaw, target/release/webclaw-server, and target/release/webclaw-mcp.
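To confirm the build succeeded, you can print the version string. The --version flag is an assumption here (it is the conventional flag for Rust CLIs built with clap), so check webclaw --help if it is not recognized:

```shell
# Hypothetical sanity check: print the version of the freshly built binary
# (--version is assumed; standard for clap-based Rust CLIs)
./target/release/webclaw --version
```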

Tip
The workspace uses patched rustls and h2 forks for Impit TLS impersonation. These are configured via [patch.crates-io] in the workspace Cargo.toml -- no manual setup needed.

Docker

Pull the official image and run the API server in a container.

bash
docker pull ghcr.io/0xmassi/webclaw:latest
# run the API server
docker run -p 3000:3000 ghcr.io/0xmassi/webclaw:latest

The server will be available at http://localhost:3000. Add -e WEBCLAW_API_KEY=your_key to enable authentication.
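Combining the flags above, a containerized run with authentication enabled looks like this (your_key is a placeholder for your own secret):

```shell
# Run the containerized API server on port 3000 with API-key auth enabled
docker run -p 3000:3000 \
  -e WEBCLAW_API_KEY=your_key \
  ghcr.io/0xmassi/webclaw:latest
```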

Your first extraction

The simplest usage: pass a URL. webclaw extracts the main content and outputs clean markdown by default.

markdown output (default)
webclaw https://example.com

Switch to LLM-optimized output for the most token-efficient representation. This runs a 9-step pipeline that strips images, removes emphasis, deduplicates links, merges stat blocks, and collapses whitespace.

LLM-optimized output
webclaw https://example.com -f llm

Use JSON format to get the full ExtractionResult with metadata, content, word count, and extracted URLs.

JSON output
webclaw https://example.com -f json
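If you have jq installed, you can pull individual fields out of the JSON result. The field name below is an illustrative guess based on the fields listed above, not a documented key; inspect the actual output to find the exact names:

```shell
# Sketch: extract one field from the JSON ExtractionResult with jq
# (.word_count is an assumed key; check the real output first)
webclaw https://example.com -f json | jq '.word_count'
```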

Start the API server

The REST API exposes every extraction feature as an HTTP endpoint. Start the server on any port.

bash
webclaw-server --port 3000

Test it with a scrape request:

curl
curl -X POST http://localhost:3000/v1/scrape \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com"}'

Note
By default the server runs without authentication. Pass --api-key your_secret to require a Bearer token on all requests.
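With --api-key set, clients authenticate by sending the key as a Bearer token, presumably in the same header format the cloud API uses:

```shell
# Start the server requiring authentication
webclaw-server --port 3000 --api-key your_secret

# Requests must now carry the key as a Bearer token
curl -X POST http://localhost:3000/v1/scrape \
  -H "Authorization: Bearer your_secret" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com"}'
```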

MCP server

The MCP server lets AI agents use webclaw as a tool. It communicates over stdio transport and works with Claude Desktop, Claude Code, and any MCP-compatible client.

Add webclaw to your Claude Desktop configuration at ~/Library/Application Support/Claude/claude_desktop_config.json:

claude_desktop_config.json
{
  "mcpServers": {
    "webclaw": {
      "command": "/path/to/webclaw-mcp"
    }
  }
}

Replace /path/to/webclaw-mcp with the actual binary path (e.g. target/release/webclaw-mcp if built from source).
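For Claude Code, the equivalent registration can be done from the command line. This assumes the standard claude mcp add syntax; consult claude mcp --help if your version differs:

```shell
# Register the stdio MCP server with Claude Code
claude mcp add webclaw /path/to/webclaw-mcp
```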

The MCP server exposes 8 tools:

scrape: Extract content from a single URL
crawl: BFS crawl a website with depth control
map: Discover URLs from sitemap.xml and robots.txt
batch: Extract content from multiple URLs
extract: LLM-powered JSON schema or prompt extraction
summarize: LLM-powered content summarization
diff: Track content changes between snapshots
brand: Extract brand identity (colors, fonts, logo)

Cloud API

For managed infrastructure, sign up at webclaw.io and create an API key from the dashboard. Keys are prefixed with wc_.

curl
curl -X POST https://api.webclaw.io/v1/scrape \
  -H "Authorization: Bearer wc_your_key" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com", "formats": ["markdown", "llm"]}'

The cloud API uses the same endpoints and request format as the self-hosted server. Every example in this documentation works with both -- just swap the base URL and add the Authorization header.
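One way to keep scripts portable between the two deployments is to parameterize the base URL and key; the variable names here are arbitrary, not anything webclaw reads itself:

```shell
# Self-hosted:
#   WEBCLAW_BASE=http://localhost:3000
# Cloud:
#   WEBCLAW_BASE=https://api.webclaw.io  WEBCLAW_KEY=wc_your_key
curl -X POST "$WEBCLAW_BASE/v1/scrape" \
  -H "Authorization: Bearer $WEBCLAW_KEY" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com"}'
```

Against an unauthenticated self-hosted server, the Authorization header is simply ignored, so the same command works in both cases.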

Tip
The cloud API includes a free tier. No credit card required to start building.