webclaw

Getting Started

Get webclaw installed and extract your first page in under a minute. Install from crates.io, build from source, or use Docker.

Installation

From crates.io

The fastest way to install. Requires a working Rust toolchain.

bash
cargo install webclaw

From source

Clone the repository and build all three binaries in release mode.

bash
git clone https://github.com/0xMassi/webclaw
cd webclaw
cargo build --release

The binaries will be at target/release/webclaw, target/release/webclaw-server, and target/release/webclaw-mcp.
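To confirm the build succeeded, you can print the version string. The --version flag is an assumption here (it is the conventional flag for Rust CLIs built with clap), so check webclaw --help if it is not recognized:

```shell
# Hypothetical sanity check: print the version of the freshly built binary
# (--version is assumed; standard for clap-based Rust CLIs)
./target/release/webclaw --version
```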

Tip
The workspace uses patched rustls and h2 forks for Impit TLS impersonation. These are configured via [patch.crates-io] in the workspace Cargo.toml -- no manual setup needed.

Docker

Pull the official image and run the API server in a container.

bash
docker pull ghcr.io/0xmassi/webclaw:latest
# run the API server
docker run -p 3000:3000 ghcr.io/0xmassi/webclaw:latest

The server will be available at http://localhost:3000. Add -e WEBCLAW_API_KEY=your_key to enable authentication.
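Combining the flags above, a containerized run with authentication enabled looks like this (your_key is a placeholder for your own secret):

```shell
# Run the containerized API server on port 3000 with API-key auth enabled
docker run -p 3000:3000 \
  -e WEBCLAW_API_KEY=your_key \
  ghcr.io/0xmassi/webclaw:latest
```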

Your first extraction

The simplest usage: pass a URL. webclaw extracts the main content and outputs clean markdown by default.

markdown output (default)
webclaw https://example.com

Switch to LLM-optimized output for the most token-efficient representation. This runs a 9-step pipeline that strips images, removes emphasis, deduplicates links, merges stat blocks, and collapses whitespace.

LLM-optimized output
webclaw https://example.com -f llm

Use JSON format to get the full ExtractionResult with metadata, content, word count, and extracted URLs.

JSON output
webclaw https://example.com -f json
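If you have jq installed, you can pull individual fields out of the JSON result. The field name below is an illustrative guess based on the fields listed above, not a documented key; inspect the actual output to find the exact names:

```shell
# Sketch: extract one field from the JSON ExtractionResult with jq
# (.word_count is an assumed key; check the real output first)
webclaw https://example.com -f json | jq '.word_count'
```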

Start the API server

The REST API exposes every extraction feature as an HTTP endpoint. Start the server on any port.

bash
webclaw-server --port 3000

Test it with a scrape request:

curl
curl -X POST http://localhost:3000/v1/scrape \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com"}'

Note
By default the server runs without authentication. Pass --api-key your_secret to require a Bearer token on all requests.
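With --api-key set, clients authenticate by sending the key as a Bearer token, presumably in the same header format the cloud API uses:

```shell
# Start the server requiring authentication
webclaw-server --port 3000 --api-key your_secret

# Requests must now carry the key as a Bearer token
curl -X POST http://localhost:3000/v1/scrape \
  -H "Authorization: Bearer your_secret" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com"}'
```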

MCP server

The MCP server lets AI agents use webclaw as a tool. It communicates over stdio transport and works with Claude Desktop, Claude Code, and any MCP-compatible client.

Add webclaw to your Claude Desktop configuration at ~/Library/Application Support/Claude/claude_desktop_config.json:

claude_desktop_config.json
{
  "mcpServers": {
    "webclaw": {
      "command": "/path/to/webclaw-mcp"
    }
  }
}

Replace /path/to/webclaw-mcp with the actual binary path (e.g. target/release/webclaw-mcp if built from source).
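For Claude Code, the equivalent registration can be done from the command line. This assumes the standard claude mcp add syntax; consult claude mcp --help if your version differs:

```shell
# Register the stdio MCP server with Claude Code
claude mcp add webclaw /path/to/webclaw-mcp
```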

The MCP server exposes 8 tools:

scrape: Extract content from a single URL
crawl: BFS crawl a website with depth control
map: Discover URLs from sitemap.xml and robots.txt
batch: Extract content from multiple URLs
extract: LLM-powered JSON schema or prompt extraction
summarize: LLM-powered content summarization
diff: Track content changes between snapshots
brand: Extract brand identity (colors, fonts, logo)

Cloud API

For managed infrastructure, sign up at webclaw.io and create an API key from the dashboard. Keys are prefixed with wc_.

curl
curl -X POST https://api.webclaw.io/v1/scrape \
  -H "Authorization: Bearer wc_your_key" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com", "formats": ["markdown", "llm"]}'

The cloud API uses the same endpoints and request format as the self-hosted server. Every example in this documentation works with both -- just swap the base URL and add the Authorization header.
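One way to keep scripts portable between the two deployments is to parameterize the base URL and key; the variable names here are arbitrary, not anything webclaw reads itself:

```shell
# Self-hosted:
#   WEBCLAW_BASE=http://localhost:3000
# Cloud:
#   WEBCLAW_BASE=https://api.webclaw.io  WEBCLAW_KEY=wc_your_key
curl -X POST "$WEBCLAW_BASE/v1/scrape" \
  -H "Authorization: Bearer $WEBCLAW_KEY" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com"}'
```

Against an unauthenticated self-hosted server, the Authorization header is simply ignored, so the same command works in both cases.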

Tip
The cloud API includes a free tier. No credit card required to start building.