webclaw

MCP Server

The webclaw MCP (Model Context Protocol) server exposes the full extraction engine as tools that AI agents can call directly. It works with Claude Desktop, Claude Code, and any MCP-compatible client.

What is MCP

Model Context Protocol is an open standard for connecting AI models to external tools and data sources. Instead of making HTTP calls manually, an AI agent discovers available tools through the MCP server and calls them natively. The webclaw MCP server communicates over stdio transport and exposes 8 tools covering scraping, crawling, extraction, and more.
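Over stdio, tool discovery is plain JSON-RPC 2.0 — one message per line. A sketch of a tools/list exchange (the response is abbreviated to a single tool and wrapped here for readability; the actual descriptions and input schemas come from the server):

```json
{"jsonrpc": "2.0", "id": 1, "method": "tools/list"}
{"jsonrpc": "2.0", "id": 1, "result": {"tools": [
  {"name": "scrape",
   "description": "Extract content from a single URL",
   "inputSchema": {"type": "object",
                   "properties": {"url": {"type": "string"}},
                   "required": ["url"]}}
]}}
```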

Setup

Claude Desktop

Add webclaw to your Claude Desktop config file:

~/Library/Application Support/Claude/claude_desktop_config.json
{
  "mcpServers": {
    "webclaw": {
      "command": "webclaw-mcp",
      "env": {
        "WEBCLAW_API_KEY": "<YOUR_API_KEY>"
      }
    }
  }
}

Replace webclaw-mcp with the full path to the binary if it is not on your PATH. Setting WEBCLAW_API_KEY enables automatic cloud fallback for bot-protected sites (Cloudflare, DataDome, AWS WAF) and JS-rendered SPAs. Without it, extraction still works for ~80% of sites via local HTTP.

Claude Code

terminal
claude mcp add webclaw webclaw-mcp

Alternatively, add the JSON config above to your Claude Desktop config file; Claude Code auto-discovers MCP servers from the same config.

Other MCP clients

Any MCP client that supports stdio transport can connect to webclaw-mcp. Point the client at the binary and it will discover all available tools through the standard MCP handshake.

Smart Fetch

The MCP server uses a local-first architecture. Most scrapes happen locally over HTTP (free, no API credits). When bot protection or JS rendering is detected, it automatically falls back to the webclaw cloud API, which includes anti-bot solvers.

1. Local HTTP fetch -- fast, free (~80% of sites)

2. Detect bot protection (Cloudflare, DataDome, AWS WAF) or a JS-rendered SPA

3. Automatic cloud API fallback (requires WEBCLAW_API_KEY)

Environment variables

Variable           Description
WEBCLAW_API_KEY    Enables cloud fallback for bot-protected and JS-rendered sites
OPENAI_API_KEY     Enables the extract and summarize tools (OpenAI provider)
ANTHROPIC_API_KEY  Enables the extract and summarize tools (Anthropic provider)
OLLAMA_HOST        Custom Ollama URL (default: localhost:11434)
Note: The MCP server uses the rmcp crate (the official Rust MCP SDK) and communicates over stdio. No network ports are opened.

Tools

The MCP server exposes 8 tools. Each tool maps to a corresponding REST API endpoint.

1. scrape

Extract content from a single URL.

Param              Type      Required  Description
url                string    Yes       URL to scrape.
format             string    No        Output format: markdown, llm, text, or json.
include_selectors  string[]  No        CSS selectors to include exclusively.
exclude_selectors  string[]  No        CSS selectors to remove.
only_main_content  boolean   No        Extract only the main content element.
browser            string    No        Browser profile: chrome, firefox, or random.
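A typical scrape call targeting an article body (the URL and selectors below are placeholders):

```json
{
  "url": "https://example.com/blog/post",
  "format": "markdown",
  "exclude_selectors": ["nav", "footer", ".sidebar"],
  "only_main_content": true
}
```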

2. crawl

Crawl a website with BFS traversal.

Param        Type     Required  Description
url          string   Yes       Starting URL.
depth        number   No        Max crawl depth. Default: 2.
max_pages    number   No        Max pages to extract. Default: 50.
concurrency  number   No        Concurrent requests. Default: 5.
use_sitemap  boolean  No        Seed the queue with sitemap URLs.
format       string   No        Output format for each page.
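A sketch of a crawl call that goes deeper than the defaults and seeds the queue from the sitemap (the URL is a placeholder):

```json
{
  "url": "https://example.com/docs",
  "depth": 3,
  "max_pages": 100,
  "use_sitemap": true,
  "format": "llm"
}
```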

3. map

Discover all URLs on a site via sitemap parsing.

Param  Type    Required  Description
url    string  Yes       Base URL of the site to map.

4. batch

Extract content from multiple URLs concurrently.

Param        Type      Required  Description
urls         string[]  Yes       Array of URLs to extract.
format       string    No        Output format for each URL.
concurrency  number    No        Max concurrent requests. Default: 5.

5. extract

Extract structured JSON data using an LLM.

Param   Type    Required  Description
url     string  Yes       URL to extract data from.
prompt  string  No*       Natural language extraction prompt.
schema  string  No*       JSON schema string defining the output structure.

* At least one of prompt or schema must be provided.
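An extract call using a schema instead of a prompt. Note that schema is passed as a JSON string, not a nested object; the URL and schema fields below are illustrative:

```json
{
  "url": "https://example.com/product",
  "schema": "{\"type\":\"object\",\"properties\":{\"name\":{\"type\":\"string\"},\"price\":{\"type\":\"number\"}},\"required\":[\"name\",\"price\"]}"
}
```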

6. summarize

Generate a concise summary of a web page.

Param          Type    Required  Description
url            string  Yes       URL to summarize.
max_sentences  number  No        Max sentences in the summary. Default: 3.

7. diff

Track content changes between snapshots.

Param              Type    Required  Description
url                string  Yes       URL to scrape for the current version.
previous_snapshot  string  Yes       JSON string of a previous extraction result.
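A sketch of a diff call. In practice you would pass the JSON from an earlier scrape result verbatim as previous_snapshot; the URL and snapshot fields below are illustrative placeholders:

```json
{
  "url": "https://example.com/pricing",
  "previous_snapshot": "{\"content\":\"...\",\"fetched_at\":\"2024-01-01T00:00:00Z\"}"
}
```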

8. brand

Extract brand identity (colors, fonts, logos) from a site.

Param  Type    Required  Description
url    string  Yes       URL of the site to analyze.

Example conversations

Here is how an AI agent might use the webclaw MCP tools in practice.

User

Scrape the Stripe pricing page and pull out all the plan names and prices.

Claude (using webclaw MCP)

I will use the extract tool to pull structured pricing data from the page.

Tool call: extract
{
  "url": "https://stripe.com/pricing",
  "prompt": "Extract all plan names, monthly prices, and included features"
}

User

Crawl the Next.js docs and summarize the top 5 pages.

Claude (using webclaw MCP)

I will first map the site to discover pages, then crawl and summarize the most important ones.

Tool call: map
{
  "url": "https://nextjs.org/docs"
}
Tool call: batch
{
  "urls": [
    "https://nextjs.org/docs",
    "https://nextjs.org/docs/getting-started",
    "https://nextjs.org/docs/routing",
    "https://nextjs.org/docs/rendering",
    "https://nextjs.org/docs/data-fetching"
  ],
  "format": "llm"
}
Tip: The MCP server runs the same extraction engine as the REST API and CLI; every tool produces output identical to its REST API counterpart.