MCP Server
The webclaw MCP (Model Context Protocol) server exposes the full extraction engine as tools that AI agents can call directly. It works with Claude Desktop, Claude Code, Cursor, Windsurf, OpenCode, Codex, Antigravity, and any MCP-compatible client.
What is MCP
Model Context Protocol is an open standard for connecting AI models to external tools and data sources. Instead of making HTTP calls manually, an AI agent discovers available tools through the MCP server and calls them natively. The webclaw MCP server communicates over stdio transport and exposes 10 tools covering scraping, crawling, extraction, and more.
Setup
Claude Desktop
Add webclaw to your Claude Desktop config file:
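A minimal entry follows the standard Claude Desktop mcpServers format; the API key value below is a placeholder:

```json
{
  "mcpServers": {
    "webclaw": {
      "command": "webclaw-mcp",
      "env": {
        "WEBCLAW_API_KEY": "your-api-key"
      }
    }
  }
}
```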
Replace webclaw-mcp with the full path if not in PATH. The WEBCLAW_API_KEY enables automatic cloud fallback for bot-protected sites (Cloudflare, DataDome, AWS WAF) and JS-rendered SPAs. Without it, extraction works for ~80% of sites via local HTTP.
Claude Code
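One option is registering the server from the command line; the invocation below follows the `claude mcp add` CLI shape, with the key value as a placeholder:

```shell
# Register the webclaw MCP server with Claude Code
claude mcp add webclaw --env WEBCLAW_API_KEY=your-api-key -- webclaw-mcp
```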
Alternatively, add the same mcpServers JSON entry to your Claude Desktop config file; Claude Code auto-discovers MCP servers from that config.
Cursor
Add webclaw to your Cursor MCP config:
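Cursor uses the same mcpServers shape (typically in `~/.cursor/mcp.json`); the key value is a placeholder:

```json
{
  "mcpServers": {
    "webclaw": {
      "command": "webclaw-mcp",
      "env": {
        "WEBCLAW_API_KEY": "your-api-key"
      }
    }
  }
}
```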
Windsurf
Add webclaw to your Windsurf MCP config:
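Windsurf also accepts the mcpServers format (the config file location varies by install; `~/.codeium/windsurf/mcp_config.json` is typical). The key value is a placeholder:

```json
{
  "mcpServers": {
    "webclaw": {
      "command": "webclaw-mcp",
      "env": {
        "WEBCLAW_API_KEY": "your-api-key"
      }
    }
  }
}
```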
OpenCode
Add webclaw to your OpenCode config:
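A sketch assuming OpenCode's local MCP server shape in `opencode.json` (the `type: "local"` entry with a command array); values are placeholders:

```json
{
  "$schema": "https://opencode.ai/config.json",
  "mcp": {
    "webclaw": {
      "type": "local",
      "command": ["webclaw-mcp"],
      "environment": {
        "WEBCLAW_API_KEY": "your-api-key"
      }
    }
  }
}
```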
Codex
Add webclaw to your Codex config. Codex supports both a CLI and desktop app:
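Codex reads MCP servers from `config.toml` as `mcp_servers` tables; a sketch with a placeholder key:

```toml
# ~/.codex/config.toml
[mcp_servers.webclaw]
command = "webclaw-mcp"
env = { WEBCLAW_API_KEY = "your-api-key" }
```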
Antigravity
Antigravity uses the same mcpServers JSON format as Claude Desktop:
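Since the format matches Claude Desktop, the same entry works here; the key value is a placeholder:

```json
{
  "mcpServers": {
    "webclaw": {
      "command": "webclaw-mcp",
      "env": {
        "WEBCLAW_API_KEY": "your-api-key"
      }
    }
  }
}
```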
Other MCP clients
Any MCP client that supports stdio transport can connect to webclaw-mcp. Point the client at the binary and it will discover all available tools through the standard MCP handshake.
Smart Fetch
The MCP server uses a local-first architecture. Most scrapes happen locally over HTTP (free, no API credits). When bot protection or JS rendering is detected, it automatically falls back to the webclaw cloud API which has antibot solvers.
1. Local HTTP fetch -- fast, free (~80% of sites)
2. Detect bot protection (Cloudflare, DataDome, AWS WAF) or a JS-rendered SPA
3. Automatic cloud API fallback (requires WEBCLAW_API_KEY)
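The detection step can be sketched as a pure function. The status codes and body markers below are illustrative examples of the kind of signals such a detector looks for, not webclaw's actual heuristics:

```rust
// Decide whether a locally fetched response needs the cloud fallback.
fn needs_cloud_fallback(status: u16, body: &str) -> bool {
    // Challenge pages from Cloudflare/DataDome/AWS WAF typically return
    // 403 or 503 with a recognizable interstitial.
    let blocked_status = matches!(status, 403 | 503);
    let lower = body.to_lowercase();
    let challenge_marker = ["just a moment", "cf-chl", "datadome", "awswaf"]
        .iter()
        .any(|m| lower.contains(m));
    // A near-empty mount point suggests a JS-rendered SPA.
    let looks_like_spa = lower.contains("<div id=\"root\"></div>")
        || lower.contains("<div id=\"app\"></div>");
    (blocked_status && challenge_marker) || looks_like_spa
}

fn main() {
    // A Cloudflare-style interstitial triggers the fallback...
    assert!(needs_cloud_fallback(403, "<title>Just a moment...</title>"));
    // ...while a plain 200 HTML page stays on the free local path.
    assert!(!needs_cloud_fallback(200, "<html><p>hello</p></html>"));
    println!("fallback heuristics ok");
}
```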
Environment variables
| Variable | Description |
|---|---|
| WEBCLAW_API_KEY | Enables cloud fallback for bot-protected and JS-rendered sites |
| OPENAI_API_KEY | Enables the extract and summarize tools (OpenAI provider) |
| ANTHROPIC_API_KEY | Enables the extract and summarize tools (Anthropic provider) |
| OLLAMA_HOST | Custom Ollama URL (default: localhost:11434) |
The server is built on the rmcp crate (the official Rust MCP SDK) and communicates over stdio. No network ports are opened.
Tools
The MCP server exposes 10 tools. Each tool maps to a corresponding REST API endpoint.
1. scrape
Extract content from a single URL.
| Param | Type | Required | Description |
|---|---|---|---|
| url | string | Yes | URL to scrape. |
| format | string | No | Output format: markdown, llm, text, json, links, rawHtml, attributes, or query. |
| include_selectors | string[] | No | CSS selectors to include exclusively. |
| exclude_selectors | string[] | No | CSS selectors to remove. |
| only_main_content | boolean | No | Extract only the main content element. |
| browser | string | No | Browser profile: chrome, firefox, or random. |
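For example, an agent call to scrape might pass arguments like these (the URL and selector values are illustrative):

```json
{
  "url": "https://example.com/blog/post",
  "format": "markdown",
  "exclude_selectors": ["nav", "footer"],
  "only_main_content": true
}
```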
2. crawl
Crawl a website with BFS traversal.
| Param | Type | Required | Description |
|---|---|---|---|
| url | string | Yes | Starting URL. |
| depth | number | No | Max crawl depth. Default: 2. |
| max_pages | number | No | Max pages to extract. Default: 50. |
| concurrency | number | No | Concurrent requests. Default: 5. |
| use_sitemap | boolean | No | Seed queue with sitemap URLs. |
| format | string | No | Output format for each page. |
3. map
Discover all URLs on a site via sitemap parsing.
| Param | Type | Required | Description |
|---|---|---|---|
| url | string | Yes | Base URL of the site to map. |
4. batch
Extract content from multiple URLs concurrently.
| Param | Type | Required | Description |
|---|---|---|---|
| urls | string[] | Yes | Array of URLs to extract. |
| format | string | No | Output format for each URL. |
| concurrency | number | No | Max concurrent requests. Default: 5. |
5. extract
Extract structured JSON data using an LLM. Supports prompt-to-schema generation -- when only a prompt is provided (no schema), the LLM generates a JSON schema first, then extracts data matching it.
| Param | Type | Required | Description |
|---|---|---|---|
| url | string | Yes | URL to extract data from. |
| prompt | string | No* | Natural language extraction prompt. When provided without a schema, the LLM auto-generates a schema first. |
| schema | string | No* | JSON schema string defining the output structure. |

*At least one of prompt or schema must be provided.
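A hypothetical extract call using prompt-to-schema (no schema supplied, so the LLM generates one before extracting):

```json
{
  "url": "https://example.com/pricing",
  "prompt": "List each plan with its name and monthly price"
}
```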
6. summarize
Generate a concise summary of a web page.
| Param | Type | Required | Description |
|---|---|---|---|
| url | string | Yes | URL to summarize. |
| max_sentences | number | No | Max sentences in summary. Default: 3. |
7. diff
Track content changes between snapshots.
| Param | Type | Required | Description |
|---|---|---|---|
| url | string | Yes | URL to scrape for the current version. |
| previous_snapshot | string | Yes | JSON string of a previous extraction result. |
8. brand
Extract brand identity (colors, fonts, logos) from a site.
| Param | Type | Required | Description |
|---|---|---|---|
| url | string | Yes | URL of the site to analyze. |
9. list_extractors
Return the catalog of all 28 vertical extractors with their names, labels, and URL patterns. Takes no parameters. See the vertical extractors reference for the full list.
10. vertical_scrape
Run a specific vertical extractor on a URL. Returns typed JSON with fields specific to the target site (Reddit, GitHub, Amazon, YouTube, and more).
| Param | Type | Required | Description |
|---|---|---|---|
| name | string | Yes | Extractor name, e.g. github_pr, reddit, amazon_product. |
| url | string | Yes | URL that matches the extractor's claimed pattern. |
Example conversations
Here is how an AI agent might use the webclaw MCP tools in practice.
User
Scrape the Stripe pricing page and pull out all the plan names and prices.
Claude (using webclaw MCP)
I will use the extract tool to pull structured pricing data from the page.
User
Crawl the Next.js docs and summarize the top 5 pages.
Claude (using webclaw MCP)
I will first map the site to discover pages, then crawl and summarize the most important ones.