MCP Server
The webclaw MCP (Model Context Protocol) server exposes the full extraction engine as tools that AI agents can call directly. It works with Claude Desktop, Claude Code, and any other MCP-compatible client.
What is MCP
Model Context Protocol is an open standard for connecting AI models to external tools and data sources. Instead of making HTTP calls manually, an AI agent discovers available tools through the MCP server and calls them natively. The webclaw MCP server communicates over stdio transport and exposes 8 tools covering scraping, crawling, extraction, and more.
Setup
Claude Desktop
Add webclaw to your Claude Desktop config file:
Replace webclaw-mcp with the full path to the binary if it is not on your PATH. Setting WEBCLAW_API_KEY enables automatic cloud fallback for bot-protected sites (Cloudflare, DataDome, AWS WAF) and JS-rendered SPAs. Without it, extraction still works for roughly 80% of sites via local HTTP.
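As a rough sketch of what such an entry can look like, the Python snippet below prints a config in the mcpServers layout that Claude Desktop reads. The server name "webclaw" and the placeholder key value are assumptions made for illustration; only the binary name webclaw-mcp and the WEBCLAW_API_KEY variable come from this guide.

```python
import json

# Assumed values: the entry name "webclaw" and the placeholder key are
# illustrative. "webclaw-mcp" is the binary name used in this guide.
config = {
    "mcpServers": {
        "webclaw": {
            "command": "webclaw-mcp",  # or an absolute path if not on PATH
            "env": {"WEBCLAW_API_KEY": "your-api-key"},  # optional
        }
    }
}

# Serialize with indentation so it can be pasted into the config file.
config_json = json.dumps(config, indent=2)
print(config_json)
```

If you do not need cloud fallback, the env entry can presumably be left out.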
Claude Code
Add the JSON config from the Claude Desktop section to your Claude Desktop config file. Claude Code auto-discovers MCP servers from the same config.
Other MCP clients
Any MCP client that supports stdio transport can connect to webclaw-mcp. Point the client at the binary and it will discover all available tools through the standard MCP handshake.
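For readers writing their own client, the handshake can be sketched as three newline-delimited JSON-RPC 2.0 messages: an initialize request, an initialized notification, and a tools/list request. The client name, version, and protocol version string below are assumptions; a real client should use the values its SDK negotiates.

```python
import json


def jsonrpc(method, params=None, msg_id=None):
    """Serialize one JSON-RPC 2.0 message as a newline-terminated line."""
    msg = {"jsonrpc": "2.0", "method": method}
    if msg_id is not None:
        msg["id"] = msg_id  # requests carry an id, notifications do not
    if params is not None:
        msg["params"] = params
    return json.dumps(msg) + "\n"


def handshake_messages():
    """Messages a client sends before it can call any tool."""
    return [
        jsonrpc("initialize", {
            "protocolVersion": "2024-11-05",  # assumed protocol revision
            "capabilities": {},
            "clientInfo": {"name": "example-client", "version": "0.1.0"},
        }, msg_id=1),
        jsonrpc("notifications/initialized"),
        jsonrpc("tools/list", msg_id=2),
    ]


for line in handshake_messages():
    print(line, end="")
```

In practice these lines would be written to the stdin of a webclaw-mcp child process (for example one started with subprocess.Popen), with responses read line by line from its stdout.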
Smart Fetch
The MCP server uses a local-first architecture. Most scrapes run locally over plain HTTP, which is free and uses no API credits. When bot protection or JS rendering is detected, the server automatically falls back to the webclaw cloud API, which has anti-bot solvers.
1. Local HTTP fetch: fast, free (~80% of sites)
2. Detect bot protection (Cloudflare, DataDome, AWS WAF) or JS-rendered SPA
3. Automatic cloud API fallback (requires WEBCLAW_API_KEY)
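The decision flow above can be sketched roughly as follows. This is not webclaw's actual detection logic: the header names, status codes, and text-length threshold are hypothetical heuristics chosen only to illustrate the local-first pattern.

```python
import re
from typing import Mapping, Optional

# Hypothetical markers; real detection is likely more involved.
BOT_HEADERS = ("cf-mitigated", "x-datadome", "x-amzn-waf-action")


def looks_blocked(status: int, headers: Mapping[str, str]) -> bool:
    """Guess whether a response is a bot-protection challenge."""
    names = {k.lower() for k in headers}
    return status in (403, 429, 503) and any(h in names for h in BOT_HEADERS)


def looks_js_rendered(html: str, min_text_chars: int = 200) -> bool:
    """Guess whether the page is an empty shell that needs JS to render."""
    text = re.sub(r"<script.*?</script>|<[^>]+>", " ", html, flags=re.S | re.I)
    return len(" ".join(text.split())) < min_text_chars


def choose_path(status: int, headers: Mapping[str, str], html: str,
                api_key: Optional[str]) -> str:
    """Return 'local', 'cloud', or 'local-only' (fallback needed, no key)."""
    if not (looks_blocked(status, headers) or looks_js_rendered(html)):
        return "local"
    return "cloud" if api_key else "local-only"
```

The key design point is that the cloud path is only considered after the local response has been inspected, so sites that work locally never consume API credits.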
Environment variables
| Variable | Description |
|---|---|
| WEBCLAW_API_KEY | Enables cloud fallback for bot-protected and JS-rendered sites |
| OPENAI_API_KEY | Enables extract and summarize tools (OpenAI provider) |
| ANTHROPIC_API_KEY | Enables extract and summarize tools (Anthropic provider) |
| OLLAMA_HOST | Custom Ollama URL (default: localhost:11434) |
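To check which features a given environment would enable before launching the server, a small helper along these lines may be useful. The function name and the shape of its return value are hypothetical; only the variable names and the Ollama default come from the table above.

```python
import os
from typing import Mapping


def enabled_features(env: Mapping[str, str] = os.environ) -> dict:
    """Summarize which optional features the environment turns on."""
    llm_keys = ("OPENAI_API_KEY", "ANTHROPIC_API_KEY")
    return {
        "cloud_fallback": bool(env.get("WEBCLAW_API_KEY")),
        # Variables that are set and non-empty, in table order.
        "llm_providers": [name for name in llm_keys if env.get(name)],
        "ollama_host": env.get("OLLAMA_HOST", "localhost:11434"),
    }


print(enabled_features())
```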
The server is built on the rmcp crate (the official Rust MCP SDK) and communicates over stdio. No network ports are opened.
Tools
The MCP server exposes 8 tools. Each tool maps to a corresponding REST API endpoint.
1. scrape
Extract content from a single URL.
| Param | Type | Required | Description |
|---|---|---|---|
| url | string | Yes | URL to scrape. |
| format | string | No | Output format: markdown, llm, text, or json. |
| include_selectors | string[] | No | CSS selectors to include exclusively. |
| exclude_selectors | string[] | No | CSS selectors to remove. |
| only_main_content | boolean | No | Extract only the main content element. |
| browser | string | No | Browser profile: chrome, firefox, or random. |
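MCP clients normally build tool calls for you, but it can help to see what one looks like on the wire. The sketch below wraps the scrape parameters from the table in a JSON-RPC tools/call request; the URL, selectors, and request id are made-up example values.

```python
import json


def tool_call(name, arguments, msg_id):
    """Build an MCP tools/call request as a plain dict."""
    return {
        "jsonrpc": "2.0",
        "id": msg_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


# Example values only; parameter names come from the table above.
request = tool_call("scrape", {
    "url": "https://example.com/blog/post",
    "format": "markdown",
    "exclude_selectors": ["nav", "footer"],
    "only_main_content": True,
}, msg_id=3)

print(json.dumps(request, indent=2))
```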
2. crawl
Crawl a website with BFS traversal.
| Param | Type | Required | Description |
|---|---|---|---|
| url | string | Yes | Starting URL. |
| depth | number | No | Max crawl depth. Default: 2. |
| max_pages | number | No | Max pages to extract. Default: 50. |
| concurrency | number | No | Concurrent requests. Default: 5. |
| use_sitemap | boolean | No | Seed queue with sitemap URLs. |
| format | string | No | Output format for each page. |
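To show how depth and max_pages interact in a breadth-first crawl, here is a minimal sketch over an in-memory link graph. It is an illustration of BFS with those two limits, not webclaw's crawler; get_links is a hypothetical callback standing in for fetching a page and extracting its links.

```python
from collections import deque


def bfs_crawl(start, get_links, depth=2, max_pages=50):
    """Visit pages level by level, stopping at depth or max_pages."""
    seen = {start}
    queue = deque([(start, 0)])
    visited = []
    while queue and len(visited) < max_pages:
        url, level = queue.popleft()
        visited.append(url)
        if level == depth:
            continue  # at the depth limit: do not follow links further
        for link in get_links(url):
            if link not in seen:
                seen.add(link)
                queue.append((link, level + 1))
    return visited


# Toy site: each key links to the pages in its list.
site = {"/": ["/a", "/b"], "/a": ["/c"], "/b": ["/c", "/"], "/c": ["/d"], "/d": []}
print(bfs_crawl("/", lambda u: site.get(u, []), depth=2))
```

With depth=2 the page /d is never reached, because it sits three links away from the start.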
3. map
Discover all URLs on a site via sitemap parsing.
| Param | Type | Required | Description |
|---|---|---|---|
| url | string | Yes | Base URL of the site to map. |
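Sitemap parsing itself is straightforward; the following sketch pulls every loc entry out of a standard sitemap document using the Python standard library. It assumes the usual sitemaps.org namespace and does not follow nested sitemap index files, which a full implementation would likely need to handle.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def urls_from_sitemap(xml_text):
    """Return the text of every <loc> element in a sitemap document."""
    root = ET.fromstring(xml_text)
    return [loc.text.strip() for loc in root.findall(".//sm:loc", SITEMAP_NS)
            if loc.text]


sample = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/docs</loc></url>
</urlset>"""

print(urls_from_sitemap(sample))
```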
4. batch
Extract content from multiple URLs concurrently.
| Param | Type | Required | Description |
|---|---|---|---|
| urls | string[] | Yes | Array of URLs to extract. |
| format | string | No | Output format for each URL. |
| concurrency | number | No | Max concurrent requests. Default: 5. |
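The concurrency parameter caps how many requests are in flight at once. One common way to implement such a cap is a semaphore, sketched below with asyncio; this illustrates the general pattern rather than webclaw's Rust implementation, and fetch is a hypothetical coroutine supplied by the caller.

```python
import asyncio


async def run_batch(urls, fetch, concurrency=5):
    """Run fetch(url) for every URL with at most `concurrency` in flight."""
    sem = asyncio.Semaphore(concurrency)

    async def bounded(url):
        async with sem:  # wait here while the limit is reached
            return await fetch(url)

    # gather preserves input order in its result list.
    return await asyncio.gather(*(bounded(u) for u in urls))


async def demo_fetch(url):
    await asyncio.sleep(0.01)  # stand-in for a real HTTP request
    return f"content of {url}"


results = asyncio.run(run_batch(["a", "b", "c"], demo_fetch, concurrency=2))
print(results)
```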
5. extract
Extract structured JSON data using an LLM.
| Param | Type | Required | Description |
|---|---|---|---|
| url | string | Yes | URL to extract data from. |
| prompt | string | No* | Natural language extraction prompt. |
| schema | string | No* | JSON schema string defining the output structure. |
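Because schema is typed as a string, a JSON Schema object has to be serialized before it is passed as an argument. The sketch below shows one way to do that; the pricing schema and URL are invented for illustration.

```python
import json

# Hypothetical schema describing a list of pricing plans.
schema = {
    "type": "object",
    "properties": {
        "plans": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "name": {"type": "string"},
                    "price": {"type": "string"},
                },
                "required": ["name", "price"],
            },
        }
    },
    "required": ["plans"],
}

# The schema travels as a JSON string, not as a nested object.
arguments = {
    "url": "https://example.com/pricing",
    "schema": json.dumps(schema),
}

print(json.dumps(arguments, indent=2))
```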
6. summarize
Generate a concise summary of a web page.
| Param | Type | Required | Description |
|---|---|---|---|
| url | string | Yes | URL to summarize. |
| max_sentences | number | No | Max sentences in summary. Default: 3. |
7. diff
Track content changes between snapshots.
| Param | Type | Required | Description |
|---|---|---|---|
| url | string | Yes | URL to scrape for current version. |
| previous_snapshot | string | Yes | JSON string of a previous extraction result. |
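To make the idea of comparing a stored snapshot against fresh content concrete, here is a small sketch using difflib from the Python standard library. It is not the algorithm the diff tool uses, and the snapshot shape (a JSON object with a "markdown" field) is an assumption made for this example.

```python
import difflib
import json


def snapshot_diff(previous_snapshot, current_markdown, field="markdown"):
    """Line diff between a stored snapshot (JSON string) and new content."""
    previous = json.loads(previous_snapshot)
    old = previous.get(field, "").splitlines()
    new = current_markdown.splitlines()
    return list(difflib.unified_diff(old, new, "previous", "current", lineterm=""))


# Assumed snapshot shape: {"markdown": "..."}.
stored = json.dumps({"markdown": "Price: $10\nPlan: Basic"})
for line in snapshot_diff(stored, "Price: $12\nPlan: Basic"):
    print(line)
```

The point to note is that the earlier result is kept by the caller and handed back as a JSON string, matching the previous_snapshot parameter type.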
8. brand
Extract brand identity (colors, fonts, logos) from a site.
| Param | Type | Required | Description |
|---|---|---|---|
| url | string | Yes | URL of the site to analyze. |
Example conversations
Here is how an AI agent might use the webclaw MCP tools in practice.
User
Scrape the Stripe pricing page and pull out all the plan names and prices.
Claude (using webclaw MCP)
I will use the extract tool to pull structured pricing data from the page.
User
Crawl the Next.js docs and summarize the top 5 pages.
Claude (using webclaw MCP)
I will first map the site to discover pages, then crawl and summarize the most important ones.