webclaw

MCP Server

The webclaw MCP (Model Context Protocol) server exposes the full extraction engine as tools that AI agents can call directly. It works with Claude Desktop, Claude Code, and any MCP-compatible client.

What is MCP

Model Context Protocol is an open standard for connecting AI models to external tools and data sources. Instead of making HTTP calls manually, an AI agent discovers available tools through the MCP server and calls them natively. The webclaw MCP server communicates over stdio transport and exposes 8 tools covering scraping, crawling, extraction, and more.
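Over stdio, tool discovery is plain JSON-RPC 2.0 — one message per line. A sketch of a tools/list exchange (the response is abbreviated to a single tool and wrapped here for readability; the actual descriptions and input schemas come from the server):

```json
{"jsonrpc": "2.0", "id": 1, "method": "tools/list"}
{"jsonrpc": "2.0", "id": 1, "result": {"tools": [
  {"name": "scrape",
   "description": "Extract content from a single URL",
   "inputSchema": {"type": "object",
                   "properties": {"url": {"type": "string"}},
                   "required": ["url"]}}
]}}
```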

Setup

Claude Desktop

Add webclaw to your Claude Desktop config file:

~/Library/Application Support/Claude/claude_desktop_config.json
{
  "mcpServers": {
    "webclaw": {
      "command": "webclaw-mcp",
      "env": {
        "WEBCLAW_API_KEY": "<YOUR_API_KEY>"
      }
    }
  }
}

Replace webclaw-mcp with the full path to the binary if it is not on your PATH. Setting WEBCLAW_API_KEY enables automatic cloud fallback for bot-protected sites (Cloudflare, DataDome, AWS WAF) and JS-rendered SPAs. Without it, extraction still works for ~80% of sites via local HTTP.

Claude Code

terminal
claude mcp add webclaw webclaw-mcp

Alternatively, add the JSON config above to your Claude Desktop config file; Claude Code auto-discovers MCP servers from the same config.

Other MCP clients

Any MCP client that supports stdio transport can connect to webclaw-mcp. Point the client at the binary and it will discover all available tools through the standard MCP handshake.

Smart Fetch

The MCP server uses a local-first architecture. Most scrapes happen locally over HTTP (free, no API credits). When bot protection or JS rendering is detected, it automatically falls back to the webclaw cloud API, which includes anti-bot solvers.

1. Local HTTP fetch -- fast, free (~80% of sites)

2. Detect bot protection (Cloudflare, DataDome, AWS WAF) or a JS-rendered SPA

3. Automatic cloud API fallback (requires WEBCLAW_API_KEY)

Environment variables

Variable           Description
WEBCLAW_API_KEY    Enables cloud fallback for bot-protected and JS-rendered sites
OPENAI_API_KEY     Enables the extract and summarize tools (OpenAI provider)
ANTHROPIC_API_KEY  Enables the extract and summarize tools (Anthropic provider)
OLLAMA_HOST        Custom Ollama URL (default: localhost:11434)
Note: The MCP server uses the rmcp crate (the official Rust MCP SDK) and communicates over stdio. No network ports are opened.

Tools

The MCP server exposes 8 tools. Each tool maps to a corresponding REST API endpoint.

1. scrape

Extract content from a single URL.

Param              Type      Required  Description
url                string    Yes       URL to scrape.
format             string    No        Output format: markdown, llm, text, or json.
include_selectors  string[]  No        CSS selectors to include exclusively.
exclude_selectors  string[]  No        CSS selectors to remove.
only_main_content  boolean   No        Extract only the main content element.
browser            string    No        Browser profile: chrome, firefox, or random.
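A typical scrape call targeting an article body (the URL and selectors below are placeholders):

```json
{
  "url": "https://example.com/blog/post",
  "format": "markdown",
  "exclude_selectors": ["nav", "footer", ".sidebar"],
  "only_main_content": true
}
```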

2. crawl

Crawl a website with BFS traversal.

Param        Type     Required  Description
url          string   Yes       Starting URL.
depth        number   No        Max crawl depth. Default: 2.
max_pages    number   No        Max pages to extract. Default: 50.
concurrency  number   No        Concurrent requests. Default: 5.
use_sitemap  boolean  No        Seed the queue with sitemap URLs.
format       string   No        Output format for each page.
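A sketch of a crawl call that goes deeper than the defaults and seeds the queue from the sitemap (the URL is a placeholder):

```json
{
  "url": "https://example.com/docs",
  "depth": 3,
  "max_pages": 100,
  "use_sitemap": true,
  "format": "llm"
}
```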

3. map

Discover all URLs on a site via sitemap parsing.

Param  Type    Required  Description
url    string  Yes       Base URL of the site to map.

4. batch

Extract content from multiple URLs concurrently.

Param        Type      Required  Description
urls         string[]  Yes       Array of URLs to extract.
format       string    No        Output format for each URL.
concurrency  number    No        Max concurrent requests. Default: 5.

5. extract

Extract structured JSON data using an LLM.

Param   Type    Required  Description
url     string  Yes       URL to extract data from.
prompt  string  No*       Natural language extraction prompt.
schema  string  No*       JSON schema string defining the output structure.

* At least one of prompt or schema must be provided.
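An extract call using a schema instead of a prompt. Note that schema is passed as a JSON string, not a nested object; the URL and schema fields below are illustrative:

```json
{
  "url": "https://example.com/product",
  "schema": "{\"type\":\"object\",\"properties\":{\"name\":{\"type\":\"string\"},\"price\":{\"type\":\"number\"}},\"required\":[\"name\",\"price\"]}"
}
```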

6. summarize

Generate a concise summary of a web page.

Param          Type    Required  Description
url            string  Yes       URL to summarize.
max_sentences  number  No        Max sentences in the summary. Default: 3.

7. diff

Track content changes between snapshots.

Param              Type    Required  Description
url                string  Yes       URL to scrape for the current version.
previous_snapshot  string  Yes       JSON string of a previous extraction result.
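A sketch of a diff call. In practice you would pass the JSON from an earlier scrape result verbatim as previous_snapshot; the URL and snapshot fields below are illustrative placeholders:

```json
{
  "url": "https://example.com/pricing",
  "previous_snapshot": "{\"content\":\"...\",\"fetched_at\":\"2024-01-01T00:00:00Z\"}"
}
```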

8. brand

Extract brand identity (colors, fonts, logos) from a site.

Param  Type    Required  Description
url    string  Yes       URL of the site to analyze.

Example conversations

Here is how an AI agent might use the webclaw MCP tools in practice.

User

Scrape the Stripe pricing page and pull out all the plan names and prices.

Claude (using webclaw MCP)

I will use the extract tool to pull structured pricing data from the page.

Tool call: extract
{
  "url": "https://stripe.com/pricing",
  "prompt": "Extract all plan names, monthly prices, and included features"
}

User

Crawl the Next.js docs and summarize the top 5 pages.

Claude (using webclaw MCP)

I will first map the site to discover pages, then crawl and summarize the most important ones.

Tool call: map
{
  "url": "https://nextjs.org/docs"
}
Tool call: batch
{
  "urls": [
    "https://nextjs.org/docs",
    "https://nextjs.org/docs/getting-started",
    "https://nextjs.org/docs/routing",
    "https://nextjs.org/docs/rendering",
    "https://nextjs.org/docs/data-fetching"
  ],
  "format": "llm"
}
Tip: The MCP server runs the same extraction engine as the REST API and CLI; every tool produces output identical to its REST API counterpart.