webclaw · The extraction engine

The web scraper your AI agent deserves

Clean structured data for your agents. In milliseconds, not seconds.

118ms avg response · 90% fewer tokens · Drop-in Firecrawl replacement
One-command setup · MCP + CLI. Auto-detects your tools and configures everything.

Endpoints

Ten surfaces. One extraction engine.

Pick an endpoint to see what it does, how you'd call it, and where to dive into the reference docs.

/v1/scrape

Scrape

Single-page extraction

Fetch any URL and return clean markdown, JSON, HTML, or LLM-ready text. Chrome-grade TLS fingerprinting and automatic antibot escalation built-in.
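
As a rough sketch of what a call to this endpoint could look like, the snippet below builds (but does not send) a POST request with the Python standard library. The host `webclaw.example`, the body fields `url` and `formats`, and the Bearer-token header are assumptions for illustration, not taken from the reference docs.

```python
import json
import urllib.request

# Assumption: placeholder host; substitute the base URL from the webclaw docs.
BASE_URL = "https://webclaw.example"


def build_scrape_request(url, api_key, formats=("markdown",)):
    """Build (but do not send) a POST to /v1/scrape.

    The body fields "url" and "formats" and the Bearer header are assumed,
    not confirmed by the page this example sits on.
    """
    body = json.dumps({"url": url, "formats": list(formats)}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/v1/scrape",
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_scrape_request("https://example.com", api_key="YOUR_API_KEY")
# To send it: urllib.request.urlopen(req), then JSON-decode the response body.
```

Building the request separately from sending it makes the payload easy to inspect before any credits are spent.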

OPEN DOCS →

Every page.
Every defense.

01

Fast by default. Smart when needed.

118ms average for static pages. Firecrawl's published P95 is 3.4s. Multi-layer rendering pipeline for JS-heavy sites. The engine picks the fastest path automatically. You configure nothing.

02

Drop-in Firecrawl replacement.

Change your base URL. Keep your existing SDK code. The /v2 endpoints are fully compatible. Same API shape, same response format, no rewrite needed. Better extraction quality, faster response times.
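
To illustrate the base-URL swap, here is a minimal sketch that describes the same `/v2/scrape` call against two hosts. `https://api.firecrawl.dev` is Firecrawl's public API host; `https://webclaw.example` is a placeholder, and the `scrape_call` helper is hypothetical.

```python
FIRECRAWL_BASE = "https://api.firecrawl.dev"
WEBCLAW_BASE = "https://webclaw.example"  # assumption: placeholder host


def scrape_call(base_url, api_key, target):
    """Describe a /v2/scrape call as a (url, headers, json_body) triple."""
    return (
        base_url.rstrip("/") + "/v2/scrape",
        {"Authorization": f"Bearer {api_key}"},
        {"url": target, "formats": ["markdown"]},
    )


before = scrape_call(FIRECRAWL_BASE, "KEY", "https://example.com")
after = scrape_call(WEBCLAW_BASE, "KEY", "https://example.com")
# Only the first element (the URL) differs; headers and body are identical.
```

In practice you would also switch to a webclaw API key; the identical key here only keeps the comparison simple.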

03

Best-in-class bot protection.

Challenge pages, CAPTCHAs, browser fingerprinting, all handled transparently. No manual cookies, no config. Your requests just work, even on the hardest sites.

04

Every format, every extraction.

Markdown, JSON, plain text, LLM-optimized. Schema-based extraction, prompt-based extraction, summarization, brand identity, content diffing. 14 endpoints, one API key.

05

Built for AI agents.

MCP server with 12 tools for Claude, Cursor, Windsurf, OpenCode, Codex, Antigravity, and any MCP client. REST API for everything else. Web search, batch processing, crawling, sitemap discovery.

06

90% fewer tokens.

The LLM format runs a 9-step optimization pipeline. Strips nav, ads, boilerplate, and repeated elements. Measured on 18 production sites, the median page drops 95% in token count while preserving content. Your agent gets more, spends less.
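
The general idea of noise stripping can be sketched in a few lines. This is not webclaw's 9-step pipeline: it is a single-pass illustration that drops text inside a hand-picked set of tags and uses character count as a crude stand-in for tokens. The tag list and both helper names are assumptions.

```python
from html.parser import HTMLParser

# Assumption: a hand-picked set of "noise" containers for illustration only.
NOISE_TAGS = {"nav", "header", "footer", "aside", "script", "style"}


class NoiseStripper(HTMLParser):
    """Collect text that is not nested inside any noise element."""

    def __init__(self):
        super().__init__()
        self.depth = 0    # > 0 while inside a noise element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in NOISE_TAGS:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in NOISE_TAGS and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth == 0 and data.strip():
            self.chunks.append(data.strip())


def strip_noise(html):
    parser = NoiseStripper()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


def reduction(raw, cleaned):
    """Fractional drop in character count, a rough proxy for token count."""
    return 1 - len(cleaned) / len(raw)


page = (
    "<html><nav><a href='/'>Home</a> <a href='/p'>Pricing</a></nav>"
    "<main><h1>Title</h1><p>Body text here.</p></main>"
    "<footer>Copyright</footer></html>"
)
cleaned = strip_noise(page)
saved = reduction(page, cleaned)
```

A real measurement would count tokens with the target model's tokenizer rather than characters.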

07

Agentic scraping.

Give a goal, get structured data. The AI agent reasons about page content, clicks buttons, navigates, and extracts exactly what you asked for. Powered by the best available models.

08

Deep content recovery.

Embedded JSON, structured data, server-rendered payloads, extracted even when the visible DOM is empty. Auto-detects PDFs, DOCX, XLSX. Multiple fallback strategies. If the content exists, webclaw finds it.
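
One way to picture recovery from an empty DOM: many pages ship their data in `<script>` tags typed as JSON even when no visible markup is rendered. The sketch below collects such payloads with the standard library. It illustrates the technique only; the class and function names are hypothetical, and webclaw's own fallback strategies are not documented here.

```python
import json
from html.parser import HTMLParser

# Script types that commonly carry embedded data (JSON-LD, framework payloads).
JSON_TYPES = {"application/ld+json", "application/json"}


class EmbeddedJson(HTMLParser):
    """Collect parsed JSON from <script> tags with a JSON content type."""

    def __init__(self):
        super().__init__()
        self.capturing = False
        self.buffer = []
        self.payloads = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") in JSON_TYPES:
            self.capturing = True
            self.buffer = []

    def handle_data(self, data):
        if self.capturing:
            self.buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self.capturing:
            self.capturing = False
            try:
                self.payloads.append(json.loads("".join(self.buffer)))
            except json.JSONDecodeError:
                pass  # skip malformed blocks instead of failing the page


def extract_embedded_json(html):
    parser = EmbeddedJson()
    parser.feed(html)
    parser.close()
    return parser.payloads


page = (
    '<html><body><div id="root"></div>'
    '<script type="application/ld+json">'
    '{"@type": "Product", "name": "Widget"}'
    "</script></body></html>"
)
payloads = extract_embedded_json(page)
```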

One credit.
One page.

One pool covers every endpoint. Heavier operations like antibot or LLM extract use a few extra credits. Research has its own counter so deep runs cannot drain your budget.

SAVE 20%

STARTER
$15/mo billed yearly
CREDITS: 10,000/mo
RESEARCH: 3 RUNS/mo
MAX SOURCES: 10
CONCURRENCY: 5
SUPPORT: EMAIL

GROWTH (POPULAR)
$39/mo billed yearly
CREDITS: 100,000/mo
RESEARCH: 10 RUNS/mo
MAX SOURCES: 20
CONCURRENCY: 20
SUPPORT: PRIORITY

PRO
$79/mo billed yearly
CREDITS: 250,000/mo
RESEARCH: 20 RUNS/mo
MAX SOURCES: 30
CONCURRENCY: 50
SUPPORT: PRIORITY

SCALE
$319/mo billed yearly
CREDITS: 1,000,000/mo
RESEARCH: 60 RUNS/mo
MAX SOURCES: 100
CONCURRENCY: 100
SUPPORT: PRIORITY + SLACK

HOW CREDITS WORK

PLAIN PAGE: 1 CREDIT
JS RENDER: +2 CREDITS
ANTIBOT SOLVE: +9 CREDITS
SEARCH / 10 RESULTS: 2 CREDITS
SUMMARIZE: 10 CREDITS
BRAND: 5 CREDITS
DIFF: 2 CREDITS
LLM EXTRACT: 25 CREDITS
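
A quick way to budget is to turn the prices above into arithmetic. The sketch below assumes the "+" surcharges stack additively on the 1-credit base page and that the flat-priced operations are billed per call (search per 10 results); the page does not spell out either rule, and the function names are hypothetical.

```python
# Credit prices copied from the list above.
PLAIN_PAGE = 1
JS_RENDER_EXTRA = 2
ANTIBOT_EXTRA = 9
FLAT_COSTS = {
    "search": 2,        # per 10 results
    "summarize": 10,
    "brand": 5,
    "diff": 2,
    "llm_extract": 25,
}


def scrape_credits(js_render=False, antibot=False):
    """Assumption: surcharges add on top of the 1-credit base page."""
    return (
        PLAIN_PAGE
        + (JS_RENDER_EXTRA if js_render else 0)
        + (ANTIBOT_EXTRA if antibot else 0)
    )


def monthly_credits(plain=0, js=0, antibot=0, **flat_calls):
    """Estimate monthly credits from call counts per operation type."""
    total = (
        plain * scrape_credits()
        + js * scrape_credits(js_render=True)
        + antibot * scrape_credits(antibot=True)
    )
    for name, count in flat_calls.items():
        total += FLAT_COSTS[name] * count
    return total


# 5,000 plain pages, 1,000 JS-rendered, 200 antibot solves, 40 LLM extracts:
# 5,000 + 3,000 + 2,000 + 1,000 = 11,000 credits.
estimate = monthly_credits(plain=5000, js=1000, antibot=200, llm_extract=40)
```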

Research is metered separately as runs per month, with a per-tier cap on max sources so deep mode stays bounded.
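
Because credits and research runs are separate counters, choosing a tier means checking both. A minimal sketch using the plan figures listed on this page (the `pick_plan` helper is hypothetical):

```python
# (name, USD per month billed yearly, credits per month, research runs per month)
PLANS = [
    ("Starter", 15, 10_000, 3),
    ("Growth", 39, 100_000, 10),
    ("Pro", 79, 250_000, 20),
    ("Scale", 319, 1_000_000, 60),
]


def pick_plan(credits_needed, research_runs=0):
    """Return the cheapest listed plan covering both counters, else None."""
    for name, _price, credits, runs in PLANS:
        if credits >= credits_needed and runs >= research_runs:
            return name
    return None  # beyond Scale: the Dedicated tier would be the next step


choice = pick_plan(11_000, research_runs=2)
```

Note that a light credit load can still force a higher tier if it needs many research runs, for example `pick_plan(5_000, research_runs=15)`.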

DEDICATED

Unlimited pages. Unlimited research. 200 concurrent. Single-tenant on your cloud, your proxies, your rules. Dedicated Slack channel + SLA.

CONTACT US
OPEN SOURCE

Self-host forever. AGPL-3.0 license. CLI + server + MCP server. No limits on your hardware.

VIEW ON GITHUB

Common questions

FAQ

Webclaw is a web extraction toolkit that turns any website into clean, structured data. Output formats include Markdown, JSON, HTML, plain text, and an LLM-optimized mode that strips noise and cuts token count by around 90% vs raw HTML.

Webclaw uses HTTP with TLS fingerprint impersonation instead of spinning up a headless browser. Sub-200ms response times, zero browser overhead, no Selenium or Playwright dependency. Content extraction runs via readability scoring plus a 9-step pipeline, no browser needed for most pages.

Yes. Starter comes with a 7-day free trial. A card is required up front so we don't get drowned in throwaway signups, and you can cancel any time during the trial directly from the billing portal. No charge if you cancel before day 7. If you want to use Webclaw without ever paying, the open-source version (AGPL-3.0) runs locally with no limits on your hardware.

Yes. Webclaw is open source under AGPL-3.0. You can run the CLI, REST API server, or MCP server on your own infrastructure. Docker images and one-line deploy scripts are available.

Six formats: Markdown (clean readable text), JSON (structured with metadata), HTML (sanitized), plain text, LLM-optimized (stripped of noise for AI consumption), and raw HTML. The LLM format runs a 9-step optimization pipeline to minimize token usage.

Webclaw ships a Model Context Protocol server binary that exposes 12 tools: scrape, crawl, map, batch, extract, summarize, diff, brand, search, research, vertical_scrape, and list_extractors. Works with any MCP client (Claude Desktop, Claude Code, Cursor, Windsurf, Codex, Antigravity) over stdio.
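
For clients that read a JSON config with an `mcpServers` map (Claude Desktop and Cursor do), a stdio server entry could be generated as below. The binary name `webclaw-mcp` and the `WEBCLAW_API_KEY` variable are hypothetical placeholders, not confirmed names; the one-command setup presumably writes the real values for you.

```python
import json


def mcp_config(command="webclaw-mcp", api_key="YOUR_API_KEY"):
    """Build an mcpServers entry for a stdio MCP server.

    Assumptions: the command name and the environment variable name are
    placeholders; check the webclaw docs for the real ones.
    """
    return {
        "mcpServers": {
            "webclaw": {
                "command": command,
                "args": [],
                "env": {"WEBCLAW_API_KEY": api_key},
            }
        }
    }


config_text = json.dumps(mcp_config(), indent=2)
print(config_text)
```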

Your extracted content is never stored or logged on our servers. Requests are processed in real time and the response is returned directly to you. If you use LLM features, content is sent to the AI provider for processing but is not retained. For full control, self-host the entire stack.

Webclaw can use language models to extract structured JSON from pages using a schema you define, answer questions about page content with prompt-based extraction, or generate summaries. It chains through local Ollama first, then falls back to cloud providers.
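
The local-first, cloud-fallback order can be pictured as a simple provider chain. The sketch below is a generic pattern, not webclaw's implementation; the provider functions are stand-ins that make no network calls.

```python
from typing import Callable, Sequence, Tuple

Provider = Callable[[str], str]


def run_with_fallback(
    prompt: str, providers: Sequence[Tuple[str, Provider]]
) -> Tuple[str, str]:
    """Try each provider in order; return (provider_name, answer) from the first success."""
    errors = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except Exception as exc:  # any failure moves on to the next provider
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all providers failed: " + "; ".join(errors))


def local_ollama(prompt: str) -> str:
    # Stand-in for a local model call; simulates Ollama not running.
    raise ConnectionError("ollama not reachable")


def cloud_provider(prompt: str) -> str:
    # Stand-in for a hosted model call.
    return f"summary of: {prompt}"


used, answer = run_with_fallback(
    "page text", [("ollama", local_ollama), ("cloud", cloud_provider)]
)
```

Trying the local model first keeps page content on your machine whenever it is available.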

Ready to build?

Start extracting.

7-day Starter trial. Cancel anytime. Deploy in under a minute, or self-host forever. Open source.

Studio partners

Backing open web extraction

View partners
Quantum Proxies · Proxy-Seller