The web scraper
your AI agent
actually deserves.
One command setup
MCP + CLI
Give your AI agents web data with a single command. Auto-detects your tools and configures everything.
Learn more
npx create-webclaw
Works with Claude Code, Cursor, Windsurf, Codex, OpenCode, and more
Every page.
Every defense.
Fast by default. Smart when needed.
118ms average for static pages. Firecrawl's published P95 is 3.4s. Multi-layer rendering pipeline for JS-heavy sites — the engine picks the fastest path automatically. You configure nothing.
Drop-in Firecrawl replacement.
Change your base URL. Keep your existing SDK code. The /v2 endpoints are fully compatible — same API shape, same response format, no rewrite needed. Better extraction quality, faster response times.
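Switching over can be sketched with stdlib HTTP alone. The sketch below assumes the /v2 scrape endpoint accepts Firecrawl's request shape, a POST body of {url, formats} with a bearer token; the base URL and key are placeholders, not real values.

```python
import json
from urllib.request import Request

# Placeholder base URL -- substitute your real Webclaw endpoint and API key.
WEBCLAW_BASE = "https://api.webclaw.example"

def build_scrape_request(base_url: str, api_key: str, url: str, formats: list) -> Request:
    """Build a Firecrawl-shaped POST /v2/scrape request against a new base URL."""
    payload = json.dumps({"url": url, "formats": formats}).encode()
    return Request(
        f"{base_url}/v2/scrape",
        data=payload,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_scrape_request(WEBCLAW_BASE, "sk-placeholder", "https://example.com", ["markdown"])
print(req.full_url)  # https://api.webclaw.example/v2/scrape
```

Because only the host changes, existing Firecrawl SDK code that lets you override the API URL should work the same way.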
Best-in-class bot protection.
Challenge pages, CAPTCHAs, browser fingerprinting — handled transparently. No manual cookies, no config. Your requests just work, even on the hardest sites.
Every format, every extraction.
Markdown, JSON, plain text, LLM-optimized. Schema-based extraction, prompt-based extraction, summarization, brand identity, content diffing. 14 endpoints, one API key.
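Schema-based extraction generally means handing the API a JSON Schema describing the fields you want back. A hypothetical request body is sketched below; the field names and endpoint shape are illustrative, not the documented API.

```python
import json

# Illustrative request body for schema-based extraction -- the field names
# here are assumptions for the sketch, not Webclaw's documented contract.
extract_request = {
    "urls": ["https://example.com/product"],
    "schema": {
        "type": "object",
        "properties": {
            "name": {"type": "string"},
            "price": {"type": "number"},
            "in_stock": {"type": "boolean"},
        },
        "required": ["name", "price"],
    },
}

print(json.dumps(extract_request, indent=2))
```

The appeal of the schema approach is that the response comes back as validated JSON matching your types, rather than prose you have to parse.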
Built for AI agents.
MCP server with 8 tools for Claude, Cursor, Windsurf, OpenCode, Codex, Antigravity, and any MCP client. REST API for everything else. Web search, batch processing, crawling, sitemap discovery.
67% fewer tokens.
The LLM format runs a 9-step optimization pipeline — strips nav, ads, boilerplate, repeated elements. The median page goes from 3,800 tokens raw to 950 tokens of actual content. Your agent gets more, spends less.
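To illustrate what one such step does, here is a toy boilerplate stripper that drops nav, footer, and script subtrees before any tokens are counted. It is a sketch of the idea, not Webclaw's actual 9-step pipeline.

```python
from html.parser import HTMLParser

# Elements whose entire subtree is treated as noise in this toy example.
NOISE = {"nav", "footer", "aside", "script", "style"}

class BoilerplateStripper(HTMLParser):
    def __init__(self):
        super().__init__()
        self.depth = 0      # nesting depth inside noise elements
        self.chunks = []    # text fragments kept as content

    def handle_starttag(self, tag, attrs):
        if tag in NOISE:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in NOISE and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        # Keep text only when we are outside every noise subtree.
        if self.depth == 0 and data.strip():
            self.chunks.append(data.strip())

def strip_boilerplate(html: str) -> str:
    parser = BoilerplateStripper()
    parser.feed(html)
    return " ".join(parser.chunks)

html = "<nav>Home About</nav><main>Actual article text.</main><footer>© 2026</footer>"
print(strip_boilerplate(html))  # Actual article text.
```

Even this crude version removes most of the markup a model would otherwise pay for; a production pipeline layers several more passes on top.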
Agentic scraping.
Give a goal, get structured data. The AI agent reasons about page content, clicks buttons, navigates, and extracts exactly what you asked for. Powered by the best available models.
Deep content recovery.
Embedded JSON, structured data, server-rendered payloads — extracted even when the visible DOM is empty. Auto-detects PDFs, DOCX, XLSX. Multiple fallback strategies. If the content exists, webclaw finds it.
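One common recovery strategy, shown here as a standalone sketch rather than Webclaw's internals, is pulling JSON-LD payloads out of pages whose visible DOM renders nothing.

```python
import json
import re

# Matches <script type="application/ld+json"> blocks and captures their body.
JSONLD = re.compile(
    r'<script[^>]*type=["\']application/ld\+json["\'][^>]*>(.*?)</script>',
    re.DOTALL | re.IGNORECASE,
)

def recover_jsonld(html: str) -> list:
    """Return every parseable JSON-LD payload embedded in the page."""
    out = []
    for block in JSONLD.findall(html):
        try:
            out.append(json.loads(block))
        except json.JSONDecodeError:
            continue  # malformed blocks are skipped, not fatal
    return out

page = ('<html><body><div id="root"></div>'
        '<script type="application/ld+json">{"@type": "Product", "name": "Widget"}</script>'
        '</body></html>')
print(recover_jsonld(page))  # [{'@type': 'Product', 'name': 'Widget'}]
```

The `div#root` here is empty, as it would be on an unrendered SPA, yet the structured data is fully recoverable without running any JavaScript.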
FROM THE BLOG
Latest posts
Apr 14, 2026
Web Scraping with LangChain in 2026 — The Complete Guide
LangChain's built-in loaders break on bot-protected sites and return raw HTML your LLM can't use. Here's how to get clean, reliable web data into any LangChain pipeline.
Apr 10, 2026
How to scrape Google search results in 2026
Google killed plain HTTP access to search results. Here's what works now, from TLS fingerprinting libraries to headless browsers to APIs, with code examples for each approach.
Apr 7, 2026
Best web scraping APIs for LLMs in 2026
If you're building with LLMs, you need web data. Here's how the main scraping APIs compare on the things that actually matter for AI use cases.
Apr 2, 2026
How to bypass Cloudflare bot protection when scraping
Cloudflare protects over 20% of the web. If you're scraping, you've hit a 403. Here's what actually works, what doesn't, and why most tools get it wrong.
One credit.
One page.
No hidden multipliers. No per-feature charges. Pick a plan, start extracting.
Unlimited pages. Unlimited research. 200 concurrent requests. Single-tenant on your cloud, your proxies, your rules. Dedicated Slack channel + SLA.
Self-host forever. AGPL-3.0 license. CLI + server + MCP server. No limits on your hardware.
1 CREDIT = 1 PAGE, ALWAYS · NO HIDDEN MULTIPLIERS · OPEN SOURCE
Common questions
FAQ
What is Webclaw?
Webclaw is a web extraction toolkit that converts any website into clean, structured data. It supports multiple output formats: Markdown, JSON, HTML, plain text, and an LLM-optimized format that strips noise and reduces token count by 67% on average.
How does Webclaw hit sub-200ms response times?
By default, Webclaw uses raw HTTP requests with TLS fingerprint impersonation instead of spinning up a headless browser. That means sub-200ms responses, zero browser overhead, and no Selenium or Playwright dependency for most pages; JS-heavy sites escalate to the rendering pipeline automatically. Clean output comes from intelligent content extraction and readability scoring.
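The readability-scoring idea can be illustrated with a toy link-density heuristic: content blocks have few linked characters relative to their total text, while nav and footer blocks are mostly links. This is an illustration of the technique, not Webclaw's scoring algorithm.

```python
import re

def link_density(html_block: str) -> float:
    """Fraction of a block's text that sits inside <a> tags."""
    linked = sum(len(m) for m in re.findall(r"<a[^>]*>(.*?)</a>", html_block, re.DOTALL))
    text = re.sub(r"<[^>]+>", "", html_block)  # strip all tags, keep text
    return linked / max(len(text), 1)

nav = '<a href="/">Home</a> <a href="/about">About</a>'
para = 'Long explanatory paragraph with a single <a href="#">reference</a> link.'

assert link_density(nav) > 0.5    # mostly links -> likely boilerplate
assert link_density(para) < 0.3   # mostly prose -> likely content
```

Real readability scorers combine several such signals (text length, punctuation, tag depth), but link density alone already separates navigation from prose surprisingly well.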
Is there a free plan?
Yes. The Starter plan is completely free: 500 pages per month, 5 output formats, sitemap discovery, and full API access. No credit card required. You can upgrade anytime if you need higher limits or advanced features like LLM extraction.
Can I self-host Webclaw?
Absolutely. Webclaw is open source under the AGPL-3.0 license. You can run the CLI, REST API server, or MCP server on your own infrastructure. Docker images and one-line deploy scripts are available for quick setup.
What output formats are supported?
Six formats: Markdown (clean readable text), JSON (structured with metadata), HTML (sanitized), plain text, LLM-optimized (stripped of noise for AI consumption), and raw HTML. The LLM format runs a 9-step optimization pipeline to minimize token usage.
Does Webclaw include an MCP server?
Webclaw ships a dedicated MCP (Model Context Protocol) server binary that exposes 8 tools: scrape, crawl, map, batch, extract, summarize, diff, and brand. It works over stdio transport with any MCP-compatible client, such as Claude Desktop, Claude Code, Cursor, Windsurf, OpenCode, Codex, or Antigravity.
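For stdio clients like Claude Desktop, wiring in an MCP server usually means one entry in the client's config file. The package name `webclaw-mcp` and the `WEBCLAW_API_KEY` variable below are assumptions for illustration; check the project's docs for the real names.

```json
{
  "mcpServers": {
    "webclaw": {
      "command": "npx",
      "args": ["-y", "webclaw-mcp"],
      "env": { "WEBCLAW_API_KEY": "sk-placeholder" }
    }
  }
}
```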
Is my data stored or logged?
Your extracted content is never stored or logged on our servers. Requests are processed in real-time and the response is returned directly to you. If you use LLM features, content is sent to the AI provider for processing but is not retained. For full control, self-host the entire stack.
How do the LLM-powered features work?
Webclaw can use language models to extract structured JSON from pages using a schema you define, answer questions about page content with prompt-based extraction, or generate summaries. It chains through local Ollama first, then falls back to cloud providers.
Ready to build?
Start extracting.
Free tier. No credit card. Deploy in under a minute — or self-host forever. Open source.