webclaw · The extraction engine
The web scraper
your AI agent
deserves
Endpoints
Ten surfaces. One extraction engine.
Pick an endpoint to see what it does, how you'd call it, and where to dive into the reference docs.
Every page.
Every defense.
Fast by default. Smart when needed.
118ms average for static pages. Firecrawl's published P95 is 3.4s. Multi-layer rendering pipeline for JS-heavy sites. The engine picks the fastest path automatically. You configure nothing.
Drop-in Firecrawl replacement.
Change your base URL. Keep your existing SDK code. The /v2 endpoints are fully compatible. Same API shape, same response format, no rewrite needed. Better extraction quality, faster response times.
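As a rough sketch of what "change your base URL" could look like with a plain HTTP client: the webclaw host below is a placeholder, and the `/v2/scrape` path, bearer-token header, and `url`/`formats` body fields are assumptions modeled on Firecrawl's request shape, not confirmed webclaw details. The requests are built but never sent.

```python
import json
import urllib.request

FIRECRAWL_BASE = "https://api.firecrawl.dev"
WEBCLAW_BASE = "https://webclaw.example"  # hypothetical placeholder, not the real host


def build_scrape_request(base_url: str, api_key: str, target_url: str) -> urllib.request.Request:
    """Build (but do not send) a POST to an assumed /v2/scrape endpoint."""
    # Body fields are assumed to follow the Firecrawl-style shape.
    body = json.dumps({"url": target_url, "formats": ["markdown"]}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v2/scrape",
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Only the base URL differs between the two requests.
before = build_scrape_request(FIRECRAWL_BASE, "YOUR_API_KEY", "https://example.com")
after = build_scrape_request(WEBCLAW_BASE, "YOUR_API_KEY", "https://example.com")
```

If the compatibility claim holds, the path, headers, and body stay identical and only the host changes.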
Best-in-class bot protection.
Challenge pages, CAPTCHAs, browser fingerprinting, all handled transparently. No manual cookies, no config. Your requests just work, even on the hardest sites.
Every format, every extraction.
Markdown, JSON, plain text, LLM-optimized. Schema-based extraction, prompt-based extraction, summarization, brand identity, content diffing. 14 endpoints, one API key.
Built for AI agents.
MCP server with 12 tools for Claude, Cursor, Windsurf, OpenCode, Codex, Antigravity, and any MCP client. REST API for everything else. Web search, batch processing, crawling, sitemap discovery.
90% fewer tokens.
The LLM format runs a 9-step optimization pipeline. It strips nav, ads, boilerplate, and repeated elements. Measured across 18 production sites, the median page drops 95% in token count while preserving content. Your agent gets more, spends less.
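To make the idea concrete, here is a minimal sketch of one noise-stripping step and a crude way to measure the reduction. It is not webclaw's 9-step pipeline: the list of noise tags is an assumption, and a whitespace word count stands in for a real tokenizer.

```python
from html.parser import HTMLParser

# Tags whose contents are treated as noise in this sketch (an assumption).
NOISE_TAGS = {"nav", "header", "footer", "aside", "script", "style"}


class NoiseStripper(HTMLParser):
    """Collect text that is not nested inside any noise tag."""

    def __init__(self) -> None:
        super().__init__()
        self.depth = 0  # how many noise tags we are currently inside
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in NOISE_TAGS:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in NOISE_TAGS and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth == 0 and data.strip():
            self.chunks.append(data.strip())


def strip_noise(html: str) -> str:
    """Return the text outside noise tags, one chunk per line."""
    parser = NoiseStripper()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)


def rough_tokens(text: str) -> int:
    """Whitespace word count as a crude stand-in for a real tokenizer."""
    return len(text.split())


def reduction(html: str) -> float:
    """Fraction of rough tokens removed relative to the raw HTML."""
    raw = rough_tokens(html)
    return 1 - rough_tokens(strip_noise(html)) / raw if raw else 0.0
```

Calling `reduction(page_html)` on a saved page gives a quick, approximate sense of how much markup and chrome a single stripping step removes.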
Agentic scraping.
Give a goal, get structured data. The AI agent reasons about page content, clicks buttons, navigates, and extracts exactly what you asked for. Powered by the best available models.
Deep content recovery.
Embedded JSON, structured data, server-rendered payloads, extracted even when the visible DOM is empty. Auto-detects PDFs, DOCX, XLSX. Multiple fallback strategies. If the content exists, webclaw finds it.
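A small sketch of recovering embedded JSON when the visible DOM is empty. It looks only for JSON-LD scripts and a Next.js-style `__NEXT_DATA__` script; those two selectors are illustrative assumptions, not a description of webclaw's actual fallback strategies.

```python
import json
from html.parser import HTMLParser


class EmbeddedJsonFinder(HTMLParser):
    """Collect JSON payloads from <script> tags that carry data, not code."""

    def __init__(self) -> None:
        super().__init__()
        self.capturing = False
        self.buffer = []
        self.payloads = []

    def handle_starttag(self, tag, attrs):
        if tag != "script":
            return
        attr = dict(attrs)
        is_json_ld = attr.get("type") == "application/ld+json"
        is_next_data = attr.get("id") == "__NEXT_DATA__"
        if is_json_ld or is_next_data:
            self.capturing = True
            self.buffer = []

    def handle_data(self, data):
        if self.capturing:
            self.buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self.capturing:
            self.capturing = False
            try:
                self.payloads.append(json.loads("".join(self.buffer)))
            except json.JSONDecodeError:
                pass  # malformed payload: skip it rather than fail the page


def find_embedded_json(html: str) -> list:
    """Return every parseable data payload found in the page, in document order."""
    finder = EmbeddedJsonFinder()
    finder.feed(html)
    finder.close()
    return finder.payloads
```

A page whose body is just an empty `<div id="root">` can still yield its product or article data this way, without running any JavaScript.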
FROM THE BLOG
Latest posts
May 21, 2026
JavaScript Rendering API for Web Scraping: when browser fallback is actually needed
Learn when a JavaScript rendering API is necessary for scraping dynamic websites, how to detect empty app shells, and why browser fallback should run only after response classification.
May 19, 2026
Anti-Bot Scraping API 2026: signals that force browser fallback
The exact block markers, JA4 fingerprints, empty shells, anti-bot cookies, JavaScript heuristics, and content-quality signals that decide when a scraping API should escalate to a browser.
May 14, 2026
Anti-bot scraping API: browser fallback beats browser-first
Choose an anti-bot scraping API that detects blocks, avoids browser-first costs, and returns clean markdown or JSON for AI agents and RAG.
May 12, 2026
How to evaluate web scraping APIs for AI agents
A practical checklist for testing web scraping APIs on real agent and RAG workflows, not toy URLs like example.com.
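The posts listed above revolve around classifying a plain HTTP response before escalating to a browser. A minimal sketch of that idea follows; the status codes, text markers, and length threshold are made-up illustrations, not webclaw's actual signals.

```python
import re

# Illustrative markers and threshold; a real system would tune these.
BLOCK_STATUS = {403, 429, 503}
BLOCK_MARKERS = ("captcha", "just a moment", "access denied")
MIN_VISIBLE_CHARS = 200

_SCRIPT_OR_STYLE = re.compile(r"<(script|style)\b.*?</\1\s*>", re.I | re.S)
_TAG = re.compile(r"<[^>]+>")


def visible_text(html: str) -> str:
    """Very rough visible-text estimate: drop scripts, styles, and tags."""
    without_code = _SCRIPT_OR_STYLE.sub(" ", html)
    return " ".join(_TAG.sub(" ", without_code).split())


def classify(status: int, html: str) -> str:
    """Return 'blocked', 'empty_shell', or 'ok' for a plain HTTP response."""
    text = visible_text(html)
    short = len(text) < MIN_VISIBLE_CHARS
    if status in BLOCK_STATUS:
        return "blocked"
    # Only trust text markers on short pages, so long articles that merely
    # mention the word "captcha" are not misclassified.
    if short and any(marker in text.lower() for marker in BLOCK_MARKERS):
        return "blocked"
    if short and "<script" in html.lower():
        return "empty_shell"
    return "ok"


def needs_browser(status: int, html: str) -> bool:
    """Escalate to a browser only when the cheap path did not yield content."""
    return classify(status, html) != "ok"
```

The design point is ordering: the cheap HTTP fetch always runs first, and the browser is reserved for responses that classify as blocked or empty.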
One credit.
One page.
One pool covers every endpoint. Heavier operations like antibot or LLM extract use a few extra credits. Research has its own counter so deep runs cannot drain your budget.
Research is metered separately as runs per month, with a per-tier cap on max sources so deep mode stays bounded.
Unlimited pages. Unlimited research. 200 concurrent. Single-tenant on your cloud, your proxies, your rules. Dedicated Slack channel + SLA.
Self-host forever. AGPL-3.0 license. CLI + server + MCP server. No limits on your hardware.
Common questions
FAQ
Webclaw is a web extraction toolkit that turns any website into clean, structured data. Output formats include Markdown, JSON, HTML, plain text, and an LLM-optimized mode that strips noise and cuts token count by around 90% vs raw HTML.
Webclaw uses HTTP with TLS fingerprint impersonation instead of spinning up a headless browser. Sub-200ms response times, zero browser overhead, no Selenium or Playwright dependency. Content extraction runs via readability scoring plus a 9-step pipeline, no browser needed for most pages.
Yes. Starter comes with a 7-day free trial. Card required up front so we don't get drowned in throwaway signups, and you can cancel any time during the trial directly from the billing portal. No charge if you cancel before day 7. If you want to use Webclaw without paying ever, the open-source version (AGPL-3.0) runs locally with no limits on your hardware.
Yes. Webclaw is open source under AGPL-3.0. You can run the CLI, REST API server, or MCP server on your own infrastructure. Docker images and one-line deploy scripts are available.
Six formats: Markdown (clean readable text), JSON (structured with metadata), HTML (sanitized), plain text, LLM-optimized (stripped of noise for AI consumption), and raw HTML. The LLM format runs a 9-step optimization pipeline to minimize token usage.
Webclaw ships a Model Context Protocol server binary that exposes 12 tools: scrape, crawl, map, batch, extract, summarize, diff, brand, search, research, vertical_scrape, and list_extractors. Works with any MCP client (Claude Desktop, Claude Code, Cursor, Windsurf, Codex, Antigravity) over stdio.
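For a sense of what travels over stdio, here is a sketch of the JSON-RPC message an MCP client sends to invoke a tool. The `tools/call` method and newline-delimited framing come from the MCP specification; the `url` argument for `scrape` is a guess at the tool's schema, and a real client would complete the `initialize` handshake before sending this.

```python
import json


def tool_call_message(request_id: int, tool: str, arguments: dict) -> str:
    """Serialize an MCP tools/call request as one newline-terminated JSON line."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }
    # The stdio transport delimits messages with newlines, so the JSON itself
    # must stay on a single line; json.dumps does that by default.
    return json.dumps(message) + "\n"


# "scrape" is one of the 12 tools listed above; its "url" argument is assumed.
line = tool_call_message(1, "scrape", {"url": "https://example.com"})
```

In practice an MCP client library handles this framing for you; the sketch only shows the shape of the request.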
Your extracted content is never stored or logged on our servers. Requests are processed in real-time and the response is returned directly to you. If you use LLM features, content is sent to the AI provider for processing but is not retained. For full control, self-host the entire stack.
Webclaw can use language models to extract structured JSON from pages using a schema you define, answer questions about page content with prompt-based extraction, or generate summaries. It chains through local Ollama first, then falls back to cloud providers.
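The local-first, cloud-fallback chain can be sketched generically as "try providers in order, return the first success". The provider functions below are stand-ins invented for illustration; they do not call Ollama or any cloud API, and the real chain's error handling may differ.

```python
from typing import Callable, Sequence, Tuple

Provider = Callable[[str], str]  # takes a prompt, returns a completion


def first_success(providers: Sequence[Tuple[str, Provider]], prompt: str) -> Tuple[str, str]:
    """Try (name, provider) pairs in order; return (name, result) of the first that works."""
    errors = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except Exception as exc:  # any failure moves on to the next provider
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all providers failed: " + "; ".join(errors))


# Stand-in providers for illustration only.
def local_ollama(prompt: str) -> str:
    raise ConnectionError("no local Ollama server running")


def cloud_provider(prompt: str) -> str:
    return f"summary of: {prompt}"


chain = [("ollama", local_ollama), ("cloud", cloud_provider)]
```

Calling `first_success(chain, "page text")` skips the failing local provider and returns the cloud result together with the name of the provider that produced it.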
Ready to build?
Start extracting.
7-day Starter trial. Cancel anytime. Deploy in under a minute, or self-host forever. Open source.

