The web scraper
your AI agent
actually deserves.
One command setup
MCP + CLI
Give your AI agents web data with a single command. Auto-detects your tools and configures everything.
Learn more
npx create-webclaw
Works with Claude Code, Cursor, Windsurf, Codex, OpenCode, and more
Every page.
Every defense.
Fast by default. Smart when needed.
118ms average for static pages. Firecrawl's published P95 is 3.4s. Multi-layer rendering pipeline for JS-heavy sites — the engine picks the fastest path automatically. You configure nothing.
Drop-in Firecrawl replacement.
Change your base URL. Keep your existing SDK code. The /v2 endpoints are fully compatible — same API shape, same response format, no rewrite needed. Better extraction quality, faster response times.
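Switching over can be sketched with stdlib HTTP alone. The sketch below assumes the /v2 scrape endpoint accepts Firecrawl's request shape, a POST body of {url, formats} with a bearer token; the base URL and key are placeholders, not real values.

```python
import json
from urllib.request import Request

# Placeholder base URL -- substitute your real Webclaw endpoint and API key.
WEBCLAW_BASE = "https://api.webclaw.example"

def build_scrape_request(base_url: str, api_key: str, url: str, formats: list) -> Request:
    """Build a Firecrawl-shaped POST /v2/scrape request against a new base URL."""
    payload = json.dumps({"url": url, "formats": formats}).encode()
    return Request(
        f"{base_url}/v2/scrape",
        data=payload,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_scrape_request(WEBCLAW_BASE, "sk-placeholder", "https://example.com", ["markdown"])
print(req.full_url)  # https://api.webclaw.example/v2/scrape
```

Because only the host changes, existing Firecrawl SDK code that lets you override the API URL should work the same way.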
Best-in-class bot protection.
Challenge pages, CAPTCHAs, browser fingerprinting — handled transparently. No manual cookies, no config. Your requests just work, even on the hardest sites.
Every format, every extraction.
Markdown, JSON, plain text, LLM-optimized. Schema-based extraction, prompt-based extraction, summarization, brand identity, content diffing. 14 endpoints, one API key.
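Schema-based extraction generally means handing the API a JSON Schema describing the fields you want back. A hypothetical request body is sketched below; the field names and endpoint shape are illustrative, not the documented API.

```python
import json

# Illustrative request body for schema-based extraction -- the field names
# here are assumptions for the sketch, not Webclaw's documented contract.
extract_request = {
    "urls": ["https://example.com/product"],
    "schema": {
        "type": "object",
        "properties": {
            "name": {"type": "string"},
            "price": {"type": "number"},
            "in_stock": {"type": "boolean"},
        },
        "required": ["name", "price"],
    },
}

print(json.dumps(extract_request, indent=2))
```

The appeal of the schema approach is that the response comes back as validated JSON matching your types, rather than prose you have to parse.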
Built for AI agents.
MCP server with 8 tools for Claude, Cursor, Windsurf, OpenCode, Codex, Antigravity, and any MCP client. REST API for everything else. Web search, batch processing, crawling, sitemap discovery.
67% fewer tokens.
The LLM format runs a 9-step optimization pipeline — strips nav, ads, boilerplate, repeated elements. The median page goes from 3,800 tokens raw to 950 tokens of actual content. Your agent gets more, spends less.
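To illustrate what one such step does, here is a toy boilerplate stripper that drops nav, footer, and script subtrees before any tokens are counted. It is a sketch of the idea, not Webclaw's actual 9-step pipeline.

```python
from html.parser import HTMLParser

# Elements whose entire subtree is treated as noise in this toy example.
NOISE = {"nav", "footer", "aside", "script", "style"}

class BoilerplateStripper(HTMLParser):
    def __init__(self):
        super().__init__()
        self.depth = 0      # nesting depth inside noise elements
        self.chunks = []    # text fragments kept as content

    def handle_starttag(self, tag, attrs):
        if tag in NOISE:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in NOISE and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        # Keep text only when we are outside every noise subtree.
        if self.depth == 0 and data.strip():
            self.chunks.append(data.strip())

def strip_boilerplate(html: str) -> str:
    parser = BoilerplateStripper()
    parser.feed(html)
    return " ".join(parser.chunks)

html = "<nav>Home About</nav><main>Actual article text.</main><footer>© 2026</footer>"
print(strip_boilerplate(html))  # Actual article text.
```

Even this crude version removes most of the markup a model would otherwise pay for; a production pipeline layers several more passes on top.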
Agentic scraping.
Give a goal, get structured data. The AI agent reasons about page content, clicks buttons, navigates, and extracts exactly what you asked for. Powered by the best available models.
Deep content recovery.
Embedded JSON, structured data, server-rendered payloads — extracted even when the visible DOM is empty. Auto-detects PDFs, DOCX, XLSX. Multiple fallback strategies. If the content exists, webclaw finds it.
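One common recovery strategy, shown here as a standalone sketch rather than Webclaw's internals, is pulling JSON-LD payloads out of pages whose visible DOM renders nothing.

```python
import json
import re

# Matches <script type="application/ld+json"> blocks and captures their body.
JSONLD = re.compile(
    r'<script[^>]*type=["\']application/ld\+json["\'][^>]*>(.*?)</script>',
    re.DOTALL | re.IGNORECASE,
)

def recover_jsonld(html: str) -> list:
    """Return every parseable JSON-LD payload embedded in the page."""
    out = []
    for block in JSONLD.findall(html):
        try:
            out.append(json.loads(block))
        except json.JSONDecodeError:
            continue  # malformed blocks are skipped, not fatal
    return out

page = ('<html><body><div id="root"></div>'
        '<script type="application/ld+json">{"@type": "Product", "name": "Widget"}</script>'
        '</body></html>')
print(recover_jsonld(page))  # [{'@type': 'Product', 'name': 'Widget'}]
```

The `div#root` here is empty, as it would be on an unrendered SPA, yet the structured data is fully recoverable without running any JavaScript.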
FROM THE BLOG
Latest posts
Apr 14, 2026
Web Scraping with LangChain in 2026 — The Complete Guide
LangChain's built-in loaders break on bot-protected sites and return raw HTML your LLM can't use. Here's how to get clean, reliable web data into any LangChain pipeline.
Apr 10, 2026
How to scrape Google search results in 2026
Google killed plain HTTP access to search results. Here's what works now, from TLS fingerprinting libraries to headless browsers to APIs, with code examples for each approach.
Apr 7, 2026
Best web scraping APIs for LLMs in 2026
If you're building with LLMs, you need web data. Here's how the main scraping APIs compare on the things that actually matter for AI use cases.
Apr 2, 2026
How to bypass Cloudflare bot protection when scraping
Cloudflare protects over 20% of the web. If you're scraping, you've hit a 403. Here's what actually works, what doesn't, and why most tools get it wrong.
One credit.
One page.
No hidden multipliers. No per-feature charges. Pick a plan, start extracting.
Unlimited pages. Unlimited research. 200 concurrent requests. Single-tenant on your cloud, your proxies, your rules. Dedicated Slack channel + SLA.
Self-host forever. AGPL-3.0 license. CLI + server + MCP server. No limits on your hardware.
1 CREDIT = 1 PAGE, ALWAYS · NO HIDDEN MULTIPLIERS · OPEN SOURCE
Common questions
FAQ
What is Webclaw?
Webclaw is a web extraction toolkit that converts any website into clean, structured data. It supports multiple output formats: Markdown, JSON, HTML, plain text, and an LLM-optimized format that strips noise and reduces token count by 67% on average.
How does Webclaw hit sub-200ms response times?
By default, Webclaw uses raw HTTP requests with TLS fingerprint impersonation instead of spinning up a headless browser. That means sub-200ms responses, zero browser overhead, and no Selenium or Playwright dependency for most pages; JS-heavy sites escalate to the rendering pipeline automatically. Clean output comes from intelligent content extraction and readability scoring.
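The readability-scoring idea can be illustrated with a toy link-density heuristic: content blocks have few linked characters relative to their total text, while nav and footer blocks are mostly links. This is an illustration of the technique, not Webclaw's scoring algorithm.

```python
import re

def link_density(html_block: str) -> float:
    """Fraction of a block's text that sits inside <a> tags."""
    linked = sum(len(m) for m in re.findall(r"<a[^>]*>(.*?)</a>", html_block, re.DOTALL))
    text = re.sub(r"<[^>]+>", "", html_block)  # strip all tags, keep text
    return linked / max(len(text), 1)

nav = '<a href="/">Home</a> <a href="/about">About</a>'
para = 'Long explanatory paragraph with a single <a href="#">reference</a> link.'

assert link_density(nav) > 0.5    # mostly links -> likely boilerplate
assert link_density(para) < 0.3   # mostly prose -> likely content
```

Real readability scorers combine several such signals (text length, punctuation, tag depth), but link density alone already separates navigation from prose surprisingly well.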
Is there a free plan?
Yes. The Starter plan is completely free: 500 pages per month, 5 output formats, sitemap discovery, and full API access. No credit card required. You can upgrade anytime if you need higher limits or advanced features like LLM extraction.
Can I self-host Webclaw?
Absolutely. Webclaw is open source under the AGPL-3.0 license. You can run the CLI, REST API server, or MCP server on your own infrastructure. Docker images and one-line deploy scripts are available for quick setup.
What output formats are supported?
Six formats: Markdown (clean readable text), JSON (structured with metadata), HTML (sanitized), plain text, LLM-optimized (stripped of noise for AI consumption), and raw HTML. The LLM format runs a 9-step optimization pipeline to minimize token usage.
Does Webclaw include an MCP server?
Webclaw ships a dedicated MCP (Model Context Protocol) server binary that exposes 8 tools: scrape, crawl, map, batch, extract, summarize, diff, and brand. It works over stdio transport with any MCP-compatible client, such as Claude Desktop, Claude Code, Cursor, Windsurf, OpenCode, Codex, or Antigravity.
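For stdio clients like Claude Desktop, wiring in an MCP server usually means one entry in the client's config file. The package name `webclaw-mcp` and the `WEBCLAW_API_KEY` variable below are assumptions for illustration; check the project's docs for the real names.

```json
{
  "mcpServers": {
    "webclaw": {
      "command": "npx",
      "args": ["-y", "webclaw-mcp"],
      "env": { "WEBCLAW_API_KEY": "sk-placeholder" }
    }
  }
}
```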
Is my data stored or logged?
Your extracted content is never stored or logged on our servers. Requests are processed in real-time and the response is returned directly to you. If you use LLM features, content is sent to the AI provider for processing but is not retained. For full control, self-host the entire stack.
How do the LLM-powered features work?
Webclaw can use language models to extract structured JSON from pages using a schema you define, answer questions about page content with prompt-based extraction, or generate summaries. It chains through local Ollama first, then falls back to cloud providers.
Ready to build?
Start extracting.
Free tier. No credit card. Deploy in under a minute — or self-host forever. Open source.