CLOUD API
Web extraction API.
REST API for production applications. Antibot bypass, JS rendering, LLM-optimized output, and structured data extraction. One key, every format.
Quick start
Three steps to your first extraction.
Sign up at webclaw.io/dashboard and copy your API key.
curl -X POST https://api.webclaw.io/v1/scrape \
  -H "Authorization: Bearer $WEBCLAW_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com", "formats": ["markdown"]}'
{
  "success": true,
  "data": {
    "url": "https://example.com",
    "markdown": "# Example Domain\n\nThis domain is for use in illustrative examples...",
    "metadata": {
      "title": "Example Domain",
      "description": "Example Domain",
      "status_code": 200,
      "response_time_ms": 118
    }
  }
}
SDK quickstart
Official clients for the languages you use.
import webclaw
client = webclaw.Client(api_key="your-key")
result = client.scrape("https://example.com")
print(result.markdown)
12 endpoints
Everything you need for web extraction at scale.
/v1/scrape      Extract content from any URL in 9 output formats
/v1/crawl       Start a BFS crawl of an entire site
/v1/crawl/:id   Check progress and retrieve crawl results
/v1/map         Discover all URLs via sitemap and link parsing
/v1/batch       Extract multiple URLs in a single request
/v1/extract     LLM-powered structured data extraction with prompt-to-schema
/v1/summarize   AI-generated page summaries
/v1/diff        Track content changes between snapshots
/v1/brand       Extract brand identity (colors, fonts, logos)
/v1/search      Web search with optional page scraping. Query search engines and optionally scrape results for full content.
/v1/research    Deep multi-source research with AI synthesis. Analyzes dozens of sources and produces cited reports.
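If you would rather not install an SDK, the quick-start curl call to /v1/scrape can be reproduced with Python's standard library. This is a minimal sketch, not an official client: the endpoint, headers, and body fields are taken from the quick start, while the helper names build_scrape_request and scrape and the 30-second timeout are illustrative assumptions.

```python
import json
import os
import urllib.request

# Endpoint taken from the quick-start curl command.
API_URL = "https://api.webclaw.io/v1/scrape"


def build_scrape_request(url, formats, api_key):
    """Build the same POST request as the quick-start curl command."""
    body = json.dumps({"url": url, "formats": formats}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def scrape(url, formats=("markdown",), api_key=None, timeout=30):
    """Send the request and return the decoded JSON response as a dict.

    The key falls back to the WEBCLAW_API_KEY environment variable, the same
    variable the curl example reads. The timeout value is an assumption.
    """
    api_key = api_key or os.environ["WEBCLAW_API_KEY"]
    request = build_scrape_request(url, list(formats), api_key)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Calling scrape("https://example.com") with WEBCLAW_API_KEY set should return a dict shaped like the quick-start response, with the page content under data.markdown.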
Built for production
Every request goes through battle-tested infrastructure.
Automatic antibot bypass
Cloudflare, DataDome, AWS WAF. Handled transparently on every request.
Built-in caching
Configurable TTL per request. Identical URLs return cached results instantly.
JS-rendered pages
Full support for SPAs, React, Next.js. No browser on your side.
9 output formats
Markdown, text, JSON, LLM-optimized, links, rawHtml, attributes, query, and screenshot. Request any combination per scrape.
Rate-limited and managed
Per-key rate limits, usage tracking, and automatic retries built in.
YouTube transcript extraction
Auto-detected for youtube.com/watch URLs. Structured markdown with title, channel, views, and full transcript.
Prompt-to-schema generation
Send just a prompt to /v1/extract: the LLM generates a JSON schema, then extracts structured data matching it.
Page-level Q&A
Ask a natural-language question about any page with the query format. An LLM reads the content and returns the answer.
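Since any combination of formats can be requested per scrape, client code usually needs to pick each one out of the response. The sketch below works on the quick-start response and runs offline. It assumes every format comes back under a key of the same name inside data, as "markdown" does in the quick start. The helper name collect_formats is hypothetical, and "links" as a request identifier is assumed from the format list rather than confirmed.

```python
# Sample response trimmed from the quick start.
SAMPLE_RESPONSE = {
    "success": True,
    "data": {
        "url": "https://example.com",
        "markdown": "# Example Domain\n\nThis domain is for use in illustrative examples...",
        "metadata": {"title": "Example Domain", "status_code": 200},
    },
}


def collect_formats(response, requested):
    """Map each requested format name to its content, skipping absent ones.

    Assumption: each format is returned as data[<format name>], the way
    "markdown" is in the quick-start response.
    """
    if not response.get("success"):
        raise ValueError("scrape did not succeed")
    data = response.get("data", {})
    return {name: data[name] for name in requested if name in data}


# "links" is not present in the sample, so only "markdown" is collected.
found = collect_formats(SAMPLE_RESPONSE, ["markdown", "links"])
print(sorted(found))
```

Skipping absent keys instead of raising keeps the helper usable if some formats are returned under other names or omitted for a given page.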
Ready to build?
Start extracting.
7-day Starter trial. Cancel anytime. Scale when you need to.