CLOUD API

Web extraction API.

REST API for production applications. Antibot bypass, JS rendering, LLM-optimized output, and structured data extraction. One key, every format.

Quick start

Three steps to your first extraction.

1. Get your API key

Sign up at webclaw.io/dashboard and grab your key from the dashboard.

2. Make your first request
bash
curl -X POST https://api.webclaw.io/v1/scrape \
  -H "Authorization: Bearer $WEBCLAW_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com", "formats": ["markdown"]}'
3. Get clean results
Response
{
  "success": true,
  "data": {
    "url": "https://example.com",
    "markdown": "# Example Domain\n\nThis domain is for use in illustrative examples...",
    "metadata": {
      "title": "Example Domain",
      "description": "Example Domain",
      "status_code": 200,
      "response_time_ms": 118
    }
  }
}
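The response shape above can be consumed with the standard library alone. The sketch below parses a sample body copied from the example response. The helper name `extract_markdown` and the choice to raise when `success` is false are assumptions for illustration; the failure shape is not documented here.

```python
import json

# Sample body copied from the example response above.
SAMPLE_BODY = """
{
  "success": true,
  "data": {
    "url": "https://example.com",
    "markdown": "# Example Domain\\n\\nThis domain is for use in illustrative examples...",
    "metadata": {
      "title": "Example Domain",
      "description": "Example Domain",
      "status_code": 200,
      "response_time_ms": 118
    }
  }
}
"""


def extract_markdown(body: str) -> tuple[str, dict]:
    """Return (markdown, metadata) from a /v1/scrape response body."""
    payload = json.loads(body)
    # Hypothetical error handling: the source does not document failures.
    if not payload.get("success"):
        raise RuntimeError("scrape failed")
    data = payload["data"]
    return data["markdown"], data["metadata"]


markdown, metadata = extract_markdown(SAMPLE_BODY)
print(metadata["title"], metadata["status_code"])
print(markdown.splitlines()[0])
```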

SDK quickstart

Official clients for the languages you use.

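import os

import webclaw

# Read the key from the environment instead of hard-coding it.
client = webclaw.Client(api_key=os.environ["WEBCLAW_API_KEY"])
# Scrape a single page and print its markdown output.
result = client.scrape("https://example.com")
print(result.markdown)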
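If the SDK is not an option, the curl request from the quick start can likely be reproduced with Python's standard library. This is a minimal sketch: `build_scrape_request` is a hypothetical helper name, and the request is only constructed here, not sent.

```python
import json
import os
import urllib.request

API_URL = "https://api.webclaw.io/v1/scrape"


def build_scrape_request(
    url: str, formats: list[str], api_key: str
) -> urllib.request.Request:
    """Build the same POST request as the curl example, without sending it."""
    body = json.dumps({"url": url, "formats": formats}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


# Falls back to a placeholder so the sketch runs without a real key.
api_key = os.environ.get("WEBCLAW_API_KEY", "your-key")
request = build_scrape_request("https://example.com", ["markdown"], api_key)
print(request.get_method(), request.full_url)

# To actually send it:
# with urllib.request.urlopen(request) as response:
#     result = json.load(response)
```

Keeping construction separate from sending makes the headers and body easy to inspect before any network call is made.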

12 endpoints

Everything you need for web extraction at scale.

POST /v1/scrape

Extract content from any URL in 9 output formats

POST /v1/crawl

Start a breadth-first (BFS) crawl of an entire site

GET /v1/crawl/:id

Check progress and retrieve crawl results

POST /v1/map

Discover all URLs via sitemap and link parsing

POST /v1/batch

Extract multiple URLs in a single request

POST /v1/extract

LLM-powered structured data extraction with prompt-to-schema

POST /v1/summarize

AI-generated page summaries

POST /v1/diff

Track content changes between snapshots

POST /v1/brand

Extract brand identity (colors, fonts, logos)

POST /v1/search

Web search with optional page scraping. Query search engines and optionally scrape results for full content.

POST /v1/research

Deep multi-source research with AI synthesis. Analyzes dozens of sources and produces cited reports.
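The two crawl endpoints listed above suggest a start-then-poll workflow: start a crawl, then check its ID until it finishes. The sketch below shows one way to structure the polling loop. `wait_for_crawl` is a hypothetical helper, `fetch_status` stands in for a GET /v1/crawl/:id call, and the `status` field with its `completed` and `failed` values is an assumption, since the response shape is not documented here.

```python
import time
from typing import Callable


def wait_for_crawl(
    crawl_id: str,
    fetch_status: Callable[[str], dict],
    poll_interval: float = 2.0,
    max_polls: int = 30,
) -> dict:
    """Poll until the crawl reports a terminal state or max_polls is reached.

    The "status" field and its values are assumptions; the real response
    shape may differ.
    """
    for _ in range(max_polls):
        state = fetch_status(crawl_id)
        if state.get("status") in ("completed", "failed"):
            return state
        time.sleep(poll_interval)
    raise TimeoutError(f"crawl {crawl_id} did not finish after {max_polls} polls")


# Stub that simulates two in-progress polls followed by completion.
_responses = iter(
    [
        {"status": "running"},
        {"status": "running"},
        {"status": "completed", "pages": 3},
    ]
)
final = wait_for_crawl("demo-id", lambda _id: next(_responses), poll_interval=0.0)
print(final)
```

Passing the status fetcher in as a callable keeps the loop testable without network access; in real use it would wrap an authenticated HTTP GET.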

Built for production

Every request goes through battle-tested infrastructure.

Automatic antibot bypass

Cloudflare, DataDome, AWS WAF. Handled transparently on every request.

Built-in caching

Configurable TTL per request. Identical URLs return cached results instantly.
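The behavior described above, where a repeated URL is served from cache until its TTL expires, can be illustrated with a small in-memory cache. This models the concept only and is not the service's implementation; `TTLCache` and its methods are hypothetical names.

```python
import time
from typing import Callable, Optional


class TTLCache:
    """Map URLs to cached content that expires after a per-entry TTL."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._entries: dict[str, tuple[float, str]] = {}

    def get(self, url: str) -> Optional[str]:
        entry = self._entries.get(url)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            # Expired: evict the entry and report a miss.
            del self._entries[url]
            return None
        return value

    def put(self, url: str, value: str, ttl_seconds: float) -> None:
        self._entries[url] = (self._clock() + ttl_seconds, value)


# A fake clock makes expiry deterministic for the demonstration.
now = [0.0]
cache = TTLCache(clock=lambda: now[0])
cache.put("https://example.com", "# Example Domain", ttl_seconds=60)
fresh = cache.get("https://example.com")  # within the TTL: cache hit
now[0] = 61.0
stale = cache.get("https://example.com")  # past the TTL: cache miss
print(fresh, stale)
```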

JS-rendered pages

Full support for SPAs, React, Next.js. No browser on your side.

9 output formats

Markdown, text, JSON, LLM-optimized, links, rawHtml, attributes, query, and screenshot. Request any combination per scrape.
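Because formats can be combined per scrape, it can help to validate the list on the client before sending. The sketch below builds a `formats` array like the one in the quick-start request. Only `markdown` is confirmed by the curl example; the other request spellings (for instance `llm` for LLM-optimized) and the helper name `build_scrape_payload` are assumptions.

```python
import json

# The nine formats listed above. Only "markdown" is confirmed by the
# quick-start request; the other spellings are assumptions.
SUPPORTED_FORMATS = {
    "markdown",
    "text",
    "json",
    "llm",
    "links",
    "rawHtml",
    "attributes",
    "query",
    "screenshot",
}


def build_scrape_payload(url: str, formats: list[str]) -> dict:
    """Validate the requested formats and build a /v1/scrape request body."""
    unknown = [name for name in formats if name not in SUPPORTED_FORMATS]
    if unknown:
        raise ValueError(f"unsupported formats: {unknown}")
    # Drop duplicates while preserving the caller's order.
    return {"url": url, "formats": list(dict.fromkeys(formats))}


payload = build_scrape_payload(
    "https://example.com", ["markdown", "links", "markdown"]
)
print(json.dumps(payload))
```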

Rate-limited and managed

Per-key rate limits, usage tracking, and automatic retries built in.

YouTube transcript extraction

Auto-detected for youtube.com/watch URLs. Structured markdown with title, channel, views, and full transcript.
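The auto-detection above happens on the service side, and its exact rules are not documented here. As a rough client-side approximation, a URL can be checked for the youtube.com/watch pattern with the standard library. `is_youtube_watch_url` is a hypothetical helper, and requiring a `v` query parameter is an assumption.

```python
from urllib.parse import parse_qs, urlparse


def is_youtube_watch_url(url: str) -> bool:
    """Return True for youtube.com/watch URLs that carry a video ID."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    # Treat www.youtube.com and youtube.com as the same host.
    if host.startswith("www."):
        host = host[4:]
    has_video_id = "v" in parse_qs(parsed.query)
    return host == "youtube.com" and parsed.path == "/watch" and has_video_id


print(is_youtube_watch_url("https://www.youtube.com/watch?v=abc123"))
print(is_youtube_watch_url("https://example.com/watch?v=abc123"))
```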

Prompt-to-schema generation

Send just a prompt to /v1/extract: the LLM generates a JSON schema, then extracts structured data matching it.
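The two-stage flow above (prompt to schema, then schema to data) might look like this from the client's side. Everything here is a sketch under assumptions: the `url` and `prompt` field names, the example schema, and the extracted record are all hypothetical, chosen only to show how a generated schema can be used to sanity-check the result.

```python
import json


def build_extract_payload(url: str, prompt: str) -> dict:
    """Request body for POST /v1/extract; the field names are assumptions."""
    return {"url": url, "prompt": prompt}


def missing_required(schema: dict, record: dict) -> list[str]:
    """List the schema's required keys that are absent from a record."""
    return [key for key in schema.get("required", []) if key not in record]


payload = build_extract_payload(
    "https://example.com/product", "Extract the product name and price"
)

# Hypothetical schema of the kind the LLM might generate from that prompt.
example_schema = {
    "type": "object",
    "properties": {"name": {"type": "string"}, "price": {"type": "number"}},
    "required": ["name", "price"],
}

# Hypothetical extracted record that is missing one required field.
record = {"name": "Widget"}
print(json.dumps(payload))
print(missing_required(example_schema, record))
```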

Page-level Q&A

Ask a natural-language question about any page with the query format. An LLM reads the content and returns the answer.

Ready to build?

Start extracting.

7-day Starter trial. Cancel anytime. Scale when you need to.