Cloud API

Web extraction, as a service.

A REST API for production applications, with automatic bot-protection bypass, JS rendering, LLM-optimized output, and structured data extraction. One key, every format.

Full API reference

Quick start

Three steps to your first extraction.

1. Get your API key
Sign up at webclaw.io/dashboard and copy your API key.

2. Make your first request

bash

curl -X POST https://api.webclaw.io/v1/scrape \
  -H "Authorization: Bearer $WEBCLAW_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com", "formats": ["markdown"]}'

3. Get clean results

json

{
  "success": true,
  "data": {
    "url": "https://example.com",
    "markdown": "# Example Domain\n\nThis domain is for use in illustrative examples...",
    "metadata": {
      "title": "Example Domain",
      "description": "Example Domain",
      "status_code": 200,
      "response_time_ms": 118
    }
  }
}
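
The same round trip can be sketched with only the Python standard library. The endpoint, headers, request body, and response fields are taken from the example above. The helper names `build_scrape_request` and `extract_markdown` are illustrative and not part of any official client, and treating `"success": false` as the failure signal is an assumption.

```python
import json
import urllib.request

API_URL = "https://api.webclaw.io/v1/scrape"  # endpoint from the curl example


def build_scrape_request(api_key: str, url: str, formats: list) -> urllib.request.Request:
    """Build (but do not send) the POST request shown in step 2."""
    body = json.dumps({"url": url, "formats": formats}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


def extract_markdown(payload: dict) -> tuple:
    """Return (markdown, metadata) from a response shaped like step 3."""
    # Assumption: a failed extraction is reported as "success": false.
    if not payload.get("success"):
        raise RuntimeError("scrape did not succeed")
    data = payload["data"]
    return data["markdown"], data.get("metadata", {})


# Parse a response shaped like the sample in step 3 (no network call is made).
sample = {
    "success": True,
    "data": {
        "url": "https://example.com",
        "markdown": "# Example Domain\n\nThis domain is for use in illustrative examples...",
        "metadata": {"title": "Example Domain", "status_code": 200},
    },
}
markdown, metadata = extract_markdown(sample)
print(metadata["title"])
```

To send the request for real, pass the result of `build_scrape_request` to `urllib.request.urlopen` and decode the response body with `json.load`.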

SDKs

Official clients for the languages you use.

python

import os

import webclaw

# Read the key from the environment instead of hard-coding it.
client = webclaw.Client(api_key=os.environ["WEBCLAW_API_KEY"])

result = client.scrape("https://example.com", formats=["markdown", "json"])
print(result.markdown)

PyPI · pip install webclaw · Read the docs · GitHub

Endpoints

Everything you need for web extraction at scale.

POST /v1/scrape

Extract content from any URL in 9 output formats

POST /v1/crawl

Start a BFS crawl of an entire site

GET /v1/crawl/:id

Check progress and retrieve crawl results

POST /v1/map

Discover all URLs via sitemap and link parsing

POST /v1/batch

Extract multiple URLs in a single request

POST /v1/extract

LLM-powered structured data extraction with prompt-to-schema

POST /v1/summarize

AI-generated page summaries

POST /v1/diff

Track content changes between snapshots

POST /v1/brand

Extract brand identity (colors, fonts, logos)

POST /v1/search

Web search with optional page scraping. Query search engines and optionally scrape results for full content.

POST /v1/research

Deep multi-source research with AI synthesis. Analyzes dozens of sources and produces cited reports.

GET /v1/research/:id

Check progress and retrieve research results
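
Crawl and research are the two endpoints paired with a status route, which suggests a start-then-poll workflow. The sketch below shows one way a client might poll. It assumes the status routes live on the same host as /v1/scrape, and the `status` field and its `running` / `completed` values are hypothetical, since the response shape is not shown here. The names `job_url` and `poll_until_done` are illustrative.

```python
import time
from typing import Callable

BASE_URL = "https://api.webclaw.io"  # host taken from the /v1/scrape example


def job_url(kind: str, job_id: str) -> str:
    """URL for GET /v1/crawl/:id or GET /v1/research/:id."""
    if kind not in ("crawl", "research"):
        raise ValueError(f"no status endpoint for {kind!r}")
    return f"{BASE_URL}/v1/{kind}/{job_id}"


def poll_until_done(
    fetch: Callable[[], dict],
    is_done: Callable[[dict], bool],
    interval_s: float = 2.0,
    max_attempts: int = 30,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Call fetch() repeatedly until is_done(result) is true."""
    for _ in range(max_attempts):
        result = fetch()
        if is_done(result):
            return result
        sleep(interval_s)
    raise TimeoutError(f"job not finished after {max_attempts} attempts")


# Stand-in fetcher; a real one would GET job_url("crawl", job_id).
# Hypothetical response shape: {"status": "running" | "completed", ...}
responses = iter([
    {"status": "running"},
    {"status": "running"},
    {"status": "completed", "pages": 3},
])
final = poll_until_done(
    fetch=lambda: next(responses),
    is_done=lambda r: r.get("status") == "completed",
    sleep=lambda _seconds: None,  # skip real waiting in this demonstration
)
print(final)
```

Passing `fetch` and `is_done` as callables keeps the loop independent of the actual response format, so only the predicate needs to change once the real field names are known.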

Built for production

Every request runs on battle-tested infrastructure.

Automatic anti-bot bypass

Challenge pages, CAPTCHAs and fingerprinting handled transparently on every request.

Built-in caching

Configurable TTL per request. Identical URLs return cached results instantly.
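
To make the TTL semantics concrete, here is a small in-memory sketch of a cache keyed by URL with a per-entry time to live. It illustrates the general technique only. It is not the service's implementation, and the names `TTLCache` and `fetch_cached` are hypothetical.

```python
import time
from typing import Callable, Optional


class TTLCache:
    """Cache values per key and treat them as stale once their TTL has passed."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._entries: dict = {}  # key -> (expires_at, value)

    def get(self, key: str) -> Optional[str]:
        entry = self._entries.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._entries[key]  # expired: drop it and report a miss
            return None
        return value

    def put(self, key: str, value: str, ttl_s: float) -> None:
        self._entries[key] = (self._clock() + ttl_s, value)


def fetch_cached(cache: TTLCache, url: str, ttl_s: float, fetch: Callable[[str], str]) -> str:
    """Return the cached value for url, calling fetch(url) only on a miss."""
    cached = cache.get(url)
    if cached is not None:
        return cached
    value = fetch(url)
    cache.put(url, value, ttl_s)
    return value


# Demonstration with a fake clock so no real time has to pass.
now = [0.0]
calls = []


def fake_fetch(url: str) -> str:
    calls.append(url)
    return "# Example Domain"


cache = TTLCache(clock=lambda: now[0])
fetch_cached(cache, "https://example.com", 60, fake_fetch)  # miss: fetches
fetch_cached(cache, "https://example.com", 60, fake_fetch)  # hit: served from cache
now[0] = 61.0
fetch_cached(cache, "https://example.com", 60, fake_fetch)  # expired: fetches again
print(len(calls))
```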

JS-rendered pages

Full support for SPAs, including React and Next.js apps. Pages are rendered only when they need it, with nothing to configure on your side.

9 output formats

Markdown, text, JSON, LLM-optimized, links, rawHtml, attributes, query, and screenshot. Request any combination per scrape.
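
A client can check the requested combination before sending it. In the sketch below, `markdown` and `json` are the identifiers used in the examples on this page and `rawHtml` is spelled as listed above. The remaining identifiers, in particular `llm` for the LLM-optimized format, are assumptions about the wire names. `build_scrape_body` is an illustrative helper, not an SDK function.

```python
# Assumed wire names for the nine formats; only "markdown", "json" and
# "rawHtml" appear verbatim on this page.
SUPPORTED_FORMATS = frozenset({
    "markdown", "text", "json", "llm", "links",
    "rawHtml", "attributes", "query", "screenshot",
})


def build_scrape_body(url: str, formats: list) -> dict:
    """Validate the requested formats and build a /v1/scrape request body."""
    if not formats:
        raise ValueError("at least one format is required")
    unknown = sorted(set(formats) - SUPPORTED_FORMATS)
    if unknown:
        raise ValueError(f"unsupported formats: {unknown}")
    # Drop duplicates while keeping the caller's order.
    return {"url": url, "formats": list(dict.fromkeys(formats))}


body = build_scrape_body("https://example.com", ["markdown", "links", "markdown"])
print(body)
```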

Rate-limited and managed

Per-key rate limits, usage tracking, and automatic retries built in.
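
When a key does hit its rate limit, a client usually needs to back off before trying again. The sketch below shows exponential backoff on the client side. How the API signals a rate limit is not stated here, so the `RateLimited` exception is a hypothetical stand-in that the caller's request function would raise, for example on an HTTP 429 response.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimited(Exception):
    """Hypothetical signal raised by the request function on a rate-limit response."""


def backoff_delays(base_s: float, factor: float, cap_s: float, retries: int) -> list:
    """Delays base, base*factor, base*factor^2, ... each capped at cap_s."""
    return [min(cap_s, base_s * factor**i) for i in range(retries)]


def call_with_backoff(
    request: Callable[[], T],
    retries: int = 4,
    base_s: float = 1.0,
    factor: float = 2.0,
    cap_s: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run request(), sleeping with growing delays after each RateLimited error."""
    for delay in backoff_delays(base_s, factor, cap_s, retries):
        try:
            return request()
        except RateLimited:
            sleep(delay)
    return request()  # final attempt; a RateLimited error here propagates


# Demonstration: the first two calls are rate limited, the third succeeds.
attempts = []
slept = []


def flaky() -> str:
    attempts.append(1)
    if len(attempts) < 3:
        raise RateLimited()
    return "ok"


result = call_with_backoff(flaky, sleep=slept.append)
print(result, slept)
```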

YouTube transcript extraction

Auto-detected for youtube.com/watch URLs. Structured markdown with title, channel, views, and full transcript.
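
For clients that want to know in advance whether a URL falls into this case, the check can be approximated locally. This is a sketch of one plausible test for youtube.com/watch URLs, not the service's actual detection logic, and `youtube_video_id` is an illustrative name.

```python
from typing import Optional
from urllib.parse import parse_qs, urlparse


def youtube_video_id(url: str) -> Optional[str]:
    """Return the video id for a youtube.com/watch URL, or None for anything else."""
    parts = urlparse(url)
    host = (parts.hostname or "").lower()
    if host != "youtube.com" and not host.endswith(".youtube.com"):
        return None
    if parts.path != "/watch":
        return None
    ids = parse_qs(parts.query).get("v")
    return ids[0] if ids else None


print(youtube_video_id("https://www.youtube.com/watch?v=abc123"))
print(youtube_video_id("https://example.com/watch?v=abc123"))
```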

Prompt-to-schema generation

Send just a prompt to /v1/extract. The LLM generates a JSON schema, then extracts structured data matching it.
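
The two stages can be pictured from the client's side as follows. The request field names `url` and `prompt`, the example schema, and the example result are all assumptions made for illustration, since the actual request and response shapes are not shown here. `conforms` is a deliberately minimal check of required keys and top-level types, not a full JSON Schema validator.

```python
def build_extract_body(url: str, prompt: str) -> dict:
    """Request body for /v1/extract with a prompt only (field names assumed)."""
    return {"url": url, "prompt": prompt}


_JSON_TYPES = {
    "string": str,
    "number": (int, float),
    "integer": int,
    "boolean": bool,
    "array": list,
    "object": dict,
}


def conforms(data: dict, schema: dict) -> bool:
    """Check required keys and top-level property types against a schema."""
    if any(key not in data for key in schema.get("required", [])):
        return False
    for key, spec in schema.get("properties", {}).items():
        if key not in data:
            continue
        value, type_name = data[key], spec.get("type")
        expected = _JSON_TYPES.get(type_name)
        if expected is None:
            continue  # unknown or missing type: nothing to check
        # bool is a subclass of int in Python, so reject it for numeric types.
        if type_name in ("number", "integer") and isinstance(value, bool):
            return False
        if not isinstance(value, expected):
            return False
    return True


body = build_extract_body("https://example.com/product", "Get the product title and price")

# Hypothetical schema an LLM might generate for that prompt, and a matching result.
schema = {
    "type": "object",
    "properties": {"title": {"type": "string"}, "price": {"type": "number"}},
    "required": ["title"],
}
extracted = {"title": "Example Product", "price": 9.5}
print(conforms(extracted, schema))
```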

Page-level Q&A

Ask a natural language question about any page with the query format. The LLM reads the content and returns the answer.

Start extracting. Scale when you need to.

Cancel anytime. One key for every format and endpoint.
