POST /v1/extract

Structured data extraction powered by LLMs.

Point an LLM at any URL and get back the exact JSON shape you asked for.

Define a JSON schema or write a plain-English prompt, send a URL, and an LLM reads the page and returns typed, structured data ready to feed straight into your agent or pipeline. No selectors, no parsing code, no brittle scrapers to maintain.


What you get

Everything in one call.

Schema-typed output

Pass a JSON Schema and the response conforms to it exactly, so your code gets predictable, typed fields every time.

Prompt mode

Skip the schema and describe what you want in plain English, and the LLM infers the structure from the page.

Clean input, fewer tokens

Pages are stripped to LLM-ready content before extraction, cutting roughly 90% of the tokens versus raw HTML.

Handles hard pages

JS rendering and bot protection are resolved automatically, so extraction works on the same sites a scrape would.

How it works

From URL to output in four steps.

Send a URL

POST the target URL with either a JSON schema or a natural language prompt describing what to pull.

Fetch and clean

A Rust-based fetcher retrieves the page and strips nav, ads, and boilerplate, leaving compact content for the model.

LLM extracts

An LLM reads the cleaned content and maps it onto your schema, or builds a structure that fits your prompt.

JSON returned

You get back a data object with the typed fields you defined, ready to use directly in your application.
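
Seen from the client side, the four steps above might look like the sketch below. It is an illustration under stated assumptions, not the documented request format: the base URL `https://api.example.com`, the `Authorization: Bearer` header, and the body field names `url`, `schema`, and `prompt` are all hypothetical, so check the API docs for the real names. The request is built but not sent.

```python
import json
import urllib.request

# Hypothetical base URL; the real host is not given on this page.
BASE_URL = "https://api.example.com"


def build_extract_request(target_url, api_key, schema=None, prompt=None):
    """Build (but do not send) a POST /v1/extract request.

    The body field names "url", "schema" and "prompt" are assumptions.
    """
    if schema is None and prompt is None:
        raise ValueError("provide a JSON schema or a plain-English prompt")

    body = {"url": target_url}
    if schema is not None:
        body["schema"] = schema
    if prompt is not None:
        body["prompt"] = prompt

    return urllib.request.Request(
        BASE_URL + "/v1/extract",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Assumed auth scheme; not confirmed by this page.
            "Authorization": "Bearer " + api_key,
        },
        method="POST",
    )


request = build_extract_request(
    "https://example.com/product",
    api_key="YOUR_API_KEY",
    prompt="Extract the product name and tagline.",
)
```

To actually send it, pass `request` to `urllib.request.urlopen` and decode the JSON response body.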

API

One request, structured data back.

POST /v1/extract

Extracted data

Product: webclaw
Tagline: The web scraper your AI agent deserves.
Key features:
- Fast by default, smart when needed
- Every format, every extraction
- Built for AI agents
- 90% fewer tokens
- Agentic scraping
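
As a rough sketch, the sample above could correspond to a schema and response like the following. The field names `product`, `tagline`, and `key_features`, and the choice to nest the result under a `data` key, are assumptions made for illustration; the page only says that a data object is returned. The helper is a deliberately small presence check, not a full JSON Schema validator.

```python
# Hypothetical JSON Schema for the sample output; field names are assumed.
PRODUCT_SCHEMA = {
    "type": "object",
    "properties": {
        "product": {"type": "string"},
        "tagline": {"type": "string"},
        "key_features": {"type": "array", "items": {"type": "string"}},
    },
    "required": ["product", "tagline", "key_features"],
}

# Assumed response shape: the extracted fields under a "data" key.
sample_response = {
    "data": {
        "product": "webclaw",
        "tagline": "The web scraper your AI agent deserves.",
        "key_features": [
            "Fast by default, smart when needed",
            "Every format, every extraction",
            "Built for AI agents",
            "90% fewer tokens",
            "Agentic scraping",
        ],
    }
}


def missing_required(schema, data):
    """Return the required top-level fields that are absent from data."""
    return [name for name in schema.get("required", []) if name not in data]


data = sample_response["data"]
missing = missing_required(PRODUCT_SCHEMA, data)  # empty when all fields exist
```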

Common questions

Frequently asked questions

How do I extract structured JSON from a website with an LLM?

Send a URL to POST /v1/extract along with a JSON schema describing the fields you want. The endpoint fetches and cleans the page, then an LLM maps the content onto your schema and returns a typed data object. No selectors or parsing code required.

What is the difference between schema mode and prompt mode for web extraction?

Schema mode takes a JSON Schema and returns data conforming to it, giving you predictable typed output. Prompt mode takes a plain-English description and lets the LLM infer the structure from the page. If you send both, the schema wins.
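
The precedence rule can be written down as a small client-side helper. This only mirrors the rule stated above (schema wins when both are sent); it is not the service's own code, and the function name `effective_mode` is hypothetical.

```python
def effective_mode(schema=None, prompt=None):
    """Return which mode applies, following the stated rule: schema wins."""
    if schema is not None:
        return "schema"
    if prompt is not None:
        return "prompt"
    raise ValueError("neither a schema nor a prompt was provided")


# Both supplied: the schema takes precedence.
mode = effective_mode(schema={"type": "object"}, prompt="Get the title.")
```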

Can the extract API handle JavaScript-rendered pages and bot protection?

Yes. Extraction runs on top of the same fetch path as scraping, so JS rendering and bot protection are handled automatically before the LLM ever sees the content.

Am I billed for failed requests?

No. Credits are only consumed on successful responses. A standard page is 1 credit; heavier work like JS rendering or protected-site access costs a few extra credits.
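
For budgeting, the billing rule can be sketched as simple arithmetic. The exact surcharge for heavier work is not stated here, so the sketch takes it as a caller-supplied parameter; the value `2` below is a placeholder, not a published price, and the `ok` and `heavy` keys are hypothetical.

```python
def credits_used(results, extra_credits_for_heavy):
    """Sum credits for a batch: failures are free, successes cost 1 credit,
    and heavy successes (JS rendering, protected sites) add a surcharge."""
    total = 0
    for result in results:
        if not result["ok"]:
            continue  # failed requests consume no credits
        total += 1  # a standard page is 1 credit
        if result["heavy"]:
            total += extra_credits_for_heavy
    return total


batch = [
    {"ok": True, "heavy": False},   # standard page: 1 credit
    {"ok": False, "heavy": True},   # failed request: 0 credits
    {"ok": True, "heavy": True},    # heavy page: 1 + placeholder surcharge
]
total = credits_used(batch, extra_credits_for_heavy=2)  # placeholder surcharge
```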

Is webclaw extraction open source and self-hostable?

Yes. The core extraction engine is open source and can be self-hosted for free. The hosted API adds managed infrastructure, automatic JS rendering, and bot protection handling on top.

Ship an agent that actually sees the web.

One credit pool, every endpoint. Cancel anytime, or self-host the open-source core for free.


Every endpoint

- Web Scraping API
- HTML to Markdown API
- Web Crawler API
- Sitemap API
- Web Search API
- Batch Scraping API
- Webpage Summarization API
- Website Change Monitoring API
- Brand Data API
- Deep Research API
- YouTube Transcript API
- Lead Enrichment API