POST /v1/scrape

Web scraping built for LLMs.

Turn any URL into clean, LLM-ready content with one call.

Send a URL and get back markdown, JSON, structured text, or raw HTML, stripped of nav, ads, and boilerplate. Built in Rust, it returns static pages in around 118ms and handles JavaScript rendering and bot protection automatically.

What you get

Everything in one call.

LLM-ready formats

Choose markdown, JSON, llm, text, or raw HTML per request. No HTML parsing on your side.

~90% fewer tokens

Boilerplate, nav, and ads are stripped so your model spends tokens on signal, not chrome.

Protected sites, handled

JavaScript rendering and bot-protection bypass kick in automatically when a page needs them.

Rich metadata

Title, description, language, status, and timing come back alongside the content on every call.
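As a rough sketch of how a client might read that metadata, the Python below parses a JSON response into a small dataclass. The response shape is an assumption: the keys `content`, `metadata`, `title`, `description`, `language`, `status`, and `elapsed_ms` are hypothetical names chosen for illustration, and the sample payload is made up. Check the API docs for the real field names.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class ScrapeResult:
    """Content plus the metadata fields the page lists (names are assumed)."""
    content: str
    title: Optional[str]
    description: Optional[str]
    language: Optional[str]
    status: Optional[int]
    elapsed_ms: Optional[float]


def parse_scrape_response(raw: str) -> ScrapeResult:
    """Parse a JSON response body under the ASSUMED shape described above."""
    data = json.loads(raw)
    meta = data.get("metadata", {})  # hypothetical key
    return ScrapeResult(
        content=data.get("content", ""),
        title=meta.get("title"),
        description=meta.get("description"),
        language=meta.get("language"),
        status=meta.get("status"),
        elapsed_ms=meta.get("elapsed_ms"),
    )


# Made-up sample payload, for illustration only.
sample = json.dumps({
    "content": "# Hello\n\nSome cleaned text.",
    "metadata": {"title": "Hello", "language": "en", "status": 200},
})
result = parse_scrape_response(sample)
print(result.title, result.language, result.status)
```

Missing metadata fields come back as `None` rather than raising, which keeps downstream pipeline code simple when a page has no description or language tag.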

How it works

From URL to output in four steps.

01

Send a URL

POST the URL plus an optional format. No headless browser to run, no proxies to manage.

02

We fetch it

The engine picks a fingerprint and escalates to JS rendering or bot-protection bypass only if the page needs it.

03

Content is cleaned

HTML is parsed and converted to your chosen format, with boilerplate removed.

04

Structured output returned

Clean content plus metadata comes back, ready to pass to any LLM, vector store, or pipeline.
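The "escalate only if needed" idea in step 02 can be illustrated with a generic tiered-fetch loop. This is not webclaw's implementation, which this page does not describe. The tier names and the stub fetchers below are hypothetical stand-ins; the point is only that cheaper strategies are tried first and heavier ones run when the earlier ones fail.

```python
from typing import Callable, Optional, Sequence, Tuple

# A fetcher returns HTML on success, or None when this tier cannot get the page.
Fetcher = Callable[[str], Optional[str]]


def fetch_with_escalation(
    url: str, tiers: Sequence[Tuple[str, Fetcher]]
) -> Tuple[str, str]:
    """Try tiers in order (cheapest first); return (tier_name, html) of the first success."""
    for name, fetch in tiers:
        html = fetch(url)
        if html is not None:
            return name, html
    raise RuntimeError(f"all tiers failed for {url}")


# Stub tiers standing in for real fetchers (illustrative only).
def plain_http(url: str) -> Optional[str]:
    return None  # pretend the page is client-rendered, so a plain fetch is not enough


def js_render(url: str) -> Optional[str]:
    return "<html>rendered</html>"


def bot_bypass(url: str) -> Optional[str]:
    return "<html>unblocked</html>"


tiers = [("plain_http", plain_http), ("js_render", js_render), ("bot_bypass", bot_bypass)]
print(fetch_with_escalation("https://example.com", tiers))
```

With these stubs the loop stops at the second tier and never invokes the third, which is the cost-saving property the step describes.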

API

One request, structured output back.
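A minimal sketch of that one request in Python, using only the standard library. Several details are assumptions, because this page does not state them: the base URL `https://api.example.com` is a placeholder, the bearer-token `Authorization` header is a guess at the auth scheme, and the JSON field names `url` and `format` are illustrative. The function builds the request without sending it.

```python
import json
import urllib.request

# Placeholder host: the real base URL is not stated on this page.
BASE_URL = "https://api.example.com"


def build_scrape_request(
    target_url: str, fmt: str = "markdown", api_key: str = "YOUR_API_KEY"
) -> urllib.request.Request:
    """Build (but do not send) a POST /v1/scrape request."""
    # Assumed body shape: the page only says "the URL plus an optional format".
    body = json.dumps({"url": target_url, "format": fmt}).encode("utf-8")
    return urllib.request.Request(
        url=f"{BASE_URL}/v1/scrape",
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            # Assumed auth scheme; the page does not describe authentication.
            "Authorization": f"Bearer {api_key}",
        },
    )


req = build_scrape_request("https://example.com/blog/post")
print(req.get_method(), req.full_url)
```

Once the real base URL and key are filled in, `urllib.request.urlopen(req)` would send it; the response body can then be decoded according to the format requested.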

The web scraper your AI agent deserves

Turn any website into LLM-ready markdown, JSON, or structured data. Handles protected sites and returns static pages in around 118ms.

  • One credit pool, every endpoint
  • ~90% fewer tokens than raw HTML
  • Self-host the open-source core for free
Common questions

Frequently asked questions

How do I scrape a website and get clean markdown?

POST the URL to /v1/scrape with the markdown format. webclaw fetches the page, strips nav, ads, and boilerplate, and returns GitHub-flavored markdown ready for an LLM, with no HTML parsing on your side.

Does the web scraping API handle JavaScript-rendered pages?

Yes. The engine detects when a page needs JavaScript rendering and escalates automatically, so single-page apps and client-rendered content come back fully rendered.

Can it scrape sites behind bot protection?

Yes. Bot-protection bypass runs automatically when a page needs it, so protected pages return real content instead of a block page.

What output formats does the scrape API support?

Five: markdown, json, llm, text, and raw html. Pick the one that fits your pipeline: markdown for RAG and prompts, json for structured data, html when you need the raw document.
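If the format comes from user input or config, it can help to validate it client-side before spending a request. The sketch below assumes the request values are the lowercase names used in this answer, with `html` standing for raw HTML; the enum and helper are hypothetical and not part of any official SDK.

```python
from enum import Enum


class OutputFormat(str, Enum):
    """Format names as listed on this page (exact request values are assumed)."""
    MARKDOWN = "markdown"
    JSON = "json"
    LLM = "llm"
    TEXT = "text"
    HTML = "html"


def parse_format(value: str) -> OutputFormat:
    """Normalize and validate a format name; raise ValueError if unsupported."""
    try:
        return OutputFormat(value.strip().lower())
    except ValueError:
        allowed = ", ".join(f.value for f in OutputFormat)
        raise ValueError(
            f"unsupported format {value!r}; expected one of: {allowed}"
        ) from None


print(parse_format(" Markdown ").value)
```

Because `OutputFormat` subclasses `str`, a member can be dropped straight into a JSON request body without extra conversion.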

Am I billed for failed requests?

No. Credits are only consumed on successful responses. A standard page is 1 credit; heavier work like JS rendering or protected-site access costs a few extra credits.

Ship an agent that actually sees the web.

One credit pool, every endpoint. Cancel anytime, or self-host the open-source core for free.
