POST /v1/scrape

HTML to markdown, clean enough to feed an LLM.

Convert any URL into clean GitHub-flavored markdown with one call.

Built for RAG pipelines and agents that need readable context, not raw DOM. Send a URL and get back GitHub-flavored markdown stripped of nav, ads, and boilerplate, with headings, links, tables, and code blocks preserved. Written in Rust, it returns static pages in around 118ms and handles JavaScript rendering and bot protection automatically.

What you get

Everything in one call.

GitHub-flavored markdown

Standard GFM with headings, links, tables, and fenced code blocks, ready to render or chunk.

Boilerplate stripped

Nav, ads, footers, and sidebars are removed so only the real content survives the conversion.

Token-lean output

Clean markdown uses roughly 90% fewer tokens than the raw HTML of the same page.

Structure preserved

Tables stay tables, code stays in code blocks, and links keep both their text and their targets.
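Because headings survive the conversion, the markdown can be split into retrieval chunks along its own section structure. This Python sketch is a client-side illustration, not part of the API. The function name `chunk_by_heading` and the default heading-level cutoff are our own choices, and it only recognizes ATX (`#`) headings and backtick code fences.

```python
import re

# Matches ATX headings such as "# Title" or "### Subsection" (levels 1-6).
HEADING = re.compile(r"^(#{1,6})\s+(.*\S)\s*$")


def chunk_by_heading(markdown: str, max_level: int = 2) -> list[dict]:
    """Split markdown into chunks that start at headings of level <= max_level.

    Backtick fences are tracked so a '#' comment inside a code block is not
    mistaken for a heading. Text before the first heading gets heading=None.
    """
    chunks = []
    current = {"heading": None, "lines": []}
    in_fence = False
    for line in markdown.splitlines():
        if line.lstrip().startswith("```"):
            in_fence = not in_fence  # entering or leaving a fenced block
        match = None if in_fence else HEADING.match(line)
        if match and len(match.group(1)) <= max_level:
            if current["lines"]:
                chunks.append(current)  # close the previous chunk
            current = {"heading": match.group(2), "lines": []}
        current["lines"].append(line)
    if current["lines"]:
        chunks.append(current)
    return [
        {"heading": c["heading"], "text": "\n".join(c["lines"]).strip()}
        for c in chunks
    ]
```

Each chunk keeps its heading text, which is convenient as retrieval metadata. Splitting at level 2 by default keeps `###` subsections together with their parent section.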

How it works

From URL to output in four steps.

01

Send a URL

POST the page URL to /v1/scrape with formats set to markdown, authenticated with your Bearer key.

02

Fetch and render

We load the page, running JavaScript and clearing bot protection when the site needs it.

03

Convert to markdown

The HTML is reduced to its main content and rewritten as clean GitHub-flavored markdown.

04

Markdown returned

You get the markdown string plus page metadata in a single JSON response.
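From the client's side, these four steps are a single HTTP call. Here is a minimal Python sketch using only the standard library. It assumes a placeholder host (`https://api.example.com`) and a JSON body of the form `{"url": ..., "formats": ["markdown"]}`, so check the API docs for the real base URL and field names. The helper names `build_scrape_request` and `scrape` are hypothetical.

```python
import json
import urllib.request

# Hypothetical base URL: replace with the host given in the API docs.
BASE_URL = "https://api.example.com"


def build_scrape_request(
    page_url: str, api_key: str, base_url: str = BASE_URL
) -> urllib.request.Request:
    """Build (but do not send) a POST /v1/scrape request asking for markdown."""
    # Assumed body shape: the page URL plus a "formats" list containing "markdown".
    body = json.dumps({"url": page_url, "formats": ["markdown"]}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/v1/scrape",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",  # Bearer key authentication
            "Content-Type": "application/json",
        },
    )


def scrape(page_url: str, api_key: str, timeout: float = 60.0) -> dict:
    """Send the request and return the decoded JSON response body."""
    req = build_scrape_request(page_url, api_key)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Keeping request construction separate from sending lets you inspect or test the payload without making a network call.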

API

One request, structured back.
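The response is one JSON object carrying the markdown string and the page metadata. The exact field names aren't given here, so this Python sketch assumes a `markdown` string and a `metadata` object, found either at the top level or under a `data` key. `ScrapeResult` and `parse_scrape_response` are illustrative names, not part of an official SDK.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class ScrapeResult:
    markdown: str
    metadata: dict[str, Any] = field(default_factory=dict)


def parse_scrape_response(payload: dict[str, Any]) -> ScrapeResult:
    """Extract markdown and page metadata from a decoded JSON response.

    Assumed shape: "markdown" and "metadata" live either at the top level
    or inside a "data" object. Adjust the lookups to the documented schema.
    """
    nested = payload.get("data")
    body = nested if isinstance(nested, dict) and nested else payload
    markdown = body.get("markdown")
    if not isinstance(markdown, str):
        raise ValueError("response has no markdown string")
    return ScrapeResult(markdown=markdown, metadata=body.get("metadata") or {})
```

Failing loudly when the markdown field is missing is safer than passing an empty string downstream into a RAG index.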

The web scraper your AI agent deserves

Turn any website into LLM-ready markdown, JSON, or structured data. Handles protected sites and returns static pages in around 118ms.

  • One credit pool, every endpoint
  • ~90% fewer tokens than raw HTML
  • Self-host the open-source core for free

Common questions

Frequently asked questions

How do I convert HTML to markdown from a URL?

POST the URL to /v1/scrape with formats set to markdown. You get back clean GitHub-flavored markdown plus page metadata, with nav, ads, and boilerplate already stripped out.

What markdown format does the API return?

GitHub-flavored markdown. Headings, links, ordered and unordered lists, tables, and fenced code blocks are all preserved so the output renders correctly and chunks cleanly for RAG.

Is markdown better than raw HTML for feeding LLMs?

Yes. Clean markdown carries the same content with roughly 90% fewer tokens than raw HTML, so you use less of the context window, cut cost, and give the model far less noise to reason over.

Can it convert pages that need JavaScript or block bots?

Yes. JavaScript rendering and bot protection are handled automatically. You send the same request and get markdown back, whether the page is static or heavily defended.

Am I billed for failed requests?

No. Credits are only consumed on successful responses. A standard page costs 1 credit; heavier work, such as JavaScript rendering or access to protected sites, costs a few extra credits.
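The billing rule reduces to simple arithmetic: failed requests cost nothing, and each successful one costs 1 credit plus any surcharge. The surcharge amounts aren't listed here, so in this Python sketch they are caller-supplied placeholders (`extra_credits`). `Outcome` and `credits_used` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass

BASE_CREDITS = 1  # a standard page costs 1 credit


@dataclass
class Outcome:
    succeeded: bool
    # Placeholder surcharge for JS rendering or protected-site access;
    # take the real figures from your plan's pricing.
    extra_credits: int = 0


def credits_used(outcomes: list[Outcome]) -> int:
    """Total credits for a batch. Failed requests are not billed."""
    return sum(BASE_CREDITS + o.extra_credits for o in outcomes if o.succeeded)
```

Because failures are skipped entirely, retrying a failed page only costs credits once it finally succeeds.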

Ship an agent that actually sees the web.

One credit pool, every endpoint. Cancel anytime, or self-host the open-source core for free.
