Agent Scrape

Let AI figure out how to get the data you need. Describe what you want in plain language, and Webclaw uses Claude to reason about the page, navigate through it, and extract exactly what you asked for.

POST/v1/agent-scrape

Run an AI-guided scrape that reasons about the page and extracts content based on your prompt.

Note

Agent scrape uses Claude under the hood. Each request consumes scrape credits based on the number of steps taken (1 credit per step).

Request body

json

{
  "url": "https://news.ycombinator.com",
  "prompt": "Find the top 5 posts about AI and return their titles, URLs, points, and comment counts",
  "max_steps": 5
}

Parameters

Field	Type	Required	Description
`url`	`string`	Yes	The starting URL to scrape.
`prompt`	`string`	Yes	Natural language description of what to find or extract. Be specific about the data you want.
`max_steps`	`integer`	No	Maximum reasoning and navigation steps (1-10). Default: 5. Higher values allow the agent to navigate through more pages but cost more credits.

How it works

The agent scrape pipeline runs in a loop of up to max_steps iterations:

Scrape the URL and extract readable content
Send the content and your prompt to Claude for analysis
Claude decides whether to return results, click a link, or navigate to another page
Repeat until the agent has the answer or hits max_steps
Return the final extracted content

Response

json

{
  "content": "## Top 5 AI Posts on Hacker News\n\n| # | Title | Points | Comments |\n|---|-------|--------|----------|\n| 1 | Show HN: Open-source reasoning engine | 342 | 128 |\n| 2 | Claude 4 system card analysis | 289 | 95 |\n| 3 | Fine-tuning LLMs on domain data | 201 | 67 |\n| 4 | AI code review tools comparison | 187 | 54 |\n| 5 | Neural architecture search breakthrough | 156 | 43 |",
  "steps": 2,
  "pages_visited": [
    "https://news.ycombinator.com"
  ],
  "elapsed_ms": 4200
}

Response fields

Field	Type	Description
`content`	`string`	The extracted content in markdown, shaped by your prompt.
`steps`	`integer`	Number of reasoning steps the agent took.
`pages_visited`	`string[]`	URLs the agent visited during extraction.
`elapsed_ms`	`integer`	Total processing time in milliseconds.

SDK examples

Python

python

from webclaw import Webclaw

client = Webclaw(api_key="wc_...")

result = client.agent_scrape(
    url="https://news.ycombinator.com",
    prompt="Find the top 5 posts about AI with titles, URLs, points, and comment counts",
)
print(result.content)
print(f"Completed in {result.steps} steps, visited {len(result.pages_visited)} pages")

TypeScript

typescript

import Webclaw from "@webclaw/sdk";

const client = new Webclaw({ apiKey: "wc_..." });

const result = await client.agentScrape({
  url: "https://news.ycombinator.com",
  prompt: "Find the top 5 posts about AI with titles, URLs, points, and comment counts",
});
console.log(result.content);
console.log(`${result.steps} steps, ${result.pagesVisited.length} pages`);

cURL

bash

curl -X POST https://api.webclaw.io/v1/agent-scrape \
  -H "Authorization: Bearer wc_..." \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://news.ycombinator.com",
    "prompt": "Find the top 5 posts about AI with titles, URLs, points, and comment counts",
    "max_steps": 5
  }'

Tip

Write specific prompts for better results. Instead of "get data", try "extract all product names, prices, and availability status from the catalog page".

Note

Agent scrape requires a browser rendering engine on the server. If the page is simple enough, the agent may complete in a single step without navigation.