webclaw

Agent Scrape

Let AI figure out how to get the data you need. Describe what you want in plain language, and Webclaw uses Claude to reason about the page, navigate through it, and extract exactly what you asked for.

POST/v1/agent-scrape

Run an AI-guided scrape that reasons about the page and extracts content based on your prompt.

Note
Agent scrape uses Claude under the hood. Each request consumes scrape credits based on the number of steps taken (1 credit per step).

Request body

json
{
  "url": "https://news.ycombinator.com",
  "prompt": "Find the top 5 posts about AI and return their titles, URLs, points, and comment counts",
  "max_steps": 5
}

Parameters

FieldTypeRequiredDescription
urlstringYesThe starting URL to scrape.
promptstringYesNatural language description of what to find or extract. Be specific about the data you want.
max_stepsintegerNoMaximum reasoning and navigation steps (1-10). Default: 5. Higher values allow the agent to navigate through more pages but cost more credits.

How it works

The agent scrape pipeline runs in a loop of up to max_steps iterations:

  1. Scrape the URL and extract readable content
  2. Send the content and your prompt to Claude for analysis
  3. Claude decides whether to return results, click a link, or navigate to another page
  4. Repeat until the agent has the answer or hits max_steps
  5. Return the final extracted content

Response

json
{
  "content": "## Top 5 AI Posts on Hacker News\n\n| # | Title | Points | Comments |\n|---|-------|--------|----------|\n| 1 | Show HN: Open-source reasoning engine | 342 | 128 |\n| 2 | Claude 4 system card analysis | 289 | 95 |\n| 3 | Fine-tuning LLMs on domain data | 201 | 67 |\n| 4 | AI code review tools comparison | 187 | 54 |\n| 5 | Neural architecture search breakthrough | 156 | 43 |",
  "steps": 2,
  "pages_visited": [
    "https://news.ycombinator.com"
  ],
  "elapsed_ms": 4200
}

Response fields

FieldTypeDescription
contentstringThe extracted content in markdown, shaped by your prompt.
stepsintegerNumber of reasoning steps the agent took.
pages_visitedstring[]URLs the agent visited during extraction.
elapsed_msintegerTotal processing time in milliseconds.

SDK examples

Python

python
from webclaw import Webclaw

client = Webclaw(api_key="wc_...")

result = client.agent_scrape(
    url="https://news.ycombinator.com",
    prompt="Find the top 5 posts about AI with titles, URLs, points, and comment counts",
)
print(result.content)
print(f"Completed in {result.steps} steps, visited {len(result.pages_visited)} pages")

TypeScript

typescript
import Webclaw from "@webclaw/sdk";

const client = new Webclaw({ apiKey: "wc_..." });

const result = await client.agentScrape({
  url: "https://news.ycombinator.com",
  prompt: "Find the top 5 posts about AI with titles, URLs, points, and comment counts",
});
console.log(result.content);
console.log(`${result.steps} steps, ${result.pagesVisited.length} pages`);

cURL

bash
curl -X POST https://api.webclaw.io/v1/agent-scrape \
  -H "Authorization: Bearer wc_..." \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://news.ycombinator.com",
    "prompt": "Find the top 5 posts about AI with titles, URLs, points, and comment counts",
    "max_steps": 5
  }'
Tip
Write specific prompts for better results. Instead of "get data", try "extract all product names, prices, and availability status from the catalog page".
Note
Agent scrape requires a browser rendering engine on the server. If the page is simple enough, the agent may complete in a single step without navigation.