webclaw

Extract

Extract structured JSON data from a web page by URL. Provide a JSON Schema for typed output, or a natural-language prompt for flexible extraction. In both modes, an LLM parses the page content.

POST /v1/extract

Extract structured data from a URL using a JSON schema or natural language prompt.

Note
This endpoint requires an LLM provider. The provider chain tries Ollama (local) first, then falls back to OpenAI, then Anthropic. At least one must be configured.

Schema mode

Provide a JSON Schema and the LLM will return data conforming to it. This gives you predictable, typed output.

Request body

json
{
  "url": "https://example.com/pricing",
  "schema": {
    "type": "object",
    "properties": {
      "title": { "type": "string" },
      "price": { "type": "number" },
      "currency": { "type": "string" },
      "features": {
        "type": "array",
        "items": { "type": "string" }
      }
    }
  }
}

Response

json
{
  "data": {
    "title": "Pro Plan",
    "price": 49,
    "currency": "USD",
    "features": [
      "Unlimited extractions",
      "Priority support",
      "Custom browser profiles"
    ]
  }
}
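Because an LLM produces the output, a client may want to confirm that `data` really matches the schema before using it. The sketch below is a minimal, hypothetical checker (`check` is not part of the webclaw API). It covers only the keywords used in the example above (`type`, `properties`, `items`) and ignores `required`. A full validator, such as the `jsonschema` package, handles far more of the specification.

```python
from typing import Any


def _matches_type(value: Any, expected: str) -> bool:
    # bool is excluded from "number"/"integer" because bool subclasses int in Python.
    if expected == "string":
        return isinstance(value, str)
    if expected == "number":
        return isinstance(value, (int, float)) and not isinstance(value, bool)
    if expected == "integer":
        return isinstance(value, int) and not isinstance(value, bool)
    if expected == "boolean":
        return isinstance(value, bool)
    if expected == "array":
        return isinstance(value, list)
    if expected == "object":
        return isinstance(value, dict)
    if expected == "null":
        return value is None
    raise ValueError(f"unsupported type: {expected}")


def check(value: Any, schema: dict, path: str = "data") -> list[str]:
    """Return mismatch messages; an empty list means the value fits the schema."""
    errors: list[str] = []
    expected = schema.get("type")
    if expected is not None and not _matches_type(value, expected):
        return [f"{path}: expected {expected}, got {type(value).__name__}"]
    if expected == "object":
        # Missing keys are skipped: the example schema has no "required" list,
        # so every property is optional.
        for key, sub in schema.get("properties", {}).items():
            if key in value:
                errors += check(value[key], sub, f"{path}.{key}")
    elif expected == "array" and "items" in schema:
        for i, item in enumerate(value):
            errors += check(item, schema["items"], f"{path}[{i}]")
    return errors


schema = {
    "type": "object",
    "properties": {
        "title": {"type": "string"},
        "price": {"type": "number"},
        "currency": {"type": "string"},
        "features": {"type": "array", "items": {"type": "string"}},
    },
}
response = {"data": {"title": "Pro Plan", "price": 49, "currency": "USD",
                     "features": ["Unlimited extractions", "Priority support",
                                  "Custom browser profiles"]}}

print(check(response["data"], schema))  # prints []
```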

Prompt mode

Describe what you want in plain English. The LLM will determine the structure based on your prompt and the page content.

Request body

json
{
  "url": "https://example.com/pricing",
  "prompt": "Extract all pricing tiers with name, price, and features"
}

Response

json
{
  "data": {
    "tiers": [
      {
        "name": "Free",
        "price": 0,
        "features": ["500 pages/month", "Community support"]
      },
      {
        "name": "Pro",
        "price": 49,
        "features": ["100k pages/month", "Priority support", "Custom profiles"]
      },
      {
        "name": "Scale",
        "price": 199,
        "features": ["500k pages/month", "Dedicated support", "SLA"]
      }
    ]
  }
}
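In prompt mode, the key names (`tiers`, `name`, `price`) are chosen by the LLM rather than fixed by a schema you supplied, so they may differ between prompts or pages. The following is a minimal sketch of a defensive reader, assuming the response shape shown above. It maps tier names to prices and returns an empty result when the expected keys are absent. `tier_prices` is a hypothetical helper.

```python
import json

raw = """{"data": {"tiers": [
  {"name": "Free", "price": 0, "features": ["500 pages/month", "Community support"]},
  {"name": "Pro", "price": 49, "features": ["100k pages/month", "Priority support", "Custom profiles"]},
  {"name": "Scale", "price": 199, "features": ["500k pages/month", "Dedicated support", "SLA"]}
]}}"""


def tier_prices(body: dict) -> dict[str, float]:
    """Map tier name -> price, tolerating missing or differently shaped fields."""
    data = body.get("data")
    if not isinstance(data, dict):
        return {}
    tiers = data.get("tiers")
    if not isinstance(tiers, list):
        return {}  # the LLM chose a different key; inspect `data` manually
    prices: dict[str, float] = {}
    for tier in tiers:
        if not isinstance(tier, dict):
            continue  # skip entries that are not objects
        name, price = tier.get("name"), tier.get("price")
        if isinstance(name, str) and isinstance(price, (int, float)) and not isinstance(price, bool):
            prices[name] = price
    return prices


print(tier_prices(json.loads(raw)))  # prints {'Free': 0, 'Pro': 49, 'Scale': 199}
```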

Parameters

| Field | Type | Required | Description |
|---|---|---|---|
| url | string | Yes | URL to extract data from. |
| schema | object | No* | JSON Schema defining the desired output structure. |
| prompt | string | No* | Natural-language description of what to extract. |
Warning
You must provide either schema or prompt. If both are provided, schema takes precedence.
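The either/or rule can also be enforced on the client before a request is sent. The following is a minimal sketch using a hypothetical helper, `build_extract_body`. It rejects a body with neither field. When both are given, it keeps `schema` and drops `prompt` with a warning. This mirrors the documented precedence, but the design choice is this sketch's own and does not describe the server.

```python
import json
import warnings
from typing import Optional


def build_extract_body(url: str, schema: Optional[dict] = None,
                       prompt: Optional[str] = None) -> str:
    """Build a /v1/extract JSON body (hypothetical client-side helper)."""
    if not url:
        raise ValueError("url is required")
    if schema is None and prompt is None:
        raise ValueError("provide either schema or prompt")
    body: dict = {"url": url}
    if schema is not None:
        body["schema"] = schema
        if prompt is not None:
            # The API lets schema win anyway; dropping prompt makes that explicit.
            warnings.warn("both schema and prompt given; schema takes precedence")
    else:
        body["prompt"] = prompt
    return json.dumps(body)


print(build_extract_body("https://example.com/pricing",
                         prompt="Extract all pricing tiers with name, price, and features"))
```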

LLM provider chain

The extract endpoint tries LLM providers in this order:

  1. Ollama (local) -- free, no API key needed. Set OLLAMA_HOST if Ollama is not running on localhost.
  2. OpenAI -- requires OPENAI_API_KEY.
  3. Anthropic -- requires ANTHROPIC_API_KEY.
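The fallback order above can be modeled as follows. This is an illustrative sketch, not webclaw's implementation. The provider callables are hypothetical stand-ins for real LLM clients. The sketch assumes that a provider is skipped when its API key is missing and abandoned when its call raises. Ollama is always tried first because it needs no key, and its availability is only known by calling it.

```python
from typing import Callable, Mapping

# Hypothetical provider callable: takes a prompt, returns the LLM's text,
# and raises if the provider is unreachable or fails.
Provider = Callable[[str], str]


def configured_providers(env: Mapping[str, str]) -> list[str]:
    """Providers eligible to try, in the documented order."""
    order = ["ollama"]  # no API key needed
    if env.get("OPENAI_API_KEY"):
        order.append("openai")
    if env.get("ANTHROPIC_API_KEY"):
        order.append("anthropic")
    return order


def run_chain(prompt: str, env: Mapping[str, str],
              clients: Mapping[str, Provider]) -> tuple[str, str]:
    """Try each eligible provider in order; return (provider_name, output)."""
    errors: list[str] = []
    for name in configured_providers(env):
        try:
            return name, clients[name](prompt)
        except Exception as exc:  # a failure (or missing client) falls through
            errors.append(f"{name}: {exc!r}")
    raise RuntimeError("no LLM provider available: " + "; ".join(errors))


def ollama_down(prompt: str) -> str:
    raise ConnectionError("connection refused")


def fake_openai(prompt: str) -> str:
    return '{"title": "Pro Plan"}'


print(run_chain("extract the title", {"OPENAI_API_KEY": "sk-test"},
                {"ollama": ollama_down, "openai": fake_openai}))
# prints ('openai', '{"title": "Pro Plan"}')
```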

Example

curl -- schema mode
curl -X POST https://api.webclaw.io/v1/extract \
  -H "Authorization: Bearer wc_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com/product/widget",
    "schema": {
      "type": "object",
      "properties": {
        "name": { "type": "string" },
        "price": { "type": "number" },
        "in_stock": { "type": "boolean" }
      }
    }
  }'

curl -- prompt mode
curl -X POST https://api.webclaw.io/v1/extract \
  -H "Authorization: Bearer wc_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com/team",
    "prompt": "Extract all team members with name, role, and LinkedIn URL"
  }'
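The same request can be built outside the shell with Python's standard library. The following is a minimal sketch, assuming the endpoint, headers, and body exactly as in the curl examples above. `wc_your_api_key` is the same placeholder. The 60-second timeout is an arbitrary choice for this sketch. `extract` assumes a JSON response with a top-level `data` object, as in the Response examples. `extract_request` and `extract` are hypothetical helper names. Only `extract` touches the network, and `urlopen` raises `HTTPError` on non-2xx responses.

```python
import json
import urllib.request

API_URL = "https://api.webclaw.io/v1/extract"


def extract_request(api_key: str, body: dict) -> urllib.request.Request:
    """Build the same POST request as the curl examples, without sending it."""
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def extract(api_key: str, body: dict, timeout: float = 60.0) -> dict:
    """Send the request and return the "data" object from the JSON response."""
    with urllib.request.urlopen(extract_request(api_key, body), timeout=timeout) as resp:
        return json.load(resp)["data"]


req = extract_request("wc_your_api_key", {
    "url": "https://example.com/team",
    "prompt": "Extract all team members with name, role, and LinkedIn URL",
})
print(req.get_method(), req.full_url)  # prints POST https://api.webclaw.io/v1/extract
```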