Extract

Extract structured JSON data from any URL. Provide a JSON schema for typed output, or a natural language prompt for flexible extraction. Both modes use an LLM to parse the page content.

POST /v1/extract

Extract structured data from a URL using a JSON schema or natural language prompt.

Note

This endpoint requires an LLM provider. The provider chain tries Ollama (local) first, then falls back to OpenAI, then Anthropic. At least one must be configured.

Schema mode

Provide a JSON Schema and the LLM will return data conforming to it. This gives you predictable, typed output.

Request body

json

{
  "url": "https://example.com/pricing",
  "schema": {
    "type": "object",
    "properties": {
      "title": { "type": "string" },
      "price": { "type": "number" },
      "currency": { "type": "string" },
      "features": {
        "type": "array",
        "items": { "type": "string" }
      }
    }
  }
}

Response

json

{
  "data": {
    "title": "Pro Plan",
    "price": 49,
    "currency": "USD",
    "features": [
      "Unlimited extractions",
      "Priority support",
      "Custom browser profiles"
    ]
  }
}
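Because the output comes from an LLM, it can be worth checking the returned `data` against the schema you sent before using it. The following is a minimal sketch, not part of the API: `matches_schema` is a hypothetical helper that only understands the `type`, `properties`, and `items` keywords used in the example above, and it is not a full JSON Schema validator.

```python
def matches_schema(value, schema):
    """Return True if value satisfies the 'type', 'properties', and 'items' keywords."""
    expected = schema.get("type")
    if expected == "object":
        if not isinstance(value, dict):
            return False
        props = schema.get("properties", {})
        # Only keys that are present are checked; missing keys are allowed
        # because the example schema declares no "required" list.
        return all(matches_schema(value[k], s) for k, s in props.items() if k in value)
    if expected == "array":
        if not isinstance(value, list):
            return False
        item_schema = schema.get("items", {})
        return all(matches_schema(item, item_schema) for item in value)
    if expected == "string":
        return isinstance(value, str)
    if expected == "number":
        # bool is a subclass of int in Python, so exclude it explicitly.
        return isinstance(value, (int, float)) and not isinstance(value, bool)
    if expected == "boolean":
        return isinstance(value, bool)
    return True  # unknown or missing type: accept


schema = {
    "type": "object",
    "properties": {
        "title": {"type": "string"},
        "price": {"type": "number"},
        "currency": {"type": "string"},
        "features": {"type": "array", "items": {"type": "string"}},
    },
}
response = {
    "data": {
        "title": "Pro Plan",
        "price": 49,
        "currency": "USD",
        "features": ["Unlimited extractions", "Priority support", "Custom browser profiles"],
    }
}

print(matches_schema(response["data"], schema))  # True
```

For production use, a dedicated JSON Schema library is likely a better fit than a hand-rolled check like this one.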

Prompt mode

Describe what you want in plain English. The LLM will determine the structure based on your prompt and the page content.

Request body

json

{
  "url": "https://example.com/pricing",
  "prompt": "Extract all pricing tiers with name, price, and features"
}

Response

json

{
  "data": {
    "tiers": [
      {
        "name": "Hobby",
        "price": 9,
        "features": ["1 seat", "Community support"]
      },
      {
        "name": "Pro",
        "price": 49,
        "features": ["5 seats", "Priority support", "Custom profiles"]
      },
      {
        "name": "Scale",
        "price": 199,
        "features": ["500k pages/month", "Dedicated support", "SLA"]
      }
    ]
  }
}

Parameters

Field	Type	Required	Description
`url`	`string`	Yes	URL to extract data from.
`schema`	`object`	No*	JSON Schema defining the desired output structure.
`prompt`	`string`	No*	Natural language description of what to extract.

Warning

You must provide either `schema` or `prompt`. If both are provided, `schema` takes precedence.
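The rule above can be enforced on the client before a request is sent. This is a sketch under assumptions: `build_extract_body` is a hypothetical helper, and dropping `prompt` when both fields are given is a client-side choice that mirrors the documented precedence, not something the API requires.

```python
def build_extract_body(url, schema=None, prompt=None):
    """Build a request body for POST /v1/extract, enforcing the schema-or-prompt rule."""
    if not url:
        raise ValueError("url is required")
    if schema is None and prompt is None:
        raise ValueError("provide either schema or prompt")

    body = {"url": url}
    if schema is not None:
        # Schema takes precedence, so the prompt is left out to make the
        # effective mode obvious when reading the request.
        body["schema"] = schema
    else:
        body["prompt"] = prompt
    return body


body = build_extract_body(
    "https://example.com/pricing",
    prompt="Extract all pricing tiers with name, price, and features",
)
print(sorted(body))  # ['prompt', 'url']
```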

LLM provider chain

The extract endpoint tries LLM providers in this order:

1. Ollama (local) -- free, no API key needed. Set OLLAMA_HOST if not running on localhost.

2. OpenAI -- requires OPENAI_API_KEY.

3. Anthropic -- requires ANTHROPIC_API_KEY.
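The ordering above is a try-in-order fallback. The sketch below illustrates that general pattern only; it is not the service's implementation. `ProviderUnavailable`, `run_with_fallback`, and the three stand-in provider functions are all hypothetical names invented for the example.

```python
class ProviderUnavailable(Exception):
    """Raised by a provider that is not configured or cannot be reached."""


def run_with_fallback(providers, page_text):
    """Try each (name, call) pair in order and return the first successful result."""
    errors = []
    for name, call in providers:
        try:
            return name, call(page_text)
        except ProviderUnavailable as exc:
            # Remember why this provider was skipped, then try the next one.
            errors.append(f"{name}: {exc}")
    raise RuntimeError("no LLM provider available: " + "; ".join(errors))


# Stand-in providers: the local one is unreachable, the second one succeeds.
def fake_ollama(text):
    raise ProviderUnavailable("connection refused")


def fake_openai(text):
    return {"title": text}


def fake_anthropic(text):
    return {"title": text}


chain = [("ollama", fake_ollama), ("openai", fake_openai), ("anthropic", fake_anthropic)]
name, data = run_with_fallback(chain, "Pro Plan")
print(name)  # openai
```

If every provider in the chain fails, the sketch raises a single error listing each reason, which matches the requirement that at least one provider be configured.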

Example

curl -- schema mode

curl -X POST https://api.webclaw.io/v1/extract \
  -H "Authorization: Bearer wc_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com/product/widget",
    "schema": {
      "type": "object",
      "properties": {
        "name": { "type": "string" },
        "price": { "type": "number" },
        "in_stock": { "type": "boolean" }
      }
    }
  }'

curl -- prompt mode

curl -X POST https://api.webclaw.io/v1/extract \
  -H "Authorization: Bearer wc_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com/team",
    "prompt": "Extract all team members with name, role, and LinkedIn URL"
  }'
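Python -- prompt mode

The same request can be made from Python with only the standard library. This is a sketch that mirrors the curl examples above; `build_request` and `extract` are hypothetical helper names, and the 60-second timeout is an assumption rather than a documented limit.

```python
import json
import urllib.request

API_URL = "https://api.webclaw.io/v1/extract"


def build_request(api_key, body):
    """Create the POST request with the same headers as the curl examples."""
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def extract(api_key, body, timeout=60):
    """Send the request and return the "data" field of the JSON response."""
    with urllib.request.urlopen(build_request(api_key, body), timeout=timeout) as resp:
        return json.load(resp)["data"]


request = build_request(
    "wc_your_api_key",
    {
        "url": "https://example.com/team",
        "prompt": "Extract all team members with name, role, and LinkedIn URL",
    },
)
print(request.get_method(), request.full_url)
```

Calling `extract("wc_your_api_key", body)` with a real key sends the request; the block above only builds it, so it runs without network access.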
