webclaw

Batch

Extract content from multiple URLs in a single request. Requests are processed concurrently on the server.

POST /v1/batch

Extract content from multiple URLs concurrently.

Request body

json
{
  "urls": [
    "https://example.com/page-1",
    "https://example.com/page-2",
    "https://example.com/page-3"
  ],
  "formats": ["markdown", "llm"],
  "concurrency": 5
}

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| urls | string[] | Yes | Array of URLs to extract. |
| formats | string[] | No | Output formats. Options: markdown, llm, text, json. Defaults to ["markdown"]. |
| concurrency | number | No | Max concurrent requests. Default: 5. |
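
The fields above can be assembled and checked on the client before a request is sent. The following is a minimal sketch, not part of any official client: `build_batch_payload` and `ALLOWED_FORMATS` are hypothetical names, and rejecting an empty `urls` list or a `concurrency` below 1 is an assumption about sensible input rather than documented server behavior.

```python
# Output formats listed in the field table above.
ALLOWED_FORMATS = {"markdown", "llm", "text", "json"}


def build_batch_payload(urls, formats=None, concurrency=None):
    """Build a request body for POST /v1/batch (hypothetical helper).

    Optional fields are omitted when not given, so the documented
    server-side defaults (["markdown"] and 5) apply.
    """
    # Assumption: an empty list is treated as a client-side error.
    if not urls:
        raise ValueError("urls is required and must not be empty")
    payload = {"urls": list(urls)}

    if formats is not None:
        unknown = sorted(set(formats) - ALLOWED_FORMATS)
        if unknown:
            raise ValueError(f"unsupported formats: {unknown}")
        payload["formats"] = list(formats)

    if concurrency is not None:
        # Assumption: values below 1 are rejected locally.
        if concurrency < 1:
            raise ValueError("concurrency must be at least 1")
        payload["concurrency"] = concurrency

    return payload


payload = build_batch_payload(
    ["https://example.com/page-1", "https://example.com/page-2"],
    formats=["markdown", "llm"],
    concurrency=5,
)
```

Leaving `formats` and `concurrency` out of the payload, rather than hard-coding the defaults, keeps the client in step with the server if those defaults ever change.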

Response

json
{
  "results": [
    {
      "url": "https://example.com/page-1",
      "markdown": "# Page One\n\nContent of the first page...",
      "metadata": {
        "title": "Page One",
        "word_count": 654
      },
      "error": null
    },
    {
      "url": "https://example.com/page-2",
      "markdown": "# Page Two\n\nContent of the second page...",
      "metadata": {
        "title": "Page Two",
        "word_count": 1102
      },
      "error": null
    },
    {
      "url": "https://example.com/page-3",
      "markdown": null,
      "metadata": null,
      "error": "Failed to fetch: 404 Not Found"
    }
  ]
}
Note
Individual URL failures do not fail the entire batch. Check the error field on each result to detect per-URL failures.
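
One way to act on this is to split the `results` array by its `error` field. The sketch below assumes the response has already been parsed into a Python dict; `split_results` is a hypothetical helper name, and the sample data is trimmed from the response shown above.

```python
def split_results(response):
    """Separate successful results from failed ones in a batch response."""
    succeeded, failed = [], []
    for result in response.get("results", []):
        # A null error (None after JSON parsing) marks a successful extraction.
        if result.get("error") is None:
            succeeded.append(result)
        else:
            failed.append(result)
    return succeeded, failed


# Sample data trimmed from the documented response.
sample = {
    "results": [
        {
            "url": "https://example.com/page-1",
            "markdown": "# Page One\n\nContent of the first page...",
            "metadata": {"title": "Page One", "word_count": 654},
            "error": None,
        },
        {
            "url": "https://example.com/page-3",
            "markdown": None,
            "metadata": None,
            "error": "Failed to fetch: 404 Not Found",
        },
    ]
}

ok, bad = split_results(sample)
for item in bad:
    print(f"{item['url']}: {item['error']}")
# prints: https://example.com/page-3: Failed to fetch: 404 Not Found
```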

Example

curl
curl -X POST https://api.webclaw.io/v1/batch \
  -H "Authorization: Bearer wc_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "urls": [
      "https://openai.com/blog/gpt-4",
      "https://anthropic.com/research/claude-3",
      "https://deepmind.google/technologies/gemini"
    ],
    "formats": ["markdown", "llm"],
    "concurrency": 3
  }'
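
The same request can be made from Python with only the standard library. This is a sketch under stated assumptions: `make_batch_request` and `send` are hypothetical helper names, the 60-second timeout is an arbitrary choice, and the endpoint, headers, and body fields are taken from the curl example above.

```python
import json
import urllib.request

API_URL = "https://api.webclaw.io/v1/batch"


def make_batch_request(api_key, urls, formats=("markdown",), concurrency=5):
    """Build (but do not send) a POST /v1/batch request."""
    body = json.dumps(
        {"urls": list(urls), "formats": list(formats), "concurrency": concurrency}
    ).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


def send(request, timeout=60):
    """Send the request and return the parsed JSON response.

    Assumption: 60 seconds is an illustrative timeout, not a documented limit.
    """
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.load(resp)


request = make_batch_request(
    "wc_your_api_key",
    ["https://example.com/page-1", "https://example.com/page-2"],
    formats=["markdown", "llm"],
    concurrency=3,
)
# response = send(request)  # uncomment to perform the network call
```

Building the request separately from sending it makes the payload easy to inspect or log before any network traffic occurs.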
Tip
For large URL lists, keep concurrency reasonable (5-10) to avoid overwhelming target servers. The server-side default of 5 is a good starting point.
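
For very long lists, one option is to split the URLs across several batch requests while holding concurrency at a modest value. The sketch below is illustrative only: `plan_batches` is a hypothetical helper, and the `batch_size` of 50 is an arbitrary assumption, since no per-request URL limit is stated here.

```python
def plan_batches(urls, batch_size=50, concurrency=5):
    """Split a long URL list into request bodies for POST /v1/batch.

    Assumption: batch_size is a client-side choice, not a documented limit.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [
        {"urls": urls[start:start + batch_size], "concurrency": concurrency}
        for start in range(0, len(urls), batch_size)
    ]


# 120 placeholder URLs become three request bodies of 50, 50, and 20 URLs.
urls = [f"https://example.com/page-{n}" for n in range(1, 121)]
payloads = plan_batches(urls)
print([len(p["urls"]) for p in payloads])
# prints: [50, 50, 20]
```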