Crawl
Crawl a site breadth-first (BFS) from a starting URL, within the limits set by max_depth and max_pages. Crawls run asynchronously: start one, then poll until it completes.
Start a crawl
POST
/v1/crawl
Start an asynchronous, same-origin crawl from the given URL.
Request body
```json
{
  "url": "https://docs.example.com",
  "max_depth": 2,
  "max_pages": 50,
  "use_sitemap": true
}
```

| Field | Type | Required | Description |
|---|---|---|---|
| url | string | Yes | Starting URL. The crawl stays within this origin. |
| max_depth | number | No | Maximum link depth to follow. Default: 2. |
| max_pages | number | No | Maximum number of pages to extract. Default: 50. |
| use_sitemap | boolean | No | Seed the crawl queue with URLs from the site's sitemap. Default: false. |
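The exact traversal rules the service applies aren't specified beyond the fields above. The sketch below is an illustrative client-side model of how the four fields could interact, not the service's implementation. It assumes that "same origin" means the same scheme, host, and port, that the starting URL sits at depth 0, that sitemap seeds also start at depth 0, and that max_pages caps the number of extracted pages. The function names `origin` and `simulate_crawl` are hypothetical.

```python
from collections import deque
from typing import Dict, Iterable, List, Optional, Tuple
from urllib.parse import urlsplit

# Default ports, so "https://host" and "https://host:443" count as one origin.
DEFAULT_PORTS = {"http": 80, "https": 443}


def origin(url: str) -> Tuple[str, str, Optional[int]]:
    """Return (scheme, host, port), the usual definition of a URL's origin."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    return (scheme, (parts.hostname or "").lower(), parts.port or DEFAULT_PORTS.get(scheme))


def simulate_crawl(
    start_url: str,
    links: Dict[str, List[str]],
    max_depth: int = 2,
    max_pages: int = 50,
    sitemap_urls: Iterable[str] = (),
) -> List[str]:
    """Model a same-origin BFS crawl over an in-memory link graph.

    `links` maps each URL to the URLs found on it (a stand-in for fetching).
    Returns URLs in the order they would be extracted.
    """
    start_origin = origin(start_url)
    seen = set()
    queue = deque()

    def enqueue(url: str, depth: int) -> None:
        # Skip duplicates and anything outside the starting origin.
        if url not in seen and origin(url) == start_origin:
            seen.add(url)
            queue.append((url, depth))

    enqueue(start_url, 0)
    for url in sitemap_urls:
        enqueue(url, 0)  # Assumption: sitemap seeds start at depth 0.

    extracted = []
    while queue and len(extracted) < max_pages:
        url, depth = queue.popleft()  # FIFO order gives breadth-first traversal.
        extracted.append(url)
        if depth < max_depth:  # Pages at max_depth are extracted but not expanded.
            for link in links.get(url, []):
                enqueue(link, depth + 1)
    return extracted
```

In this model, a page linked only from a depth-2 page is never reached with the default max_depth of 2. Supplying it as a sitemap seed makes it reachable, which is the effect use_sitemap is meant to have.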
Response
```json
{
  "id": "550e8400-e29b-41d4-a716-446655440000",
  "status": "running"
}
```

Note
The crawl ID is a UUID. Store it: you need it to poll for status and results.
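The following is a minimal standard-library sketch of starting a crawl and keeping the returned ID. It assumes the base URL `https://api.webclaw.io` used in the curl examples on this page. The helper names `build_start_request` and `start_crawl` are hypothetical. The `opener` parameter exists only so the network call can be swapped out, for example in tests.

```python
import json
import urllib.request

API_BASE = "https://api.webclaw.io"  # Base URL taken from the curl examples.


def build_start_request(api_key, url, max_depth=2, max_pages=50, use_sitemap=False):
    """Build (but do not send) a POST /v1/crawl request using the documented defaults."""
    body = {
        "url": url,
        "max_depth": max_depth,
        "max_pages": max_pages,
        "use_sitemap": use_sitemap,
    }
    return urllib.request.Request(
        f"{API_BASE}/v1/crawl",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def start_crawl(api_key, url, opener=urllib.request.urlopen, **options):
    """Start a crawl and return its UUID, which is needed for polling."""
    request = build_start_request(api_key, url, **options)
    with opener(request) as response:
        payload = json.load(response)
    return payload["id"]
```

For example, `crawl_id = start_crawl("wc_your_api_key", "https://docs.example.com", use_sitemap=True)` would send the same request as the curl example, with different parameter values.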
Poll crawl status
GET
/v1/crawl/{id}
Get the current status and results of a running or completed crawl.
Path parameters
| Parameter | Type | Description |
|---|---|---|
| id | string | The crawl job UUID returned by POST /v1/crawl. |
Response
```json
{
  "id": "550e8400-e29b-41d4-a716-446655440000",
  "status": "completed",
  "pages": [
    {
      "url": "https://docs.example.com",
      "markdown": "# Getting Started\n\nWelcome to the documentation...",
      "metadata": {
        "title": "Getting Started",
        "word_count": 842
      }
    },
    {
      "url": "https://docs.example.com/guides/setup",
      "markdown": "# Setup Guide\n\nFollow these steps...",
      "metadata": {
        "title": "Setup Guide",
        "word_count": 1203
      }
    }
  ],
  "total": 50,
  "completed": 48,
  "errors": 2,
  "created_at": "2026-03-12T10:30:00Z"
}
```

Status values
| Status | Description |
|---|---|
| running | Crawl is in progress. Poll again for updates. |
| completed | Crawl finished. All results are available. |
| failed | Crawl encountered a fatal error and stopped. |
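The status values above suggest a simple polling loop: keep requesting while the status is `running`, and stop on `completed` or `failed`. The following is a minimal sketch under these assumptions. `fetch_status` is a hypothetical callable you supply that performs `GET /v1/crawl/{id}` and returns the parsed JSON. The 2-second interval and 10-minute timeout are arbitrary choices, not documented limits. `sleep` and `clock` can be injected so the loop can be tested without waiting.

```python
import time
from typing import Any, Callable, Dict

# "completed" and "failed" are the documented end states; "running" means keep polling.
TERMINAL_STATUSES = {"completed", "failed"}


def wait_for_crawl(
    fetch_status: Callable[[str], Dict[str, Any]],
    crawl_id: str,
    interval: float = 2.0,
    timeout: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Dict[str, Any]:
    """Poll GET /v1/crawl/{id} (via fetch_status) until the crawl stops running."""
    deadline = clock() + timeout
    while True:
        job = fetch_status(crawl_id)
        if job["status"] in TERMINAL_STATUSES:
            return job
        if clock() >= deadline:
            raise TimeoutError(f"crawl {crawl_id} still running after {timeout} seconds")
        sleep(interval)
```

The function returns the final response for both end states, so callers should check `job["status"] == "failed"` before reading `pages`.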
Paginate crawl pages
GET
/v1/crawl/{id}/pages
Fetch a crawl's results one batch at a time, with pagination handled server-side. Intended for large crawls.
Use this endpoint when a crawl returns many pages and you want to load them gradually in a dashboard, worker, or agent workflow.
Query parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| page | number | 1 | Page number to fetch. |
| per_page | number | 10 | Number of crawl pages to return per response. |
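Loading every result means requesting `page=1, 2, ...` until the `total_pages` value in the response is reached. The following is a minimal sketch of that loop. It assumes the base URL from the curl examples on this page. `fetch_page` is a hypothetical callable you supply that sends the authenticated GET request and returns the parsed JSON. `pages_url` and `iter_crawl_pages` are illustrative names, not part of any SDK.

```python
from typing import Any, Callable, Dict, Iterator
from urllib.parse import urlencode

API_BASE = "https://api.webclaw.io"  # Base URL from the curl examples on this page.


def pages_url(crawl_id: str, page: int = 1, per_page: int = 10) -> str:
    """Build the GET /v1/crawl/{id}/pages URL for one page of results."""
    query = urlencode({"page": page, "per_page": per_page})
    return f"{API_BASE}/v1/crawl/{crawl_id}/pages?{query}"


def iter_crawl_pages(
    fetch_page: Callable[[str, int, int], Dict[str, Any]],
    crawl_id: str,
    per_page: int = 10,
) -> Iterator[Dict[str, Any]]:
    """Yield every crawled page, requesting one server-side page at a time."""
    page = 1
    while True:
        batch = fetch_page(crawl_id, page, per_page)
        yield from batch["pages"]
        # total_pages tells us when the last page of results has been read.
        if page >= batch["total_pages"]:
            break
        page += 1
```

Because this is a generator, a dashboard or worker can process each result as soon as its batch arrives instead of holding the whole crawl in memory.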
Response
```json
{
  "id": "550e8400-e29b-41d4-a716-446655440000",
  "status": "completed",
  "pages": [
    {
      "url": "https://docs.example.com/guides/setup",
      "markdown": "# Setup Guide\n\nFollow these steps...",
      "metadata": {
        "title": "Setup Guide"
      }
    }
  ],
  "page": 1,
  "per_page": 10,
  "total": 50,
  "total_pages": 5
}
```

Example
Start a crawl
```bash
curl -X POST https://api.webclaw.io/v1/crawl \
  -H "Authorization: Bearer wc_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://docs.stripe.com",
    "max_depth": 2,
    "max_pages": 100,
    "use_sitemap": true
  }'
```

Poll for results
```bash
curl https://api.webclaw.io/v1/crawl/550e8400-e29b-41d4-a716-446655440000 \
  -H "Authorization: Bearer wc_your_api_key"
```

Load page results
```bash
curl "https://api.webclaw.io/v1/crawl/550e8400-e29b-41d4-a716-446655440000/pages?page=1&per_page=10" \
  -H "Authorization: Bearer wc_your_api_key"
```

Tip
Enable use_sitemap to seed the crawl queue with sitemap URLs. This helps discover pages that aren't reachable through link traversal alone.