Crawl
Crawl an entire site with BFS traversal. Crawls run asynchronously -- start one, then poll until it completes.
Start a crawl
POST
/v1/crawl

Start an async same-origin crawl from the given URL.
Request body
| Field | Type | Required | Description |
|---|---|---|---|
| url | string | Yes | Starting URL. The crawl stays within this origin. |
| max_depth | number | No | Maximum link depth to follow. Default: 2. |
| max_pages | number | No | Maximum number of pages to extract. Default: 50. |
| use_sitemap | boolean | No | Seed the crawl queue with URLs from the site's sitemap. Default: false. |
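A request body using the fields above might look like this (the values are illustrative):

```json
{
  "url": "https://example.com",
  "max_depth": 3,
  "max_pages": 100,
  "use_sitemap": true
}
```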
Response
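The response example is missing here; a plausible shape, assuming the body returns the job's UUID in an `id` field alongside its initial status (field names other than the documented UUID are assumptions):

```json
{
  "id": "3fa85f64-5717-4562-b3fc-2c963f66afa6",
  "status": "running"
}
```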
Note
The crawl ID is a UUID. Store it -- you need it to poll for results.
Poll crawl status
GET
/v1/crawl/{id}

Get the current status and results of a running or completed crawl.
Path parameters
| Parameter | Type | Description |
|---|---|---|
| id | string | The crawl job UUID returned by POST /v1/crawl. |
Response
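The response example is missing here; a plausible shape for a finished job, assuming results are returned as a list of extracted pages (the `pages` field and its contents are assumptions, not documented):

```json
{
  "id": "3fa85f64-5717-4562-b3fc-2c963f66afa6",
  "status": "completed",
  "pages": [
    { "url": "https://example.com/", "title": "Example Domain" }
  ]
}
```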
Status values
| Status | Description |
|---|---|
| running | Crawl is in progress. Poll again for updates. |
| completed | Crawl finished. All results are available. |
| failed | Crawl encountered a fatal error and stopped. |
Example
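The example content is missing here; a minimal polling sketch in Python. The status values come from the table above; the injected `get_status` callable (which would wrap a real GET to /v1/crawl/{id}) and the timeout parameters are illustrative:

```python
import time


def poll_crawl(get_status, interval=2.0, max_attempts=60):
    """Poll until the crawl reaches a terminal state.

    get_status is a callable returning the parsed status JSON as a dict;
    injecting it keeps the loop testable and transport-agnostic.
    """
    for _ in range(max_attempts):
        job = get_status()
        # "completed" and "failed" are the terminal statuses documented above.
        if job["status"] in ("completed", "failed"):
            return job
        time.sleep(interval)
    raise TimeoutError("crawl did not reach a terminal state in time")
```

In a real client, `get_status` would issue the HTTP GET with your preferred library and return the decoded JSON body.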
Tip
Enable use_sitemap to seed the crawl queue with sitemap URLs. This helps discover pages that aren't reachable through link traversal alone.