webclaw

Map

Discover a site's URLs by parsing robots.txt and sitemap.xml. Sitemap indexes are resolved recursively, so every page listed in the site's sitemaps is returned.

POST /v1/map

Discover all URLs on a site via sitemap parsing.

Request body

```json
{
  "url": "https://docs.example.com"
}
```
| Field | Type | Required | Description |
| --- | --- | --- | --- |
| url | string | Yes | Base URL of the site to map. |

Response

In this example the urls array is abbreviated; count reflects the full result.

```json
{
  "urls": [
    "https://docs.example.com",
    "https://docs.example.com/getting-started",
    "https://docs.example.com/api/reference",
    "https://docs.example.com/guides/authentication",
    "https://docs.example.com/guides/deployment"
  ],
  "count": 156
}
```
| Field | Type | Description |
| --- | --- | --- |
| urls | string[] | All discovered URLs from sitemap parsing. |
| count | number | Total number of URLs found. |
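A client can load this response into a typed structure before using it. The sketch below relies only on the two documented fields; `MapResult` and `parse_map_response` are hypothetical names, not part of any webclaw SDK.

```python
import json
from dataclasses import dataclass


@dataclass
class MapResult:
    """Typed view of a /v1/map response (hypothetical helper, not an official webclaw type)."""

    urls: list[str]
    count: int


def parse_map_response(body: str) -> MapResult:
    """Parse a /v1/map JSON body and check the two documented fields."""
    data = json.loads(body)
    urls, count = data["urls"], data["count"]
    # urls is documented as string[]: reject anything else.
    if not isinstance(urls, list) or not all(isinstance(u, str) for u in urls):
        raise ValueError("'urls' must be a list of strings")
    # count is documented as a number; a URL total should be a non-negative integer.
    if isinstance(count, bool) or not isinstance(count, int) or count < 0:
        raise ValueError("'count' must be a non-negative integer")
    return MapResult(urls=urls, count=count)
```

A missing field raises `KeyError` from the dictionary lookup; a field of the wrong type raises `ValueError`.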
Note: The map endpoint first checks robots.txt for Sitemap: directives and falls back to /sitemap.xml if none are found. Sitemap indexes are resolved recursively, so a single request can discover thousands of URLs.
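The discovery order described in the note can be sketched in Python as follows. This is not webclaw's implementation. It is a minimal illustration that assumes uncompressed XML sitemaps in the standard sitemaps.org namespace, and it injects the HTTP layer as a `fetch` callable (a hypothetical parameter) so the logic can run offline. Nested indexes are walked with a work queue, which has the same effect as recursion, and sitemaps that were already visited are skipped to avoid cycles. Gzipped sitemaps, size limits, and error handling are left out.

```python
import xml.etree.ElementTree as ET
from collections import deque
from typing import Callable, Optional
from urllib.parse import urljoin

# Namespace defined by the sitemaps.org protocol for <urlset> and <sitemapindex>.
NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"

# Injected fetcher (hypothetical): returns the body for a URL, or None if unavailable.
Fetch = Callable[[str], Optional[str]]


def sitemaps_from_robots(robots_txt: str) -> list[str]:
    """Return the URL of every 'Sitemap:' line in a robots.txt body."""
    found = []
    for line in robots_txt.splitlines():
        key, sep, value = line.partition(":")  # split at the first colon only
        if sep and key.strip().lower() == "sitemap" and value.strip():
            found.append(value.strip())
    return found


def map_site(base_url: str, fetch: Fetch) -> list[str]:
    """Collect page URLs from robots.txt sitemaps, falling back to /sitemap.xml."""
    robots = fetch(urljoin(base_url, "/robots.txt"))
    pending = deque(sitemaps_from_robots(robots) if robots else [])
    if not pending:
        pending.append(urljoin(base_url, "/sitemap.xml"))

    visited: set[str] = set()  # sitemap URLs already fetched (guards against cycles)
    seen: set[str] = set()     # page URLs already collected
    pages: list[str] = []
    while pending:
        sitemap_url = pending.popleft()
        if sitemap_url in visited:
            continue
        visited.add(sitemap_url)
        body = fetch(sitemap_url)
        if body is None:
            continue
        root = ET.fromstring(body)
        locs = [el.text.strip() for el in root.iter(NS + "loc") if el.text]
        if root.tag == NS + "sitemapindex":
            pending.extend(locs)  # an index lists further sitemaps: resolve them in turn
        elif root.tag == NS + "urlset":
            for loc in locs:
                if loc not in seen:
                    seen.add(loc)
                    pages.append(loc)
    return pages
```

In real use `fetch` would wrap an HTTP client; for a quick check, a dictionary's `get` method mapping URLs to bodies is enough.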

Example

```bash
curl -X POST https://api.webclaw.io/v1/map \
  -H "Authorization: Bearer wc_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://docs.stripe.com"}'
```
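The same call can be made with Python's standard library. In this minimal sketch, `build_map_request` and `map_urls` are hypothetical helper names, and the 60-second timeout is an arbitrary assumption rather than a documented limit. Only the endpoint, headers, and body come from the curl example.

```python
import json
import urllib.request

API_URL = "https://api.webclaw.io/v1/map"


def build_map_request(site_url: str, api_key: str) -> urllib.request.Request:
    """Build the same POST /v1/map request as the curl example."""
    body = json.dumps({"url": site_url}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


def map_urls(site_url: str, api_key: str, timeout: float = 60.0) -> list[str]:
    """Send the request and return the urls array from the JSON response."""
    req = build_map_request(site_url, api_key)
    # The timeout value is an assumption; large sitemaps may need longer.
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))["urls"]
```

For example, `map_urls("https://docs.stripe.com", "wc_your_api_key")` returns the discovered URLs. Keeping request construction separate from sending lets you inspect the request without calling the API.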
Tip: Use /v1/map to build a URL list, then pass it to /v1/batch for bulk extraction. When the site has a comprehensive sitemap, this is faster than crawling.
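The map-then-batch workflow can be prepared roughly as follows. This page does not document the /v1/batch request body or any limit on URLs per request, so the `{"urls": [...]}` payload shape and the batch size of 100 are placeholders to replace with the values from the /v1/batch reference. `batch_payloads` and its `url_prefix` filter are hypothetical.

```python
from typing import Iterator, Optional

# ASSUMPTION: the /v1/batch body shape and per-request limit are not documented here;
# {"urls": [...]} and 100 are placeholders.
BATCH_SIZE = 100


def batch_payloads(
    urls: list[str],
    url_prefix: Optional[str] = None,
    batch_size: int = BATCH_SIZE,
) -> Iterator[dict]:
    """Yield /v1/batch request bodies from the URL list returned by /v1/map."""
    if batch_size < 1:
        raise ValueError("batch_size must be positive")
    # Optionally keep only part of the site, e.g. one docs section.
    if url_prefix is not None:
        urls = [u for u in urls if u.startswith(url_prefix)]
    for start in range(0, len(urls), batch_size):
        yield {"urls": urls[start:start + batch_size]}
```

Filtering before batching keeps requests small when only one section of a large sitemap is needed.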