webclaw

Vertical extractors

28 site-specific extractors that return typed JSON instead of generic markdown. Point them at a GitHub PR URL and get back {title, state, author, commits, reviews, ...}. Point them at an Amazon product URL and get back {title, price, rating, review_count, asin, ...}. No CSS selectors to maintain, no markdown to re-parse.

POST /v1/scrape/{vertical}

Run a specific vertical extractor by name. Returns typed JSON with fields specific to the target site.

GET /v1/extractors

Catalog of every vertical extractor and the URL patterns each one claims.

Run an extractor

Pick the extractor name from the catalog below, put it in the path, and send the target URL in the JSON body.

curl
curl -X POST https://api.webclaw.io/v1/scrape/reddit \
  -H "Authorization: Bearer $WEBCLAW_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://www.reddit.com/r/rust/comments/abc/your-favorite-crate/"
  }'

Response shape

json
{
  "vertical": "reddit",
  "url": "https://www.reddit.com/r/rust/comments/abc/your-favorite-crate/",
  "data": {
    "post": {
      "title": "Your favorite crate",
      "author": "example_user",
      "points": 342,
      "comment_count": 127,
      "created_utc": 1712345678,
      "body": "..."
    },
    "comments": [
      { "author": "...", "body": "...", "points": 52 }
    ]
  }
}

The data field is extractor-specific. Consult GET /v1/extractors at runtime to discover what's available. New extractors land without requiring an SDK bump.
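Because the data payload varies per extractor, a client can pair a typed envelope with an untyped payload. A minimal sketch (the ScrapeResult class and parse_scrape_response helper are hypothetical names, not part of any official SDK), using the reddit response shape shown above:

```python
import json
from dataclasses import dataclass
from typing import Any


@dataclass
class ScrapeResult:
    """Typed envelope around a /v1/scrape/{vertical} response.

    `data` stays an untyped dict because its fields differ per
    extractor; only the envelope (vertical, url) is stable."""
    vertical: str
    url: str
    data: dict[str, Any]


def parse_scrape_response(raw: str) -> ScrapeResult:
    body = json.loads(raw)
    return ScrapeResult(vertical=body["vertical"], url=body["url"], data=body["data"])


# Abbreviated version of the reddit example above:
sample = ('{"vertical": "reddit",'
          ' "url": "https://www.reddit.com/r/rust/comments/abc/your-favorite-crate/",'
          ' "data": {"post": {"title": "Your favorite crate", "points": 342},'
          ' "comments": []}}')

result = parse_scrape_response(sample)
print(result.vertical)                 # → reddit
print(result.data["post"]["points"])   # → 342
```

Extractor-specific fields are then reached through the dict, keeping the envelope stable as new extractors appear.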

Catalog

28 extractors. URL shape shown is a template, not a match pattern.

| Name | Label | URL shape |
| --- | --- | --- |
| reddit | Reddit thread | https://www.reddit.com/r/*/comments/* |
| hackernews | Hacker News story | https://news.ycombinator.com/item?id=N |
| github_repo | GitHub repository | https://github.com/{owner}/{repo} |
| github_pr | GitHub pull request | https://github.com/{owner}/{repo}/pull/{n} |
| github_issue | GitHub issue | https://github.com/{owner}/{repo}/issues/{n} |
| github_release | GitHub release | https://github.com/{owner}/{repo}/releases/tag/{tag} |
| pypi | PyPI package | https://pypi.org/project/{name}/ |
| npm | npm package | https://www.npmjs.com/package/{name} |
| crates_io | crates.io package | https://crates.io/crates/{name} |
| huggingface_model | HuggingFace model | https://huggingface.co/{owner}/{name} |
| huggingface_dataset | HuggingFace dataset | https://huggingface.co/datasets/{owner}/{name} |
| arxiv | ArXiv paper | https://arxiv.org/abs/{id} |
| docker_hub | Docker Hub repository | https://hub.docker.com/_/{name} |
| dev_to | dev.to article | https://dev.to/{user}/{slug} |
| stackoverflow | Stack Overflow Q&A | https://stackoverflow.com/questions/{id}/{slug} |
| substack_post | Substack post | https://{pub}.substack.com/p/{slug} |
| youtube_video | YouTube video | https://www.youtube.com/watch?v={id} |
| linkedin_post | LinkedIn post | https://www.linkedin.com/feed/update/urn:li:share:{id} |
| instagram_post | Instagram post | https://www.instagram.com/p/{id}/ |
| instagram_profile | Instagram profile | https://www.instagram.com/{user}/ |
| shopify_product | Shopify product | https://{shop}/products/{handle} |
| shopify_collection | Shopify collection | https://{shop}/collections/{handle} |
| ecommerce_product | Ecommerce product (generic) | https://{store}/products/{slug} |
| woocommerce_product | WooCommerce product | https://{shop}/product/{slug} |
| amazon_product | Amazon product | https://www.amazon.com/dp/{asin} |
| ebay_listing | eBay listing | https://www.ebay.com/itm/{id} |
| etsy_listing | Etsy listing | https://www.etsy.com/listing/{id} |
| trustpilot_reviews | Trustpilot reviews | https://www.trustpilot.com/review/{domain} |

Auto-dispatch from /v1/scrape

23 of 28 extractors have distinctive URL shapes, so a plain POST /v1/scrape automatically dispatches to the matching vertical and returns the typed payload. The remaining 5 (shopify_product, shopify_collection, ecommerce_product, woocommerce_product, substack_post) use patterns that non-target sites share, so they require the explicit /v1/scrape/{vertical} route.

Discover the catalog

curl
curl https://api.webclaw.io/v1/extractors \
  -H "Authorization: Bearer $WEBCLAW_API_KEY"

json
{
  "extractors": [
    {
      "name": "reddit",
      "label": "Reddit thread",
      "description": "...",
      "url_patterns": ["https://www.reddit.com/r/*/comments/*"]
    },
    ...
  ]
}
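Since a URL that does not match an extractor's claimed pattern returns a 400, the catalog can drive a cheap client-side check before POSTing. A sketch under the same glob-matching assumption as above (build_catalog_index and url_matches are hypothetical helpers, shown against the response shape documented here):

```python
from fnmatch import fnmatch


def build_catalog_index(catalog: dict) -> dict[str, list[str]]:
    """Index a GET /v1/extractors response by extractor name."""
    return {e["name"]: e["url_patterns"] for e in catalog["extractors"]}


def url_matches(index: dict[str, list[str]], vertical: str, url: str) -> bool:
    """Check a URL against an extractor's claimed patterns before
    calling /v1/scrape/{vertical}, avoiding a guaranteed 400.
    Treating patterns as globs is an assumption about their syntax."""
    return any(fnmatch(url, p) for p in index.get(vertical, []))


# Trimmed catalog with just the reddit entry from the example response:
catalog = {"extractors": [{"name": "reddit",
                           "label": "Reddit thread",
                           "url_patterns": ["https://www.reddit.com/r/*/comments/*"]}]}
index = build_catalog_index(catalog)
```

Fetching the catalog once at startup and validating locally keeps failed (but unbilled) 400 round-trips out of the hot path.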

Errors

| Status | Meaning |
| --- | --- |
| 400 | Unknown vertical name, or the URL does not match the extractor's claimed pattern. |
| 502 | Upstream fetch failed after antibot escalation. |
Tip
Each successful call bills 1 credit, the same as /v1/scrape. Failed calls are not billed.
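These two facts suggest a retry policy: a 400 is a caller bug and will fail identically on retry, while a 502 is a transient upstream failure and, since failed calls are unbilled, retrying costs nothing unless it succeeds. A minimal sketch (call_with_retry is a hypothetical helper; the transport is abstracted as a callable so any HTTP client fits):

```python
import time

RETRYABLE = {502}  # upstream fetch failed after antibot escalation


def call_with_retry(do_request, max_attempts: int = 3, base_delay: float = 1.0):
    """Retry a zero-arg callable returning (status, body).

    502s are retried with exponential backoff; everything else
    (including 400, which indicates a caller error) is returned
    immediately."""
    for attempt in range(max_attempts):
        status, body = do_request()
        if status not in RETRYABLE:
            return status, body
        if attempt < max_attempts - 1:
            time.sleep(base_delay * 2 ** attempt)
    return status, body
```

Bounding the attempts matters: each retry that finally succeeds is a billed call, so unbounded loops against a persistently blocked upstream only burn time, not credits.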