webclaw

Vertical extractors

28 site-specific extractors that return typed JSON instead of generic markdown. Point them at a GitHub PR URL and get back {title, state, author, commits, reviews, ...}. Point them at an Amazon product URL and get back {title, price, rating, review_count, asin, ...}. No CSS selectors to maintain, no markdown to re-parse.

POST /v1/scrape/{vertical}

Run a specific vertical extractor by name. Returns typed JSON with fields specific to the target site.

GET /v1/extractors

Catalog of every vertical extractor and the URL patterns each one claims.

Run an extractor

Pick the extractor name from the catalog below, put it in the path, and send the target URL in the JSON body.

curl
curl -X POST https://api.webclaw.io/v1/scrape/reddit \
  -H "Authorization: Bearer $WEBCLAW_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://www.reddit.com/r/rust/comments/abc/your-favorite-crate/"
  }'

Response shape

json
{
  "vertical": "reddit",
  "url": "https://www.reddit.com/r/rust/comments/abc/your-favorite-crate/",
  "data": {
    "post": {
      "title": "Your favorite crate",
      "author": "example_user",
      "points": 342,
      "comment_count": 127,
      "created_utc": 1712345678,
      "body": "..."
    },
    "comments": [
      { "author": "...", "body": "...", "points": 52 }
    ]
  }
}

The data field is extractor-specific. Consult GET /v1/extractors at runtime to discover what's available. New extractors land without requiring an SDK bump.
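Because the data payload varies per extractor, a client can pair a typed envelope with an untyped payload. A minimal sketch (the ScrapeResult class and parse_scrape_response helper are hypothetical names, not part of any official SDK), using the reddit response shape shown above:

```python
import json
from dataclasses import dataclass
from typing import Any


@dataclass
class ScrapeResult:
    """Typed envelope around a /v1/scrape/{vertical} response.

    `data` stays an untyped dict because its fields differ per
    extractor; only the envelope (vertical, url) is stable."""
    vertical: str
    url: str
    data: dict[str, Any]


def parse_scrape_response(raw: str) -> ScrapeResult:
    body = json.loads(raw)
    return ScrapeResult(vertical=body["vertical"], url=body["url"], data=body["data"])


# Abbreviated version of the reddit example above:
sample = ('{"vertical": "reddit",'
          ' "url": "https://www.reddit.com/r/rust/comments/abc/your-favorite-crate/",'
          ' "data": {"post": {"title": "Your favorite crate", "points": 342},'
          ' "comments": []}}')

result = parse_scrape_response(sample)
print(result.vertical)                 # → reddit
print(result.data["post"]["points"])   # → 342
```

Extractor-specific fields are then reached through the dict, keeping the envelope stable as new extractors appear.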

Catalog

28 extractors. URL shape shown is a template, not a match pattern.

| Name | Label | URL shape |
| --- | --- | --- |
| reddit | Reddit thread | https://www.reddit.com/r/*/comments/* |
| hackernews | Hacker News story | https://news.ycombinator.com/item?id=N |
| github_repo | GitHub repository | https://github.com/{owner}/{repo} |
| github_pr | GitHub pull request | https://github.com/{owner}/{repo}/pull/{n} |
| github_issue | GitHub issue | https://github.com/{owner}/{repo}/issues/{n} |
| github_release | GitHub release | https://github.com/{owner}/{repo}/releases/tag/{tag} |
| pypi | PyPI package | https://pypi.org/project/{name}/ |
| npm | npm package | https://www.npmjs.com/package/{name} |
| crates_io | crates.io package | https://crates.io/crates/{name} |
| huggingface_model | HuggingFace model | https://huggingface.co/{owner}/{name} |
| huggingface_dataset | HuggingFace dataset | https://huggingface.co/datasets/{owner}/{name} |
| arxiv | ArXiv paper | https://arxiv.org/abs/{id} |
| docker_hub | Docker Hub repository | https://hub.docker.com/_/{name} |
| dev_to | dev.to article | https://dev.to/{user}/{slug} |
| stackoverflow | Stack Overflow Q&A | https://stackoverflow.com/questions/{id}/{slug} |
| substack_post | Substack post | https://{pub}.substack.com/p/{slug} |
| youtube_video | YouTube video | https://www.youtube.com/watch?v={id} |
| linkedin_post | LinkedIn post | https://www.linkedin.com/feed/update/urn:li:share:{id} |
| instagram_post | Instagram post | https://www.instagram.com/p/{id}/ |
| instagram_profile | Instagram profile | https://www.instagram.com/{user}/ |
| shopify_product | Shopify product | https://{shop}/products/{handle} |
| shopify_collection | Shopify collection | https://{shop}/collections/{handle} |
| ecommerce_product | Ecommerce product (generic) | https://{store}/products/{slug} |
| woocommerce_product | WooCommerce product | https://{shop}/product/{slug} |
| amazon_product | Amazon product | https://www.amazon.com/dp/{asin} |
| ebay_listing | eBay listing | https://www.ebay.com/itm/{id} |
| etsy_listing | Etsy listing | https://www.etsy.com/listing/{id} |
| trustpilot_reviews | Trustpilot reviews | https://www.trustpilot.com/review/{domain} |

Auto-dispatch from /v1/scrape

23 of 28 extractors have distinctive URL shapes, so a plain POST /v1/scrape automatically dispatches to the matching vertical and returns the typed payload. The remaining 5 (shopify_product, shopify_collection, ecommerce_product, woocommerce_product, substack_post) use patterns that non-target sites share, so they require the explicit /v1/scrape/{vertical} route.

Discover the catalog

curl
curl https://api.webclaw.io/v1/extractors \
  -H "Authorization: Bearer $WEBCLAW_API_KEY"

json
{
  "extractors": [
    {
      "name": "reddit",
      "label": "Reddit thread",
      "description": "...",
      "url_patterns": ["https://www.reddit.com/r/*/comments/*"]
    },
    ...
  ]
}
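Since a URL that does not match an extractor's claimed pattern returns a 400, the catalog can drive a cheap client-side check before POSTing. A sketch under the same glob-matching assumption as above (build_catalog_index and url_matches are hypothetical helpers, shown against the response shape documented here):

```python
from fnmatch import fnmatch


def build_catalog_index(catalog: dict) -> dict[str, list[str]]:
    """Index a GET /v1/extractors response by extractor name."""
    return {e["name"]: e["url_patterns"] for e in catalog["extractors"]}


def url_matches(index: dict[str, list[str]], vertical: str, url: str) -> bool:
    """Check a URL against an extractor's claimed patterns before
    calling /v1/scrape/{vertical}, avoiding a guaranteed 400.
    Treating patterns as globs is an assumption about their syntax."""
    return any(fnmatch(url, p) for p in index.get(vertical, []))


# Trimmed catalog with just the reddit entry from the example response:
catalog = {"extractors": [{"name": "reddit",
                           "label": "Reddit thread",
                           "url_patterns": ["https://www.reddit.com/r/*/comments/*"]}]}
index = build_catalog_index(catalog)
```

Fetching the catalog once at startup and validating locally keeps failed (but unbilled) 400 round-trips out of the hot path.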

Errors

| Status | Meaning |
| --- | --- |
| 400 | Unknown vertical name, or the URL does not match the extractor's claimed pattern. |
| 502 | Upstream fetch failed after antibot escalation. |
Tip
Each successful call bills 1 credit, the same as /v1/scrape. Failed calls are not billed.
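These two facts suggest a retry policy: a 400 is a caller bug and will fail identically on retry, while a 502 is a transient upstream failure and, since failed calls are unbilled, retrying costs nothing unless it succeeds. A minimal sketch (call_with_retry is a hypothetical helper; the transport is abstracted as a callable so any HTTP client fits):

```python
import time

RETRYABLE = {502}  # upstream fetch failed after antibot escalation


def call_with_retry(do_request, max_attempts: int = 3, base_delay: float = 1.0):
    """Retry a zero-arg callable returning (status, body).

    502s are retried with exponential backoff; everything else
    (including 400, which indicates a caller error) is returned
    immediately."""
    for attempt in range(max_attempts):
        status, body = do_request()
        if status not in RETRYABLE:
            return status, body
        if attempt < max_attempts - 1:
            time.sleep(base_delay * 2 ** attempt)
    return status, body
```

Bounding the attempts matters: each retry that finally succeeds is a billed call, so unbounded loops against a persistently blocked upstream only burn time, not credits.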