Vertical extractors
28 site-specific extractors that return typed JSON instead of generic markdown. Point them at a GitHub PR URL and get back {title, state, author, commits, reviews, ...}. Point them at an Amazon product URL and get back {title, price, rating, review_count, asin, ...}. No CSS selectors to maintain, no markdown to re-parse.
/v1/scrape/{vertical}Run a specific vertical extractor by name. Returns typed JSON with fields specific to the target site.
/v1/extractorsCatalog of every vertical extractor and the URL patterns each one claims.
Run an extractor
Pick the extractor name from the catalog below, put it in the path, and send the target URL in the JSON body.
Response shape
Thedatafield is extractor-specific. Consult GET /v1/extractors at runtime to discover what's available. New extractors land without requiring an SDK bump.
Catalog
28 extractors. URL shape shown is a template, not a match pattern.
| Name | Label | URL shape |
|---|---|---|
reddit | Reddit thread | https://www.reddit.com/r/*/comments/* |
hackernews | Hacker News story | https://news.ycombinator.com/item?id=N |
github_repo | GitHub repository | https://github.com/{owner}/{repo} |
github_pr | GitHub pull request | https://github.com/{owner}/{repo}/pull/{n} |
github_issue | GitHub issue | https://github.com/{owner}/{repo}/issues/{n} |
github_release | GitHub release | https://github.com/{owner}/{repo}/releases/tag/{tag} |
pypi | PyPI package | https://pypi.org/project/{name}/ |
npm | npm package | https://www.npmjs.com/package/{name} |
crates_io | crates.io package | https://crates.io/crates/{name} |
huggingface_model | HuggingFace model | https://huggingface.co/{owner}/{name} |
huggingface_dataset | HuggingFace dataset | https://huggingface.co/datasets/{owner}/{name} |
arxiv | ArXiv paper | https://arxiv.org/abs/{id} |
docker_hub | Docker Hub repository | https://hub.docker.com/_/{name} |
dev_to | dev.to article | https://dev.to/{user}/{slug} |
stackoverflow | Stack Overflow Q&A | https://stackoverflow.com/questions/{id}/{slug} |
substack_post | Substack post | https://{pub}.substack.com/p/{slug} |
youtube_video | YouTube video | https://www.youtube.com/watch?v={id} |
linkedin_post | LinkedIn post | https://www.linkedin.com/feed/update/urn:li:share:{id} |
instagram_post | Instagram post | https://www.instagram.com/p/{id}/ |
instagram_profile | Instagram profile | https://www.instagram.com/{user}/ |
shopify_product | Shopify product | https://{shop}/products/{handle} |
shopify_collection | Shopify collection | https://{shop}/collections/{handle} |
ecommerce_product | Ecommerce product (generic) | https://{store}/products/{slug} |
woocommerce_product | WooCommerce product | https://{shop}/product/{slug} |
amazon_product | Amazon product | https://www.amazon.com/dp/{asin} |
ebay_listing | eBay listing | https://www.ebay.com/itm/{id} |
etsy_listing | Etsy listing | https://www.etsy.com/listing/{id} |
trustpilot_reviews | Trustpilot reviews | https://www.trustpilot.com/review/{domain} |
Auto-dispatch from /v1/scrape
23 of 28 extractors have distinctive URL shapes, so a plain POST /v1/scrape automatically dispatches to the matching vertical and returns the typed payload. The remaining 5 (shopify_product, shopify_collection, ecommerce_product, woocommerce_product, substack_post) use patterns that non-target sites share, so they require the explicit /v1/scrape/{vertical} route.
Discover the catalog
Errors
| Status | Meaning |
|---|---|
400 | Unknown vertical name, or the URL does not match the extractor's claimed pattern. |
502 | Upstream fetch failed after antibot escalation. |
/v1/scrape. Failed calls are not billed.