POST /v1/scrape

YouTube Transcript and Metadata Extraction API

Pass a YouTube watch URL to /v1/scrape and get the full transcript plus metadata back.

Send any YouTube watch or shorts URL to /v1/scrape and get the full transcript plus title, channel, duration, and view count as structured JSON. Built for summarization, RAG, and content analysis.

Try it live

View API docs

How it works

Build it step by step.

The real flow, one step at a time.

1
Send the watch URL
POST a YouTube watch, shorts, or youtu.be URL to /v1/scrape with your API key. No special parameters are required.
```
const result = await webclaw.scrape({
  url: "https://www.youtube.com/watch?v=dQw4w9WgXcQ",
});
```
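If you are not using an SDK, the same step can be sketched as a plain HTTP request. This is only an illustration: the base URL, the Bearer auth header, and the helper name `buildScrapeRequest` are assumptions rather than documented webclaw details, so check the API docs for the real values. The only parts taken from this page are the POST method, the /v1/scrape path, and the `url` field.

```typescript
// Hypothetical base URL: replace with the one from the webclaw API docs.
const BASE_URL = "https://api.webclaw.example";

interface ScrapeRequest {
  endpoint: string;
  init: { method: string; headers: Record<string, string>; body: string };
}

// Build the POST /v1/scrape request without sending it,
// so it can be inspected or unit-tested offline.
function buildScrapeRequest(apiKey: string, videoUrl: string): ScrapeRequest {
  return {
    endpoint: `${BASE_URL}/v1/scrape`,
    init: {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        Authorization: `Bearer ${apiKey}`, // assumed auth scheme
      },
      body: JSON.stringify({ url: videoUrl }), // the only field this page mentions
    },
  };
}

const scrapeRequest = buildScrapeRequest(
  "YOUR_API_KEY",
  "https://www.youtube.com/watch?v=dQw4w9WgXcQ",
);
console.log(scrapeRequest.endpoint, scrapeRequest.init.body);
// To send it: const res = await fetch(scrapeRequest.endpoint, scrapeRequest.init);
```

Keeping request construction separate from the network call makes it easy to swap in the real base URL and header once you have them.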
2
webclaw routes it
The URL is recognized as YouTube and routed to transcript extraction instead of standard HTML scraping.
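To make the routing step concrete, here is a rough sketch of how a client could recognize the three URL shapes itself, for example to pre-filter a batch before calling the API. It is not webclaw's actual detection logic, which this page does not describe, and the function name `extractYouTubeVideoId` is made up for illustration.

```typescript
// Illustrative only: webclaw's real routing rules are not documented here.
// Returns the video id for watch, shorts, and youtu.be URLs, otherwise null.
function extractYouTubeVideoId(input: string): string | null {
  let parsed: URL;
  try {
    parsed = new URL(input);
  } catch {
    return null; // not a valid absolute URL
  }

  // Treat www. and m. hosts the same as the bare domain.
  const host = parsed.hostname.replace(/^(www\.|m\.)/, "");
  const segments = parsed.pathname.split("/").filter(Boolean);

  if (host === "youtu.be") {
    return segments[0] ?? null; // https://youtu.be/<id>
  }
  if (host === "youtube.com") {
    if (parsed.pathname === "/watch") {
      return parsed.searchParams.get("v"); // https://www.youtube.com/watch?v=<id>
    }
    if (segments[0] === "shorts") {
      return segments[1] ?? null; // https://www.youtube.com/shorts/<id>
    }
  }
  return null;
}

console.log(extractYouTubeVideoId("https://www.youtube.com/watch?v=dQw4w9WgXcQ"));
```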

3
Read transcript and metadata

The response carries a top-level transcript string plus a youtube block with title, channel, duration, and view count.

```
const { youtube, transcript } = result;
console.log(youtube.title);     // video title
console.log(youtube.channel);   // channel name
console.log(youtube.viewCount); // view count
console.log(transcript);        // full caption text
```
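If you consume the raw JSON rather than an SDK object, a small runtime check can guard against shape surprises before you read these fields. The sketch below covers only the fields used in the snippet above (`transcript`, `youtube.title`, `youtube.channel`, `youtube.viewCount`). The field types, especially a numeric `viewCount`, and the names `ScrapeResult` and `isScrapeResult` are assumptions, and the sample payload is a placeholder rather than real API output.

```typescript
interface YouTubeBlock {
  title: string;
  channel: string;
  viewCount: number; // assumed to be numeric
}

interface ScrapeResult {
  transcript: string;
  youtube: YouTubeBlock;
}

function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null;
}

// Type guard: true only when the fields read in the snippet above are present.
function isScrapeResult(value: unknown): value is ScrapeResult {
  if (!isRecord(value) || typeof value["transcript"] !== "string") return false;
  const yt = value["youtube"];
  return (
    isRecord(yt) &&
    typeof yt["title"] === "string" &&
    typeof yt["channel"] === "string" &&
    typeof yt["viewCount"] === "number"
  );
}

// Placeholder payload for illustration, not real API output.
const payload: unknown = {
  transcript: "placeholder caption text",
  youtube: { title: "Placeholder title", channel: "Placeholder channel", viewCount: 0 },
};

if (isScrapeResult(payload)) {
  console.log(`${payload.youtube.title} by ${payload.youtube.channel}`);
}
```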

4
Feed it downstream

Call summarize on the same URL, embed the transcript for RAG, or store the metadata. The cached result replays for free while you iterate.

```
// Summarize the transcript with one more call
const summary = await webclaw.summarize({
  url: "https://www.youtube.com/watch?v=dQw4w9WgXcQ",
  focus: "key takeaways",
  length: "short",
});
console.log(summary);
```
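For the RAG path, the transcript usually needs to be split into pieces before embedding. The helper below is one possible approach, not part of webclaw: `chunkTranscript` is a hypothetical name, and it cuts on character counts with a small overlap, so chunks may break mid-word. Pick sizes that suit your embedding model.

```typescript
// Split a transcript into overlapping, fixed-size character windows.
// Hypothetical helper for illustration; not part of any webclaw SDK.
function chunkTranscript(transcript: string, maxChars = 1000, overlap = 100): string[] {
  if (maxChars <= 0 || overlap < 0 || overlap >= maxChars) {
    throw new RangeError("need maxChars > 0 and 0 <= overlap < maxChars");
  }
  const text = transcript.trim();
  const chunks: string[] = [];
  const step = maxChars - overlap; // how far each window advances

  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + maxChars));
    if (start + maxChars >= text.length) break; // this window reached the end
  }
  return chunks;
}

console.log(chunkTranscript("abcdefghij", 4, 1)); // [ 'abcd', 'defg', 'ghij' ]
```

Each chunk can then be embedded and stored alongside the video's metadata so search hits can link back to the source video.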

Why webclaw

Built for YouTube transcripts.

Watch, shorts, and youtu.be URLs auto-route to transcript extraction, no flag needed

One call returns transcript plus title, channel, duration, and view count

Transcript comes back as a clean string, ready to summarize or embed

Same response shape on the TypeScript, Python, and Go SDKs

Cached replay and dashboard history with request, response, timing, and cost

What you get

Everything this use case needs.

Transcript extraction from caption tracks
Watch, shorts, and youtu.be URL support
Structured youtube metadata block
Title, channel, duration, and view count
Cached replay and run history

Where it fits

Built for the messy parts.

Getting a clean transcript out of YouTube is harder than it looks. The captions live behind a player API that throttles and blocks at scale, the on-page text is buried in player state, and stitching transcript text together with the video's title, channel, and stats means juggling two or three separate scraping jobs per video.

webclaw detects YouTube watch, shorts, and youtu.be URLs automatically. Pass one to /v1/scrape and you get the full caption transcript as a single string, alongside a structured youtube block with video id, title, channel, upload date, duration, and view count. One request, one response shape, ready to summarize, embed, or analyze.

Common questions

Frequently asked questions

How do I get a YouTube transcript with webclaw?

Pass a YouTube watch or shorts URL to /v1/scrape. webclaw detects it automatically and returns the full caption transcript as a top-level transcript string, plus a youtube block with the video's metadata. No extra flags or parsing on your side.

What metadata comes back with the transcript?

The youtube block includes the video id, title, channel name, channel URL, upload date, duration, and view count. The transcript itself is returned as a separate string so you can feed it straight into an LLM.

What if a video has no captions?

If a video has no available caption track, the transcript field comes back empty but the youtube metadata block (title, channel, stats) is still returned, so the call stays useful for indexing and discovery.
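In practice this means branching on the transcript before sending anything to an LLM. Here is a minimal sketch, assuming "empty" means an empty string (it also tolerates null in case the field is absent). The types and the `planIndexing` name are illustrative, not part of the webclaw SDK.

```typescript
interface VideoResult {
  transcript: string | null; // assumed: empty string (or null) when no captions exist
  youtube: { title: string; channel: string };
}

type IndexPlan =
  | { kind: "full"; title: string; text: string }
  | { kind: "metadata-only"; title: string };

// Decide whether a result carries caption text or only metadata.
function planIndexing(result: VideoResult): IndexPlan {
  const text = (result.transcript ?? "").trim();
  if (text.length === 0) {
    // No caption track: still index the video by its metadata.
    return { kind: "metadata-only", title: result.youtube.title };
  }
  return { kind: "full", title: result.youtube.title, text };
}

// Placeholder input for illustration, not real API output.
const plan = planIndexing({
  transcript: "",
  youtube: { title: "Placeholder title", channel: "Placeholder channel" },
});
console.log(plan.kind); // "metadata-only"
```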

For AI agents

Or hand it to your agent.

Add the webclaw MCP server to Claude, Cursor, or any MCP client, then paste this prompt. The agent calls the webclaw tools and hands the result back to your model, with no code to write.

PROMPT FOR YOUR AGENT

Using the webclaw tools, call scrape on [the YouTube video URL] to pull its full transcript and metadata. The scrape tool detects watch, shorts, and youtu.be links automatically and returns the caption text as one clean string alongside the video's title, channel, upload date, duration, and view count. If you want a tighter writeup instead of the raw transcript, follow up with the summarize tool on the same URL. Return a short header block with the title, channel, duration, and view count, then the full transcript text below it (or a 5-bullet summary of the key points if I ask for one). If the video has no captions, tell me the transcript is empty but still report the metadata you got back.

Set up the MCP server