Diffs clean markdown, not raw HTML, so layout and ad noise do not trigger false alerts
POST /v1/diff
Content change monitoring for terms, pricing, and policy pages
Snapshot any page, diff it on a schedule, and get alerted only when the content actually changes.
Track when a terms of service, pricing, policy, or docs page changes. webclaw snapshots the page as clean markdown, diffs it against the last version, and tells you exactly what changed.
Build it step by step.
The real flow, one step at a time.
- 1
Scrape a baseline
Call /v1/scrape with formats: ["markdown"] to capture the page as clean text, then store it as your first snapshot.
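The snippets in this flow call saveSnapshot and loadLastSnapshot, which you supply yourself. Here is one minimal sketch. It assumes an in-memory Map as a stand-in for real storage (a database, object store, or files) and keeps every version per URL, so later comparisons and replays have history to work with. Both helpers are hypothetical and not part of webclaw.

```typescript
// Hypothetical storage helpers: an in-memory Map stands in for a real
// database or object store. Swap it for persistent storage in production.
interface Snapshot {
  takenAt: Date;
  markdown: string;
}

const history = new Map<string, Snapshot[]>();

// Append a new snapshot for this URL, keeping every earlier version.
async function saveSnapshot(url: string, markdown: string): Promise<void> {
  const versions = history.get(url) ?? [];
  versions.push({ takenAt: new Date(), markdown });
  history.set(url, versions);
}

// Return the most recent snapshot's markdown, or undefined on the first run.
async function loadLastSnapshot(url: string): Promise<string | undefined> {
  const versions = history.get(url);
  if (!versions || versions.length === 0) return undefined;
  return versions[versions.length - 1].markdown;
}
```

Because the Map lives in process memory, it is lost on restart. A scheduled job that runs as a fresh process each time needs durable storage behind the same two functions.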
// Assumes `webclaw` is an initialized webclaw client and saveSnapshot is your own storage helper
const url = "https://example.com/terms";
// Capture the page as clean markdown
const baseline = await webclaw.scrape({ url, formats: ["markdown"] });
// Store it as your first snapshot
await saveSnapshot(url, baseline.markdown);
- 2
Diff on a schedule
On each run, scrape the page again and pass the new version plus the stored snapshot to /v1/diff to get added and removed content.
// On each run, scrape the page again
const current = await webclaw.scrape({ url, formats: ["markdown"] });
// Compare it against the last stored snapshot
const diff = await webclaw.diff({
  url,
  previous: await loadLastSnapshot(url),
  current: current.markdown,
});
- 3
Alert on real changes
When the diff reports a change (diff.changed is true), fire a webhook to Slack or Discord and save the new version as the next baseline.
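The step 3 snippet calls a notify helper that it does not define. Here is a minimal sketch. It assumes diff.changes has the shape { added: string[]; removed: string[] }, which is inferred from the "added / removed lines" comment and not confirmed. It also assumes you keep a Slack incoming-webhook or Discord webhook URL in a hypothetical ALERT_WEBHOOK_URL environment variable. Slack incoming webhooks read a `text` field, and Discord webhooks read `content`.

```typescript
// Assumed shape of diff.changes; check the real /v1/diff response first.
interface Changes {
  added: string[];
  removed: string[];
}

type Target = "slack" | "discord";

// Render the change as plain text: "-" for removed lines, "+" for added ones.
function formatAlert(url: string, changes: Changes): string {
  return [
    `Content changed: ${url}`,
    ...changes.removed.map((line) => `- ${line}`),
    ...changes.added.map((line) => `+ ${line}`),
  ].join("\n");
}

// Slack incoming webhooks read `text`; Discord webhooks read `content`.
function buildPayload(target: Target, message: string): Record<string, string> {
  return target === "slack" ? { text: message } : { content: message };
}

// Hypothetical notify(): POST the formatted alert to your webhook URL.
async function notify(
  url: string,
  changes: Changes,
  webhookUrl: string = process.env.ALERT_WEBHOOK_URL ?? "",
  target: Target = "slack",
): Promise<void> {
  if (!webhookUrl) throw new Error("No webhook URL configured");
  const res = await fetch(webhookUrl, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(buildPayload(target, formatAlert(url, changes))),
  });
  if (!res.ok) throw new Error(`Webhook failed with HTTP ${res.status}`);
}
```

For long diffs, truncate the message before sending, because Discord caps message content at 2,000 characters.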
if (diff.changed) {
  // Fire a webhook to Slack or Discord
  await notify(url, diff.changes); // added / removed lines
  // Persist the new version as the next baseline
  await saveSnapshot(url, current.markdown);
}
- 4
Replay from history
Use the dashboard or cached replay to inspect any past snapshot and confirm exactly what a page said at a given time.
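The dashboard and cached replay handle this inside webclaw. If you also keep timestamped snapshots in your own storage, finding what a page said at a given time becomes a "latest snapshot taken at or before t" lookup. Here is a sketch that uses binary search over a history sorted oldest-first. The Snapshot shape and the snapshotAt name are hypothetical.

```typescript
interface Snapshot {
  takenAt: Date;
  markdown: string;
}

// Return the snapshot that was current at `when`: the latest one taken at or
// before that moment, or undefined if `when` predates all snapshots.
// Assumes `history` is sorted by takenAt, oldest first.
function snapshotAt(history: Snapshot[], when: Date): Snapshot | undefined {
  let lo = 0;
  let hi = history.length - 1;
  let found: Snapshot | undefined;
  while (lo <= hi) {
    const mid = (lo + hi) >> 1;
    if (history[mid].takenAt.getTime() <= when.getTime()) {
      found = history[mid]; // candidate; keep looking for a later one
      lo = mid + 1;
    } else {
      hi = mid - 1;
    }
  }
  return found;
}
```

Binary search keeps each lookup logarithmic, which matters only once a page has accumulated a long history. A linear scan is fine for a few dozen versions.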
Built for content change monitoring.
- Automatic bot-protection handling for pages other scrapers cannot reach
- Snapshot-and-replay model: store each version, then compare any two later
- 118ms on static pages, so watching hundreds of pages stays affordable
- Dashboard history that records every request, response, timing, and cost
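Watching hundreds of pages means running many checks in each cycle. One common approach is to cap how many scrape-and-diff calls run in parallel, so you stay within whatever request limits apply to your account. This is sketched here as an assumption, not something webclaw prescribes. The mapWithConcurrency helper and the limit used in the usage note are illustrative.

```typescript
// Run `task` over every item with at most `limit` calls in flight at once.
// Results come back in the same order as `items`.
async function mapWithConcurrency<T, R>(
  items: T[],
  limit: number,
  task: (item: T) => Promise<R>,
): Promise<R[]> {
  const results: R[] = new Array(items.length);
  let next = 0;
  // Each worker claims the next unprocessed index until none remain.
  // `next++` is safe because JavaScript runs this code on a single thread.
  async function worker(): Promise<void> {
    while (next < items.length) {
      const index = next++;
      results[index] = await task(items[index]);
    }
  }
  const workerCount = Math.max(1, Math.min(limit, items.length));
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return results;
}
```

Usage, with a hypothetical checkPage function that wraps the scrape, diff, and alert steps for one URL: `await mapWithConcurrency(urls, 10, checkPage);`. If one task rejects, Promise.all rejects too, so wrap checkPage in its own try/catch when a single failing page should not abort the whole run.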
Everything this use case needs.
- Markdown snapshots with boilerplate stripped
- Structured diff of added and removed content
- Scheduled re-checks with webhook alerts
- Cached replay for fast debugging
- Bot-protection handling on gated pages
Built for the messy parts.
Terms of service, privacy policies, pricing, SLAs, and supplier docs change silently. Watching them by hand does not scale past a handful of pages. Naive HTML scrapers are no better: they flag every layout tweak, ad rotation, or session token as a change, so you drown in false positives and miss the edits that matter.
webclaw scrapes each page to clean markdown with navigation, ads, and boilerplate stripped, stores it as a snapshot, then uses /v1/diff to compare the current version against the previous one. You get a structured diff of the meaningful text only, so you can alert on a clause being added to a contract or a price tier changing, not on cosmetic noise.
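/v1/diff performs the comparison server-side. To make "added and removed content" concrete, or to sanity-check a result offline, here is a local line diff built on the longest common subsequence (LCS). Lines present in both snapshots are skipped, and the rest are reported as added or removed. This is an illustration with a hypothetical name, diffLines. It is not webclaw's diff algorithm.

```typescript
// Illustration only: a local LCS line diff, not webclaw's algorithm.
interface LineDiff {
  added: string[];
  removed: string[];
}

function diffLines(previous: string, current: string): LineDiff {
  const a = previous.split("\n");
  const b = current.split("\n");
  // lcs[i][j] = length of the longest common subsequence of a[i..] and b[j..]
  const lcs: number[][] = Array.from({ length: a.length + 1 }, () =>
    new Array<number>(b.length + 1).fill(0),
  );
  for (let i = a.length - 1; i >= 0; i--) {
    for (let j = b.length - 1; j >= 0; j--) {
      lcs[i][j] = a[i] === b[j] ? lcs[i + 1][j + 1] + 1 : Math.max(lcs[i + 1][j], lcs[i][j + 1]);
    }
  }
  // Walk the table: shared lines are skipped, the rest are removed or added.
  const added: string[] = [];
  const removed: string[] = [];
  let i = 0;
  let j = 0;
  while (i < a.length && j < b.length) {
    if (a[i] === b[j]) {
      i++;
      j++;
    } else if (lcs[i + 1][j] >= lcs[i][j + 1]) {
      removed.push(a[i++]);
    } else {
      added.push(b[j++]);
    }
  }
  removed.push(...a.slice(i)); // leftover old lines were removed
  added.push(...b.slice(j)); // leftover new lines were added
  return { added, removed };
}
```

The LCS table costs O(n·m) time and memory in the number of lines. That is fine for policy-length pages, but for very long documents a Myers-style diff library is the better choice.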
Frequently asked questions
How is this different from price monitoring?
Price monitoring tracks numeric fields like price and stock on product pages. Content change monitoring watches the full text of a page (terms of service, privacy policies, pricing tables, SLAs, or docs) and tells you which sentences or clauses were added or removed.
How does webclaw avoid false positives from layout changes?
webclaw diffs clean markdown, not raw HTML. Navigation, ads, scripts, and boilerplate are stripped before the comparison, so a redesign or an ad rotation does not register as a change. Only the meaningful body text is compared.
Can I run checks on a schedule and get alerted?
Yes. On a cron schedule or in a job runner, scrape each page, call /v1/diff against your stored snapshot, and fire a webhook when diff.changed is true. webclaw signs webhook payloads with HMAC and can format them for Slack or Discord.
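The answer above says webclaw signs webhook payloads with HMAC but does not name the hash, the encoding, or the header that carries the signature. Here is a receiver-side check, sketched under these assumptions: HMAC-SHA256 over the raw request body, a hex-encoded signature, and a shared secret. Confirm all three against webclaw's webhook documentation. verifySignature is a hypothetical name.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Assumptions: HMAC-SHA256 over the raw body, hex-encoded signature.
// Verify these details in the webclaw docs before relying on this check.
function verifySignature(rawBody: string, signatureHex: string, secret: string): boolean {
  const expected = createHmac("sha256", secret).update(rawBody).digest();
  const received = Buffer.from(signatureHex, "hex");
  // timingSafeEqual throws when lengths differ, so reject mismatches first.
  if (received.length !== expected.length) return false;
  // Constant-time comparison avoids leaking how many bytes matched.
  return timingSafeEqual(received, expected);
}
```

Verify against the raw body exactly as received, before parsing it as JSON. Re-serializing a parsed object can change whitespace or key order and break the match.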
Or hand it to your agent.
Add the webclaw MCP server to Claude, Cursor, or any MCP client, then paste this prompt. The agent calls the webclaw tools and hands the result back to your model, so there is no code to write.
Using the webclaw tools, monitor [the page URL] for meaningful content changes (for example a terms-of-service, pricing, privacy policy, SLA, or docs page). First call the scrape tool on that URL to capture a clean markdown snapshot with navigation, ads, and boilerplate stripped out, and treat that as the current version. Then call the diff tool to compare it against [paste the previous snapshot here, or say "this is the first run" if you have none], so cosmetic noise like layout tweaks or session tokens is ignored and only real text edits surface. If nothing changed, just tell me "no changes." If something changed, return a short alert that names the page, lists the exact clauses or lines that were added and removed, and gives a one-sentence plain-English summary of what it means; then print the full new markdown snapshot so I can save it for the next comparison.