Changelog

Every release. Every commit.

Webclaw ships in the open. Releases, features, fixes, and performance work — straight from the repository, as they land.

Releases

  1. v0.6.9 · Latest · Jun 10, 2026

    v0.6.9 — Maintenance & updates

    • release v0.6.9 (fix multi-arch Docker publish)
    Release notes
  2. v0.6.8 · Jun 10, 2026

    v0.6.8 — Fixes & improvements

    • add ColdProxy proxy-backed crawling walkthrough
    • harden LLM providers, UTF-8 handling, and webhook/batch reliability
    Release notes
  3. v0.6.7 · Jun 9, 2026

    v0.6.7 — Maintenance & updates

    • Bump wreq 6.0.0-rc.29 / wreq-util 3.0.0-rc.12 (fingerprint-neutral)
    Release notes
  4. v0.6.6 · Jun 9, 2026

    v0.6.6 — Maintenance & updates

    • Salvage progress line + --url-encoded from #49
    Release notes
  5. v0.6.5 · Jun 4, 2026

    v0.6.5 — New features

    • URL truncation warning + --url-encoded flag
    • periodic progress stderr line on slow fetches
    • pin wreq/wreq-util to exact rc versions
    • describe in-process wreq TLS, drop stale patched-deps
    • use Option::zip to satisfy clippy
    • v0.6.5
    • parse old.reddit.com HTML instead of the dead .json API
    • Add sponsor preview placements
    Release notes
  6. v0.6.4 · May 21, 2026

    v0.6.4 — New features

    • endpoints module for API surface extraction from HTML and JS
    Release notes
  7. v0.6.3 · May 19, 2026

    v0.6.3 — Fixes & improvements

    • harden core (WASM-safe gating, SSRF, path-traversal, recursion caps)
    Release notes
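The SSRF part of this hardening can be pictured as an address check that runs before any request is sent. The following is only a sketch under assumptions: `is_forbidden_target` is a hypothetical helper, not webclaw's actual API, and the set of blocked ranges is illustrative rather than complete.

```rust
use std::net::IpAddr;

/// Hypothetical helper: returns true when a resolved address points at
/// loopback, private, link-local, or unspecified address space.
fn is_forbidden_target(ip: IpAddr) -> bool {
    match ip {
        IpAddr::V4(v4) => {
            v4.is_loopback() || v4.is_private() || v4.is_link_local() || v4.is_unspecified()
        }
        // Only the two checks below are covered for IPv6 in this sketch.
        IpAddr::V6(v6) => v6.is_loopback() || v6.is_unspecified(),
    }
}

fn main() {
    for raw in ["127.0.0.1", "10.0.0.5", "169.254.169.254", "93.184.216.34", "::1"] {
        let ip: IpAddr = raw.parse().expect("valid IP literal");
        println!("{raw}: forbidden = {}", is_forbidden_target(ip));
    }
}
```

A check like this is usually applied to the address a hostname resolves to, not to the hostname string, so that DNS names pointing at internal ranges are caught as well.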
  8. v0.6.2 · May 18, 2026

    v0.6.2 — Fixes & improvements

    • add community plugins section
    • clean llm output noise
    Release notes
  9. v0.6.1 · May 12, 2026

    v0.6.1 — Fixes & improvements

    • prepare 0.6.1 hardening
    Release notes
  10. v0.6.0 · May 10, 2026

    v0.6.0 — Maintenance & updates

    • Improve --format llm output quality on news index pages
    Release notes
  11. v0.5.9 · May 10, 2026

    v0.5.9 — Fixes & improvements

    • replace shields.io badges with shieldcn badges
    • @jal-co made their first contribution in https://github.com/0xMassi/webclaw/pull/33
    Release notes
  12. v0.5.8 · May 4, 2026

    v0.5.8 — Fixes & improvements

    • improve brand extraction signals
    • validate self-host route URLs consistently
    • credit SSRF report
    Release notes
  13. v0.5.7 · May 4, 2026

    v0.5.7 — Fixes & improvements

    • harden fetch URL validation
    • note youtube.rs role and yt-dlp short-circuit in server
    • add h1 brand heading
    • Add GitHub Sponsors username to FUNDING.yml
    • add star history chart
    • add hosted API callout above Get Started
    • guard markdown pipe slice + detect trustpilot/reddit verify walls
    • send bot-identifying UA on reddit .json API to bypass browser UA block
    Release notes
  14. v0.5.6 · Apr 23, 2026

    v0.5.6 — New features

    • add fetch_smart with Reddit + Akamai rescue paths, bump 0.5.6
    Release notes
  15. v0.5.5 · Apr 23, 2026

    v0.5.5 — New features

    • expose safari-ios browser profile + bump to 0.5.5
    • drop sidecar references, mention ProductionFetcher
    Release notes
  16. v0.5.4 · Apr 23, 2026

    v0.5.4 — Docs & tooling

    • simplify 0.5.4 entry
    • Bump to 0.5.4: SafariIos profile + Chrome fingerprint alignment + locale helper
    Release notes
  17. v0.5.3 · Apr 22, 2026

    v0.5.3 — Fixes & improvements

    • vertical_scrape uses Firefox profile, not default Chrome
    Release notes
  18. v0.5.2 · Apr 22, 2026

    v0.5.2 — New features

    • vertical extractor support (28 extractors discoverable + callable)
    Release notes
  19. v0.5.1 · Apr 22, 2026

    v0.5.1 — New features

    • Fetcher trait so vertical extractors work under any HTTP backend
    • fix stale primp references, document wreq + Fetcher trait
    Release notes
  20. v0.5.0 · Apr 22, 2026

    v0.5.0 — Features & fixes

    • release v0.5.0 (28 vertical extractors + cloud integration)
    • perfect-score follow-ups (trustpilot 2025 schema, amazon/etsy fallbacks, cloud docs)
    • synthesize HTML from cloud response instead of requesting raw html
    • detect AWS WAF verifying-connection page, add OG fallback to ecommerce_product
    • wave 6b, etsy_listing + HTML fallbacks for substack/youtube
    • wave 6a, 5 easy verticals (27 total)
    • wave 5: Amazon, eBay, Trustpilot via cloud fallback
    • consolidate CloudClient + smart_fetch into webclaw-fetch
    Release notes
  21. v0.4.0 · Apr 22, 2026

    v0.4.0 — Features & fixes

    • v0.4.0: self-hosted REST server, bench subcommand, mcp warning fix (#26, #29, #30)
    Release notes
  22. v0.3.19 · Apr 17, 2026

    v0.3.19 — Fixes & improvements

    • reproducible 3-way comparison vs trafilatura + firecrawl
    • entrypoint shim so child images with custom CMD work
    Release notes
  23. v0.3.18 · Apr 17, 2026

    v0.3.18 — Fixes & improvements

    • UTF-8 char boundary panic in find_content_position
    Release notes
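Panics of this kind typically come from slicing a `&str` at a byte offset that lands inside a multi-byte character. A minimal sketch of one common guard, assuming a hypothetical helper `clamp_to_char_boundary` (the real fix inside `find_content_position` is not shown on this page):

```rust
/// Hypothetical helper: move a byte offset down to the nearest char boundary
/// so that `&text[..index]` can no longer panic.
fn clamp_to_char_boundary(text: &str, mut index: usize) -> usize {
    if index >= text.len() {
        return text.len();
    }
    // Offset 0 is always a boundary, so this loop terminates.
    while !text.is_char_boundary(index) {
        index -= 1;
    }
    index
}

fn main() {
    let text = "caf\u{e9} au lait"; // the accented letter occupies bytes 3 and 4
    let cut = clamp_to_char_boundary(text, 4); // byte 4 is inside that letter
    println!("{}", &text[..cut]);
}
```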
  24. v0.3.17 · Apr 16, 2026

    v0.3.17 — Features & fixes

    • close --on-change command injection via sh -c (P0)
    • surface semaphore-closed as typed error instead of panic (P1)
    • DoS hardening + glob validation + cleanup (P2)
    • robots parser + firefox client cache + Acquire ordering (P3)
    • @0xMassi made their first contribution in https://github.com/0xMassi/webclaw/pull/20
    Release notes
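For the P0 item, one common way to close a `sh -c` injection is to stop building a shell string and pass untrusted values as discrete arguments. The sketch below illustrates that general technique only; `build_hook` and the `notify-hook` program name are hypothetical, and the page does not say how webclaw itself resolved the issue.

```rust
use std::process::Command;

/// Hypothetical sketch: build the hook command without a shell, so the
/// untrusted value is one argv entry and is never parsed as shell syntax.
fn build_hook(program: &str, changed_url: &str) -> Command {
    let mut cmd = Command::new(program);
    cmd.arg(changed_url);
    cmd
}

fn main() {
    let hostile = "https://example.com/; rm -rf ~";
    let cmd = build_hook("notify-hook", hostile);
    // Inspect the command instead of running it.
    let args: Vec<_> = cmd.get_args().collect();
    println!("{:?} {:?}", cmd.get_program(), args);
}
```

Because no shell is involved, the `;` in the hostile value stays inside a single argument rather than starting a second command.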
  25. v0.3.13 · Apr 14, 2026

    v0.3.13 — Fixes & improvements

    • use ENTRYPOINT instead of CMD in Dockerfiles for proper arg passthrough
    Release notes
  26. v0.3.12 · Apr 14, 2026

    v0.3.12 — New features

    • add allow_subdomains and allow_external_links to CrawlConfig
    Release notes
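How two such flags might gate link following can be sketched as below. The field names come from the entry above; the struct shape, the `should_follow` function, and the matching rules are assumptions made for illustration, not webclaw's implementation.

```rust
/// Field names are taken from the changelog entry; everything else is assumed.
struct CrawlConfig {
    allow_subdomains: bool,
    allow_external_links: bool,
}

/// Hypothetical rule: same host is always followed, subdomains need
/// `allow_subdomains`, and any other host needs `allow_external_links`.
fn should_follow(config: &CrawlConfig, seed_host: &str, link_host: &str) -> bool {
    if link_host == seed_host {
        return true;
    }
    let is_subdomain = link_host.ends_with(&format!(".{seed_host}"));
    (is_subdomain && config.allow_subdomains) || config.allow_external_links
}

fn main() {
    let config = CrawlConfig { allow_subdomains: true, allow_external_links: false };
    for host in ["example.com", "docs.example.com", "other.org"] {
        println!("{host}: follow = {}", should_follow(&config, "example.com", host));
    }
}
```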
  27. v0.3.11 · Apr 10, 2026

    v0.3.11 — Maintenance & updates

    Release notes
  28. v0.3.10 · Apr 10, 2026

    v0.3.10 — New features

    • add fallback sitemap paths for broader discovery
    • fix rustfmt for 2-element delay array
    • reduce fetch timeout to 12s and retries to 2
    Release notes
  29. v0.3.9 · Apr 4, 2026

    v0.3.9 — Fixes & improvements

    • layout tables rendered as sections instead of markdown tables
    • @devnen made their first contribution in https://github.com/0xMassi/webclaw/pull/14
    Release notes
  30. v0.3.8 · Apr 3, 2026

    v0.3.8 — Fixes & improvements

    • MCP research saves to file, returns compact response
    Release notes
  31. v0.3.7 · Apr 3, 2026

    v0.3.7 — New features

    • CLI --research flag + MCP cloud fallback + structured research output
    Release notes
  32. v0.3.6 · Apr 2, 2026

    v0.3.6 — Features & fixes

    • structured data in markdown/LLM output + v0.3.6
    • update all 4 Homebrew checksums after Docker build completes
    Release notes
  33. v0.3.5 · Apr 2, 2026

    v0.3.5 — New features

    • extract __NEXT_DATA__ into structured_data
    • update npm package license to AGPL-3.0
    • update README license references from MIT to AGPL-3.0
    Release notes
  34. v0.3.4 · Apr 1, 2026

    v0.3.4 — New features

    • SvelteKit data extraction + license change to AGPL-3.0
    Release notes
  35. v0.3.3 · Apr 1, 2026

    v0.3.3 — Features & fixes

    • fix aarch64 cross-compilation for BoringSSL (boring-sys2)
    • add SKILL.md to repo root for skills.sh discoverability
    • cargo fmt
    • remove reqwest_unstable rustflag (no longer needed)
    • update Dockerfile for BoringSSL build deps (cmake, clang)
    • replace custom TLS stack with wreq (BoringSSL), bump v0.3.3
    Release notes
  36. v0.3.2 · Mar 31, 2026

    v0.3.2 — New features

    • bump to v0.3.2, update changelog
    • add --cookie-file support for JSON cookie files
    Release notes
  37. v0.3.1 · Mar 30, 2026

    v0.3.1 — Features & fixes

    • collapse nested if per clippy
    • bump v0.3.1, update CHANGELOG, fix fmt
    • cookie warmup fallback for Akamai-protected pages
    • update webclaw-tls dependencies
    • fix fmt in client.rs test
    • adapt to webclaw-tls v0.1.1 HeaderMap API change
    • fix ambiguous reqwest version in dependency sync
    • replace stale primp check with webclaw-tls dependency sync
    Release notes
  38. v0.3.0 · Mar 29, 2026

    v0.3.0 — Features & fixes

    • clippy empty-line-after-doc-comment in browser.rs
    • cargo fmt
    • replace primp with webclaw-tls, bump to v0.3.0
    • add QEMU for arm64 apt-get in Docker build
    • single Docker job with plain docker build + manifest
    • align Cargo.toml version with v0.2.3 tag
    • fix Docker binary path extraction from release tarball
    • use pre-built binaries for Docker instead of QEMU cross-compilation
    Release notes
  39. v0.2.3 · Mar 27, 2026

    v0.2.3 — Maintenance & updates

    • build multi-platform Docker images (amd64 + arm64)
    Release notes
  40. v0.2.2 · Mar 27, 2026

    v0.2.2 — Fixes & improvements

    • v0.2.2 pre-release check
    • add weekly primp compatibility check
    • add reqwest to patch list, sync with primp 1.2.0
    • add SKILL.md for Claude Code skill integration
    Release notes
  41. v0.2.1 · Mar 27, 2026

    v0.2.1 — New features

    • v0.2.1 — Docker image on GHCR, QuickJS data island extraction
    • add Docker image build to release workflow
    • enable quickjs for JS data island extraction
    • upgrade README badges to for-the-badge style, add X/Twitter
    Release notes
  42. v0.2.0 · Mar 26, 2026

    v0.2.0 — Document extraction, HTML format, multi-URL watch

    • webclaw now auto-detects and extracts content from document files:
    • DOCX — Word documents parsed into markdown with headings preserved
    • XLSX/XLS — Spreadsheets converted to markdown tables (multi-sheet support)
    • CSV — Parsed with quoted field handling, output as markdown table
    • Documents are auto-detected by Content-Type header or URL extension, and this also works in batch mode.
    • HTML format: returns sanitized HTML. Works with crawl, batch, and --output-dir (.html extension).
    • Multi-URL watch: monitors all URLs in parallel and reports aggregate changes per check.
    • Batch LLM extraction: combines batch fetching with LLM extraction, processing URLs sequentially to respect rate limits.
    Release notes
  43. v0.1.7 · Mar 26, 2026

    v0.1.7 — Fix extraction options in batch mode

    • --only-main-content, --include, and --exclude flags now work correctly in batch mode
    • Previously these options were silently ignored when using --urls-file or multiple URLs
    • Thanks @mixxr for reporting.
    Release notes
  44. v0.1.6 · Mar 26, 2026

    v0.1.6 — Watch mode + Webhook notifications

    • Watch mode: monitor any URL for content changes with automatic diffing.
    • Outputs unified diffs to stdout and status messages to stderr. Ctrl+C stops cleanly.
    • Webhook notifications: POST JSON payloads when a crawl or batch completes and when watch detects changes.
    • Auto-detects Discord and Slack URLs and formats payloads as embeds/blocks. Generic endpoints receive raw JSON.
    • Also available via WEBCLAW_WEBHOOK_URL env var.
    Release notes
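The Discord and Slack auto-detection described above could work by inspecting the webhook URL. The sketch below assumes detection by URL prefix, using the well-known `discord.com/api/webhooks/` and `hooks.slack.com` webhook address forms; `WebhookKind` and `classify_webhook` are hypothetical names, and webclaw's actual matching rules may differ.

```rust
#[derive(Debug, PartialEq)]
enum WebhookKind {
    Discord,
    Slack,
    Generic,
}

/// Hypothetical classifier: pick a payload style from the webhook URL.
fn classify_webhook(url: &str) -> WebhookKind {
    if url.starts_with("https://discord.com/api/webhooks/") {
        WebhookKind::Discord // would be formatted as embeds
    } else if url.starts_with("https://hooks.slack.com/") {
        WebhookKind::Slack // would be formatted as blocks
    } else {
        WebhookKind::Generic // receives raw JSON
    }
}

fn main() {
    for url in [
        "https://discord.com/api/webhooks/123/abc",
        "https://hooks.slack.com/services/T000/B000/XXX",
        "https://example.com/hook",
    ] {
        println!("{url} -> {:?}", classify_webhook(url));
    }
}
```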
  45. v0.1.5 · Mar 26, 2026

    v0.1.5 — --output-dir: save each page to a separate file

    • --output-dir: save each extracted page to its own file instead of printing to stdout
    • Works with single URL, crawl, and batch modes
    • Filenames derived from URL path: /docs/api → docs/api.md
    • Root URLs use hostname/index.md to avoid collisions
    • Subdirectories created automatically
    • CSV input with custom filenames: url,filename format in --urls-file
    Release notes
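The filename rules in this entry can be sketched as a small mapping function. This is an illustration under assumptions: `output_path` is a hypothetical helper that takes an already-parsed host and path, and it only covers the two rules stated above (path-derived names and `hostname/index.md` for root URLs).

```rust
use std::path::PathBuf;

/// Hypothetical helper: map a URL's host and path to a relative output file.
fn output_path(host: &str, url_path: &str) -> PathBuf {
    let trimmed = url_path.trim_matches('/');
    if trimmed.is_empty() {
        // Root URL: namespace by hostname so different sites do not collide.
        PathBuf::from(host).join("index.md")
    } else {
        // Non-root URL: reuse the path, e.g. /docs/api becomes docs/api.md.
        PathBuf::from(format!("{trimmed}.md"))
    }
}

fn main() {
    println!("{}", output_path("example.com", "/docs/api").display());
    println!("{}", output_path("example.com", "/").display());
}
```

A caller would then join the result onto the `--output-dir` value and create any missing parent directories before writing.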
  46. v0.1.4 · Mar 26, 2026

    v0.1.4 — QuickJS: extract data from inline JavaScript

    • QuickJS integration: embeds a sandboxed JavaScript engine to execute inline <script> tags and extract data hidden in JS variable assignments
    • Captures window.__preloadedData (NYTimes), window.__PRELOADED_STATE__ (Wired/Conde Nast), self.__next_f (Next.js RSC), and any window.__* data blobs
    • Smart text filtering: rejects CSS, base64, file paths, code — only keeps readable prose
    • Feature-gated: enabled by default, disable with --no-default-features for WASM builds
    | Site | Before | After | Gain |
    |---|---|---|---|
    | NYTimes | 1,552 words | 4,162 words | +168% |
    | Wired | 1,459 words | 9,937 words | +580% |
    Release notes
  47. v0.1.3 · Mar 25, 2026

    v0.1.3 — Crawl streaming, resume/cancel, MCP proxy support

    • Crawl streaming: real-time progress on stderr as pages complete ([2/50] OK https://... (234ms, 1523 words))
    • Crawl resume/cancel: --crawl-state <path> saves visited URLs and pending frontier on Ctrl+C. Resume with the same flag to continue from where you left off
    • MCP proxy support: reads WEBCLAW_PROXY (single proxy) and WEBCLAW_PROXY_FILE (pool file) env vars. Also configurable in Claude Desktop MCP config via env block
    • Crawl results now expose visited set and remaining frontier for accurate state persistence
    Release notes
  48. v0.1.2 · Mar 25, 2026

    v0.1.2 — TLS fallback + Safari default

    • Default TLS profile: switched from Chrome145/Win to Safari26/Mac — highest pass rate across Cloudflare-protected sites
    • Plain client fallback: when impersonated TLS gets connection error or 403, automatically retries without impersonation. Fixes ycombinator.com, producthunt.com, and similar sites that reject forged TLS fingerprints
    • Reddit scraping: .json endpoint now uses plain HTTP client (TLS fingerprint was getting blocked)
    • YouTube transcript extraction infrastructure in webclaw-core (caption track parsing, timed text XML parser) — will be wired up when cloud API launches
    • Test results: 9/10 previously-failing sites now pass without proxy. StockX passes with proxy rotation.
    Release notes
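The plain-client fallback described above amounts to a retry rule: try the impersonated client first, and fall back only on a connection error or a 403. A minimal sketch with the two clients modeled as closures; `FetchError` and `fetch_with_fallback` are hypothetical names, not webclaw's API.

```rust
#[derive(Debug, PartialEq)]
enum FetchError {
    Connection,
    Status(u16),
}

/// Hypothetical sketch: run the impersonated fetch, then retry with the
/// plain client only for a connection error or an HTTP 403.
fn fetch_with_fallback<F, G>(impersonated: F, plain: G) -> Result<String, FetchError>
where
    F: FnOnce() -> Result<String, FetchError>,
    G: FnOnce() -> Result<String, FetchError>,
{
    match impersonated() {
        Err(FetchError::Connection) | Err(FetchError::Status(403)) => plain(),
        other => other, // success and all other errors pass through unchanged
    }
}

fn main() {
    let result = fetch_with_fallback(
        || Err(FetchError::Status(403)),
        || Ok("plain client body".to_string()),
    );
    println!("{result:?}");
}
```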
  49. v0.1.1 · Mar 24, 2026

    v0.1.1 — MCP identity fix, timeouts, and input validation

    • MCP server identity: server now correctly identifies as webclaw-mcp instead of rmcp in the MCP handshake
    • Research tool timeout: polling loop capped at 200 iterations (~10 min) instead of running forever
    • CLI exit codes: commands return non-zero exit codes on errors (invalid format, fetch failures, missing LLM)
    • Text format: stripped remaining markdown table syntax from plain text output
    • URL validation: all MCP tools validate URLs before network calls with clear error messages
    • Cloud API timeout: 60s request timeout instead of waiting indefinitely
    • Local fetch timeout: 30s timeout to prevent hanging on slow/tarpitting servers
    • Diff cloud fallback: computes actual diff instead of returning raw scrape JSON
    Release notes
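The research-tool fix is an instance of bounded polling: a fixed iteration cap replaces an open-ended loop. The sketch below assumes a hypothetical `poll_until` helper. The 3-second interval mentioned in the comment is inferred from the figures above (200 iterations over roughly 10 minutes) and is not stated in the release notes.

```rust
use std::thread;
use std::time::Duration;

/// Hypothetical helper: call `check` up to `max_iterations` times, sleeping
/// between attempts, and give up with `None` once the cap is reached.
fn poll_until<F>(max_iterations: u32, interval: Duration, mut check: F) -> Option<String>
where
    F: FnMut() -> Option<String>,
{
    for _ in 0..max_iterations {
        if let Some(result) = check() {
            return Some(result);
        }
        thread::sleep(interval);
    }
    None
}

fn main() {
    // A cap of 200 with a 3 s interval would bound the wait at about 10 minutes;
    // a short interval is used here so the demo finishes quickly.
    let mut attempts = 0;
    let outcome = poll_until(200, Duration::from_millis(10), || {
        attempts += 1;
        if attempts == 3 { Some("research finished".to_string()) } else { None }
    });
    println!("{outcome:?}");
}
```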
  50. v0.1.0 · Mar 24, 2026

    v0.1.0 — Initial Release

    • Web content extraction for LLMs. CLI + MCP server.
    • 10 MCP tools: scrape, crawl, map, batch, extract, summarize, diff, brand, search, research
    • TLS fingerprinting: bypasses anti-bot without a headless browser
    • 5 output formats: markdown, text, JSON, LLM-optimized, HTML
    • 67% fewer tokens than raw HTML in LLM format
    • Sub-millisecond extraction on static content
    • One command setup: npx create-webclaw
    • 6 Rust crates: webclaw-core (zero network deps, WASM-safe), webclaw-fetch, webclaw-llm, webclaw-pdf, webclaw-cli, webclaw-mcp.
    Release notes

Latest commits

The development feed, straight from main.

    • Merge pull request #57 from raffaelemancuso/patch-1 (28cd53e)
    • Add Windows binaries to README (c133478)
    • reword residential product line; refresh NodeMaven banner (3c72606)
    • update NodeMaven banner to new branding (cb78363)
    • Merge pull request #56 from 0xMassi/docs/nodemaven-partner (df7336d)
    • add NodeMaven studio partner (acd3021)
    • Merge pull request #55 from 0xMassi/fix/docker-multiarch-single-build (bcc58db)
    • build the Docker image in one multi-platform pass (8015de7)
    • Merge pull request #54 from 0xMassi/fix/docker-multiarch-release (be64409)
    • release v0.6.9 (2773474)
    • release v0.6.8 (7dfa180)
    • Merge pull request #52 from 0xMassi/audit-fixes-2026-06-09 (598f319)
    • Merge pull request #53 from 0xMassi/docs-coldproxy (fae2766)
    • add ColdProxy proxy-backed crawling walkthrough (d0909a2)
    • harden LLM providers, UTF-8 handling, and webhook/batch reliability (4993450)
    • update banner to new webclaw branding (d0d7b83)
    • v0.6.7 (6519ac2)
    • bump wreq 6.0.0-rc.29, wreq-util 3.0.0-rc.12 (14ded4b)
    • sync Cargo.lock to v0.6.6 (72a451c)
    • v0.6.6 (17fce81)
    • apply rustfmt to salvaged #49 commits (84a0f97)
    • URL truncation warning + --url-encoded flag (519dfb7)
    • periodic progress stderr line on slow fetches (985a90b)
    • pin wreq/wreq-util to exact rc versions (a1abf62)