Changelog
Every release. Every commit.
Webclaw ships in the open. Releases, features, fixes, and performance work — straight from the repository, as they land.
Releases
- v0.6.9 (latest), Jun 10, 2026
v0.6.9 — Maintenance & updates
- release v0.6.9 (fix multi-arch Docker publish)
- v0.6.8, Jun 10, 2026
v0.6.8 — Fixes & improvements
- add ColdProxy proxy-backed crawling walkthrough
- harden LLM providers, UTF-8 handling, and webhook/batch reliability
- v0.6.7, Jun 9, 2026
v0.6.7 — Maintenance & updates
- Bump wreq 6.0.0-rc.29 / wreq-util 3.0.0-rc.12 (fingerprint-neutral)
- v0.6.6, Jun 9, 2026
v0.6.6 — Maintenance & updates
- Salvage progress line + --url-encoded from #49
- v0.6.5, Jun 4, 2026
v0.6.5 — New features
- URL truncation warning + --url-encoded flag
- periodic progress stderr line on slow fetches
- pin wreq/wreq-util to exact rc versions
- describe in-process wreq TLS, drop stale patched-deps
- use Option::zip to satisfy clippy
- bump version to v0.6.5
- parse old.reddit.com HTML instead of the dead .json API
- Add sponsor preview placements
- v0.6.4, May 21, 2026
v0.6.4 — New features
- endpoints module for API surface extraction from HTML and JS
- v0.6.3, May 19, 2026
v0.6.3 — Fixes & improvements
- harden core (WASM-safe gating, SSRF, path-traversal, recursion caps)
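As an illustration of the path-traversal hardening listed above, here is a minimal sketch, assuming output paths are built by joining a URL-derived relative path onto an output directory. It accepts only plain path components, so `..`, absolute paths, and drive prefixes are rejected. The function name `is_safe_relative_path` is hypothetical and not webclaw's API.

```rust
use std::path::{Component, Path};

/// Hypothetical guard: true only if `rel` is a relative path made of normal
/// components, so joining it onto an output directory cannot escape it.
pub fn is_safe_relative_path(rel: &str) -> bool {
    if rel.is_empty() {
        return false;
    }
    Path::new(rel)
        .components()
        // RootDir, Prefix (e.g. `C:`), and ParentDir (`..`) all fail this check.
        .all(|c| matches!(c, Component::Normal(_) | Component::CurDir))
}
```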
- v0.6.2, May 18, 2026
v0.6.2 — Fixes & improvements
- add community plugins section
- clean llm output noise
- v0.6.1, May 12, 2026
v0.6.1 — Fixes & improvements
- prepare 0.6.1 hardening
- v0.6.0, May 10, 2026
v0.6.0 — Maintenance & updates
- Improve --format llm output quality on news index pages
- v0.5.9, May 10, 2026
v0.5.9 — Fixes & improvements
- replace shields.io badges with shieldcn badges
- @jal-co made their first contribution in https://github.com/0xMassi/webclaw/pull/33
- v0.5.8, May 4, 2026
v0.5.8 — Fixes & improvements
- improve brand extraction signals
- validate self-host route URLs consistently
- credit SSRF report
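URL validation against SSRF usually comes down to refusing addresses that point back into private infrastructure. The sketch below classifies an already-resolved IP address; it assumes the caller resolves DNS first and re-checks after redirects, and it is not webclaw's actual validation code. `is_forbidden_ip` is a hypothetical name.

```rust
use std::net::{IpAddr, Ipv4Addr, Ipv6Addr};

/// Hypothetical SSRF guard: true if `ip` is loopback, private, link-local,
/// or otherwise non-public, so a fetcher should refuse to connect.
pub fn is_forbidden_ip(ip: IpAddr) -> bool {
    match ip {
        IpAddr::V4(v4) => is_forbidden_v4(v4),
        IpAddr::V6(v6) => {
            // IPv4-mapped addresses (::ffff:a.b.c.d) are judged as IPv4.
            if let Some(v4) = v6.to_ipv4_mapped() {
                return is_forbidden_v4(v4);
            }
            is_forbidden_v6(v6)
        }
    }
}

fn is_forbidden_v4(ip: Ipv4Addr) -> bool {
    let [a, b, _, _] = ip.octets();
    a == 0                    // 0.0.0.0/8, includes the unspecified address
        || ip.is_loopback()   // 127.0.0.0/8
        || ip.is_private()    // 10/8, 172.16/12, 192.168/16
        || ip.is_link_local() // 169.254/16, includes cloud metadata endpoints
        || ip.is_broadcast()  // 255.255.255.255
        || (a == 100 && (b & 0xc0) == 64) // 100.64/10 carrier-grade NAT
}

fn is_forbidden_v6(ip: Ipv6Addr) -> bool {
    let first = ip.segments()[0];
    ip.is_loopback()
        || ip.is_unspecified()
        || (first & 0xfe00) == 0xfc00 // fc00::/7 unique local
        || (first & 0xffc0) == 0xfe80 // fe80::/10 link-local
}
```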
- v0.5.7, May 4, 2026
v0.5.7 — Fixes & improvements
- harden fetch URL validation
- note youtube.rs role and yt-dlp short-circuit in server
- add h1 brand heading
- Add GitHub Sponsors username to FUNDING.yml
- add star history chart
- add hosted API callout above Get Started
- guard markdown pipe slice + detect trustpilot/reddit verify walls
- send bot-identifying UA on reddit .json API to bypass browser UA block
- v0.5.6, Apr 23, 2026
v0.5.6 — New features
- add fetch_smart with Reddit + Akamai rescue paths, bump 0.5.6
- v0.5.5, Apr 23, 2026
v0.5.5 — New features
- expose safari-ios browser profile + bump to 0.5.5
- drop sidecar references, mention ProductionFetcher
- v0.5.4, Apr 23, 2026
v0.5.4 — Docs & tooling
- simplify 0.5.4 entry
- Bump to 0.5.4: SafariIos profile + Chrome fingerprint alignment + locale helper
- v0.5.3, Apr 22, 2026
v0.5.3 — Fixes & improvements
- vertical_scrape uses Firefox profile, not default Chrome
- v0.5.2, Apr 22, 2026
v0.5.2 — New features
- vertical extractor support (28 extractors discoverable + callable)
- v0.5.1, Apr 22, 2026
v0.5.1 — New features
- Fetcher trait so vertical extractors work under any HTTP backend
- fix stale primp references, document wreq + Fetcher trait
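The idea behind a `Fetcher` trait is that extractors depend on an interface rather than a concrete HTTP client, so any backend (or a test double) can be plugged in. The sketch below is synchronous for brevity; the real trait's signature is not shown in this changelog and is likely async. `FetchError`, `StaticFetcher`, and `extract_title` are hypothetical.

```rust
use std::collections::HashMap;

/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq)]
pub struct FetchError(pub String);

/// Hypothetical backend-agnostic fetch interface.
pub trait Fetcher {
    fn fetch(&self, url: &str) -> Result<String, FetchError>;
}

/// A toy extractor that depends only on the trait, not on any HTTP client.
pub fn extract_title<F: Fetcher>(fetcher: &F, url: &str) -> Result<Option<String>, FetchError> {
    let html = fetcher.fetch(url)?;
    let Some(start) = html.find("<title>").map(|i| i + "<title>".len()) else {
        return Ok(None);
    };
    let Some(len) = html[start..].find("</title>") else {
        return Ok(None);
    };
    Ok(Some(html[start..start + len].trim().to_string()))
}

/// In-memory backend that maps URLs to canned bodies, handy for tests.
pub struct StaticFetcher {
    pub pages: HashMap<String, String>,
}

impl Fetcher for StaticFetcher {
    fn fetch(&self, url: &str) -> Result<String, FetchError> {
        self.pages
            .get(url)
            .cloned()
            .ok_or_else(|| FetchError(format!("no page for {url}")))
    }
}
```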
- v0.5.0, Apr 22, 2026
v0.5.0 — Features & fixes
- release v0.5.0 (28 vertical extractors + cloud integration)
- perfect-score follow-ups (trustpilot 2025 schema, amazon/etsy fallbacks, cloud docs)
- synthesize HTML from cloud response instead of requesting raw html
- detect AWS WAF verifying-connection page, add OG fallback to ecommerce_product
- wave 6b, etsy_listing + HTML fallbacks for substack/youtube
- wave 6a, 5 easy verticals (27 total)
- wave 5: Amazon, eBay, Trustpilot via cloud fallback
- consolidate CloudClient + smart_fetch into webclaw-fetch
- v0.4.0, Apr 22, 2026
v0.4.0 — Features & fixes
- v0.4.0: self-hosted REST server, bench subcommand, mcp warning fix (#26, #29, #30)
- v0.3.19, Apr 17, 2026
v0.3.19 — Fixes & improvements
- reproducible 3-way comparison vs trafilatura + firecrawl
- entrypoint shim so child images with custom CMD work
- v0.3.18, Apr 17, 2026
v0.3.18 — Fixes & improvements
- UTF-8 char boundary panic in find_content_position
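In Rust, slicing a `&str` at a byte index that falls inside a multi-byte character panics, which is the class of bug named above. A minimal sketch of the usual defence, assuming the caller has a byte offset it wants to cut at: walk the index back to the nearest char boundary before slicing. These helper names are hypothetical and not taken from `find_content_position`.

```rust
/// Hypothetical helper: move a byte index down to the nearest UTF-8 char
/// boundary so that slicing at it cannot panic.
pub fn clamp_to_char_boundary(s: &str, idx: usize) -> usize {
    if idx >= s.len() {
        return s.len();
    }
    let mut i = idx;
    // Byte 0 is always a boundary, so this loop terminates.
    while !s.is_char_boundary(i) {
        i -= 1;
    }
    i
}

/// Prefix of at most `max_bytes` bytes that never splits a character.
pub fn safe_prefix(s: &str, max_bytes: usize) -> &str {
    &s[..clamp_to_char_boundary(s, max_bytes)]
}
```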
- v0.3.17, Apr 16, 2026
v0.3.17 — Features & fixes
- close --on-change command injection via sh -c (P0)
- surface semaphore-closed as typed error instead of panic (P1)
- DoS hardening + glob validation + cleanup (P2)
- robots parser + firefox client cache + Acquire ordering (P3)
- @0xMassi made their first contribution in https://github.com/0xMassi/webclaw/pull/20
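A common way to close a `sh -c` injection like the P0 above is to skip the shell entirely: run the user's program directly and pass untrusted data (here, the changed URL) as its own argv entry, so characters such as `;` or `$(...)` are never interpreted. This is a minimal sketch assuming the hook command is already split into words; `build_on_change_command` is hypothetical and not necessarily how webclaw implemented the fix.

```rust
use std::process::Command;

/// Hypothetical `--on-change` hook builder. The URL is appended as a single
/// argument and no shell ever parses it.
pub fn build_on_change_command(user_cmd: &[&str], url: &str) -> Option<Command> {
    let (program, args) = user_cmd.split_first()?;
    let mut cmd = Command::new(program);
    cmd.args(args).arg(url); // delivered verbatim to the program
    Some(cmd)
}
```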
- v0.3.13, Apr 14, 2026
v0.3.13 — Fixes & improvements
- use ENTRYPOINT instead of CMD in Dockerfiles for proper arg passthrough
- v0.3.12, Apr 14, 2026
v0.3.12 — New features
- add allow_subdomains and allow_external_links to CrawlConfig
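The two crawl options above decide which discovered links stay in scope. The sketch below shows one plausible interpretation, assuming `allow_external_links` admits any host and `allow_subdomains` admits hosts under the seed domain; the `LinkScope` struct and `should_follow` function are hypothetical and do not mirror the real `CrawlConfig` layout.

```rust
/// Hypothetical subset of crawl options named after the changelog entry.
#[derive(Debug, Clone, Copy, Default)]
pub struct LinkScope {
    pub allow_subdomains: bool,
    pub allow_external_links: bool,
}

/// Should a link on `link_host` be followed when the crawl started at `seed_host`?
pub fn should_follow(scope: LinkScope, seed_host: &str, link_host: &str) -> bool {
    let seed = seed_host.to_ascii_lowercase();
    let link = link_host.to_ascii_lowercase();
    if link == seed {
        return true; // the seed host is always in scope
    }
    if scope.allow_external_links {
        return true; // any host allowed
    }
    // "docs.example.com" is under "example.com"; "badexample.com" is not.
    scope.allow_subdomains && link.ends_with(&format!(".{seed}"))
}
```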
- v0.3.11, Apr 10, 2026
v0.3.11 — Maintenance & updates
- v0.3.10, Apr 10, 2026
v0.3.10 — New features
- add fallback sitemap paths for broader discovery
- fix rustfmt for 2-element delay array
- reduce fetch timeout to 12s and retries to 2
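The timeout and retry numbers above can be pictured as a small policy object: one initial attempt plus at most two retries, each preceded by a delay from a two-element array. This is a sketch under assumptions; the `RetryPolicy` struct, `with_retries` helper, and the specific delay values are hypothetical, and the 12 s timeout is only passed through to the attempt closure rather than enforced here.

```rust
use std::thread::sleep;
use std::time::Duration;

/// Hypothetical retry policy: a per-request timeout handed to the HTTP client,
/// and one backoff delay per retry (two retries in total).
pub struct RetryPolicy {
    pub timeout: Duration,
    pub retry_delays: [Duration; 2],
}

impl Default for RetryPolicy {
    fn default() -> Self {
        Self {
            timeout: Duration::from_secs(12),
            // Illustrative values only; the changelog does not state them.
            retry_delays: [Duration::from_millis(500), Duration::from_secs(1)],
        }
    }
}

/// Run `attempt` once, then retry after each delay until it succeeds.
/// Total attempts = 1 + retry_delays.len().
pub fn with_retries<T, E>(
    policy: &RetryPolicy,
    mut attempt: impl FnMut(Duration) -> Result<T, E>,
) -> Result<T, E> {
    let mut result = attempt(policy.timeout);
    for delay in policy.retry_delays {
        if result.is_ok() {
            break;
        }
        sleep(delay);
        result = attempt(policy.timeout);
    }
    result
}
```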
- v0.3.9, Apr 4, 2026
v0.3.9 — Fixes & improvements
- layout tables rendered as sections instead of markdown tables
- @devnen made their first contribution in https://github.com/0xMassi/webclaw/pull/14
- v0.3.8, Apr 3, 2026
v0.3.8 — Fixes & improvements
- MCP research saves to file, returns compact response
- v0.3.7, Apr 3, 2026
v0.3.7 — New features
- CLI --research flag + MCP cloud fallback + structured research output
- v0.3.6, Apr 2, 2026
v0.3.6 — Features & fixes
- structured data in markdown/LLM output + v0.3.6
- update all 4 Homebrew checksums after Docker build completes
- v0.3.5, Apr 2, 2026
v0.3.5 — New features
- extract __NEXT_DATA__ into structured_data
- update npm package license to AGPL-3.0
- update README license references from MIT to AGPL-3.0
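Next.js pages embed their initial props in a `<script id="__NEXT_DATA__" type="application/json">` tag, which is what the `__NEXT_DATA__` extraction above targets. A minimal sketch of locating that JSON text, assuming a plain string scan is acceptable for illustration; a real implementation would use an HTML parser and then a JSON parser. `next_data_json` is a hypothetical name.

```rust
/// Hypothetical helper: return the raw JSON text inside the
/// `<script id="__NEXT_DATA__" ...>` tag, if the page has one.
pub fn next_data_json(html: &str) -> Option<&str> {
    let marker = html.find("id=\"__NEXT_DATA__\"")?;
    // The JSON body starts right after the `>` that closes the opening tag.
    let open_end = marker + html[marker..].find('>')? + 1;
    let close = open_end + html[open_end..].find("</script>")?;
    Some(html[open_end..close].trim())
}
```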
- v0.3.4, Apr 1, 2026
v0.3.4 — New features
- SvelteKit data extraction + license change to AGPL-3.0
- v0.3.3, Apr 1, 2026
v0.3.3 — Features & fixes
- fix aarch64 cross-compilation for BoringSSL (boring-sys2)
- add SKILL.md to repo root for skills.sh discoverability
- cargo fmt
- remove reqwest_unstable rustflag (no longer needed)
- update Dockerfile for BoringSSL build deps (cmake, clang)
- replace custom TLS stack with wreq (BoringSSL), bump v0.3.3
- v0.3.2, Mar 31, 2026
v0.3.2 — New features
- bump to v0.3.2, update changelog
- add --cookie-file support for JSON cookie files
- v0.3.1, Mar 30, 2026
v0.3.1 — Features & fixes
- collapse nested if per clippy
- bump v0.3.1, update CHANGELOG, fix fmt
- cookie warmup fallback for Akamai-protected pages
- update webclaw-tls dependencies
- fix fmt in client.rs test
- adapt to webclaw-tls v0.1.1 HeaderMap API change
- fix ambiguous reqwest version in dependency sync
- replace stale primp check with webclaw-tls dependency sync
- v0.3.0, Mar 29, 2026
v0.3.0 — Features & fixes
- clippy empty-line-after-doc-comment in browser.rs
- cargo fmt
- replace primp with webclaw-tls, bump to v0.3.0
- add QEMU for arm64 apt-get in Docker build
- single Docker job with plain docker build + manifest
- align Cargo.toml version with v0.2.3 tag
- fix Docker binary path extraction from release tarball
- use pre-built binaries for Docker instead of QEMU cross-compilation
- v0.2.3, Mar 27, 2026
v0.2.3 — Maintenance & updates
- build multi-platform Docker images (amd64 + arm64)
- v0.2.2, Mar 27, 2026
v0.2.2 — Fixes & improvements
- v0.2.2 pre-release check
- add weekly primp compatibility check
- add reqwest to patch list, sync with primp 1.2.0
- add SKILL.md for Claude Code skill integration
- v0.2.1, Mar 27, 2026
v0.2.1 — New features
- v0.2.1 — Docker image on GHCR, QuickJS data island extraction
- add Docker image build to release workflow
- enable quickjs for JS data island extraction
- upgrade README badges to for-the-badge style, add X/Twitter
- v0.2.0, Mar 26, 2026
v0.2.0 — Document extraction, HTML format, multi-URL watch
- webclaw now auto-detects and extracts content from document files:
- DOCX — Word documents parsed into markdown with headings preserved
- XLSX/XLS — Spreadsheets converted to markdown tables (multi-sheet support)
- CSV — Parsed with quoted field handling, output as markdown table
- The document type is auto-detected from the Content-Type header or the URL extension, and detection also works in batch mode.
- HTML output: returns sanitized HTML. Works with crawl, batch, and --output-dir (files are written with an .html extension).
- Multi-URL watch: monitors all URLs in parallel and reports aggregate changes per check.
- Batch LLM extraction: combines batch fetching with LLM extraction, processing URLs sequentially to respect rate limits.
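The CSV support described above (quoted fields, markdown-table output) can be sketched in a few lines. This is a simplified illustration that assumes single-line records; it is not webclaw's parser, and `parse_csv_line` and `to_markdown_table` are hypothetical names. Pipes in cell text are escaped so they cannot break the table's columns.

```rust
/// Split one CSV record into fields, honoring double quotes and `""` escapes.
/// Simplified: newlines inside quoted fields are not supported.
pub fn parse_csv_line(line: &str) -> Vec<String> {
    let mut fields = Vec::new();
    let mut field = String::new();
    let mut in_quotes = false;
    let mut chars = line.chars().peekable();
    while let Some(c) = chars.next() {
        match (c, in_quotes) {
            ('"', true) if chars.peek() == Some(&'"') => {
                field.push('"'); // `""` inside quotes is a literal quote
                chars.next();
            }
            ('"', _) => in_quotes = !in_quotes,
            (',', false) => fields.push(std::mem::take(&mut field)),
            _ => field.push(c),
        }
    }
    fields.push(field);
    fields
}

fn escape_cell(s: &str) -> String {
    s.replace('|', "\\|")
}

/// Render rows as a markdown table; the first row is the header.
pub fn to_markdown_table(rows: &[Vec<String>]) -> String {
    let mut out = String::new();
    if let Some((header, body)) = rows.split_first() {
        let cells: Vec<String> = header.iter().map(|c| escape_cell(c)).collect();
        out.push_str(&format!("| {} |\n", cells.join(" | ")));
        out.push_str(&format!("|{}\n", "---|".repeat(header.len())));
        for row in body {
            let cells: Vec<String> = row.iter().map(|c| escape_cell(c)).collect();
            out.push_str(&format!("| {} |\n", cells.join(" | ")));
        }
    }
    out
}
```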
- v0.1.7, Mar 26, 2026
v0.1.7 — Fix extraction options in batch mode
- --only-main-content, --include, and --exclude flags now work correctly in batch mode
- Previously these options were silently ignored when using --urls-file or multiple URLs
- Thanks @mixxr for reporting.
- v0.1.6, Mar 26, 2026
v0.1.6 — Watch mode + Webhook notifications
- Watch mode: monitor any URL for content changes with automatic diffing.
- Outputs unified diffs to stdout and status messages to stderr. Ctrl+C stops cleanly.
- Webhook notifications: POST JSON payloads when a crawl or batch completes and when watch detects changes.
- Auto-detects Discord and Slack URLs and formats payloads as embeds/blocks. Generic endpoints receive raw JSON.
- The webhook URL can also be set via the WEBCLAW_WEBHOOK_URL environment variable.
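Payload auto-detection can key off the webhook URL itself: Discord webhooks live under `discord.com/api/webhooks/` and Slack incoming webhooks under `hooks.slack.com`. The sketch below classifies a URL that way; the `WebhookKind` enum and `webhook_kind` function are hypothetical, and webclaw's actual detection rules may differ.

```rust
/// Payload style chosen from the webhook URL (hypothetical enum).
#[derive(Debug, PartialEq, Eq)]
pub enum WebhookKind {
    Discord, // would receive an embeds payload
    Slack,   // would receive a blocks payload
    Generic, // receives the raw JSON event
}

/// Classify a webhook URL by host and path prefix.
pub fn webhook_kind(url: &str) -> WebhookKind {
    // Drop the scheme, then split host from path.
    let rest = url.split_once("://").map_or(url, |(_, r)| r);
    let (host, path) = rest.split_once('/').unwrap_or((rest, ""));
    let host = host.to_ascii_lowercase();
    if (host == "discord.com" || host == "discordapp.com") && path.starts_with("api/webhooks/") {
        WebhookKind::Discord
    } else if host == "hooks.slack.com" {
        WebhookKind::Slack
    } else {
        WebhookKind::Generic
    }
}
```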
- v0.1.5, Mar 26, 2026
v0.1.5 — --output-dir: save each page to a separate file
- --output-dir: save each extracted page to its own file instead of printing to stdout
- Works with single URL, crawl, and batch modes
- Filenames derived from URL path: /docs/api → docs/api.md
- Root URLs use hostname/index.md to avoid collisions
- Subdirectories created automatically
- CSV input with custom filenames: url,filename format in --urls-file
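The filename rules above (`/docs/api` becomes `docs/api.md`, a root URL becomes `hostname/index.md`) can be sketched as a pure function. This assumes query strings and fragments are ignored, which the changelog does not state; `output_path_for` is hypothetical. A real implementation should also reject `..` segments before joining the result onto the output directory.

```rust
use std::path::PathBuf;

/// Hypothetical URL-to-file mapping for an --output-dir style feature.
pub fn output_path_for(url: &str) -> PathBuf {
    let rest = url.split_once("://").map_or(url, |(_, r)| r);
    // Cut off ?query and #fragment before looking at the path.
    let rest = rest.split(|c: char| c == '?' || c == '#').next().unwrap_or(rest);
    let (host, path) = rest.split_once('/').unwrap_or((rest, ""));
    let path = path.trim_matches('/');
    if path.is_empty() {
        // Root URLs go under the hostname to avoid collisions between sites.
        return PathBuf::from(host).join("index.md");
    }
    PathBuf::from(format!("{path}.md"))
}
```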
- v0.1.4, Mar 26, 2026
v0.1.4 — QuickJS: extract data from inline JavaScript
- QuickJS integration: embeds a sandboxed JavaScript engine to execute inline <script> tags and extract data hidden in JS variable assignments
- Captures window.__preloadedData (NYTimes), window.__PRELOADED_STATE__ (Wired/Conde Nast), self.__next_f (Next.js RSC), and any window.__* data blobs
- Smart text filtering: rejects CSS, base64, file paths, code — only keeps readable prose
- Feature-gated: enabled by default, disable with --no-default-features for WASM builds
| Site | Before (words) | After (words) | Gain |
|---|---|---|---|
| NYTimes | 1,552 | 4,162 | +168% |
| Wired | 1,459 | 9,937 | +580% |
- v0.1.3, Mar 25, 2026
v0.1.3 — Crawl streaming, resume/cancel, MCP proxy support
- Crawl streaming: real-time progress on stderr as pages complete ([2/50] OK https://... (234ms, 1523 words))
- Crawl resume/cancel: --crawl-state <path> saves visited URLs and pending frontier on Ctrl+C. Resume with the same flag to continue from where you left off
- MCP proxy support: reads WEBCLAW_PROXY (single proxy) and WEBCLAW_PROXY_FILE (pool file) env vars. Also configurable in Claude Desktop MCP config via env block
- Crawl results now expose visited set and remaining frontier for accurate state persistence
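Resuming a crawl needs exactly the two pieces of state named above: the visited set and the pending frontier. The sketch below persists them in a simple line-based text format to stay dependency-free; the on-disk format webclaw uses for `--crawl-state` is not described here (JSON would be a likely choice), so `CrawlState` and its methods are hypothetical.

```rust
use std::collections::{HashSet, VecDeque};

/// Hypothetical resumable crawl state.
#[derive(Debug, Default, PartialEq)]
pub struct CrawlState {
    pub visited: HashSet<String>,
    pub frontier: VecDeque<String>,
}

impl CrawlState {
    /// Serialize as one `V <url>` or `F <url>` line per entry.
    pub fn to_text(&self) -> String {
        let mut visited: Vec<&String> = self.visited.iter().collect();
        visited.sort(); // deterministic output
        let mut out = String::new();
        for url in visited {
            out.push_str("V ");
            out.push_str(url);
            out.push('\n');
        }
        for url in &self.frontier {
            out.push_str("F ");
            out.push_str(url);
            out.push('\n');
        }
        out
    }

    /// Parse the format written by `to_text`, skipping malformed lines.
    pub fn from_text(text: &str) -> Self {
        let mut state = CrawlState::default();
        for line in text.lines() {
            match line.split_once(' ') {
                Some(("V", url)) => {
                    state.visited.insert(url.to_string());
                }
                Some(("F", url)) => state.frontier.push_back(url.to_string()),
                _ => {}
            }
        }
        state
    }

    /// Next URL to fetch on resume, skipping anything already visited.
    pub fn next_url(&mut self) -> Option<String> {
        while let Some(url) = self.frontier.pop_front() {
            if self.visited.insert(url.clone()) {
                return Some(url);
            }
        }
        None
    }
}
```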
- v0.1.2, Mar 25, 2026
v0.1.2 — TLS fallback + Safari default
- Default TLS profile: switched from Chrome145/Win to Safari26/Mac — highest pass rate across Cloudflare-protected sites
- Plain client fallback: when impersonated TLS gets connection error or 403, automatically retries without impersonation. Fixes ycombinator.com, producthunt.com, and similar sites that reject forged TLS fingerprints
- Reddit scraping: .json endpoint now uses plain HTTP client (TLS fingerprint was getting blocked)
- YouTube transcript extraction infrastructure in webclaw-core (caption track parsing, timed text XML parser) — will be wired up when cloud API launches
- Test results: 9/10 previously-failing sites now pass without proxy. StockX passes with proxy rotation.
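The plain-client fallback described above is a small decision rule: try the impersonating client, and only on a connection error or a 403 retry once without impersonation. A minimal sketch, with both clients abstracted as closures so the rule can be shown without a network; the `Attempt` enum and `fetch_with_fallback` function are hypothetical.

```rust
/// Outcome of one fetch attempt (hypothetical, simplified).
#[derive(Debug, PartialEq)]
pub enum Attempt {
    Body(String),    // success
    Status(u16),     // non-success HTTP status
    ConnectionError, // TLS handshake or connection failure
}

/// Try the impersonating client; on a connection error or 403,
/// retry once with the plain client. Any other outcome is returned as-is.
pub fn fetch_with_fallback(
    impersonated: impl FnOnce() -> Attempt,
    plain: impl FnOnce() -> Attempt,
) -> Attempt {
    match impersonated() {
        Attempt::ConnectionError | Attempt::Status(403) => plain(),
        other => other,
    }
}
```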
- v0.1.1, Mar 24, 2026
v0.1.1 — MCP identity fix, timeouts, and input validation
- MCP server identity: server now correctly identifies as webclaw-mcp instead of rmcp in the MCP handshake
- Research tool timeout: polling loop capped at 200 iterations (~10 min) instead of running forever
- CLI exit codes: commands return non-zero exit codes on errors (invalid format, fetch failures, missing LLM)
- Text format: stripped remaining markdown table syntax from plain text output
- URL validation: all MCP tools validate URLs before network calls with clear error messages
- Cloud API timeout: 60s request timeout instead of waiting indefinitely
- Local fetch timeout: 30s timeout to prevent hanging on slow/tarpitting servers
- Diff cloud fallback: computes actual diff instead of returning raw scrape JSON
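The research-tool fix above replaces an unbounded polling loop with a capped one. A minimal sketch of that pattern: 200 iterations at roughly a 3 s interval gives the ~10 minute ceiling quoted above, although the interval itself is inferred from those two numbers, not stated. `poll_with_cap` is a hypothetical name.

```rust
use std::thread::sleep;
use std::time::Duration;

/// Poll `check` until it yields a value, giving up after `max_iterations`.
pub fn poll_with_cap<T>(
    max_iterations: u32,
    interval: Duration,
    mut check: impl FnMut() -> Option<T>,
) -> Result<T, String> {
    for i in 0..max_iterations {
        if let Some(value) = check() {
            return Ok(value);
        }
        // No need to sleep after the final failed check.
        if i + 1 < max_iterations {
            sleep(interval);
        }
    }
    Err(format!("gave up after {max_iterations} polls"))
}
```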
- v0.1.0, Mar 24, 2026
v0.1.0 — Initial Release
- Web content extraction for LLMs. CLI + MCP server.
- 10 MCP tools: scrape, crawl, map, batch, extract, summarize, diff, brand, search, research
- TLS fingerprinting: bypasses anti-bot without a headless browser
- 5 output formats: markdown, text, JSON, LLM-optimized, HTML
- 67% fewer tokens than raw HTML in LLM format
- Sub-millisecond extraction on static content
- One command setup: npx create-webclaw
- 6 Rust crates: webclaw-core (zero network deps, WASM-safe), webclaw-fetch, webclaw-llm, webclaw-pdf, webclaw-cli, webclaw-mcp.
Latest commits
The development feed, straight from main.
- commit: Merge pull request #57 from raffaelemancuso/patch-1 (Jun 12, 2026, 28cd53e)
- commit: Add Windows binaries to README (Jun 12, 2026, c133478)
- docs: reword residential product line; refresh NodeMaven banner (Jun 11, 2026, 3c72606)
- chore: update NodeMaven banner to new branding (Jun 11, 2026, cb78363)
- commit: Merge pull request #56 from 0xMassi/docs/nodemaven-partner (Jun 10, 2026, df7336d)
- docs: add NodeMaven studio partner (Jun 10, 2026, acd3021)
- commit: Merge pull request #55 from 0xMassi/fix/docker-multiarch-single-build (Jun 10, 2026, bcc58db)
- ci: build the Docker image in one multi-platform pass (Jun 10, 2026, 8015de7)
- commit: Merge pull request #54 from 0xMassi/fix/docker-multiarch-release (Jun 10, 2026, be64409)
- chore: release v0.6.9 (Jun 10, 2026, 2773474)
- chore: release v0.6.8 (Jun 10, 2026, 7dfa180)
- commit: Merge pull request #52 from 0xMassi/audit-fixes-2026-06-09 (Jun 10, 2026, 598f319)
- commit: Merge pull request #53 from 0xMassi/docs-coldproxy (Jun 10, 2026, fae2766)
- docs: add ColdProxy proxy-backed crawling walkthrough (Jun 10, 2026, d0909a2)
- fix: harden LLM providers, UTF-8 handling, and webhook/batch reliability (Jun 9, 2026, 4993450)
- docs: update banner to new webclaw branding (Jun 9, 2026, d0d7b83)
- chore: v0.6.7 (Jun 9, 2026, 6519ac2)
- chore: bump wreq 6.0.0-rc.29, wreq-util 3.0.0-rc.12 (Jun 9, 2026, 14ded4b)
- chore: sync Cargo.lock to v0.6.6 (Jun 9, 2026, 72a451c)
- chore: v0.6.6 (Jun 9, 2026, 17fce81)
- commit: apply rustfmt to salvaged #49 commits (Jun 9, 2026, 84a0f97)
- feature: URL truncation warning + --url-encoded flag (May 24, 2026, 519dfb7)
- feature: periodic progress stderr line on slow fetches (May 23, 2026, 985a90b)
- commit: pin wreq/wreq-util to exact rc versions (Jun 4, 2026, a1abf62)
View full history on GitHub