Massi
Founder & engineer, webclaw
I'm Massi, also known online as 0xMassi. I build web extraction infrastructure in Rust, focused on getting clean, reliable web data into language models and AI agents.
My work lives at the intersection of three hard problems: bot protection bypass (TLS fingerprinting, HTTP/2 impersonation), high-throughput content extraction (Rust, async, zero-copy), and LLM tooling (MCP, structured extraction, RAG pipelines). webclaw is where I ship that work as open source.
Before webclaw, I spent years writing iOS apps, backend services, and developer tooling. I've shipped native apps to the App Store, run production APIs, and maintained Rust crates used by other developers.
Areas of expertise
- Rust systems programming
- Web content extraction
- TLS fingerprinting and browser impersonation
- HTTP/2 protocol internals
- Bot protection bypass (Cloudflare, DataDome, AWS WAF)
- Model Context Protocol (MCP) server design
- Retrieval augmented generation (RAG) pipelines
- LLM tooling and agent infrastructure
Projects
webclaw
Web extraction engine for LLMs (1.2k stars)
Rust-based web extraction engine. 118 ms average response time, 20x faster than Chrome-based alternatives. Ships as a CLI, an MCP server, and a hosted API with SDKs for TypeScript, Python, and Go.
Stik
Quick-capture notes for macOS (189 stars)
Free, open-source quick-capture note app for Mac. Press ⌘⇧S from anywhere and a floating post-it appears; type your thought and you're back to work in under 3 seconds. Plain markdown files, on-device AI, no cloud. Built with Tauri and Rust.
Akari
Ticket broker platform (500+ members)
Community and toolkit for independent ticket brokers: real-time market monitoring across 50+ platforms, a browser extension for fast checkout, a P&L dashboard, and a 500+ member community. Used to secure 200k+ tickets in 2024.
Articles
Anti-bot scraping API: browser fallback beats browser-first
Choose an anti-bot scraping API that detects blocks, avoids browser-first costs, and returns clean markdown or JSON for AI agents and RAG.
How to evaluate web scraping APIs for AI agents
A practical checklist for testing web scraping APIs on real agent and RAG workflows, not toy URLs like example.com.
Migrating from Firecrawl: compatible API for AI agents
Already using Firecrawl? Learn how Firecrawl-compatible endpoints work, what to test before switching, and how to evaluate webclaw with your existing scrape and crawl calls.
Cloudflare scraping checklist: diagnose the block before you retry
A practical checklist for Cloudflare scraping failures. What to log, what each signal means, and when to change fingerprints, sessions, rate limits, or browser rendering.
TLS fingerprinting in 2026: why curl gets 403 and Chrome does not
The reason curl gets blocked and Chrome gets through is not JavaScript. It is the TLS handshake. Deep dive on JA3, JA4, HTTP/2 fingerprints, and how to match a real browser without launching one.
Cloudflare error codes for scrapers: 403 vs 503 vs 1020 (and the rest)
A 403, a 503, a 1020 and a 1015 are not the same problem. Decision tree for which Cloudflare block you hit, what each code really means, and what to change in the scraper.
Puppeteer stealth vs Cloudflare: why it breaks
Puppeteer stealth still patches browser leaks, but Cloudflare scores more than JavaScript. See what breaks in 2026 and what to do instead.
Cloudflare Turnstile scraping: fixes for 2026
Cloudflare Turnstile scraping fails as 403s, empty shells, or loops. Learn how to detect it, log the right signals, and choose the right fallback.
LlamaIndex web scraping: fix SimpleWebPageReader
LlamaIndex web scraping breaks on blocks, empty shells, and noisy HTML. Feed cleaner markdown into SimpleWebPageReader, RAG, and agents.
LangChain web scraping in 2026: what loaders can't do
LangChain's built-in loaders break on bot-protected sites and return raw HTML your LLM can't use. Here's how to get clean, reliable web data into any LangChain pipeline.
5 ways to scrape Google search results in 2026
Google killed plain HTTP access to search results. Here's what works now, from TLS fingerprinting libraries to headless browsers to APIs, with code examples for each approach.
The 6 best web scraping APIs for LLMs in 2026
If you're building with LLMs, you need web data. Here's how the main scraping APIs compare on the things that actually matter for AI use cases.
Cloudflare web scraping: what works in 2026
A practical guide to Cloudflare scraping blocks in 2026. Learn what causes 403s, what signals matter, and which approaches still work.
Extract structured data from any URL in one call
You don't always need the full page. Sometimes you need three fields from a product listing. Here's how to pull exactly the data you want from any URL.
Build a RAG pipeline with live web data (4 steps)
Most RAG tutorials stop at "upload a PDF." Real apps need live web data. Here's how to build a pipeline that fetches, extracts, and indexes pages.
MCP web scraping for Claude Code and Cursor
MCP web scraping gives Claude Code, Cursor, and AI agents live web access. Scrape, crawl, search, extract, and summarize from one server.
HTML to Markdown for LLMs: cleaner RAG input
Convert HTML to Markdown for LLMs with boilerplate removed, links preserved, and fewer wasted tokens for RAG, agents, and summarization.
Web scraping for AI agents: 3 hidden problems
Most scraping tools were built for data pipelines, not AI agents. Here are three things that quietly break your pipeline, and how to fix them.
Why I built webclaw (Rust scraper for LLMs)
I was tired of scrapers that return 403 or need headless Chrome for basic HTML. So I built one in Rust that actually works.