Founder & engineer, webclaw

Massi

I'm Massi, also known online as 0xMassi. I build web extraction infrastructure in Rust, focused on the problem of getting clean, reliable web data into language models and AI agents.

My work lives at the intersection of three hard problems: getting past the defenses that block automated requests, extracting content at high throughput (Rust, async, zero-copy), and building LLM tooling (MCP, structured extraction, RAG pipelines). webclaw is where I ship that work as open source.

Before webclaw, I spent years writing iOS apps, backend services, and developer tooling. I've shipped native apps to the App Store, run production APIs, and maintained Rust crates used by other developers.

Areas of expertise

What I go deep on.

Rust systems programming
Web content extraction
TLS fingerprinting and browser impersonation
HTTP/2 protocol internals
Bot protection bypass (Cloudflare, DataDome, AWS WAF)
Model Context Protocol (MCP) server design
Retrieval-augmented generation (RAG) pipelines
LLM tooling and agent infrastructure

Projects

Things I've shipped.

webclaw

1.7k stars

Web extraction engine for LLMs

Rust-based web extraction engine. 118ms on static pages, no headless browser to spin up. Ships as CLI, MCP server, and hosted API with SDKs for TypeScript, Python, and Go.

Stik

226 stars

Quick-capture notes for macOS

Free, open-source quick-capture note app for Mac. Press ⌘⇧S from anywhere and a floating post-it appears; type your thought and you're back to work in under 3 seconds. Plain markdown files, on-device AI, no cloud. Built with Tauri and Rust.

Akari

500+ members

Ticket broker platform

Community and toolkit for independent ticket brokers. Real-time market monitoring across 50+ platforms, browser extension for fast checkout, P&L dashboard, and a 500+ member community. Helped members secure 200k+ tickets in 2024.

Articles

Writing & research.

Build a Job Board Scraper: A Production-Ready Guide

2026-07-13

Learn how to build a production-ready job board scraper with webclaw. This guide covers architecture, anti-bot bypass, structured data, scaling, and LLM prep.

YouTube Transcript Scraper: A 2026 Developer's Guide

2026-07-12

Build a YouTube transcript scraper with methods for developers. From Python libraries to managed APIs, learn to extract clean transcript data for AI pipelines.

Website Change Monitoring Tool: A 2026 Developer Guide

2026-07-11

Discover how a website change monitoring tool works, key features to evaluate, and how to implement one for compliance, SEO, and AI data pipelines in 2026.

A Practical Guide to Duplicate Detection in 2026

2026-07-10

A dev's guide to duplicate detection for AI and web scraping. Learn algorithms, scaling strategies, and how to handle exact, near, and semantic duplicates.

10 Best Site Mapping Tools for Developers in 2026

2026-07-09

Find the best site mapping tools for developers and engineers. Compare 10 top crawlers and APIs for technical SEO, UX design, and AI data extraction.

Web Scraping with Go: A 2026 Guide to Building Scrapers

2026-07-08

Learn web scraping with Go in 2026. This guide covers Colly, Goquery, and Chromedp, plus handling JS, proxies, and bot protection for reliable data.

Bearer Token Authentication: 2026 Guide to Security

2026-07-07

Master bearer token authentication in 2026. Explore its lifecycle, JWTs, security best practices, and REST API integration in this comprehensive guide.

A Modern Python Scraping Tutorial for 2026

2026-07-06

The only Python scraping tutorial you'll need. Go from basic setup to advanced techniques for handling JavaScript, proxies, and preparing data for AI.

Master Web Scraping in Python: 2026 Guide

2026-07-05

Learn modern web scraping in Python. Covers requests, JavaScript, bypassing blocks, and getting LLM-ready data.

Amazon Scrape API: A Guide to Building Reliable Pipelines

2026-07-04

Learn to build a reliable Amazon scrape API pipeline. This guide covers anti-scraping, ASIN extraction, LLM-optimized JSON output, and scaling.

CSV vs JSON: Which Format to Choose in 2026

2026-07-03

Choosing between CSV vs JSON for your data? This guide compares structure, performance, LLM token efficiency, and use cases to help you decide.

Residential Backconnect Proxy: Ultimate Guide 2026

2026-07-02

Uncover how a residential backconnect proxy works for web scraping & geo-targeting. Find providers that defeat modern behavioral blocks in 2026.

Amazon Scraping API: A Developer's Guide for 2026

2026-07-01

A complete guide to using an Amazon scraping API in 2026. Learn to handle anti-bot measures, extract structured data, and integrate with your applications.

XPath Contains Text: Syntax & Best Practices

2026-06-30

Master XPath `contains text` for reliable web scraping. Covers syntax, pitfalls (whitespace, case-sensitivity), and alternatives.

Proxies for Google: A Developer's Guide for 2026

2026-06-29

A developer-focused guide on using proxies for Google scraping. Learn to choose residential vs. datacenter proxies, manage rotation, and bypass blocks in 2026.

Text Extractor from Website: A 2026 Practical Guide

2026-06-28

Need a website text extractor that handles modern JS sites and bot blocking? This guide shows how to get clean, LLM-ready text using Python or an API.

Optimize Your Proxy for Downloads Performance

2026-06-27

Choose and configure a proxy for downloads. This guide covers residential vs. datacenter options, performance, and large file handling for reliable data

Residential Proxies for Self-Hosted webclaw Scraping

2026-06-27

Route self-hosted webclaw scrapes through ColdProxy residential proxies with rotation and geo-targeting. Setup, pool files, and crawl commands.

Web Search API: The 2026 Guide for AI Developers

2026-06-26

Explore what a web search API is in 2026. Learn about architectures, features, and how to integrate one for AI agents, RAG, and clean data extraction.

Downloading HTML Files: From Browser to API in 2026

2026-06-25

Learn modern methods for downloading HTML files. This guide covers browser saving, curl/wget, headless browsers for JS, and APIs for developers and AI.

R Programming Web Scraping: The 2026 Practical Guide

2026-06-24

Master web scraping in R. This guide covers rvest, dynamic sites with RSelenium, anti-scraping, and how to build reliable data pipelines for AI.

Playwright vs Puppeteer: The 2026 Developer's Guide

2026-06-23

Playwright vs Puppeteer: Which to choose in 2026? A technical guide on performance, APIs, and when to use a scraping API like webclaw instead.

Undetectable Internet Browser: Web Scraping & Compliance

2026-06-22

Discover what an undetectable internet browser is. Learn about browser fingerprinting, legitimate web scraping, and how to stay compliant in 2026.

Python Load JSON File

2026-06-21

Learn to load a JSON file in Python efficiently. Covers basic loading, large files, performance, error checking, and schema validation with practical examples.

Web Scraping in R: A Practical 2026 Guide

2026-06-20

Learn modern web scraping in R. This guide covers rvest for static sites, RSelenium for JavaScript, and APIs for tough targets. Start scraping data today.

Advanced Crawling in Python: Techniques for 2026

2026-06-19

Master Python crawling: requests, Scrapy, Playwright, anti-bot, data extraction, and AI scaling in 2026. Build production-grade web scrapers.

Curl POST JSON: A Practical Guide for Developers

2026-06-18

Master how to POST JSON data with curl. This guide covers sending inline and file-based JSON, auth, headers, and the modern --json flag with practical examples.

Scraping Websites for Data: A 2026 Developer's Guide

2026-06-17

Learn how scraping websites for data works in 2026. This guide covers planning, JS rendering, bypassing bots, and creating clean, LLM-ready data pipelines.

What Is Batch Processing: Essential Guide for 2026

2026-06-16

Discover what batch processing is, how it compares to streaming, and why it's a critical pattern for efficient data pipelines, web scraping, and AI in 2026.

What Is Screen Scraping: Understanding Its Risks & AI Uses

2026-06-15

Discover what screen scraping is, how it works, its legal risks, and how it compares to modern APIs and web scraping for AI in 2026.

How to Scrape a Website for Emails (the 2026 Guide)

2026-06-13

Scraping a website for emails in 2026 is contact discovery plus data-quality control, not regex on a homepage. How to crawl, render, extract, validate, and use email data responsibly.

Competitor Price Tracking: A Developer's Guide 2026

2026-06-12

Competitor price tracking is a production data pipeline, not a dashboard. How to collect, normalize, match, and act on competitor price data without making the wrong pricing call.

Bypassing Web Blocks: Expert Strategies for 2026

2026-06-11

Bypassing web blocks in 2026 is an architecture decision, not a single trick. When raw HTTP is enough, when you need a headless browser, and when to buy a scraping API.

How to Convert HTML to Markdown: The Complete 2026 Guide

2026-06-09

Convert HTML to Markdown the right way: Pandoc for local files, Turndown and markdownify in code, and a URL-to-Markdown API for JavaScript-rendered pages.

Apify Alternative for LLM Web Scraping and AI Agents

2026-06-04

Compare Apify actors, the Apify marketplace, and webclaw for any-URL markdown extraction, structured JSON, crawling, MCP access, and AI agent web tooling.

Bright Data Alternative for LLM Web Scraping

2026-06-02

Compare Bright Data, Web Unlocker, and webclaw for proxy infrastructure, markdown extraction, structured JSON, crawling, batching, and AI agent workflows.

Jina Reader Alternative That Handles Cloudflare (2026)

2026-05-28

Jina Reader breaks on Cloudflare and DataDome. Same r.jina.ai-style URL to markdown, plus crawling, batching, and anti-bot bypass that returns content.

Crawl4AI vs Playwright: Which to Use for Scraping (2026)

2026-05-26

Crawl4AI vs Playwright for web scraping: which one to pick, where each breaks, and when you need neither. Markdown output, browser control, RAG input.

JavaScript Rendering API: When You Actually Need a Browser

2026-05-21

Most pages do not need a headless browser. How to detect an empty React shell, when a JavaScript rendering API is worth it, and how to skip the slow path.

Anti-Bot Scraping API 2026: signals that force browser fallback

2026-05-19

The exact block markers, JA4 fingerprints, empty shells, anti-bot cookies, JavaScript heuristics, and content-quality signals that decide when a scraping API should escalate to a browser.

Anti-Bot Scraping API in 2026: Skip Browser-First, Stay Fast

2026-05-14

An anti-bot scraping API that detects the block first, then escalates to a browser only when needed. Faster and cheaper, with clean markdown or JSON out.

How to evaluate web scraping APIs for AI agents

2026-05-12

A practical checklist for testing web scraping APIs on real agent and RAG workflows, not toy URLs like example.com.

Migrating from Firecrawl: compatible API for AI agents

2026-05-08

Already using Firecrawl? Learn how Firecrawl-compatible endpoints work, what to test before switching, and how to evaluate webclaw with your existing scrape and crawl calls.

Cloudflare Scraping Checklist: Diagnose the Block in 2026

2026-05-05

A checklist for Cloudflare scraping failures. What to log, what each signal means, and when to change fingerprint, session, rate limit, or render in a browser.

Cloudflare JA4 / JA3 Fingerprinting Explained (Why curl Gets 403)

2026-04-30

Cloudflare fingerprints your TLS and HTTP/2 handshake with JA3 and JA4 — that is why curl gets 403 and Chrome gets 200 on the same request. How browser-grade clients flip the result.

Cloudflare 403, 503, 1020, 1015: What Each Block Means

2026-04-28

Cloudflare 403, 503, 1020, 1015 each mean a different block. A decision tree to read the code, find the failing layer, and fix it. Includes error 1020.

Why Puppeteer Stealth Still Fails on Cloudflare (2026)

2026-04-24

puppeteer-extra-plugin-stealth still gets caught by Cloudflare in 2026. The network, request, and session signals that give it away, and what to run instead.

Cloudflare Turnstile in 2026: How It Works and What Bypasses It

2026-04-21

How Cloudflare Turnstile works in 2026 and what actually bypasses it. The four signals that decide pass or block: TLS, HTTP/2, token, session. No solver hype.

LlamaIndex Web Scraping: Fix SimpleWebPageReader

2026-04-17

LlamaIndex web scraping fails on blocks, empty shells, and noisy HTML. Feed cleaner markdown into RAG pipelines and agents.

LangChain web scraping in 2026: what loaders can't do

2026-04-14

LangChain's built-in loaders break on bot-protected sites and return raw HTML your LLM can't use. Here's how to get clean, reliable web data into any LangChain pipeline.

How to Scrape Google Search Results in 2026 (5 Ways)

2026-04-10

Google killed plain HTTP to search results. 5 ways that still work in 2026: TLS fingerprinting, headless browsers, SERP APIs. Code examples for each.

The 6 best web scraping APIs for LLMs in 2026

2026-04-07

If you're building with LLMs, you need web data. Here's how the main scraping APIs compare on the things that actually matter for AI use cases.

How to Bypass Cloudflare Bot Protection (2026, No Browser)

2026-04-02

Fix the four signals Cloudflare checks before you reach for a headless browser: TLS, HTTP/2, challenge, session. Why proxy and user-agent rotation alone fails.

Extract structured data from any URL in one call

2026-03-31

You don't always need the full page. Sometimes you need three fields from a product listing. Here's how to pull exactly the data you want from any URL.

Build a RAG pipeline with live web data (4 steps)

2026-03-27

Most RAG tutorials stop at "upload a PDF." Real apps need live web data. Here's how to build a pipeline that fetches, extracts, and indexes pages.

MCP web scraping for Claude Code and Cursor

2026-03-24

MCP web scraping gives Claude Code, Cursor, and AI agents live web access. Scrape, crawl, search, extract, and summarize from one server.

HTML to Markdown for LLMs and RAG

2026-03-20

Convert HTML to Markdown for LLMs with boilerplate removed, links preserved, and cleaner RAG input for agents and summarization.

Web scraping for AI agents: 3 hidden problems

2026-03-17

Most scraping tools were built for data pipelines, not AI agents. Three things quietly break your pipeline and how to fix them.

Why I built webclaw (Rust scraper for LLMs)

2026-03-12

I was tired of scrapers that return 403 or need headless Chrome for basic HTML. So I built one in Rust that actually works.
