Blog

Web extraction, LLMs, and building in public.

Technical deep dives on web extraction, content parsing for LLMs, anti-bot bypass, and building open-source infrastructure in Rust. Written by the team behind webclaw.

webclaw turns any website into clean, structured content for AI applications. These posts cover the engineering decisions, trade-offs, and lessons learned building a web extraction toolkit from scratch.

27 posts · Page 3 / 3
How to Scrape Google Search Results in 2026 (5 Ways)
Apr 10, 2026 · Massi

Google now blocks plain HTTP requests to its search results. Here are 5 ways that still work in 2026, including TLS fingerprinting, headless browsers, and SERP APIs, with code examples for each.

The 6 best web scraping APIs for LLMs in 2026
Apr 7, 2026 · Massi

If you're building with LLMs, you need web data. Here's how the main scraping APIs compare on the things that actually matter for AI use cases.

How to Bypass Cloudflare Bot Protection (2026, No Browser)
Apr 2, 2026 · Massi

Fix the four signals Cloudflare checks before you reach for a headless browser: TLS, HTTP/2, challenge, session. Why proxy and user-agent rotation alone fails.

Extract structured data from any URL in one call
Mar 31, 2026 · Massi

You don't always need the full page. Sometimes you need three fields from a product listing. Here's how to pull exactly the data you want from any URL.

Build a RAG pipeline with live web data (4 steps)
Mar 27, 2026 · Massi

Most RAG tutorials stop at "upload a PDF." Real apps need live web data. Here's how to build a pipeline that fetches, extracts, and indexes pages.

MCP web scraping for Claude Code and Cursor
Mar 24, 2026 · Massi

MCP web scraping gives Claude Code, Cursor, and AI agents live web access. Scrape, crawl, search, extract, and summarize from one server.

HTML to Markdown for LLMs and RAG
Mar 20, 2026 · Massi

Convert HTML to Markdown for LLMs with boilerplate removed, links preserved, and cleaner RAG input for agents and summarization.

Web scraping for AI agents: 3 hidden problems
Mar 17, 2026 · Massi

Most scraping tools were built for data pipelines, not AI agents. Here are three things that quietly break your pipeline, and how to fix them.

Why I built webclaw (Rust scraper for LLMs)
Mar 12, 2026 · Massi

I was tired of scrapers that return 403 or need headless Chrome for basic HTML. So I built one in Rust that actually works.
