Web scraping for LangChain

Drop-in replacement for WebBaseLoader with bot protection bypass.

LangChain is a widely used framework for building LLM applications in Python and TypeScript. webclaw plugs in as a document loader through its Firecrawl-compatible API, giving your LangChain agents access to real-time web data with automatic Cloudflare bypass.

Setup

LangChain Python — document loader

python

from langchain_community.document_loaders import FirecrawlLoader

# Point the Firecrawl loader at webclaw
loader = FirecrawlLoader(
    api_key="wc_...",
    api_url="https://api.webclaw.io",
    url="https://example.com",
    mode="scrape",
)

docs = loader.load()

# Feed into your vector store
# (`vectorstore` is assumed to be a LangChain vector store you have already initialized)
vectorstore.add_documents(docs)

Why webclaw for LangChain

Drop-in Firecrawl v2 API compatibility
118 ms on static pages, faster than browser-based loaders
Automatic Cloudflare, DataDome, AWS WAF bypass
LLM-optimized markdown cuts token costs ~90%

Common use cases

RAG pipelines with fresh web content
Document loaders with bot protection bypass
LangChain agents with real-time web tools
Multi-source research chains
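
For the multi-source case, one possible shape is a small helper that loads several URLs and records failures instead of aborting on the first one. This is only a sketch: `load_sources` and `webclaw_loader` are hypothetical names, and the loader arguments simply mirror the setup example above.

```python
from typing import Callable, Dict, Iterable, List, Tuple


def load_sources(urls: Iterable[str], make_loader: Callable) -> Tuple[List, Dict[str, str]]:
    """Load every URL with its own loader; collect documents and per-URL errors."""
    docs: List = []
    errors: Dict[str, str] = {}
    for url in urls:
        try:
            docs.extend(make_loader(url).load())
        except Exception as exc:  # keep going so one bad source does not stop the run
            errors[url] = str(exc)
    return docs, errors


def webclaw_loader(url: str):
    """Hypothetical factory: a FirecrawlLoader pointed at webclaw, as in the setup above."""
    # Imported lazily so load_sources can be used (and tested) without LangChain installed.
    from langchain_community.document_loaders import FirecrawlLoader

    return FirecrawlLoader(
        api_key="wc_...",
        api_url="https://api.webclaw.io",
        url=url,
        mode="scrape",
    )
```

Calling `load_sources(["https://example.com"], webclaw_loader)` returns the loaded documents together with a dictionary of any URLs that failed, so a research chain can decide whether to retry or proceed with partial results.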

Frequently asked questions

Do I need to change my LangChain code to use webclaw?

Not if you already use FirecrawlLoader. webclaw implements Firecrawl's v2 API, so the only change is configuration: point FirecrawlLoader at https://api.webclaw.io and pass your webclaw API key. The rest of your existing code works unchanged.
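
One way to keep that switch out of application code is to read the key and base URL from the environment. The sketch below assumes hypothetical variable names `WEBCLAW_API_KEY` and `WEBCLAW_API_URL`; neither is defined by webclaw or LangChain, and the helper name is illustrative.

```python
import os

DEFAULT_API_URL = "https://api.webclaw.io"


def webclaw_loader_kwargs(url: str, mode: str = "scrape") -> dict:
    """Build FirecrawlLoader keyword arguments from environment variables."""
    api_key = os.environ.get("WEBCLAW_API_KEY")  # hypothetical variable name
    if not api_key:
        raise RuntimeError("Set WEBCLAW_API_KEY before creating the loader")
    return {
        "api_key": api_key,
        # Hypothetical override, e.g. for pointing at a different deployment.
        "api_url": os.environ.get("WEBCLAW_API_URL", DEFAULT_API_URL),
        "url": url,
        "mode": mode,
    }
```

With this in place, the loader can be created as `FirecrawlLoader(**webclaw_loader_kwargs("https://example.com"))`, and the key never appears in source control.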

Can LangChain agents call webclaw as a tool?

Yes. Wrap the webclaw SDK or REST API as a LangChain Tool and register it with your agent. Your agent can then call scrape, crawl, search, and extract at runtime.
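
A minimal sketch of the REST route, limited to scraping, could look like the following. It assumes the endpoint follows Firecrawl's v2 conventions: a `POST /v2/scrape` route, a Bearer token header, and a `{"success": ..., "data": {"markdown": ...}}` response body. Check these details against the webclaw API reference before relying on them. The function names are illustrative.

```python
import json
import urllib.request

WEBCLAW_API_URL = "https://api.webclaw.io"  # same base URL as the loader setup
SCRAPE_PATH = "/v2/scrape"  # assumption: Firecrawl-style v2 scrape route


def build_scrape_request(url: str, api_key: str) -> urllib.request.Request:
    """Build the POST request for a single-page scrape that returns markdown."""
    payload = json.dumps({"url": url, "formats": ["markdown"]}).encode("utf-8")
    return urllib.request.Request(
        WEBCLAW_API_URL + SCRAPE_PATH,
        data=payload,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def extract_markdown(body: dict) -> str:
    """Return the markdown field from the assumed response shape, or raise on failure."""
    if not body.get("success", False):
        raise RuntimeError(f"scrape failed: {body.get('error', 'unknown error')}")
    return body.get("data", {}).get("markdown", "")


def make_scrape_tool(api_key: str):
    """Wrap the scrape call as a LangChain tool (requires langchain-core)."""
    # Imported lazily so the helpers above work without LangChain installed.
    from langchain_core.tools import StructuredTool

    def webclaw_scrape(url: str) -> str:
        """Fetch a web page through webclaw and return its content as markdown."""
        request = build_scrape_request(url, api_key)
        with urllib.request.urlopen(request, timeout=30) as response:
            return extract_markdown(json.load(response))

    return StructuredTool.from_function(
        func=webclaw_scrape,
        name="webclaw_scrape",
        description="Fetch a web page and return its content as markdown.",
    )
```

The object returned by `make_scrape_tool("wc_...")` can be included in the tools list you pass to your agent. Closing over the API key keeps it out of the tool's input schema, so the model only ever supplies the URL.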

Related guides

Other integrations

LlamaIndex CrewAI Mastra Claude Desktop Claude Code Codex Cursor n8n Zapier