Why Puppeteer stealth stopped working on Cloudflare
Your Puppeteer script did not suddenly become stupid.
The setup that worked in 2023 usually looked like this: Puppeteer, puppeteer-extra, puppeteer-extra-plugin-stealth, maybe a residential proxy, maybe headless: false if the site was touchy. You could get past a lot of Cloudflare pages because the obvious leaks were gone. navigator.webdriver was patched. HeadlessChrome disappeared from the user agent. navigator.plugins looked less empty. WebGL stopped screaming "headless browser".
In 2026 that is not enough. You still get a 403, an endless "Just a moment" loop, or a page that loads once and dies on the next navigation. For the broader playbook, read the pillar guide on how to bypass Cloudflare bot protection. This post is the narrower version: why the stealth plugin stopped being a reliable answer.
The short version: stealth plugins patch the browser surface. Cloudflare scores the whole request.
What the stealth plugin actually does
Start with the real package, not folklore.
`puppeteer-extra-plugin-stealth` describes itself as a plugin for Puppeteer Extra and Playwright Extra "to prevent detection." The current npm package is 2.11.2, published three years ago at the time of writing.
Its own changelog is useful because it tells you what class of problem the plugin was built for:
- `navigator.webdriver`
- `navigator.plugins` and MIME types
- `chrome.runtime`
- `webgl.vendor`
- `contentWindow` leaks
- `accept-language` behavior

Those are browser JavaScript fingerprints. They matter. If a detection script runs in the page and sees `navigator.webdriver === true`, you are done. If your headless browser has no plugins, a weird WebGL vendor, or a broken `chrome` object, you look like automation.
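To make that concrete, here is the shape of the in-page check this generation of detection scripts ran, and that the stealth plugin was built to defeat. This is a simplified sketch, not any vendor's actual code:

```javascript
// Simplified sketch of a 2023-era in-page bot check. Real detection scripts
// are obfuscated and check far more, but the shape is the same: read the
// browser object model, collect automation artifacts.
function looksAutomated(nav, win) {
  const flags = [];
  if (nav.webdriver === true) flags.push("webdriver");              // raw headless Chrome
  if ((nav.plugins?.length ?? 0) === 0) flags.push("no-plugins");   // empty plugin array
  if (!win.chrome || !win.chrome.runtime) flags.push("no-chrome");  // missing chrome object
  if (/HeadlessChrome/.test(nav.userAgent)) flags.push("headless-ua");
  return flags;
}
```

The stealth plugin's evasions map almost one-to-one onto checks like these, which is exactly why it worked: the check and the patch lived on the same surface.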
Stealth plugins made sense because a lot of early bot detection was exactly that: run JavaScript, inspect the browser object model, catch the obvious automation artifacts.
But look at what is missing from that list:

- TLS and HTTP/2 fingerprints such as JA3 and JA4
- Header order and Client Hints
- IP reputation and proxy type
- Cookies and session history
- Request pacing and which assets actually get loaded

That is the gap. The plugin is not useless. It is just solving one slice of a much larger scoring problem.
What changed on Cloudflare's side
Cloudflare does not publish every detection detail, and anyone pretending otherwise is selling certainty they do not have. What Cloudflare does publish is enough to explain why browser-only stealth became brittle.
Cloudflare's Bot detection engines docs describe several layers:

- Heuristics that catch known automation signatures at the edge
- A machine learning model scored on request features, headers, and session characteristics
- Anomaly detection against a site's normal traffic
- JavaScript detections that run in the visitor's browser
- The `__cf_bm` cookie that helps smooth the bot score for a user's request pattern

The machine learning layer is the important one. The model is not just asking, "does navigator.webdriver look normal?" It is asking whether the entire request behaves like a real browser session.
Cloudflare's Detection IDs docs are even more direct. They give an example of a detection ID catching a request where headers were sent in a different order than expected for the claimed browser. They also mention detection tags for things like Go traffic, which means Cloudflare is classifying traffic by the implementation fingerprints it observes, not just by what the user agent says.
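Header-order checks are cheap to reason about. A toy version, in the spirit of that detection-ID example: does the observed order match what the claimed browser family usually sends? The "expected" list below is illustrative, not Chrome's real canonical order:

```javascript
// Toy header-order check. A real system would keep per-browser, per-version
// canonical orders; this just verifies the observed headers appear as an
// order-preserving subsequence of an expected list.
const CHROME_LIKE_ORDER = [
  "host", "connection", "user-agent", "accept", "accept-encoding", "accept-language",
];

function headerOrderMatches(observed, expected = CHROME_LIKE_ORDER) {
  const filtered = observed
    .map((h) => h.toLowerCase())
    .filter((h) => expected.includes(h));
  let i = 0;
  for (const h of filtered) {
    const idx = expected.indexOf(h, i);
    if (idx === -1) return false; // header arrived out of the expected order
    i = idx + 1;
  }
  return true;
}
```

A client that sends `Accept` before `Host` while claiming to be Chrome fails this kind of check no matter how well its JavaScript surface is patched.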
And Cloudflare's JA4 Signals post explains the network side. JA3 was a hash of TLS ClientHello fields. JA4 is the newer fingerprint family that handles modern protocol behavior better, including ALPN and HTTP/2 context. Cloudflare says JA4 fingerprints and inter-request JA4 Signals are available in Firewall Rules, Bot Analytics, and Workers. The same post says Cloudflare analyzes more than 15 million unique JA4 fingerprints per day, built from more than 500 million user agents and billions of IP addresses.
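For intuition, the human-readable prefix of a JA4 fingerprint encodes exactly the kind of client facts a defender can correlate: protocol, TLS version, SNI use, cipher and extension counts, ALPN. A small parser for that prefix, assuming the published JA4 string layout (the two trailing segments are truncated SHA-256 digests of the cipher and extension lists):

```javascript
// Parse the human-readable "a" section of a JA4 fingerprint.
// Assumed layout per the JA4 spec: [protocol][tls-version][sni][ciphers][extensions][alpn]
// e.g. "t13d1516h2" = TCP/TLS, TLS 1.3, SNI present, 15 ciphers, 16 extensions, ALPN h2.
function parseJa4(ja4) {
  const [a, cipherHash, extensionHash] = ja4.split("_");
  return {
    protocol: a[0] === "q" ? "QUIC" : "TLS", // t = TCP/TLS, q = QUIC
    tlsVersion: a.slice(1, 3),               // "13" for TLS 1.3
    sni: a[3] === "d" ? "domain" : "ip",     // d = SNI sent, i = none
    cipherCount: Number(a.slice(4, 6)),
    extensionCount: Number(a.slice(6, 8)),
    alpn: a.slice(8),                        // first/last ALPN chars, e.g. "h2"
    cipherHash,                              // truncated sha256 of sorted ciphers
    extensionHash,                           // truncated sha256 of extensions + sig algs
  };
}
```

Every field in that struct is something your HTTP stack emits whether or not you patched `navigator.webdriver`.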
You do not need a leaked rulebook to see the direction of travel. Cloudflare moved from "does this browser object look fake?" toward "does this client, session, fingerprint, request order, and behavior fit the browser it claims to be?"
The 2023 equilibrium
The old stack worked because it was coherent enough.
A lot of Cloudflare-protected sites were not running the full bot management stack aggressively. Many were using a Managed Challenge, a basic WAF rule, Bot Fight Mode, or JavaScript checks that looked for obvious automation. If your scraper launched Chrome, kept a normal user agent, and patched the top JavaScript leaks, you often got through.
There was also less pressure. Before AI agents and RAG crawlers hit every pricing page and docs site on the internet, many site owners were not tuning bot rules every week. The average scraper was still Python requests with a fake user agent. Puppeteer plus stealth looked expensive and human by comparison.
That equilibrium broke because defenders got better and the volume changed.
Cloudflare now exposes bot management fields for JA3, JA4, detection IDs, JavaScript detection, bot score, verified bots, and session cookies. Their public docs talk about request features, headers, session characteristics, browser signals, and request pattern smoothing. This is a system built to correlate layers.
A stealth plugin is not built to correlate layers. It patches properties.
Why the failure is confusing
The annoying part is that Puppeteer with stealth still works sometimes.
That makes people debug the wrong thing. You change the proxy. You add --disable-blink-features=AutomationControlled. You switch headless modes. You spoof the locale. You add random waits. You try puppeteer-real-browser. You run it headed. One target works, the next one fails, then the first one fails again two days later.
That inconsistency is the signal.
If a site only checks navigator.webdriver, stealth helps. If the site has a loose Cloudflare config, stealth helps. If you already have a warm session with valid cookies, stealth helps keep you from tripping the next page-level script.
But if the block is coming from a mismatch across layers, another browser evasion does not touch it.
Common mismatches:
| Layer | What you claim | What Cloudflare can observe |
|---|---|---|
| User agent | Chrome on macOS | Linux container fonts, GPU, or WebGL |
| Locale | en-US | proxy exits from a different country |
| Browser | normal human session | no history, no cache, no aged cookies |
| Navigation | product page visit | direct deep link with no assets loaded |
| Headers | Chrome-like values | order or Client Hints do not match the browser |
| TLS / HTTP | browser-like client | JA4 or HTTP/2 behavior with odd global ratios |
You can make any one row look right. Cloudflare is looking at the table.
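You can run the same cross-checks on your own setup before Cloudflare does. A toy pre-flight version of that table, where the timezone mapping and rules are illustrative assumptions, not anything Cloudflare publishes:

```javascript
// Toy pre-flight coherence check over a scraper profile. The mappings and
// rules here are illustrative, not exhaustive.
const TZ_COUNTRY = {
  "America/New_York": "US",
  "Europe/Berlin": "DE",
  "Asia/Tokyo": "JP",
};

function coherenceIssues(profile) {
  const issues = [];
  if (profile.userAgent.includes("Macintosh") && profile.platform !== "MacIntel") {
    issues.push("user agent claims macOS but navigator.platform disagrees");
  }
  const tzCountry = TZ_COUNTRY[profile.timezone];
  if (tzCountry && tzCountry !== profile.proxyCountry) {
    issues.push(`timezone ${profile.timezone} implies ${tzCountry}, proxy exits in ${profile.proxyCountry}`);
  }
  if (!profile.acceptLanguage.startsWith(profile.locale)) {
    issues.push("Accept-Language does not lead with the claimed locale");
  }
  return issues;
}
```

If even this toy check flags your profile, a correlating scorer certainly can.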
A minimal repro that is honest
I am not going to put a fake benchmark table here. Cloudflare behavior changes by domain, plan, rule set, IP reputation, country, time, and session history. A "95% success rate" number without methodology is decoration.
What you can run is a small repro that tells you whether your target is being blocked before your scraper reaches useful content.
```javascript
import puppeteer from "puppeteer-extra";
import StealthPlugin from "puppeteer-extra-plugin-stealth";

puppeteer.use(StealthPlugin());

const url = process.argv[2];
if (!url) {
  console.error("Usage: node cf-check.js https://example.com");
  process.exit(1);
}

const browser = await puppeteer.launch({
  headless: "new",
  args: ["--no-sandbox"],
});

const page = await browser.newPage();
await page.goto(url, {
  waitUntil: "networkidle2",
  timeout: 45000,
});

const result = await page.evaluate(() => {
  const text = document.body?.innerText || "";
  const html = document.documentElement?.innerHTML || "";
  return {
    title: document.title,
    statusText: text.slice(0, 500),
    hasCfChallenge:
      html.includes("/cdn-cgi/challenge-platform/") ||
      html.includes("cf-turnstile") ||
      text.includes("Just a moment") ||
      text.includes("Checking your browser"),
    wordCount: text.trim().split(/\s+/).filter(Boolean).length,
  };
});

console.log(JSON.stringify(result, null, 2));
await browser.close();
```

Run it against a page you are allowed to test:
```bash
node cf-check.js https://target.example/page
```

If `hasCfChallenge` is true, stealth did not solve the challenge. If `wordCount` is tiny but your browser shows a real article or product page, you got a shell, a challenge, or a blocked variant. If it passes once and fails later, you are probably looking at score drift across session, IP, cookie, or behavior signals.
That is more useful than a yes-or-no "does stealth work?" test.
Why adding more plugins keeps losing
Most stealth fixes are local. They patch what page JavaScript can read from the browser.
Cloudflare's public material points to a distributed scoring system:

- Edge heuristics and machine learning scores
- JA3 and JA4 network fingerprints
- JavaScript detections running in the page
- The `__cf_bm` session cookie and request-pattern smoothing

The failure mode is no longer one missing property. It is incoherence.
That is why the "just add one more evasion" approach feels good for a day and then collapses. It can hide a new JavaScript leak. It cannot make a fresh container session have a believable history. It cannot make a noisy datacenter IP look residential. It cannot make a scraper that only requests HTML behave like a browser that loads CSS, fonts, images, XHR, and analytics. It cannot make 1,000 identical sessions across different proxies look like 1,000 different people.
Even when Puppeteer uses real Chrome for navigation, the surrounding system can still betray it.
The fix is architectural
For scraping, the default should not be "launch a browser and keep adding stealth."
The default should be:

- Start with a fingerprinted HTTP fetch, no browser process
- Detect challenges and JavaScript shells in the response
- Escalate to a real browser only when the page actually needs one
- Extract clean content at the end, not whatever HTML came back

That is the architecture webclaw uses.
The fast path is a fingerprinted fetch. No Chrome process. No DevTools session. No 300 MB browser just to read server-rendered HTML. If the response looks like a Cloudflare challenge, the router escalates. If the page is a JavaScript app and the content is missing, it escalates. If interaction is required, it uses browser mode.
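That escalation can be sketched as a pure decision function. This is an illustration of the routing idea, not webclaw's actual implementation, and the thresholds are arbitrary:

```javascript
// Sketch of a layered router: cheap path first, escalate only on evidence.
// attempt = { route: "http" | "browser", status: number, body: string }
function nextRoute(attempt) {
  const { route, status, body } = attempt;
  const challenged =
    body.includes("/cdn-cgi/challenge-platform/") || body.includes("cf-turnstile");
  if (route === "http" && (status === 403 || challenged)) return "browser";
  if (route === "http" && body.length < 500) return "browser"; // likely a JS app shell
  if (route === "browser" && challenged) return "browser+solver";
  return "done"; // response looks like real content on the current route
}
```

The useful property is that a challenge page is classified as a challenge, not silently treated as content.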
The point is not "browsers are bad." Browsers are great when you need a browser. The mistake is making Chrome the default transport for every URL, then pretending a stealth plugin can make every session coherent.
A webclaw version of the same scrape
The Puppeteer version asks you to manage Chrome, stealth, proxies, challenge detection, session state, retries, and extraction.
With webclaw, the request is boring:
```javascript
import { Webclaw } from "@webclaw/sdk";

const client = new Webclaw({
  apiKey: process.env.WEBCLAW_API_KEY,
});

const page = await client.scrape({
  url: "https://target.example/page",
  format: "llm",
});

console.log(page.markdown);
```

Or over HTTP:
```bash
curl -X POST https://api.webclaw.io/v1/scrape \
  -H "Authorization: Bearer $WEBCLAW_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://target.example/page",
    "formats": ["llm"],
    "only_main_content": true
  }'
```

The important part is not the SDK. It is the routing. A page that can be fetched with a coherent browser-grade HTTP profile should not pay the browser tax. A page that really needs JavaScript should get a browser. A page that returns a challenge should be detected as a challenge and retried through the right path.
That is also why webclaw returns clean markdown instead of "whatever HTML came back." A Cloudflare challenge page is not content. Treating it as content is how broken scrapers poison RAG indexes and agent memory.
Full endpoint reference is in the scrape API docs.
When Puppeteer stealth is still fine
Do not delete Puppeteer from your toolbox.
It is still useful when:

- The target genuinely needs a browser: heavy client-side rendering, logins, multi-step flows
- You need real interaction: clicks, form fills, file uploads, screenshots
- You are automating sites you own or have permission to test
- The protection is light and a normal session with the obvious leaks patched gets through

For those jobs, stealth can still reduce obvious headless artifacts.
Just do not mistake it for a Cloudflare bypass strategy. It is one layer of camouflage in a system that now scores multiple layers.
What to check before blaming Cloudflare
Before you rip out your scraper, check the basics:
| Check | Why it matters |
|---|---|
| Are you getting the real page text? | A 200 with a challenge body is still a failed scrape. |
| Does the HTML contain `/cdn-cgi/challenge-platform/`? | That usually means Cloudflare challenge code was served. |
| Does it contain `cf-turnstile` or `challenges.cloudflare.com/turnstile`? | That page is using Turnstile. Read the Turnstile guide. |
| Are cookies persisted between requests? | Fresh sessions on every page look automated. |
| Does proxy country match locale and timezone? | Cross-signal mismatches raise suspicion. |
| Are you loading only HTML? | Real browsers load assets. Scrapers often skip them. |
| Are you retrying too fast? | Humans do not reload a blocked page ten times in two seconds. |
If all of that looks clean and you still fail, the target probably needs a different route: stronger TLS impersonation, a warmed session, a residential exit, a challenge solver, or a real browser fallback.
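One item on that checklist is cheap to fix right now: retry pacing. A jittered exponential backoff is the standard technique; the base and cap below are arbitrary:

```javascript
// Jittered exponential backoff: avoids the "ten reloads in two seconds"
// pattern that no human produces. Returns a delay in milliseconds.
function backoffMs(attempt, baseMs = 2000, capMs = 60000) {
  const window = Math.min(capMs, baseMs * 2 ** attempt);
  // Half-jitter: land somewhere between half the window and the full window,
  // so repeated retries never fire on a fixed, mechanical schedule.
  return window / 2 + Math.random() * (window / 2);
}
```

Pair it with a hard retry cap: if a page still serves a challenge after a handful of paced attempts, the problem is the route, not the timing.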
The uncomfortable truth
There is no permanent "Cloudflare bypass."
There are only systems that stay coherent under current detection rules, and systems that drift until they get caught. Puppeteer stealth used to buy a lot of time because the obvious browser leaks were the main problem on many sites. Now the problem includes edge heuristics, machine learning, JA4, header order, JavaScript detections, cookies, and session behavior.
That is why your old setup stopped working even though your code did not change.
The web around it changed.
If you want to keep using Puppeteer, use it where a browser is genuinely required and treat stealth as one patch, not the platform. If you want reliable extraction from Cloudflare-protected pages, build around layered routing: fingerprinted HTTP first, challenge detection, browser fallback when needed, and clean extraction at the end.
That is what webclaw is for.
Read next: Cloudflare Turnstile in 2026 | Bypass Cloudflare bot protection | Web scraping for AI agents