Why Puppeteer stealth stopped working on Cloudflare
Your Puppeteer script did not suddenly become stupid.
The setup that worked in 2023 usually looked like this: Puppeteer, puppeteer-extra, puppeteer-extra-plugin-stealth, maybe a residential proxy, maybe headless: false if the site was touchy. You could get past a lot of Cloudflare pages because the obvious leaks were gone. navigator.webdriver was patched. HeadlessChrome disappeared from the user agent. navigator.plugins looked less empty. WebGL stopped screaming "headless browser".
In 2026 that is not enough. You still get a 403, an endless "Just a moment" loop, or a page that loads once and dies on the next navigation. For the broader playbook, read the pillar guide on how to bypass Cloudflare bot protection. This post is the narrower version: why the stealth plugin stopped being a reliable answer.
The short version: stealth plugins patch the browser surface. Cloudflare scores the whole request.
What the stealth plugin actually does
Start with the real package, not folklore.
`puppeteer-extra-plugin-stealth` describes itself as a plugin for Puppeteer Extra and Playwright Extra "to prevent detection." The current npm package is 2.11.2, published three years ago at the time of writing.
Its own changelog is useful because it tells you what class of problem the plugin was built for:
- `navigator.webdriver`
- `navigator.plugins` and MIME types
- `chrome.runtime`
- `webgl.vendor`
- `contentWindow` leaks
- `accept-language` behavior

Those are browser JavaScript fingerprints. They matter. If a detection script runs in the page and sees `navigator.webdriver === true`, you are done. If your headless browser has no plugins, a weird WebGL vendor, or a broken `chrome` object, you look like automation.
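To make that concrete, here is the shape of the in-page check this generation of detection scripts ran, and that the stealth plugin was built to defeat. This is a simplified sketch, not any vendor's actual code:

```javascript
// Simplified sketch of a 2023-era in-page bot check. Real detection scripts
// are obfuscated and check far more, but the shape is the same: read the
// browser object model, collect automation artifacts.
function looksAutomated(nav, win) {
  const flags = [];
  if (nav.webdriver === true) flags.push("webdriver");              // raw headless Chrome
  if ((nav.plugins?.length ?? 0) === 0) flags.push("no-plugins");   // empty plugin array
  if (!win.chrome || !win.chrome.runtime) flags.push("no-chrome");  // missing chrome object
  if (/HeadlessChrome/.test(nav.userAgent)) flags.push("headless-ua");
  return flags;
}
```

The stealth plugin's evasions map almost one-to-one onto checks like these, which is exactly why it worked: the check and the patch lived on the same surface.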
Stealth plugins made sense because a lot of early bot detection was exactly that: run JavaScript, inspect the browser object model, catch the obvious automation artifacts.
But look at what is missing from that list:

- TLS and HTTP/2 fingerprints such as JA3 and JA4
- Header order and Client Hints
- IP reputation and proxy type
- Cookies and session history
- Request pacing and which assets actually get loaded

That is the gap. The plugin is not useless. It is just solving one slice of a much larger scoring problem.
What changed on Cloudflare's side
Cloudflare does not publish every detection detail, and anyone pretending otherwise is selling certainty they do not have. What Cloudflare does publish is enough to explain why browser-only stealth became brittle.
Cloudflare's Bot detection engines docs describe several layers:

- Heuristics that catch known automation signatures at the edge
- A machine learning model scored on request features, headers, and session characteristics
- Anomaly detection against a site's normal traffic
- JavaScript detections that run in the visitor's browser
- The `__cf_bm` cookie that helps smooth the bot score for a user's request pattern

The machine learning layer is the important one. The model is not just asking, "does navigator.webdriver look normal?" It is asking whether the entire request behaves like a real browser session.
Cloudflare's Detection IDs docs are even more direct. They give an example of a detection ID catching a request where headers were sent in a different order than expected for the claimed browser. They also mention detection tags for things like Go traffic, which means Cloudflare is classifying traffic by the implementation fingerprints it observes, not just by what the user agent says.
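Header-order checks are cheap to reason about. A toy version, in the spirit of that detection-ID example: does the observed order match what the claimed browser family usually sends? The "expected" list below is illustrative, not Chrome's real canonical order:

```javascript
// Toy header-order check. A real system would keep per-browser, per-version
// canonical orders; this just verifies the observed headers appear as an
// order-preserving subsequence of an expected list.
const CHROME_LIKE_ORDER = [
  "host", "connection", "user-agent", "accept", "accept-encoding", "accept-language",
];

function headerOrderMatches(observed, expected = CHROME_LIKE_ORDER) {
  const filtered = observed
    .map((h) => h.toLowerCase())
    .filter((h) => expected.includes(h));
  let i = 0;
  for (const h of filtered) {
    const idx = expected.indexOf(h, i);
    if (idx === -1) return false; // header arrived out of the expected order
    i = idx + 1;
  }
  return true;
}
```

A client that sends `Accept` before `Host` while claiming to be Chrome fails this kind of check no matter how well its JavaScript surface is patched.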
And Cloudflare's JA4 Signals post explains the network side. JA3 was a hash of TLS ClientHello fields. JA4 is the newer fingerprint family that handles modern protocol behavior better, including ALPN and HTTP/2 context. Cloudflare says JA4 fingerprints and inter-request JA4 Signals are available in Firewall Rules, Bot Analytics, and Workers. The same post says Cloudflare analyzes more than 15 million unique JA4 fingerprints per day, built from more than 500 million user agents and billions of IP addresses.
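For intuition, the human-readable prefix of a JA4 fingerprint encodes exactly the kind of client facts a defender can correlate: protocol, TLS version, SNI use, cipher and extension counts, ALPN. A small parser for that prefix, assuming the published JA4 string layout (the two trailing segments are truncated SHA-256 digests of the cipher and extension lists):

```javascript
// Parse the human-readable "a" section of a JA4 fingerprint.
// Assumed layout per the JA4 spec: [protocol][tls-version][sni][ciphers][extensions][alpn]
// e.g. "t13d1516h2" = TCP/TLS, TLS 1.3, SNI present, 15 ciphers, 16 extensions, ALPN h2.
function parseJa4(ja4) {
  const [a, cipherHash, extensionHash] = ja4.split("_");
  return {
    protocol: a[0] === "q" ? "QUIC" : "TLS", // t = TCP/TLS, q = QUIC
    tlsVersion: a.slice(1, 3),               // "13" for TLS 1.3
    sni: a[3] === "d" ? "domain" : "ip",     // d = SNI sent, i = none
    cipherCount: Number(a.slice(4, 6)),
    extensionCount: Number(a.slice(6, 8)),
    alpn: a.slice(8),                        // first/last ALPN chars, e.g. "h2"
    cipherHash,                              // truncated sha256 of sorted ciphers
    extensionHash,                           // truncated sha256 of extensions + sig algs
  };
}
```

Every field in that struct is something your HTTP stack emits whether or not you patched `navigator.webdriver`.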
You do not need a leaked rulebook to see the direction of travel. Cloudflare moved from "does this browser object look fake?" toward "does this client, session, fingerprint, request order, and behavior fit the browser it claims to be?"
The 2023 equilibrium
The old stack worked because it was coherent enough.
A lot of Cloudflare-protected sites were not running the full bot management stack aggressively. Many were using a Managed Challenge, a basic WAF rule, Bot Fight Mode, or JavaScript checks that looked for obvious automation. If your scraper launched Chrome, kept a normal user agent, and patched the top JavaScript leaks, you often got through.
There was also less pressure. Before AI agents and RAG crawlers hit every pricing page and docs site on the internet, many site owners were not tuning bot rules every week. The average scraper was still Python requests with a fake user agent. Puppeteer plus stealth looked expensive and human by comparison.
That equilibrium broke because defenders got better and the volume changed.
Cloudflare now exposes bot management fields for JA3, JA4, detection IDs, JavaScript detection, bot score, verified bots, and session cookies. Their public docs talk about request features, headers, session characteristics, browser signals, and request pattern smoothing. This is a system built to correlate layers.
A stealth plugin is not built to correlate layers. It patches properties.
Why the failure is confusing
The annoying part is that Puppeteer with stealth still works sometimes.
That makes people debug the wrong thing. You change the proxy. You add --disable-blink-features=AutomationControlled. You switch headless modes. You spoof the locale. You add random waits. You try puppeteer-real-browser. You run it headed. One target works, the next one fails, then the first one fails again two days later.
That inconsistency is the signal.
If a site only checks navigator.webdriver, stealth helps. If the site has a loose Cloudflare config, stealth helps. If you already have a warm session with valid cookies, stealth helps keep you from tripping the next page-level script.
But if the block is coming from a mismatch across layers, another browser evasion does not touch it.
Common mismatches:
| Layer | What you claim | What Cloudflare can observe |
|---|---|---|
| User agent | Chrome on macOS | Linux container fonts, GPU, or WebGL |
| Locale | en-US | proxy exits from a different country |
| Browser | normal human session | no history, no cache, no aged cookies |
| Navigation | product page visit | direct deep link with no assets loaded |
| Headers | Chrome-like values | order or Client Hints do not match the browser |
| TLS / HTTP | browser-like client | JA4 or HTTP/2 behavior with odd global ratios |
You can make any one row look right. Cloudflare is looking at the table.
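You can run the same cross-checks on your own setup before Cloudflare does. A toy pre-flight version of that table, where the timezone mapping and rules are illustrative assumptions, not anything Cloudflare publishes:

```javascript
// Toy pre-flight coherence check over a scraper profile. The mappings and
// rules here are illustrative, not exhaustive.
const TZ_COUNTRY = {
  "America/New_York": "US",
  "Europe/Berlin": "DE",
  "Asia/Tokyo": "JP",
};

function coherenceIssues(profile) {
  const issues = [];
  if (profile.userAgent.includes("Macintosh") && profile.platform !== "MacIntel") {
    issues.push("user agent claims macOS but navigator.platform disagrees");
  }
  const tzCountry = TZ_COUNTRY[profile.timezone];
  if (tzCountry && tzCountry !== profile.proxyCountry) {
    issues.push(`timezone ${profile.timezone} implies ${tzCountry}, proxy exits in ${profile.proxyCountry}`);
  }
  if (!profile.acceptLanguage.startsWith(profile.locale)) {
    issues.push("Accept-Language does not lead with the claimed locale");
  }
  return issues;
}
```

If even this toy check flags your profile, a correlating scorer certainly can.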
A minimal repro that is honest
I am not going to put a fake benchmark table here. Cloudflare behavior changes by domain, plan, rule set, IP reputation, country, time, and session history. A "95% success rate" number without methodology is decoration.
What you can run is a small repro that tells you whether your target is being blocked before your scraper reaches useful content.
```javascript
import puppeteer from "puppeteer-extra";
import StealthPlugin from "puppeteer-extra-plugin-stealth";

puppeteer.use(StealthPlugin());

const url = process.argv[2];
if (!url) {
  console.error("Usage: node cf-check.js https://example.com");
  process.exit(1);
}

const browser = await puppeteer.launch({
  headless: "new",
  args: ["--no-sandbox"],
});

const page = await browser.newPage();
await page.goto(url, {
  waitUntil: "networkidle2",
  timeout: 45000,
});

const result = await page.evaluate(() => {
  const text = document.body?.innerText || "";
  const html = document.documentElement?.innerHTML || "";
  return {
    title: document.title,
    statusText: text.slice(0, 500),
    hasCfChallenge:
      html.includes("/cdn-cgi/challenge-platform/") ||
      html.includes("cf-turnstile") ||
      text.includes("Just a moment") ||
      text.includes("Checking your browser"),
    wordCount: text.trim().split(/\s+/).filter(Boolean).length,
  };
});

console.log(JSON.stringify(result, null, 2));
await browser.close();
```

Run it against a page you are allowed to test:
```bash
node cf-check.js https://target.example/page
```

If `hasCfChallenge` is true, stealth did not solve the challenge. If `wordCount` is tiny but your browser shows a real article or product page, you got a shell, a challenge, or a blocked variant. If it passes once and fails later, you are probably looking at score drift across session, IP, cookie, or behavior signals.
That is more useful than a yes-or-no "does stealth work?" test.
Why adding more plugins keeps losing
Most stealth fixes are local. They patch what page JavaScript can read from the browser.
Cloudflare's public material points to a distributed scoring system:

- Edge heuristics and machine learning scores
- JA3 and JA4 network fingerprints
- JavaScript detections running in the page
- The `__cf_bm` session cookie and request-pattern smoothing

The failure mode is no longer one missing property. It is incoherence.
That is why the "just add one more evasion" approach feels good for a day and then collapses. It can hide a new JavaScript leak. It cannot make a fresh container session have a believable history. It cannot make a noisy datacenter IP look residential. It cannot make a scraper that only requests HTML behave like a browser that loads CSS, fonts, images, XHR, and analytics. It cannot make 1,000 identical sessions across different proxies look like 1,000 different people.
Even when Puppeteer uses real Chrome for navigation, the surrounding system can still betray it.
The fix is architectural
For scraping, the default should not be "launch a browser and keep adding stealth."
The default should be:

- Start with a fingerprinted HTTP fetch, no browser process
- Detect challenges and JavaScript shells in the response
- Escalate to a real browser only when the page actually needs one
- Extract clean content at the end, not whatever HTML came back

That is the architecture webclaw uses.
The fast path is a fingerprinted fetch. No Chrome process. No DevTools session. No 300 MB browser just to read server-rendered HTML. If the response looks like a Cloudflare challenge, the router escalates. If the page is a JavaScript app and the content is missing, it escalates. If interaction is required, it uses browser mode.
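That escalation can be sketched as a pure decision function. This is an illustration of the routing idea, not webclaw's actual implementation, and the thresholds are arbitrary:

```javascript
// Sketch of a layered router: cheap path first, escalate only on evidence.
// attempt = { route: "http" | "browser", status: number, body: string }
function nextRoute(attempt) {
  const { route, status, body } = attempt;
  const challenged =
    body.includes("/cdn-cgi/challenge-platform/") || body.includes("cf-turnstile");
  if (route === "http" && (status === 403 || challenged)) return "browser";
  if (route === "http" && body.length < 500) return "browser"; // likely a JS app shell
  if (route === "browser" && challenged) return "browser+solver";
  return "done"; // response looks like real content on the current route
}
```

The useful property is that a challenge page is classified as a challenge, not silently treated as content.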
The point is not "browsers are bad." Browsers are great when you need a browser. The mistake is making Chrome the default transport for every URL, then pretending a stealth plugin can make every session coherent.
A webclaw version of the same scrape
The Puppeteer version asks you to manage Chrome, stealth, proxies, challenge detection, session state, retries, and extraction.
With webclaw, the request is boring:
```javascript
import { Webclaw } from "@webclaw/sdk";

const client = new Webclaw({
  apiKey: process.env.WEBCLAW_API_KEY,
});

const page = await client.scrape({
  url: "https://target.example/page",
  format: "llm",
});

console.log(page.markdown);
```

Or over HTTP:
```bash
curl -X POST https://api.webclaw.io/v1/scrape \
  -H "Authorization: Bearer $WEBCLAW_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://target.example/page",
    "formats": ["llm"],
    "only_main_content": true
  }'
```

The important part is not the SDK. It is the routing. A page that can be fetched with a coherent browser-grade HTTP profile should not pay the browser tax. A page that really needs JavaScript should get a browser. A page that returns a challenge should be detected as a challenge and retried through the right path.
That is also why webclaw returns clean markdown instead of "whatever HTML came back." A Cloudflare challenge page is not content. Treating it as content is how broken scrapers poison RAG indexes and agent memory.
Full endpoint reference is in the scrape API docs.
When Puppeteer stealth is still fine
Do not delete Puppeteer from your toolbox.
It is still useful when:

- The target genuinely needs a browser: heavy client-side rendering, logins, multi-step flows
- You need real interaction: clicks, form fills, file uploads, screenshots
- You are automating sites you own or have permission to test
- The protection is light and a normal session with the obvious leaks patched gets through

For those jobs, stealth can still reduce obvious headless artifacts.
Just do not mistake it for a Cloudflare bypass strategy. It is one layer of camouflage in a system that now scores multiple layers.
What to check before blaming Cloudflare
Before you rip out your scraper, check the basics:
| Check | Why it matters |
|---|---|
| Are you getting the real page text? | A 200 with a challenge body is still a failed scrape. |
| Does the HTML contain `/cdn-cgi/challenge-platform/`? | That usually means Cloudflare challenge code was served. |
| Does it contain `cf-turnstile` or `challenges.cloudflare.com/turnstile`? | That page is using Turnstile. Read the Turnstile guide. |
| Are cookies persisted between requests? | Fresh sessions on every page look automated. |
| Does proxy country match locale and timezone? | Cross-signal mismatches raise suspicion. |
| Are you loading only HTML? | Real browsers load assets. Scrapers often skip them. |
| Are you retrying too fast? | Humans do not reload a blocked page ten times in two seconds. |
If all of that looks clean and you still fail, the target probably needs a different route: stronger TLS impersonation, a warmed session, a residential exit, a challenge solver, or a real browser fallback.
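One item on that checklist is cheap to fix right now: retry pacing. A jittered exponential backoff is the standard technique; the base and cap below are arbitrary:

```javascript
// Jittered exponential backoff: avoids the "ten reloads in two seconds"
// pattern that no human produces. Returns a delay in milliseconds.
function backoffMs(attempt, baseMs = 2000, capMs = 60000) {
  const window = Math.min(capMs, baseMs * 2 ** attempt);
  // Half-jitter: land somewhere between half the window and the full window,
  // so repeated retries never fire on a fixed, mechanical schedule.
  return window / 2 + Math.random() * (window / 2);
}
```

Pair it with a hard retry cap: if a page still serves a challenge after a handful of paced attempts, the problem is the route, not the timing.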
The uncomfortable truth
There is no permanent "Cloudflare bypass."
There are only systems that stay coherent under current detection rules, and systems that drift until they get caught. Puppeteer stealth used to buy a lot of time because the obvious browser leaks were the main problem on many sites. Now the problem includes edge heuristics, machine learning, JA4, header order, JavaScript detections, cookies, and session behavior.
That is why your old setup stopped working even though your code did not change.
The web around it changed.
If you want to keep using Puppeteer, use it where a browser is genuinely required and treat stealth as one patch, not the platform. If you want reliable extraction from Cloudflare-protected pages, build around layered routing: fingerprinted HTTP first, challenge detection, browser fallback when needed, and clean extraction at the end.
That is what webclaw is for.
Read next: Cloudflare Turnstile in 2026 | Bypass Cloudflare bot protection | Web scraping for AI agents