From 2758526cdbe61a5ca2cee79b4f2f546f5ff8f642 Mon Sep 17 00:00:00 2001
From: ulziibay-kernel <253135130+ulziibay-kernel@users.noreply.github.com>
Date: Mon, 11 May 2026 17:25:21 +0000
Subject: [PATCH 1/4] Add tiered site difficulty index to FAQ

Replaces flat unsupported-websites list with a five-tier index covering very-hard through very-easy targets, with framing on how to interpret the tiers and a pointer to manual baselining.
---
 browsers/faq.mdx | 62 +++++++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 59 insertions(+), 3 deletions(-)

diff --git a/browsers/faq.mdx b/browsers/faq.mdx
index 47651e5..cd97986 100644
--- a/browsers/faq.mdx
+++ b/browsers/faq.mdx
@@ -19,13 +19,69 @@ If you're experiencing slower-than-expected browser creation times, review your
- Browsers persist independently of CDP. Depending on your timeout configuration, they will continue running even if the CDP connection closes. You can reconnect to the same `cdp_ws_url` if you're unexpectedly disconnected.
- We recommend implementing reconnect logic, as network interruptions or lifecycle events can cause CDP sessions to close. Detect disconnects and automatically re-establish a CDP connection when this occurs.
-## Unsupported Websites
+## Site difficulty index

-There are some websites that are not supported by Kernel browsers due to their restrictions around automation and associated bot detection. These include:
+Not all websites are equally hard to automate. The tiers below reflect how much friction we typically see when running Kernel browsers against each site — from "works out of the box" to "expect blocks even with stealth mode and a clean residential IP."
+
+This list is incomplete and will grow as we test more targets. Difficulty also shifts over time as sites change their defenses, so treat these as a starting point — always [run a manual baseline](/browsers/bot-detection/overview#getting-started) before automating.
+
+### Tier 5 — Very hard
+
+Aggressive anti-automation. 
Login and at-scale scraping are routinely blocked even with stealth mode, residential proxies, and warmed profiles. Expect hard blocks, account locks, or shadow bans.

- LinkedIn
- Facebook
- Instagram
+- TikTok
+- Zillow
+- Facebook Marketplace
+
+### Tier 4 — Hard
+
+Sophisticated fingerprinting and behavioral analysis. Reachable with stealth mode + careful pacing, but expect high CAPTCHA pressure and frequent challenges. Persistent [profiles](/auth/profiles) and stable IPs materially improve pass rates.
+
- X (Twitter)
-- Amazon
+- Google Search
+- Google Maps
+- Amazon (logged-in flows)
+- Booking.com
+- Airbnb
+- Glassdoor
+- Walmart
+
+### Tier 3 — Medium
+
+Real detection in place, but workable with Kernel defaults. Watch request frequency and avoid headless mode.
+
- Reddit
+- YouTube
+- Indeed
+- Yelp
+- Pinterest
+- Target
+- TripAdvisor
+- Crunchbase
+
+### Tier 2 — Easy
+
+Light protections. Most automations succeed with default Kernel settings; occasional rate limiting at scale.
+
+- eBay
+- Etsy
+- Medium
+- IMDb
+- Cars.com
+- Shopify storefronts
+
+### Tier 1 — Very easy
+
+Minimal or no anti-bot measures. Automation generally succeeds with default settings; as always, stay within each site's terms of service.
+
+- Wikipedia
+- GitHub
+- Yahoo Finance
+- Yellow Pages
+
+
+  Hitting friction on a site that should be easier than this list suggests? Check your [proxy type](/browsers/bot-detection/overview#choosing-a-proxy-type) and confirm you're not running headless — those are the two most common causes of unexpected detection.
+

From 6c3d853af8a8e423123f1f31c944daba1941b406 Mon Sep 17 00:00:00 2001
From: ulziibay-kernel <253135130+ulziibay-kernel@users.noreply.github.com>
Date: Mon, 11 May 2026 17:44:08 +0000
Subject: [PATCH 2/4] Replace tier guesses with measured block rates
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Reframes the site difficulty index around an N=5 stealth + US residential proxy test against each site's public homepage. 
Three groups (Hard / Light / Clear) ranked by observed block + challenge rate, with detection vendor noted per site. Adds a methodology section and explicit caveats that this is a floor, not a ceiling — login flows and at-scale behavior are out of scope for this benchmark. --- browsers/faq.mdx | 75 +++++++++++++++++------------------------------- 1 file changed, 27 insertions(+), 48 deletions(-) diff --git a/browsers/faq.mdx b/browsers/faq.mdx index cd97986..a2eb726 100644 --- a/browsers/faq.mdx +++ b/browsers/faq.mdx @@ -21,67 +21,46 @@ If you're experiencing slower-than-expected browser creation times, review your ## Site difficulty index -Not all websites are equally hard to automate. The tiers below reflect how much friction we typically see when running Kernel browsers against each site — from "works out of the box" to "expect blocks even with stealth mode and a clean residential IP." +Block rates for unauthenticated homepage visits from a stealth Kernel browser through a US residential proxy. Sites are sorted by observed difficulty. See [methodology](#methodology) for the test protocol and important caveats — in particular, these numbers reflect a single landing-page request, not login flows or at-scale scraping. -This list is incomplete and will grow as we test more targets. Difficulty also shifts over time as sites change their defenses, so treat these as a starting point — always [run a manual baseline](/browsers/bot-detection/overview#getting-started) before automating. +This list is incomplete and will grow as we run more tests. Last measured 2026-05-11. -### Tier 5 — Very hard +### Hard — significant friction observed -Aggressive anti-automation. Login and at-scale scraping are routinely blocked even with stealth mode, residential proxies, and warmed profiles. Expect hard blocks, account locks, or shadow bans. 
+| Site | Block rate | Detection vendor | +|------|-----------:|------------------| +| Yelp | 100% (5/5 blocked) | DataDome | +| Glassdoor | 100% (5/5 challenged) | Cloudflare | +| Indeed | 40% (2/5 challenged) | Cloudflare + Imperva | +| TripAdvisor | 40% (2/5 blocked) | DataDome | -- LinkedIn -- Facebook -- Instagram -- TikTok -- Zillow -- Facebook Marketplace +### Light — partial friction observed -### Tier 4 — Hard +| Site | Block rate | Detection vendor | +|------|-----------:|------------------| +| Yellow Pages | 20% (1/5 blocked) | Cloudflare | +| Zillow | 20% (1/5 challenged) | PerimeterX | -Sophisticated fingerprinting and behavioral analysis. Reachable with stealth mode + careful pacing, but high CAPTCHA pressure and frequent challenges. Persistent [profiles](/auth/profiles) and stable IPs materially improve pass rates. +### Clear — no blocks observed at this layer -- X (Twitter) -- Google Search -- Google Maps -- Amazon (logged-in flows) -- Booking.com -- Airbnb -- Glassdoor -- Walmart +All five sessions returned a usable page. These sites still deploy bot detection — login flows, deep navigation, and high-volume scraping behave very differently — but the public landing page renders cleanly. -### Tier 3 — Medium +Airbnb, Amazon, Booking.com, Cars.com, Crunchbase, eBay, Etsy, Facebook, Facebook Marketplace, GitHub, Google Maps, Google Search, IMDb, Instagram, LinkedIn, Medium, Pinterest, Reddit, Shopify storefronts (Gymshark), Target, TikTok, Walmart, Wikipedia, X (Twitter), Yahoo Finance, YouTube. -Real detection in place, but workable with Kernel defaults. Watch request frequency and avoid headless mode. +### Methodology -- Reddit -- YouTube -- Indeed -- Yelp -- Pinterest -- Target -- TripAdvisor -- Crunchbase +For each site, we open 5 concurrent stealth Kernel browser sessions through a US residential proxy and navigate to the public landing URL (e.g. `https://www.linkedin.com`). Each session uses a different exit IP. 
We then classify the response: -### Tier 2 — Easy +- **Success** — the expected page rendered, no detection signals tripped. +- **Challenged** — a visible CAPTCHA or "checking your browser" interstitial that requires action to proceed (e.g. Cloudflare Turnstile, hCaptcha, DataDome captcha). +- **Blocked** — a hard block page, 403/429 status, or vendor-branded "Access Denied" response. -Light protections. Most automations succeed with default Kernel settings; occasional rate limiting at scale. +Block rate combines blocked + challenged. Vendor labels reflect the bot-detection product whose signatures we matched. -- eBay -- Etsy -- Medium -- IMDb -- Cars.com -- Shopify storefronts - -### Tier 1 — Very easy - -Minimal or no anti-bot measures. Suitable for unrestricted automation under each site's terms. - -- Wikipedia -- GitHub -- Yahoo Finance -- Yellow Pages + + These results are a floor, not a ceiling. They tell you what the *easiest* automation case — one anonymous homepage visit — looks like. A site that scores 0% here can still be very hard once you add login, repeated requests from the same IP, deep navigation, or large concurrency. We plan to publish login-flow and at-scale benchmarks separately. + - Hitting friction on a site that should be easier than this list suggests? Check your [proxy type](/browsers/bot-detection/overview#choosing-a-proxy-type) and confirm you're not running headless — those are the two most common causes of unexpected detection. + Hitting friction on a site that scored clean here? Check your [proxy type](/browsers/bot-detection/overview#choosing-a-proxy-type) and confirm you're not running headless — those are the two most common causes of unexpected detection. 
From d1565a9aaef70a5efac5de3fb2ac0cc7079826d5 Mon Sep 17 00:00:00 2001 From: ulziibay-kernel <253135130+ulziibay-kernel@users.noreply.github.com> Date: Mon, 11 May 2026 17:46:18 +0000 Subject: [PATCH 3/4] Simplify difficulty index: drop per-site block-rate numbers Keeps Hard / Light / Clear grouping but drops the percentage tables in favor of plain lists. The methodology section still describes how sites get bucketed. --- browsers/faq.mdx | 51 ++++++++++++++++++++++++++++++++++++------------ 1 file changed, 38 insertions(+), 13 deletions(-) diff --git a/browsers/faq.mdx b/browsers/faq.mdx index a2eb726..cf5ab10 100644 --- a/browsers/faq.mdx +++ b/browsers/faq.mdx @@ -27,25 +27,50 @@ This list is incomplete and will grow as we run more tests. Last measured 2026-0 ### Hard — significant friction observed -| Site | Block rate | Detection vendor | -|------|-----------:|------------------| -| Yelp | 100% (5/5 blocked) | DataDome | -| Glassdoor | 100% (5/5 challenged) | Cloudflare | -| Indeed | 40% (2/5 challenged) | Cloudflare + Imperva | -| TripAdvisor | 40% (2/5 blocked) | DataDome | +Most or all sessions were blocked or challenged. + +- Yelp +- Glassdoor +- Indeed +- TripAdvisor ### Light — partial friction observed -| Site | Block rate | Detection vendor | -|------|-----------:|------------------| -| Yellow Pages | 20% (1/5 blocked) | Cloudflare | -| Zillow | 20% (1/5 challenged) | PerimeterX | +Some sessions were blocked or challenged. -### Clear — no blocks observed at this layer +- Yellow Pages +- Zillow -All five sessions returned a usable page. These sites still deploy bot detection — login flows, deep navigation, and high-volume scraping behave very differently — but the public landing page renders cleanly. 
+### Clear — no blocks observed at this layer -Airbnb, Amazon, Booking.com, Cars.com, Crunchbase, eBay, Etsy, Facebook, Facebook Marketplace, GitHub, Google Maps, Google Search, IMDb, Instagram, LinkedIn, Medium, Pinterest, Reddit, Shopify storefronts (Gymshark), Target, TikTok, Walmart, Wikipedia, X (Twitter), Yahoo Finance, YouTube. +All sessions returned a usable page. These sites still deploy bot detection — login flows, deep navigation, and high-volume scraping behave very differently — but the public landing page renders cleanly. + +- Airbnb +- Amazon +- Booking.com +- Cars.com +- Crunchbase +- eBay +- Etsy +- Facebook +- Facebook Marketplace +- GitHub +- Google Maps +- Google Search +- IMDb +- Instagram +- LinkedIn +- Medium +- Pinterest +- Reddit +- Shopify storefronts (Gymshark) +- Target +- TikTok +- Walmart +- Wikipedia +- X (Twitter) +- Yahoo Finance +- YouTube ### Methodology From c55e4a66c38722e6e55c0a794f9f5c97fe445275 Mon Sep 17 00:00:00 2001 From: ulziibay-kernel <253135130+ulziibay-kernel@users.noreply.github.com> Date: Mon, 11 May 2026 17:46:49 +0000 Subject: [PATCH 4/4] Drop methodology section from difficulty index --- browsers/faq.mdx | 20 ++------------------ 1 file changed, 2 insertions(+), 18 deletions(-) diff --git a/browsers/faq.mdx b/browsers/faq.mdx index cf5ab10..6836d97 100644 --- a/browsers/faq.mdx +++ b/browsers/faq.mdx @@ -21,9 +21,7 @@ If you're experiencing slower-than-expected browser creation times, review your ## Site difficulty index -Block rates for unauthenticated homepage visits from a stealth Kernel browser through a US residential proxy. Sites are sorted by observed difficulty. See [methodology](#methodology) for the test protocol and important caveats — in particular, these numbers reflect a single landing-page request, not login flows or at-scale scraping. - -This list is incomplete and will grow as we run more tests. Last measured 2026-05-11. 
+A rough grouping of sites by how much friction we see when running stealth Kernel browsers against the public landing page. This list is incomplete and will grow over time. ### Hard — significant friction observed @@ -72,20 +70,6 @@ All sessions returned a usable page. These sites still deploy bot detection — - Yahoo Finance - YouTube -### Methodology - -For each site, we open 5 concurrent stealth Kernel browser sessions through a US residential proxy and navigate to the public landing URL (e.g. `https://www.linkedin.com`). Each session uses a different exit IP. We then classify the response: - -- **Success** — the expected page rendered, no detection signals tripped. -- **Challenged** — a visible CAPTCHA or "checking your browser" interstitial that requires action to proceed (e.g. Cloudflare Turnstile, hCaptcha, DataDome captcha). -- **Blocked** — a hard block page, 403/429 status, or vendor-branded "Access Denied" response. - -Block rate combines blocked + challenged. Vendor labels reflect the bot-detection product whose signatures we matched. - - - These results are a floor, not a ceiling. They tell you what the *easiest* automation case — one anonymous homepage visit — looks like. A site that scores 0% here can still be very hard once you add login, repeated requests from the same IP, deep navigation, or large concurrency. We plan to publish login-flow and at-scale benchmarks separately. - - - Hitting friction on a site that scored clean here? Check your [proxy type](/browsers/bot-detection/overview#choosing-a-proxy-type) and confirm you're not running headless — those are the two most common causes of unexpected detection. + Hitting friction on a site that's listed under Clear? Check your [proxy type](/browsers/bot-detection/overview#choosing-a-proxy-type) and confirm you're not running headless — those are the two most common causes of unexpected detection.
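
---

Reviewer note: the Hard / Light / Clear buckets kept by patches 3 and 4 follow directly from the per-session outcomes patch 2 measured. A minimal sketch of that bucketing, assuming the outcome labels from the patch-2 methodology ("success" / "challenged" / "blocked") and a hypothetical 40% "hard" threshold inferred from the lowest rate listed under Hard — this is an illustration, not Kernel's actual benchmark code:

```python
# Hedged sketch, not Kernel's benchmark code. Outcome labels follow the
# patch-2 methodology; the 40% Hard threshold is an assumption inferred
# from the lowest block rate listed under Hard (Indeed, TripAdvisor: 2/5).

BLOCKED_OUTCOMES = {"blocked", "challenged"}  # block rate combines both


def block_rate(outcomes):
    """Fraction of sessions that were blocked or challenged."""
    if not outcomes:
        raise ValueError("no sessions recorded")
    return sum(o in BLOCKED_OUTCOMES for o in outcomes) / len(outcomes)


def bucket(outcomes, hard_threshold=0.4):
    """Map one site's session outcomes to a Hard / Light / Clear group."""
    rate = block_rate(outcomes)
    if rate == 0:
        return "Clear"  # every session returned a usable page
    return "Hard" if rate >= hard_threshold else "Light"


# Indeed's patch-2 result: 2 of 5 sessions challenged.
print(bucket(["success", "challenged", "success", "challenged", "success"]))
```

Under these assumptions, Indeed's 2/5 challenged sessions land in Hard, a single blocked session out of five lands in Light (matching Yellow Pages and Zillow), and five clean sessions land in Clear.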