feat: fix ghost RUM key false positive in wwwUrlResolver by verifying bundle data before accepting toggled hostname#1657
feat: fix ghost RUM key false positive in wwwUrlResolver by verifying bundle data before accepting toggled hostname#1657tkotthakota-adobe wants to merge 1 commit into
Conversation
…e data before accepting toggled hostname
ramboz
left a comment
There was a problem hiding this comment.
Summary
This PR fixes a real correctness bug: wwwUrlResolver treated the mere existence of a RUM domainkey as proof a domain was right, so "ghost keys" (registered but data-less domains) caused audits to query the wrong hostname. The fix adds a hasBundleData probe that confirms real bundle data exists behind the key before accepting the www-toggled hostname. The approach is sound and the diff is small, well-tested, and follows existing conventions. I verified locally that all 1160 tests pass and url-helpers.js is at 100% coverage. My main concern is that the probe only checks today's UTC date, which can introduce the mirror-image false negative for legitimate-but-low-traffic domains. No hard blockers — ready to merge with a few fixes.
Perspectives applied: Architecture, SRE/Reliability, Security, QA/Testing, Product/Consumer, Engineering Practices. Skipped LLM/AI (no LLM code).
Strengths
- Genuinely fixes the described bug. Probing for actual bundle data is the right disambiguator between a real domain and a ghost key, and the fall-through to the original hostname is preserved cleanly.
- The new code paths are actually tested. The three new tests cover the ghost-key (empty
rumBundles), non-200, and network-error cases — and the ghost-key test asserts bothretrieveDomainkeycalls and the resulting debug log, not just the return value. That's behavior verification, not line-hitting. - Conventions respected. The URL path shape (
/bundles/{host}/{Y}/{M}/{D}?domainkey=) and thegetUTCFullYear()/getUTCMonth()+1/padStartdate logic mirror the canonicalconstructUrlinrum-api-client'srum-bundler-client.js. The module's own configured@adobe/fetchis used (not globalfetch), so HTTP/1.1 test forcing +nockinterception work correctly. - Docs kept in sync. The JSDoc was updated to reflect the new behavior and the added
warn()logger dependency. No public API surface changed (hasBundleDatais module-internal;wwwUrlResolver's signature is unchanged), so no.d.tsupdate is needed.
Issues
Should Fix
S1. Probe only checks today's UTC date — risks the mirror-image false negative — url-helpers.js, hasBundleData
The probe URL is built from new Date() and queries a single day: /bundles/{host}/{year}/{month}/{day}. But real RUM consumers (CWV, traffic, etc.) fetch a date range — see generateUrlsForDateRange/constructUrl in rum-bundler-client.js. A single-day window is the narrowest possible check and produces false negatives in two realistic cases:
- Early in the UTC day, a legitimate domain may have few/zero bundles for the new day yet.
- Low-traffic domains may legitimately have no data for any given single day while having plenty across a week.
When that happens and the non-toggled hostname also has a key (common — both www and apex are frequently registered), the resolver now falls through and returns the wrong hostname — exactly the failure mode this PR sets out to fix, just in the other direction. Consider probing a small window (e.g. yesterday + today, or the last N days) and accepting if any day has bundles:
// check a couple of days to absorb the UTC-boundary / low-traffic edge
for (const d of [0, 1]) {
const day = new Date(now); day.setUTCDate(now.getUTCDate() - d);
// build URL for `day`; return true on first non-empty rumBundles
}Perspective: SRE / Product-Consumer
S2. The two new /* c8 ignore */ comments are unnecessary and mask covered code — url-helpers.js (catch block in hasBundleData; the "has key but no bundle data" debug line in wwwUrlResolver)
I verified this empirically: removing both new ignores and re-running the suite gives 100% lines / 100% branches / 100% statements, 1160 passing. The network-error test exercises the catch (and its log.warn + return false), and the ghost-key/non-200 tests exercise the fall-through debug line. Leaving the ignores in place defeats the purpose of the tests you wrote — a future regression in the catch/fallback path (e.g. a malformed-JSON 200, which also routes through that catch) would no longer be caught by coverage. Please remove both:
const { rumBundles } = await resp.json();
return Array.isArray(rumBundles) && rumBundles.length > 0;
- /* c8 ignore next 3 */
} catch (e) {- /* c8 ignore next */
log.debug(`[wwwUrlResolver] ${wwwToggledHostname} has key but no bundle data, trying ${hostname}`);Perspective: QA / Testing
S3. Harden against logging the domainkey — url-helpers.js, hasBundleData catch block
The domainkey is a per-domain secret (it's exchanged from the admin key and cached as a credential). The codebase deliberately redacts it before any logging — sanitizeURL (rum-bundler-client.js:60) and sanitize (rum-api-client/index.js:59). The new code interpolates the full domainkey into the probe URL and, on failure, logs e.message. If the underlying @adobe/fetch error message includes the request URL (several fetch implementations embed it), the domainkey lands in logs in cleartext — violating the established redaction convention. The warn already logs hostname, so the safe move is to never let the key reach the message, e.g. keep the key out of the logged string and, if you log the URL anywhere, route it through sanitizeURL. Cheap insurance against credential leakage.
Perspective: Security
Nice to Have
N1. Duplicated bundler constant and date-path logic — url-helpers.js:~329
RUM_BUNDLER_BASE_URL = 'https://bundles.aem.page' duplicates RUM_BUNDLER_API_HOST (rum-api-client/index.js:33) / BASE_URL (rum-bundler-client.js:18), and the Y/M/D path building duplicates constructUrl. I'm not suggesting importing rum-api-client into utils — that's the wrong dependency direction (rum-api-client already depends on utils). Just flagging that if this bundler-URL logic grows, the shared shape belongs in one place. Low priority as-is.
Perspective: Architecture
N2. No timeout on the probe fetch — url-helpers.js, hasBundleData
hasBundleData now sits inline in the resolution path that audits call. A hung connection to bundles.aem.page (no explicit timeout) would block resolution for the platform default, which can be very long. Consider an AbortSignal.timeout(...) on the probe so a downstream stall can't stall the caller.
Perspective: SRE / Reliability
N3. Probe omits SPACECAT_USER_AGENT — url-helpers.js, hasBundleData
The other two fetch calls in this same file set headers: { 'User-Agent': SPACECAT_USER_AGENT } (line ~134). The new fetch(url) passes no headers. For consistency (and in case the bundler treats requests by UA), pass the same header.
Perspective: Architecture / consistency
N4. The non-200 and network-error tests assert only the final hostname — url-helpers.test.js
Both new fall-back tests assert just result === 'example.com', which would also pass if the probe were never attempted. The ghost-key test is the good model — it asserts the [wwwUrlResolver] ... has key but no bundle data debug fired. Consider the same assertion on the other two so they can't silently pass for the wrong reason. Separately, none of the tests assert the domainkey reaches the probe query string (the nock regex matches only the path), so a bug dropping/garbling the key wouldn't be caught.
Perspective: QA / Testing
N5. Commit type and log-prefix inconsistency — minor
The title is feat: but this is a bug fix; fix: would map to a patch rather than a minor bump for this shared library (semantic-release reads the commit type). Also, the two new logs use a [wwwUrlResolver] prefix while the function's existing logs don't — small inconsistency within the same function.
Perspective: Engineering Practices
Recommendations
- The most valuable change here is S1 — decide on the probe window deliberately. If "must have today's data" is an intentional product rule, a one-line comment saying so would prevent a future reader (or a low-traffic-site incident) from treating it as a bug.
- The
?domainkey=${domainkey}value isn't URL-encoded, but that matches the existingconstructUrl, so I'm not flagging it — just noting it travels with S3.
Out of Scope, Worth Tracking
- Pre-existing grammar bug on unmodified lines:
Could not retrieved RUM domainkey for .... Not introduced here; worth a drive-by fix in a future PR. - If ghost keys are a recurring problem, a dedicated "does this domain have any data" bundler endpoint would be more robust than a per-day GET — a platform/
rum-api-clientconcern, not this PR.
Verdict
Ready to merge: With fixes
Reasoning: The fix is correct and well-tested, but the today-only probe window (S1) can misclassify legitimate low-traffic domains, and the unnecessary coverage ignores (S2, verified) plus the domainkey-logging hardening (S3) should be addressed first — all are small, contained changes.
Security note: this diff touches external I/O and a secret (domainkey). For a deeper dedicated pass, consider invoking /adobe-security-foundations followed by /adobe-security-lang.
https://jira.corp.adobe.com/browse/SITES-46095
Problem
The RUM system can have "ghost keys" — domain keys that exist in the registry but have no actual data behind them. The old wwwUrlResolver treated key existence as proof that a domain was correct. So for a site like aashirvaad.com, it would
find a ghost key for www.aashirvaad.com, assume that was the right domain, and all audits would query the wrong domain and return empty results.
Fix
After finding a domain key, the resolver now makes a second check: it calls the RUM bundler API to confirm there is real bundle data behind that key. Only if actual data is found does it accept the www-toggled domain. If the bundle probe returns empty, fails with a non-200 status, or throws a network error, it treats the key as a ghost and falls through to try the original hostname instead.
Changes