feat: fix ghost RUM key false positive in wwwUrlResolver by verifying bundle data before accepting toggled hostname by tkotthakota-adobe · Pull Request #1657 · adobe/spacecat-shared

tkotthakota-adobe · 2026-06-09T23:28:18Z

https://jira.corp.adobe.com/browse/SITES-46095

Problem

The RUM system can have "ghost keys" — domain keys that exist in the registry but have no actual data behind them. The old wwwUrlResolver treated key existence as proof that a domain was correct. So for a site like aashirvaad.com, it would
find a ghost key for www.aashirvaad.com, assume that was the right domain, and all audits would query the wrong domain and return empty results.

Fix

After finding a domain key, the resolver now makes a second check: it calls the RUM bundler API to confirm there is real bundle data behind that key. Only if actual data is found does it accept the www-toggled domain. If the bundle probe returns empty, fails with a non-200 status, or throws a network error, it treats the key as a ghost and falls through to try the original hostname instead.

Changes

url-helpers.js: Added hasBundleData helper that probes bundles.aem.page to verify real data exists behind a domain key. Updated wwwUrlResolver to call this check before accepting the www-toggled hostname.
url-helpers.test.js: Added bundle probe nock interceptors to existing resolution tests. Added 3 new tests covering ghost key, non-200, and network error paths.

…e data before accepting toggled hostname

ramboz

Summary

This PR fixes a real correctness bug: wwwUrlResolver treated the mere existence of a RUM domainkey as proof a domain was right, so "ghost keys" (registered but data-less domains) caused audits to query the wrong hostname. The fix adds a hasBundleData probe that confirms real bundle data exists behind the key before accepting the www-toggled hostname. The approach is sound and the diff is small, well-tested, and follows existing conventions. I verified locally that all 1160 tests pass and url-helpers.js is at 100% coverage. My main concern is that the probe only checks today's UTC date, which can introduce the mirror-image false negative for legitimate-but-low-traffic domains. No hard blockers — ready to merge with a few fixes.

Perspectives applied: Architecture, SRE/Reliability, Security, QA/Testing, Product/Consumer, Engineering Practices. Skipped LLM/AI (no LLM code).

Strengths

Genuinely fixes the described bug. Probing for actual bundle data is the right disambiguator between a real domain and a ghost key, and the fall-through to the original hostname is preserved cleanly.
The new code paths are actually tested. The three new tests cover the ghost-key (empty rumBundles), non-200, and network-error cases — and the ghost-key test asserts both retrieveDomainkey calls and the resulting debug log, not just the return value. That's behavior verification, not line-hitting.
Conventions respected. The URL path shape (/bundles/{host}/{Y}/{M}/{D}?domainkey=) and the getUTCFullYear()/getUTCMonth()+1/padStart date logic mirror the canonical constructUrl in rum-api-client's rum-bundler-client.js. The module's own configured @adobe/fetch is used (not global fetch), so HTTP/1.1 test forcing + nock interception work correctly.
Docs kept in sync. The JSDoc was updated to reflect the new behavior and the added warn() logger dependency. No public API surface changed (hasBundleData is module-internal; wwwUrlResolver's signature is unchanged), so no .d.ts update is needed.

Issues

Should Fix

S1. Probe only checks today's UTC date — risks the mirror-image false negative — url-helpers.js, hasBundleData
The probe URL is built from new Date() and queries a single day: /bundles/{host}/{year}/{month}/{day}. But real RUM consumers (CWV, traffic, etc.) fetch a date range — see generateUrlsForDateRange/constructUrl in rum-bundler-client.js. A single-day window is the narrowest possible check and produces false negatives in two realistic cases:

Early in the UTC day, a legitimate domain may have few/zero bundles for the new day yet.
Low-traffic domains may legitimately have no data for any given single day while having plenty across a week.

When that happens and the non-toggled hostname also has a key (common — both www and apex are frequently registered), the resolver now falls through and returns the wrong hostname — exactly the failure mode this PR sets out to fix, just in the other direction. Consider probing a small window (e.g. yesterday + today, or the last N days) and accepting if any day has bundles:

// check a couple of days to absorb the UTC-boundary / low-traffic edge
for (const d of [0, 1]) {
  const day = new Date(now); day.setUTCDate(now.getUTCDate() - d);
  // build URL for `day`; return true on first non-empty rumBundles
}

Perspective: SRE / Product-Consumer

S2. The two new /* c8 ignore */ comments are unnecessary and mask covered code — url-helpers.js (catch block in hasBundleData; the "has key but no bundle data" debug line in wwwUrlResolver)
I verified this empirically: removing both new ignores and re-running the suite gives 100% lines / 100% branches / 100% statements, 1160 passing. The network-error test exercises the catch (and its log.warn + return false), and the ghost-key/non-200 tests exercise the fall-through debug line. Leaving the ignores in place defeats the purpose of the tests you wrote — a future regression in the catch/fallback path (e.g. a malformed-JSON 200, which also routes through that catch) would no longer be caught by coverage. Please remove both:

    const { rumBundles } = await resp.json();
    return Array.isArray(rumBundles) && rumBundles.length > 0;
-   /* c8 ignore next 3 */
  } catch (e) {

-   /* c8 ignore next */
    log.debug(`[wwwUrlResolver] ${wwwToggledHostname} has key but no bundle data, trying ${hostname}`);

Perspective: QA / Testing

S3. Harden against logging the domainkey — url-helpers.js, hasBundleData catch block
The domainkey is a per-domain secret (it's exchanged from the admin key and cached as a credential). The codebase deliberately redacts it before any logging — sanitizeURL (rum-bundler-client.js:60) and sanitize (rum-api-client/index.js:59). The new code interpolates the full domainkey into the probe URL and, on failure, logs e.message. If the underlying @adobe/fetch error message includes the request URL (several fetch implementations embed it), the domainkey lands in logs in cleartext — violating the established redaction convention. The warn already logs hostname, so the safe move is to never let the key reach the message, e.g. keep the key out of the logged string and, if you log the URL anywhere, route it through sanitizeURL. Cheap insurance against credential leakage.
Perspective: Security

Nice to Have

N1. Duplicated bundler constant and date-path logic — url-helpers.js:~329
RUM_BUNDLER_BASE_URL = 'https://bundles.aem.page' duplicates RUM_BUNDLER_API_HOST (rum-api-client/index.js:33) / BASE_URL (rum-bundler-client.js:18), and the Y/M/D path building duplicates constructUrl. I'm not suggesting importing rum-api-client into utils — that's the wrong dependency direction (rum-api-client already depends on utils). Just flagging that if this bundler-URL logic grows, the shared shape belongs in one place. Low priority as-is.
Perspective: Architecture

N2. No timeout on the probe fetch — url-helpers.js, hasBundleData
hasBundleData now sits inline in the resolution path that audits call. A hung connection to bundles.aem.page (no explicit timeout) would block resolution for the platform default, which can be very long. Consider an AbortSignal.timeout(...) on the probe so a downstream stall can't stall the caller.
Perspective: SRE / Reliability

N3. Probe omits SPACECAT_USER_AGENT — url-helpers.js, hasBundleData
The other two fetch calls in this same file set headers: { 'User-Agent': SPACECAT_USER_AGENT } (line ~134). The new fetch(url) passes no headers. For consistency (and in case the bundler treats requests by UA), pass the same header.
Perspective: Architecture / consistency

N4. The non-200 and network-error tests assert only the final hostname — url-helpers.test.js
Both new fall-back tests assert just result === 'example.com', which would also pass if the probe were never attempted. The ghost-key test is the good model — it asserts the [wwwUrlResolver] ... has key but no bundle data debug fired. Consider the same assertion on the other two so they can't silently pass for the wrong reason. Separately, none of the tests assert the domainkey reaches the probe query string (the nock regex matches only the path), so a bug dropping/garbling the key wouldn't be caught.
Perspective: QA / Testing

N5. Commit type and log-prefix inconsistency — minor
The title is feat: but this is a bug fix; fix: would map to a patch rather than a minor bump for this shared library (semantic-release reads the commit type). Also, the two new logs use a [wwwUrlResolver] prefix while the function's existing logs don't — small inconsistency within the same function.
Perspective: Engineering Practices

Recommendations

The most valuable change here is S1 — decide on the probe window deliberately. If "must have today's data" is an intentional product rule, a one-line comment saying so would prevent a future reader (or a low-traffic-site incident) from treating it as a bug.
The ?domainkey=${domainkey} value isn't URL-encoded, but that matches the existing constructUrl, so I'm not flagging it — just noting it travels with S3.

Out of Scope, Worth Tracking

Pre-existing grammar bug on unmodified lines: Could not retrieved RUM domainkey for .... Not introduced here; worth a drive-by fix in a future PR.
If ghost keys are a recurring problem, a dedicated "does this domain have any data" bundler endpoint would be more robust than a per-day GET — a platform/rum-api-client concern, not this PR.

Verdict

Ready to merge: With fixes
Reasoning: The fix is correct and well-tested, but the today-only probe window (S1) can misclassify legitimate low-traffic domains, and the unnecessary coverage ignores (S2, verified) plus the domainkey-logging hardening (S3) should be addressed first — all are small, contained changes.

Security note: this diff touches external I/O and a secret (domainkey). For a deeper dedicated pass, consider invoking /adobe-security-foundations followed by /adobe-security-lang.

fix ghost RUM key false positive in wwwUrlResolver by verifying bundl…

8ecadef

…e data before accepting toggled hostname

tkotthakota-adobe requested review from Kanishkavijay39, ramboz and rpapani June 9, 2026 23:28

tkotthakota-adobe mentioned this pull request Jun 9, 2026

feat: Ghost RUM Key Fix #1656

Closed

ramboz approved these changes Jun 10, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: fix ghost RUM key false positive in wwwUrlResolver by verifying bundle data before accepting toggled hostname#1657

feat: fix ghost RUM key false positive in wwwUrlResolver by verifying bundle data before accepting toggled hostname#1657
tkotthakota-adobe wants to merge 1 commit into
mainfrom
SITES46095

tkotthakota-adobe commented Jun 9, 2026

Uh oh!

ramboz left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

tkotthakota-adobe commented Jun 9, 2026

Uh oh!

ramboz left a comment

Choose a reason for hiding this comment

Summary

Strengths

Issues

Should Fix

Nice to Have

Recommendations

Out of Scope, Worth Tracking

Verdict

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants