Skip to content

fix(signup): Never fall back to apex domain for product trial signups#6628

Open
balamurali27 wants to merge 4 commits into
developfrom
frappe.cloud-domain
Open

fix(signup): Never fall back to apex domain for product trial signups#6628
balamurali27 wants to merge 4 commits into
developfrom
frappe.cloud-domain

Conversation

@balamurali27

Copy link
Copy Markdown
Contributor

Problem

New signups were landing on the apex frappe.cloud instead of a per-cluster subdomain (m.frappe.cloud, v.frappe.cloud, …). Because of AWS Route 53 hosted-zone record limits, the apex is not usable for new sites — they must be spread across cluster subdomains.

The signup-domain resolver silently fell back to the apex whenever it couldn't map the resolved cluster to an enabled *.frappe.cloud Root Domain:

  • cluster resolution returned None (geo/IP lookup down or rate-limited, no country on the Account Request), or
  • the nearest cluster had no *.frappe.cloud subdomain (e.g. UAE).

The resolution logic was duplicated in get_request and create_standby_site (both with the hole), and create_site blindly trusted whatever domain the frontend posted — the apex passes validation since it's an enabled Root Domain.

Fix

Centralised resolution on ProductTrial with a hard "never apex" invariant:

  • get_signup_domain(cluster) — resolves to the cluster's enabled *.{apex} subdomain, falling back to the nearest enabled subdomain (e.g. UAE → Bahrain), then the apex's default-cluster subdomain. Returns the apex only when no subdomains exist at all.
  • get_cluster_for_request(account_request) — moved here from the api module so both get_request and create_site reuse it.
  • create_site re-resolves server-side: an apex posted by a stale frontend / failed geo lookup can no longer slip through.
  • create_standby_site uses the same resolver for pooled standby sites.
  • utils: extracted CLUSTER_COORDINATES + _haversine_distance to module level and added get_nearest_cluster_among(target, candidates) (used by the nearest-subdomain fallback).

Documented in press/saas/README.md.

Tests

New tests in test_product_trial.py cover: exact-cluster subdomain, nearest-fallback (UAE → Bahrain), and apex-only-when-no-subdomains. All pass; ruff and pre-commit clean.

Note

This fixes new signups/standby sites only. Pre-existing standby and signed-up sites may still carry apex frappe.cloud host names — a separate cleanup/migration would be needed for those.

🤖 Generated with Claude Code

balamurali27 and others added 4 commits June 5, 2026 13:12
Move the per-cluster lat/long map and haversine logic out of
get_nearest_cluster_for_ip into module-level CLUSTER_COORDINATES,
_haversine_distance and _nearest_cluster_to_coordinates. Add
get_nearest_cluster_among() to find the candidate cluster nearest to a
given cluster, so callers can pick the closest cluster from a restricted
set (e.g. clusters that actually have a domain).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Signup domains must land on a per-cluster subdomain (m.frappe.cloud, ...)
because AWS Route 53 hosted-zone limits make the apex (frappe.cloud)
unusable for new sites. The resolver silently fell back to the apex
whenever the resolved cluster had no enabled subdomain - when geo lookup
returned no cluster, or the nearest cluster (e.g. UAE) had no
*.frappe.cloud domain.

Centralise resolution on ProductTrial.get_signup_domain(), which falls
back from the resolved cluster to the nearest enabled subdomain, then the
default cluster's, and only returns the apex when no subdomains exist.
Move cluster resolution to ProductTrial.get_cluster_for_request() so both
get_request and create_site can reuse it, and have create_site re-resolve
server-side so an apex posted by a stale frontend can't slip through.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Add tests for ProductTrial.get_signup_domain: it uses the resolved
cluster's subdomain, falls back to the nearest enabled subdomain when the
cluster has none (UAE -> Bahrain), and returns the apex only when no
subdomains are configured.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Explain the apex-vs-cluster-subdomain distinction, the get_signup_domain
resolution and never-apex invariant, and the old fallback bug it guards
against.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@balamurali27 balamurali27 requested a review from ssiyad as a code owner June 5, 2026 09:33
@greptile-apps

greptile-apps Bot commented Jun 5, 2026

Copy link
Copy Markdown
Contributor

Greptile Summary

Centralises signup-domain resolution on ProductTrial to enforce a "never apex" invariant, fixing a silent fallback where new signups landed on frappe.cloud directly instead of per-cluster subdomains.

  • get_signup_domain(cluster) resolves to the cluster's enabled *.{apex} subdomain, falling back to the geographically nearest enabled subdomain (using extracted get_nearest_cluster_among), then the apex's default-cluster subdomain, and only returns the apex when no subdomains exist at all.
  • create_site now re-resolves server-side when an apex domain slips through, but the cluster variable is not updated after get_signup_domain returns a nearest-fallback domain, leaving self.cluster and the setup_trial_site call with a potentially wrong cluster (e.g., UAE while domain is Bahrain's).

Confidence Score: 3/5

The apex-guard in create_site introduces a cluster/domain mismatch on the new fallback path that could cause site creation to fail or land on the wrong server for users whose geo-IP resolves to a cluster without a subdomain.

The create_site apex-bypass path resolves domain to the nearest available subdomain (e.g., Bahrain) but keeps cluster as the original geo-IP result (e.g., UAE). setup_trial_site and self.cluster then use the stale cluster, which either has no servers (causing a failure) or creates a site on the wrong cluster behind a mismatched DNS zone.

press/saas/doctype/product_trial_request/product_trial_request.py — the apex fallback block (lines 231-238) needs to re-derive cluster from the resolved domain after calling get_signup_domain.

Important Files Changed

Filename Overview
press/saas/doctype/product_trial_request/product_trial_request.py Adds apex-slip-through guard in create_site; when geo-IP resolves to a cluster without a subdomain, cluster is not updated after get_signup_domain returns the nearest fallback domain, causing self.cluster and setup_trial_site to receive a stale/wrong cluster value.
press/saas/doctype/product_trial/product_trial.py Adds get_cluster_for_request and get_signup_domain methods; get_signup_domain has no limit on its get_all call, and the nearest-subdomain fallback can produce a domain whose cluster differs from the cluster variable the caller continues to use.
press/api/product_trial.py Removes duplicated cluster/domain resolution logic from get_request, delegating to the new ProductTrial methods; clean simplification with no issues.
press/utils/init.py Extracts CLUSTER_COORDINATES and _haversine_distance to module level and adds get_nearest_cluster_among; straightforward refactor with no issues.
press/saas/doctype/product_trial/test_product_trial.py Adds three focused tests for get_signup_domain covering the direct-match, nearest-fallback, and apex-only cases; does not test create_site's cluster/domain consistency.
press/saas/README.md Documentation-only addition explaining signup domain resolution; accurate and helpful.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[create_site / get_request] --> B{domain == apex?}
    B -- No --> C[cluster = RootDomain.default_cluster for domain]
    B -- Yes --> D[get_cluster_for_request\nacc. request / geo-IP]
    D --> E[get_signup_domain cluster]
    E --> F{cluster has enabled\n*.apex subdomain?}
    F -- Yes --> G[return cluster subdomain]
    F -- No --> H[get_nearest_cluster_among\nfrom CLUSTER_COORDINATES]
    H --> I{nearest has subdomain?}
    I -- Yes --> J[return nearest subdomain\ne.g. b.frappe.cloud]
    I -- No --> K{apex default_cluster\nhas subdomain?}
    K -- Yes --> L[return default cluster subdomain]
    K -- No --> M[return first enabled subdomain]
    M --> N{any subdomains at all?}
    N -- No --> O[return apex ONLY]
    G --> P[site creation\nwith resolved cluster + domain]
    J --> P
    L --> P
    O --> P
    C --> P
    style J fill:#f9a,stroke:#c55
    style P fill:#adf,stroke:#55c
Loading
Prompt To Fix All With AI
Fix the following 2 code review issues. Work through them one at a time, proposing concise fixes.

---

### Issue 1 of 2
press/saas/doctype/product_trial_request/product_trial_request.py:231-238
**Cluster/domain mismatch when nearest-subdomain fallback is used**

When `domain == product.domain` and `get_cluster_for_request` resolves to a cluster with no subdomain (e.g., UAE), `get_signup_domain` returns the *nearest* cluster's subdomain (e.g., `b.frappe.cloud` = Bahrain). At that point `cluster` is still "UAE" while `domain` is Bahrain's. `self.cluster = cluster` records "UAE", and `setup_trial_site(cluster="UAE")` will call `get_server_from_cluster("UAE")` — which either raises or places the site on UAE servers behind a Bahrain DNS zone.

The cluster should be re-derived from the resolved domain once it is known, matching the pattern used on the non-apex path:

```python
cluster = product.get_cluster_for_request(self.account_request)
domain = product.get_signup_domain(cluster)
cluster = frappe.db.get_value("Root Domain", domain, "default_cluster") or cluster
```

### Issue 2 of 2
press/saas/doctype/product_trial/product_trial.py:384-392
`frappe.db.get_all()` is called without a `limit` in a production code path. Root Domains are a small set in practice, but the Frappe ORM rule requires an explicit limit on every `get_all()` call.

```suggestion
		domain_by_cluster = {
			d.cluster: d.name
			for d in frappe.db.get_all(
				"Root Domain",
				{"name": ("like", f"%.{self.domain}"), "enabled": 1},
				["name", "default_cluster as cluster"],
				order_by="name",
				limit=100,
			)
		}
```

Reviews (1): Last reviewed commit: "docs(saas): Document signup domain resol..." | Re-trigger Greptile

Comment on lines +231 to +238
product: ProductTrial = frappe.get_doc("Product Trial", self.product_trial)
if domain == product.domain:
# Apex domain slipped through (geo lookup failed or stale frontend state).
# Re-resolve to a cluster subdomain so signups never land on the apex.
cluster = product.get_cluster_for_request(self.account_request)
domain = product.get_signup_domain(cluster)
else:
cluster = frappe.db.get_value("Root Domain", domain, "default_cluster")

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Cluster/domain mismatch when nearest-subdomain fallback is used

When domain == product.domain and get_cluster_for_request resolves to a cluster with no subdomain (e.g., UAE), get_signup_domain returns the nearest cluster's subdomain (e.g., b.frappe.cloud = Bahrain). At that point cluster is still "UAE" while domain is Bahrain's. self.cluster = cluster records "UAE", and setup_trial_site(cluster="UAE") will call get_server_from_cluster("UAE") — which either raises or places the site on UAE servers behind a Bahrain DNS zone.

The cluster should be re-derived from the resolved domain once it is known, matching the pattern used on the non-apex path:

cluster = product.get_cluster_for_request(self.account_request)
domain = product.get_signup_domain(cluster)
cluster = frappe.db.get_value("Root Domain", domain, "default_cluster") or cluster
Prompt To Fix With AI
This is a comment left during a code review.
Path: press/saas/doctype/product_trial_request/product_trial_request.py
Line: 231-238

Comment:
**Cluster/domain mismatch when nearest-subdomain fallback is used**

When `domain == product.domain` and `get_cluster_for_request` resolves to a cluster with no subdomain (e.g., UAE), `get_signup_domain` returns the *nearest* cluster's subdomain (e.g., `b.frappe.cloud` = Bahrain). At that point `cluster` is still "UAE" while `domain` is Bahrain's. `self.cluster = cluster` records "UAE", and `setup_trial_site(cluster="UAE")` will call `get_server_from_cluster("UAE")` — which either raises or places the site on UAE servers behind a Bahrain DNS zone.

The cluster should be re-derived from the resolved domain once it is known, matching the pattern used on the non-apex path:

```python
cluster = product.get_cluster_for_request(self.account_request)
domain = product.get_signup_domain(cluster)
cluster = frappe.db.get_value("Root Domain", domain, "default_cluster") or cluster
```

How can I resolve this? If you propose a fix, please make it concise.

Comment on lines +384 to +392
domain_by_cluster = {
d.cluster: d.name
for d in frappe.db.get_all(
"Root Domain",
{"name": ("like", f"%.{self.domain}"), "enabled": 1},
["name", "default_cluster as cluster"],
order_by="name",
)
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 frappe.db.get_all() is called without a limit in a production code path. Root Domains are a small set in practice, but the Frappe ORM rule requires an explicit limit on every get_all() call.

Suggested change
domain_by_cluster = {
d.cluster: d.name
for d in frappe.db.get_all(
"Root Domain",
{"name": ("like", f"%.{self.domain}"), "enabled": 1},
["name", "default_cluster as cluster"],
order_by="name",
)
}
domain_by_cluster = {
d.cluster: d.name
for d in frappe.db.get_all(
"Root Domain",
{"name": ("like", f"%.{self.domain}"), "enabled": 1},
["name", "default_cluster as cluster"],
order_by="name",
limit=100,
)
}
Prompt To Fix With AI
This is a comment left during a code review.
Path: press/saas/doctype/product_trial/product_trial.py
Line: 384-392

Comment:
`frappe.db.get_all()` is called without a `limit` in a production code path. Root Domains are a small set in practice, but the Frappe ORM rule requires an explicit limit on every `get_all()` call.

```suggestion
		domain_by_cluster = {
			d.cluster: d.name
			for d in frappe.db.get_all(
				"Root Domain",
				{"name": ("like", f"%.{self.domain}"), "enabled": 1},
				["name", "default_cluster as cluster"],
				order_by="name",
				limit=100,
			)
		}
```

How can I resolve this? If you propose a fix, please make it concise.

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

@codecov-commenter

codecov-commenter commented Jun 5, 2026

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 63.51351% with 27 lines in your changes missing coverage. Please review.
✅ Project coverage is 50.23%. Comparing base (ecba60f) to head (f6c6503).
⚠️ Report is 232 commits behind head on develop.

Files with missing lines Patch % Lines
press/saas/doctype/product_trial/product_trial.py 38.46% 16 Missing ⚠️
...ype/product_trial_request/product_trial_request.py 0.00% 5 Missing ⚠️
press/utils/__init__.py 80.76% 5 Missing ⚠️
press/api/product_trial.py 0.00% 1 Missing ⚠️

❌ Your patch status has failed because the patch coverage (63.51%) is below the target coverage (75.00%). You can increase the patch coverage or adjust the target coverage.

Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #6628      +/-   ##
===========================================
+ Coverage    50.14%   50.23%   +0.08%     
===========================================
  Files          984      990       +6     
  Lines        82293    82984     +691     
  Branches       523      523              
===========================================
+ Hits         41266    41684     +418     
- Misses       40995    41268     +273     
  Partials        32       32              
Flag Coverage Δ
dashboard 62.80% <ø> (+0.05%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants