fix(signup): Never fall back to apex domain for product trial signups#6628
fix(signup): Never fall back to apex domain for product trial signups#6628balamurali27 wants to merge 4 commits into
Conversation
Move the per-cluster lat/long map and haversine logic out of get_nearest_cluster_for_ip into module-level CLUSTER_COORDINATES, _haversine_distance and _nearest_cluster_to_coordinates. Add get_nearest_cluster_among() to find the candidate cluster nearest to a given cluster, so callers can pick the closest cluster from a restricted set (e.g. clusters that actually have a domain). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Signup domains must land on a per-cluster subdomain (m.frappe.cloud, ...) because AWS Route 53 hosted-zone limits make the apex (frappe.cloud) unusable for new sites. The resolver silently fell back to the apex whenever the resolved cluster had no enabled subdomain - when geo lookup returned no cluster, or the nearest cluster (e.g. UAE) had no *.frappe.cloud domain. Centralise resolution on ProductTrial.get_signup_domain(), which falls back from the resolved cluster to the nearest enabled subdomain, then the default cluster's, and only returns the apex when no subdomains exist. Move cluster resolution to ProductTrial.get_cluster_for_request() so both get_request and create_site can reuse it, and have create_site re-resolve server-side so an apex posted by a stale frontend can't slip through. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Add tests for ProductTrial.get_signup_domain: it uses the resolved cluster's subdomain, falls back to the nearest enabled subdomain when the cluster has none (UAE -> Bahrain), and returns the apex only when no subdomains are configured. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Explain the apex-vs-cluster-subdomain distinction, the get_signup_domain resolution and never-apex invariant, and the old fallback bug it guards against. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
|
| Filename | Overview |
|---|---|
| press/saas/doctype/product_trial_request/product_trial_request.py | Adds apex-slip-through guard in create_site; when geo-IP resolves to a cluster without a subdomain, cluster is not updated after get_signup_domain returns the nearest fallback domain, causing self.cluster and setup_trial_site to receive a stale/wrong cluster value. |
| press/saas/doctype/product_trial/product_trial.py | Adds get_cluster_for_request and get_signup_domain methods; get_signup_domain has no limit on its get_all call, and the nearest-subdomain fallback can produce a domain whose cluster differs from the cluster variable the caller continues to use. |
| press/api/product_trial.py | Removes duplicated cluster/domain resolution logic from get_request, delegating to the new ProductTrial methods; clean simplification with no issues. |
| press/utils/init.py | Extracts CLUSTER_COORDINATES and _haversine_distance to module level and adds get_nearest_cluster_among; straightforward refactor with no issues. |
| press/saas/doctype/product_trial/test_product_trial.py | Adds three focused tests for get_signup_domain covering the direct-match, nearest-fallback, and apex-only cases; does not test create_site's cluster/domain consistency. |
| press/saas/README.md | Documentation-only addition explaining signup domain resolution; accurate and helpful. |
Flowchart
%%{init: {'theme': 'neutral'}}%%
flowchart TD
A[create_site / get_request] --> B{domain == apex?}
B -- No --> C[cluster = RootDomain.default_cluster for domain]
B -- Yes --> D[get_cluster_for_request\nacc. request / geo-IP]
D --> E[get_signup_domain cluster]
E --> F{cluster has enabled\n*.apex subdomain?}
F -- Yes --> G[return cluster subdomain]
F -- No --> H[get_nearest_cluster_among\nfrom CLUSTER_COORDINATES]
H --> I{nearest has subdomain?}
I -- Yes --> J[return nearest subdomain\ne.g. b.frappe.cloud]
I -- No --> K{apex default_cluster\nhas subdomain?}
K -- Yes --> L[return default cluster subdomain]
K -- No --> M[return first enabled subdomain]
M --> N{any subdomains at all?}
N -- No --> O[return apex ONLY]
G --> P[site creation\nwith resolved cluster + domain]
J --> P
L --> P
O --> P
C --> P
style J fill:#f9a,stroke:#c55
style P fill:#adf,stroke:#55c
Prompt To Fix All With AI
Fix the following 2 code review issues. Work through them one at a time, proposing concise fixes.
---
### Issue 1 of 2
press/saas/doctype/product_trial_request/product_trial_request.py:231-238
**Cluster/domain mismatch when nearest-subdomain fallback is used**
When `domain == product.domain` and `get_cluster_for_request` resolves to a cluster with no subdomain (e.g., UAE), `get_signup_domain` returns the *nearest* cluster's subdomain (e.g., `b.frappe.cloud` = Bahrain). At that point `cluster` is still "UAE" while `domain` is Bahrain's. `self.cluster = cluster` records "UAE", and `setup_trial_site(cluster="UAE")` will call `get_server_from_cluster("UAE")` — which either raises or places the site on UAE servers behind a Bahrain DNS zone.
The cluster should be re-derived from the resolved domain once it is known, matching the pattern used on the non-apex path:
```python
cluster = product.get_cluster_for_request(self.account_request)
domain = product.get_signup_domain(cluster)
cluster = frappe.db.get_value("Root Domain", domain, "default_cluster") or cluster
```
### Issue 2 of 2
press/saas/doctype/product_trial/product_trial.py:384-392
`frappe.db.get_all()` is called without a `limit` in a production code path. Root Domains are a small set in practice, but the Frappe ORM rule requires an explicit limit on every `get_all()` call.
```suggestion
domain_by_cluster = {
d.cluster: d.name
for d in frappe.db.get_all(
"Root Domain",
{"name": ("like", f"%.{self.domain}"), "enabled": 1},
["name", "default_cluster as cluster"],
order_by="name",
limit=100,
)
}
```
Reviews (1): Last reviewed commit: "docs(saas): Document signup domain resol..." | Re-trigger Greptile
| product: ProductTrial = frappe.get_doc("Product Trial", self.product_trial) | ||
| if domain == product.domain: | ||
| # Apex domain slipped through (geo lookup failed or stale frontend state). | ||
| # Re-resolve to a cluster subdomain so signups never land on the apex. | ||
| cluster = product.get_cluster_for_request(self.account_request) | ||
| domain = product.get_signup_domain(cluster) | ||
| else: | ||
| cluster = frappe.db.get_value("Root Domain", domain, "default_cluster") |
There was a problem hiding this comment.
Cluster/domain mismatch when nearest-subdomain fallback is used
When domain == product.domain and get_cluster_for_request resolves to a cluster with no subdomain (e.g., UAE), get_signup_domain returns the nearest cluster's subdomain (e.g., b.frappe.cloud = Bahrain). At that point cluster is still "UAE" while domain is Bahrain's. self.cluster = cluster records "UAE", and setup_trial_site(cluster="UAE") will call get_server_from_cluster("UAE") — which either raises or places the site on UAE servers behind a Bahrain DNS zone.
The cluster should be re-derived from the resolved domain once it is known, matching the pattern used on the non-apex path:
cluster = product.get_cluster_for_request(self.account_request)
domain = product.get_signup_domain(cluster)
cluster = frappe.db.get_value("Root Domain", domain, "default_cluster") or clusterPrompt To Fix With AI
This is a comment left during a code review.
Path: press/saas/doctype/product_trial_request/product_trial_request.py
Line: 231-238
Comment:
**Cluster/domain mismatch when nearest-subdomain fallback is used**
When `domain == product.domain` and `get_cluster_for_request` resolves to a cluster with no subdomain (e.g., UAE), `get_signup_domain` returns the *nearest* cluster's subdomain (e.g., `b.frappe.cloud` = Bahrain). At that point `cluster` is still "UAE" while `domain` is Bahrain's. `self.cluster = cluster` records "UAE", and `setup_trial_site(cluster="UAE")` will call `get_server_from_cluster("UAE")` — which either raises or places the site on UAE servers behind a Bahrain DNS zone.
The cluster should be re-derived from the resolved domain once it is known, matching the pattern used on the non-apex path:
```python
cluster = product.get_cluster_for_request(self.account_request)
domain = product.get_signup_domain(cluster)
cluster = frappe.db.get_value("Root Domain", domain, "default_cluster") or cluster
```
How can I resolve this? If you propose a fix, please make it concise.| domain_by_cluster = { | ||
| d.cluster: d.name | ||
| for d in frappe.db.get_all( | ||
| "Root Domain", | ||
| {"name": ("like", f"%.{self.domain}"), "enabled": 1}, | ||
| ["name", "default_cluster as cluster"], | ||
| order_by="name", | ||
| ) | ||
| } |
There was a problem hiding this comment.
frappe.db.get_all() is called without a limit in a production code path. Root Domains are a small set in practice, but the Frappe ORM rule requires an explicit limit on every get_all() call.
| domain_by_cluster = { | |
| d.cluster: d.name | |
| for d in frappe.db.get_all( | |
| "Root Domain", | |
| {"name": ("like", f"%.{self.domain}"), "enabled": 1}, | |
| ["name", "default_cluster as cluster"], | |
| order_by="name", | |
| ) | |
| } | |
| domain_by_cluster = { | |
| d.cluster: d.name | |
| for d in frappe.db.get_all( | |
| "Root Domain", | |
| {"name": ("like", f"%.{self.domain}"), "enabled": 1}, | |
| ["name", "default_cluster as cluster"], | |
| order_by="name", | |
| limit=100, | |
| ) | |
| } |
Prompt To Fix With AI
This is a comment left during a code review.
Path: press/saas/doctype/product_trial/product_trial.py
Line: 384-392
Comment:
`frappe.db.get_all()` is called without a `limit` in a production code path. Root Domains are a small set in practice, but the Frappe ORM rule requires an explicit limit on every `get_all()` call.
```suggestion
domain_by_cluster = {
d.cluster: d.name
for d in frappe.db.get_all(
"Root Domain",
{"name": ("like", f"%.{self.domain}"), "enabled": 1},
["name", "default_cluster as cluster"],
order_by="name",
limit=100,
)
}
```
How can I resolve this? If you propose a fix, please make it concise.Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
Codecov Report❌ Patch coverage is ❌ Your patch status has failed because the patch coverage (63.51%) is below the target coverage (75.00%). You can increase the patch coverage or adjust the target coverage. Additional details and impacted files@@ Coverage Diff @@
## develop #6628 +/- ##
===========================================
+ Coverage 50.14% 50.23% +0.08%
===========================================
Files 984 990 +6
Lines 82293 82984 +691
Branches 523 523
===========================================
+ Hits 41266 41684 +418
- Misses 40995 41268 +273
Partials 32 32
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Harness. 🚀 New features to boost your workflow:
|
Problem
New signups were landing on the apex
frappe.cloudinstead of a per-cluster subdomain (m.frappe.cloud,v.frappe.cloud, …). Because of AWS Route 53 hosted-zone record limits, the apex is not usable for new sites — they must be spread across cluster subdomains.The signup-domain resolver silently fell back to the apex whenever it couldn't map the resolved cluster to an enabled
*.frappe.cloudRoot Domain:None(geo/IP lookup down or rate-limited, no country on the Account Request), or*.frappe.cloudsubdomain (e.g. UAE).The resolution logic was duplicated in
get_requestandcreate_standby_site(both with the hole), andcreate_siteblindly trusted whatever domain the frontend posted — the apex passes validation since it's an enabled Root Domain.Fix
Centralised resolution on
ProductTrialwith a hard "never apex" invariant:get_signup_domain(cluster)— resolves to the cluster's enabled*.{apex}subdomain, falling back to the nearest enabled subdomain (e.g. UAE → Bahrain), then the apex's default-cluster subdomain. Returns the apex only when no subdomains exist at all.get_cluster_for_request(account_request)— moved here from the api module so bothget_requestandcreate_sitereuse it.create_sitere-resolves server-side: an apex posted by a stale frontend / failed geo lookup can no longer slip through.create_standby_siteuses the same resolver for pooled standby sites.CLUSTER_COORDINATES+_haversine_distanceto module level and addedget_nearest_cluster_among(target, candidates)(used by the nearest-subdomain fallback).Documented in
press/saas/README.md.Tests
New tests in
test_product_trial.pycover: exact-cluster subdomain, nearest-fallback (UAE → Bahrain), and apex-only-when-no-subdomains. All pass;ruffand pre-commit clean.Note
This fixes new signups/standby sites only. Pre-existing standby and signed-up sites may still carry apex
frappe.cloudhost names — a separate cleanup/migration would be needed for those.🤖 Generated with Claude Code