Conversation
…ts DNS, deployment subnets - Reconnection panic guards: bounds checks on controlStreams, streams, connections slices - Stack trace logging on reconnection panic recovery - SNIHostname field on EndpointMapping gossip type - owner_id on RouteDeployment/RouteSandbox/Unprovision/ExistingDeployment protos - /etc/hosts preference over DNS-over-HTTPS (cached with 30s TTL) - Respect FailIfLocal for hosts file lookups - ComputeDeploymentSubnet/ComputeDeploymentVIP for deterministic deployment VIPs - Discovery refresh after repeated endpoint failures
|
Caution Review failedThe pull request is closed. ℹ️ Recent review info⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (2)
📝 WalkthroughWalkthroughRemoved an unused GCE response wrapper, added per-endpoint reconnection failure tracking with discovery-refresh replacement and TLS SNI handling (plus tests), and introduced deterministic IPv6 deployment subnet/VIP computation with unit tests. Changes
🚥 Pre-merge checks | ✅ 1✅ Passed checks (1 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Comment |
There was a problem hiding this comment.
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@gravity/discovery/gce.go`:
- Around line 116-123: Current loop appends the first matching address
per-interface and breaks, which picks an IPv4 if the first NIC is IPv4-only even
when a later NIC has IPv6; change the logic to first scan inst.NetworkInterfaces
for any iface.IPv6Address and append that (to peers) if found, and only if none
found then scan (or pick) an iface.NetworkIP; modify the code around the
existing loop over inst.NetworkInterfaces (references: inst.NetworkInterfaces,
iface.IPv6Address, iface.NetworkIP, peers) so you don't break out on the first
found address but instead prefer IPv6 across all NICs before falling back to
IPv4.
In `@gravity/grpc_client.go`:
- Around line 3724-3739: preferredTLSServerName currently calls parseGRPCURL
which resolves /etc/hosts and can replace the original hostname with an IP;
change preferredTLSServerName to extract the hostname directly from the original
endpointURL (use url.Parse or equivalent) and derive SNI from that raw host
(strip port with net.SplitHostPort or url.Hostname()), returning
defaultServerName on parse errors or if the host is an IP; keep the existing
TrimSpace check and only fall back to parseGRPCURL or defaultServerName as a
last resort.
In `@network/subnet.go`:
- Around line 87-103: ComputeDeploymentSubnet currently masks region into 5 bits
which silently truncates values >31; add an explicit guard at the top of
ComputeDeploymentSubnet to validate the Region fits in 5 bits (<= 31) and reject
otherwise by returning an error (change signature to
ComputeDeploymentSubnet(region Region, machineID string) (netip.Prefix, error)
and return a non-nil error when region > 31, or alternatively panic if your
codebase prefers); update all callers to handle the new error return and use a
clear constant (e.g. maxRegion := 0x1F or 31) in the check before the existing
byte packing logic.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: bf73ae46-6cf7-436e-8fb9-3be24d6b19a0
📒 Files selected for processing (5)
gravity/discovery/gce.gogravity/grpc_client.gogravity/reconnect_discovery_refresh_test.gonetwork/subnet.gonetwork/subnet_test.go
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Analyze (go)
🔇 Additional comments (4)
network/subnet.go (2)
12-15: Network type addition looks correct.
NetworkDeploymentSubnetat0x06is consistent with the existing packed network-type scheme and avoids overlap with sandbox subnets.
108-122: VIP derivation and /96 contract enforcement are solid.The function enforces prefix size and deterministically derives host bytes within the deployment subnet as expected.
network/subnet_test.go (1)
366-473: Good targeted coverage for the new deployment path.These tests validate determinism, prefix shape, network tag encoding, containment, and panic behavior for invalid prefix sizes, which aligns well with the new APIs.
gravity/reconnect_discovery_refresh_test.go (1)
26-82: Good regression coverage for the refresh threshold path.These tests lock in both key invariants: no refresh before the threshold, and only the failing endpoint being replaced once the threshold is hit.
| for _, iface := range inst.NetworkInterfaces { | ||
| if iface.IPv6Address != "" { | ||
| peers = append(peers, iface.IPv6Address) | ||
| break | ||
| } | ||
| if iface.NetworkIP != "" { | ||
| peers = append(peers, iface.NetworkIP) | ||
| break |
There was a problem hiding this comment.
Scan all NICs before falling back to IPv4.
This still returns IPv4 when the first interface is IPv4-only, even if a later interface has IPv6. On multi-NIC instances that defeats the new IPv6-preferred behavior and can keep routing peers over the wrong family.
Suggested fix
- for _, iface := range inst.NetworkInterfaces {
- if iface.IPv6Address != "" {
- peers = append(peers, iface.IPv6Address)
- break
- }
- if iface.NetworkIP != "" {
- peers = append(peers, iface.NetworkIP)
- break
- }
- }
+ var ipv4Fallback string
+ for _, iface := range inst.NetworkInterfaces {
+ if iface.IPv6Address != "" {
+ peers = append(peers, iface.IPv6Address)
+ ipv4Fallback = ""
+ break
+ }
+ if ipv4Fallback == "" && iface.NetworkIP != "" {
+ ipv4Fallback = iface.NetworkIP
+ }
+ }
+ if ipv4Fallback != "" {
+ peers = append(peers, ipv4Fallback)
+ }📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| for _, iface := range inst.NetworkInterfaces { | |
| if iface.IPv6Address != "" { | |
| peers = append(peers, iface.IPv6Address) | |
| break | |
| } | |
| if iface.NetworkIP != "" { | |
| peers = append(peers, iface.NetworkIP) | |
| break | |
| var ipv4Fallback string | |
| for _, iface := range inst.NetworkInterfaces { | |
| if iface.IPv6Address != "" { | |
| peers = append(peers, iface.IPv6Address) | |
| ipv4Fallback = "" | |
| break | |
| } | |
| if ipv4Fallback == "" && iface.NetworkIP != "" { | |
| ipv4Fallback = iface.NetworkIP | |
| } | |
| } | |
| if ipv4Fallback != "" { | |
| peers = append(peers, ipv4Fallback) | |
| } |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@gravity/discovery/gce.go` around lines 116 - 123, Current loop appends the
first matching address per-interface and breaks, which picks an IPv4 if the
first NIC is IPv4-only even when a later NIC has IPv6; change the logic to first
scan inst.NetworkInterfaces for any iface.IPv6Address and append that (to peers)
if found, and only if none found then scan (or pick) an iface.NetworkIP; modify
the code around the existing loop over inst.NetworkInterfaces (references:
inst.NetworkInterfaces, iface.IPv6Address, iface.NetworkIP, peers) so you don't
break out on the first found address but instead prefer IPv6 across all NICs
before falling back to IPv4.
| func ComputeDeploymentSubnet(region Region, machineID string) netip.Prefix { | ||
| machineHash := hashTo32Bits(machineID) | ||
|
|
||
| b := make([]byte, 16) | ||
| b[0] = 0xfd | ||
| b[1] = 0x15 | ||
| b[2] = 0xd7 | ||
| b[3] = 0x10 | ||
| b[4] = (byte(region) << 3) | (byte(NetworkDeploymentSubnet) & 0x07) | ||
| b[5] = byte((machineHash >> 16) & 0xff) | ||
| b[6] = byte((machineHash >> 8) & 0xff) | ||
| b[7] = byte(machineHash & 0xff) | ||
| // Bytes 8-15 are zero for the prefix | ||
|
|
||
| addr, _ := netip.AddrFromSlice(b) | ||
| return netip.PrefixFrom(addr, 96) | ||
| } |
There was a problem hiding this comment.
Add a guard for region overflow before byte packing.
Line 95 packs region into 5 bits, but values > 31 will wrap/truncate and can silently collide with other region encodings. Please reject invalid input explicitly.
Suggested fix
func ComputeDeploymentSubnet(region Region, machineID string) netip.Prefix {
+ if byte(region) > 31 {
+ panic(fmt.Sprintf("ComputeDeploymentSubnet requires region <= 31, got %d", region))
+ }
machineHash := hashTo32Bits(machineID)📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| func ComputeDeploymentSubnet(region Region, machineID string) netip.Prefix { | |
| machineHash := hashTo32Bits(machineID) | |
| b := make([]byte, 16) | |
| b[0] = 0xfd | |
| b[1] = 0x15 | |
| b[2] = 0xd7 | |
| b[3] = 0x10 | |
| b[4] = (byte(region) << 3) | (byte(NetworkDeploymentSubnet) & 0x07) | |
| b[5] = byte((machineHash >> 16) & 0xff) | |
| b[6] = byte((machineHash >> 8) & 0xff) | |
| b[7] = byte(machineHash & 0xff) | |
| // Bytes 8-15 are zero for the prefix | |
| addr, _ := netip.AddrFromSlice(b) | |
| return netip.PrefixFrom(addr, 96) | |
| } | |
| func ComputeDeploymentSubnet(region Region, machineID string) netip.Prefix { | |
| if byte(region) > 31 { | |
| panic(fmt.Sprintf("ComputeDeploymentSubnet requires region <= 31, got %d", region)) | |
| } | |
| machineHash := hashTo32Bits(machineID) | |
| b := make([]byte, 16) | |
| b[0] = 0xfd | |
| b[1] = 0x15 | |
| b[2] = 0xd7 | |
| b[3] = 0x10 | |
| b[4] = (byte(region) << 3) | (byte(NetworkDeploymentSubnet) & 0x07) | |
| b[5] = byte((machineHash >> 16) & 0xff) | |
| b[6] = byte((machineHash >> 8) & 0xff) | |
| b[7] = byte(machineHash & 0xff) | |
| // Bytes 8-15 are zero for the prefix | |
| addr, _ := netip.AddrFromSlice(b) | |
| return netip.PrefixFrom(addr, 96) | |
| } |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@network/subnet.go` around lines 87 - 103, ComputeDeploymentSubnet currently
masks region into 5 bits which silently truncates values >31; add an explicit
guard at the top of ComputeDeploymentSubnet to validate the Region fits in 5
bits (<= 31) and reject otherwise by returning an error (change signature to
ComputeDeploymentSubnet(region Region, machineID string) (netip.Prefix, error)
and return a non-nil error when region > 31, or alternatively panic if your
codebase prefers); update all callers to handle the new error return and use a
clear constant (e.g. maxRegion := 0x1F or 31) in the check before the existing
byte packing logic.
Summary
Comprehensive routing reliability improvements from the US Central debugging session. Fixes reconnection panics, adds unprovision race guards via
owner_id, improves DNS resolution in containerized environments, and adds deterministic deployment VIP computation.Changes
Reconnection Stability
controlStreams,streams,connectionsslices during reconnectionProto Changes (
gravity_session.proto)owner_idadded toRouteDeploymentRequest,RouteSandboxRequest,UnprovisionRequest, andExistingDeployment— prevents stale unprovision from removing newly re-provisioned endpoints (race guard)SNIHostnameadded toEndpointMappinggossip type for deployment d-hash TLS routingDNS Resolution
/etc/hostspreference over DNS-over-HTTPS inLookupMulti— critical for environments where hosts file has correct mapping but external DNS returns different address (Lima VMs, containers with custom host entries)/etc/hostson every lookup)FailIfLocalrespected for hosts file lookups (private/loopback IPs rejected when configured)parseGRPCURLalso resolves via/etc/hostsbefore falling back to DNSDeployment Subnets (
network/subnet.go)ComputeDeploymentSubnet— deterministic /96 IPv6 subnet per machine for deployment VIPsComputeDeploymentVIP— deterministic IPv6 address per deployment within its machine's subnetTesting
go test ./...)Summary by CodeRabbit
Bug Fixes
New Features
Tests