🚀 Deployed on https://preview-294--hedgehog-docs.netlify.app
Pull request overview
Updates Gateway documentation to better explain stateful NAT behavior and to add operational guidance for diagnosing Gateway issues.
Changes:
- Add a new “Flow Table and Stateful Processing” section to the Gateway user guide.
- Add a new Gateway troubleshooting page with CLI, BGP, NAT, and metrics checks.
- Extend the architecture overview with a summary of Gateway node components (dataplane/FRR/Alloy).
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| docs/user-guide/gateway.md | Documents gateway flow table behavior for stateful NAT and failover implications. |
| docs/troubleshooting/gateway.md | New runbook-style troubleshooting guide for Gateway pods, CLI, FRR/BGP, NAT, and metrics. |
| docs/architecture/overview.md | Adds a high-level description of Gateway node components and responsibilities. |
The default idle timeout depends on the NAT mode (see [Masquerade](#masquerade-stateful-source-nat) and
[Port-Forwarding](#port-forwarding-stateful-destination-nat) sections for defaults). When a flow expires,
its entry is removed and subsequent packets for that connection are treated as a new flow.
The text says the default idle timeout depends on NAT mode and points readers to the Port-Forwarding section for defaults, but the Port-Forwarding section doesn’t currently document a default for the idleTimeout field (it only discusses protocol timeouts). Consider either documenting the Port-Forwarding idleTimeout default explicitly or rewording this sentence to avoid implying that it’s covered there.
Suggested change, current text:
The default idle timeout depends on the NAT mode (see [Masquerade](#masquerade-stateful-source-nat) and
[Port-Forwarding](#port-forwarding-stateful-destination-nat) sections for defaults). When a flow expires,
its entry is removed and subsequent packets for that connection are treated as a new flow.
Suggested change, proposed text:
The default idle timeout depends on the NAT mode. See [Masquerade](#masquerade-stateful-source-nat)
for the default masquerade timeout, and [Port-Forwarding](#port-forwarding-stateful-destination-nat)
for port-forwarding timeout configuration details. When a flow expires, its entry is removed
and subsequent packets for that connection are treated as a new flow.
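To make the expiry behavior described in this passage concrete, here is a minimal Python sketch of a flow table with idle-timeout eviction. It is an illustration only and not the Gateway dataplane's implementation: the `FlowTable` class, its method names, the 5-tuple key, and the injectable clock are all hypothetical.

```python
import time
from typing import Callable, Dict, Hashable


class FlowTable:
    """Toy flow table with idle-timeout eviction (hypothetical, for illustration)."""

    def __init__(self, idle_timeout: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.idle_timeout = idle_timeout
        self._clock = clock  # injectable so the behavior can be tested without sleeping
        self._last_seen: Dict[Hashable, float] = {}

    def observe(self, flow_key: Hashable) -> str:
        """Record a packet for flow_key and report whether it starts a new flow."""
        now = self._clock()
        last = self._last_seen.get(flow_key)
        # A missing entry or one idle for longer than the timeout counts as a new flow.
        is_new = last is None or (now - last) > self.idle_timeout
        self._last_seen[flow_key] = now
        return "new" if is_new else "existing"

    def expire_idle(self) -> int:
        """Remove entries idle for longer than the timeout; return how many were removed."""
        now = self._clock()
        stale = [key for key, seen in self._last_seen.items()
                 if (now - seen) > self.idle_timeout]
        for key in stale:
            del self._last_seen[key]
        return len(stale)
```

In this sketch, a packet that arrives after its entry has expired is classified as "new", which mirrors the statement that subsequent packets for that connection are treated as a new flow.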
node's available memory. The number of shards used internally by the table is configurable
via the `flowTableCapacity` field in the Gateway spec (default: 1024 shards). In most
deployments, the default is sufficient.
flowTableCapacity is described here as a shard count with a “default: 1024 shards”, but the Fabric API reference defines flowTableCapacity as the maximum number of flow entries (and does not list a default). This risks misleading users configuring the Gateway spec. Please align the wording with the API (capacity/entries vs shards) and remove or clearly source the default value if it’s not part of the public API contract.
Suggested change, current text:
node's available memory. The number of shards used internally by the table is configurable
via the `flowTableCapacity` field in the Gateway spec (default: 1024 shards). In most
deployments, the default is sufficient.
Suggested change, proposed text:
node's available memory. The maximum number of flow entries can be configured via the
`flowTableCapacity` field in the Gateway spec. In most deployments, the configured capacity
should be chosen based on expected connection scale and available memory.
Force-pushed from 11f8a7c to a7e9a5b
The architecture page described Control Node and SONiC Switch components but did not mention the gateway. Add a Gateway Node Components section covering the dataplane, FRR, and Alloy pods.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Pau Capdevila <pau@githedgehog.com>
Document the gateway flow table behavior: timeout-based eviction, capacity configuration, and per-gateway state scope. This helps users understand sizing, failover implications, and the importance of keepalives for long-lived connections through stateful NAT.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Pau Capdevila <pau@githedgehog.com>
Force-pushed from 920d8f5 to b571473
Add troubleshooting guide covering: pod health checks, dataplane CLI usage (flow table, NAT state, flow filter, routing), FRR/BGP inspection via vtysh, common issues (traffic not flowing, NAT problems, failover), and Prometheus metrics.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Pau Capdevila <pau@githedgehog.com>
Force-pushed from b571473 to 51120f9
No description provided.