Pau/gateway doc#294

Draft
pau-hedgehog wants to merge 3 commits into master from pau/gateway-doc

Conversation

@pau-hedgehog
Contributor

No description provided.

@github-actions
github-actions bot commented Apr 15, 2026

🚀 Deployed on https://preview-294--hedgehog-docs.netlify.app

Copilot AI left a comment

Pull request overview

Updates Gateway documentation to better explain stateful NAT behavior and to add operational guidance for diagnosing Gateway issues.

Changes:

  • Add a new “Flow Table and Stateful Processing” section to the Gateway user guide.
  • Add a new Gateway troubleshooting page with CLI, BGP, NAT, and metrics checks.
  • Extend the architecture overview with a summary of Gateway node components (dataplane/FRR/Alloy).

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| docs/user-guide/gateway.md | Documents gateway flow table behavior for stateful NAT and failover implications. |
| docs/troubleshooting/gateway.md | New runbook-style troubleshooting guide for Gateway pods, CLI, FRR/BGP, NAT, and metrics. |
| docs/architecture/overview.md | Adds a high-level description of Gateway node components and responsibilities. |

Comment thread docs/user-guide/gateway.md Outdated
Comment on lines +119 to +121
The default idle timeout depends on the NAT mode (see [Masquerade](#masquerade-stateful-source-nat) and
[Port-Forwarding](#port-forwarding-stateful-destination-nat) sections for defaults). When a flow expires,
its entry is removed and subsequent packets for that connection are treated as a new flow.
Copilot AI Apr 15, 2026

The text says the default idle timeout depends on NAT mode and points readers to the Port-Forwarding section for defaults, but the Port-Forwarding section doesn’t currently document a default for the idleTimeout field (it only discusses protocol timeouts). Consider either documenting the Port-Forwarding idleTimeout default explicitly or rewording this sentence to avoid implying that it’s covered there.

Suggested change
- The default idle timeout depends on the NAT mode (see [Masquerade](#masquerade-stateful-source-nat) and
- [Port-Forwarding](#port-forwarding-stateful-destination-nat) sections for defaults). When a flow expires,
- its entry is removed and subsequent packets for that connection are treated as a new flow.
+ The default idle timeout depends on the NAT mode. See [Masquerade](#masquerade-stateful-source-nat)
+ for the default masquerade timeout, and [Port-Forwarding](#port-forwarding-stateful-destination-nat)
+ for port-forwarding timeout configuration details. When a flow expires, its entry is removed
+ and subsequent packets for that connection are treated as a new flow.
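
For readers of this thread, the expiry behavior described in the quoted text (an idle flow is removed, and later packets on the same connection count as a new flow) can be illustrated with a minimal Python sketch. This is not the Gateway dataplane implementation: the `FlowTable` class, its method names, the 5-tuple key, and the 120-second timeout are all hypothetical and chosen only for illustration.

```python
import time
from typing import Callable, Dict, Hashable


class FlowTable:
    """Toy flow table with idle-timeout expiry (illustrative only)."""

    def __init__(self, idle_timeout: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.idle_timeout = idle_timeout
        self.clock = clock
        self._last_seen: Dict[Hashable, float] = {}

    def observe(self, flow_key: Hashable) -> bool:
        """Record a packet; return True if it starts a new flow."""
        now = self.clock()
        last = self._last_seen.get(flow_key)
        # A missing entry, or one idle for at least the timeout, is a new flow.
        is_new = last is None or now - last >= self.idle_timeout
        self._last_seen[flow_key] = now  # refresh (or create) the entry
        return is_new

    def expire(self) -> int:
        """Remove idle entries; return how many were removed."""
        now = self.clock()
        stale = [k for k, t in self._last_seen.items()
                 if now - t >= self.idle_timeout]
        for k in stale:
            del self._last_seen[k]
        return len(stale)


# Demo with a fake clock so the result does not depend on wall time.
now = [0.0]
table = FlowTable(idle_timeout=120.0, clock=lambda: now[0])
key = ("10.0.1.5", 40000, "192.0.2.10", 443, "tcp")  # hypothetical 5-tuple

first = table.observe(key)    # first packet creates the entry
now[0] = 60.0
second = table.observe(key)   # within the timeout: same flow, timer refreshed
now[0] = 200.0                # 140 s of silence since the last packet
removed = table.expire()      # the idle entry is evicted
third = table.observe(key)    # treated as a new flow again
```

In this sketch, a keepalive sent more often than the idle timeout keeps `observe` refreshing the entry, which is why keepalives matter for long-lived connections through stateful NAT.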

Comment thread docs/user-guide/gateway.md Outdated
Comment on lines +123 to +125
node's available memory. The number of shards used internally by the table is configurable
via the `flowTableCapacity` field in the Gateway spec (default: 1024 shards). In most
deployments, the default is sufficient.
Copilot AI Apr 15, 2026

flowTableCapacity is described here as a shard count with a “default: 1024 shards”, but the Fabric API reference defines flowTableCapacity as the maximum number of flow entries (and does not list a default). This risks misleading users configuring the Gateway spec. Please align the wording with the API (capacity/entries vs shards) and remove or clearly source the default value if it’s not part of the public API contract.

Suggested change
- node's available memory. The number of shards used internally by the table is configurable
- via the `flowTableCapacity` field in the Gateway spec (default: 1024 shards). In most
- deployments, the default is sufficient.
+ node's available memory. The maximum number of flow entries can be configured via the
+ `flowTableCapacity` field in the Gateway spec. In most deployments, the configured capacity
+ should be chosen based on expected connection scale and available memory.
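
As a rough aid for choosing a capacity "based on expected connection scale", the following Python sketch applies Little's law (steady-state concurrent flows ≈ new flows per second × mean time an entry stays in the table). The function names, the traffic figures, and the 50% headroom are assumptions for illustration only and are not taken from the Gateway documentation or API.

```python
import math


def estimate_concurrent_flows(new_flows_per_second: float,
                              mean_entry_lifetime_s: float) -> int:
    """Little's law: concurrent entries = arrival rate * mean time in table."""
    return math.ceil(new_flows_per_second * mean_entry_lifetime_s)


def capacity_with_headroom(concurrent_flows: int, headroom: float = 0.5) -> int:
    """Add fractional headroom for bursts (0.5 means 50% extra)."""
    return math.ceil(concurrent_flows * (1 + headroom))


# Hypothetical workload: 2000 new flows/s, each entry living about 150 s
# (active time plus the idle timeout before the entry is removed).
concurrent = estimate_concurrent_flows(2000, 150.0)
suggested = capacity_with_headroom(concurrent)
print(concurrent, suggested)
```

Note that the mean entry lifetime includes the idle timeout, since an entry lingers in the table until it expires; a longer timeout therefore raises the capacity needed for the same traffic.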

pau-hedgehog force-pushed the pau/gateway-doc branch 9 times, most recently from 11f8a7c to a7e9a5b on April 15, 2026 22:56
pau-hedgehog and others added 2 commits April 16, 2026 01:00
The architecture page described Control Node and SONiC Switch components
but did not mention the gateway. Add a Gateway Node Components section
covering the dataplane, FRR, and Alloy pods.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Pau Capdevila <pau@githedgehog.com>
Document the gateway flow table behavior: timeout-based eviction,
capacity configuration, and per-gateway state scope. This helps users
understand sizing, failover implications, and the importance of
keepalives for long-lived connections through stateful NAT.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Pau Capdevila <pau@githedgehog.com>
pau-hedgehog force-pushed the pau/gateway-doc branch 2 times, most recently from 920d8f5 to b571473 on April 15, 2026 23:04
Add troubleshooting guide covering: pod health checks, dataplane CLI
usage (flow table, NAT state, flow filter, routing), FRR/BGP inspection
via vtysh, common issues (traffic not flowing, NAT problems, failover),
and Prometheus metrics.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Pau Capdevila <pau@githedgehog.com>
2 participants