diff --git a/docs/architecture/overview.md b/docs/architecture/overview.md index 4637ce62..1e7066dc 100644 --- a/docs/architecture/overview.md +++ b/docs/architecture/overview.md @@ -4,9 +4,11 @@ Hedgehog Open Network Fabric leverages the Kubernetes API to manage its resource To make network switches Kubernetes-aware, the Fabric employs an **Agent** running on each switch. This agent acts as an interface between the Kubernetes control plane and the switch internal network configuration mechanisms. It continuously syncs desired state from Kubernetes via the Fabric Controller and applies configurations using **gNMI** (gRPC Network Management Interface). +Gateway nodes follow the same Kubernetes-native model. The Fabric Controller manages gateway configuration through a dedicated Kubernetes CRD, which the gateway's Dataplane watches directly, continuously reconciling its running state with the desired configuration and reporting observed status back through the Kubernetes API. This keeps gateway management fully consistent with the rest of the Fabric: operators interact exclusively through Kubernetes resources, and operational state is always visible via standard Kubernetes tooling. + ## Components -Hedgehog Fabric consists of several key components, distributed between the Control Node and the Network devices. The following diagram breaks down the components of a [mesh topology](fabric.md#mesh). Hedgehog components have been highlighted in brown color: +Hedgehog Fabric consists of several key components, distributed between the Control Node and the network devices. The following diagram illustrates these components and their relationships. 
Hedgehog components have been highlighted in brown color: ``` mermaid graph TD; @@ -19,35 +21,38 @@ graph TD; K -->|Interacts via K8s API| A L[Fabricator]:::ourComponent -->|Installs & Configures| A - A -->|Kubernetes API| B1 - B1 -->|Syncs State| A; - A -->|Kubernetes API| B2 - B2 -->|Syncs State| A; - - %% Mesh - Two Switches - subgraph SONiC Leaf 2 - B1[Fabric Agent]:::ourComponent -->|Scraped by| C1[Alloy]:::thirdParty - C1 -->|Pushes Logs/Metrics| P - D1[gNMI]:::thirdParty - E1[Config DB]:::thirdParty - I1[ASIC]:::thirdParty + A -->|Kubernetes API| SW_AGENT + SW_AGENT -->|Syncs State| A + GWD -->|Syncs State| A + + %% Switch + subgraph Switch + SW_AGENT[Fabric Agent]:::ourComponent + SW_ALLOY[Alloy]:::thirdParty + SW_GNMI[gNMI]:::thirdParty + SW_CDB[Config DB]:::thirdParty + SW_ASIC[ASIC]:::thirdParty + SW_ALLOY -->|scrapes| SW_AGENT + SW_ALLOY -->|Pushes Logs/Metrics| P end - subgraph SONiC Leaf 1 - B2[Fabric Agent]:::ourComponent -->|Scraped by| C2[Alloy]:::thirdParty - C2 -->|Pushes Logs/Metrics| P - D2[gNMI]:::thirdParty - E2[Config DB]:::thirdParty - I2[ASIC]:::thirdParty + %% Gateway + subgraph Gateway + GWD[Dataplane]:::ourComponent + GWFA[FRR Agent]:::ourComponent + GWFRR[FRR]:::thirdParty + GWA[Alloy]:::thirdParty + GWD -->|routing config| GWFA + GWFA -->|config reload| GWFRR + GWFRR -->|routes & BGP state| GWD + GWA -->|scrapes /metrics| GWD + GWA -->|Pushes Logs/Metrics| P end %% Switch Configuration Flow - B1 -->|Applies Config| D1 - B2 -->|Applies Config| D2 - D1 -->|Writes/Reads| E1 - D2 -->|Writes/Reads| E2 - E1 -->|Controls| I1 - E2 -->|Controls| I2 + SW_AGENT -->|Applies Config| SW_GNMI + SW_GNMI -->|Writes/Reads| SW_CDB + SW_CDB -->|Controls| SW_ASIC %% Logs and Metrics Flow P -->|Forwards Logs/Metrics| M @@ -70,10 +75,10 @@ The key components essential for understanding the Fabric architecture are: ### Control Node Components - **Fabric Controller**: The central control plane component that manages Fabric resources and configurations. 
- **Fabric CLI (kubectl plugin)**: A `kubectl` plugin that provides an easy way to manage Fabric resources. -- **Fabric Proxy**: A pod responsible for collecting logs and metrics from switches (via Alloy) and forwarding them to an external system. +- **Fabric Proxy**: A pod responsible for collecting logs and metrics from switches and gateways (via Alloy) and forwarding them to an external system. -- **Fabricator**: A tool for installing and configuring Fabric, including virtual lab environments. -### SONiC Switch Components +### Switch Components - **Fabric Agent**: Runs on each switch and applies configurations received from the control plane. - **Alloy**: Collects logs and telemetry data from the switch. - **gNMI Interface**: The main configuration API used by the Fabric Agent to interact with the switch. @@ -82,6 +87,14 @@ The key components essential for understanding the Fabric architecture are: The SONiC architecture presented here is a high-level abstraction for simplicity. +### Gateway Components +- **Dataplane**: A packet processing pipeline that handles NAT, flow tracking, and VXLAN encapsulation/decapsulation. It reads the desired peering and NAT configuration from Kubernetes and generates FRR configuration, which is delivered to the FRR Agent. +- **FRR Agent**: A Hedgehog-written component that receives FRR configuration from the dataplane and applies it to FRR via dynamic reload. +- **FRR (Free Range Routing)**: A suite of routing daemons that provides BGP peering with the fabric switches. FRR advertises VPC peering routes to attract traffic to the gateway, and pushes routes received from the fabric back into the dataplane's forwarding table via the Control Plane Interface (CPI). +- **Alloy**: Collects logs and metrics from the gateway and forwards them to the Fabric Proxy. + +Gateway nodes run Flatcar Linux and join the Kubernetes cluster as worker nodes.
The Fabric Controller schedules all gateway components onto gateway nodes and delivers configuration through the `GatewayAgent` Kubernetes CRD. The Dataplane watches this CRD directly, keeping its own state synchronized and reporting back observed status. FRR and the FRR Agent are responsible for all routing interactions with the fabric: FRR advertises and receives routes via BGP, while the FRR Agent keeps FRR's configuration in sync with the Dataplane's desired state. + ## Architecture Flow ### 1. **Fabric Installation & Configuration** @@ -99,7 +112,13 @@ The SONiC architecture presented here is a high-level abstraction, for simplicit - The **Fabric Agent** applies configurations using the **gNMI** interface, updating the **Config DB**. - The **Config DB** ensures that all settings are applied to the **ASIC** for packet forwarding. -### 4. **Telemetry & Monitoring** -- The **Alloy** agent on the switch collects logs and metrics. +### 4. **Gateway Configuration & Management** +- The **Fabric Controller** publishes a `GatewayAgent` CRD containing the desired gateway configuration: BGP settings, VPC peerings, NAT rules, and gateway group membership. +- The **Dataplane** watches the `GatewayAgent` CRD via the Kubernetes API, applies the configuration, and writes its observed state (including FRR applied generation and per-VPC traffic statistics) back to the CRD status. +- The **Dataplane** generates FRR configuration from the desired state and delivers it to the **FRR Agent**, which applies it to FRR via dynamic reload. +- **FRR** establishes BGP sessions with the fabric switches to advertise VPC peering routes. It pushes received routes and BGP state back to the **Dataplane** via the Control Plane Interface (CPI) and BGP Monitoring Protocol (BMP) respectively. + +### 5. **Telemetry & Monitoring** +- The **Alloy** agent on switches and gateways collects logs and metrics. - Logs and metrics are sent to the **Fabric Proxy** running in Kubernetes. 
- The **Fabric Proxy** forwards this data to **LGTM**, an external logging and monitoring system. diff --git a/docs/troubleshooting/gateway.md b/docs/troubleshooting/gateway.md new file mode 100644 index 00000000..b695995b --- /dev/null +++ b/docs/troubleshooting/gateway.md @@ -0,0 +1,173 @@ +# Gateway + +This page covers diagnosing common issues with the Hedgehog Gateway, including +connectivity problems and NAT issues. + +## Health Checks + +Start by verifying the gateway has picked up its current configuration: + +```console +$ kubectl get gatewayagents +NAME APPLIED APPLIEDG CURRENTG VERSION PROTOCOLIP VTEPIP AGE +gateway-1 10 minutes ago 3 3 v1.2.0 ... ... 2d +``` + +`AppliedG` should equal `CurrentG`. If they differ, the gateway has not yet +applied the latest configuration — check the dataplane pod logs. + +If the gateway is not reporting in at all, check that both pods are running: + +```console +$ kubectl get pods -n fab -l app.kubernetes.io/component=gateway +NAME READY STATUS RESTARTS AGE +gw--gateway-1--dataplane-7v9ss 1/1 Running 0 12h +gw--gateway-1--frr-c9kwc 2/2 Running 0 12h +``` + +If either pod is not `Running`, inspect its logs: + +```console +$ kubectl logs -n fab gw--gateway-1--dataplane-7v9ss +$ kubectl logs -n fab gw--gateway-1--frr-c9kwc -c frr +$ kubectl logs -n fab gw--gateway-1--frr-c9kwc -c frr-agent +``` + +## Common Issues + +### Traffic not flowing through gateway + +1. **Check peering is configured**: Verify the GatewayPeering object exists + and is not rejected: + ```console + $ kubectl get gatewaypeerings + ``` + +2. **Check routes on the leaf**: Verify gateway routes are installed on the + leaf switches: + ```console + $ kubectl fabric inspect vpc + ``` + Look for routes pointing to the gateway's VTEP IP. + +3. **Check FRR is advertising routes**: Use the FRR pod to verify BGP + is advertising the peering prefixes (see [FRR and BGP State](#frr-and-bgp-state)). + +4. 
**Check flow filter**: Use the dataplane CLI `show flow-filter table` to verify + the peering policy is loaded. If the flow filter is empty, the dataplane + configuration may not have been applied yet; check the dataplane pod logs. + +### NAT not working as expected + +1. **Check flow table**: Use `show flow-table entries` in the dataplane CLI to see + if flows are being created. If the flow table is empty while traffic + is flowing, the packets may be dropped by the flow filter before + reaching the NAT stage. + +2. **Check NAT state**: Use `show masquerading state`, `show static-nat rules`, or + `show port-forwarding rules` to verify the NAT configuration is loaded. + +3. **Idle timeout**: If connections work briefly then stop, the flow may be + expiring. Check the `idleTimeout` setting in the GatewayPeering spec. + Use TCP or application-layer keepalives for long-lived connections. + +### Gateway failover + +1. **Check both gateways are running**: Verify both gateway pods are healthy. + +2. **Check gateway group membership**: + ```console + $ kubectl get gateways -o yaml + ``` + Verify both gateways are members of the expected group with correct + priorities. + +3. **Check BGP on leaves**: After a failover, the leaf switches should + withdraw routes from the failed gateway and install routes from the + backup. Use `kubectl fabric inspect bgp` to check. + +## Diagnostics + +### Dataplane CLI + +The dataplane includes an interactive CLI for inspecting internal state.
+Access it by exec'ing into the dataplane pod: + +```console +$ kubectl exec -n fab -it gw--gateway-1--dataplane-7v9ss -- ./dataplane-cli +``` + +Key commands: + +| Command | Description | +|---------|-------------| +| `show flow-filter table` | Peering policy loaded on the dataplane | +| `show flow-table entries` | Active stateful NAT sessions | +| `show masquerading state` | Masquerade NAT configuration and pool state | +| `show static-nat rules` | Static NAT mappings | +| `show port-forwarding rules` | Port-forwarding rules | +| `show ip fib` | IPv4 forwarding table | +| `show config summary` | Configuration generation and apply status | +| `show tech` | Full diagnostic dump (for support) | + +Use `help` in the CLI to see all available commands. + +### FRR and BGP State + +FRR runs in a separate pod. Use `vtysh` to inspect BGP state: + +```console +$ kubectl exec -n fab -it gw--gateway-1--frr-c9kwc -c frr -- vtysh +``` + +**Check BGP neighbors:** + +``` +gateway-1# show bgp summary +``` + +All neighbors should be in `Established` state. If a neighbor is in `Active` +or `Idle`, the BGP session is not established; check physical connectivity +and IP configuration. + +**Check routes advertised by the gateway:** + +``` +gateway-1# show ip route +``` + +VPC peering prefixes should appear as BGP routes pointing to the gateway's +VTEP IP. + +**Check VRF routing tables:** + +``` +gateway-1# show ip route vrf all +``` + +## Metrics + +The dataplane exposes Prometheus metrics scraped by the Alloy agent on the +gateway node and forwarded to the Fabric Proxy. 
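The exposed data is standard Prometheus text exposition, so it can also be consumed with ordinary tooling outside the LGTM pipeline. As a purely illustrative sketch (the parser is not part of the product, and it assumes the VPC names fill the label values described below):

```python
import re

# Illustrative sample mimicking the dataplane's per-VPC counters; the exact
# label layout (VPC names as label values) is an assumption based on the
# label variants listed in this section.
SAMPLE = """\
vpc_packet_count{total="vpc-1"} 1042
vpc_packet_count{drops="vpc-1"} 7
vpc_packet_count{from="vpc-1",to="vpc-2"} 530
"""

LINE_RE = re.compile(r'^(?P<name>\w+)\{(?P<labels>[^}]*)\}\s+(?P<value>\S+)$')

def parse_metrics(text: str) -> list[tuple[str, dict[str, str], float]]:
    """Return (metric name, labels, value) for each sample line."""
    out = []
    for line in text.splitlines():
        m = LINE_RE.match(line)
        if not m:
            continue  # skip comments, blank lines, and unparseable input
        labels = dict(kv.split("=", 1) for kv in m.group("labels").split(",") if kv)
        labels = {k: v.strip('"') for k, v in labels.items()}
        out.append((m.group("name"), labels, float(m.group("value"))))
    return out

# Pick out only the directional {from="...",to="..."} variant.
directional = [
    (labels, value)
    for name, labels, value in parse_metrics(SAMPLE)
    if name == "vpc_packet_count" and "from" in labels
]
```

In practice the same filtering is usually done with PromQL on the LGTM side; the sketch only shows that the raw endpoint output is plain text and trivially machine-readable.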
+ +Each metric is emitted with three label variants: + +- `{total=""}`: all traffic in or out of the VPC +- `{drops=""}`: traffic dropped for the VPC +- `{from="",to=""}`: directional traffic between two VPCs + +Available metrics: + +| Metric | Type | Description | +|--------|------|-------------| +| `vpc_packet_count` | Gauge | Packet count | +| `vpc_packet_rate` | Gauge | Packet rate | +| `vpc_byte_count` | Gauge | Byte count | +| `vpc_byte_rate` | Gauge | Byte rate | + +To inspect metrics directly, run on the gateway node itself (the dataplane uses +host networking, so the endpoint is accessible on the node at port 9442): + +```console +$ curl -s http://localhost:9442/metrics +``` diff --git a/docs/user-guide/gateway.md b/docs/user-guide/gateway.md index 523e7aed..6e0bc8f1 100644 --- a/docs/user-guide/gateway.md +++ b/docs/user-guide/gateway.md @@ -106,6 +106,33 @@ style Leaves fill:none,stroke:none style Servers fill:none,stroke:none ``` +## Flow Table and Stateful Processing + +When stateful NAT (masquerade or port-forwarding) is configured on a gateway peering, +the gateway maintains a **flow table** to track active connections. Each unique connection +(identified by its source/destination IPs, ports, and protocol) creates an entry in the +flow table. This entry records the NAT translation applied and the connection's idle timer. + +Key characteristics of the flow table: + +- **Timeout-based eviction**: Flow entries expire after a configurable period of inactivity. + The idle timeout is set per peering via the `idleTimeout` field in the NAT configuration + (default: 2 minutes for masquerade; see [Masquerade](#masquerade-stateful-source-nat) and + [Port-Forwarding](#port-forwarding-stateful-destination-nat) for details). When a flow expires, + its entry is removed and subsequent packets for that connection are treated as a new flow. +- **Capacity**: The flow table can handle millions of concurrent entries depending on the gateway + node's available memory. 
The maximum number of flow entries can be configured via the + `flowTableCapacity` field in the Gateway spec. In most deployments, the default is sufficient. +- **Per-gateway state**: Each gateway maintains its own flow table independently. Flow state + is not shared between gateways. If a gateway fails and traffic is redirected to a backup + gateway (see [Gateway fail-over](gateway-failover.md)), existing stateful connections must + be re-established, as the backup gateway has no knowledge of the failed gateway's flow table. + +!!! tip + Use TCP keepalives or application-layer keepalives for long-lived connections through + stateful NAT. This prevents the flow entry from expiring during long idle periods. + +## Gateway Peering Just as [VPC Peerings](vpcs.md#vpcpeering) provide VPC-to-VPC connectivity by way of the switches in the fabric, gateway peerings provide connectivity via the gateway nodes.
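To make the keepalive tip above concrete: kernel TCP keepalives can be enabled per socket, so long-lived connections through stateful NAT keep refreshing their flow entry. A minimal sketch; the timer values are illustrative only and should sit comfortably below the peering's configured `idleTimeout`:

```python
import socket

def open_keepalive_socket() -> socket.socket:
    """Create a TCP socket with kernel keepalives tuned below a
    2-minute flow idle timeout (values illustrative, not a recommendation)."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # The per-socket timers are Linux-specific; guard them so the
    # sketch stays portable.
    if hasattr(socket, "TCP_KEEPIDLE"):
        s.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, 60)   # first probe after 60s idle
    if hasattr(socket, "TCP_KEEPINTVL"):
        s.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, 30)  # then every 30s
    if hasattr(socket, "TCP_KEEPCNT"):
        s.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, 5)     # drop after 5 missed probes
    return s
```

Application-layer keepalives (periodic no-op requests) achieve the same effect for protocols where TCP-level probes are not desirable.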