77 changes: 48 additions & 29 deletions docs/architecture/overview.md
@@ -4,9 +4,11 @@ Hedgehog Open Network Fabric leverages the Kubernetes API to manage its resource

To make network switches Kubernetes-aware, the Fabric employs an **Agent** running on each switch. This agent acts as an interface between the Kubernetes control plane and the switch's internal network configuration mechanisms. It continuously syncs desired state from Kubernetes via the Fabric Controller and applies configurations using **gNMI** (gRPC Network Management Interface).

Gateway nodes follow the same Kubernetes-native model. The Fabric Controller manages gateway configuration through a dedicated Kubernetes CRD, which the gateway's Dataplane watches directly, continuously reconciling its running state with the desired configuration and reporting observed status back through the Kubernetes API. This keeps gateway management fully consistent with the rest of the Fabric: operators interact exclusively through Kubernetes resources, and operational state is always visible via standard Kubernetes tooling.
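The watch-and-reconcile loop described above follows the standard Kubernetes controller pattern. As a minimal illustration (the resource shape below is invented for the sketch, not the actual CRD schema):

```python
# Minimal sketch of the watch/reconcile pattern the gateway Dataplane follows.
# The resource shape here is hypothetical, not the actual GatewayAgent schema.

def reconcile(desired: dict, observed: dict) -> dict:
    """Diff desired vs observed config and report the resulting status."""
    to_apply = {k: v for k, v in desired["config"].items()
                if observed["config"].get(k) != v}
    to_remove = [k for k in observed["config"] if k not in desired["config"]]
    # A real agent would push to_apply / to_remove to the device here.
    observed["config"] = dict(desired["config"])
    return {
        "appliedGeneration": desired["generation"],
        "changed": bool(to_apply or to_remove),
    }

desired = {"generation": 3, "config": {"asn": 65100, "vtep": "10.0.0.1"}}
observed = {"config": {"asn": 65000}}
status = reconcile(desired, observed)
print(status)  # {'appliedGeneration': 3, 'changed': True}
```

Reporting the applied generation back in status is what lets operators see convergence through standard Kubernetes tooling.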

## Components

Hedgehog Fabric consists of several key components, distributed between the Control Node and the Network devices. The following diagram breaks down the components of a [mesh topology](fabric.md#mesh). Hedgehog components have been highlighted in brown color:
Hedgehog Fabric consists of several key components, distributed between the Control Node and the network devices. The following diagram illustrates these components and their relationships, with Hedgehog components highlighted in brown:

``` mermaid
graph TD;
@@ -19,35 +21,38 @@ graph TD;

K -->|Interacts via K8s API| A
L[Fabricator]:::ourComponent -->|Installs & Configures| A
A -->|Kubernetes API| B1
B1 -->|Syncs State| A;
A -->|Kubernetes API| B2
B2 -->|Syncs State| A;

%% Mesh - Two Switches
subgraph SONiC Leaf 2
B1[Fabric Agent]:::ourComponent -->|Scraped by| C1[Alloy]:::thirdParty
C1 -->|Pushes Logs/Metrics| P
D1[gNMI]:::thirdParty
E1[Config DB]:::thirdParty
I1[ASIC]:::thirdParty
A -->|Kubernetes API| SW_AGENT
SW_AGENT -->|Syncs State| A
GWD -->|Syncs State| A

%% Switch
subgraph Switch
SW_AGENT[Fabric Agent]:::ourComponent
SW_ALLOY[Alloy]:::thirdParty
SW_GNMI[gNMI]:::thirdParty
SW_CDB[Config DB]:::thirdParty
SW_ASIC[ASIC]:::thirdParty
SW_ALLOY -->|scrapes| SW_AGENT
SW_ALLOY -->|Pushes Logs/Metrics| P
end

subgraph SONiC Leaf 1
B2[Fabric Agent]:::ourComponent -->|Scraped by| C2[Alloy]:::thirdParty
C2 -->|Pushes Logs/Metrics| P
D2[gNMI]:::thirdParty
E2[Config DB]:::thirdParty
I2[ASIC]:::thirdParty
%% Gateway
subgraph Gateway
GWD[Dataplane]:::ourComponent
GWFA[FRR Agent]:::ourComponent
GWFRR[FRR]:::thirdParty
GWA[Alloy]:::thirdParty
GWD -->|routing config| GWFA
GWFA -->|config reload| GWFRR
GWFRR -->|routes & BGP state| GWD
GWA -->|scrapes /metrics| GWD
GWA -->|Pushes Logs/Metrics| P
end

%% Switch Configuration Flow
B1 -->|Applies Config| D1
B2 -->|Applies Config| D2
D1 -->|Writes/Reads| E1
D2 -->|Writes/Reads| E2
E1 -->|Controls| I1
E2 -->|Controls| I2
SW_AGENT -->|Applies Config| SW_GNMI
SW_GNMI -->|Writes/Reads| SW_CDB
SW_CDB -->|Controls| SW_ASIC

%% Logs and Metrics Flow
P -->|Forwards Logs/Metrics| M
@@ -70,10 +75,10 @@ The key components essential for understanding the Fabric architecture are:
### Control Node Components
- **Fabric Controller**: The central control plane component that manages Fabric resources and configurations.
- **Fabric CLI (kubectl plugin)**: A `kubectl` plugin that provides an easy way to manage Fabric resources.
- **Fabric Proxy**: A pod responsible for collecting logs and metrics from switches (via Alloy) and forwarding them to an external system.
- **Fabric Proxy**: A pod responsible for collecting logs and metrics from switches and gateways (via Alloy) and forwarding them to an external system.
- **Fabricator**: A tool for installing and configuring Fabric, including virtual lab environments.

### SONiC Switch Components
### Switch Components
- **Fabric Agent**: Runs on each switch and applies configurations received from the control plane.
- **Alloy**: Collects logs and telemetry data from the switch.
- **gNMI Interface**: The main configuration API used by the Fabric Agent to interact with the switch.
@@ -82,6 +87,14 @@ The key components essential for understanding the Fabric architecture are:

The SONiC architecture presented here is a simplified, high-level abstraction.

### Gateway Components
- **Dataplane**: A packet processing pipeline that handles NAT, flow tracking, and VXLAN encapsulation/decapsulation. It reads the desired peering and NAT configuration from Kubernetes and generates the FRR configuration that is delivered to the FRR Agent.
- **FRR Agent**: A Hedgehog-written component that receives FRR configuration from the Dataplane and applies it to FRR via dynamic reload.
- **FRR (Free Range Routing)**: A suite of routing daemons that provides BGP peering with the fabric switches. FRR advertises VPC peering routes to attract traffic to the gateway, and pushes routes received from the fabric back into the dataplane's forwarding table via the Control Plane Interface (CPI).
- **Alloy**: Collects logs and metrics from the gateway and forwards them to the Fabric Proxy.

Gateway nodes run Flatcar Linux and join the Kubernetes cluster as worker nodes. The Fabric Controller schedules all gateway components onto gateway nodes and delivers configuration through the `GatewayAgent` Kubernetes CRD. The Dataplane watches this CRD directly, keeping its own state synchronized and reporting back observed status. FRR and the FRR Agent are responsible for all routing interactions with the fabric: FRR advertises and receives routes via BGP, while the FRR Agent keeps FRR's configuration in sync with the Dataplane's desired state.
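As a rough illustration, a `GatewayAgent` object might be shaped like the sketch below. Only the resource kind, the configuration categories (BGP, peerings, NAT, gateway group), and the status fields visible in `kubectl get gatewayagents` output are taken from this document; the API group, version, and exact field names are assumptions:

```yaml
# Illustrative sketch only. The API group/version and field names are
# hypothetical; consult the actual CRD for the real schema.
apiVersion: gateway.example.hedgehog/v1alpha1
kind: GatewayAgent
metadata:
  name: gateway-1
spec:
  generation: 3          # desired configuration generation
  bgp: {}                # BGP settings
  peerings: {}           # VPC peerings
  nat: {}                # NAT rules
  gatewayGroup: group-1  # gateway group membership
status:
  appliedGeneration: 3   # written back by the Dataplane
```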

## Architecture Flow

### 1. **Fabric Installation & Configuration**
@@ -99,7 +112,13 @@ The SONiC architecture presented here is a high-level abstraction, for simplicit
- The **Fabric Agent** applies configurations using the **gNMI** interface, updating the **Config DB**.
- The **Config DB** ensures that all settings are applied to the **ASIC** for packet forwarding.

### 4. **Telemetry & Monitoring**
- The **Alloy** agent on the switch collects logs and metrics.
### 4. **Gateway Configuration & Management**
- The **Fabric Controller** publishes a `GatewayAgent` CRD containing the desired gateway configuration: BGP settings, VPC peerings, NAT rules, and gateway group membership.
- The **Dataplane** watches the `GatewayAgent` CRD via the Kubernetes API, applies the configuration, and writes its observed state (including FRR applied generation and per-VPC traffic statistics) back to the CRD status.
- The **Dataplane** generates FRR configuration from the desired state and delivers it to the **FRR Agent**, which applies it to FRR via dynamic reload.
- **FRR** establishes BGP sessions with the fabric switches to advertise VPC peering routes. It pushes received routes and BGP state back to the **Dataplane** via the Control Plane Interface (CPI) and BGP Monitoring Protocol (BMP) respectively.
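As a sketch of what the Dataplane might hand to the FRR Agent, a generated fragment could resemble standard FRR BGP configuration such as the following (the ASNs, neighbor address, and prefix are invented for illustration):

```
router bgp 65100
 neighbor 172.30.1.0 remote-as 65000
 !
 address-family ipv4 unicast
  network 10.90.0.0/24
 exit-address-family
```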

### 5. **Telemetry & Monitoring**
- The **Alloy** agent on switches and gateways collects logs and metrics.
- Logs and metrics are sent to the **Fabric Proxy** running in Kubernetes.
- The **Fabric Proxy** forwards this data to **LGTM**, an external logging and monitoring system.
173 changes: 173 additions & 0 deletions docs/troubleshooting/gateway.md
@@ -0,0 +1,173 @@
# Gateway

This page covers diagnosing common issues with the Hedgehog Gateway, including
connectivity problems and NAT issues.

## Health Checks

Start by verifying the gateway has picked up its current configuration:

```console
$ kubectl get gatewayagents
NAME APPLIED APPLIEDG CURRENTG VERSION PROTOCOLIP VTEPIP AGE
gateway-1 10 minutes ago 3 3 v1.2.0 ... ... 2d
```

`AppliedG` should equal `CurrentG`. If they differ, the gateway has not yet
applied the latest configuration — check the dataplane pod logs.

If the gateway is not reporting in at all, check that both pods are running:

```console
$ kubectl get pods -n fab -l app.kubernetes.io/component=gateway
NAME READY STATUS RESTARTS AGE
gw--gateway-1--dataplane-7v9ss 1/1 Running 0 12h
gw--gateway-1--frr-c9kwc 2/2 Running 0 12h
```

If either pod is not `Running`, inspect its logs:

```console
$ kubectl logs -n fab gw--gateway-1--dataplane-7v9ss
$ kubectl logs -n fab gw--gateway-1--frr-c9kwc -c frr
$ kubectl logs -n fab gw--gateway-1--frr-c9kwc -c frr-agent
```

## Common Issues

### Traffic not flowing through gateway

1. **Check peering is configured**: Verify the GatewayPeering object exists
and is not rejected:
```console
$ kubectl get gatewaypeerings
```

2. **Check routes on the leaf**: Verify gateway routes are installed on the
leaf switches:
```console
$ kubectl fabric inspect vpc <vpc-name>
```
Look for routes pointing to the gateway's VTEP IP.

3. **Check FRR is advertising routes**: Use the FRR pod to verify BGP
is advertising the peering prefixes (see [FRR and BGP State](#frr-and-bgp-state)).

4. **Check flow filter**: Use the dataplane CLI `show flow-filter table` to verify
the peering policy is loaded. If the flow filter is empty, the dataplane
configuration may not have been applied yet; check the FRR agent logs.

### NAT not working as expected

1. **Check flow table**: Use `show flow-table entries` in the dataplane CLI to see
if flows are being created. If the flow table is empty while traffic
is flowing, the packets may be dropped by the flow filter before
reaching the NAT stage.

2. **Check NAT state**: Use `show masquerading state`, `show static-nat rules`, or
`show port-forwarding rules` to verify the NAT configuration is loaded.

3. **Idle timeout**: If connections work briefly then stop, the flow may be
expiring. Check the `idleTimeout` setting in the GatewayPeering spec.
Use TCP or application-layer keepalives for long-lived connections.
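For reference, the `idleTimeout` knob lives in the peering's NAT configuration. A hypothetical fragment (only the field name itself comes from this documentation; the surrounding structure is an assumption) might look like:

```yaml
# Hypothetical structure; only idleTimeout itself is documented here.
spec:
  # ...
  nat:
    masquerade:
      idleTimeout: 300s  # evict idle flows after 5 minutes (masquerade default: 2m)
```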

### Gateway failover

1. **Check both gateways are running**: Verify both gateway pods are healthy.

2. **Check gateway group membership**:
```console
$ kubectl get gateways -o yaml
```
Verify both gateways are members of the expected group with correct
priorities.

3. **Check BGP on leaves**: After a failover, the leaf switches should
withdraw routes from the failed gateway and install routes from the
backup. Use `kubectl fabric inspect bgp` to check.

## Diagnostics

### Dataplane CLI

The dataplane includes an interactive CLI for inspecting internal state.
Access it by exec'ing into the dataplane pod:

```console
$ kubectl exec -n fab -it gw--gateway-1--dataplane-7v9ss -- ./dataplane-cli
```

Key commands:

| Command | Description |
|---------|-------------|
| `show flow-filter table` | Peering policy loaded on the dataplane |
| `show flow-table entries` | Active stateful NAT sessions |
| `show masquerading state` | Masquerade NAT configuration and pool state |
| `show static-nat rules` | Static NAT mappings |
| `show port-forwarding rules` | Port-forwarding rules |
| `show ip fib` | IPv4 forwarding table |
| `show config summary` | Configuration generation and apply status |
| `show tech` | Full diagnostic dump (for support) |

Use `help` in the CLI to see all available commands.

### FRR and BGP State

FRR runs in a separate pod. Use `vtysh` to inspect BGP state:

```console
$ kubectl exec -n fab -it gw--gateway-1--frr-c9kwc -c frr -- vtysh
```

**Check BGP neighbors:**

```
gateway-1# show bgp summary
```

All neighbors should be in `Established` state. If a neighbor is in `Active`
or `Idle`, the BGP session is not established; check physical connectivity
and IP configuration.
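If you need to script this check, non-Established neighbors can be spotted because FRR prints a received-prefix count in the `State/PfxRcd` column only for Established sessions. A minimal Python sketch over fabricated sample output (real `vtysh` output has more columns):

```python
# Flag BGP neighbors that are not Established, based on the convention that
# FRR's summary shows a prefix count for Established sessions and a state
# name (Active, Idle, ...) otherwise. SAMPLE is fabricated for illustration.
SAMPLE = """\
Neighbor        V  AS     State/PfxRcd
172.30.128.1    4  65000  12
172.30.128.5    4  65000  Active
"""

def unhealthy_neighbors(text):
    bad = []
    for line in text.strip().splitlines()[1:]:  # skip the header row
        fields = line.split()
        state = fields[-1]
        if not state.isdigit():  # no prefix count means not Established
            bad.append((fields[0], state))
    return bad

print(unhealthy_neighbors(SAMPLE))  # [('172.30.128.5', 'Active')]
```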

**Check routes advertised by the gateway:**

```
gateway-1# show ip route
```

VPC peering prefixes should appear as BGP routes pointing to the gateway's
VTEP IP.

**Check VRF routing tables:**

```
gateway-1# show ip route vrf all
```

## Metrics

The dataplane exposes Prometheus metrics scraped by the Alloy agent on the
gateway node and forwarded to the Fabric Proxy.

Each metric is emitted with three label variants:

- `{total="<vpc>"}`: all traffic in or out of the VPC
- `{drops="<vpc>"}`: traffic dropped for the VPC
- `{from="<src>",to="<dst>"}`: directional traffic between two VPCs

Available metrics:

| Metric | Type | Description |
|--------|------|-------------|
| `vpc_packet_count` | Gauge | Packet count |
| `vpc_packet_rate` | Gauge | Packet rate |
| `vpc_byte_count` | Gauge | Byte count |
| `vpc_byte_rate` | Gauge | Byte rate |

To inspect metrics directly, run on the gateway node itself (the dataplane uses
host networking, so the endpoint is accessible on the node at port 9442):

```console
$ curl -s http://localhost:9442/metrics
```
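The three label variants can be told apart programmatically when post-processing a metrics dump. A minimal Python sketch (the sample lines are fabricated for illustration, and the parser handles only the simple exposition lines shown):

```python
import re

# Fabricated sample in the exposition format described above.
SAMPLE = """\
vpc_byte_count{total="vpc-1"} 123456
vpc_byte_count{drops="vpc-1"} 42
vpc_byte_count{from="vpc-1",to="vpc-2"} 9000
"""

LINE = re.compile(r'^(\w+)\{([^}]*)\}\s+(\S+)$')

def parse(text):
    """Parse simple metric lines into (name, labels, value) tuples."""
    out = []
    for line in text.strip().splitlines():
        m = LINE.match(line)
        if not m:
            continue
        name, labels, value = m.groups()
        labelmap = {k: v.strip('"') for k, v in
                    (kv.split("=") for kv in labels.split(","))}
        out.append((name, labelmap, float(value)))
    return out

samples = parse(SAMPLE)
directional = [s for s in samples if "from" in s[1]]
print(len(directional))  # 1
```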
27 changes: 27 additions & 0 deletions docs/user-guide/gateway.md
@@ -106,6 +106,33 @@ style Leaves fill:none,stroke:none
style Servers fill:none,stroke:none
```

## Flow Table and Stateful Processing

When stateful NAT (masquerade or port-forwarding) is configured on a gateway peering,
the gateway maintains a **flow table** to track active connections. Each unique connection
(identified by its source/destination IPs, ports, and protocol) creates an entry in the
flow table. This entry records the NAT translation applied and the connection's idle timer.

Key characteristics of the flow table:

- **Timeout-based eviction**: Flow entries expire after a configurable period of inactivity.
The idle timeout is set per peering via the `idleTimeout` field in the NAT configuration
(default: 2 minutes for masquerade; see [Masquerade](#masquerade-stateful-source-nat) and
[Port-Forwarding](#port-forwarding-stateful-destination-nat) for details). When a flow expires,
its entry is removed and subsequent packets for that connection are treated as a new flow.
- **Capacity**: The flow table can handle millions of concurrent entries depending on the gateway
node's available memory. The maximum number of flow entries can be configured via the
`flowTableCapacity` field in the Gateway spec. In most deployments, the default is sufficient.
- **Per-gateway state**: Each gateway maintains its own flow table independently. Flow state
is not shared between gateways. If a gateway fails and traffic is redirected to a backup
gateway (see [Gateway fail-over](gateway-failover.md)), existing stateful connections must
be re-established, as the backup gateway has no knowledge of the failed gateway's flow table.

!!! tip
Use TCP keepalives or application-layer keepalives for long-lived connections through
stateful NAT. This prevents the flow entry from expiring due to inactivity during
idle periods.
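The eviction behavior described above can be modeled in a few lines of Python. This is a toy sketch of the mechanism only, not the gateway's actual implementation:

```python
import time

class FlowTable:
    """Toy model of timeout-based flow eviction (not the gateway's real code)."""

    def __init__(self, idle_timeout: float, capacity: int = 1_000_000):
        self.idle_timeout = idle_timeout
        self.capacity = capacity
        self.flows = {}  # connection 5-tuple -> (nat_translation, last_seen)

    def touch(self, key, translation, now=None):
        now = time.monotonic() if now is None else now
        self.evict(now)
        if key not in self.flows and len(self.flows) >= self.capacity:
            raise RuntimeError("flow table full")
        # New packets refresh the idle timer; the NAT translation is sticky.
        translation = self.flows.get(key, (translation, now))[0]
        self.flows[key] = (translation, now)
        return translation

    def evict(self, now):
        expired = [k for k, (_, seen) in self.flows.items()
                   if now - seen > self.idle_timeout]
        for k in expired:
            del self.flows[k]

table = FlowTable(idle_timeout=120.0)
key = ("10.0.1.5", 40000, "203.0.113.9", 443, "tcp")
table.touch(key, ("192.0.2.1", 52000), now=0.0)
table.touch(key, ("ignored", 0), now=60.0)  # refreshes timer, keeps mapping
table.evict(now=300.0)                      # 240s idle > 120s timeout -> evicted
print(len(table.flows))  # 0
```

After eviction, a packet on the same 5-tuple starts a fresh flow and may receive a different translation, which is why keepalives matter for long-lived connections.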

## Gateway Peering

Just as [VPC Peerings](vpcs.md#vpcpeering) provide VPC-to-VPC connectivity by way of the switches in the fabric, gateway peerings provide connectivity via the gateway nodes.