Skip to content

v0.1: fast-path runs end-to-end on configured interfaces (PR #4)#4

Merged
lunarthegrey merged 10 commits intomainfrom
pr4-loader
Apr 20, 2026
Merged

v0.1: fast-path runs end-to-end on configured interfaces (PR #4)#4
lunarthegrey merged 10 commits intomainfrom
pr4-loader

Conversation

@lunarthegrey
Copy link
Copy Markdown
Contributor

@lunarthegrey lunarthegrey commented Apr 20, 2026

Summary

Brings fast-path from verifier-pass (PR #3) to end-to-end deployable in dry-run on any configured interface. packetframe run --config <file> now loads the BPF ELF via aya, populates allow_v4/allow_v6/cfg/redirect_devmap from config, XDP-attaches to every attach <iface> <mode> with the SPEC §2.3 trial-attach fallback, persists the pin registry, and blocks on SIGTERM/SIGINT.

CI proves this works end-to-end on a real interface: the attach integration test creates a veth pair under sudo, loads the compiled BPF ELF through aya, XDP-attaches with DRV_MODE → SKB_MODE fallback, and detaches cleanly. Together with PR #3's verifier-pass test, the load/attach/detach round-trip is covered by CI.

What landed

crates/modules/fast-path/src/linux_impl.rs (new, Linux-only)

Real aya-driven loader: aya::Ebpf::load, populate_cfg (writes the v1 FpCfg layout), populate_allowlists (LPM inserts from allow-prefix{,6}), attach (resolves ifindex via libc::if_nametoindex, XDP-attaches with per-interface trial-attach, populates redirect_devmap before returning any link), detach, plus trial_attach_native for the feasibility probe and snapshot_{links,stats} for status reporting.

crates/modules/fast-path/src/registry.rs (new)

JSON pin registry at <state-dir>/attachments.json, atomic write-then-rename. AttachmentAttachmentRecord conversions so the packetframe-common::module::Attachment type doesn't need Serde.

crates/modules/fast-path/src/lib.rs

FastPathModule now has real trait impls that delegate to linux_impl on Linux, return ModuleError::other(...) on non-Linux. links() / stats() accessors for status.

crates/cli/src/loader.rs (new)

Drives the module lifecycle from packetframe run. Config parse → §2.1 capability probe → §2.3 trial-attach probe for each configured iface → refuse-to-attach-on-required-cap-fail → load/attach → persist registry → block on signal. SIGTERM/SIGINT exit is explicitly no-detach per §8.5 (Ebpf drop implicitly tears down until PR #6 adds pinning).

crates/cli/src/feasibility.rs

Per-interface xdp.attach.<iface> capabilities replace the Deferred placeholder. Tests native first, falls back to generic; reports Pass for generic-only with the native error in detail.

crates/modules/fast-path/tests/attach.rs (new)

veth-backed integration test: creates pf-veth0/1, loads ELF, attaches with native→generic fallback, detaches, cleans up via a Drop guard. Runs under sudo in CI.

CI

  • check job's sudo step runs all ignored fast-path integration tests (--tests -- --ignored), so the verifier + veth tests both run under sudo.
  • signal-hook added as a Linux-only dep on cli for the signal loop.
  • aya promoted from dev-dep to Linux-only regular dep on fast-path.

Spec audit — see comment

Test plan

  • CI green on all five jobs (including sudo verifier + veth attach)
  • cargo test --workspace on macOS still clean (stub BPF path)
  • Post-merge: smoke-run packetframe run --config conf/example.conf on a Linux dev VM with veth interfaces; confirm attach, counters accumulate, SIGINT exits cleanly
  • Post-merge: run on the reference EFG in dry-run mode, verify matched_v4 matches expected traffic volume over 24h (SPEC §9 Phase 2)

🤖 Generated with Claude Code

lunarthegrey and others added 3 commits April 20, 2026 04:55
…tach/status

Brings the fast-path module from "BPF ELF compiles + passes verifier"
(PR #3) to "runs end-to-end in dry-run on configured interfaces".

Core changes:

- `crates/modules/fast-path/src/linux_impl.rs` (new, Linux-only):
  opens aya::Ebpf, populates CFG (dry_run + reserved version byte),
  populates ALLOW_V4/ALLOW_V6 LPM tries from `allow-prefix`
  directives, XDP-attaches to each `attach <iface> <mode>` with
  trial-attach fallback per SPEC §2.3 (Auto → native first, fall back
  to generic; explicit Native/Generic use requested mode, no fallback),
  populates REDIRECT_DEVMAP with every configured ifindex before
  packet flow so the §4.4 defensive devmap pre-check works. Exports
  a `trial_attach_native` probe and `snapshot_stats` /
  `snapshot_links` accessors for the CLI.

- `crates/modules/fast-path/src/registry.rs` (new): JSON pin registry
  persistence at `<state-dir>/attachments.json`, atomic write-then-
  rename. SPEC §8 wants deterministic teardown; actual bpffs pinning
  lands in PR #6 but we persist the shape now so `detach` has
  something authoritative to read.

- `crates/modules/fast-path/src/lib.rs`: `FastPathModule` gains real
  trait implementations on Linux; non-Linux returns
  `ModuleError::other` from load/attach (NotImplemented on detach is
  valid since nothing was loaded). macOS dev loops still compile
  cleanly.

- `crates/cli/src/loader.rs` (new): drives the module lifecycle from
  `packetframe run`. Parses config, runs the SPEC §2.1 capability
  probe, refuses to attach if any required capability fails, then
  loads / attaches each configured module and persists the registry.
  SIGTERM/SIGINT → exit without explicit detach per SPEC §7.3 / §8.5
  (drops Ebpf; until PR #6 adds pinning, this implicitly detaches).
  SIGHUP logs a warning; reconfigure flow is PR #6.

- `crates/cli/src/feasibility.rs`: per-interface XDP trial-attach
  probe (§2.3) graduates from Deferred to real per-iface verdicts
  (`xdp.attach.<iface>` capability entries), populated from the
  config's `attach` directives.

- `packetframe detach` reads the pin registry and reports; in-kernel
  detach without an active loader needs pinning (PR #6). `packetframe
  status` prints the registry contents; live counter readback waits
  on pinning too.

Pins bumped: signal-hook 0.4 added for the signal loop. aya is now a
Linux-only regular dep on fast-path (was dev-only).

Follow-up commits on this branch will add the veth netns integration
test and the packet-level bpf_prog_test_run fixtures deferred from
PR #3.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
aya requires the Pod marker on map value types so it can byte-copy the
value into the kernel buffer. FpCfg is repr(C), 8 bytes, no padding,
all primitives — safe to impl unconditionally.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Creates a veth pair, loads the fast-path ELF, attaches to one end
with native→generic fallback, detaches, cleans up. Tests the aya
attach/detach round-trip against a real ifindex — which the existing
verifier test can't do since `Xdp::load` only exercises the verifier,
not the attach path. Drop-guard cleanup ensures the veth pair is
removed even if the test panics.

CI's sudo step now runs all ignored fast-path integration tests
(`--tests -- --ignored`), so future cap-requiring tests land without
workflow edits.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@lunarthegrey lunarthegrey marked this pull request as ready for review April 20, 2026 10:04
@lunarthegrey lunarthegrey changed the title [WIP] v0.1: fast-path runs end-to-end (PR #4) v0.1: fast-path runs end-to-end on configured interfaces (PR #4) Apr 20, 2026
lunarthegrey and others added 6 commits April 20, 2026 08:47
Three intertwined issues led to PR #3's merged verifier test and PR
#4's attach test reporting `ok` while never actually exercising the
BPF ELF:

1. `crates/modules/fast-path/bpf/Cargo.toml` has no `[workspace]`
   table, so when `build.rs` invokes a nested `cargo build` in that
   directory, cargo walks up and finds the root workspace that
   excludes this path — then errors out because the package "believes
   it's in a workspace when it's not" (the user surfaced this running
   on a Linux VM; exit code 101 matches CI's failure mode). The root
   `workspace.exclude` alone is not sufficient when cargo is invoked
   from inside the excluded directory.

2. The old `build.rs` used `Command::status()`, which inherited
   stdout/stderr back to the outer cargo — which captures build-script
   output and only parses `cargo:` directives. Result: any nested
   cargo error was completely swallowed, leaving users staring at an
   opaque "BPF build failed (exit 101); using empty stub ELF" line
   with no way to diagnose.

3. When BPF build fails, `build.rs` writes an empty stub ELF and
   continues. The verifier + attach integration tests check
   `FAST_PATH_BPF_AVAILABLE` and early-return with `eprintln!` +
   `return`, which cargo counts as PASS. CI's sudo step reported 1/1
   passing; nothing ever actually round-tripped through the kernel
   verifier or aya's attach path.

Fixes:

- Add an empty `[workspace]` table to `bpf/Cargo.toml` so the nested
  cargo treats the subcrate as its own workspace root.
- `build.rs` now uses `Command::output()` and re-emits stdout + stderr
  as `cargo:warning` lines so the real error surfaces in the outer
  cargo's warning stream — no more opaque exit-code messages.
- New `PACKETFRAME_BPF_REQUIRED` env var makes `build.rs` panic on
  the stub fallback path instead of silently writing an empty ELF.
  Set in `ci.yml`'s `check` job so CI can never again be green with
  the BPF build broken. Local dev (without the env var) still stubs
  gracefully on macOS laptops.

After these land, CI will either actually exercise the BPF program or
fail loudly. If the verifier rejects something, or aya can't attach,
we find out instead of shipping the bug.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Real error from the PACKETFRAME_BPF_REQUIRED CI run:
    error[E0463]: can't find crate for 'core'

The nested cargo was inheriting `CARGO`, `RUSTUP_TOOLCHAIN`, and
related env vars from the outer stable build, so rustup never
switched to the nightly pinned in `bpf/rust-toolchain.toml`. Without
nightly, `build-std = ["core"]` in `.cargo/config.toml` is silently
ignored, and cargo tries to link against a precompiled `core` that
doesn't exist for `bpfel-unknown-none`.

Clearing `CARGO`/`RUSTC`/`RUSTUP_TOOLCHAIN`/`CARGO_BUILD_TARGET`/
`CARGO_TARGET_DIR`/`CARGO_MANIFEST_DIR`/`RUSTC_WRAPPER` before
spawning the nested cargo lets rustup fresh-resolve from the
subcrate's toolchain file. PATH still finds the rustup proxy, which
reads the toolchain pin and swaps to nightly.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two bugs surfaced by the now-working CI BPF build:

1. `EthHdr.ether_type` is a raw u16, not an `EtherType` enum. Matching
   against `EtherType::Ipv4` fails with E0308. Cast to u16 via
   `EtherType::Ipv4 as u16`; the enum's discriminants are LE-interpreted
   values of network-byte-order packet bytes, so the comparison works
   on any BPF host (all LE).

2. Writing to a `#[repr(C)]` union field is safe in recent Rust — my
   wrapping `unsafe` block for the ipv6_src/ipv6_dst writes triggered
   `unused_unsafe`. Drop it.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The previous `target/bpfel-unknown-none/release/fast-path` path gave
us a file that aya rejected as `Invalid ELF header size or alignment`.
Almost certainly bpf-linker (or cargo with a custom-target-spec) is
producing the artifact at a different filename / extension now.
Parsing cargo's `--message-format=json` output for the
`compiler-artifact` message lets cargo itself tell us where the
produced ELF landed — resilient to toolchain and linker updates.

The `artifact built from` cargo:warning line tells us in CI (and
locally) which file was embedded, so the next failure (if any) is one
`file` command away.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
aya's ELF parser (via the `object` crate) does aligned u32/u64 reads
into the header and rejects 1-byte-aligned `include_bytes!` output
with "Invalid ELF header size or alignment". The System allocator
returns 16-byte-aligned memory on x86_64 Linux, so `.to_vec()` gives
a slice suitable for the parser.

Expose `aligned_bpf_copy()` as a public helper; linux_impl and both
integration tests call it instead of passing `FAST_PATH_BPF` directly.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@lunarthegrey
Copy link
Copy Markdown
Contributor Author

Critical regression audit — PR #3 and PR #4 tests were false-positives

While the user tested PR #4 on a Linux VM, the local build failed to produce a valid BPF ELF. Investigation found a chain of issues that had been silently hiding since PR #3:

  1. bpf/Cargo.toml had no [workspace] table. Nested cargo walked up, found the root workspace that excluded it, errored "current package believes it's in a workspace when it's not". Fix: add empty [workspace] to make the subcrate its own workspace root.
  2. build.rs captured the nested cargo's output and swallowed it. Users saw only the opaque BPF build failed (exit 101) line with no way to diagnose. Fix: re-emit stderr as cargo:warning lines.
  3. Nested cargo inherited CARGO/RUSTUP_TOOLCHAIN from the outer stable build and never switched to the nightly pinned in bpf/rust-toolchain.toml. build-std only activates under nightly, so cargo tried to link a precompiled core that doesn't exist for bpfel-unknown-noneerror[E0463]: can't find crate for 'core'. Fix: env_remove the outer cargo/rustup vars before spawning.
  4. Hardcoded artifact path was wrong. Switched to parsing --message-format=json for the compiler-artifact message — lets cargo tell us where the file landed.
  5. include_bytes! returns a 1-byte-aligned slice; the object crate's ELF parser does aligned u32/u64 reads on the header and rejected it with "Invalid ELF header size or alignment". Fix: aligned_bpf_copy() helper that heap-copies the bytes into a 16-byte-aligned Vec<u8>.
  6. Integration tests silently treated missing BPF as "skip" + report ok. Added PACKETFRAME_BPF_REQUIRED=1 to CI's check job env; build.rs now panics on stub-fallback when that's set, so CI can never again be green with a broken BPF build.

Net effect: the tests merged with PR #3 (verifier pass) and PR #4 (veth attach) now actually exercise the kernel verifier and aya's attach path — they used to stub the ELF and early-return ok with 0.00s execution. With this commit series they take 0.19s and 0.25s respectively, doing real kernel work.

The §4.4 BPF program has genuinely passed the kernel verifier for the first time in this CI run. The sport/dport fix from PR #3's spec audit has been exercised.

User's Linux VM reports EINVAL from our raw bpf_prog_load probe while
aya can successfully load + attach the same program type on the same
kernel. Our hand-rolled minimal-program probe is fragile across kernel
versions — newer kernels reject minimally-populated bpf_attr even when
real program loads succeed.

Demote prog_type.{xdp,sched_cls} and helper.* to required=false. The
authoritative signal is the per-interface XDP trial-attach (graduates
from Deferred to real xdp.attach.<iface> verdicts when \`--config\` is
supplied), which does a real aya-mediated load through the kernel
verifier. If that passes, helpers and prog types are present.

This unblocks \`packetframe run\` on hosts where the raw probe
false-fails. Fixing the raw probe to match kernel bpf_attr layout
across versions is a follow-up.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@lunarthegrey lunarthegrey merged commit 479e642 into main Apr 20, 2026
5 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant