diff --git a/docs/planning/ARM64_CPU0_SMP_INVESTIGATION.md b/docs/planning/ARM64_CPU0_SMP_INVESTIGATION.md index b2ba921d..838c911b 100644 --- a/docs/planning/ARM64_CPU0_SMP_INVESTIGATION.md +++ b/docs/planning/ARM64_CPU0_SMP_INVESTIGATION.md @@ -3239,3 +3239,85 @@ AHCI Port1 `IS=0x1` and no completion for a later command after the local wake path has already switched successfully. The next audit should focus on AHCI interrupt completion / EOI-deactivation ordering or command publication, not on the local IRQ-return wake-buffer drain. + +## 2026-04-16 - F18 AHCI CI-level completion loop + +F18 targeted the AHCI completion race isolated by F17: timeout dumps showed +`CI=0x0` and `AHCI_PORT1_IS=0x1` while the waiter was blocked on a command +newer than the last command completed by the ISR. That means hardware had +completed the command, but the software completion path missed the waiter wake. + +### Audit summary + +The F17 Breenix AHCI handler was edge-sensitive. It read `PORT_IS`, cleared +`PORT_IS`/`HBA_IS`, read `PORT_CI` once, and completed at most one slot from +that single sampled interrupt state. That can miss a completion once the +hardware `PORT_CI` bit has already dropped but no new software completion pass +runs for the active slot. + +Linux v6.8 is level-sensitive. In `/tmp/linux-v6.8/drivers/ata/libahci.c`, +`ahci_port_intr()` reads and acknowledges `PORT_IRQ_STAT` at lines 1963-1964, +then calls `ahci_handle_port_interrupt()` at line 1966. The command completion +path runs through `ahci_qc_complete()`; it reads `PORT_SCR_ACT` and +`PORT_CMD_ISSUE` into `qc_active` at lines 1875-1885, then calls +`ata_qc_complete_multiple(ap, qc_active)` at line 1888. The important parity +point is deriving completions from currently active hardware command bits, not +from a single interrupt edge. + +### Fix description + +`kernel/src/drivers/ahci/mod.rs` now bounds the AHCI completion drain loop at +eight iterations. For each active port, it: + +- reads the per-port active software mask and `PORT_CI`; +- computes completed slots as `PORT_ACTIVE_MASK & !PORT_CI`; +- clears completed active bits atomically; +- acknowledges sampled `PORT_IS`; +- re-reads `PORT_IS` and `PORT_CI`, looping if the port reasserted or another + active slot is now clear. + +Because the current driver issues only slot 0, the slot-0 completion token is +recorded during the CI loop and the actual wake is published only after the +port interrupt is stable. This prevents the woken waiter from issuing the next +slot-0 command while the prior AHCI interrupt line remains asserted. The prior +single-active-slot fallback for controllers that signal completion before +`PORT_CI` is observed clear was preserved. + +F18 also added AHCI ring site `CI_LOOP`; the loop iteration count is emitted +in the event `token` field. Passing validation runs did not trigger +timeout-time AHCI ring dumps, so visible `ahci_ci_loop_iterations` is `0` in +the final serial logs. + +### Validation sweep + +Artifacts: + +```text +logs/breenix-parallels-cpu0/f18-ahci-ci-loop/run{1..5}/ +logs/breenix-parallels-cpu0/f18-ahci-ci-loop/summary.txt +logs/breenix-parallels-cpu0/f18-ahci-ci-loop/exit.md +``` + +Each `./run.sh --parallels --test 60` invocation exited 1 because the +Parallels screenshot helper could not find the generated VM window. As in +prior F-series sweeps, the serial log is the validation source. + +| Run | Reached bsshd | AHCI timeout | Corruption markers | Failed exec | Soft lockup | `ahci_ci_loop_iterations` | Result | +| --- | ---: | ---: | ---: | ---: | ---: | ---: | --- | +| run1 | 1 | 0 | 0 | 0 | 0 | 0 | PASS | +| run2 | 1 | 0 | 0 | 0 | 0 | 0 | PASS | +| run3 | 1 | 0 | 0 | 0 | 0 | 0 | PASS | +| run4 | 1 | 0 | 0 | 0 | 0 | 0 | PASS | +| run5 | 1 | 0 | 0 | 0 | 0 | 0 | PASS | + +### Verdict + +Verdict: **PASS**. F18 reached the target 5/5 Parallels sweep: every final run +reached `[init] bsshd started (PID 2)` with `ahci_timeouts=0` and +`corruption_markers=0`. + +The ARM64 CPU0/AHCI timeout investigation is therefore complete for this +failure signature. Recommended next step: open a cleanup PR to remove the +F8-F17 diagnostic scaffolding and keep only the minimal AHCI CI-level +completion behavior plus any low-cost regression signal needed for future +AHCI timeout triage. diff --git a/kernel/src/drivers/ahci/mod.rs b/kernel/src/drivers/ahci/mod.rs index a7e40458..cd7bbea2 100644 --- a/kernel/src/drivers/ahci/mod.rs +++ b/kernel/src/drivers/ahci/mod.rs @@ -291,6 +291,7 @@ pub(crate) const AHCI_TRACE_RESCHED_CHECK_ENTRY: u32 = 32; pub(crate) const AHCI_TRACE_RESCHED_CHECK_DRAINED_WAKE: u32 = 33; pub(crate) const AHCI_TRACE_RESCHED_CHECK_SWITCHED: u32 = 34; pub(crate) const AHCI_TRACE_RESCHED_CHECK_RETURN: u32 = 35; +pub(crate) const AHCI_TRACE_CI_LOOP: u32 = 36; struct AhciTraceSlot { site: AtomicU32, @@ -461,6 +462,7 @@ fn ahci_trace_site_name(site: u32) -> &'static str { AHCI_TRACE_RESCHED_CHECK_DRAINED_WAKE => "RESCHED_CHECK_DRAINED_WAKE", AHCI_TRACE_RESCHED_CHECK_SWITCHED => "RESCHED_CHECK_SWITCHED", AHCI_TRACE_RESCHED_CHECK_RETURN => "RESCHED_CHECK_RETURN", + AHCI_TRACE_CI_LOOP => "CI_LOOP", _ => "UNKNOWN", } } @@ -2625,13 +2627,17 @@ fn probe_platform_irq(ctrl: &AhciController) { /// AHCI MSI interrupt handler — called from the IRQ dispatch in `exception.rs`. /// -/// Reads HBA_IS to identify which port(s) fired, reads and clears PORT_IS, -/// then sets the per-port `AHCI_PORT_COMPLETE` flag so `issue_cmd_slot0` can -/// wake up. Clears HBA_IS last (AHCI spec requires PORT_IS cleared first). +/// Reads HBA_IS to identify which port(s) fired, acknowledges the sampled +/// PORT_IS, then derives completions from the per-port active-slot mask and +/// PORT_CI until the port is stable. /// /// This function must be lock-free and allocation-free (called from IRQ context). +const AHCI_CI_COMPLETION_LOOP_LIMIT: u32 = 8; +const AHCI_TRACKED_SLOT_MASK: u32 = (1u32 << AHCI_MAX_CONCURRENT) - 1; + #[inline] fn detect_completed_slots(active: u32, ci_after: u32, port_is: u32) -> u32 { + let active = active & AHCI_TRACKED_SLOT_MASK; let mut completed = active & !ci_after; if completed == 0 @@ -2715,11 +2721,12 @@ pub fn handle_interrupt() { if !check_all && (hba_is & (1 << port)) == 0 { continue; } - let is = port_read(abar, port, PORT_IS); - if is == 0 { + let mut is = port_read(abar, port, PORT_IS); + let ci_entry = port_read(abar, port, PORT_CI); + let active_entry = PORT_ACTIVE_MASK[port].load(Ordering::Acquire) & AHCI_TRACKED_SLOT_MASK; + if is == 0 && (active_entry & !ci_entry) == 0 { continue; } - let ci_entry = port_read(abar, port, PORT_CI); let sact_entry = port_read(abar, port, PORT_SACT); let serr_entry = port_read(abar, port, PORT_SERR); push_ahci_event( @@ -2730,100 +2737,162 @@ pub fn handle_interrupt() { sact_entry, serr_entry, AHCI_COMPLETIONS[port][0].waiter_tid(), - 0, + active_entry, PORT_ACTIVE_CMD_NUM[port].load(Ordering::Acquire), false, ); - AHCI_ISR_PORT_HIT[port].fetch_add(1, Ordering::Relaxed); + if is != 0 { + AHCI_ISR_PORT_HIT[port].fetch_add(1, Ordering::Relaxed); + } if port == 1 { AHCI_LAST_ISR_PORT1_IS.store(is, Ordering::Relaxed); } - ack_port_interrupt(abar, port, is); - let is_after_clear = port_read(abar, port, PORT_IS); - let ci_after_clear = port_read(abar, port, PORT_CI); - let sact_after_clear = port_read(abar, port, PORT_SACT); - let active_mask = PORT_ACTIVE_MASK[port].load(Ordering::Acquire); - let completed_slots = detect_completed_slots(active_mask, ci_after_clear, is); let mut waiter_tid = 0u64; - let mut cmd_num = 0u32; let mut wake_success = false; let mut slots_processed = 0u32; + let mut loop_iterations = 0u32; + // Slot 0 is the only issued slot today. Defer the wake until after + // PORT_IS has been acknowledged and the CI drain loop is stable, so the + // woken thread cannot issue the next command while this interrupt line + // is still asserted. + let mut pending_cmd_num = 0u32; + let mut pending_waiter_tid = 0u64; + let mut pending_slot_bit = 0u32; + let mut pending_is = 0u32; + let mut pending_ci = 0u32; + let mut pending_sact = 0u32; + let mut pending_serr = 0u32; + let mut is_after_clear: u32; + let mut ci_after_clear: u32; + let mut sact_after_clear: u32; + let mut serr_after_clear: u32; + + loop { + loop_iterations += 1; + + let sampled_is = is; + if sampled_is != 0 { + ack_port_interrupt(abar, port, sampled_is); + } + + let ci = port_read(abar, port, PORT_CI); + let sact = port_read(abar, port, PORT_SACT); + let serr = port_read(abar, port, PORT_SERR); + let active_mask = + PORT_ACTIVE_MASK[port].load(Ordering::Acquire) & AHCI_TRACKED_SLOT_MASK; + let completed_slots = detect_completed_slots(active_mask, ci, sampled_is); + + push_ahci_event( + AHCI_TRACE_CI_LOOP, + port as u32, + sampled_is, + ci, + sact, + serr, + AHCI_COMPLETIONS[port][0].waiter_tid(), + completed_slots, + loop_iterations, + completed_slots != 0, + ); + + let mut remaining = completed_slots; + while remaining != 0 { + let slot = remaining.trailing_zeros() as usize; + let slot_bit = 1u32 << slot; + remaining &= !slot_bit; - if (is & (PORT_IRQ_COMPLETE | PORT_IRQ_ERROR)) != 0 { - if port < MAX_AHCI_PORTS { - // Atomically load-and-clear the active command number. - // If cmd_num > 0, this is the first ISR invocation for this - // command — signal completion. If cmd_num == 0, a previous - // ISR already completed this command (level-triggered - // re-delivery or spurious). Calling complete(0) would - // overwrite done with 0, destroying the real token and - // causing the waiter to time out. - cmd_num = PORT_ACTIVE_CMD_NUM[port].swap(0, Ordering::AcqRel); - if port == 1 { - AHCI_LAST_ISR_PORT1_CMD_NUM.store(cmd_num, Ordering::Relaxed); + let was_active = + (PORT_ACTIVE_MASK[port].fetch_and(!slot_bit, Ordering::AcqRel) & slot_bit) != 0; + if !was_active { + continue; } - waiter_tid = AHCI_COMPLETIONS[port][0].waiter_tid(); - push_ahci_event( - AHCI_TRACE_POST_CLEAR, - port as u32, - is_after_clear, - ci_after_clear, - sact_after_clear, - serr_entry, - waiter_tid, - completed_slots, - cmd_num, - false, - ); - if cmd_num != 0 { - AHCI_LAST_ISR_COMPLETE_PORT.store(port as u32, Ordering::Relaxed); - AHCI_LAST_ISR_COMPLETE_CMD.store(cmd_num, Ordering::Relaxed); - AHCI_LAST_ISR_COMPLETE_WAITER.store(waiter_tid, Ordering::Relaxed); - push_ahci_event( - AHCI_TRACE_BEFORE_COMPLETE, - port as u32, - is_after_clear, - ci_after_clear, - sact_after_clear, - serr_entry, - waiter_tid, - completed_slots, - cmd_num, - false, - ); - AHCI_COMPLETIONS[port][0].complete(cmd_num); - push_ahci_event( - AHCI_TRACE_AFTER_COMPLETE, - port as u32, - is_after_clear, - ci_after_clear, - sact_after_clear, - serr_entry, - waiter_tid, - completed_slots, - cmd_num, - false, - ); - AHCI_ISR_COMPLETE_HIT[port].fetch_add(1, Ordering::Relaxed); - wake_success = waiter_tid != 0; - slots_processed = completed_slots.count_ones(); + + slots_processed += 1; + + // The current driver issues only slot 0. The arrays are sized + // for future multi-slot work, but only slot 0 has a command + // token to publish back to wait_cmd_slot0(). + if slot == 0 { + let cmd_num = PORT_ACTIVE_CMD_NUM[port].swap(0, Ordering::AcqRel); + if port == 1 { + AHCI_LAST_ISR_PORT1_CMD_NUM.store(cmd_num, Ordering::Relaxed); + } + waiter_tid = AHCI_COMPLETIONS[port][slot].waiter_tid(); + if cmd_num != 0 { + pending_cmd_num = cmd_num; + pending_waiter_tid = waiter_tid; + pending_slot_bit = slot_bit; + pending_is = sampled_is; + pending_ci = ci; + pending_sact = sact; + pending_serr = serr; + } } - PORT_ACTIVE_MASK[port].store(0, Ordering::Release); } - } else { + + is_after_clear = port_read(abar, port, PORT_IS); + ci_after_clear = port_read(abar, port, PORT_CI); + sact_after_clear = port_read(abar, port, PORT_SACT); + serr_after_clear = port_read(abar, port, PORT_SERR); + let active_after = + PORT_ACTIVE_MASK[port].load(Ordering::Acquire) & AHCI_TRACKED_SLOT_MASK; + let completed_after_clear = active_after & !ci_after_clear; + push_ahci_event( AHCI_TRACE_POST_CLEAR, port as u32, is_after_clear, ci_after_clear, sact_after_clear, - serr_entry, - AHCI_COMPLETIONS[port][0].waiter_tid(), - completed_slots, - 0, + serr_after_clear, + waiter_tid, + completed_after_clear, + loop_iterations, + completed_after_clear != 0, + ); + + if loop_iterations >= AHCI_CI_COMPLETION_LOOP_LIMIT { + break; + } + if is_after_clear == 0 && completed_after_clear == 0 { + break; + } + is = is_after_clear; + } + + if pending_cmd_num != 0 { + AHCI_LAST_ISR_COMPLETE_PORT.store(port as u32, Ordering::Relaxed); + AHCI_LAST_ISR_COMPLETE_CMD.store(pending_cmd_num, Ordering::Relaxed); + AHCI_LAST_ISR_COMPLETE_WAITER.store(pending_waiter_tid, Ordering::Relaxed); + push_ahci_event( + AHCI_TRACE_BEFORE_COMPLETE, + port as u32, + pending_is, + pending_ci, + pending_sact, + pending_serr, + pending_waiter_tid, + pending_slot_bit, + pending_cmd_num, + false, + ); + AHCI_COMPLETIONS[port][0].complete(pending_cmd_num); + push_ahci_event( + AHCI_TRACE_AFTER_COMPLETE, + port as u32, + pending_is, + pending_ci, + pending_sact, + pending_serr, + pending_waiter_tid, + pending_slot_bit, + pending_cmd_num, false, ); + AHCI_ISR_COMPLETE_HIT[port].fetch_add(1, Ordering::Relaxed); + waiter_tid = pending_waiter_tid; + wake_success |= pending_waiter_tid != 0; } push_ahci_event( @@ -2832,10 +2901,10 @@ pub fn handle_interrupt() { is_after_clear, ci_after_clear, sact_after_clear, - serr_entry, + serr_after_clear, waiter_tid, slots_processed, - cmd_num, + loop_iterations, wake_success, ); } diff --git a/logs/breenix-parallels-cpu0/f18-ahci-ci-loop/exit.md b/logs/breenix-parallels-cpu0/f18-ahci-ci-loop/exit.md new file mode 100644 index 00000000..ce5aa08c --- /dev/null +++ b/logs/breenix-parallels-cpu0/f18-ahci-ci-loop/exit.md @@ -0,0 +1,78 @@ +# F18 Exit Report: AHCI CI-Level Completion Loop + +## VERDICT + +PASS. + +Branch: `probe/f18-ahci-ci-loop` + +Commits: + +- `1dc65313 fix(ahci): drain completions by CI level` +- `03357aa2 test(ahci): record F18 Parallels sweep` +- pending doc commit: investigation doc and exit report + +## What I built + +- `kernel/src/drivers/ahci/mod.rs`: changed the AHCI ISR from single-shot `PORT_IS` completion handling to a bounded CI-level drain loop. The handler now computes completions from `PORT_ACTIVE_MASK & !PORT_CI`, atomically clears completed active bits, acknowledges sampled `PORT_IS`, loops until the port is stable, and defers slot-0 wake publication until after the port interrupt is no longer asserted. +- `logs/breenix-parallels-cpu0/f18-ahci-ci-loop/summary.txt`: recorded the final 5-run Parallels sweep with prior F-series fields plus `ahci_ci_loop_iterations`. +- `docs/planning/ARM64_CPU0_SMP_INVESTIGATION.md`: appended the F18 audit, Linux citation, fix description, validation table, and completion verdict. +- `logs/breenix-parallels-cpu0/f18-ahci-ci-loop/exit.md`: this report. + +## What the original ask was + +Fix the F17 AHCI missed-completion failure by auditing the Breenix handler +against Linux AHCI behavior, implementing a CI-diff completion loop in +`handle_interrupt()`, validating with five Parallels runs, and documenting the +result with a clear verdict. + +## How what I built meets that ask + +- Audit: implemented. Commit `1dc65313` compares the old Breenix edge-sensitive `PORT_IS` flow to Linux v6.8's CI-active completion model in `drivers/ata/libahci.c`. +- Fix: implemented. `kernel/src/drivers/ahci/mod.rs` now uses `AHCI_CI_COMPLETION_LOOP_LIMIT=8`, `AHCI_TRACKED_SLOT_MASK`, `AHCI_TRACE_CI_LOOP`, and a bounded active-mask/`PORT_CI` drain loop. +- Linux cite: implemented. The investigation doc cites `/tmp/linux-v6.8/drivers/ata/libahci.c` lines 1875-1888 and 1963-1966. +- Build clean: implemented. `cargo build --release --target aarch64-breenix.json -Z build-std=core,alloc -Z build-std-features=compiler-builtins-mem -p kernel --bin kernel-aarch64` completed with no warning/error lines. +- Five-run sweep: implemented. Final logs are in `logs/breenix-parallels-cpu0/f18-ahci-ci-loop/run{1..5}/`. +- Pass criteria: implemented. All five final serial logs reached `[init] bsshd started (PID 2)` with `Port 1 TIMEOUT` count 0 and corruption marker count 0. +- Investigation doc: implemented. The appended 2026-04-16 F18 section declares the investigation complete and recommends a cleanup PR. + +## What I did NOT build + +- I did not remove F8-F17 diagnostic scaffolding. That was explicitly a non-goal for F18 and should be a follow-up cleanup PR. +- I did not change GIC code or any prohibited files. +- I did not add a permanent regression test beyond the required Parallels sweep summary. + +## Known risks and gaps + +- `./run.sh --parallels --test 60` still exits 1 because the screenshot helper cannot find the generated Parallels VM window. This matches earlier F-series sweeps; serial logs were used as the validation source. +- Passing runs do not dump the AHCI ring, so `ahci_ci_loop_iterations` is visible as 0 in final serial logs. The `CI_LOOP` event exists for timeout-time AHCI ring dumps and was observed during failed intermediate sweeps. +- The current driver still only issues slot 0; the code preserves arrays for future multi-slot work, but only slot 0 has a completion token today. + +## How to verify + +```bash +cargo build --release --target aarch64-breenix.json -Z build-std=core,alloc -Z build-std-features=compiler-builtins-mem -p kernel --bin kernel-aarch64 2>&1 | grep -E "^(warning|error)" +``` + +Expected: no output. + +```bash +for i in 1 2 3 4 5; do + f="logs/breenix-parallels-cpu0/f18-ahci-ci-loop/run$i/serial.log" + printf "run$i bsshd=" + rg -c "\\[init\\] bsshd started \\(PID 2\\)" "$f" || true + printf "run$i ahci_timeouts=" + rg -c "Port 1 TIMEOUT" "$f" || true + printf "run$i corruption_markers=" + rg -c "CORRUPTION|corruption|PC_ALIGN|panic|BAD" "$f" || true +done +``` + +Expected: every run reports `bsshd=1`, `ahci_timeouts=0`, and +`corruption_markers=0`. + +## Recommendation + +Open the follow-up cleanup PR to remove the accumulated F8-F17 diagnostic +scaffolding now that F18 has produced a 5/5 pass and closed this AHCI timeout +signature. diff --git a/logs/breenix-parallels-cpu0/f18-ahci-ci-loop/summary.txt b/logs/breenix-parallels-cpu0/f18-ahci-ci-loop/summary.txt new file mode 100644 index 00000000..a4b7b9a0 --- /dev/null +++ b/logs/breenix-parallels-cpu0/f18-ahci-ci-loop/summary.txt @@ -0,0 +1,25 @@ +# F18 AHCI CI Loop Validation Summary + +Branch: `probe/f18-ahci-ci-loop` +Fix commit: `1dc65313 fix(ahci): drain completions by CI level` +Command: `./run.sh --parallels --test 60` +Log root: `logs/breenix-parallels-cpu0/f18-ahci-ci-loop/` + +Note: each `run.sh` invocation exited `1` because the Parallels screenshot +helper could not find the generated VM window. Serial logs are the validation +source, matching prior F-series sweeps. + +| run | bsshd_started | ahci_timeouts | corruption_markers | failed_exec | soft_lockups | scan_start | scan_cpu | scan_done | unblock_before_send_sgi | unblock_after_send_sgi | ttwu_local_entry | ttwu_local_set_resched | irq_tail_check_resched | resched_check_entry | resched_check_drained_wake | resched_check_switched | resched_check_return | ahci_ci_loop_iterations | result | +| --- | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | --- | +| run1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | PASS | +| run2 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | PASS | +| run3 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | PASS | +| run4 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | PASS | +| run5 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | PASS | + +VERDICT: PASS. All five final runs reached `[init] bsshd started (PID 2)` with +`ahci_timeouts=0` and `corruption_markers=0`. + +`ahci_ci_loop_iterations` counts visible `site=CI_LOOP` AHCI ring events in the +serial logs. Passing runs did not trigger timeout-time AHCI ring dumps, so the +visible count is `0` for all final runs.