wip: sidecar: sync RAX from intercept message in handle_io_port_exit #3360
emirceski wants to merge 4 commits into microsoft:main
Fixes a crash at `assert_eq!(message.rax, cpu_context.gps[RAX])` in
`handle_io_port_exit` for sidecar VPs.
The sidecar's `run_vp_once()` uses inline asm `inout`/`lateout` operands
to preserve VTL2's own registers across the VTL return hypercall. On
VTL2 re-entry, the hypervisor restores VTL2's saved register state —
not the guest's. The resulting `cpu_context.gps[]` values are VTL2
round-tripped values, not the guest's actual registers at intercept time.
When the `run_vp_once()` loop handles an interrupt re-entry (loops back)
followed by an intercept, the guest may have changed registers between
the two exits. The VMM then reads stale `cpu_context.gps[RAX]` and the
existing `assert_eq!` against `message.rax` fires.
After copying the intercept message, sync GP and XMM0–5 registers from
the hypervisor's register page into `cpu_context`. The register page
contains the authoritative guest register state, populated by the
hypervisor on each intercept.
The crash requires all three of the following conditions to hold simultaneously:
1. VP is running in sidecar (non-sidecar VPs use the kernel driver)
2. An interrupt re-entry precedes the intercept (guest runs more code)
3. The VMM reads `cpu_context.gps[]` for the intercept type (IO port)
Sidecar VPs are removed after their first intercept, so the window is
one exit sequence per VP lifetime.
```
VTL2 (sidecar)                      Hypervisor                          VTL0 (guest)
─────────────                       ──────────                          ────────────
1. Write RAX/RCX to
   VtlControl.registers[]
2. "call rax" (VTL return) ──────►  3. Save VTL2 regs to shadow
                                    4. Restore VTL0 regs ────────►      5. Guest runs
                                                                        6. Guest hits
                                                                           interrupt/intercept
                                    7. Save VTL0 regs ◄──────────
                                    8. Restore VTL2 regs from shadow
◄────── RE-ENTRY ────────────────   9. Return to VTL2
(execution resumes after "call rax")
10. lateout("rax") captures
    physical RAX
    = VTL2's saved RAX from step 3
    ≠ guest's RAX from step 6
```
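The register shuffle in the diagram can be mimicked with a toy model. This is only a sketch for reasoning about the sequence: `Regs`, `ToyHypervisor`, and `vtl_return` are hypothetical names, and nothing here reflects the real hypervisor or sidecar interfaces.

```rust
/// Toy register file; only RAX is modeled.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Regs {
    rax: u64,
}

/// Hypothetical stand-in for the hypervisor's per-VP saved state.
struct ToyHypervisor {
    vtl2_shadow: Regs,
    vtl0_saved: Regs,
}

impl ToyHypervisor {
    /// Models steps 2 through 9 of the diagram and returns the registers
    /// that VTL2 observes once execution resumes after the VTL return.
    fn vtl_return(&mut self, vtl2_live: Regs, guest_work: impl FnOnce(&mut Regs)) -> Regs {
        self.vtl2_shadow = vtl2_live; // step 3: save VTL2 regs to shadow
        let mut guest = self.vtl0_saved; // step 4: restore VTL0 regs
        guest_work(&mut guest); // steps 5-6: guest runs until an exit
        self.vtl0_saved = guest; // step 7: save VTL0 regs
        self.vtl2_shadow // steps 8-9: restore VTL2 regs, return to VTL2
    }
}

fn main() {
    let mut hv = ToyHypervisor {
        vtl2_shadow: Regs { rax: 0 },
        vtl0_saved: Regs { rax: 0x10 },
    };
    // The guest changes RAX while it runs; VTL2 still sees its own value.
    let seen = hv.vtl_return(Regs { rax: 0xAAAA }, |g| g.rax = 0x42);
    assert_eq!(seen.rax, 0xAAAA);
    assert_eq!(hv.vtl0_saved.rax, 0x42);
    println!("VTL2 sees {:#x}, guest had {:#x}", seen.rax, hv.vtl0_saved.rax);
}
```

In this model the value returned to VTL2 never depends on what the guest did, which is the mismatch step 10 describes.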
- 20 crashes, all on TIP build `0.8.277.0`, April 16–18 2026
- Consistent `+0xB0` offset: `cpu_context = (message.rax + 0xB0) & 0xFFFFFFFF`
- `0xB0` = `offsetof(HvVpAssistPage, intercept_message.payload.rax)`
- Not a servicing path (`keepalive=false`, no per-CPU override logs)
- HV team confirmed no hypervisor-side RAX changes
- XMM6–15 remain stale (register page only carries XMM0–5)
- FP control state (MXCSR, FCW) remains stale (`fxsave` captures VTL2's)
- If register page is not mapped (older HV), stale values remain
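The `+0xB0` relation from the list above can be written as a small predicate, which may help when triaging further dumps. This is a sketch: the function name is hypothetical and the sample values are made up for illustration, not taken from the reported crashes.

```rust
/// Offset quoted in the crash analysis above.
const RAX_PAYLOAD_OFFSET: u64 = 0xB0;

/// Returns true when `cpu_context_rax` equals
/// `(message_rax + 0xB0) & 0xFFFFFFFF`, the pattern described above.
fn matches_crash_signature(message_rax: u64, cpu_context_rax: u64) -> bool {
    (message_rax.wrapping_add(RAX_PAYLOAD_OFFSET) & 0xFFFF_FFFF) == cpu_context_rax
}

fn main() {
    // Illustrative values only.
    assert!(matches_crash_signature(0x1000, 0x10B0));
    // The sum is truncated to 32 bits before comparison.
    assert!(matches_crash_signature(0x1_FFFF_FF60, 0x10));
    // Equal values do not match the signature.
    assert!(!matches_crash_signature(0x1000, 0x1000));
    println!("signature checks passed");
}
```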
Pull request overview
This PR fixes a crash in sidecar VP intercept handling by ensuring the VMM-visible `cpu_context` reflects the guest’s authoritative register state at intercept time, rather than VTL2 round-tripped register values after interrupt re-entry.
Changes:
- Pass `register_page_mapped` into `run_vp_once()` to conditionally sync registers on intercept.
- On intercept, copy GP registers and XMM0–5 from the hypervisor-provided register page into `command_page.cpu_context`.
…sters
- Skip index 4 (CR2 in `CpuContextX64`, RSP in register page) during GP register sync to preserve the captured CR2 value.
- Gate register page sync on `is_valid != 0` to avoid copying stale/zero data when the page is mapped but not populated.
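The changes listed above could look roughly like the following. This is a sketch under stated assumptions, not the PR's code: the struct layouts, field names, the 16-slot GP array, and the function `sync_from_register_page` are all hypothetical; only the gating on `register_page_mapped` and `is_valid`, the skip of index 4, and the XMM0–5 range come from the summary.

```rust
/// Index that holds CR2 in the CPU context but RSP in the register page
/// (per the summary above).
const CR2_SLOT: usize = 4;

/// Hypothetical register page layout.
struct RegisterPage {
    is_valid: u8,
    gp_registers: [u64; 16],
    xmm_registers: [u128; 6], // XMM0-5 only
}

/// Hypothetical CPU context layout.
struct CpuContext {
    gps: [u64; 16],
    xmm: [u128; 16],
}

/// Copies guest GP registers and XMM0-5 into `ctx`.
/// Returns false, leaving `ctx` untouched, when the page is unmapped or invalid.
fn sync_from_register_page(
    ctx: &mut CpuContext,
    page: &RegisterPage,
    register_page_mapped: bool,
) -> bool {
    if !register_page_mapped || page.is_valid == 0 {
        return false;
    }
    for (i, (dst, src)) in ctx.gps.iter_mut().zip(page.gp_registers.iter()).enumerate() {
        if i != CR2_SLOT {
            *dst = *src; // keep the captured CR2 value at index 4
        }
    }
    ctx.xmm[..6].copy_from_slice(&page.xmm_registers);
    true
}

fn main() {
    let mut ctx = CpuContext { gps: [0xDEAD; 16], xmm: [0; 16] };
    let mut page = RegisterPage { is_valid: 0, gp_registers: [7; 16], xmm_registers: [9; 6] };
    assert!(!sync_from_register_page(&mut ctx, &page, true)); // mapped but not populated
    page.is_valid = 1;
    assert!(sync_from_register_page(&mut ctx, &page, true));
    assert_eq!(ctx.gps[0], 7);
    assert_eq!(ctx.gps[CR2_SLOT], 0xDEAD);
    assert_eq!((ctx.xmm[5], ctx.xmm[6]), (9, 0)); // XMM6 and above are untouched
}
```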
let interruption_pending = message.header.execution_state.interruption_pending();

if message.access_info.string_op() || message.access_info.rep_prefix() {
    self.vp.runner.cpu_context_mut().gps_no_rsp[protocol::RAX] = rax;
RAX is written into cpu_context in both branches, which makes it easy to accidentally diverge later. Since the assignment is unconditional, consider moving it before the if and/or taking a single mutable reference to the CPU context once and reusing it to avoid repeated cpu_context_mut() calls. This reduces duplication and makes the intent clearer.
let access_size = message.access_info.access_size();
let is_write = message.header.intercept_access_type == HvInterceptAccessType::WRITE;
let port = message.port_number;
self.vp.runner.cpu_context_mut().gps_no_rsp[protocol::RAX] = rax;
RAX is written into cpu_context in both branches, which makes it easy to accidentally diverge later. Since the assignment is unconditional, consider moving it before the if and/or taking a single mutable reference to the CPU context once and reusing it to avoid repeated cpu_context_mut() calls. This reduces duplication and makes the intent clearer.
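One way to read this suggestion is to hoist the write above the branch so it happens exactly once. The sketch below uses stand-in types to show the shape; `CpuContext`, `IoPortMessage`, `Path`, the `RAX` index value, and the function name are hypothetical and do not mirror the real handler's signature.

```rust
/// Assumed index; stands in for `protocol::RAX`.
const RAX: usize = 0;

/// Hypothetical stand-in for the runner's CPU context.
struct CpuContext {
    gps_no_rsp: [u64; 16],
}

/// Hypothetical, flattened view of the intercept message.
struct IoPortMessage {
    rax: u64,
    string_op: bool,
    rep_prefix: bool,
}

/// Which branch handled the exit; used only to make the sketch observable.
#[derive(Debug, PartialEq)]
enum Path {
    StringOrRep,
    Simple,
}

fn handle_io_port_exit_sketch(ctx: &mut CpuContext, message: &IoPortMessage) -> Path {
    // Single unconditional write, done before branching, so the two
    // paths cannot diverge later.
    ctx.gps_no_rsp[RAX] = message.rax;

    if message.string_op || message.rep_prefix {
        Path::StringOrRep
    } else {
        Path::Simple
    }
}

fn main() {
    let mut ctx = CpuContext { gps_no_rsp: [0; 16] };
    let msg = IoPortMessage { rax: 0x42, string_op: false, rep_prefix: false };
    assert_eq!(handle_io_port_exit_sketch(&mut ctx, &msg), Path::Simple);
    assert_eq!(ctx.gps_no_rsp[RAX], 0x42);
}
```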
sidecar: sync RAX from intercept message in IO port exit handler
Fixes ICM 781515258: `assert_eq!(message.rax, cpu_context.gps[RAX])` crash in `handle_io_port_exit` for sidecar VPs.
Some or all of the below is wrong. Ignore it until we get a better understanding of the crash and fix.
Problem
On sidecar VPs, the `cpu_context.gps[]` array contains VTL2's own register values, not the guest's. The sidecar's `run_vp_once()` inline asm captures VTL2's registers on VTL re-entry: the hypervisor restores VTL2's saved state, not the guest's. When an interrupt re-entry precedes an intercept exit, the guest may change RAX between exits, causing the `assert_eq!` against `message.rax` to fire.
Fix
In `handle_io_port_exit`, replace the `assert_eq!` with an unconditional sync of RAX from the intercept message into `cpu_context`. The intercept message always carries the correct guest RAX value, regardless of whether the VP is a sidecar VP or not. This is safe for non-sidecar VPs too, since the intercept message RAX matches `cpu_context` RAX in that case.
Other intercept handlers (CPUID, MSR) already write their result registers to `cpu_context` from the intercept message fields, so they are not affected.
Call flow