vfio_assigned_device: quiesce device via Command register on reset#3338
vfio_assigned_device: quiesce device via Command register on reset#3338jstarks wants to merge 1 commit intomicrosoft:mainfrom
Conversation
Before issuing VFIO_DEVICE_RESET (or when the device doesn't support it at all), clear Bus Master Enable, Memory Space Enable, and I/O Space Enable in the PCI Command register. VFIO passes these bits through to real hardware, so this stops the device from initiating DMA and from decoding MMIO/PIO on the physical bus. Without this, a device that lacks VFIO_DEVICE_RESET would retain BME after a VM reset, allowing it to continue DMA through the IOMMU with stale mappings.
There was a problem hiding this comment.
Pull request overview
This PR hardens VFIO-assigned PCI device reset behavior by quiescing the physical device via the PCI Command register prior to attempting VFIO_DEVICE_RESET, preventing post-reset DMA/MMIO/PIO activity from persisting on real hardware.
Changes:
- Clear Bus Master Enable, Memory Space Enable, and I/O Space Enable in the physical device’s PCI Command register during
reset(). - Ensure quiescing occurs before the existing VFIO reset path, covering devices that don’t support
VFIO_DEVICE_RESET.
| let command = cfg_space::Command::from_bits( | ||
| self.read_phys_config(HeaderType00::STATUS_COMMAND.0) as u16, | ||
| ) | ||
| .with_pio_enabled(false) | ||
| .with_mmio_enabled(false) | ||
| .with_bus_master(false); | ||
| self.write_phys_config(HeaderType00::STATUS_COMMAND.0, command.into_bits().into()); |
There was a problem hiding this comment.
read_phys_config() returns !0 on read failure. If that happens here, Command::from_bits(0xFFFF) will preserve reserved bits as 1, and the subsequent write may push an invalid Command value to hardware (potentially toggling unrelated control bits). Consider reading the config DWORD with the fallible helper (or otherwise detecting the error) and skipping the quiesce write on failure, or explicitly clearing reserved bits before writing.
| let command = cfg_space::Command::from_bits( | |
| self.read_phys_config(HeaderType00::STATUS_COMMAND.0) as u16, | |
| ) | |
| .with_pio_enabled(false) | |
| .with_mmio_enabled(false) | |
| .with_bus_master(false); | |
| self.write_phys_config(HeaderType00::STATUS_COMMAND.0, command.into_bits().into()); | |
| let raw_command = self.read_phys_config(HeaderType00::STATUS_COMMAND.0); | |
| if raw_command == u32::MAX { | |
| tracing::warn!( | |
| pci_id = self.pci_id.as_str(), | |
| "failed to read physical PCI command register; skipping quiesce write" | |
| ); | |
| } else { | |
| let command = cfg_space::Command::from_bits(raw_command as u16) | |
| .with_pio_enabled(false) | |
| .with_mmio_enabled(false) | |
| .with_bus_master(false); | |
| self.write_phys_config(HeaderType00::STATUS_COMMAND.0, command.into_bits().into()); | |
| } |
| // from mastering DMA and from decoding MMIO/PIO on the physical | ||
| // bus. Without this, a device that lacks VFIO_DEVICE_RESET support | ||
| // would continue mastering the bus after a VM reset. | ||
| let command = cfg_space::Command::from_bits( |
There was a problem hiding this comment.
Should this be an else of the if *supports_reset { ... } block below? In other words: prefer VFIO reset then fallback to this if not supported
|
I think this is a good change, but if you just clear BME while a device is operating, it will probably fall into an error state, requiring a reset before it is useful again. Can you force one of the other PCI reset paths, like moving it from D0->D3->D0? |
Before issuing VFIO_DEVICE_RESET (or when the device doesn't support it at all), clear Bus Master Enable, Memory Space Enable, and I/O Space Enable in the PCI Command register. VFIO passes these bits through to real hardware, so this stops the device from initiating DMA and from decoding MMIO/PIO on the physical bus.
Without this, a device that lacks VFIO_DEVICE_RESET would retain BME after a VM reset, allowing it to continue DMA through the IOMMU with stale mappings.