vfio_assigned_device: dynamic IOMMU mapping for P2P DMA #3367

jstarks wants to merge 2 commits into microsoft:main
Conversation
This PR modifies files containing `unsafe` code. For more on why we check whole files, instead of just diffs, check out the Rustonomicon.
Pull request overview
Enables peer-to-peer DMA for VFIO-assigned devices by keeping VFIO Type1 IOMMU mappings synchronized with the evolving guest memory map (including dynamically-mapped device BARs) via a new DMA-mapping consumer interface in membacking.
Changes:
- Add `membacking::DmaTarget` + `DmaMapperClient` to publish per-submapping map/unmap events (with optional host VA support via `VaMapper`).
- Update `vfio_assigned_device` to register each VFIO container as a DMA-mapper consumer instead of snapshotting guest RAM mappings at container creation.
- Make `vfio_sys::Container::map_dma` explicitly `unsafe` and take a typed pointer (`*const u8`) for the mapped host VA.
Reviewed changes
Copilot reviewed 11 out of 12 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| vm/devices/user_driver/vfio_sys/src/lib.rs | Make DMA mapping API unsafe and pointer-typed to reflect memory safety constraints. |
| vm/devices/pci/vfio_assigned_device/src/resolver.rs | Switch resolver construction to accept a DmaMapperClient rather than GuestMemory. |
| vm/devices/pci/vfio_assigned_device/src/manager.rs | Register VFIO containers as DmaTarget consumers for dynamic IOMMU updates. |
| vm/devices/pci/vfio_assigned_device/src/lib.rs | Allow unsafe usage at crate level (needed for DMA mapping integration). |
| vm/devices/pci/vfio_assigned_device/Cargo.toml | Replace guestmem dependency with membacking. |
| openvmm/openvmm_core/src/worker/dispatch.rs | Wire GuestMemoryManager::dma_mapper_client() into VFIO resolver creation. |
| openvmm/membacking/src/region_manager.rs | Introduce DMA mapper registration + replay + live update plumbing in the region manager. |
| openvmm/membacking/src/memory_manager/mod.rs | Expose dma_mapper_client() from GuestMemoryManager. |
| openvmm/membacking/src/memory_manager/device_memory.rs | Clarify that dma_target=false affects sharing, not IOMMU DMA notifications. |
| openvmm/membacking/src/mapping_manager/va_mapper.rs | Add small public accessors/doc comments used by DMA mapping logic. |
| openvmm/membacking/src/lib.rs | Re-export DmaTarget/DmaMapperClient/DmaMapperHandle and Mappable for consumers. |
| Cargo.lock | Update dependency graph for the new membacking usage. |
```rust
    async fn add_dma_mapper(
        &mut self,
        target: Arc<dyn DmaTarget>,
        needs_va: bool,
    ) -> anyhow::Result<DmaMapperId> {
        // Create a VaMapper if the target needs host VAs for IOMMU programming.
        let va_mapper = if needs_va {
            Some(
                self.inner
                    .mapping_manager
                    .new_mapper()
                    .await
                    .map_err(|e| anyhow::anyhow!(e))?,
            )
        } else {
            None
        };

        let id = DmaMapperId(self.inner.next_dma_mapper_id);
        self.inner.next_dma_mapper_id += 1;

        let mapper = DmaMapper {
            id,
            target,
            va_mapper,
        };

        // Replay existing active sub-mappings so the new IOMMU consumer
        // gets the current state.
        for region in &self.regions {
            if region.is_active {
                for mapping in &region.mappings {
                    let iova = region.params.range.start() + mapping.params.range_in_region.start();
                    let size = mapping.params.range_in_region.len();
                    mapper
                        .map_dma(
                            iova,
                            &mapping.params.mappable,
                            mapping.params.file_offset,
                            size,
                        )
                        .await?;
                }
            }
        }
        self.inner.dma_mappers.push(mapper);
        Ok(id)
    }
```
New DMA-mapper functionality (registration + replay + live map/unmap notifications) isn’t covered by unit tests, while this file already has tests. Adding a fake DmaTarget in tests that records map_dma/unmap_dma calls would let you validate: (1) replay on add_dma_mapper, (2) notifications on add_mapping / remove_mappings, and (3) correct unmap ordering on region disable/removal.
```rust
    unsafe fn map_dma(
        &self,
        iova: u64,
        host_va: Option<*const u8>,
        _mappable: &membacking::Mappable,
        _file_offset: u64,
        size: u64,
    ) -> anyhow::Result<()> {
        let vaddr = host_va.expect("VFIO type1 requires host VA (registered with needs_va=true)");
        // SAFETY: The caller (DmaMapper in membacking) guarantees that the
        // host VA is backed and stable via ensure_mapped + VaMapper lifetime.
        unsafe {
```
VfioType1DmaTarget::map_dma uses host_va.expect(...), which will panic if the region manager is misconfigured or if a future backend registers with needs_va=false by mistake. Since this is fallible plumbing, prefer returning a structured error when host_va is None (e.g., anyhow::bail!/ensure!) instead of panicking.
```rust
        // Ensure the VaMapper has the backing pages faulted in and compute
        // the host VA for type1 backends.
        let host_va = if let Some(va_mapper) = &self.va_mapper {
            va_mapper
                .ensure_mapped(MemoryRange::new(iova..iova + size))
                .await
                .map_err(|_| {
                    anyhow::anyhow!(
                        "VA range {:#x}..{:#x} has no backing mapping",
                        iova,
                        iova + size
                    )
                })?;
            Some(va_mapper.as_ptr().wrapping_add(iova as usize).cast_const())
        } else {
            None
        };
        // SAFETY: When host_va is Some, the VaMapper has been ensure_mapped
        // for this range. The VaMapper is held alive by this DmaMapper (via
        // Arc), so the VA reservation persists. The pages are backed because
        // ensure_mapped succeeded. The IOMMU mapping will be torn down
        // (via unmap_dma in disable_region or remove_dma_mapper) before the
        // VaMapper releases the VA range.
        unsafe {
```
DmaMapper::map_dma builds a MemoryRange::new(iova..iova + size) and later computes as_ptr().wrapping_add(iova as usize). This can panic or produce an incorrect pointer if iova + size overflows u64, or if iova/iova+size exceed the VaMapper reservation (and iova as usize truncates on platforms where usize < u64). Please add checked arithmetic and bounds validation against va_mapper.len() and avoid wrapping_add/truncating casts here (return an error instead).
Suggested change:

```diff
-        // Ensure the VaMapper has the backing pages faulted in and compute
-        // the host VA for type1 backends.
-        let host_va = if let Some(va_mapper) = &self.va_mapper {
-            va_mapper
-                .ensure_mapped(MemoryRange::new(iova..iova + size))
-                .await
-                .map_err(|_| {
-                    anyhow::anyhow!(
-                        "VA range {:#x}..{:#x} has no backing mapping",
-                        iova,
-                        iova + size
-                    )
-                })?;
-            Some(va_mapper.as_ptr().wrapping_add(iova as usize).cast_const())
-        } else {
-            None
-        };
-        // SAFETY: When host_va is Some, the VaMapper has been ensure_mapped
-        // for this range. The VaMapper is held alive by this DmaMapper (via
-        // Arc), so the VA reservation persists. The pages are backed because
-        // ensure_mapped succeeded. The IOMMU mapping will be torn down
-        // (via unmap_dma in disable_region or remove_dma_mapper) before the
-        // VaMapper releases the VA range.
-        unsafe {
+        let end = iova
+            .checked_add(size)
+            .ok_or_else(|| anyhow::anyhow!("DMA range overflows u64: {:#x} + {:#x}", iova, size))?;
+        // Ensure the VaMapper has the backing pages faulted in and compute
+        // the host VA for type1 backends.
+        let host_va = if let Some(va_mapper) = &self.va_mapper {
+            let iova_usize = usize::try_from(iova).map_err(|_| {
+                anyhow::anyhow!("DMA start {:#x} does not fit in usize for VA mapping", iova)
+            })?;
+            let end_usize = usize::try_from(end).map_err(|_| {
+                anyhow::anyhow!("DMA end {:#x} does not fit in usize for VA mapping", end)
+            })?;
+            if end_usize > va_mapper.len() {
+                anyhow::bail!(
+                    "DMA range {:#x}..{:#x} exceeds VA reservation size {:#x}",
+                    iova,
+                    end,
+                    va_mapper.len()
+                );
+            }
+            va_mapper
+                .ensure_mapped(MemoryRange::new(iova..end))
+                .await
+                .map_err(|_| {
+                    anyhow::anyhow!(
+                        "VA range {:#x}..{:#x} has no backing mapping",
+                        iova,
+                        end
+                    )
+                })?;
+            Some((va_mapper, iova_usize))
+        } else {
+            None
+        };
+        // SAFETY: When host_va is Some, the VaMapper range was validated to
+        // fit within the reservation and ensure_mapped succeeded for this
+        // range. The VaMapper is held alive by this DmaMapper (via Arc), so
+        // the VA reservation persists. The pages are backed because
+        // ensure_mapped succeeded. The IOMMU mapping will be torn down
+        // (via unmap_dma in disable_region or remove_dma_mapper) before the
+        // VaMapper releases the VA range.
+        unsafe {
+            let host_va = host_va.map(|(va_mapper, iova_usize)| va_mapper.as_ptr().add(iova_usize).cast_const());
```
```rust
    async fn add_dma_mapper(
        &mut self,
        target: Arc<dyn DmaTarget>,
        needs_va: bool,
    ) -> anyhow::Result<DmaMapperId> {
        // Create a VaMapper if the target needs host VAs for IOMMU programming.
        let va_mapper = if needs_va {
            Some(
                self.inner
                    .mapping_manager
                    .new_mapper()
                    .await
                    .map_err(|e| anyhow::anyhow!(e))?,
            )
        } else {
            None
        };

        let id = DmaMapperId(self.inner.next_dma_mapper_id);
        self.inner.next_dma_mapper_id += 1;

        let mapper = DmaMapper {
            id,
            target,
            va_mapper,
        };

        // Replay existing active sub-mappings so the new IOMMU consumer
        // gets the current state.
        for region in &self.regions {
            if region.is_active {
                for mapping in &region.mappings {
                    let iova = region.params.range.start() + mapping.params.range_in_region.start();
                    let size = mapping.params.range_in_region.len();
                    mapper
                        .map_dma(
                            iova,
                            &mapping.params.mappable,
                            mapping.params.file_offset,
                            size,
                        )
                        .await?;
                }
            }
        }
        self.inner.dma_mappers.push(mapper);
        Ok(id)
```
add_dma_mapper replays existing active sub-mappings by calling mapper.map_dma(...).await? before the mapper is registered in self.inner.dma_mappers. If replay fails partway through, the DMA target may have a partially-programmed IOMMU state, and (when needs_va=true) the VaMapper can be dropped while those IOMMU mappings remain live. This can leave stale pinned mappings and violate the safety contract. Consider registering the mapper first and rolling back (unmapping already-mapped ranges) on error, or keeping a list of successfully-mapped ranges and unmapping them before returning the error.
```rust
        let region_range = region.params.range;
        for dma_mapper in &mut self.dma_mappers {
            dma_mapper.unmap_dma(region_range.start(), region_range.len());
```
disable_region calls dma_mapper.unmap_dma(region_range.start(), region_range.len()) for each DMA mapper. Regions can contain multiple sub-mappings (and potentially gaps), and DmaTarget::unmap_dma is allowed to be called with a larger range than individual map_dma calls. The current VFIO implementation forwards this directly to VFIO_IOMMU_UNMAP_DMA, whose wrapper docs require the range to match a prior map; if the kernel returns EINVAL or only partially unmaps, the region manager will still tear down VA mappings, potentially leaving live IOMMU entries pointing at unmapped memory. Prefer iterating per region.mappings and unmapping each mapped sub-range (or otherwise ensuring the backend can handle gaps) before removing the VA mappings.
Suggested change:

```diff
-        let region_range = region.params.range;
-        for dma_mapper in &mut self.dma_mappers {
-            dma_mapper.unmap_dma(region_range.start(), region_range.len());
+        // Unmap each mapped sub-range individually to match the granularity
+        // used by enable_region. A region can contain multiple sub-mappings
+        // and gaps, and some DMA backends require unmap ranges to correspond
+        // to prior map ranges.
+        for mapping in &region.mappings {
+            let iova = region.params.range.start() + mapping.params.range_in_region.start();
+            let size = mapping.params.range_in_region.len();
+            for dma_mapper in &mut self.dma_mappers {
+                dma_mapper.unmap_dma(iova, size);
+            }
```
VFIO device assignment needs peer-to-peer DMA between assigned devices: one device must be able to DMA into another device's BAR. The old VFIO container manager snapshotted guest RAM at creation time via `GuestMemory::full_mapping()` and programmed a static set of IOMMU identity mappings, so device BARs (which are mapped dynamically as the guest configures them) were never visible to the IOMMU.

Introduce a `DmaTarget` trait in the region manager that receives incremental sub-mapping events (`map_dma` / `unmap_dma`) as regions are enabled, disabled, or modified. VFIO containers register themselves as DMA mapper consumers via a new `DmaMapperClient`, and the region manager replays all existing active sub-mappings on registration, then sends live updates as the memory map evolves. Device BAR mappings are now programmed into every VFIO container's IOMMU as they appear, enabling P2P DMA between assigned devices.

The region manager maintains the `VaMapper` lifecycle for backends that need host VAs (VFIO type1's `pin_user_pages`), controlled by a `needs_va` flag at registration. This also sets up the abstraction for future iommufd support, which maps from fd+offset directly and won't need a `VaMapper` at all.

On the vfio_sys side, `Container::map_dma` now takes `*const u8` instead of a raw `u64` and is marked `unsafe`, properly reflecting the memory safety requirements that were previously hidden in the API.
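The registration flow described above, sketched as pseudocode from the APIs visible in this diff (the exact client-side signatures are assumptions):

```rust
// Pseudocode sketch; not compilable against the real crates.
//
// let client = guest_memory_manager.dma_mapper_client();
// let container: Arc<dyn DmaTarget> = vfio_container.clone();
// // VFIO type1 pins pages via host VAs, so request a VaMapper.
// let handle = client.add_dma_mapper(container, /* needs_va */ true).await?;
// // On registration the region manager replays all active sub-mappings,
// // then streams map_dma/unmap_dma events as the memory map evolves.
```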
```rust
    /// The manager registers each new VFIO container with the region manager
    /// so that DMA mappings are kept in sync with the VM's memory map.
    pub fn new(spawner: impl pal_async::task::Spawn, dma_mapper_client: DmaMapperClient) -> Self {
```
The user guide has a dedicated VFIO page (Guide/src/user_guide/openvmm/vfio.md). Since this PR changes VFIO DMA behavior significantly (dynamic BAR/IOMMU mapping enabling P2P DMA between assigned devices), please update that page (or add a short note + limitation/requirements section) so the documented VFIO behavior matches the implementation.
```rust
    // SAFETY: sysconf(_SC_PAGESIZE) is always safe and always succeeds on Linux.
    let page_size = unsafe { libc::sysconf(libc::_SC_PAGESIZE) as u64 };
```
host_page_size() no longer checks for sysconf(_SC_PAGESIZE) failure. If sysconf returns -1, casting to u64 yields a huge value that will be cached and will break the page-alignment logic in map_dma (and the assert!(page_size >= 4096) won’t catch it). Please keep an explicit > 0 / != -1 check (ideally before casting) and return an error or panic with a clear message if it ever fails.
Suggested change:

```diff
-    // SAFETY: sysconf(_SC_PAGESIZE) is always safe and always succeeds on Linux.
-    let page_size = unsafe { libc::sysconf(libc::_SC_PAGESIZE) as u64 };
+    // SAFETY: sysconf(_SC_PAGESIZE) is always safe to call on Linux.
+    let raw_page_size = unsafe { libc::sysconf(libc::_SC_PAGESIZE) };
+    assert!(
+        raw_page_size > 0,
+        "sysconf(_SC_PAGESIZE) failed or returned invalid page size: {}",
+        raw_page_size
+    );
+    let page_size = raw_page_size as u64;
```
```rust
        // Ensure the VaMapper has the backing pages faulted in and compute
        // the host VA for type1 backends.
        let host_va = if let Some(va_mapper) = &self.va_mapper {
            va_mapper
                .ensure_mapped(MemoryRange::new(iova..iova + size))
                .await
                .map_err(|_| {
                    anyhow::anyhow!(
                        "VA range {:#x}..{:#x} has no backing mapping",
                        iova,
                        iova + size
```
MemoryRange::new(iova..iova + size) can panic on u64 overflow (or if the result becomes unaligned), since MemoryRange::new asserts invariants. Please use checked_add for iova + size and return an error if it overflows, and avoid using the potentially-overflowing value again in the error message formatting.
Suggested change:

```diff
-        // Ensure the VaMapper has the backing pages faulted in and compute
-        // the host VA for type1 backends.
-        let host_va = if let Some(va_mapper) = &self.va_mapper {
-            va_mapper
-                .ensure_mapped(MemoryRange::new(iova..iova + size))
-                .await
-                .map_err(|_| {
-                    anyhow::anyhow!(
-                        "VA range {:#x}..{:#x} has no backing mapping",
-                        iova,
-                        iova + size
+        let end = iova.checked_add(size).ok_or_else(|| {
+            anyhow::anyhow!(
+                "VA range start {:#x} with size {:#x} overflows u64",
+                iova,
+                size
+            )
+        })?;
+        // Ensure the VaMapper has the backing pages faulted in and compute
+        // the host VA for type1 backends.
+        let host_va = if let Some(va_mapper) = &self.va_mapper {
+            va_mapper
+                .ensure_mapped(MemoryRange::new(iova..end))
+                .await
+                .map_err(|_| {
+                    anyhow::anyhow!(
+                        "VA range {:#x}..{:#x} has no backing mapping",
+                        iova,
+                        end
```
```rust
            va_mapper
                .ensure_mapped(MemoryRange::new(iova..iova + size))
                .await
                .map_err(|_| {
                    anyhow::anyhow!(
                        "VA range {:#x}..{:#x} has no backing mapping",
                        iova,
                        iova + size
                    )
                })?;
            Some(va_mapper.as_ptr().wrapping_add(iova as usize).cast_const())
```
Computing host_va via va_mapper.as_ptr().wrapping_add(iova as usize) can silently wrap/truncate (e.g., if iova doesn’t fit in usize or exceeds the VA reservation length). Instead, validate iova + size <= va_mapper.len() and that iova fits in usize, then use a non-wrapping pointer add; otherwise return an error so we don’t hand VFIO a bogus pointer.
Suggested change:

```diff
-            va_mapper
-                .ensure_mapped(MemoryRange::new(iova..iova + size))
-                .await
-                .map_err(|_| {
-                    anyhow::anyhow!(
-                        "VA range {:#x}..{:#x} has no backing mapping",
-                        iova,
-                        iova + size
-                    )
-                })?;
-            Some(va_mapper.as_ptr().wrapping_add(iova as usize).cast_const())
+            let end = iova.checked_add(size).ok_or_else(|| {
+                anyhow::anyhow!(
+                    "VA range {:#x}..{:#x} overflows",
+                    iova,
+                    iova.saturating_add(size)
+                )
+            })?;
+            let va_len = va_mapper.len() as u64;
+            if end > va_len {
+                return Err(anyhow::anyhow!(
+                    "VA range {:#x}..{:#x} exceeds reserved VA length {:#x}",
+                    iova,
+                    end,
+                    va_len
+                ));
+            }
+            let offset = usize::try_from(iova).map_err(|_| {
+                anyhow::anyhow!("VA offset {:#x} does not fit in usize", iova)
+            })?;
+            va_mapper
+                .ensure_mapped(MemoryRange::new(iova..end))
+                .await
+                .map_err(|_| {
+                    anyhow::anyhow!(
+                        "VA range {:#x}..{:#x} has no backing mapping",
+                        iova,
+                        end
+                    )
+                })?;
+            // SAFETY: `offset` was validated to fit in `usize`, and
+            // `end <= va_mapper.len() as u64` ensures the resulting pointer
+            // stays within the reserved VA range.
+            Some(unsafe { va_mapper.as_ptr().add(offset).cast_const() })
```
```rust
        // Replay existing active sub-mappings so the new IOMMU consumer
        // gets the current state.
        for region in &self.regions {
            if region.is_active {
                for mapping in &region.mappings {
                    let iova = region.params.range.start() + mapping.params.range_in_region.start();
                    let size = mapping.params.range_in_region.len();
                    mapper
                        .map_dma(
                            iova,
                            &mapping.params.mappable,
                            mapping.params.file_offset,
                            size,
                        )
                        .await?;
                }
            }
        }
```
If replaying existing sub-mappings fails partway through, add_dma_mapper returns an error but any mappings already programmed into the target IOMMU remain live (and the mapper isn’t registered, so there’s no handle to later remove/unmap them). Please add rollback on failure (best-effort unmap of everything mapped so far) before returning the error.
```rust
        _file_offset: u64,
        size: u64,
    ) -> anyhow::Result<()> {
        let vaddr = host_va.expect("VFIO type1 requires host VA (registered with needs_va=true)");
```
Using host_va.expect(...) will panic the VMM if the mapper is ever registered with needs_va = false (or if the region manager has a bug). Since this can be triggered by configuration/feature work in the future, it’s safer to return a regular error (e.g., anyhow::bail!) when host_va is None rather than panicking.
Suggested change:

```diff
-        let vaddr = host_va.expect("VFIO type1 requires host VA (registered with needs_va=true)");
+        let Some(vaddr) = host_va else {
+            anyhow::bail!("VFIO type1 requires host VA (registered with needs_va=true)");
+        };
```
```rust
/// A consumer of IOMMU-granularity DMA mapping events.
///
/// Unlike [`PartitionMemoryMap`](virt::PartitionMemoryMap), which maps entire
/// regions by VA pointer for lazy SLAT resolution, this trait receives
/// individual sub-mapping events with the backing fd + offset, suitable for
/// explicit IOMMU programming (VFIO type1, iommufd, etc.).
///
/// DMA targets receive notifications for **all** active sub-mappings,
/// including device BAR memory (regions with `dma_target: false`). The
/// `dma_target` flag controls only whether a region is exposed via
/// `GuestMemorySharing` (for vhost-user); IOMMU consumers need the full
/// GPA→backing map to program identity mappings for all guest-visible memory.
///
/// Implementations must be `Send + Sync` because they are stored behind `Arc`
/// in the region manager task.
pub trait DmaTarget: Send + Sync {
    /// Program an IOMMU mapping from `iova` to the backing described by
    /// `mappable` at `file_offset` for `size` bytes.
    ///
    /// `host_va` is the host virtual address of the mapping, provided when
    /// the target was registered with `needs_va = true`. When `None`, the
    /// implementation should use `mappable` and `file_offset` directly
    /// (e.g., iommufd).
    ///
    /// # Safety
    /// When `host_va` is `Some`, the pointed-to memory must be backed and
    /// must not be unmapped for the duration of the resulting IOMMU mapping.
    /// The caller (the crate-internal `DmaMapper`) guarantees this by holding an
    /// [`Arc<VaMapper>`] and calling `ensure_mapped` before each invocation.
    unsafe fn map_dma(
        &self,
        iova: u64,
        host_va: Option<*const u8>,
        mappable: &Mappable,
        file_offset: u64,
        size: u64,
    ) -> anyhow::Result<()>;

    /// Remove an IOMMU mapping at `iova` for `size` bytes.
    ///
    /// The region manager may call this with a range that is larger than any
    /// single `map_dma` call (e.g., unmapping an entire region at once even
    /// though individual sub-mappings were mapped separately). Implementations
    /// must handle this gracefully — for example, by unmapping whatever
    /// sub-ranges are currently mapped within the requested range and ignoring
    /// any gaps.
    fn unmap_dma(&self, iova: u64, size: u64) -> anyhow::Result<()>;
}
```
This introduces new dynamic DMA mapping behavior (replay on registration + live map/unmap updates). region_manager.rs already has unit tests, but none exercise the new DmaTarget/DmaMapperClient path. Please add tests with a mock DmaTarget to verify: (1) replay maps all active sub-mappings in order, (2) add_mapping triggers map_dma for active regions, (3) remove_mappings/disable_region trigger unmap_dma for the right ranges, and (4) remove_dma_mapper unmaps all remaining mappings.