[WIP] vhdx: add pure-Rust VHDX parser and disk layer #3347

jstarks wants to merge 15 commits into microsoft:main from
Conversation
Add a pure-Rust VHDX implementation with full read/write support, write-ahead logging, crash consistency, free space management, trim, differencing disk chains, and a disk_backend integration layer. This replaces the Windows-only vhdmp kernel driver for VHDX files, enabling cross-platform VHDX support.
Pull request overview
Note
Copilot was unable to run its full agentic suite in this review.
Adds a cross-platform, pure-Rust VHDX implementation (parser + read/write + WAL pipeline) and wires it into OpenVMM’s layered disk stack so .vhdx works without the Windows-only kernel driver.
Changes:
- Introduces a new `vhdx` crate with VHDX format parsing, metadata/region/header handling, and a WAL-backed commit/apply pipeline.
- Adds a new `disklayer_vhdx` crate implementing `LayerIo` + resolver, including parent-locator chain walking.
- Updates OpenVMM resource registration and disk-opening logic to use the new VHDX backend for `.vhdx`.
Reviewed changes
Copilot reviewed 34 out of 41 changed files in this pull request and generated 11 comments.
| File | Description |
|---|---|
| vm/devices/storage/vhdx/src/sector_bitmap.rs | Sector bitmap cache + partially-present read resolution; adds unit tests. |
| vm/devices/storage/vhdx/src/region.rs | Region table parsing/validation + rewrite via WAL; adds tests. |
| vm/devices/storage/vhdx/src/metadata.rs | Metadata table parsing + item reads; adds tests. |
| vm/devices/storage/vhdx/src/lsn_watermark.rs | Async LSN/FSN watermark primitive with poisoning + tests. |
| vm/devices/storage/vhdx/src/log_task.rs | WAL logging task plumbing and tail advancement + tests. |
| vm/devices/storage/vhdx/src/log_permits.rs | Failable semaphore for pipeline backpressure + tests. |
| vm/devices/storage/vhdx/src/locator.rs | Parent locator parsing (UTF-16 KV table) + tests. |
| vm/devices/storage/vhdx/src/lib.rs | Crate root + module exports + AsyncFile trait. |
| vm/devices/storage/vhdx/src/known_meta.rs | Parses well-known metadata items + required-item validation + tests. |
| vm/devices/storage/vhdx/src/header.rs | Dual-header parsing + serialized header updates + tests. |
| vm/devices/storage/vhdx/src/format.rs | On-disk structs/constants + checksums + GUIDs + layout tests. |
| vm/devices/storage/vhdx/src/flush.rs | Flush sequencer implementation + tests. |
| vm/devices/storage/vhdx/src/error.rs | Error and corruption enums used across the crate. |
| vm/devices/storage/vhdx/src/create.rs | VHDX creation logic (headers/regions/metadata/BAT) + tests. |
| vm/devices/storage/vhdx/src/apply_task.rs | Apply task writing logged pages to final offsets. |
| vm/devices/storage/vhdx/Cargo.toml | Defines the new vhdx crate and dependencies. |
| vm/devices/storage/disklayer_vhdx/src/resolver.rs | Resource resolver for VhdxDiskLayerHandle. |
| vm/devices/storage/disklayer_vhdx/src/lib.rs | LayerIo implementation bridging vhdx ranges to buffers. |
| vm/devices/storage/disklayer_vhdx/src/io.rs | BlockingFile implementing vhdx::AsyncFile via blocking pool. |
| vm/devices/storage/disklayer_vhdx/src/chain.rs | Helpers to open single VHDX or auto-walk differencing chains. |
| vm/devices/storage/disklayer_vhdx/Cargo.toml | Defines the new disklayer crate + deps/tests. |
| vm/devices/storage/disk_backend_resources/src/layer.rs | Adds VhdxDiskLayerHandle resource type. |
| openvmm/openvmm_resources/src/lib.rs | Registers the new VHDX disklayer resolver. |
| openvmm/openvmm_resources/Cargo.toml | Adds disklayer_vhdx dependency. |
| openvmm/openvmm_helpers/src/disk.rs | Routes .vhdx open to the pure-Rust VHDX chain implementation. |
| openvmm/openvmm_helpers/Cargo.toml | Adds disklayer_vhdx dependency. |
| Cargo.toml | Adds workspace members + crc32c dependency. |
```rust
let sector_number = virtual_offset / self.logical_sector_size as u64;
let chunk_number = (sector_number / SECTORS_PER_CHUNK) as u32;
let sector_count = length as u64 / self.logical_sector_size as u64;
```
sector_count is computed with integer division, which silently truncates if length is not a multiple of logical_sector_size. That would cause resolve_partial_block_read to return ranges shorter than the requested length without error. Consider validating length % logical_sector_size == 0 here and returning VhdxIoErrorInner::UnalignedIo (or asserting this invariant if it is guaranteed by the caller) to avoid silent data loss.
Suggested change:

```rust
let logical_sector_size = self.logical_sector_size as u64;
if (length as u64) % logical_sector_size != 0 {
    return Err(VhdxIoErrorInner::UnalignedIo.into());
}
let sector_number = virtual_offset / logical_sector_size;
let chunk_number = (sector_number / SECTORS_PER_CHUNK) as u32;
let sector_count = length as u64 / logical_sector_size;
```
```rust
let sector_number = virtual_offset / self.logical_sector_size as u64;
let chunk_number = (sector_number / SECTORS_PER_CHUNK) as u32;
let sector_count = length as u64 / self.logical_sector_size as u64;
```
set_sector_bitmap_bits also truncates sector_count via integer division. If length is not sector-aligned, the tail bytes of the request will be ignored and the bitmap will be updated for fewer sectors than intended. Consider enforcing alignment (returning an error) or asserting this is always sector-aligned at the call site to prevent subtle metadata/data mismatches.
Suggested change:

```rust
let logical_sector_size = self.logical_sector_size as u64;
assert!(
    virtual_offset % logical_sector_size == 0,
    "virtual_offset {virtual_offset} must be aligned to logical sector size {logical_sector_size}"
);
assert!(
    length as u64 % logical_sector_size == 0,
    "length {length} must be aligned to logical sector size {logical_sector_size}"
);
let sector_number = virtual_offset / logical_sector_size;
let chunk_number = (sector_number / SECTORS_PER_CHUNK) as u32;
let sector_count = length as u64 / logical_sector_size;
```
```rust
// Reasonable depth limit to detect cycles or absurdly long chains.
const MAX_CHAIN_DEPTH: usize = 1024;

let mut paths: Vec<std::path::PathBuf> = vec![path.to_path_buf()];
let mut current_path = path.to_path_buf();

loop {
    if paths.len() > MAX_CHAIN_DEPTH {
        anyhow::bail!(
            "vhdx chain exceeds maximum depth of {} — possible cycle",
            MAX_CHAIN_DEPTH
        );
    }
```
Cycle detection relies solely on a max-depth limit, so a short cycle (e.g., A→B→A) will loop until depth 1024, doing repeated IO and producing a delayed, less actionable error. Consider tracking visited canonicalized paths (or raw paths if canonicalization is undesirable) in a HashSet and failing immediately on repeat to make cycle failures faster and more diagnosable.
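The `HashSet`-based early exit could look roughly like the sketch below. The `parent_of` closure and `walk_chain` function are hypothetical stand-ins for the crate's real parent-locator lookup and chain-walking code; only the cycle-detection pattern is the point.

```rust
use std::collections::HashSet;
use std::path::{Path, PathBuf};

/// Hypothetical sketch: walk a differencing chain, failing fast on a
/// repeated path instead of waiting for the depth limit. `parent_of`
/// stands in for the real parent-locator lookup.
fn walk_chain(
    start: &Path,
    parent_of: impl Fn(&Path) -> Option<PathBuf>,
) -> Result<Vec<PathBuf>, String> {
    const MAX_CHAIN_DEPTH: usize = 1024;
    let mut seen = HashSet::new();
    let mut paths = Vec::new();
    let mut current = start.to_path_buf();
    loop {
        // Fail immediately on a short cycle such as A -> B -> A.
        if !seen.insert(current.clone()) {
            return Err(format!("vhdx chain cycle at {}", current.display()));
        }
        paths.push(current.clone());
        if paths.len() > MAX_CHAIN_DEPTH {
            return Err("vhdx chain exceeds maximum depth".to_string());
        }
        match parent_of(&current) {
            Some(parent) => current = parent,
            None => return Ok(paths),
        }
    }
}
```

Canonicalizing before inserting into the set would additionally catch cycles expressed through symlinks or relative paths, at the cost of extra IO.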
```rust
// (for newly-allocated block padding).
let zeros = vec![0u8; length as usize];
self.file
    .write_at(file_offset, &zeros)
    .await
    .map_err(DiskError::Io)?;
```
This allocates a fresh Vec<u8> sized to length for every WriteRange::Zero. For large padding ranges, this can create significant transient allocations and memory pressure. Consider using a small reusable zero buffer (e.g., 64 KiB) and writing in chunks, or providing a zero-write helper in BlockingFile that streams zeros without allocating length bytes.
Suggested change:

```rust
// (for newly-allocated block padding) without
// allocating a buffer for the entire range.
let zeros = [0u8; 64 * 1024];
let mut remaining = length as usize;
let mut current_offset = file_offset;
while remaining != 0 {
    let chunk_len = remaining.min(zeros.len());
    self.file
        .write_at(current_offset, &zeros[..chunk_len])
        .await
        .map_err(DiskError::Io)?;
    remaining -= chunk_len;
    current_offset += chunk_len as u64;
}
```
```rust
while remaining_sectors > 0 {
    // Recompute bitmap page coordinates for current position.
    let cur_sector = current_virtual_offset / self.logical_sector_size as u64;
    let cur_chunk_sector = cur_sector % SECTORS_PER_CHUNK;
    let cur_page_number = cur_chunk_sector / SECTORS_PER_BITMAP_PAGE;
    let start_bit = cur_chunk_sector % SECTORS_PER_BITMAP_PAGE;
    let bits_in_this_page =
        std::cmp::min(start_bit + remaining_sectors, SECTORS_PER_BITMAP_PAGE);
```
When a data/unmapped run spans a bitmap-page boundary, this implementation will always end the run at the page boundary and start a new ReadRange in the next iteration. That can inflate the number of ranges (especially for large reads) and increase downstream overhead. Consider carrying the previous run state across iterations and coalescing adjacent runs of the same type when guest_offset is contiguous (and file_offset is contiguous for Data).
```rust
ranges.push(ReadRange::Unmapped {
    guest_offset: current_virtual_offset,
    length: unmapped_bytes as u32,
});
```
```rust
ranges.push(ReadRange::Data {
    guest_offset: current_virtual_offset,
    length: data_bytes as u32,
    file_offset,
});
```
```rust
    // Advance to next page.
    let sectors_processed = bits_in_this_page - start_bit;
    remaining_sectors -= sectors_processed;
}
```
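The suggested coalescing could be a small push helper that merges a new run into the previous one when the offsets line up. The `ReadRange` shape below follows the excerpts above, but the helper itself and its exact guard conditions are a hypothetical sketch, not the crate's code:

```rust
/// Simplified ReadRange mirroring the variants shown in the excerpts.
#[derive(Debug, PartialEq)]
enum ReadRange {
    Data { guest_offset: u64, length: u32, file_offset: u64 },
    Unmapped { guest_offset: u64, length: u32 },
}

/// Push `next`, extending the previous range instead when the two runs
/// are the same kind and contiguous (guest offsets for both kinds, and
/// file offsets as well for Data).
fn push_coalesced(ranges: &mut Vec<ReadRange>, next: ReadRange) {
    if let Some(last) = ranges.last_mut() {
        match (last, &next) {
            (
                ReadRange::Data { guest_offset, length, file_offset },
                ReadRange::Data { guest_offset: g2, length: l2, file_offset: f2 },
            ) if *guest_offset + *length as u64 == *g2
                && *file_offset + *length as u64 == *f2 =>
            {
                // Contiguous data run: just grow the previous range.
                *length += *l2;
                return;
            }
            (
                ReadRange::Unmapped { guest_offset, length },
                ReadRange::Unmapped { guest_offset: g2, length: l2 },
            ) if *guest_offset + *length as u64 == *g2 => {
                // Contiguous unmapped run: grow it too.
                *length += *l2;
                return;
            }
            _ => {}
        }
    }
    ranges.push(next);
}
```

With this helper, the per-page loop can keep its page-boundary structure while still emitting one range per logical run.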
> parser. Fixed VHD1 images also work on all platforms. Dynamic and
> differencing VHD1 files are **not** supported — convert them to VHDX
> first:
The Guide style typically prefers explicit shell language tags (e.g., bash) over shell for code fences to improve highlighting consistency. Consider changing the fence to bash (and ensuring line-wrapping stays within the style guide’s preferred width).
```rust
let sbm_mapping = self.bat.get_sector_bitmap_mapping(chunk_number);
assert!(
    sbm_mapping.bat_state() == BatEntryState::FullyPresent,
    "SBM for chunk {chunk_number} must be allocated for PartiallyPresent block"
);
```
These assert! checks can panic when opening/reading a malformed or adversarial VHDX file, which violates the project’s trust-boundary requirement of not panicking on untrusted input. Prefer returning a typed error (e.g., a corruption/runtime I/O error) instead of panicking, and apply the same change to the identical assert! in set_sector_bitmap_bits().
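The non-panicking shape could look like the sketch below. The enum and error names are simplified stand-ins for the crate's real `BatEntryState` and corruption error types; only the assert-to-`Result` conversion is the point.

```rust
/// Simplified stand-in for the crate's BAT entry state.
#[derive(Debug, Clone, Copy, PartialEq)]
enum BatEntryState {
    NotPresent,
    FullyPresent,
}

/// Simplified stand-in for a typed corruption error.
#[derive(Debug, PartialEq)]
enum VhdxCorruption {
    MissingSectorBitmap { chunk_number: u32 },
}

/// Validate that the sector bitmap for a PartiallyPresent block is
/// allocated. A malformed or adversarial image can reach this state,
/// so return a typed error instead of asserting.
fn require_sector_bitmap(
    state: BatEntryState,
    chunk_number: u32,
) -> Result<(), VhdxCorruption> {
    if state != BatEntryState::FullyPresent {
        return Err(VhdxCorruption::MissingSectorBitmap { chunk_number });
    }
    Ok(())
}
```

The caller then propagates the error with `?`, surfacing the corruption to the I/O layer instead of crashing the process.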
```rust
/// caller's FSN after the in-progress flush finishes, a new flush is
/// issued.
///
/// Returns the FSN that was assigned to this flush request.
```
The docs say flush() returns “the FSN that was assigned”, but flush_until(..., None) returns the resolved/completed FSN (which can be higher than the caller’s assigned FSN under concurrency). Please update the doc comment (and/or method naming) to match the actual semantics, since callers may rely on the return value for ordering.
Suggested change:

```rust
/// Returns the completed FSN of the flush that satisfied this request.
/// Under concurrency this may be greater than the FSN originally assigned
/// to this caller, because a single completed flush can cover multiple
/// requests.
```
```rust
// Read from the VHDX file into a temporary buffer.
let mut data = vec![0u8; length as usize];
self.file
    .read_at(file_offset, &mut data)
    .await
    .map_err(DiskError::Io)?;
```
This creates an allocation per ReadRange::Data and then calls BlockingFile::read_at, which itself allocates internally (due to blocking::unblock needing 'static), resulting in avoidable double allocation/copy per range. Consider adding an internal BlockingFile API that returns an owned buffer (so the internal allocation can be reused instead of allocating here), or restructure the read path to minimize per-range allocations (e.g., reuse a scratch buffer for small reads).
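A possible shape for an owned-buffer API, sketched here with synchronous std file I/O in place of the real `BlockingFile`/`blocking::unblock` plumbing (the function name and signature are assumptions, and the sketch assumes a Unix target for `FileExt`):

```rust
use std::fs::File;
use std::io;
use std::os::unix::fs::FileExt; // sketch assumes a Unix target

/// Hypothetical owned-buffer read: the caller hands over a Vec sized
/// to the read, gets it back filled. In the real code the body would
/// run inside blocking::unblock; owning `buf` satisfies the closure's
/// 'static bound without a second allocation and copy.
fn read_at_owned(file: &File, offset: u64, mut buf: Vec<u8>) -> io::Result<Vec<u8>> {
    file.read_exact_at(&mut buf, offset)?;
    Ok(buf)
}
```

The caller can then recycle the returned `Vec` across ranges, so only the first read of a given size pays for an allocation.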
```rust
// Read data from the request buffers.
let mut data = vec![0u8; length as usize];
buffers
    .subrange(buf_offset, length as usize)
    .reader()
    .read(&mut data)?;

// Write to the VHDX file at the resolved offset.
self.file
    .write_at(file_offset, &data)
    .await
    .map_err(DiskError::Io)?;
```
Similar to reads, this allocates a fresh Vec for every WriteRange::Data. On top of that, BlockingFile::write_at() currently clones the provided slice into another Vec to move into blocking::unblock, doubling memory traffic. Consider adding a BlockingFile::write_at_owned(Vec<u8>) (or similar) so this path can move data into the blocking closure without an extra copy.
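A `write_at_owned` could mirror the read-side sketch, again shown with synchronous std I/O standing in for the blocking-pool plumbing (the name and signature are assumptions; the sketch assumes a Unix target):

```rust
use std::fs::File;
use std::io;
use std::os::unix::fs::FileExt; // sketch assumes a Unix target

/// Hypothetical BlockingFile::write_at_owned: taking the buffer by
/// value lets it move into the 'static blocking closure directly,
/// avoiding the extra Vec clone that a &[u8] signature forces.
fn write_at_owned(file: &File, offset: u64, data: Vec<u8>) -> io::Result<()> {
    // In the real code, `data` would move into blocking::unblock here.
    file.write_all_at(&data, offset)
}
```

Since the WriteRange path already builds an owned `Vec` from the request buffers, handing it off by value removes the second copy entirely.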
> - The crate currently provides a low-level I/O resolution API. A
>   higher-level `DiskIo` integration layer is needed to use it as a
>   drop-in storage backend in the OpenVMM pipeline.
> - Differencing disk chains require the caller to manage parent
>   resolution (the crate parses parent locators but does not open
>   parent files).
This section appears outdated relative to the PR: disklayer_vhdx provides an OpenVMM integration layer, and disklayer_vhdx::chain::open_vhdx_chain implements parent locator walking. Please update these bullets to reflect the new integration and clarify what remains the responsibility of the vhdx crate vs. the OpenVMM disk layer.
Suggested change:

> - The `vhdx` crate provides the low-level VHDX format
>   implementation and I/O resolution API. For OpenVMM integration,
>   `disklayer_vhdx` supplies the `DiskIo`-compatible backend layer
>   used in the storage pipeline.
> - For differencing disks, the `vhdx` crate parses parent locator
>   metadata, while the OpenVMM disk layer can walk and open parent
>   chains via `disklayer_vhdx::chain::open_vhdx_chain`.
> platforms via the pure-Rust [`vhdx`](../../reference/backends/vhdx.md)
> parser. Fixed VHD1 images also work on all platforms. Dynamic and
> differencing VHD1 files are **not** supported — convert them to VHDX
> first:
This claims OpenVMM uses the pure-Rust vhdx parser on all platforms, but the code changes in openvmm_helpers/src/disk.rs indicate Windows still uses the kernel-mode VHD path for .vhdx while Linux uses disklayer_vhdx. Please either adjust the docs to match the current platform behavior, or update the Windows open path to also use disklayer_vhdx so the statement is accurate.
Suggested change:

> platforms. On Linux, VHDX support uses the pure-Rust
> [`vhdx`](../../reference/backends/vhdx.md) parser via
> `disklayer_vhdx`. On Windows, `.vhdx` files currently use the native
> VHD open path instead. Fixed VHD1 images also work on all platforms.
> Dynamic and differencing VHD1 files are **not** supported — convert
> them to VHDX first: