openhcl_boot: handle multi numa private pool fallback #3312
chris-oo wants to merge 7 commits into microsoft:main
Conversation
Pull request overview
This PR updates OpenHCL boot-time private pool allocation to support multi-NUMA systems by falling back to allocating the VTL2 GPA pool across NUMA nodes when a single-node allocation (node 0) can’t satisfy the request. It also introduces a command-line option to force NUMA-split behavior and adds VMM tests to exercise the multi-range pool behavior across boot and servicing.
Changes:
- Add multi-NUMA private pool allocation logic with a “try node 0 first, else split” strategy.
- Extend persisted-state handling to accept multiple VTL2 GPA pool ranges (instead of a single range).
- Add new tests covering multi-NUMA pool split boot + servicing scenarios and add a new cmdline knob to force split mode.
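The split arithmetic behind the fallback can be sketched as follows. This is a standalone illustration of the "divide evenly, page-align, give the rounding remainder to the last node" math shown in the diff below, not the actual openhcl_boot code; `per_node_sizes` is a hypothetical helper name.

```rust
const HV_PAGE_SIZE: u64 = 4096;

// Standalone sketch of the per-node split math: divide the pool evenly
// across nodes, round each share down to a page boundary, and give the
// rounding remainder to the last node so the total is preserved.
fn per_node_sizes(pool_size_bytes: u64, num_nodes: u64) -> Vec<u64> {
    assert!(num_nodes > 0, "need at least one NUMA node");
    let per_node_size = (pool_size_bytes / num_nodes) & !(HV_PAGE_SIZE - 1);
    let last_node_size = pool_size_bytes - per_node_size * (num_nodes - 1);
    let mut sizes = vec![per_node_size; (num_nodes - 1) as usize];
    sizes.push(last_node_size);
    sizes
}

fn main() {
    // 10 pages across 3 nodes: 3 pages, 3 pages, then 4 pages (remainder).
    let sizes = per_node_sizes(10 * HV_PAGE_SIZE, 3);
    assert_eq!(sizes, vec![3 * HV_PAGE_SIZE, 3 * HV_PAGE_SIZE, 4 * HV_PAGE_SIZE]);
    // The split always sums back to the requested pool size.
    assert_eq!(sizes.iter().sum::<u64>(), 10 * HV_PAGE_SIZE);
}
```

Note that this math is exactly where the review comments below find edge cases: a zero `num_nodes` divides by zero, and a pool smaller than `num_nodes * HV_PAGE_SIZE` produces zero-sized per-node shares.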
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| `openhcl/openhcl_boot/src/host_params/dt/mod.rs` | Implements the multi-NUMA fallback allocation behavior for the VTL2 private pool and updates persisted-state pool-range handling. |
| `openhcl/openhcl_boot/src/memory.rs` | Updates the address space builder to accept multiple existing pool ranges and adds unit tests around multi-range pool reservation/allocation behavior. |
| `openhcl/openhcl_boot/src/cmdline.rs` | Adds parsing for `OPENHCL_VTL2_GPA_POOL_NUMA=split` to force NUMA-split pool allocation. |
| `vmm_tests/vmm_tests/tests/tests/x86_64/openhcl_uefi.rs` | Adds a boot test that forces NUMA split and validates the VM reaches VTL2-ready and supports inspect. |
| `vmm_tests/vmm_tests/tests/tests/multiarch/openhcl_servicing.rs` | Adds a servicing test that forces NUMA split and validates servicing works with split pool configuration. |
```rust
let num_nodes = numa_nodes.len() as u64;
let per_node_size = (pool_size_bytes / num_nodes) & !(HV_PAGE_SIZE - 1);
// Give the remainder (due to rounding) to the last node.
let last_node_size = pool_size_bytes - per_node_size * (num_nodes - 1);
let mut remaining = pool_size_bytes;

for (i, vnode) in numa_nodes.iter().enumerate() {
    if remaining == 0 {
        break;
    }

    let is_last = i == numa_nodes.len() - 1;
```
allocate_private_pool can hit edge cases where numa_nodes is empty (division by zero in pool_size_bytes / num_nodes) or where per_node_size becomes 0 after page-aligning (e.g., pool smaller than num_nodes * HV_PAGE_SIZE). In the latter case, intermediate nodes get alloc_size == 0, and AddressSpaceManager::allocate() returns None for len == 0, causing an unnecessary panic. Consider guarding num_nodes > 0 and distributing in pages (e.g., cap node count to min(nodes, pool_pages) and ensure every attempted allocation is at least one page), or skipping zero-sized allocations.
Suggested change:

```diff
-let num_nodes = numa_nodes.len() as u64;
-let per_node_size = (pool_size_bytes / num_nodes) & !(HV_PAGE_SIZE - 1);
-// Give the remainder (due to rounding) to the last node.
-let last_node_size = pool_size_bytes - per_node_size * (num_nodes - 1);
-let mut remaining = pool_size_bytes;
-for (i, vnode) in numa_nodes.iter().enumerate() {
-    if remaining == 0 {
-        break;
-    }
-    let is_last = i == numa_nodes.len() - 1;
+if numa_nodes.is_empty() {
+    panic!(
+        "failed to allocate VTL2 pool: no NUMA nodes found in VTL2 RAM \
+        (pool total {pool_size_bytes:#x} bytes, \
+        enable_vtl2_gpa_pool={enable_vtl2_gpa_pool:?}, \
+        device_dma_page_count={device_dma_page_count:#x?}, \
+        vp_count={vp_count}, mem_size={mem_size:#x})"
+    );
+}
+// Split only across enough nodes to keep each non-final allocation at
+// least one page. For pools smaller than a page, use a single node.
+let pool_pages = pool_size_bytes / HV_PAGE_SIZE;
+let split_nodes = if pool_pages == 0 {
+    1usize
+} else {
+    core::cmp::min(numa_nodes.len(), pool_pages as usize)
+};
+let num_nodes = split_nodes as u64;
+let per_node_size = if split_nodes == 1 {
+    0
+} else {
+    (pool_size_bytes / num_nodes) & !(HV_PAGE_SIZE - 1)
+};
+// Give the remainder (due to rounding) to the last node.
+let last_node_size = pool_size_bytes - per_node_size * (num_nodes - 1);
+let mut remaining = pool_size_bytes;
+for (i, vnode) in numa_nodes.iter().take(split_nodes).enumerate() {
+    if remaining == 0 {
+        break;
+    }
+    let is_last = i == split_nodes - 1;
```
```rust
} else if arg.starts_with(VTL2_GPA_POOL_NUMA) {
    if let Some((_, arg)) = arg.split_once('=') {
        match arg {
            "split" => self.vtl2_gpa_pool_numa_split = true,
            _ => log::warn!("Unknown value for OPENHCL_VTL2_GPA_POOL_NUMA: {arg}"),
        }
    }
}
```
New command-line parsing for `OPENHCL_VTL2_GPA_POOL_NUMA=split` isn’t covered by the existing unit tests in this module. Adding a small test case that asserts `vtl2_gpa_pool_numa_split` toggles to true for `split` (and stays false for unknown values) would help prevent regressions in this parsing behavior.
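A test along the lines the reviewer suggests could look like this sketch. `parse_pool_numa` is a standalone stand-in that mirrors the parsing logic from the snippet above; it is not the real cmdline module's API, which would parse into the options struct instead.

```rust
// Standalone sketch of the suggested parsing test. `parse_pool_numa` is a
// hypothetical helper mirroring the logic in the diff: the flag turns on
// only for the exact value "split".
fn parse_pool_numa(arg: &str) -> bool {
    const VTL2_GPA_POOL_NUMA: &str = "OPENHCL_VTL2_GPA_POOL_NUMA";
    let mut vtl2_gpa_pool_numa_split = false;
    if arg.starts_with(VTL2_GPA_POOL_NUMA) {
        if let Some((_, value)) = arg.split_once('=') {
            match value {
                "split" => vtl2_gpa_pool_numa_split = true,
                _ => {} // the real code logs a warning for unknown values
            }
        }
    }
    vtl2_gpa_pool_numa_split
}

fn main() {
    // "split" toggles the flag on; unknown or missing values leave it off.
    assert!(parse_pool_numa("OPENHCL_VTL2_GPA_POOL_NUMA=split"));
    assert!(!parse_pool_numa("OPENHCL_VTL2_GPA_POOL_NUMA=bogus"));
    assert!(!parse_pool_numa("OPENHCL_VTL2_GPA_POOL_NUMA"));
}
```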
This is blocked waiting on OpenVMM to at least report different NUMA nodes for memory so we can test this. Looking at that now.
Force-pushed 989ce01 to 34b59f2, then 34b59f2 to 89a29d5.
```rust
} else {
    per_node_size
};
```
allocate_private_pool can compute per_node_size == 0 when pool_size_bytes < num_nodes * HV_PAGE_SIZE (or when num_nodes is large). In that case the non-last iterations pass alloc_size = 0 to address_space.allocate(), which returns None for zero-length allocations and triggers the panic path even though the total pool could still be allocated (e.g., entirely on the last node). Consider guarding against this by ensuring each attempted allocation is at least one page (or skipping zero-sized per-node allocations / falling back to a single-node allocation when the pool is too small to split).
Suggested change:

```rust
if alloc_size == 0 {
    continue;
}
```
```diff
-// Choose the memory layout of the VM.
-let mem_layout = MemoryLayout::new(
-    cfg.memory.mem_size,
-    &cfg.memory.mmio_gaps,
-    &cfg.memory.pci_ecam_gaps,
-    &cfg.memory.pci_mmio_gaps,
-    vtl2_range,
-)
+//
+// When numa_mem_sizes is set, distribute guest RAM across vNUMA nodes
+// for ACPI SRAT / FDT reporting.
+//
+// TODO: The vNUMA nodes reported are meant for test usage only, as they
+// are not aligned to any physical NUMA node. There is more work to do
+// to support useful vNUMA reporting.
+let mem_layout = if let Some(ref sizes) = cfg.memory.numa_mem_sizes {
+    MemoryLayout::new_with_numa(
+        sizes,
+        &cfg.memory.mmio_gaps,
+        &cfg.memory.pci_ecam_gaps,
+        &cfg.memory.pci_mmio_gaps,
+        vtl2_range,
+    )
+} else {
+    MemoryLayout::new(
+        cfg.memory.mem_size,
+        &cfg.memory.mmio_gaps,
+        &cfg.memory.pci_ecam_gaps,
+        &cfg.memory.pci_mmio_gaps,
+        vtl2_range,
+    )
+}
```
When cfg.memory.numa_mem_sizes is set, MemoryLayout::new_with_numa() derives the total RAM size from the sum of sizes, but other parts of this function still use cfg.memory.mem_size (e.g., earlier VTL2 range selection for Vtl2BaseAddressType::MemoryLayout, and the mem_size log). If mem_size and sum(numa_mem_sizes) diverge (possible via non-CLI config sources), the VM can end up with an inconsistent memory configuration. It would be safer to validate sum(sizes) == cfg.memory.mem_size here (or treat mem_size as derived and ignore the caller-provided value) and return an error on mismatch.
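The validation suggested here could look something like the sketch below. The field names (`mem_size`, `numa_mem_sizes`) follow the snippet above; the free function and its `String` error are illustrative, not the real code's error handling.

```rust
// Sketch of the suggested consistency check: when per-node sizes are
// provided, their sum must equal the caller-provided total memory size.
fn validate_numa_sizes(mem_size: u64, numa_mem_sizes: Option<&[u64]>) -> Result<(), String> {
    if let Some(sizes) = numa_mem_sizes {
        let total: u64 = sizes.iter().sum();
        if total != mem_size {
            return Err(format!(
                "sum of numa_mem_sizes ({total:#x}) does not match mem_size ({mem_size:#x})"
            ));
        }
    }
    Ok(())
}

fn main() {
    // 2 GiB + 2 GiB matches a 4 GiB total; a mismatch is rejected.
    assert!(validate_numa_sizes(4 << 30, Some(&[2 << 30, 2 << 30])).is_ok());
    assert!(validate_numa_sizes(4 << 30, Some(&[2 << 30, 1 << 30])).is_err());
    // No per-node sizes means nothing to cross-check.
    assert!(validate_numa_sizes(4 << 30, None).is_ok());
}
```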
```rust
/// per-NUMA-node guest RAM sizes (comma-separated, e.g. "2G,2G").
/// Distributes memory across vNUMA nodes for ACPI SRAT / FDT reporting.
/// Mutually exclusive with --memory.
///
/// TODO: This is informational topology only — backing pages are not pinned
/// to host NUMA nodes. This should change once we implement real numa
/// support.
#[clap(long, value_name = "SIZES", value_parser = parse_numa_memory, conflicts_with = "memory")]
pub numa_memory: Option<Vec<u64>>,
```
The PR introduces a new user-facing `--numa-memory` CLI flag, but the CLI reference doc in `Guide/src/reference/openvmm/management/cli.md` currently documents `--memory` and does not mention `--numa-memory`. Please update the Guide (or explicitly note in the PR description that docs will be updated in a follow-up) so users can discover and correctly use the new option.
Force-pushed 89a29d5 to 47b3fbf, then 47b3fbf to 9521145.
```rust
/// Test servicing an OpenHCL VM with a multi-NUMA pool split.
/// The pool is split across NUMA nodes via `OPENHCL_VTL2_GPA_POOL_NUMA=split`,
/// and pool ranges must be preserved identically across the service boundary.
#[openvmm_test(openhcl_linux_direct_x64[LATEST_LINUX_DIRECT_TEST_X64])]
async fn servicing_numa_private_pool<T: PetriVmmBackend>(
    config: PetriVmBuilder<T>,
    (igvm_file,): (ResolvedArtifact<impl petri_artifacts_common::tags::IsOpenhclIgvm>,),
) -> anyhow::Result<()> {
    let mut flags = config.default_servicing_flags();
    flags.override_version_checks = true;
    openhcl_servicing_core(
        config
            .with_processor_topology(ProcessorTopology {
                vp_count: 4,
                vps_per_socket: Some(2),
                ..Default::default()
            })
            .with_memory(MemoryConfig {
                numa_mem_sizes: Some(vec![2 * 1024 * 1024 * 1024, 2 * 1024 * 1024 * 1024]),
                ..Default::default()
            })
            .with_openhcl_command_line(
                "OPENHCL_VTL2_GPA_POOL_NUMA=split OPENHCL_ENABLE_VTL2_GPA_POOL=512 OPENHCL_DISABLE_NVME_KEEP_ALIVE=0",
            ),
```
The doc comment says this test validates that pool ranges are preserved identically across the service boundary, but the test body only exercises openhcl_servicing_core() (which checks inspect serialization) and doesn’t assert anything about private_pool_ranges before/after servicing. Either add an explicit check of the pool ranges across a restart, or soften the comment to match what the test actually verifies (e.g., that servicing succeeds when the pool is split).
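A direct check of the doc comment's claim would capture the pool ranges on either side of the service boundary and compare them. Sketched generically below: `PoolRange`, `ranges_preserved`, and both vectors are placeholders for whatever the real test would read via inspect before and after the service/restore cycle, not petri APIs.

```rust
// Generic sketch of the "ranges preserved across servicing" assertion the
// review asks for. `PoolRange` stands in for the real persisted pool range
// type; the two vectors stand in for inspect reads before and after
// servicing.
#[derive(Debug, Clone, PartialEq, Eq)]
struct PoolRange {
    start: u64,
    end: u64,
    vnode: u32,
}

fn ranges_preserved(before: &[PoolRange], after: &[PoolRange]) -> bool {
    // Order matters: restored ranges must match exactly, not just as a set.
    before == after
}

fn main() {
    let before = vec![
        PoolRange { start: 0x1_0000_0000, end: 0x1_0010_0000, vnode: 0 },
        PoolRange { start: 0x2_0000_0000, end: 0x2_0010_0000, vnode: 1 },
    ];
    // After servicing, the restored ranges must be identical.
    let after = before.clone();
    assert!(ranges_preserved(&before, &after));
}
```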
On some systems, there is not enough RAM in NUMA node 0 to fulfill the private pool allocation request. Change the code to fall back to allocating the private pool across NUMA nodes when the old single-node behavior cannot be satisfied. This also opens up the possibility of adding NUMA awareness to dma_manager and drivers in the future.

Note that previous versions of openhcl_boot only support a single range for the private pool, but systems that would need multiple ranges shouldn't have been able to boot at all on those versions, so this change should be safe.