HV vsock: silent truncation of responses > ~8 KiB (partial libc::write) #256

@AprilNEA

Symptom

Responses larger than ~8 KiB (macOS default Unix socketpair SO_SNDBUF) are silently truncated by the HV vsock device. The blocking transport on the host then times out waiting for the missing bytes.

Reproducer: ABX-362 E2E harness with a DAX fixture ≥ 16 KiB. Log pattern:

INFO  Vsock TX: op=5 src=3:1024 dst=2:50009 len=65536 (packet_data=65580 bytes)
INFO  Vsock TX: op=5 src=3:1024 dst=2:50009 len=18 (packet_data=62 bytes)
[5s later, ping_blocking 5s timeout fires]

or the variant:

WARN  Vsock: write to fd 7 failed: Resource temporarily unavailable (os error 35)
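Both log variants can be reproduced outside the VMM. The following is a minimal sketch, assuming only the Rust standard library; the helper name `probe_undrained_socketpair` is hypothetical and not part of arcbox. It makes one non-blocking write into a socketpair whose peer never reads, then keeps writing until the kernel reports WouldBlock (EAGAIN).

```rust
use std::io::{self, ErrorKind, Write};
use std::os::unix::net::UnixStream;

/// Writes `len`-byte payloads into a socketpair whose peer never reads.
/// Returns (bytes accepted by the first write, bytes accepted in total
/// before the kernel refused further data).
fn probe_undrained_socketpair(len: usize) -> io::Result<(usize, usize)> {
    let (mut tx, _rx) = UnixStream::pair()?; // `_rx` stays open but is never read
    tx.set_nonblocking(true)?;
    let payload = vec![0xAB_u8; len];

    // First write: the kernel takes only what fits in the send buffer.
    let first = tx.write(&payload)?;

    // Further writes: accepted bytes shrink to nothing, then EAGAIN.
    let mut total = first;
    loop {
        match tx.write(&payload) {
            Ok(0) => break,
            Ok(n) => total += n,
            Err(e) if e.kind() == ErrorKind::WouldBlock => break,
            Err(e) => return Err(e),
        }
    }
    Ok((first, total))
}
```

With a payload well above the socket buffer size, the first write is expected to return fewer bytes than requested. The exact count depends on the platform's default buffer size, so no specific number is claimed here.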

Root cause

virt/arcbox-virtio-vsock/src/device.rs:314-335:

let written = unsafe {
    libc::write(fd, payload.as_ptr().cast(), payload.len())
};
if written > 0 {
    conns.advance_fwd_cnt(src_port, dst_port, written as u32);
} else if written < 0 {
    tracing::warn!("Vsock: write to fd {fd} failed: {}", ...);
}

Three problems:

  1. Partial writes are lost. If libc::write returns n < payload.len() (socketpair buffer full), the remaining payload.len() - n bytes are discarded. fwd_cnt is advanced by only n while the guest has already counted the full payload as sent, so the peer's credit accounting is wrong as well.
  2. EAGAIN is a warning, not a retry. When the socketpair send buffer is full, the write fails with EAGAIN and the whole packet is dropped.
  3. No flow control propagation. When the host can't drain fast enough, the guest keeps sending because we ack the fwd_cnt eagerly. Buffer fills up, packets silently drop.
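The credit side effect of problem 1 can be illustrated with the virtio-vsock credit formula, `peer_free = peer_buf_alloc - (tx_cnt - peer_fwd_cnt)`. The sketch below models the guest's view of the host. The type and function names are hypothetical, and the 256 KiB `buf_alloc` in the usage note is an assumed value, not one taken from arcbox.

```rust
/// Sender-side (guest) view of the peer's receive credit.
/// Counters are free-running u32 values that wrap, as in virtio-vsock.
#[derive(Debug, Clone, Copy)]
struct PeerCredit {
    peer_buf_alloc: u32, // receive buffer size advertised by the peer
    peer_fwd_cnt: u32,   // bytes the peer reports as forwarded/consumed
    tx_cnt: u32,         // bytes this side has sent so far
}

impl PeerCredit {
    /// Bytes the sender may still transmit without overrunning the peer.
    fn free(&self) -> u32 {
        let in_flight = self.tx_cnt.wrapping_sub(self.peer_fwd_cnt);
        self.peer_buf_alloc.saturating_sub(in_flight)
    }
}

/// Guest sends `sent` bytes; the host forwards only `forwarded` of them,
/// drops the rest, and advances fwd_cnt by `forwarded`.
fn credit_after_lossy_forward(buf_alloc: u32, sent: u32, forwarded: u32) -> u32 {
    let mut credit = PeerCredit { peer_buf_alloc: buf_alloc, peer_fwd_cnt: 0, tx_cnt: 0 };
    credit.tx_cnt = credit.tx_cnt.wrapping_add(sent);
    credit.peer_fwd_cnt = credit.peer_fwd_cnt.wrapping_add(forwarded);
    credit.free()
}
```

Under these assumptions, `credit_after_lossy_forward(262_144, 65_536, 8_192)` leaves 204,800 bytes of credit instead of the full 262,144: the 57,344 dropped bytes stay counted as in flight, because nothing will ever advance fwd_cnt for them.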

Fix options

Option A — loop on partial writes + EAGAIN backoff (preferred):

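let mut offset = 0;
while offset < payload.len() {
    let rest = &payload[offset..];
    // SAFETY: `rest` stays valid and initialized for the duration of the call.
    let n = unsafe { libc::write(fd, rest.as_ptr().cast(), rest.len()) };
    if n > 0 {
        offset += n as usize; // partial write: continue from the new offset
        continue;
    }
    // errno is only meaningful when write() returned -1.
    let errno = if n < 0 { std::io::Error::last_os_error().raw_os_error() } else { None };
    match errno {
        Some(libc::EINTR) => continue,
        Some(libc::EAGAIN) => {
            // poll for POLLOUT with a small deadline, or buffer payload[offset..] and retry next iter
            break; // placeholder: stop here rather than busy-spin
        }
        _ => { /* log and break */ break; }
    }
}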

Option B — bump socketpair SO_SNDBUF / SO_RCVBUF to, say, 1 MiB when creating the socketpair in connect_vsock_hv. Reduces but doesn't eliminate the issue.
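A rough sketch of Option B follows. It declares `setsockopt` directly so the example needs no crates; in the arcbox tree the `libc` crate would presumably be used instead. The function names are hypothetical, and the option constants are the macOS values and the common Linux (x86_64/aarch64) values from the system headers.

```rust
use std::io;
use std::mem::size_of;
use std::os::raw::{c_int, c_void};
use std::os::unix::io::AsRawFd;
use std::os::unix::net::UnixStream;

// Socket option constants differ per platform.
#[cfg(any(target_os = "macos", target_os = "ios"))]
mod sockopt {
    pub const SOL_SOCKET: i32 = 0xffff;
    pub const SO_SNDBUF: i32 = 0x1001;
    pub const SO_RCVBUF: i32 = 0x1002;
}
#[cfg(any(target_os = "linux", target_os = "android"))]
mod sockopt {
    pub const SOL_SOCKET: i32 = 1;
    pub const SO_SNDBUF: i32 = 7;
    pub const SO_RCVBUF: i32 = 8;
}

unsafe extern "C" {
    fn setsockopt(fd: c_int, level: c_int, name: c_int, val: *const c_void, len: u32) -> c_int;
}

/// Sets one integer-valued SOL_SOCKET option on `fd`.
fn set_buf(fd: c_int, name: c_int, bytes: c_int) -> io::Result<()> {
    // SAFETY: `bytes` outlives the call and `len` matches its size.
    let rc = unsafe {
        setsockopt(
            fd,
            sockopt::SOL_SOCKET,
            name,
            (&bytes as *const c_int).cast(),
            size_of::<c_int>() as u32,
        )
    };
    if rc == 0 { Ok(()) } else { Err(io::Error::last_os_error()) }
}

/// Requests `bytes` of send and receive buffer on both ends of a socketpair.
/// The kernel may clamp the request to its configured maximum.
fn bump_socketpair_buffers(a: &UnixStream, b: &UnixStream, bytes: c_int) -> io::Result<()> {
    for s in [a, b] {
        set_buf(s.as_raw_fd(), sockopt::SO_SNDBUF, bytes)?;
        set_buf(s.as_raw_fd(), sockopt::SO_RCVBUF, bytes)?;
    }
    Ok(())
}
```

A successful call only means the request was accepted. The effective size can be lower than 1 MiB if the system limit is smaller, which is one more reason Option B alone does not close the issue.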

Option A + bumping the buffer is the belt-and-suspenders fix.

Scope guard

The fix must not block the vCPU thread for long: the vsock device runs on the BSP's run loop. Buffering unsent bytes in a queue inside the VsockConnectionManager, and draining that queue on the next TX handler invocation, is the correct approach.
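The buffering approach could look roughly like the sketch below. `PendingTx` and `ThrottledSink` are hypothetical names, not existing arcbox types. The queue is written against `std::io::Write` so it can be exercised without a real socket, and `ThrottledSink` is a test double that mimics a non-blocking socket with a limited send buffer.

```rust
use std::collections::VecDeque;
use std::io::{self, ErrorKind, Write};

/// Hypothetical per-connection queue of bytes the host socket has not yet accepted.
#[derive(Default)]
struct PendingTx {
    queue: VecDeque<u8>,
}

impl PendingTx {
    /// Queues `payload` behind any earlier backlog (preserving order), then flushes.
    fn push_and_drain<W: Write>(&mut self, sink: &mut W, payload: &[u8]) -> io::Result<usize> {
        self.queue.extend(payload);
        self.drain(sink)
    }

    /// Writes as much backlog as the sink accepts without blocking.
    /// Returns the number of bytes accepted during this call.
    fn drain<W: Write>(&mut self, sink: &mut W) -> io::Result<usize> {
        let mut accepted = 0;
        while !self.queue.is_empty() {
            let (front, _) = self.queue.as_slices();
            match sink.write(front) {
                Ok(0) => break,
                Ok(n) => {
                    self.queue.drain(..n);
                    accepted += n;
                }
                // Sink is full: keep the backlog and return to the run loop.
                Err(e) if e.kind() == ErrorKind::WouldBlock => break,
                Err(e) if e.kind() == ErrorKind::Interrupted => continue,
                Err(e) => return Err(e),
            }
        }
        Ok(accepted)
    }

    fn backlog(&self) -> usize {
        self.queue.len()
    }
}

/// Test double: accepts at most `budget` bytes, then reports WouldBlock.
struct ThrottledSink {
    budget: usize,
    received: Vec<u8>,
}

impl Write for ThrottledSink {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        if self.budget == 0 {
            return Err(ErrorKind::WouldBlock.into());
        }
        let n = buf.len().min(self.budget);
        self.received.extend_from_slice(&buf[..n]);
        self.budget -= n;
        Ok(n)
    }

    fn flush(&mut self) -> io::Result<()> {
        Ok(())
    }
}
```

In this shape the TX handler would advance fwd_cnt by the value returned from `push_and_drain` or `drain`, so credit is only released for bytes the host socket actually took. A real implementation would likely also cap the backlog, for example at the advertised buf_alloc, so a stalled host cannot grow the queue without bound.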

Files

  • virt/arcbox-virtio-vsock/src/device.rs:314 (write site)
  • virt/arcbox-virtio-vsock/src/manager.rs (connection state — may need a per-conn TX buffer)
  • virt/arcbox-vmm/src/vmm/darwin_hv/mod.rs:1177 (socketpair creation — SO_SNDBUF bump)

Acceptance

  • Guest → host transfer of at least 1 MiB in a single RpcResponse round-trips correctly.
  • No "Resource temporarily unavailable" warnings under sustained load.
  • The ABX-362 E2E harness can use an 8 MiB DAX fixture (matches the original plan) and still pass.
