Symptom
Responses larger than ~8 KiB (macOS default Unix socketpair SO_SNDBUF) are silently truncated by the HV vsock device. The blocking transport on the host then times out waiting for the missing bytes.
Reproducer: ABX-362 E2E harness with a DAX fixture ≥ 16 KiB. Log pattern:
INFO Vsock TX: op=5 src=3:1024 dst=2:50009 len=65536 (packet_data=65580 bytes)
INFO Vsock TX: op=5 src=3:1024 dst=2:50009 len=18 (packet_data=62 bytes)
[5s later, ping_blocking 5s timeout fires]
or the variant:
WARN Vsock: write to fd 7 failed: Resource temporarily unavailable (os error 35)
Root cause
virt/arcbox-virtio-vsock/src/device.rs:314-335:
let written = unsafe {
    libc::write(fd, payload.as_ptr().cast(), payload.len())
};
if written > 0 {
    conns.advance_fwd_cnt(src_port, dst_port, written as u32);
} else if written < 0 {
    tracing::warn!("Vsock: write to fd {fd} failed: {}", ...);
}
Three problems:
- Partial writes are lost. If libc::write returns n < payload.len() (socketpair buffer full), the remaining bytes are discarded. The fwd_cnt is advanced by n, so the peer's credit accounting is also wrong.
- EAGAIN is a warning, not a retry. When the socketpair send buffer is full, the write fails with EAGAIN and the whole packet is dropped.
- No flow control propagation. When the host can't drain fast enough, the guest keeps sending because we ack the fwd_cnt eagerly. Buffer fills up, packets silently drop.
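The first two failure modes can be reproduced outside the VMM. The following is a minimal sketch, assuming a non-blocking Unix socketpair whose peer never reads. It uses std's UnixStream in place of the raw libc::write call, and write_once and WriteOutcome are hypothetical names that do not exist in the repository.

```rust
use std::io::{ErrorKind, Write};
use std::os::unix::net::UnixStream;

/// Outcome of exactly one non-blocking write attempt.
#[derive(Debug, PartialEq)]
enum WriteOutcome {
    /// The kernel accepted the whole payload.
    Complete,
    /// The kernel accepted only a prefix; `remaining` bytes still need sending.
    Partial { written: usize, remaining: usize },
    /// The send buffer was full (EAGAIN); nothing was accepted.
    WouldBlock,
}

/// Issues a single write, like the write site in device.rs, but reports
/// short writes and EAGAIN to the caller instead of ignoring them.
fn write_once(sock: &mut UnixStream, payload: &[u8]) -> std::io::Result<WriteOutcome> {
    match sock.write(payload) {
        Ok(n) if n == payload.len() => Ok(WriteOutcome::Complete),
        Ok(n) => Ok(WriteOutcome::Partial {
            written: n,
            remaining: payload.len() - n,
        }),
        Err(e) if e.kind() == ErrorKind::WouldBlock => Ok(WriteOutcome::WouldBlock),
        Err(e) => Err(e),
    }
}

fn main() -> std::io::Result<()> {
    // `_host_end` stays alive but is never read, so the send buffer fills up.
    let (mut device_end, _host_end) = UnixStream::pair()?;
    device_end.set_nonblocking(true)?;

    // Far larger than any default socketpair buffer.
    let payload = vec![0xABu8; 8 * 1024 * 1024];

    // Each attempt restarts from the beginning of the payload; the point is
    // only to show which outcomes a single write call can produce.
    for attempt in 1..=3 {
        let outcome = write_once(&mut device_end, &payload)?;
        println!("attempt {attempt}: {outcome:?}");
    }
    Ok(())
}
```

Any caller that treats a Partial or WouldBlock outcome as done loses the unsent tail, which matches the truncation described under Symptom.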
Fix options
Option A — loop on partial writes + EAGAIN backoff (preferred):
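let mut offset = 0;
while offset < payload.len() {
    let rest = &payload[offset..];
    // SAFETY: `rest` points at `rest.len()` initialized bytes for the whole call.
    let n = unsafe { libc::write(fd, rest.as_ptr().cast(), rest.len()) };
    if n > 0 {
        offset += n as usize;
        continue;
    }
    // errno is only meaningful after a failed write (n < 0).
    match std::io::Error::last_os_error().raw_os_error() {
        Some(libc::EINTR) if n < 0 => continue,
        Some(libc::EAGAIN) if n < 0 => {
            // poll for POLLOUT with a small deadline, or buffer payload[offset..] and retry next iter
            break; // never spin here: this runs on the vCPU thread
        }
        _ => { /* log and break */ break; }
    }
}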
Option B — bump socketpair SO_SNDBUF / SO_RCVBUF to, say, 1 MiB when creating the socketpair in connect_vsock_hv. Reduces but doesn't eliminate the issue.
Option A + bumping the buffer is the belt-and-suspenders fix.
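Option B could look roughly like the sketch below. It is an illustration under assumptions, not the code in connect_vsock_hv: bump_socketpair_buffers, set_buf and get_buf are hypothetical helpers, and setsockopt/getsockopt are declared by hand only so the sketch builds without dependencies (the libc crate already used in device.rs exposes the same functions and constants). The non-macOS constants are the Linux values for x86 and ARM, included only so the sketch runs elsewhere.

```rust
use std::ffi::{c_int, c_void};
use std::io;
use std::mem::size_of;
use std::os::fd::AsRawFd;
use std::os::unix::net::UnixStream;

// Hand-written declarations of the C library calls (assumption: in the real
// code these would come from the `libc` crate instead).
unsafe extern "C" {
    fn setsockopt(fd: c_int, level: c_int, name: c_int, value: *const c_void, len: u32) -> c_int;
    fn getsockopt(fd: c_int, level: c_int, name: c_int, value: *mut c_void, len: *mut u32) -> c_int;
}

// (SOL_SOCKET, SO_SNDBUF, SO_RCVBUF)
#[cfg(target_os = "macos")]
const SOCK_OPTS: (c_int, c_int, c_int) = (0xffff, 0x1001, 0x1002);
#[cfg(not(target_os = "macos"))]
const SOCK_OPTS: (c_int, c_int, c_int) = (1, 7, 8);

const SOL_SOCKET: c_int = SOCK_OPTS.0;
const SO_SNDBUF: c_int = SOCK_OPTS.1;
const SO_RCVBUF: c_int = SOCK_OPTS.2;

/// Requests a new size for one socket buffer option.
fn set_buf(fd: c_int, name: c_int, bytes: c_int) -> io::Result<()> {
    // SAFETY: `bytes` is a live c_int and the length passed matches its size.
    let rc = unsafe {
        setsockopt(fd, SOL_SOCKET, name, (&bytes as *const c_int).cast(), size_of::<c_int>() as u32)
    };
    if rc == 0 { Ok(()) } else { Err(io::Error::last_os_error()) }
}

/// Reads back the size the kernel actually applied.
fn get_buf(fd: c_int, name: c_int) -> io::Result<c_int> {
    let mut value: c_int = 0;
    let mut len = size_of::<c_int>() as u32;
    // SAFETY: `value` and `len` are live and `len` holds the size of `value`.
    let rc = unsafe { getsockopt(fd, SOL_SOCKET, name, (&mut value as *mut c_int).cast(), &mut len) };
    if rc == 0 { Ok(value) } else { Err(io::Error::last_os_error()) }
}

/// Raises SO_SNDBUF and SO_RCVBUF on both ends of a socketpair.
fn bump_socketpair_buffers(a: &UnixStream, b: &UnixStream, bytes: c_int) -> io::Result<()> {
    for sock in [a, b] {
        set_buf(sock.as_raw_fd(), SO_SNDBUF, bytes)?;
        set_buf(sock.as_raw_fd(), SO_RCVBUF, bytes)?;
    }
    Ok(())
}

fn main() -> io::Result<()> {
    let (host_end, device_end) = UnixStream::pair()?;
    bump_socketpair_buffers(&host_end, &device_end, 1 << 20)?;
    // The kernel may clamp or round the request, so read the value back.
    println!("SO_SNDBUF is now {} bytes", get_buf(device_end.as_raw_fd(), SO_SNDBUF)?);
    Ok(())
}
```

Reading the value back matters because the kernel is free to adjust the requested size, and a larger buffer only raises the threshold at which short writes and EAGAIN appear.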
Scope guard
Must not block the vCPU thread for long; the vsock device runs on the BSP's run loop. Buffering unsent bytes in a queue inside the VsockConnectionManager (and draining on next TX handler invocation) is the correct approach.
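The queue-and-drain approach could be sketched as follows. This is a minimal illustration under assumptions: PendingTx is a hypothetical type, not the existing VsockConnectionManager API, and std's UnixStream stands in for the raw fd. The drain method never blocks, so it is safe to call from the TX handler on every invocation.

```rust
use std::collections::VecDeque;
use std::io::{self, ErrorKind, Read, Write};
use std::os::unix::net::UnixStream;

/// Hypothetical per-connection buffer for bytes the host socket has not accepted yet.
struct PendingTx {
    queue: VecDeque<u8>,
}

impl PendingTx {
    fn new() -> Self {
        Self { queue: VecDeque::new() }
    }

    /// Queues `payload` behind anything already pending (preserving order),
    /// then writes as much as the sink accepts without blocking.
    fn send<W: Write>(&mut self, sink: &mut W, payload: &[u8]) -> io::Result<usize> {
        self.queue.extend(payload);
        self.drain(sink)
    }

    /// Writes queued bytes until the queue is empty or the sink reports
    /// WouldBlock. Returns how many bytes the sink accepted in this call.
    fn drain<W: Write>(&mut self, sink: &mut W) -> io::Result<usize> {
        let mut flushed = 0;
        while !self.queue.is_empty() {
            // The front slice is non-empty whenever the queue is non-empty.
            let (front, _) = self.queue.as_slices();
            match sink.write(front) {
                Ok(0) => return Err(ErrorKind::WriteZero.into()),
                Ok(n) => {
                    self.queue.drain(..n);
                    flushed += n;
                }
                Err(e) if e.kind() == ErrorKind::WouldBlock => break,
                Err(e) if e.kind() == ErrorKind::Interrupted => continue,
                Err(e) => return Err(e),
            }
        }
        Ok(flushed)
    }

    /// Bytes still waiting for the next drain.
    fn pending(&self) -> usize {
        self.queue.len()
    }
}

fn main() -> io::Result<()> {
    let (mut device_end, mut host_end) = UnixStream::pair()?;
    device_end.set_nonblocking(true)?;
    host_end.set_nonblocking(true)?;

    // 1 MiB with a recognisable pattern, mirroring the acceptance criterion.
    let payload: Vec<u8> = (0..(1usize << 20)).map(|i| (i % 251) as u8).collect();
    let mut tx = PendingTx::new();
    let mut received = Vec::with_capacity(payload.len());
    let mut chunk = [0u8; 64 * 1024];

    tx.send(&mut device_end, &payload)?;
    while received.len() < payload.len() {
        // Host side reads what is available; the device side then drains again.
        match host_end.read(&mut chunk) {
            Ok(0) => break,
            Ok(n) => received.extend_from_slice(&chunk[..n]),
            Err(e) if e.kind() == ErrorKind::WouldBlock => {}
            Err(e) => return Err(e),
        }
        tx.drain(&mut device_end)?;
    }
    assert_eq!(received, payload);
    println!("delivered {} bytes, {} still pending", received.len(), tx.pending());
    Ok(())
}
```

The count returned by send and drain is the number of bytes the socket actually accepted. Presumably that is the figure to feed into advance_fwd_cnt, so the guest is only credited for delivered bytes, but that wiring is an assumption about the manager rather than something this sketch verifies.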
Files
virt/arcbox-virtio-vsock/src/device.rs:314 (write site)
virt/arcbox-virtio-vsock/src/manager.rs (connection state — may need a per-conn TX buffer)
virt/arcbox-vmm/src/vmm/darwin_hv/mod.rs:1177 (socketpair creation — SO_SNDBUF bump)
Acceptance
- Guest → host transfer of at least 1 MiB in a single RpcResponse round-trips correctly.
- No "Resource temporarily unavailable" warnings under sustained load.
- The ABX-362 E2E harness can use an 8 MiB DAX fixture (matches the original plan) and still pass.