HV vsock: silent truncation of responses > ~8 KiB (partial libc::write) #256

## Symptom

Responses larger than ~8 KiB (the default `SO_SNDBUF` of a Unix socketpair on macOS) are silently truncated by the HV vsock device. The blocking transport on the host then waits for bytes that never arrive and times out.

Reproducer: ABX-362 E2E harness with a DAX fixture ≥ 16 KiB. Log pattern:

```
INFO  Vsock TX: op=5 src=3:1024 dst=2:50009 len=65536 (packet_data=65580 bytes)
INFO  Vsock TX: op=5 src=3:1024 dst=2:50009 len=18 (packet_data=62 bytes)
[5s later, ping_blocking 5s timeout fires]
```

or the variant:

```
WARN  Vsock: write to fd 7 failed: Resource temporarily unavailable (os error 35)
```
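Both log variants can be reproduced outside the VMM. The following is a minimal std-only sketch, not code from the repository: it fills a non-blocking Unix socketpair whose other end never reads. The helper name `fill_until_would_block` and the 8 MiB payload size are assumptions chosen for illustration. The first write is expected to accept only part of the payload, and a later write fails with `WouldBlock`, which is the same `EAGAIN` condition macOS reports as os error 35.

```rust
use std::io::{ErrorKind, Write};
use std::os::unix::net::UnixStream;

/// Writes `payload` until the kernel refuses more data.
/// Returns (bytes accepted by the first write, total bytes accepted).
fn fill_until_would_block(
    tx: &mut UnixStream,
    payload: &[u8],
) -> std::io::Result<(usize, usize)> {
    tx.set_nonblocking(true)?;
    let mut first = None;
    let mut total = 0;
    while total < payload.len() {
        match tx.write(&payload[total..]) {
            Ok(0) => break,
            Ok(n) => {
                if first.is_none() {
                    first = Some(n);
                }
                total += n;
            }
            // The condition behind the WARN line: the send buffer is full.
            Err(e) if e.kind() == ErrorKind::WouldBlock => break,
            Err(e) if e.kind() == ErrorKind::Interrupted => continue,
            Err(e) => return Err(e),
        }
    }
    Ok((first.unwrap_or(0), total))
}

fn main() -> std::io::Result<()> {
    // `_rx` stays open but never reads, like a host that is not draining.
    let (mut tx, _rx) = UnixStream::pair()?;
    let payload = vec![0xAB_u8; 8 * 1024 * 1024];
    let (first, total) = fill_until_would_block(&mut tx, &payload)?;
    println!(
        "first write took {first} of {} bytes; {total} accepted before WouldBlock",
        payload.len()
    );
    Ok(())
}
```

The exact byte counts depend on the platform's socket buffer defaults, so the sketch prints them rather than assuming a value.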

## Root cause

`virt/arcbox-virtio-vsock/src/device.rs:314-335`:

```rust
let written = unsafe {
    libc::write(fd, payload.as_ptr().cast(), payload.len())
};
if written > 0 {
    conns.advance_fwd_cnt(src_port, dst_port, written as u32);
} else if written < 0 {
    tracing::warn!("Vsock: write to fd {fd} failed: {}", ...);
}
```

Three problems:

1. **Partial writes are lost.** If `libc::write` returns `n < payload.len()` (socketpair buffer full), the remaining `payload.len() - n` bytes are discarded. `fwd_cnt` is advanced by only `n`, so the peer's credit accounting is also wrong.
2. **EAGAIN is a warning, not a retry.** When the socketpair send buffer is full, the write fails with `EAGAIN` and the whole packet is dropped.
3. **No flow control propagation.** When the host can't drain fast enough, the guest keeps sending because we ack `fwd_cnt` eagerly. The buffer fills up and packets are silently dropped.
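To make the credit problem in item 1 concrete, here is a small sketch of the virtio-vsock credit formula, `peer_free = peer_buf_alloc - (tx_cnt - peer_fwd_cnt)`. The function name `peer_free`, the 256 KiB window, and the "64 KiB sent, 8 KiB forwarded" pattern are illustrative assumptions, not values taken from the device code. It shows one way to read the accounting error: bytes that are dropped are never added to `fwd_cnt`, so the sender keeps counting them as in flight.

```rust
/// Free space the sender believes the receiver has, following the
/// virtio-vsock credit formula. The counters are free-running u32 values,
/// so the in-flight difference must use wrapping arithmetic.
fn peer_free(peer_buf_alloc: u32, tx_cnt: u32, peer_fwd_cnt: u32) -> u32 {
    let in_flight = tx_cnt.wrapping_sub(peer_fwd_cnt);
    peer_buf_alloc.saturating_sub(in_flight)
}

fn main() {
    let buf_alloc: u32 = 256 * 1024; // hypothetical window advertised to the guest
    let mut tx_cnt: u32 = 0; // guest side: total bytes sent
    let mut fwd_cnt: u32 = 0; // device side: total bytes reported as forwarded

    // Assumed pattern: each 64 KiB packet is only partially written (8 KiB),
    // and the remainder is dropped without ever being counted.
    for round in 1..=4 {
        tx_cnt = tx_cnt.wrapping_add(64 * 1024);
        fwd_cnt = fwd_cnt.wrapping_add(8 * 1024);
        println!(
            "round {round}: guest sees {} bytes of credit",
            peer_free(buf_alloc, tx_cnt, fwd_cnt)
        );
    }
}
```

Under these assumptions the credit the guest sees shrinks every round even though nothing is actually queued on the device side, because the dropped bytes are never acknowledged.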

## Fix options

**Option A — loop on partial writes + EAGAIN backoff** (preferred):

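```rust
let mut offset = 0;
while offset < payload.len() {
    let rest = &payload[offset..];
    let n = unsafe { libc::write(fd, rest.as_ptr().cast(), rest.len()) };
    if n > 0 {
        offset += n as usize;
        continue;
    }
    let err = std::io::Error::last_os_error(); // read errno right after the failed call
    match err.raw_os_error() {
        Some(libc::EINTR) => continue,
        Some(libc::EAGAIN) => {
            // poll for POLLOUT with a small deadline, or buffer payload[offset..] and retry next iter
        }
        _ => tracing::warn!("Vsock: write to fd {fd} failed: {err}"),
    }
    break; // stop on EAGAIN or a hard error; only bytes before `offset` were written
}
```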

**Option B — bump socketpair SO_SNDBUF / SO_RCVBUF** to, say, 1 MiB when creating the socketpair in `connect_vsock_hv`. Reduces but doesn't eliminate the issue.

Option A + bumping the buffer is the belt-and-suspenders fix.

## Scope guard

Must not block the vCPU thread for long; the vsock device runs on the BSP's run loop. Buffering unsent bytes in a queue inside the `VsockConnectionManager` (and draining on next TX handler invocation) is the correct approach.
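The buffering approach could look roughly like the sketch below. It is std-only and uses a `UnixStream` pair in place of the raw fd; `PendingTx`, `send`, `drain`, and `transfer` are hypothetical names, and how this state would attach to `VsockConnectionManager` is not shown because the source does not describe that type's layout. The point is that `drain` never sleeps: it writes until the socket reports `WouldBlock` and leaves the rest queued for the next invocation.

```rust
use std::collections::VecDeque;
use std::io::{self, ErrorKind, Read, Write};
use std::os::unix::net::UnixStream;

/// Hypothetical per-connection state: bytes accepted from the guest but not
/// yet written to the host socket.
struct PendingTx {
    queue: VecDeque<u8>,
}

impl PendingTx {
    fn new() -> Self {
        Self { queue: VecDeque::new() }
    }

    /// Queues `payload` behind any backlog (preserving order), then drains.
    /// Returns the number of bytes written to `sock` by this call.
    fn send(&mut self, sock: &mut UnixStream, payload: &[u8]) -> io::Result<usize> {
        self.queue.extend(payload);
        self.drain(sock)
    }

    /// Non-blocking drain: stops at the first WouldBlock instead of waiting.
    fn drain(&mut self, sock: &mut UnixStream) -> io::Result<usize> {
        let mut written = 0;
        while !self.queue.is_empty() {
            let (front, _) = self.queue.as_slices();
            match sock.write(front) {
                Ok(0) => break,
                Ok(n) => {
                    drop(self.queue.drain(..n)); // remove only what was written
                    written += n;
                }
                Err(e) if e.kind() == ErrorKind::WouldBlock => break,
                Err(e) if e.kind() == ErrorKind::Interrupted => continue,
                Err(e) => return Err(e),
            }
        }
        Ok(written)
    }
}

/// Pushes `payload` through a socketpair, draining the backlog between reads.
fn transfer(payload: &[u8]) -> io::Result<Vec<u8>> {
    let (mut device_end, mut host_end) = UnixStream::pair()?;
    device_end.set_nonblocking(true)?;
    host_end.set_nonblocking(true)?;

    let mut pending = PendingTx::new();
    pending.send(&mut device_end, payload)?;

    let mut received = Vec::with_capacity(payload.len());
    let mut buf = [0u8; 16 * 1024];
    while received.len() < payload.len() {
        match host_end.read(&mut buf) {
            Ok(n) => received.extend_from_slice(&buf[..n]),
            Err(e) if matches!(e.kind(), ErrorKind::WouldBlock | ErrorKind::Interrupted) => {}
            Err(e) => return Err(e),
        }
        // Stands in for "drain on the next TX handler invocation".
        pending.drain(&mut device_end)?;
    }
    Ok(received)
}

fn main() -> io::Result<()> {
    let payload: Vec<u8> = (0..1024 * 1024).map(|i| (i % 251) as u8).collect();
    let received = transfer(&payload)?;
    assert_eq!(received, payload);
    println!("round-tripped {} bytes", received.len());
    Ok(())
}
```

A real implementation would likely also cap the queue size and advance `fwd_cnt` only for bytes that `drain` actually wrote, so the guest's credit reflects what has left the device. Both are design assumptions rather than something the source specifies.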

## Files

* `virt/arcbox-virtio-vsock/src/device.rs:314` (write site)
* `virt/arcbox-virtio-vsock/src/manager.rs` (connection state — may need a per-conn TX buffer)
* `virt/arcbox-vmm/src/vmm/darwin_hv/mod.rs:1177` (socketpair creation — SO_SNDBUF bump)

## Acceptance

* Guest → host transfer of at least 1 MiB in a single `RpcResponse` round-trips correctly.
* No `"Resource temporarily unavailable"` warnings under sustained load.
* The ABX-362 E2E harness can use an 8 MiB DAX fixture (matches the original plan) and still pass.
