Note: This issue was created by a scheduled automated Claude code-review check. It was found by randomly selecting a component (mavlink2-bridge) and reviewing it for correctness issues. Please treat the severity assessment below as a starting point for a human triage, not a confirmed defect. This is distinct from the earlier automated review #2023 (which covered the missing round-trip tests for the write-path impls and explicitly ruled out the sequence/field-width concerns); this issue is about a concrete decode-path behavior on multi-row input batches.
Summary
The Arrow → MAVLink decode path in libraries/extensions/mavlink2-bridge/src/arrow_convert.rs reads only row 0 of each incoming RecordBatch and never checks the row count. A dora producer that publishes an N-row batch (N > 1) on a command input (e.g. command_long_cmd, set_position_target_local_ned_cmd) will have rows 1..N-1 silently discarded — only the first command is encoded and sent to the vehicle. There is no error, warning, or trace.
Because this is the dora → vehicle path (the side that arms motors and commands position/velocity), silently dropping all-but-the-first command is a safety-relevant silent data loss, not just a cosmetic gap.
Affected code
read_primitive (the single chokepoint every read_* helper funnels through) guards against zero rows but then unconditionally returns value(0):
// libraries/extensions/mavlink2-bridge/src/arrow_convert.rs:86-96
if arr.is_empty() {
return Err(BridgeError::Mavlink(format!(
"column '{name}' has zero rows; expected ≥1 row"
)));
}
if arr.is_null(0) {
return Err(BridgeError::Mavlink(format!(
"column '{name}' is null at row 0"
)));
}
Ok(arr.value(0)) // ← rows 1..N-1 are never read
Note the error message even says "expected ≥1 row" — i.e. the contract was written as "at least one row", but every from_record_batch impl only ever consumes row 0, so the effective contract is "exactly one row" and the surplus is dropped instead of rejected.
The dispatch path has no row-count guard either:
// binaries/mavlink2-bridge-node/src/main.rs:322-331
fn decode_input<T: MavlinkArrow>(data: &ArrayRef, input: &str) -> Result<T> {
let struct_array = data.as_struct_opt().ok_or_else(|| { /* ... */ })?;
let batch = RecordBatch::from(struct_array); // N rows preserved here…
T::from_record_batch(&batch) // …but only row 0 is decoded
.with_context(|| format!("decoding {input} from incoming Arrow"))
}
A StructArray of length N round-trips into an N-row RecordBatch, so the row count is available at decode time — it is simply ignored.
Why this is a correctness/safety risk
The module's own doc header states the convention is "One row per call." (arrow_convert.rs:4-5), and the negative-test file notes these inputs are public — "any node in the dataflow can publish to them" (tests/arrow_malformed.rs). But "one row per call" is enforced nowhere:
- A Rust/Python node that batches several setpoints/commands into a single Arrow message (a natural thing to do — e.g. a planner emitting a short trajectory of
SET_POSITION_TARGET_LOCAL_NED rows) will have only the first row flown, with the rest silently dropped.
- The failure is invisible: no
Err, no log. The operator believes N commands were sent; one was.
This is the opposite of the existing, deliberate hardening on this path: read_primitive was specifically written to turn malformed input into BridgeError rather than a panic (see the comment block at arrow_convert.rs:57-70). Multi-row input is just another malformed-shape case that should be handled explicitly rather than truncated.
Verification
grep -rn "num_rows\|len() != 1" libraries/extensions/mavlink2-bridge/src binaries/mavlink2-bridge-node/src → no row-count guard anywhere.
tests/arrow_malformed.rs covers wrong-column-type, null-at-row-0, and zero-row cases for COMMAND_LONG_DATA, but has no multi-row case — so this behavior is untested.
- The encode side (
to_record_batch) always emits exactly one row, so telemetry (vehicle → dora) is unaffected; the bug is purely on inbound commands.
Reproducer sketch
// Build a 2-row COMMAND_LONG batch (two distinct commands) and decode it.
let batch = /* RecordBatch with the 11 COMMAND_LONG columns, each length 2,
row0 = MAV_CMD_NAV_TAKEOFF, row1 = MAV_CMD_NAV_LAND */;
let decoded = COMMAND_LONG_DATA::from_record_batch(&batch).unwrap();
assert_eq!(decoded.command as u32, 22); // row 0 (TAKEOFF) — row 1 (LAND) silently lost
Suggested fix
Reject (or, if intentional, explicitly iterate) multi-row batches at the decode boundary. The minimal, lowest-risk fix is a single guard in decode_input (one place, covers all six command inputs):
let batch = RecordBatch::from(struct_array);
if batch.num_rows() != 1 {
bail!(
"input '{input}' expects exactly one row per message, got {} \
(multi-row command batches are not supported; send one command per message)",
batch.num_rows()
);
}
Alternatively, enforce it inside read_primitive (change the is_empty() check to arr.len() != 1) so the library contract is self-enforcing regardless of caller, and add a rejects_multi_row case to tests/arrow_malformed.rs. If batched commands are a desired feature, the proper fix is to loop over all rows in handle_input and send one MAVLink frame per row — but that is a larger change and should be a deliberate decision.
Things I checked and ruled out (no action needed)
Other observation (lower priority, not the focus of this issue)
Inbound telemetry forwarded via the reader thread can sit in the flume channel for up to POLL_INTERVAL (100 ms) before main drains it, because main only drains the channel at the top of each loop iteration and then blocks in events.recv_timeout(POLL_INTERVAL) with no wake from the reader (main.rs:453-481). This adds up to ~100 ms of latency/jitter to telemetry outputs. Flagging for awareness; a human may judge this acceptable for v1 or worth a separate tracking issue.
Summary
The Arrow → MAVLink decode path in
libraries/extensions/mavlink2-bridge/src/arrow_convert.rsreads only row 0 of each incomingRecordBatchand never checks the row count. A dora producer that publishes an N-row batch (N > 1) on a command input (e.g.command_long_cmd,set_position_target_local_ned_cmd) will have rows 1..N-1 silently discarded — only the first command is encoded and sent to the vehicle. There is no error, warning, or trace.Because this is the dora → vehicle path (the side that arms motors and commands position/velocity), silently dropping all-but-the-first command is a safety-relevant silent data loss, not just a cosmetic gap.
Affected code
read_primitive(the single chokepoint everyread_*helper funnels through) guards against zero rows but then unconditionally returnsvalue(0):Note the error message even says "expected ≥1 row" — i.e. the contract was written as "at least one row", but every
from_record_batchimpl only ever consumes row 0, so the effective contract is "exactly one row" and the surplus is dropped instead of rejected.The dispatch path has no row-count guard either:
A
StructArrayof length N round-trips into an N-rowRecordBatch, so the row count is available at decode time — it is simply ignored.Why this is a correctness/safety risk
The module's own doc header states the convention is "One row per call." (
arrow_convert.rs:4-5), and the negative-test file notes these inputs are public — "any node in the dataflow can publish to them" (tests/arrow_malformed.rs). But "one row per call" is enforced nowhere:SET_POSITION_TARGET_LOCAL_NEDrows) will have only the first row flown, with the rest silently dropped.Err, no log. The operator believes N commands were sent; one was.This is the opposite of the existing, deliberate hardening on this path:
read_primitivewas specifically written to turn malformed input intoBridgeErrorrather than a panic (see the comment block atarrow_convert.rs:57-70). Multi-row input is just another malformed-shape case that should be handled explicitly rather than truncated.Verification
grep -rn "num_rows\|len() != 1" libraries/extensions/mavlink2-bridge/src binaries/mavlink2-bridge-node/src→ no row-count guard anywhere.tests/arrow_malformed.rscovers wrong-column-type, null-at-row-0, and zero-row cases forCOMMAND_LONG_DATA, but has no multi-row case — so this behavior is untested.to_record_batch) always emits exactly one row, so telemetry (vehicle → dora) is unaffected; the bug is purely on inbound commands.Reproducer sketch
Suggested fix
Reject (or, if intentional, explicitly iterate) multi-row batches at the decode boundary. The minimal, lowest-risk fix is a single guard in
decode_input(one place, covers all six command inputs):Alternatively, enforce it inside
read_primitive(change theis_empty()check toarr.len() != 1) so the library contract is self-enforcing regardless of caller, and add arejects_multi_rowcase totests/arrow_malformed.rs. If batched commands are a desired feature, the proper fix is to loop over all rows inhandle_inputand send one MAVLink frame per row — but that is a larger change and should be a deliberate decision.Things I checked and ruled out (no action needed)
MavHeader { sequence: 0 }inmain.rs): not a bug — independently confirmed against published mavlink-core 0.13.1 source that the concrete TCPsend()overrides the header sequence with an internalwrapping_add(1)counter (consistent with the note in mavlink2-bridge: write-path (dora→vehicle) Arrow conversions have no round-trip test coverage; silent same-type field-reorder regressions can ship undetected #2023).SERVO_OUTPUT_RAW.time_usec=UInt32): safe by construction, as already noted in mavlink2-bridge: write-path (dora→vehicle) Arrow conversions have no round-trip test coverage; silent same-type field-reorder regressions can ship undetected #2023.Other observation (lower priority, not the focus of this issue)
Inbound telemetry forwarded via the reader thread can sit in the
flumechannel for up toPOLL_INTERVAL(100 ms) beforemaindrains it, becausemainonly drains the channel at the top of each loop iteration and then blocks inevents.recv_timeout(POLL_INTERVAL)with no wake from the reader (main.rs:453-481). This adds up to ~100 ms of latency/jitter to telemetry outputs. Flagging for awareness; a human may judge this acceptable for v1 or worth a separate tracking issue.