[DO NOT MERGE] Diagnostic test for QEMU s390x netlink issue #506

Closed
ESoapW wants to merge 2 commits into main from netdiag-qemu-test

@ESoapW ESoapW commented Apr 2, 2026

DO NOT MERGE - diagnostic only

Adding a test package (netdiag/) to figure out exactly which network interface detection methods break under QEMU s390x emulation. We've been seeing parsenetlinkrouteattr: invalid argument on COPR s390x builds since COPR lost its native s390x builders around Mar 13.

This tests netlink, sysfs, procfs, and raw socket paths independently and prints a summary table. Expecting netlink methods to fail while sysfs/procfs alternatives work fine, which would confirm the issue is QEMU's incomplete byte-order translation of netlink rtattr structs.
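The netdiag/ source is not shown in this conversation, so the following is only a minimal sketch of the idea of probing each path independently. The function name `probe`, the interface name `lo`, and the result labels are assumptions for illustration, not the package's actual code. It compares Go's netlink-backed `net.InterfaceByName` with a plain sysfs read and records each error separately:

```go
package main

import (
	"fmt"
	"net"
	"os"
	"strings"
)

// probe (hypothetical helper) runs two independent lookups for one interface
// and records the error from each; a nil error means that path worked.
func probe(name string) map[string]error {
	results := make(map[string]error)

	// Netlink path: on Linux, net.InterfaceByName issues an RTM_GETLINK dump.
	_, err := net.InterfaceByName(name)
	results["netlink (net.InterfaceByName)"] = err

	// sysfs path: read the interface index straight from the filesystem,
	// which involves no netlink message parsing at all.
	data, err := os.ReadFile("/sys/class/net/" + name + "/ifindex")
	if err == nil && strings.TrimSpace(string(data)) == "" {
		err = fmt.Errorf("empty ifindex for %s", name)
	}
	results["sysfs (/sys/class/net)"] = err

	return results
}

func main() {
	// Print a small summary table, one row per detection method.
	for method, err := range probe("lo") {
		status := "ok"
		if err != nil {
			status = "FAIL: " + err.Error()
		}
		fmt.Printf("%-32s %s\n", method, status)
	}
}
```

Keeping the error per method, rather than stopping at the first failure, is what makes a summary table like the one described above possible.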

Will close after reviewing COPR build logs.

@meta-cla meta-cla Bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Apr 2, 2026
@ESoapW ESoapW force-pushed the netdiag-qemu-test branch from 2584170 to 6999a69 Compare April 2, 2026 18:13
@ESoapW ESoapW changed the title Add diagnostic test for QEMU s390x netlink issue [DO NOT MERGE] Diagnostic test for QEMU s390x netlink issue Apr 7, 2026

ESoapW commented Apr 7, 2026

packit copr-build

@ESoapW ESoapW force-pushed the netdiag-qemu-test branch 4 times, most recently from 2bd727d to 4b7db2c Compare April 8, 2026 14:38
@ESoapW ESoapW closed this Apr 8, 2026
@ESoapW ESoapW force-pushed the netdiag-qemu-test branch from 4b7db2c to 541f8cc Compare April 8, 2026 15:20
@ESoapW ESoapW reopened this Apr 8, 2026
@ESoapW ESoapW force-pushed the netdiag-qemu-test branch 5 times, most recently from 46b2a4f to d8e8b40 Compare April 9, 2026 12:07
@ESoapW ESoapW force-pushed the netdiag-qemu-test branch from d8e8b40 to d33b52a Compare April 9, 2026 14:19
ESoapW added a commit that referenced this pull request Apr 9, 2026
COPR lost native s390x builders around March 2026 and switched to QEMU
user-mode emulation on x86_64. QEMU doesn't properly byte-swap netlink
RTM_GETLINK rtattr structs when emulating big-endian s390x on a
little-endian host, causing Go's net.InterfaceByName() to fail with
'parsenetlinkrouteattr: invalid argument'.

We confirmed this is a QEMU bug, not a code issue:
- Native s390x (Koji): all tests pass
- Native x86_64: all tests pass
- QEMU s390x (COPR): only netlink RTM_GETLINK tests fail
- Diagnostic test (PR #506) shows RTM_GETADDR works, RTM_GETLINK doesn't,
  sysfs/procfs alternatives work fine

A patch has been submitted to QEMU upstream. Removing s390x from COPR
targets until the QEMU fix lands. This only affects CI testing, not the
official Fedora package. Koji still builds s390x with real hardware and
Fedora users are unaffected.
meta-codesync Bot pushed a commit that referenced this pull request Apr 9, 2026
Summary:
COPR lost native s390x builders around March 2026 and switched to QEMU user-mode emulation on x86_64. QEMU doesn't properly byte-swap netlink RTM_GETLINK rtattr structs when emulating big-endian s390x on a little-endian host, causing Go's net.InterfaceByName() to fail with 'parsenetlinkrouteattr: invalid argument'.

We confirmed this is a QEMU bug, not a code issue:
- Native s390x (Koji): all tests pass
- Native x86_64: all tests pass
- QEMU s390x (COPR): only netlink RTM_GETLINK tests fail
- Diagnostic test (PR #506) shows RTM_GETADDR works, RTM_GETLINK doesn't, sysfs/procfs alternatives work fine

A patch has been submitted to QEMU upstream. Removing s390x from COPR targets until the QEMU fix lands. This only affects CI testing, not the official Fedora package. Koji still builds s390x with real hardware and Fedora users are unaffected.

https://gitlab.com/qemu-project/qemu/-/work_items/2485#note_3236597357

Pull Request resolved: #508

Reviewed By: vvfedorenko

Differential Revision: D100185224

Pulled By: ESoapW

fbshipit-source-id: 9070472bd76975b7dbb94de3514ffa300ba19f50

ESoapW commented Apr 9, 2026

Closing this PR. The diagnostic test served its purpose and we found the root cause.

What we found

The netdiag test helped us trace the issue to a one-character off-by-one bug in QEMU's linux-user/fd-trans.c. The function host_to_target_for_each_rtattr() uses while (len > sizeof(struct rtattr)) but should use >=. When the last rtattr in a netlink RTM_GETLINK response leaves exactly 4 bytes remaining, which is sizeof(struct rtattr), the loop exits without byte-swapping it. The s390x binary then reads the unswapped rta_len as big-endian, gets a garbage length, and returns EINVAL.
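The effect of the loop condition can be modelled in a few lines. This is a simplified simulation, not QEMU's code: `countVisited` and its arguments are hypothetical names, and it tracks only attribute lengths rather than real netlink bytes. It counts how many attributes a walk would process under `>` versus `>=`:

```go
package main

import "fmt"

// rtattrHdrLen mirrors sizeof(struct rtattr): 2-byte rta_len + 2-byte rta_type.
const rtattrHdrLen = 4

// align4 rounds n up to a multiple of 4, as the kernel's RTA_ALIGN does.
func align4(n int) int { return (n + 3) &^ 3 }

// countVisited (hypothetical) reports how many attributes the loop body would
// reach. inclusive=false models `len > sizeof(struct rtattr)`;
// inclusive=true models `len >= sizeof(struct rtattr)`.
func countVisited(attrLens []int, inclusive bool) int {
	remaining := 0
	for _, l := range attrLens {
		remaining += align4(l)
	}

	visited := 0
	for _, l := range attrLens {
		if inclusive && remaining < rtattrHdrLen {
			break
		}
		if !inclusive && remaining <= rtattrHdrLen {
			break
		}
		visited++ // this is where the byte swap would happen
		remaining -= align4(l)
	}
	return visited
}

func main() {
	// An 8-byte attribute followed by a header-only 4-byte attribute.
	lens := []int{8, 4}
	fmt.Println("len >  sizeof:", countVisited(lens, false)) // prints 1
	fmt.Println("len >= sizeof:", countVisited(lens, true))  // prints 2
}
```

With `>`, the walk stops one attribute early only when the final attribute is exactly header-sized; a list whose last attribute carries a payload is processed in full under either condition.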

Key evidence from the diagnostic

  • uname -r: 6.18.4-200.fc43.x86_64 confirmed QEMU user-mode on x86_64 host
  • Raw byte walk showed 37 rtattrs correctly swapped, the 38th (4 bytes remaining) was not
  • RTM_GETADDR (control case) ended with 0 bytes remaining and worked fine
  • The companion function target_to_host_for_each_rtattr already uses >= correctly (written 7 years later in 2023)
  • The kernel's own RTA_OK macro uses >=
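To illustrate why one unswapped attribute is enough to produce EINVAL, here is a small sketch of what a big-endian guest sees. It assumes the trailing attribute is header-only (rta_len = 4, rta_type = 0), which is consistent with "4 bytes remaining" but is not stated byte-for-byte above; `guestLen` and `plausible` are hypothetical helpers, and the bounds check only approximates the kind of validation a netlink attribute parser performs:

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// rtattrHdrLen mirrors sizeof(struct rtattr).
const rtattrHdrLen = 4

// guestLen decodes rta_len the way a big-endian (s390x) guest would.
func guestLen(attr []byte) uint16 {
	return binary.BigEndian.Uint16(attr[0:2])
}

// plausible applies a bounds check: the length must cover at least the header
// and must not exceed the bytes that remain in the message.
func plausible(rtaLen uint16, remaining int) bool {
	return int(rtaLen) >= rtattrHdrLen && int(rtaLen) <= remaining
}

func main() {
	attr := make([]byte, rtattrHdrLen)

	// Unswapped: rta_len = 4 as written by a little-endian host (bytes 04 00).
	binary.LittleEndian.PutUint16(attr[0:2], 4)
	n := guestLen(attr)
	fmt.Printf("unswapped: rta_len=%d plausible=%v\n", n, plausible(n, len(attr)))
	// prints: unswapped: rta_len=1024 plausible=false

	// Swapped: the same field after correct translation (bytes 00 04).
	binary.BigEndian.PutUint16(attr[0:2], 4)
	n = guestLen(attr)
	fmt.Printf("swapped:   rta_len=%d plausible=%v\n", n, plausible(n, len(attr)))
	// prints: swapped:   rta_len=4 plausible=true
}
```

Under these assumptions the guest reads a length of 1024 with only 4 bytes left in the message, which a bounds check rejects.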

Status

The bug has been in QEMU since 2016 (commit 6c5b5645ae). Patch sent to qemu-devel and posted on the existing QEMU issue: https://gitlab.com/qemu-project/qemu/-/work_items/2485

@ESoapW ESoapW closed this Apr 9, 2026