Crash in iter_notes() when parsing NT_GNU_BUILD_ID notes inside ET_CORE files (type-3 note collision) #656

@JonathonReinhart

Bug Description

When iterating over note entries inside a coredump ELF binary (ET_CORE type) using iter_notes(), the notes parser unconditionally decodes note type 3 as a process information note (NT_PRPSINFO), which expects a 124-byte struct payload (Elf_Prpsinfo).

However, within a coredump note segment, type 3 means NT_PRPSINFO only when the note name is 'CORE'. When the note name is 'GNU', type 3 is the standard NT_GNU_BUILD_ID note, whose payload is a raw binary build ID (typically 16 or 20 bytes long).

Because pyelftools ignores the note name and selects the enum map solely from the ELF type ET_CORE, it tries to parse the short GNU build ID payload (20 bytes, for example) as the 124-byte Elf_Prpsinfo structure. The read runs past the end of the data and raises a fatal parse error:
elftools.common.exceptions.ELFParseError: expected 4, found 0
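
For reference, the note record layout can be sketched with the standard library alone. This is a minimal illustration, not pyelftools code: parse_note is a hypothetical helper, and it assumes a little-endian file with 4-byte note alignment, as in the reproduction below. It shows that the name sits between the header and the descriptor, so it can be read before deciding how to interpret n_type.

```python
import struct

def parse_note(buf, offset=0, align=4):
    """Hypothetical helper: parse one little-endian ELF note record.

    Returns (name, n_type, desc, next_offset). Assumes 4-byte alignment
    of the name and descriptor fields.
    """
    # Fixed 12-byte header: n_namesz, n_descsz, n_type (all 32-bit words).
    n_namesz, n_descsz, n_type = struct.unpack_from('<III', buf, offset)
    offset += 12
    # The name follows the header, NUL-terminated and padded to the alignment.
    name = buf[offset:offset + n_namesz].rstrip(b'\x00').decode('ascii')
    offset += (n_namesz + align - 1) & ~(align - 1)
    # The descriptor follows the padded name and is padded the same way.
    desc = bytes(buf[offset:offset + n_descsz])
    offset += (n_descsz + align - 1) & ~(align - 1)
    return name, n_type, desc, offset

# Same note record as in the reproduction script: name 'GNU', type 3, 8-byte payload.
note = struct.pack('<III', 4, 8, 3) + b'GNU\x00' + bytes.fromhex('deadc0dedeadc0de')
name, n_type, desc, end = parse_note(note)
print(name, n_type, desc.hex(), end)  # GNU 3 deadc0dedeadc0de 24
```

The raw n_type value (3) is ambiguous on its own; only together with the name 'GNU' does it identify a build ID note.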


Steps to Reproduce

Run the following minimal, self-contained Python script to reproduce the crash. It builds a valid 32-bit little-endian ARM coredump (ET_CORE type) containing a standard GNU Build ID note segment (type 3, name 'GNU'), and attempts to parse it using iter_notes():

import subprocess
from elftools.elf.elffile import ELFFile
from elftools.elf.segments import NoteSegment

# Construct a minimal, valid binary coredump containing a GNU Build ID note segment (type 3, name GNU)
# Using a 32-bit little-endian EM_ARM core header
data = bytearray()
# ELF Header (32-bit, little-endian)
data.extend(b"\x7fELF\x01\x01\x01\x00" + b"\x00" * 8) # e_ident
data.extend(b"\x04\x00")            # e_type = ET_CORE (4)
data.extend(b"\x28\x00")            # e_machine = EM_ARM (40)
data.extend(b"\x01\x00\x00\x00")    # e_version = 1
data.extend(b"\x00\x00\x00\x00")    # e_entry = 0
data.extend(b"\x34\x00\x00\x00")    # e_phoff = 52
data.extend(b"\x00\x00\x00\x00")    # e_shoff = 0
data.extend(b"\x00\x00\x00\x00")    # e_flags = 0
data.extend(b"\x34\x00")            # e_ehsize = 52
data.extend(b"\x20\x00")            # e_phentsize = 32
data.extend(b"\x01\x00")            # e_phnum = 1
data.extend(b"\x00\x00")            # e_shentsize = 0
data.extend(b"\x00\x00")            # e_shnum = 0
data.extend(b"\x00\x00")            # e_shstrndx = 0

# Program Header (PT_NOTE Segment)
data.extend(b"\x04\x00\x00\x00")    # p_type = PT_NOTE (4)
data.extend(b"\x54\x00\x00\x00")    # p_offset = 84
data.extend(b"\x54\x00\x00\x00")    # p_vaddr = 84
data.extend(b"\x54\x00\x00\x00")    # p_paddr = 84
data.extend(b"\x18\x00\x00\x00")    # p_filesz = 24
data.extend(b"\x18\x00\x00\x00")    # p_memsz = 24
data.extend(b"\x00\x00\x00\x00")    # p_flags = 0
data.extend(b"\x04\x00\x00\x00")    # p_align = 4

# Note Header (n_namesz=4, n_descsz=8, n_type=3)
data.extend(b"\x04\x00\x00\x00")    # n_namesz = 4
data.extend(b"\x08\x00\x00\x00")    # n_descsz = 8
data.extend(b"\x03\x00\x00\x00")    # n_type = 3
data.extend(b"GNU\x00")             # n_name = 'GNU'
data.extend(b"\xde\xad\xc0\xde\xde\xad\xc0\xde") # n_desc = 8 bytes Build ID (deadc0dedeadc0de)

# Write the constructed ELF data to disk
elf_filename = 'mock_core.elf'
with open(elf_filename, 'wb') as f:
    f.write(data)
print(f"Written mock core ELF to '{elf_filename}' ({len(data)} bytes)")

# Execute 'readelf --notes' on the written file
print("\nExecuting 'readelf --notes %s':" % elf_filename)
result = subprocess.run(['readelf', '--notes', elf_filename], capture_output=True, text=True, check=True)
print("--- readelf output start ---")
print(result.stdout.strip())
print("--- readelf output end ---")

# Open the file from disk and parse with pyelftools
print("\nParsing with pyelftools:")
with open(elf_filename, 'rb') as f:
    elf = ELFFile(f)
    note_seg = next(seg for seg in elf.iter_segments() if isinstance(seg, NoteSegment))
    notes = list(note_seg.iter_notes())
    
    assert len(notes) == 1, "Expected 1 note, got %d" % len(notes)
    note = notes[0]
    print("Parsed Note Type:", note['n_type'])
    print("Parsed Note Name:", note['n_name'])
    print("Parsed Note Desc:", note['n_desc'])
    
    assert note['n_type'] == 'NT_GNU_BUILD_ID', "Expected NT_GNU_BUILD_ID, got %r" % note['n_type']
    assert note['n_name'] == 'GNU', "Expected 'GNU', got %r" % note['n_name']
    expected_desc = 'deadc0dedeadc0de'
    assert note['n_desc'] == expected_desc, "Expected %r, got %r" % (expected_desc, note['n_desc'])
    print("Success!")

Expected Behavior

iter_notes() should parse the note segment successfully and yield the GNU Build ID note entry with type NT_GNU_BUILD_ID and its descriptor payload (n_desc).


Actual Behavior

iter_notes() crashes with ELFParseError:

  File ".../site-packages/elftools/elf/notes.py", line 45, in iter_notes
    note['n_desc'] = struct_parse(elffile.structs.Elf_Prpsinfo, elffile.stream, offset)
  File ".../site-packages/elftools/common/utils.py", line 45, in struct_parse
    raise ELFParseError(str(e))
elftools.common.exceptions.ELFParseError: expected 4, found 0

Root Cause Analysis

Inside elftools/elf/structs.py on lines 440-447:

self.Elf_Nhdr = Struct('Elf_Nhdr',
    self.Elf_word('n_namesz'),
    self.Elf_word('n_descsz'),
    Enum(self.Elf_word('n_type'),
        **(ENUM_NOTE_N_TYPE if e_type != "ET_CORE"
           else ENUM_CORE_NOTE_N_TYPE)),
)

The note header type enum mapping is switched unconditionally based on the binary's e_type. If e_type == "ET_CORE", it maps note type 3 to 'NT_PRPSINFO' instead of 'NT_GNU_BUILD_ID'.

This dates back to #147 ("Better support for core dumps"), which assumed that every note segment inside a coredump ELF has coredump-specific types, overlooking the fact that coredumps can also contain standard generic GNU note segments.


Suggested Fix Strategy

The enum map used to decode the note type should not be selected from e_type alone. For a coredump file, the decoder should also check the note name (n_name):

  • If n_name == 'GNU', use ENUM_NOTE_N_TYPE to decode the type (resolving type 3 to NT_GNU_BUILD_ID).
  • If n_name == 'CORE', use ENUM_CORE_NOTE_N_TYPE (resolving type 3 to NT_PRPSINFO).
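
The selection rule above could be sketched roughly as follows. This is an illustration under stated assumptions, not a patch: decode_note_type is a hypothetical helper, the two dictionaries are small stand-ins for ENUM_NOTE_N_TYPE and ENUM_CORE_NOTE_N_TYPE, and falling back to the core table for names other than 'GNU' is an assumption made here to keep the existing behavior for everything else.

```python
# Stand-in tables with only the first few entries; the real
# ENUM_NOTE_N_TYPE / ENUM_CORE_NOTE_N_TYPE in pyelftools are larger.
GENERIC_NOTE_TYPES = {1: 'NT_GNU_ABI_TAG', 2: 'NT_GNU_HWCAP', 3: 'NT_GNU_BUILD_ID'}
CORE_NOTE_TYPES = {1: 'NT_PRSTATUS', 2: 'NT_FPREGSET', 3: 'NT_PRPSINFO'}

def decode_note_type(e_type, n_name, n_type):
    """Hypothetical helper: map a raw n_type to a symbolic name.

    The table is chosen from both the file type and the note name,
    instead of from the file type alone.
    """
    if e_type == 'ET_CORE' and n_name != 'GNU':
        # Assumption: non-'GNU' notes in a core file keep the core table.
        table = CORE_NOTE_TYPES
    else:
        table = GENERIC_NOTE_TYPES
    # Unknown values are returned as the raw integer.
    return table.get(n_type, n_type)

print(decode_note_type('ET_CORE', 'GNU', 3))   # NT_GNU_BUILD_ID
print(decode_note_type('ET_CORE', 'CORE', 3))  # NT_PRPSINFO
```

Since Elf_Nhdr is parsed before the name is read, an actual fix would presumably need to read n_type as a raw integer first and apply the enum mapping once n_name is known.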

Footnote

Note that it is rare, but valid, for a coredump to directly contain an NT_GNU_BUILD_ID note. Linux coredumps will instead typically contain the first page of all memory-mapped ELF files, and tools like GDB will discover the build ID(s) from those snapshots. However, other custom coredump producers (particularly embedded ones) may not have any memory-mapped ELF headers, and will emit the NT_GNU_BUILD_ID note directly.