Skip to content

fix out-of-range getbits width in legacy get_old_word1#240

Merged
dbry merged 1 commit into
dbry:masterfrom
aizu-m:get-old-word1-getbits-width
Jun 9, 2026
Merged

fix out-of-range getbits width in legacy get_old_word1#240
dbry merged 1 commit into
dbry:masterfrom
aizu-m:get-old-word1-getbits-width

Conversation

@aizu-m

@aizu-m aizu-m commented Jun 8, 2026

Copy link
Copy Markdown
Contributor

UBSan (clang -fsanitize=undefined), decoding a crafted version-3 file:

src/unpack3.c:1371:5: runtime error: shift exponent 35 is too large for 32-bit type 'uint32_t'
    #0 get_old_word1 unpack3.c:1371
    #1 unpack_samples3 unpack3.c:372

Was checking over the other legacy readers after the get_word3 fix. get_old_word1 pulls the entropy width k out of w1.k_value[chan] and passes it to getbits() before checking it. The (k & ~31) guard sits below the getbits call, folded into the bc==32 test, so an out-of-range width gets used first and rejected after.

On the first sample guess_k is 0, so a bitstream that opens with roughly 48 or more 1-bits drives k_value up to 32..47 (or wraps it negative). getbits then shifts the 32-bit register left by >= 32 while refilling. Behaviour is undefined. Reached through WavpackUnpackSamples on a version-3 lossless mono high (not new-high) file, so it only builds with --enable-legacy.

get_word1, get_word2, get_word3 and get_word4 all range-check the width before getbits. get_old_word1 is the odd one out. Moving its check ahead of the getbits call brings it in line and returns WORD_EOF on a bad width.

@dbry

dbry commented Jun 8, 2026

Copy link
Copy Markdown
Owner

Thanks again for the catch, analysis and patch!

Can you please provide the trigger file? This one is not as easy to create as the last one...

Thanks!

@aizu-m

aizu-m commented Jun 9, 2026

Copy link
Copy Markdown
Contributor Author

Generator is below. It is only a 104-byte file, so a script is cleaner than an attachment and it shows why it is awkward to craft by hand.

import struct
def le16(v): return struct.pack("<H", v & 0xffff)
def le32(v): return struct.pack("<I", v & 0xffffffff)
wavhdr = le16(1)+le16(1)+le32(44100)+le32(88200)+le16(2)+le16(16)   # PCM, 1ch, 16-bit
out  = b"RIFF"+le32(0)+b"WAVE"
out += b"fmt "+le32(len(wavhdr))+wavhdr
out += b"data"+le32(4)                                             # data size -> 2 samples
wph  = b"wvpk"+le32(36)+le16(3)+le16(0)+le16(0x11)+le16(0)         # version=3, bits=0, MONO|HIGH
wph += le32(2)+le32(0)+le32(0)+b"\x00"*4+b"\x00"*4
out += wph
out += bytes([0xFF]*7)+bytes([0x0F])+bytes([0x00]*16)             # 60 leading 1-bits then padding
open("old_word1.wv","wb").write(out)

The trick is the bitstream opening. On the first sample guess_k is 0, so the run of 1-bits at the start drives k_value straight past 31 before a single sample has been decoded. Here it lands on 35, and getbits then shifts the 32-bit register left by 35 while refilling.

check-undefined before the patch:

src/unpack3.c:1371:5: runtime error: shift exponent 35 is too large for 32-bit type "uint32_t"
    #0 get_old_word1 unpack3.c:1371
    #1 unpack_samples3 unpack3.c:372

After the patch get_old_word1 returns WORD_EOF on the bad width and 1371 is clean. The file decodes to a graceful "missing samples" EOF and a normal lossless roundtrip is unaffected.

One heads-up before you add it to the suite: the same input also trips a separate, pre-existing left-shift-of-negative at unpack3.c:334 in the clamp setup, which is unrelated to this fix and I have left it untouched. So check-undefined will still flag 334 after the patch. That is why I did not drop the file straight into fuzzing/regression. Happy to look at 334 on its own if you want it sorted.

@dbry

dbry commented Jun 9, 2026

Copy link
Copy Markdown
Owner

Okay, I fixed the pre-existing left-shift-of-negative at unpack3.c:334 and also included the crafted file. Now I'll pull in this and all should be fixed.

Thanks!

@dbry dbry merged commit 841d337 into dbry:master Jun 9, 2026
10 checks passed
@dbry

dbry commented Jun 9, 2026

Copy link
Copy Markdown
Owner

I also see that there are other left-shift-of-negative and signed-integer-overflow occurrences in the legacy version 3 code that trigger on valid files. I don't think this is too serious (they're certainly not vulnerabilities) but I'll fix them when I get a chance.

In the meantime I was able to continue using undefined sanitizers by adding -fno-sanitize=signed-integer-overflow and -fno-sanitize=shift-base to the builds.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants