fix out-of-range getbits width in legacy get_old_word1#240
Conversation
|
Thanks again for the catch, analysis and patch! Can you please provide the trigger file? This one is not as easy to create as the last one... Thanks! |
|
Generator is below. It is only a 104-byte file, so a script is cleaner than an attachment and it shows why it is awkward to craft by hand. import struct
def le16(v): return struct.pack("<H", v & 0xffff)
def le32(v): return struct.pack("<I", v & 0xffffffff)
wavhdr = le16(1)+le16(1)+le32(44100)+le32(88200)+le16(2)+le16(16) # PCM, 1ch, 16-bit
out = b"RIFF"+le32(0)+b"WAVE"
out += b"fmt "+le32(len(wavhdr))+wavhdr
out += b"data"+le32(4) # data size -> 2 samples
wph = b"wvpk"+le32(36)+le16(3)+le16(0)+le16(0x11)+le16(0) # version=3, bits=0, MONO|HIGH
wph += le32(2)+le32(0)+le32(0)+b"\x00"*4+b"\x00"*4
out += wph
out += bytes([0xFF]*7)+bytes([0x0F])+bytes([0x00]*16) # 60 leading 1-bits then padding
open("old_word1.wv","wb").write(out)The trick is the bitstream opening. On the first sample guess_k is 0, so the run of 1-bits at the start drives k_value straight past 31 before a single sample has been decoded. Here it lands on 35, and getbits then shifts the 32-bit register left by 35 while refilling. check-undefined before the patch: After the patch get_old_word1 returns WORD_EOF on the bad width and 1371 is clean. The file decodes to a graceful "missing samples" EOF and a normal lossless roundtrip is unaffected. One heads-up before you add it to the suite: the same input also trips a separate, pre-existing left-shift-of-negative at unpack3.c:334 in the clamp setup, which is unrelated to this fix and I have left it untouched. So check-undefined will still flag 334 after the patch. That is why I did not drop the file straight into fuzzing/regression. Happy to look at 334 on its own if you want it sorted. |
|
Okay, I fixed the pre-existing left-shift-of-negative at unpack3.c:334 and also included the crafted file. Now I'll pull in this and all should be fixed. Thanks! |
|
I also see that there are other left-shift-of-negative and signed-integer-overflow occurrences in the legacy version 3 code that trigger on valid files. I don't think this is too serious (they're certainly not vulnerabilities) but I'll fix them when I get a chance. In the meantime I was able to continue using undefined sanitizers by adding |
UBSan (clang -fsanitize=undefined), decoding a crafted version-3 file:
Was checking over the other legacy readers after the get_word3 fix. get_old_word1 pulls the entropy width k out of w1.k_value[chan] and passes it to getbits() before checking it. The (k & ~31) guard sits below the getbits call, folded into the bc==32 test, so an out-of-range width gets used first and rejected after.
On the first sample guess_k is 0, so a bitstream that opens with roughly 48 or more 1-bits drives k_value up to 32..47 (or wraps it negative). getbits then shifts the 32-bit register left by >= 32 while refilling. Behaviour is undefined. Reached through WavpackUnpackSamples on a version-3 lossless mono high (not new-high) file, so it only builds with --enable-legacy.
get_word1, get_word2, get_word3 and get_word4 all range-check the width before getbits. get_old_word1 is the odd one out. Moving its check ahead of the getbits call brings it in line and returns WORD_EOF on a bad width.