Context
CompCert's experience is instructive: despite full Coq proofs for every compiler pass, testing with Csmith (a random C program generator) still turned up bugs. The bugs were always outside the proven core — in the parser, the assembler, the linker. NASA's GMAT uses similar Monte Carlo campaigns for trajectory edge cases.
Problem
Synth's Rocq proofs cover instruction selection (T1 for i32), but synth-frontend (parser), synth-backend (ELF builder), and linker scripts are unproven. Fuzz directories exist but haven't been scaled up into systematic campaigns.
Proposal
Build a Monte Carlo verification framework:
- WASM Module Generator: Generate thousands of random but valid WASM modules with varying:
  - Instruction sequences (all opcodes, edge-case operands)
  - Type combinations (i32/i64/f32/f64 mixed)
  - Control flow patterns (nested blocks, loops, br_table)
  - Component Model features (multiple memories, resources)
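A generator along these lines can be sketched in a few lines. This is a minimal illustration, not Synth's actual generator: `gen_module` is a hypothetical name, and it emits modules in the WebAssembly text format, restricted here to i32 arithmetic with boundary-value operands.

```python
import random

# i32 binary operators to sample from. Trapping ops (division) are omitted
# here for simplicity; a real campaign would exercise them too.
I32_BINOPS = ["i32.add", "i32.sub", "i32.mul", "i32.and", "i32.or", "i32.xor"]

def gen_module(rng: random.Random, max_ops: int = 16) -> str:
    """Emit a random but valid WASM module (text format) whose single
    exported function folds random i32 constants through random binops."""
    # Edge-case operands: i32 range boundaries plus one fresh random value.
    consts = [0, 1, -1, 2**31 - 1, -2**31, rng.randrange(-2**31, 2**31)]
    body = [f"i32.const {rng.choice(consts)}"]
    for _ in range(rng.randrange(1, max_ops)):
        body.append(f"i32.const {rng.choice(consts)}")
        body.append(rng.choice(I32_BINOPS))  # stack stays balanced: +1, -2, +1
    instrs = "\n    ".join(body)
    return (
        "(module\n"
        '  (func (export "main") (result i32)\n'
        f"    {instrs}\n"
        "  )\n"
        ")\n"
    )

rng = random.Random(1234)  # seeded, so any failure is reproducible
module = gen_module(rng)
print(module)
```

Seeding the RNG matters: a nightly campaign that logs its seeds can replay any failing module exactly.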
- Differential Testing Pipeline:
  - Compile each module through Loom → Synth
  - Execute on Kiln (reference) and Renode ARM emulation (compiled)
  - Compare outputs bit-for-bit
  - Track which instruction sequences/types/patterns trigger failures
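The comparison step reduces to a small harness. In this sketch the two executors are abstracted as callables — `run_reference` and `run_compiled` are hypothetical hooks that would wrap Kiln and the Renode-hosted binary respectively; here they are demonstrated with stubs.

```python
import hashlib
from typing import Callable, List, Tuple

def differential_run(
    modules: List[bytes],
    run_reference: Callable[[bytes], bytes],  # hypothetical: Kiln interpreter
    run_compiled: Callable[[bytes], bytes],   # hypothetical: Loom→Synth binary under Renode
) -> List[Tuple[str, bytes, bytes]]:
    """Run every module through both executors; collect bit-for-bit mismatches
    keyed by a short content hash of the triggering module."""
    mismatches = []
    for mod in modules:
        ref, com = run_reference(mod), run_compiled(mod)
        if ref != com:  # raw byte comparison — no tolerance, no normalization
            mismatches.append((hashlib.sha256(mod).hexdigest()[:12], ref, com))
    return mismatches

# Demo with stub executors: the "compiled" path mishandles exactly one module.
mods = [b"\x00asm-A", b"\x00asm-B", b"\x00asm-C"]
ref_exec = lambda m: m.upper()
bad_exec = lambda m: m.upper() if m != b"\x00asm-B" else b"WRONG"
failures = differential_run(mods, ref_exec, bad_exec)
print(len(failures))  # 1
```

Keying mismatches by module hash makes the failure list directly usable as a regression corpus index.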
- Boundary Focus: Specifically target unproven code:
  - Parser edge cases (malformed sections, unusual orderings)
  - ELF builder (symbol table overflow, section alignment)
  - Linker scripts (memory region overflow, alignment)
  - Vector table generation (exception handler addresses)
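For the parser boundary specifically, one cheap tactic is byte-level mutation of a known-valid module: flip bits past the 8-byte magic/version preamble so section ids, sizes, and orderings get mangled while the file still looks like WASM at first glance. The `mutate` helper below is a hypothetical sketch of that tactic, not Synth's existing fuzzer.

```python
import random

def mutate(module: bytes, rng: random.Random, n_flips: int = 4) -> bytes:
    """Corrupt a valid module to probe the unproven parser: random single-bit
    flips after the header, so section structure is damaged but the file
    still begins with a valid \\0asm magic and version."""
    out = bytearray(module)
    for _ in range(n_flips):
        i = rng.randrange(8, len(out))    # skip magic (4B) + version (4B)
        out[i] ^= 1 << rng.randrange(8)   # single-bit flip
    return bytes(out)

# Minimal valid seed: magic, version, and one empty-signature type section
# (section id 1, size 4, one functype 0x60 with 0 params and 0 results).
seed_module = b"\x00asm\x01\x00\x00\x00" + bytes([1, 4, 1, 0x60, 0, 0])
rng = random.Random(7)
corrupted = mutate(seed_module, rng)
assert corrupted[:8] == seed_module[:8]  # header intact, payload damaged
```

Unusual-but-valid section orderings need a smarter generator than bit flips, but mutation covers the malformed-input half of the parser matrix cheaply.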
- Campaign Infrastructure:
  - Nightly CI job running N thousand random modules
  - Coverage tracking per code path
  - Regression corpus of interesting failures
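The regression corpus can be a content-addressed directory: store each failure-triggering module under its hash with a note, and replay the whole set before each night's fresh random exploration. The `Corpus` class below is an illustrative sketch of that layout, not an existing component.

```python
import hashlib
import json
import pathlib
import tempfile

class Corpus:
    """Content-addressed store of failure-triggering modules. Storing under
    the SHA-256 of the module bytes deduplicates repeat findings for free."""

    def __init__(self, root: pathlib.Path):
        self.root = root
        self.root.mkdir(parents=True, exist_ok=True)

    def add(self, module: bytes, note: str) -> str:
        digest = hashlib.sha256(module).hexdigest()
        (self.root / f"{digest}.wasm").write_bytes(module)
        (self.root / f"{digest}.json").write_text(json.dumps({"note": note}))
        return digest

    def replay(self):
        """Yield every stored failing module for re-testing in CI."""
        for path in sorted(self.root.glob("*.wasm")):
            yield path.read_bytes()

with tempfile.TemporaryDirectory() as d:
    corpus = Corpus(pathlib.Path(d))
    corpus.add(b"\x00asm\x01\x00\x00\x00", "parser accepted truncated type section")
    corpus.add(b"\x00asm\x01\x00\x00\x00\x0b", "ELF builder misaligned .text")
    print(sum(1 for _ in corpus.replay()))  # 2
```

The JSON sidecar leaves room for extra metadata later (RNG seed, failing pass, first-seen date) without changing the on-disk layout.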
Effort
Low-Medium — the fuzz infrastructure already exists; this proposal scales it up and adds differential testing.
Priority
High — this is the fastest way to catch bugs in the unproven boundaries.