diff --git a/src/loader/passes/BLOCKLESS_DAG.md b/src/loader/passes/BLOCKLESS_DAG.md new file mode 100644 index 0000000..4ac62a2 --- /dev/null +++ b/src/loader/passes/BLOCKLESS_DAG.md @@ -0,0 +1,120 @@ +# Blockless DAG Pass + +**Source:** `blockless_dag.rs` + +**Input:** `DanglingOptDag` (optimized DAG with nested blocks and loops) +**Output:** `BlocklessDag` (flat DAG with labels; only loops retain sub-DAGs) + +## Purpose + +This is the last common pipeline pass before the backend-specific stages. It +flattens the nested block structure into a linear sequence of nodes with labels +marking jump targets. After this pass, the only nesting that remains is for +loops — each loop still has its own sub-DAG, because loops represent a separate +"frame" with its own address space in the final output. + +Non-loop blocks are fully inlined into their parent DAG, with their outputs +becoming labels that breaks can jump to. This makes the representation much +closer to assembly: a flat sequence of operations with forward-only jumps to +labels. + +## Key Transformation + +### Blocks Become Labels + +A non-loop block in the input DAG: +``` +Block { + kind: Block, + sub_dag: [Inputs, ..., Br(0, outputs)] +} +``` + +is inlined into the parent. The block's input node is suppressed (its outputs +are remapped to the corresponding inputs in the parent scope), and a `Label` +node is inserted where the block's outputs would be consumed. Break instructions +targeting the block become jumps to this label. + +### Loops Remain Nested + +Loop blocks keep their sub-DAG structure: +``` +Loop { + sub_dag: BlocklessDag { nodes: [...] }, + break_targets: [(depth, [target_types])] +} +``` + +The `break_targets` field records all the break targets that the loop body +uses, relative to the parent frame. This lets the backend know which external +labels/frames the loop may jump to. + +## Break Target Resolution + +In the input DAG, break targets are relative depths into the block stack. 
In the +blockless DAG, targets are resolved into `BreakTarget { depth, kind }`: + +- **`depth`**: The number of frame levels between the break and the target. At + the top level, depth 0 means the current function/loop frame. Inside a loop, + depth 1 means the parent frame, depth 2 the grandparent, etc. + +- **`kind`**: Either `FunctionOrLoop` (targeting the function return or a loop's + next iteration) or `Label(id)` (targeting a specific label created from an + inlined block). + +The key property: **jumps to labels are always forward** (labels appear after +the jumps that target them), while **jumps to loops go backward** (to the loop +header at the start of the loop's sub-DAG). + +## Example + +Input DAG (with nested block): +``` +Node 0: Inputs → [x] +Node 1: Block { + kind: Block, + sub_dag: [ + Node 0: Inputs → [x] + Node 1: i32.const 10 + Node 2: i32.gt_s ← [(0,0), (1,0)] + Node 3: br_if 0 ← [(0,0), (2,0)] ;; exit block if x > 10 + Node 4: i32.const 0 + Node 5: br 1 ← [(4,0)] ;; return 0 + ] +} → [result] +Node 2: br 0 ← [(1,0)] ;; return result +``` + +Output blockless DAG (flattened): +``` +Node 0: Inputs → [x] +Node 1: i32.const 10 +Node 2: i32.gt_s ← [(0,0), (1,0)] +Node 3: BrIf(Label(42)) ← [(0,0), (2,0)] ;; jump to label if x > 10 +Node 4: i32.const 0 +Node 5: Br(Function) ← [(4,0)] ;; return 0 +Node 6: Label { id: 42 } → [result] ;; target for the br_if +Node 7: Br(Function) ← [(6,0)] ;; return result +``` + +The block's internal input node (its node 0) was suppressed and its references +were remapped to the parent's node 0. The block itself became a label node. + +## Node Remapping + +When blocks are inlined, node indices change. The pass maintains an +`outputs_map: HashMap` that translates old +`(node, output)` pairs to new ones. For inlined block inputs, the map redirects +through the `input_mapping` to the actual source nodes in the parent. 
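The remapping bookkeeping can be sketched as a small helper. The `Origin` alias and `remap_inputs` function below are hypothetical stand-ins for the pass's `ValueOrigin` pairs and its `outputs_map` lookup, not the actual implementation:

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for the pass's `ValueOrigin`: (node index, output index).
type Origin = (usize, usize);

/// After inlining, rewrite each input reference that the renumbering affected;
/// references untouched by the inlining pass through unchanged.
fn remap_inputs(inputs: &mut [Origin], outputs_map: &HashMap<Origin, Origin>) {
    for origin in inputs.iter_mut() {
        if let Some(&new_origin) = outputs_map.get(origin) {
            *origin = new_origin;
        }
    }
}

fn main() {
    // The inlined block's `Inputs` node was node 0 inside the block; its
    // first output actually came from the parent's node 3, output 1.
    let mut outputs_map = HashMap::new();
    outputs_map.insert((0, 0), (3, 1));

    let mut inputs = vec![(0, 0), (2, 0)];
    remap_inputs(&mut inputs, &outputs_map);
    assert_eq!(inputs, vec![(3, 1), (2, 0)]);
}
```

The identity fallback is the important part: only references that the inlining actually renumbered appear in the map, so everything else is left alone.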
+ +## Design Notes + +- Labels use unique IDs generated by a shared `AtomicU32` counter (the + `LabelGenerator`), ensuring uniqueness across all functions and all frames. + +- The pass preserves the `NodeInput::Constant` variant, passing inline + constants through unchanged. + +- Break targets are resolved relative to frame boundaries, not block nesting. + This is important because the backends allocate registers per-frame (per + function or per loop body), not per-block. diff --git a/src/loader/passes/BLOCK_TREE.md b/src/loader/passes/BLOCK_TREE.md new file mode 100644 index 0000000..07a9d30 --- /dev/null +++ b/src/loader/passes/BLOCK_TREE.md @@ -0,0 +1,131 @@ +# Block Tree Pass + +**Source:** `block_tree.rs` + +**Input:** Raw WASM function bytecode (`Unparsed`) +**Output:** `BlockTree` (tree of `Block` and `Instruction` elements) + +## Purpose + +This is the first pass in the compilation pipeline. It takes the raw stream of +WASM operators and parses them into a tree structure where control flow is +represented by nested blocks and loops, and instructions within each block form +a linear sequence. + +The pass also normalizes several WASM patterns into simpler, more uniform +representations that are easier for subsequent passes to handle. + +## Normalizations + +### If-Else to Block + BrIf + +WASM's `if-else-end` construct is desugared into blocks with conditional +breaks. This reduces the number of control flow constructs that later passes +need to handle. + +**If without else:** +``` +;; Original WASM ;; Normalized BlockTree +if block (params..., i32) -> (results...) + br_if_zero 0 ;; skip if_body when false +end + end +``` + +**If with else:** +``` +;; Original WASM ;; Normalized BlockTree +if block (params..., i32) -> (results...) + block (params..., i32) -> (params...) 
+else br_if 0 ;; skip else_body when true + +end br 1 ;; skip if_body + end + + end +``` + +The condition value is carried as an extra block input and consumed by the +conditional break at the top. + +### Return to Br + +WASM `return` is converted to a `br` targeting the outermost block (the +function body). This makes the function body just another block, simplifying +break handling. + +``` +;; Original ;; Normalized +return br +``` + +### Explicit Fallthrough Breaks + +Every block that can fall through gets an explicit `br 0` appended. This +guarantees that all blocks are exited via a break instruction, which simplifies +the locals data flow pass (it can assume all values leave blocks through break +inputs). + +``` +;; Original ;; Normalized +block block + i32.const 42 i32.const 42 +end br 0 ;; explicit fallthrough + end +``` + +### Loop Wrapping + +When a loop can fall through (i.e., it doesn't always branch back to the loop +header or exit via a break), an outer block is added around it. The fallthrough +becomes a break to the outer block. This ensures loops are only exited through +breaks. + +``` +;; Original ;; Normalized +loop block -> (results...) + loop (params...) +end + br 1 ;; exit to outer block + end + end +``` + +### Dead Code Removal + +After any instruction that unconditionally diverts control flow (`br`, +`br_table`, `unreachable`, or a non-fallthrough loop), all subsequent +instructions up to the next `end` or `else` are discarded. + +``` +;; Original ;; Normalized +br 0 br 0 +i32.const 1 ;; dead code removed +i32.add ;; dead code removed +``` + +### Constant Global Inlining + +`global.get` on immutable globals is replaced with the global's constant +initializer. This is done early because it enables the downstream constant +optimization passes to work with these values. 
+ +``` +;; Original (global 0 is immutable, initialized to 42) +global.get 0 ;; Normalized: i32.const 42 +``` + +## Output Structure + +The output `BlockTree` is a `Vec` where each `Element` is either: + +- **`Instruction`**: A WASM operator, a `BrIfZero`, or a `BrTable`. +- **`Block`**: A nested block containing: + - `block_kind`: `Block` or `Loop` + - `interface_type`: The block's input and output types + - `elements`: The block's contents (recursively) + - `input_locals`, `output_locals`, `carried_locals`: Initially empty; filled + by the next pass + +At this stage, all blocks have well-defined stack-level interfaces (params and +results), but local variable flow is still implicit. diff --git a/src/loader/passes/CONST_COLLAPSE.md b/src/loader/passes/CONST_COLLAPSE.md new file mode 100644 index 0000000..b46e8e8 --- /dev/null +++ b/src/loader/passes/CONST_COLLAPSE.md @@ -0,0 +1,73 @@ +# Constant Collapse Pass + +**Source:** `dag/const_collapse.rs` + +**Input:** `PlainDag` (the DAG after construction) +**Output:** `ConstCollapsedDag` (same DAG, with some constant references replaced by inline constants) + +## Purpose + +This optional optimization pass identifies constant values that can be folded +into the instructions that consume them, eliminating the need for a separate +register to hold the constant. This is driven by the target ISA: if the ISA +supports immediate operands on certain instructions (e.g., RISC-V's `addi`), +the constant can be inlined directly. + +## How It Works + +The pass is gated by `Settings::get_const_collapse_processor()`. If the ISA +implementor returns `None`, no collapsing is performed and the DAG passes +through unchanged. + +If a processor function is provided, the pass walks every `WASMOp` node in the +DAG and checks whether any of its inputs reference constant nodes. 
For each +such node, it calls the processor with the operator and a slice of +`MaybeConstant` values describing each input: + +- **`NonConstant`**: The input is not a constant. +- **`ReferenceConstant { value, must_collapse }`**: The input references a + constant node with a known value. The processor can set `must_collapse` to + `true` to indicate the constant should be inlined. +- **`CollapsedConstant(value)`**: The input is already an inline constant + (from a previous pass; not expected in the default pipeline). + +When `must_collapse` is set to `true`, the pass replaces the `NodeInput::Reference` +with a `NodeInput::Constant`, severing the dependency on the constant node. + +## Example + +Before collapse: +``` +Node 0: Inputs → [x] +Node 1: i32.const 5 → [5] +Node 2: i32.add ← [(0,0), (1,0)] → [result] +``` + +If the ISA processor recognizes that `i32.add` with a constant second operand +can become an "add immediate" instruction, it sets `must_collapse = true` for +input 1. After collapse: + +``` +Node 0: Inputs → [x] +Node 1: i32.const 5 → [5] (may now be unused) +Node 2: i32.add ← [(0,0), Constant(5)] → [result] +``` + +Node 1 is now potentially dangling (no references to it). The dangling removal +pass will clean it up later. + +## Recursion Into Blocks + +The pass recurses into block sub-DAGs. For non-loop blocks, it propagates +knowledge of which block inputs are constants, so that constants flowing through +block boundaries can also be collapsed inside the block. + +For loops, constant inputs are **not** propagated, because a loop input might be +constant on the first iteration but different on subsequent iterations (it could +be updated by a break back to the loop header). In practice, optimized WASM +rarely has constant loop inputs anyway. + +## Statistics + +The pass returns the total count of collapsed constants, which is aggregated in +`Statistics::constants_collapsed`. 
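As an illustration of how an ISA implementor might drive this pass, here is a hedged sketch. The `MaybeConstant` shape mirrors the description above (with the `CollapsedConstant` variant omitted for brevity), the 12-bit signed immediate range matches RISC-V's `addi`, and the function name and signature are hypothetical:

```rust
/// Simplified mirror of the pass's per-input descriptor.
enum MaybeConstant {
    NonConstant,
    ReferenceConstant { value: i64, must_collapse: bool },
}

/// Sketch of an ISA-supplied processor: collapse the second operand of an
/// i32 add when it fits a RISC-V-style 12-bit signed immediate (`addi`).
fn riscv_like_processor(op: &str, inputs: &mut [MaybeConstant]) {
    if op == "i32.add" {
        if let Some(MaybeConstant::ReferenceConstant { value, must_collapse }) =
            inputs.get_mut(1)
        {
            if (-2048..=2047).contains(value) {
                *must_collapse = true;
            }
        }
    }
}

fn main() {
    let mut inputs = [
        MaybeConstant::NonConstant,
        MaybeConstant::ReferenceConstant { value: 5, must_collapse: false },
    ];
    riscv_like_processor("i32.add", &mut inputs);
    match &inputs[1] {
        MaybeConstant::ReferenceConstant { must_collapse, .. } => assert!(*must_collapse),
        _ => unreachable!(),
    }
}
```

A real processor would consult the actual operator enum rather than a string, but the decision it makes is the same: mark an input `must_collapse` only when the backend can fold it into an immediate form.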
diff --git a/src/loader/passes/CONST_DEDUP.md b/src/loader/passes/CONST_DEDUP.md
new file mode 100644
index 0000000..3b701d0
--- /dev/null
+++ b/src/loader/passes/CONST_DEDUP.md
@@ -0,0 +1,87 @@
+# Constant Deduplication Pass
+
+**Source:** `dag/const_dedup.rs`
+
+**Input:** `ConstCollapsedDag` (DAG after constant collapse)
+**Output:** `ConstDedupDag` (DAG with deduplicated constants)
+
+## Purpose
+
+After the DAG is constructed, the same constant value may be defined by multiple
+independent nodes (e.g., two different `i32.const 0` instructions). This pass
+deduplicates them: all references to a given constant value are remapped to
+point to the first definition of that constant in the current scope.
+
+This reduces the number of nodes in the DAG and, more importantly, saves
+registers in the final output — without deduplication, each constant definition
+would occupy its own register.
+
+## Algorithm
+
+The pass does a single forward traversal over the nodes, maintaining two maps:
+
+### `const_to_origin: HashMap<WasmValue, Option<ValueOrigin>>`
+
+Maps each known constant value to the node that defines it. The `Option` is
+`Some(origin)` if the constant is defined at the current depth, or `None` if it
+is known from an outer scope but not yet materialized at this depth.
+
+### `origin_to_const: HashMap<ValueOrigin, WasmValue>`
+
+The reverse map: for every node that defines a constant, records what constant
+value it produces.
+
+For each node:
+
+1. **Remap inputs:** If an input references a node that produces a known
+   constant, and a previous definition of that constant exists, redirect the
+   input to the earlier definition.
+
+2. **Record constants:** If the node itself defines a constant, add it to both
+   maps. If a previous definition already exists, the node is now a duplicate
+   (it will be cleaned up by the dangling removal pass).
+
+3. **Recurse into blocks:** For non-loop blocks, the parent's constant
+   knowledge is inherited.
If a constant from the parent scope is needed inside + the block, a new block input is added to thread it through. + +4. **Loops start fresh:** Loop sub-DAGs start with empty maps, because + constants should be redefined inside the loop rather than copied through the + iteration interface (which would add unnecessary loop inputs). + +## Example + +Before deduplication: +``` +Node 0: Inputs → [x] +Node 1: i32.const 0 → [zero_a] +Node 2: i32.add ← [(0,0), (1,0)] → [x_plus_0] +Node 3: i32.const 0 → [zero_b] (duplicate!) +Node 4: i32.sub ← [(0,0), (3,0)] → [x_minus_0] +``` + +After deduplication, node 4's input is remapped to node 1: +``` +Node 0: Inputs → [x] +Node 1: i32.const 0 → [zero] +Node 2: i32.add ← [(0,0), (1,0)] → [x_plus_0] +Node 3: i32.const 0 → [zero_b] (now unreferenced) +Node 4: i32.sub ← [(0,0), (1,0)] → [x_minus_0] +``` + +Node 3 is now dangling and will be removed by the dangling removal pass. + +## Cross-Block Deduplication + +When a constant is defined in the parent scope and needed inside a child block, +the pass adds a new input to the block to thread the constant through, rather +than allowing the block to redefine it. This ensures that the constant occupies +a single register even across block boundaries. + +This does not apply to loops, where constants are cheaper to redefine than to +carry as loop inputs. + +## Statistics + +The pass returns the total count of deduplicated constants, which is aggregated +in `Statistics::constants_deduplicated`. 
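The forward traversal at the heart of the algorithm can be sketched as follows, using plain `usize` node indices and `i64` constant values as stand-ins for the real node and value types:

```rust
use std::collections::HashMap;

/// One forward pass over the constant-defining nodes of a scope: the first
/// definition of each value wins, and later ones are remapped onto it.
fn dedup_constants(defs: &[(usize, i64)]) -> HashMap<usize, usize> {
    let mut const_to_origin: HashMap<i64, usize> = HashMap::new();
    let mut remap = HashMap::new(); // duplicate node → first definition
    for &(node, value) in defs {
        match const_to_origin.get(&value) {
            Some(&origin) => {
                // Duplicate: every reference to `node` should use `origin`.
                remap.insert(node, origin);
            }
            None => {
                const_to_origin.insert(value, node);
            }
        }
    }
    remap
}

fn main() {
    // Nodes 1 and 3 both define `i32.const 0`; node 3 becomes a duplicate.
    let remap = dedup_constants(&[(1, 0), (3, 0), (5, 7)]);
    assert_eq!(remap.get(&3), Some(&1));
    assert!(!remap.contains_key(&5));
}
```

The duplicate node itself is left in place here, matching the pass's behavior: it becomes unreferenced and is garbage-collected later by dangling removal.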
diff --git a/src/loader/passes/DAG_CONSTRUCTION.md b/src/loader/passes/DAG_CONSTRUCTION.md new file mode 100644 index 0000000..a3f728a --- /dev/null +++ b/src/loader/passes/DAG_CONSTRUCTION.md @@ -0,0 +1,139 @@ +# DAG Construction Pass + +**Source:** `dag/mod.rs` + +**Input:** `LiftedBlockTree` (block tree with explicit locals data flow) +**Output:** `Dag` (directed acyclic graph of value-producing nodes) + +## Purpose + +This pass eliminates the WASM stack and local variables entirely, replacing them +with a directed acyclic graph where every value has a single explicit origin. +Nodes in the graph are operations (WASM instructions, blocks, loops, breaks), +and edges are values flowing from producers to consumers. After this pass, the +IR is fully register-like — there is no stack, no locals, just values identified +by `(node_index, output_index)` pairs. + +## Core Data Structures + +### Node + +Each node has: +- **`operation`**: What the node does (`Inputs`, `WASMOp`, `BrIfZero`, + `BrTable`, or `Block`). +- **`inputs: Vec`**: The values this node consumes. Each input is + either a `Reference(ValueOrigin)` pointing to another node's output, or a + `Constant(WasmValue)` (only after the constant collapse pass). +- **`output_types: Vec`**: The types of values this node produces. + +### ValueOrigin + +A `(node_index, output_index)` pair identifying a specific output of a specific +node. This is the "register name" in the DAG world. + +### Dag + +A `Dag` is simply a `Vec`. Node 0 is always an `Inputs` node whose +outputs are the block's input values. + +## Algorithm + +The pass simulates WASM execution using two structures that track where each +value lives: + +- **Stack** (`Vec`): Mirrors the WASM operand stack, but instead + of holding values, it holds references to the nodes that produced them. +- **Locals** (`Vec`): Maps each local index to either a `ValueOrigin` + (if the local has been set) or `UnusedLocal` (if it has never been written). 
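A minimal sketch of these two trackers, with hypothetical `Origin` and `Tracker` names, shows why `local.get` and `local.set` create no DAG nodes — they only move references around:

```rust
/// (node index, output index) — hypothetical stand-in for `ValueOrigin`.
type Origin = (usize, usize);

/// The "stack" holds origins, not values, and each local maps to the origin
/// of its last write (`None` playing the role of `UnusedLocal`).
struct Tracker {
    stack: Vec<Origin>,
    locals: Vec<Option<Origin>>,
}

impl Tracker {
    /// `local.get`: push a reference — no DAG node is created.
    fn local_get(&mut self, idx: usize) {
        let origin = self.locals[idx].expect("read of unwritten local");
        self.stack.push(origin);
    }

    /// `local.set`: pop a reference into the local — again, no node.
    fn local_set(&mut self, idx: usize) {
        self.locals[idx] = Some(self.stack.pop().expect("empty stack"));
    }
}

fn main() {
    // One parameter: local 0 starts as output 0 of the `Inputs` node (node 0).
    let mut t = Tracker { stack: vec![], locals: vec![Some((0, 0))] };
    t.local_get(0);
    assert_eq!(t.stack, vec![(0, 0)]);
    t.local_set(0);
    assert!(t.stack.is_empty());
    assert_eq!(t.locals[0], Some((0, 0)));
}
```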
+ +The pass walks the instruction sequence in order. For each instruction: + +1. **Stack/local manipulation** (`local.get`, `local.set`, `local.tee`, + `drop`): Resolved purely by moving references between the stack and locals + arrays. No DAG nodes are created. + +2. **Break instructions** (`br`, `br_if`, `br_if_zero`, `br_table`): Pop the + appropriate values from the stack and collect the required local values (as + determined by the locals data flow pass). These become the break node's + inputs. The break targets are looked up in a block stack to determine what + types are expected. + +3. **Regular WASM operations**: Pop inputs from the stack, create a new node, + push the node's outputs onto the stack. + +4. **Blocks and loops**: Recursively build a sub-DAG. The block's stack and + local inputs (from the lifted block tree) become the inputs to the new + sub-DAG. The block's outputs go back onto the parent's stack and locals. + +## Example + +Consider this WASM fragment (inside a function with `$x` as parameter 0): +```wasm +local.get $x ;; push $x +i32.const 1 ;; push 1 +i32.add ;; pop both, push ($x + 1) +``` + +The resulting DAG nodes would be: + +``` +Node 0: Inputs → outputs: [$x] +Node 1: WASMOp(i32.const 1) → outputs: [1] +Node 2: WASMOp(i32.add) ← inputs: [(0,0), (1,0)] + → outputs: [result] +``` + +`local.get` does not create a node — it just pushes `(0, 0)` onto the stack +(referring to the first output of the Inputs node). The `i32.add` node +references both the input parameter and the constant. + +## Block Handling + +Blocks in the DAG are represented as a single node with an embedded sub-DAG: + +``` +Node N: Block { + kind: Block | Loop, + sub_dag: Dag { nodes: [...] } +} +inputs: [stack values..., local values...] +output_types: [stack results..., local results...] +``` + +Inside the sub-DAG, node 0 (`Inputs`) provides the block's input values. Breaks +to the block provide the block's output values. + +### Blocks vs. 
Loops + +The key difference is what break targets mean: + +- **Block:** A `br 0` targets the block's *outputs*. The break carries the + values that become the block's results. +- **Loop:** A `br 0` targets the loop's *inputs*. The break carries the values + that become the next iteration's inputs. + +This means a block's outputs are determined by its breaks, while a loop's +inputs may be updated by breaks back to it. + +## Unused Locals + +When a local is read before being written (its initial value is used), the pass +materializes a default constant for it (0 for numeric types, `ref.null` for +reference types). This happens at the function level only — inside blocks, +attempting to read an uninitialized local triggers a panic, because the locals +data flow pass should have already ensured it is provided as a block input. + +## Design Notes + +- The pass produces a DAG, not a general graph, because WASM's structured + control flow guarantees that non-loop value dependencies are always acyclic. + Loops create nested sub-DAGs, so even loop back-edges don't introduce cycles + at any single DAG level. + +- Constants at this stage are represented as zero-input `WASMOp` nodes (e.g., + `WASMOp(i32.const 42)`). The constant collapse and dedup passes will later + optimize them. + +- The `BreakArgs` struct on the block stack tracks both the expected stack types + and the expected local indices for each break target, combining the + information from the block's interface type and the lifted locals data flow. 
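The default materialization described under "Unused Locals" amounts to a small type-driven table. A sketch with hypothetical `ValType` and `WasmValue` stand-ins for the real loader types:

```rust
/// Hypothetical stand-ins for the value and type enums used by the loader.
#[derive(Debug, PartialEq)]
enum WasmValue { I32(i32), I64(i64), F32(f32), F64(f64), RefNull }

enum ValType { I32, I64, F32, F64, Ref }

/// Default constant materialized when a local is read before any write
/// at function level: 0 for numeric types, a null for reference types.
fn default_for(ty: &ValType) -> WasmValue {
    match ty {
        ValType::I32 => WasmValue::I32(0),
        ValType::I64 => WasmValue::I64(0),
        ValType::F32 => WasmValue::F32(0.0),
        ValType::F64 => WasmValue::F64(0.0),
        ValType::Ref => WasmValue::RefNull, // ref.null
    }
}

fn main() {
    assert_eq!(default_for(&ValType::I32), WasmValue::I32(0));
    assert_eq!(default_for(&ValType::Ref), WasmValue::RefNull);
}
```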
diff --git a/src/loader/passes/DANGLING_REMOVAL.md b/src/loader/passes/DANGLING_REMOVAL.md new file mode 100644 index 0000000..1bc0591 --- /dev/null +++ b/src/loader/passes/DANGLING_REMOVAL.md @@ -0,0 +1,103 @@ +# Dangling Removal Pass + +**Source:** `dag/dangling_removal.rs` + +**Input:** `ConstDedupDag` (DAG after constant deduplication) +**Output:** `DanglingOptDag` (DAG with unused nodes and outputs removed) + +## Purpose + +This pass is a dead code elimination step for the DAG. It removes: + +1. **Dangling nodes:** Nodes whose outputs are never used by any other node and + that have no side effects (pure computations whose results are discarded). +2. **Unused block outputs:** Block outputs that are never consumed by the parent + DAG. +3. **Unused block inputs:** Block inputs (for non-loop blocks) that are never + read inside the block. + +This pass is the natural cleanup after constant collapse and dedup, which may +leave constant nodes unreferenced. It also catches dead code patterns in the +original WASM. + +## Algorithm + +The pass operates in two phases, recursing into block sub-DAGs: + +### Phase 1: Bottom-Up Usage Analysis + +Starting from the last node and working backward: + +1. **Recurse into blocks** to clean their sub-DAGs first. +2. **Check each node:** Is any of its outputs referenced by a later node? If + not, and the node has no side effects, mark it for removal. +3. **Mark inputs as used:** For every node that is kept, mark all of its inputs' + origins as used. + +### Phase 2: Top-Down Removal and Remapping + +Traverse the nodes forward: + +1. **Remove marked nodes** from the node list. +2. **Remap references:** All `ValueOrigin` references in remaining nodes are + adjusted to account for removed nodes (shifted indices) and removed block + outputs (shifted output indices). +3. **Fix break inputs:** Break instructions targeting blocks that had outputs + removed get their corresponding inputs removed as well. 
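The two phases can be sketched on a simplified flat node list (no blocks; the `uses` and `pure` inputs are hypothetical): survival is decided bottom-up, and the list of survivors doubles as the old-to-new index remapping:

```rust
/// `uses[i]` lists the node indices that node `i` reads; `pure[i]` says
/// whether node `i` is safe to drop when unused. Returns the kept nodes'
/// old indices in order — position in the result is the new index.
fn dangling_removal(uses: &[Vec<usize>], pure: &[bool]) -> Vec<usize> {
    let n = uses.len();
    let mut used = vec![false; n];
    let mut keep = vec![false; n];
    // Phase 1: bottom-up — a node survives if it is impure or referenced
    // by a surviving later node.
    for i in (0..n).rev() {
        keep[i] = !pure[i] || used[i];
        if keep[i] {
            for &input in &uses[i] {
                used[input] = true;
            }
        }
    }
    // Phase 2: top-down — collect survivors; their order gives the remap.
    (0..n).filter(|&i| keep[i]).collect()
}

fn main() {
    // Three nodes: node 2 (a break, impure) reads node 0; node 1 is a pure
    // constant nobody reads.
    let uses = vec![vec![], vec![], vec![0]];
    let pure = vec![true, true, false];
    let kept = dangling_removal(&uses, &pure);
    assert_eq!(kept, vec![0, 2]); // node 1 dropped; old index 2 → new index 1
}
```

The real pass additionally prunes block outputs and inputs and recurses into sub-DAGs, but the survive-then-renumber shape is the same.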
+ +## What Counts as Pure + +A node is considered pure (safe to remove if unused) if its operation is one of: + +- Constants (`i32.const`, `i64.const`, `f32.const`, `f64.const`, `v128.const`) +- Arithmetic, bitwise, and comparison operations +- Type conversion operations +- Reference operations (`ref.null`, `ref.is_null`, `ref.func`) +- `select` / `typed_select` +- `global.get` (reading state has no side effects) +- Memory and table reads (`i32.load`, `table.get`, `memory.size`, etc.) + +Everything else is considered to have side effects and is never removed, even if +its outputs are unused. This includes stores, calls, `global.set`, table +mutations, and `unreachable`. + +## Example + +Before removal: +``` +Node 0: Inputs → [x] +Node 1: i32.const 5 → [five] (unused after const collapse) +Node 2: i32.add ← [(0,0), Constant(5)] → [result] +Node 3: i32.const 99 → [ninety_nine] (never used at all) +Node 4: br 0 ← [(2,0)] +``` + +After removal, nodes 1 and 3 are pure and unused: +``` +Node 0: Inputs → [x] +Node 1: i32.add ← [(0,0), Constant(5)] → [result] +Node 2: br 0 ← [(1,0)] +``` + +All references are remapped: what was `(2,0)` becomes `(1,0)`. + +## Block Output Pruning + +When a block has outputs that the parent never reads, those outputs are removed +from the block node's `output_types`, and the corresponding break inputs are +removed from all breaks targeting that block. This cascading effect may make +additional nodes inside the block dangling, which are then caught by the +recursive application of the same pass inside the block. + +## Block Input Pruning + +For non-loop blocks, if an input is never read by any node inside the block, it +is removed from the `Inputs` node's output types and from the parent's block +node's inputs. Loop inputs are not pruned, as it would require adjusting all +back-edge breaks, which is more complex. + +## Statistics + +The pass returns: +- `removed_nodes`: Total pure nodes removed across all scopes. 
+- `removed_block_outputs`: Total unused block outputs pruned. diff --git a/src/loader/passes/LOCALS_DATA_FLOW.md b/src/loader/passes/LOCALS_DATA_FLOW.md new file mode 100644 index 0000000..c43911d --- /dev/null +++ b/src/loader/passes/LOCALS_DATA_FLOW.md @@ -0,0 +1,122 @@ +# Locals Data Flow Pass + +**Source:** `locals_data_flow.rs` + +**Input:** `BlockTree` (normalized block tree from parsing) +**Output:** `LiftedBlockTree` (block tree with explicit local variable flow) + +## Purpose + +In WASM, local variables are implicit mutable state that can flow freely in and +out of blocks without being declared in the block's type. This pass makes that +flow explicit: for every block, it computes which locals must be provided as +inputs and which are produced as outputs, so that later passes can treat blocks +as pure functions of their inputs. + +After this pass, local variables are no longer "magic" — they are just +additional block inputs and outputs alongside the stack values declared in the +block's interface type. + +## What It Computes + +For each block, the pass fills in three sets: + +### `input_locals: BTreeSet` + +The set of local indices that the block reads (directly or through nested +blocks/breaks) before any internal write. These locals must be provided to the +block as inputs, in addition to its stack parameters. + +### `output_locals: BTreeSet` (blocks only) + +The set of local indices that the block's break instructions write. Since the +block tree pass guarantees all blocks are exited via breaks, these represent the +locals modified by the block that are visible to the parent scope. + +### `carried_locals: BTreeSet` (loops only) + +The set of local indices that are carried across loop iterations. When a break +targets a loop (i.e., continues to the next iteration), any locals it modifies +must be provided as loop inputs on every iteration. 
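The read-before-write classification for a single, non-nested block can be sketched as below; the real pass additionally recurses into nested blocks, tracks break targets, and iterates to a fixed point. The flat access list and function name here are hypothetical:

```rust
use std::collections::BTreeSet;

/// Classify locals for one block from a flat access trace:
/// a local read before any internal write is an input; any written local
/// is an output visible to the parent. `true` = write, `false` = read.
fn classify(accesses: &[(u32, bool)]) -> (BTreeSet<u32>, BTreeSet<u32>) {
    let mut inputs = BTreeSet::new();
    let mut outputs = BTreeSet::new();
    for &(local, is_write) in accesses {
        if is_write {
            outputs.insert(local);
        } else if !outputs.contains(&local) {
            inputs.insert(local); // first access was a read
        }
    }
    (inputs, outputs)
}

fn main() {
    // $acc (local 1) is read then written; $x (local 0) is only read.
    let accesses = [(1, false), (0, false), (1, true)];
    let (inputs, outputs) = classify(&accesses);
    assert_eq!(inputs, BTreeSet::from([0, 1]));
    assert_eq!(outputs, BTreeSet::from([1]));
}
```

`BTreeSet` matters here for the same reason as in the pass: iteration order is deterministic, so block interfaces come out in a stable layout.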
+ +## Example + +Consider this WASM function: +```wasm +(func (param $x i32) (result i32) + (local $acc i32) + (local.set $acc (i32.const 0)) + (block $exit (result i32) + (loop $loop + ;; acc = acc + x + (local.set $acc (i32.add (local.get $acc) (local.get $x))) + ;; if acc > 100, break with acc + (br_if $exit (i32.gt_s (local.get $acc) (i32.const 100))) + ;; continue loop + (br $loop) + ) + (unreachable) + ) +) +``` + +After lifting, the block and loop annotations would be: +- **$exit block:** `input_locals = {$acc, $x}`, `output_locals = {$acc}` + (break to $exit carries $acc) +- **$loop:** `input_locals = {$acc, $x}`, `carried_locals = {$acc}` + (break back to $loop carries $acc) + +The key insight is that `$x` is read inside the loop but never written, so it +appears as an input at every level. `$acc` is both read and written, so it +appears as both an input and a carried/output local. + +## Algorithm + +The pass works by iterating over each block until a fixed point is reached: + +1. **Push the block onto a control stack.** The control stack tracks what + locals each nesting level expects from breaks targeting it. + +2. **Scan the block's elements:** + - `local.get` → Mark the local as an input of the current block. + - `local.set` / `local.tee` → Mark the local as an output of the current + scope (tracked in `local_outputs` on the control stack entry). + - `br` / `br_if` / `br_if_zero` / `br_table` → Process the break target: + all locals output by scopes up to the target depth, plus all carried + locals of intervening loops, are added as break locals for the target. + - Nested blocks → Recurse; the sub-block's `input_locals` become inputs of + the current block, and its `output_locals` become outputs of the current + scope. + +3. **Pop the block from the control stack** and assemble the final + `input_locals`, `output_locals`, and `carried_locals`. + +4. **Repeat until stable.** If any set grew during the scan, the block is + reprocessed. 
This handles cases where a break target's locals requirements + propagate to inner blocks that didn't know about them yet. + +## The Control Stack + +Each entry in the control stack tracks: + +- `old_break_locals`: The break locals known from the previous iteration (for + convergence checking). +- `new_break_locals`: The break locals discovered during the current iteration. +- `carried_locals`: For loops, the locals that must be carried across + iterations. +- `local_outputs`: The locals written (via `local.set`/`local.tee`) at this + scope level. + +## Design Notes + +- The fixed-point iteration is necessary because a break can target an outer + block, and the locals it requires may depend on locals computed by other + breaks at different nesting levels. Each iteration propagates this information + one level further. + +- The sets are `BTreeSet` for deterministic ordering, which ensures that + locals are always laid out in the same order in the block interface. + +- After this pass, every local variable reference (`local.get`, `local.set`) + still exists in the instruction stream. They will be resolved into DAG node + references by the DAG construction pass. diff --git a/src/loader/passes/PIPELINE.md b/src/loader/passes/PIPELINE.md new file mode 100644 index 0000000..92cc0ff --- /dev/null +++ b/src/loader/passes/PIPELINE.md @@ -0,0 +1,139 @@ +# Common Pipeline Overview + +The common pipeline is the shared frontend that both the WOM and RWM backends +consume. It takes raw WebAssembly bytecode and progressively transforms it into +a blockless DAG — a flat, optimized, register-based intermediate representation +that is ready for backend-specific lowering. + +## Stages + +``` +WASM bytecode + │ + ▼ +Unparsed (raw function body bytes) + │ + ▼ +BlockTree block_tree.rs + │ Parses WASM operators into a tree of blocks and + │ loops. Normalizes if-else into block+br_if, + │ converts return into br, removes dead code, and + │ inlines constant globals. 
+ │ + ▼ +LiftedBlockTree locals_data_flow.rs + │ Makes locals data flow explicit: exposes every + │ local read/write as a block input or output, so + │ later passes can treat locals like any other value. + │ + ▼ +PlainDag dag/mod.rs + │ Builds a directed acyclic graph where nodes are + │ operations and edges are values. The WASM stack + │ and locals are fully resolved into node references. + │ + ▼ +ConstCollapsedDag dag/const_collapse.rs + │ (Optional) Collapses constant values into the + │ instructions that use them, if the target ISA + │ supports immediate operands. + │ + ▼ +ConstDedupDag dag/const_dedup.rs + │ Deduplicates identical constant definitions so + │ each unique constant is defined at most once per + │ scope. + │ + ▼ +DanglingOptDag dag/dangling_removal.rs + │ Removes pure nodes whose outputs are never used. + │ Also trims unused block inputs and outputs. + │ + ▼ +BlocklessDag blockless_dag.rs + Flattens non-loop blocks into a single linear + sequence with labels. Only loops retain their + own sub-DAG. Forward-only jumps target labels; + backward jumps target loop headers. +``` + +## What Happens After + +The `BlocklessDag` is the handoff point to the backend pipelines: + +- **WOM pipeline** (`src/loader/wom/`): Flattens the DAG into write-once + register directives using frame allocation. See `wom/` for details. + +- **RWM pipeline** (`src/loader/rwm/`): Performs liveness analysis, register + allocation, and flattening into read-write register directives. See + `rwm/PIPELINE.md` for details. + +## Detailed Documentation + +Each pass has its own documentation file: + +- **[BLOCK_TREE.md](BLOCK_TREE.md)** — Parsing WASM operators into a + normalized block tree with structural simplifications. + +- **[LOCALS_DATA_FLOW.md](LOCALS_DATA_FLOW.md)** — Lifting locals into + explicit block inputs and outputs. + +- **[DAG_CONSTRUCTION.md](DAG_CONSTRUCTION.md)** — Building the value DAG from + the lifted block tree, resolving the stack and locals. 
+ +- **[CONST_COLLAPSE.md](CONST_COLLAPSE.md)** — ISA-driven constant folding + into immediate operands. + +- **[CONST_DEDUP.md](CONST_DEDUP.md)** — Deduplicating identical constant + nodes across scopes. + +- **[DANGLING_REMOVAL.md](DANGLING_REMOVAL.md)** — Dead node elimination and + unused output pruning. + +- **[BLOCKLESS_DAG.md](BLOCKLESS_DAG.md)** — Flattening block structure into + labels and converting to the blockless representation. + +## Key Design Decisions + +### DAG Over SSA + +The IR uses a DAG (directed acyclic graph) rather than a traditional SSA form. +Each node represents an operation, each edge represents a value. Values are +identified by their origin: `(node_index, output_index)`. This is a natural fit +because WASM's structured control flow guarantees that non-loop blocks can be +inlined into the parent, producing a flat sequence of forward-only jumps. + +### Locals Are Lifted Early + +WASM locals act as implicit mutable state that crosses block boundaries. By +lifting them into explicit block inputs and outputs in the `LiftedBlockTree` +pass, all subsequent passes can treat the IR as purely value-based, with no +hidden state. This simplifies the DAG construction and all downstream +optimizations. + +### Loops Are Special + +Throughout the pipeline, loops receive special treatment: + +- **Block tree:** Loops are wrapped in an outer block if they can fall through, + ensuring loops are only exited via breaks. +- **DAG construction:** Loops create nested sub-DAGs with their own input + nodes; breaks to a loop target its inputs (next iteration), while breaks to a + block target its outputs. +- **Blockless DAG:** Only loops retain their own sub-DAG. Non-loop blocks are + inlined into the parent frame with labels for jump targets. + +This distinction reflects a fundamental property: blocks have forward-only +control flow (can be inlined), while loops have backward edges (need their own +frame). 
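One way to read the stage names in the diagram above is as a typed pipeline: each pass consumes the previous stage's type and produces the next, so ordering mistakes fail to compile. A sketch with empty marker types (the real stage types carry the IR data, and the pass names here are illustrative):

```rust
// Empty marker types standing in for the pipeline's IR stages.
struct BlockTree;
struct LiftedBlockTree;
struct PlainDag;
struct BlocklessDag;

// Each pass's signature encodes its position in the pipeline.
fn locals_data_flow(_t: BlockTree) -> LiftedBlockTree { LiftedBlockTree }
fn build_dag(_t: LiftedBlockTree) -> PlainDag { PlainDag }
fn to_blockless(_d: PlainDag) -> BlocklessDag { BlocklessDag }

fn main() {
    // Skipping or reordering a stage is a type error, not a runtime bug.
    let _out: BlocklessDag = to_blockless(build_dag(locals_data_flow(BlockTree)));
}
```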
+ +### Optimization Order + +The three DAG optimizations run in a specific order for good reason: + +1. **Constant collapse** runs first because it changes reference inputs into + inline constants, potentially making the original constant nodes unused. +2. **Constant dedup** runs second because collapse may have severed some + references, leaving duplicate constants that can now be merged. +3. **Dangling removal** runs last as a cleanup pass, garbage-collecting any + nodes that the previous passes made unreachable. diff --git a/src/loader/rwm/FLATTENING.md b/src/loader/rwm/FLATTENING.md new file mode 100644 index 0000000..be31e89 --- /dev/null +++ b/src/loader/rwm/FLATTENING.md @@ -0,0 +1,146 @@ +# Flattening Pass + +**Source:** `flattening/mod.rs`, `flattening/sequence_parallel_copies.rs` + +**Input:** `AllocatedDag` (DAG with concrete register assignments) +**Output:** `FunctionAsm` (linear sequence of assembly-like directives) + +## Purpose + +The flattening pass converts the DAG representation into a linear sequence of +instructions. By this point, all the hard decisions (register allocation, copy +minimization) have already been made. The flattening is a straightforward, linear +traversal that emits directives through the `rwm::Settings` trait. + +## Algorithm + +The pass does a forward traversal over the DAG nodes, processing each one into +directives: + +### Node Types + +- **Inputs:** At the function level, emits the function label (and an exported name + alias if the function is exported). At loop level, emits nothing (the loop label + is emitted by the parent Loop node). + +- **Label:** Emits a label directive. + +- **Loop:** Copies loop inputs to where the loop body expects them (if they are not + already there), emits the loop label, then recursively processes the loop body. + A control stack tracks the allocation context for each nesting level. 
+ +- **Br (unconditional break):** Emits the copies needed to place break inputs at + their target locations, then emits a jump. Three kinds of targets: + - **Forward label:** Jump to a label in the current block. + - **Loop back-edge:** Jump to the loop's header label, with copies to set up + the next iteration's inputs. + - **Function return:** Copy outputs to the return slots, then emit a `return` + instruction (which reads RA and FP from their known positions). + +- **BrIf / BrIfZero (conditional break):** Combines a conditional jump with the + break logic. The pass tries several strategies in order of preference: + 1. If the target is a plain jump (no copies needed) and the ISA supports the + matching condition, emit a single conditional jump. + 2. If the ISA supports the inverse condition, emit: inverse-conditional-jump to + continuation, then the full break code, then the continuation label. + 3. If only the matching condition is available, emit: conditional-jump to jump + code, then jump to continuation, then jump code, then continuation label. + +- **BrTable:** Handles multi-way branching. Emits a bounds check for the default + case, then a relative jump (jump table) into per-target jump code. Targets that + are plain jumps are inlined into the jump table; complex targets get an extra + indirection through a local label. + +- **Call:** For imported functions, emits a direct imported-call directive. For + local functions, prepares the call frame: copies inputs to the expected positions, + emits the call instruction (with frame offset, RA, and FP locations), then copies + outputs from the return slots to where consumers expect them. + +- **CallIndirect:** Like a normal call, but first loads the function reference from + the table, checks the function type against the expected signature (trapping on + mismatch), then emits an indirect call. + +- **WASMOp:** Delegates directly to `emit_wasm_op` with the resolved input + registers/constants and output register. 
+ +- **Unreachable:** Emits a trap. + +## Parallel Copy Sequencing + +A critical sub-problem in flattening is emitting register copies correctly when +multiple values need to move simultaneously (e.g., setting up a loop iteration's +inputs, or preparing function call arguments). The naive approach of emitting copies +one by one can fail when source and destination registers overlap. + +### The Problem + +Consider needing to move `r0 → r1` and `r1 → r0` (a swap). Doing them sequentially +would overwrite `r1` before reading it. More generally, the copy set forms a directed +graph that may contain: + +1. **Trees:** Leaves are destination-only; root is source-only. Safe to copy in + reverse topological order. +2. **Cycles:** Every register is both source and destination. Requires a temporary + register to break the cycle. +3. **Cycles with attached trees:** A single cycle with trees branching off. The + tree-pruning phase naturally breaks the cycle through source-swapping. + +### The Algorithm (`sequence_parallel_copies.rs`) + +**Phase 1 — Tree pruning:** +1. Find all "tree ends" (registers with no outgoing edges, i.e., destination-only). +2. For each tree end, emit the copy from its source and remove the edge. +3. Apply **source-swapping**: transfer the source's remaining outgoing edges to the + just-written destination. This is the key insight — since the destination now + holds the same value as the original source, it can serve as the source for + remaining copies, potentially breaking a connected cycle. +4. If the original source now has no outgoing edges but has an incoming edge, it + becomes a new tree end. Repeat until no tree ends remain. + +**Phase 2 — Cycle breaking:** +1. The remaining graph consists only of pure cycles. +2. Pick a temporary register — either reuse a destination register from Phase 1 + (since its original value is never read it can serve double duty before Phase 1's copies + execute), or allocate a new `Temp` register. +3. 
For each cycle: save one value to temp, rotate the rest, restore from temp. + +**Output ordering:** Phase 2 copies are emitted first, then Phase 1 copies in +sequence. This ensures the temporary register is not overwritten by Phase 1 copies +before it is consumed by Phase 2. + +### Correctness Guarantee + +Every destination register appears exactly once (precondition). The algorithm +produces a valid sequential ordering that achieves the same effect as executing all +copies simultaneously. At most one temporary register is needed, and it is avoided +entirely when the copy graph is acyclic or has trees attached to cycles. + +## Temporary Register Allocation + +During flattening, some operations need temporary registers (e.g., loading a +function reference for indirect calls, or the temp register for parallel copies). +The `Context` struct provides `allocate_tmp_type` which: + +1. Lazily computes the set of free register gaps at the current node by examining + the occupation map from register allocation. +2. Allocates from the first gap that fits. +3. For function call nodes, can also allocate temporaries in the callee's frame + space (after the calling convention prelude). + +## Settings Trait + +The flattening pass is parameterized by `rwm::Settings`, which provides all the +`emit_*` methods. This trait defines how each operation maps to the target ISA's +directives. The reference implementation is `GenericIrSetting` in +`src/interpreter/generic_ir.rs`. 
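As a concrete illustration of the parallel-copy sequencing described above, here is a self-contained sketch. It is simplified, not the crate's code: registers are plain `u32`s, the temporary is caller-provided and assumed free, and tree copies are emitted before cycle copies instead of reusing a phase-1 destination as the temporary:

```rust
use std::collections::HashMap;

/// A move between registers: (src, dst).
type Move = (u32, u32);

/// Sequences a set of parallel copies (all destinations distinct) into an
/// equivalent ordered list, using at most the one temporary `temp`, which
/// the caller guarantees is free. Simplified sketch, not the crate's code.
fn sequence_parallel_copies(copies: &[Move], temp: u32) -> Vec<Move> {
    // Self-copies are no-ops.
    let mut pending: Vec<Move> =
        copies.iter().copied().filter(|&(s, d)| s != d).collect();
    let mut out = Vec::new();

    // Phase 1: tree pruning. A "tree end" is a destination no pending copy reads.
    loop {
        let pos = pending
            .iter()
            .position(|&(_, d)| pending.iter().all(|&(s, _)| s != d));
        let Some(i) = pos else { break };
        let (s, d) = pending.swap_remove(i);
        out.push((s, d));
        // Source-swapping: `d` now holds `s`'s value, so remaining copies out
        // of `s` can read `d` instead. This is what unwinds cycles with trees.
        for m in pending.iter_mut() {
            if m.0 == s {
                m.0 = d;
            }
        }
    }

    // Phase 2: what remains is a union of disjoint cycles; break each with temp.
    let mut dst_to_src: HashMap<u32, u32> =
        pending.into_iter().map(|(s, d)| (d, s)).collect();
    while !dst_to_src.is_empty() {
        let (&d0, &s0) = dst_to_src.iter().next().unwrap();
        dst_to_src.remove(&d0);
        out.push((s0, temp)); // save the value d0 must end up with
        let mut cur = s0;
        while cur != d0 {
            let s = dst_to_src.remove(&cur).expect("pure cycle");
            out.push((s, cur));
            cur = s;
        }
        out.push((temp, d0));
    }
    out
}

/// Executes a copy sequence over a register file (for checking only).
fn run(seq: &[Move], regs: &mut HashMap<u32, u32>) {
    for &(s, d) in seq {
        let v = regs[&s];
        regs.insert(d, v);
    }
}

fn main() {
    // A swap needs the temporary register...
    let seq = sequence_parallel_copies(&[(0, 1), (1, 0)], 9);
    let mut regs: HashMap<u32, u32> = [(0, 100), (1, 200), (9, 0)].into();
    run(&seq, &mut regs);
    assert_eq!((regs[&0], regs[&1]), (200, 100));

    // ...but a cycle with an attached tree does not: source-swapping breaks it.
    let seq = sequence_parallel_copies(&[(0, 1), (1, 0), (0, 2)], 9);
    assert!(seq.iter().all(|&(s, d)| s != 9 && d != 9));
}
```

With a genuinely free temporary, this tree-first emission order is equivalent; the pass's phase-2-first output ordering only matters because it reuses a phase-1 destination register as the temporary.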
+ +Key emission methods used by flattening: +- `emit_label`, `emit_jump`, `emit_trap` +- `emit_copy` — Single-word register copy +- `emit_conditional_jump` — Jump on a boolean condition +- `emit_conditional_jump_cmp_immediate` — Jump on comparison with immediate (for BrTable bounds) +- `emit_relative_jump` — Jump by offset (for BrTable dispatch) +- `emit_return` — Function return (restores RA/FP) +- `emit_function_call`, `emit_indirect_call` — Static and indirect calls +- `emit_imported_call` — Imported (external) function call +- `emit_wasm_op` — Generic WASM instruction emission diff --git a/src/loader/rwm/JUMP_REMOVAL.md b/src/loader/rwm/JUMP_REMOVAL.md new file mode 100644 index 0000000..1dbc83a --- /dev/null +++ b/src/loader/rwm/JUMP_REMOVAL.md @@ -0,0 +1,47 @@ +# Jump Removal Pass + +**Source:** `../wom/dumb_jump_removal.rs` (shared between WOM and RWM pipelines) + +**Input:** `FunctionAsm` (`PlainFlatAsm` — linear directive sequence) +**Output:** `FunctionAsm` (`DumbJumpOptFlatAsm` — optimized directive sequence) + +## Purpose + +This is a simple peephole optimization that removes unconditional jumps whose target +is the immediately following instruction. These "dumb jumps" are an artifact of the +DAG representation, where all breaks are explicit, even when the target +happens to be placed right after the jump. + +## Algorithm + +The pass does a single linear scan over the directive sequence, examining each +consecutive pair of directives: + +1. For each directive, check if it is an unconditional local jump (via + `Settings::to_plain_local_jump`). +2. If it is, check whether the next directive is a label matching the jump target + (via `Settings::is_label`). +3. If both conditions hold, drop the jump — it is redundant. +4. Otherwise, keep the directive. + +## Why This Happens + +The flattening pass emits jumps for every break instruction in the DAG. 
When a +break targets a label that happens to appear immediately after the break in the +linearized output, the resulting jump is unnecessary. This is common in patterns +like: + +``` + ; end of if-true branch + jump label_42 ; ← dumb jump, label_42 is right below +label_42: + ; continuation +``` + +The flattening pass does not attempt to detect this during emission. Instead, this cheap +post-processing pass cleans them up. + +## Statistics + +The pass returns the count of removed jumps, which is aggregated in the +`Statistics::useless_jumps_removed` counter. diff --git a/src/loader/rwm/LIVENESS_ANALYSIS.md b/src/loader/rwm/LIVENESS_ANALYSIS.md new file mode 100644 index 0000000..f8fd713 --- /dev/null +++ b/src/loader/rwm/LIVENESS_ANALYSIS.md @@ -0,0 +1,84 @@ +# Liveness Analysis Pass + +**Source:** `liveness_dag.rs` + +**Input:** `BlocklessDag` (from the common pipeline) +**Output:** `LivenessDag` (same DAG structure, annotated with `Liveness` data per block) + +## Purpose + +The liveness analysis pass takes the blockless DAG produced by the common pipeline and +annotates it with information about when each value is last used. This information is +essential for the register allocation pass that follows, enabling it to reuse registers +once their values are no longer needed. + +## What It Computes + +For each block (the function body and each loop body), the pass produces a `Liveness` +struct containing: + +### `last_usage`: HashMap<(node_index, output_index), node_index> + +Maps each value (identified by its producing node and output index) to the index of the +last node that reads it. This tells the register allocator exactly when a register can +be freed. + +For example, if node 3 produces a value at output 0, and the last node that uses it is +node 7, then `last_usage[(3, 0)] = 7`. After node 7, the register holding this value +can be reused by another value. 
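A single forward scan suffices to build this map. A minimal sketch, using a hypothetical simplified node type rather than the crate's actual one:

```rust
use std::collections::HashMap;

/// Hypothetical simplification of a DAG node: just the `(node, output)`
/// references it reads, and how many outputs it produces.
struct Node {
    inputs: Vec<(usize, usize)>,
    num_outputs: usize,
}

fn compute_last_usage(nodes: &[Node]) -> HashMap<(usize, usize), usize> {
    let mut last_usage = HashMap::new();
    for (index, node) in nodes.iter().enumerate() {
        // Every output starts out dead: last "used" by its own producer.
        for output in 0..node.num_outputs {
            last_usage.insert((index, output), index);
        }
        // Each read pushes the producer's last usage forward to this node.
        for &origin in &node.inputs {
            last_usage.insert(origin, index);
        }
    }
    last_usage
}

fn main() {
    // Eight single-output nodes; node 7 is the only reader of node 3's output 0.
    let nodes: Vec<Node> = (0..8)
        .map(|i| Node {
            inputs: if i == 7 { vec![(3, 0)] } else { vec![] },
            num_outputs: 1,
        })
        .collect();
    let lu = compute_last_usage(&nodes);
    assert_eq!(lu[&(3, 0)], 7); // register can be reused after node 7
    assert_eq!(lu[&(5, 0)], 5); // never read: dead immediately
}
```

Because the scan is forward and later reads simply overwrite earlier entries, the final map holds the maximum (latest) reader of each value.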
+ +Values that are never used by any other node have their last usage set to their own +node index (i.e., they are dead immediately after being produced). + +### `redirected_inputs`: Vec + +A sorted, deduplicated list of loop input indices that are simply forwarded unchanged +to the next iteration. This is an optimization hint for register allocation: if a loop +input is always passed through without modification, the register allocator can keep it +in the same register across all iterations, avoiding unnecessary copies at each loop +back-edge. + +## Algorithm + +The pass does a single forward traversal over the nodes in each block: + +1. **Forward scan:** For each node, iterate over its inputs. If an input references + another node's output, update `last_usage` for that output to the current node index. + Also initialize each node's own outputs with `last_usage = current_index` (dead by + default). + +2. **Recursive processing of loops:** When a `Loop` operation is encountered, the pass + recurses into the loop's sub-DAG. Before recursing, it sets up a control stack entry + to track input redirection. + +3. **Input redirection tracking:** For loop blocks, the pass tracks which inputs are + simply forwarded as-is to the next iteration. It does this by examining every break + instruction that targets the loop and checking whether each break input is a direct + reference to the corresponding loop input (node 0). The tracking accounts for nested + loops by mapping input indices through the control stack. + +## Control Stack + +The pass maintains a `VecDeque` to track nested loop contexts: + +- `is_input_redirected: Vec` — One flag per loop input, initially all `true`. + Set to `false` when any break to this loop provides a value other than the + corresponding input passed through unchanged. + +- `input_map: HashMap` — Maps input indices of the current loop to + output indices of the parent block's input node. 
This is needed to trace redirected + inputs through nested loops. For example, if loop input 2 comes from the parent + block's input 5, then `input_map[2] = 5`. + +## Design Notes + +- The liveness information is conservative (pessimistic): `last_usage` reflects the + last usage across *all* control flow paths, not just the path currently being taken. + A TODO in the code notes that per-path liveness could yield better register allocation. + +- A TODO also suggests that this pass could potentially be merged with register + allocation itself, using a bottom-up traversal similar to the WOM flattening pass. + +- The pass handles all break variants (`Br`, `BrIf`, `BrIfZero`, `BrTable`) when + checking input redirection. For conditional breaks, only the non-condition inputs + are checked. For `BrTable`, each target's input permutation is respected. diff --git a/src/loader/rwm/PIPELINE.md b/src/loader/rwm/PIPELINE.md new file mode 100644 index 0000000..c9faf3e --- /dev/null +++ b/src/loader/rwm/PIPELINE.md @@ -0,0 +1,81 @@ +# RWM Pipeline Overview + +The read-write registers (RWM) pipeline converts a blockless DAG into a linear +sequence of assembly-like directives for machines with standard read-write registers. + +## Stages + +``` +BlocklessDag (from common pipeline) + │ + ▼ +LivenessDag liveness_dag.rs + │ Annotates each value with its last usage, and + │ detects loop inputs that are forwarded unchanged. + │ + ▼ +RegisterAllocatedDag register_allocation/ + │ Assigns concrete register numbers to all values, + │ using liveness to reuse registers and heuristics + │ to minimize copies. + │ + ▼ +PlainFlatAsm flattening/ + │ Linearizes the DAG into directives, emitting + │ copies where register assignments don't match, + │ and handling control flow, calls, and jumps. + │ + ▼ +DumbJumpOptFlatAsm ../wom/dumb_jump_removal.rs + Removes unconditional jumps to the immediately + following label. 
+``` + +## Detailed Documentation + +Each pass has its own documentation file: + +- **[LIVENESS_ANALYSIS.md](LIVENESS_ANALYSIS.md)** — Forward analysis computing + last-usage information and loop input redirection detection. + +- **[REGISTER_ALLOCATION.md](REGISTER_ALLOCATION.md)** — Bottom-up optimistic + register allocation with hint-based placement and occupation tracking. + +- **[FLATTENING.md](FLATTENING.md)** — DAG linearization, parallel copy sequencing, + and directive emission through the Settings trait. + +- **[JUMP_REMOVAL.md](JUMP_REMOVAL.md)** — Peephole pass removing redundant + unconditional jumps. + +- **[CALLING_CONVENTION.md](CALLING_CONVENTION.md)** — Frame layout and calling + convention for stacked read-write registers. + +## Key Design Decisions + +### Bottom-Up Register Allocation + +The allocation runs in reverse node order. This means that by the time a value is +allocated, we already know where its consumers want it. The allocator can then +propose ("hint") register placements that align with consumer expectations, avoiding +copies. This is the main source of optimization in the pipeline. + +### Nested Occupation Tracking + +Loops create a nested scope for register allocation. The parent's occupied registers +are inherited as blocked ranges in the child tracker. After the loop body is +processed, registers used internally by the loop are projected back to the parent as +blocked, preventing the parent from placing long-lived values in registers that the +loop would overwrite. + +### Parallel Copy Resolution + +When multiple values need to move simultaneously (loop back-edges, function calls), +the flattening pass uses a graph-based algorithm to find a valid sequential ordering. +It handles trees with topological sorting, and breaks cycles with at most one +temporary register. + +### Separation of Concerns + +The pipeline cleanly separates liveness analysis, register allocation, and code +emission into distinct passes. 
This makes each pass simpler and easier to test in +isolation, at the cost of one extra traversal compared to a fused approach. diff --git a/src/loader/rwm/REGISTER_ALLOCATION.md b/src/loader/rwm/REGISTER_ALLOCATION.md new file mode 100644 index 0000000..ff93614 --- /dev/null +++ b/src/loader/rwm/REGISTER_ALLOCATION.md @@ -0,0 +1,133 @@ +# Register Allocation Pass + +**Source:** `register_allocation/mod.rs`, `register_allocation/occupation_tracker.rs` + +**Input:** `LivenessDag` (DAG annotated with liveness information) +**Output:** `AllocatedDag` (same DAG structure, annotated with `Allocation` data per block) + +## Purpose + +This pass assigns concrete register numbers to every value in the DAG. It uses the +liveness information from the previous pass to reuse registers once their values are +no longer needed. The allocator also applies heuristics to minimize the number of +register-to-register copies that the flattening pass will need to emit. + +## Algorithm Overview + +The allocation is done **bottom-up** (from the last node to the first), following +execution paths independently. This reverse traversal is key: by the time we allocate +a value, we already know where its consumers expect it to be, allowing us to propose +register assignments that avoid copies. + +The main function is `optimistic_allocation`, which: + +1. Fixes the function input registers at positions 0, 1, 2, ... (tightly packed + according to word count per type). +2. Reserves space for the return address (RA) and frame pointer (FP) after + `MAX(input_words, output_words)`, per the calling convention. +3. Runs the recursive bottom-up allocation on all nodes. + +### Optimistic Allocation Strategy + +The allocator is called "optimistic" because it tries to place values at hinted +locations (where their consumers expect them), falling back to the first available +gap only when the hint is unavailable. This two-phase approach avoids copies when +possible while guaranteeing correctness. 
+
+For each node, processed in reverse order:
+
+- **Generic WASM operations:** Inputs and outputs are allocated wherever there is
+  space. No special hinting is needed.
+
+- **Function calls (Call/CallIndirect):** The allocator first determines the call
+  frame start (the first register after all currently occupied ones). Then it tries
+  to place each input at the exact register where the callee expects it, saving a
+  copy if successful.
+
+- **Labels:** Outputs are allocated at whatever position is available. Break
+  instructions targeting this label will try to match these positions.
+
+- **Breaks (Br/BrIf/BrIfZero):** For each break input, the allocator tries to
+  place it at the same register where the target (loop input, label output, or
+  function return slot) expects it. This is the main copy-saving mechanism.
+
+- **BrTable:** Each target's input permutation is processed like a regular break.
+  The selector input is allocated separately.
+
+- **Loops:** A child occupation tracker is created, inheriting the parent's blocked
+  registers. Two heuristics minimize copies for loop inputs:
+  1. If the loop input is the last usage of a value in the outer scope, reuse its
+     register for the loop input.
+  2. If the loop input is a "redirected input" (forwarded unchanged across
+     iterations, as detected by the liveness pass), force the same register
+     allocation, always saving a copy.
+
+## Occupation Tracker
+
+The `OccupationTracker` (`occupation_tracker.rs`) is the core data structure that
+tracks which registers are occupied at each point in the program.
+
+### Data Model
+
+It maintains an `IntervalMap` that maps **liveness ranges** (expressed
+as node index ranges) to allocation entries. Each entry records:
+
+- **`AllocationType`**: What kind of allocation it is:
+  - `Value(ValueOrigin)` — A value produced by a DAG node (identified by its
+    `ValueOrigin`).
+  - `FunctionFrame` — Space reserved for a callee's frame during a function call.
+ - `SubBlockInternal` — Registers used inside a loop body, blocked at the parent level. + - `BlockedRegistersAtParent` — Parent-level registers inherited by a sub-tracker. + - `ExplicitlyBlocked` — Reserved registers (e.g., RA/FP slots). + +- **`reg_range: Range`** — The register range this allocation occupies. + +### Key Operations + +- **`try_allocate(origin, size)`**: Allocates a value at the first available gap. + Returns `None` if already allocated. + +- **`try_allocate_with_hint(origin, hint)`**: Tries to place a value at a specific + register range. If the hint is occupied, falls back to first-fit. Returns whether + the hint was used. + +- **`set_allocation(origin, range)`**: Forces an allocation at a specific range + (used for fixed positions like function inputs). + +- **`reserve_range(range)`**: Permanently blocks a register range (used for RA/FP). + +- **`allocate_fn_call(call_index, output_sizes)`**: Reserves a function call frame + starting after all currently occupied registers. Allocates unused outputs at their + natural positions on the frame. + +- **`make_sub_tracker(sub_block_index, sub_liveness)`**: Creates a child tracker + for a loop body, with all registers occupied at the loop's node index blocked. + +- **`project_from_sub_tracker(sub_block_index, sub_tracker)`**: After processing a + loop, blocks the registers that the loop body used internally, preventing the + parent from overwriting them with values that need to survive across the loop. + +### Gap-Finding Algorithm + +When a hint is not available, `allocate_where_possible` uses a simple first-fit +strategy: it consolidates all occupied ranges into sorted, non-overlapping intervals, +then scans for the first gap large enough to fit the requested size. If no gap exists, +it allocates at the end. + +## Statistics + +The pass tracks `register_copies_saved`: the total number of word-level copies that +were avoided by successfully placing values at their hinted locations. 
This metric is aggregated across all functions and reported at the end of compilation.
+
+## Output
+
+The `Allocation` struct stored in the output `AllocatedDag` contains:
+
+- `nodes_outputs: BTreeMap` — The concrete register range
+  assigned to each value.
+- `occupation: Occupation` — The full register occupation map, used by the
+  flattening pass to find free registers for temporary allocations.
+- `labels: HashMap` — Maps label IDs to their node indices, for quick
+  lookup when processing breaks.
+- `call_frames: HashMap` — Maps call node indices to the start register
+  of their callee frame.
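The first-fit gap search performed by `allocate_where_possible` can be sketched as follows. This is a simplification: it works over plain register ranges, whereas the real tracker queries the liveness-keyed interval map for the ranges occupied at a given node:

```rust
use std::ops::Range;

/// Sketch of the first-fit gap search (simplified; not the crate's code).
fn first_fit(occupied: &[Range<u32>], size: u32) -> Range<u32> {
    // Consolidate into sorted, non-overlapping intervals.
    let mut sorted = occupied.to_vec();
    sorted.sort_by_key(|r| r.start);
    let mut merged: Vec<Range<u32>> = Vec::new();
    for r in sorted {
        match merged.last_mut() {
            Some(last) if r.start <= last.end => last.end = last.end.max(r.end),
            _ => merged.push(r),
        }
    }
    // Scan for the first gap large enough; otherwise allocate at the end.
    let mut candidate = 0u32;
    for r in &merged {
        if r.start - candidate >= size {
            break; // the gap [candidate, r.start) fits
        }
        candidate = r.end;
    }
    candidate..candidate + size
}

fn main() {
    let occupied = [0..2, 3..4, 6..10];
    assert_eq!(first_fit(&occupied, 1), 2..3);   // first gap that fits
    assert_eq!(first_fit(&occupied, 2), 4..6);   // gap at 2..3 is too small
    assert_eq!(first_fit(&occupied, 5), 10..15); // no gap: allocate past the end
}
```

Because `merged` is sorted and non-overlapping, `candidate` (the end of the previous interval) never exceeds the next interval's start, so the gap computation cannot underflow.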