120 changes: 120 additions & 0 deletions src/loader/passes/BLOCKLESS_DAG.md
@@ -0,0 +1,120 @@
# Blockless DAG Pass

**Source:** `blockless_dag.rs`

**Input:** `DanglingOptDag` (optimized DAG with nested blocks and loops)
**Output:** `BlocklessDag` (flat DAG with labels; only loops retain sub-DAGs)

## Purpose

This is the last common pipeline pass before the backend-specific stages. It
flattens the nested block structure into a linear sequence of nodes with labels
marking jump targets. After this pass, the only nesting that remains is for
loops — each loop still has its own sub-DAG, because loops represent a separate
"frame" with its own address space in the final output.

Non-loop blocks are fully inlined into their parent DAG, with their outputs
becoming labels that breaks can jump to. This makes the representation much
closer to assembly: a flat sequence of operations with forward-only jumps to
labels.

## Key Transformation

### Blocks Become Labels

A non-loop block in the input DAG:
```
Block {
    kind: Block,
    sub_dag: [Inputs, ..., Br(0, outputs)]
}
```

is inlined into the parent. The block's input node is suppressed (its outputs
are remapped to the corresponding inputs in the parent scope), and a `Label`
node is inserted where the block's outputs would be consumed. Break instructions
targeting the block become jumps to this label.

### Loops Remain Nested

Loop blocks keep their sub-DAG structure:
```
Loop {
    sub_dag: BlocklessDag { nodes: [...] },
    break_targets: [(depth, [target_types])]
}
```

The `break_targets` field records all the break targets that the loop body
uses, relative to the parent frame. This lets the backend know which external
labels/frames the loop may jump to.
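
The two rules above (inline non-loop blocks, keep loop bodies nested) can be pictured with a toy flattening routine. This is only a sketch under simplifying assumptions: `Nested`, `Flat`, and `flatten` are hypothetical names, not the types in `blockless_dag.rs`, and the sketch ignores break rewriting and input remapping.

```rust
/// Toy input: nested blocks and loops (not the real `DanglingOptDag`).
#[derive(Debug, Clone, PartialEq)]
enum Nested {
    Op(&'static str),
    Block(Vec<Nested>),
    Loop(Vec<Nested>),
}

/// Toy output: flat nodes; only loops keep a nested body.
#[derive(Debug, Clone, PartialEq)]
enum Flat {
    Op(&'static str),
    Label(u32),
    Loop(Vec<Flat>),
}

/// Flattens `input` into `out`, drawing label IDs from `next_label`.
fn flatten(input: &[Nested], next_label: &mut u32, out: &mut Vec<Flat>) {
    for node in input {
        match node {
            Nested::Op(name) => out.push(Flat::Op(*name)),
            Nested::Block(body) => {
                // Reserve the ID before visiting the body, so breaks inside
                // the body could already refer to it.
                let id = *next_label;
                *next_label += 1;
                // The block's contents are emitted directly into the parent.
                flatten(body, next_label, out);
                // The label sits where the block's outputs would be consumed.
                out.push(Flat::Label(id));
            }
            Nested::Loop(body) => {
                // A loop is a separate frame: its body gets its own list.
                let mut sub = Vec::new();
                flatten(body, next_label, &mut sub);
                out.push(Flat::Loop(sub));
            }
        }
    }
}
```

Note that a block nested inside a loop is still inlined, but into the loop's own list rather than the outer one.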

## Break Target Resolution

In the input DAG, break targets are relative depths into the block stack. In the
blockless DAG, targets are resolved into `BreakTarget { depth, kind }`:

- **`depth`**: The number of frame boundaries between the break and its target.
  Depth 0 means the current frame (the function body at the top level, or the
  enclosing loop body inside a loop). Inside a loop, depth 1 means the parent
  frame, depth 2 the grandparent, and so on.

- **`kind`**: Either `FunctionOrLoop` (targeting the function return or a loop's
next iteration) or `Label(id)` (targeting a specific label created from an
inlined block).

The key property: **jumps to labels are always forward** (labels appear after
the jumps that target them), while **jumps to loops go backward** (to the loop
header at the start of the loop's sub-DAG).
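
As a rough illustration of the resolution rule, the sketch below resolves a relative block depth against a stack of scopes. The `Scope` stack and the `resolve` function are assumptions made for this example; only the names `BreakTarget`, `FunctionOrLoop`, and `Label` come from the description above, and their real definitions may differ.

```rust
/// Hypothetical, simplified mirror of the resolved target described above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum TargetKind {
    FunctionOrLoop,
    Label(u32),
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct BreakTarget {
    depth: u32,
    kind: TargetKind,
}

/// One entry of the scope stack kept while flattening (assumed shape).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Scope {
    /// A function body or a loop body: starts a new frame.
    Frame,
    /// A non-loop block that was inlined and replaced by a label.
    Label(u32),
}

/// Resolves a relative block depth against `stack` (outermost scope first).
/// Returns `None` if the relative depth is out of range.
fn resolve(stack: &[Scope], relative_depth: usize) -> Option<BreakTarget> {
    let target_idx = stack.len().checked_sub(1)?.checked_sub(relative_depth)?;
    // Frame boundaries crossed = frames strictly inside the target scope.
    let depth = stack[target_idx + 1..]
        .iter()
        .filter(|s| matches!(s, Scope::Frame))
        .count() as u32;
    let kind = match stack[target_idx] {
        Scope::Frame => TargetKind::FunctionOrLoop,
        Scope::Label(id) => TargetKind::Label(id),
    };
    Some(BreakTarget { depth, kind })
}
```

With the stack `[Frame, Label(7), Frame, Label(9)]` (function, block, loop, block), relative depth 1 resolves to the loop itself at frame depth 0, while relative depth 2 resolves to label 7 at frame depth 1.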

## Example

Input DAG (with nested block):
```
Node 0: Inputs → [x]
Node 1: Block {
    kind: Block,
    sub_dag: [
        Node 0: Inputs → [x]
        Node 1: i32.const 10
        Node 2: i32.gt_s ← [(0,0), (1,0)]
        Node 3: br_if 0 ← [(0,0), (2,0)]   ;; exit block if x > 10
        Node 4: i32.const 0
        Node 5: br 1 ← [(4,0)]             ;; return 0
    ]
} → [result]
Node 2: br 0 ← [(1,0)]                     ;; return result
```

Output blockless DAG (flattened):
```
Node 0: Inputs → [x]
Node 1: i32.const 10
Node 2: i32.gt_s ← [(0,0), (1,0)]
Node 3: BrIf(Label(42)) ← [(0,0), (2,0)] ;; jump to label if x > 10
Node 4: i32.const 0
Node 5: Br(Function) ← [(4,0)] ;; return 0
Node 6: Label { id: 42 } → [result] ;; target for the br_if
Node 7: Br(Function) ← [(6,0)] ;; return result
```

The block's internal input node (its node 0) was suppressed and its references
were remapped to the parent's node 0. The block itself became a label node.

## Node Remapping

When blocks are inlined, node indices change. The pass maintains an
`outputs_map: HashMap<ValueOrigin, ValueOrigin>` that translates old
`(node, output)` pairs to new ones. For inlined block inputs, the map redirects
through the `input_mapping` to the actual source nodes in the parent.
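
A minimal sketch of this bookkeeping follows. The field names of `ValueOrigin` and the helpers `map_block_inputs` and `remap` are assumptions for illustration; the sketch also assumes one map per inlined block and that the block's Inputs node is node 0 of its sub-DAG, as in the example above.

```rust
use std::collections::HashMap;

/// Assumed shape: a value is identified by (node index, output index).
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
struct ValueOrigin {
    node: usize,
    output_idx: u32,
}

/// Records the mapping for an inlined block's Inputs node (node 0 of the
/// sub-DAG): output `i` is redirected to the i-th entry of `input_mapping`,
/// which already names a node in the flattened parent.
fn map_block_inputs(
    outputs_map: &mut HashMap<ValueOrigin, ValueOrigin>,
    input_mapping: &[ValueOrigin],
) {
    for (i, src) in input_mapping.iter().enumerate() {
        let old = ValueOrigin { node: 0, output_idx: i as u32 };
        outputs_map.insert(old, *src);
    }
}

/// Translates an old origin to its new location, if one was recorded.
fn remap(
    outputs_map: &HashMap<ValueOrigin, ValueOrigin>,
    old: ValueOrigin,
) -> Option<ValueOrigin> {
    outputs_map.get(&old).copied()
}
```

Because the suppressed Inputs node never gets an index of its own, every reference to it is rewritten to point at the value the parent passed in.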

## Design Notes

- Labels use unique IDs generated by a shared `AtomicU32` counter (the
`LabelGenerator`), ensuring uniqueness across all functions and all frames.

- The pass preserves the `NodeInput::Constant` variant, passing inline
constants through unchanged.

- Break target depths count frame boundaries (a function or a loop body), not
  block nesting levels. Non-loop blocks are inlined by this pass, so only
  frames remain as scopes, and the backends allocate registers per frame, not
  per block.
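
The first design note (unique label IDs from a shared atomic counter) can be sketched as follows. This is a hypothetical stand-in for the `LabelGenerator`; the field and the method name `next_label` are assumptions, not the real API.

```rust
use std::sync::atomic::{AtomicU32, Ordering};

/// Hypothetical stand-in for the `LabelGenerator` mentioned above.
struct LabelGenerator {
    next: AtomicU32,
}

impl LabelGenerator {
    fn new() -> Self {
        Self { next: AtomicU32::new(0) }
    }

    /// Returns a fresh ID. Taking `&self` lets one generator be shared
    /// across functions and frames without a mutable borrow.
    fn next_label(&self) -> u32 {
        // `fetch_add` returns the previous value and increments atomically,
        // so two callers never observe the same ID.
        self.next.fetch_add(1, Ordering::Relaxed)
    }
}
```

`Ordering::Relaxed` is enough in this sketch because only the uniqueness of the returned values matters, not their ordering relative to other memory operations.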
Comment on lines +118 to +120
Member

Don't understand this paragraph. I suggest removing it.

131 changes: 131 additions & 0 deletions src/loader/passes/BLOCK_TREE.md
@@ -0,0 +1,131 @@
# Block Tree Pass

**Source:** `block_tree.rs`

**Input:** Raw WASM function bytecode (`Unparsed`)
**Output:** `BlockTree` (tree of `Block` and `Instruction` elements)

## Purpose

This is the first pass in the compilation pipeline. It takes the raw stream of
WASM operators and parses them into a tree structure where control flow is
represented by nested blocks and loops, and instructions within each block form
a linear sequence.

The pass also normalizes several WASM patterns into simpler, more uniform
representations that are easier for subsequent passes to handle.

## Normalizations

### If-Else to Block + BrIf

WASM's `if-else-end` construct is desugared into blocks with conditional
breaks. This reduces the number of control flow constructs that later passes
need to handle.

**If without else:**
```
;; Original WASM        ;; Normalized BlockTree
if                      block (params..., i32) -> (results...)
  <if_body>               br_if_zero 0     ;; skip if_body when false
end                       <if_body>
                        end
```

**If with else:**
```
;; Original WASM        ;; Normalized BlockTree
if                      block (params..., i32) -> (results...)
  <if_body>               block (params..., i32) -> (params...)
else                        br_if 0        ;; skip else_body when true
  <else_body>               <else_body>
end                         br 1           ;; skip if_body
                          end
                          <if_body>
                        end
```

The condition value is carried as an extra block input and consumed by the
conditional break at the top.
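
The two layouts above can be sketched as a small tree rewrite. `Elem` and `desugar_if` are hypothetical names for this example; the real `Element` type also carries WASM operators and block interface types, which are omitted here.

```rust
/// Toy element type; block interface types are left out.
#[derive(Debug, Clone, PartialEq)]
enum Elem {
    Op(&'static str),
    Br(u32),
    BrIf(u32),
    BrIfZero(u32),
    Block(Vec<Elem>),
}

/// Desugars `if <if_body> [else <else_body>] end` following the layouts above.
fn desugar_if(if_body: Vec<Elem>, else_body: Option<Vec<Elem>>) -> Elem {
    match else_body {
        None => {
            // Skip the body when the condition is zero.
            let mut outer = vec![Elem::BrIfZero(0)];
            outer.extend(if_body);
            Elem::Block(outer)
        }
        Some(else_body) => {
            // Inner block: when the condition is true, leave the inner block
            // and land on the if-body; otherwise run the else-body and then
            // skip the if-body with `br 1`.
            let mut inner = vec![Elem::BrIf(0)];
            inner.extend(else_body);
            inner.push(Elem::Br(1));
            let mut outer = vec![Elem::Block(inner)];
            outer.extend(if_body);
            Elem::Block(outer)
        }
    }
}
```

In the if-else case the else-body comes first in the output, which is why the conditional break is a plain `br_if` rather than a `br_if_zero`.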

### Return to Br

WASM `return` is converted to a `br` targeting the outermost block (the
function body). This makes the function body just another block, simplifying
break handling.

```
;; Original             ;; Normalized
return                  br <function_depth>
```

### Explicit Fallthrough Breaks

Every block that can fall through gets an explicit `br 0` appended. This
guarantees that all blocks are exited via a break instruction, which simplifies
the locals data flow pass (it can assume all values leave blocks through break
inputs).

```
;; Original             ;; Normalized
block                   block
  i32.const 42            i32.const 42
end                       br 0             ;; explicit fallthrough
                        end
```
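
A minimal sketch of this normalization, assuming dead code has already been removed so that only the last instruction decides whether the block can fall through. `Instr` and `add_fallthrough_break` are hypothetical names, and the case of a trailing non-fallthrough loop is not modeled.

```rust
/// Toy instruction type for illustration only.
#[derive(Debug, Clone, PartialEq)]
enum Instr {
    Op(&'static str),
    Br(u32),
    BrTable,
    Unreachable,
}

/// Appends `br 0` unless the body already ends in an instruction that
/// unconditionally diverts control flow.
fn add_fallthrough_break(body: &mut Vec<Instr>) {
    let diverts = matches!(
        body.last(),
        Some(Instr::Br(_) | Instr::BrTable | Instr::Unreachable)
    );
    if !diverts {
        body.push(Instr::Br(0));
    }
}
```

An empty body also falls through, so it receives a lone `br 0`.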

### Loop Wrapping

When a loop can fall through (i.e., it doesn't always branch back to the loop
header or exit via a break), an outer block is added around it. The fallthrough
becomes a break to the outer block. This ensures loops are only exited through
breaks.

```
;; Original             ;; Normalized
loop                    block -> (results...)
  <body>                  loop (params...)
end                         <body>
                            br 1           ;; exit to outer block
                          end
                        end
```

### Dead Code Removal

After any instruction that unconditionally diverts control flow (`br`,
`br_table`, `unreachable`, or a non-fallthrough loop), all subsequent
instructions up to the next `end` or `else` are discarded.

```
;; Original             ;; Normalized
br 0                    br 0
i32.const 1             ;; dead code removed
i32.add                 ;; dead code removed
```
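
For a single block body that has already been split at `end`/`else`, the rule amounts to truncating the sequence after the first unconditional diversion. The sketch below uses hypothetical `Instr` and `strip_dead_code` names and does not model the non-fallthrough loop case.

```rust
/// Toy instruction type for illustration only.
#[derive(Debug, Clone, PartialEq)]
enum Instr {
    Op(&'static str),
    Br(u32),
    BrIf(u32),
    BrTable,
    Unreachable,
}

/// True for instructions after which control never continues in sequence.
fn diverts(instr: &Instr) -> bool {
    matches!(instr, Instr::Br(_) | Instr::BrTable | Instr::Unreachable)
}

/// Keeps everything up to and including the first diverting instruction.
fn strip_dead_code(seq: Vec<Instr>) -> Vec<Instr> {
    let mut out = Vec::new();
    for instr in seq {
        let stop = diverts(&instr);
        out.push(instr);
        if stop {
            break; // everything after this point is unreachable
        }
    }
    out
}
```

A conditional `br_if` does not end the sequence, since control can continue past it.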

### Constant Global Inlining

`global.get` on immutable globals is replaced with the global's constant
initializer. This is done early because it enables the downstream constant
optimization passes to work with these values.

```
;; Original (global 0 is immutable, initialized to 42)
global.get 0 ;; Normalized: i32.const 42
```
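
A minimal sketch of the replacement, restricted to `i32` initializers for brevity. `Instr`, `Global`, and `inline_const_globals` are hypothetical names; the real pass works on WASM operators and the module's global section.

```rust
/// Toy instruction type for illustration only.
#[derive(Debug, Clone, PartialEq)]
enum Instr {
    GlobalGet(u32),
    I32Const(i32),
    Other(&'static str),
}

/// Hypothetical global description: only i32 constant initializers are modeled.
struct Global {
    mutable: bool,
    init: i32,
}

/// Replaces `global.get` on immutable globals with the initializer value.
fn inline_const_globals(code: &mut [Instr], globals: &[Global]) {
    for instr in code.iter_mut() {
        if let Instr::GlobalGet(idx) = *instr {
            if let Some(global) = globals.get(idx as usize) {
                if !global.mutable {
                    *instr = Instr::I32Const(global.init);
                }
            }
        }
    }
}
```

Reads of mutable globals are left untouched, because their value can change at run time.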

## Output Structure

The output `BlockTree` is a `Vec<Element>` where each `Element` is either:

- **`Instruction`**: A WASM operator, a `BrIfZero`, or a `BrTable`.
- **`Block`**: A nested block containing:
  - `block_kind`: `Block` or `Loop`
  - `interface_type`: The block's input and output types
  - `elements`: The block's contents (recursively)
  - `input_locals`, `output_locals`, `carried_locals`: Initially empty; filled
    by the next pass

At this stage, all blocks have well-defined stack-level interfaces (params and
results), but local variable flow is still implicit.
73 changes: 73 additions & 0 deletions src/loader/passes/CONST_COLLAPSE.md
@@ -0,0 +1,73 @@
# Constant Collapse Pass

**Source:** `dag/const_collapse.rs`

**Input:** `PlainDag` (the DAG after construction)
**Output:** `ConstCollapsedDag` (same DAG, with some constant references replaced by inline constants)

## Purpose

This optional optimization pass identifies constant values that can be folded
into the instructions that consume them, eliminating the need for a separate
register to hold the constant. This is driven by the target ISA: if the ISA
supports immediate operands on certain instructions (e.g., RISC-V's `addi`),
the constant can be inlined directly.

## How It Works

The pass is gated by `Settings::get_const_collapse_processor()`. If the ISA
implementor returns `None`, no collapsing is performed and the DAG passes
through unchanged.

If a processor function is provided, the pass walks every `WASMOp` node in the
DAG and checks whether any of its inputs reference constant nodes. For each
such node, it calls the processor with the operator and a slice of
`MaybeConstant` values describing each input:

- **`NonConstant`**: The input is not a constant.
- **`ReferenceConstant { value, must_collapse }`**: The input references a
constant node with a known value. The processor can set `must_collapse` to
`true` to indicate the constant should be inlined.
- **`CollapsedConstant(value)`**: The input is already an inline constant
(from a previous pass; not expected in the default pipeline).

When `must_collapse` is set to `true`, the pass replaces the `NodeInput::Reference`
with a `NodeInput::Constant`, severing the dependency on the constant node.
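
The interaction between the processor and the pass can be sketched as below. All types here are simplified stand-ins (constants are plain `i64` values, `Op` has two variants), and the function signatures are assumptions; the real processor interface returned by `Settings::get_const_collapse_processor()` may look different. The 12-bit signed range matches the immediate field of RISC-V's `addi`.

```rust
/// Simplified stand-in for the `MaybeConstant` values described above.
#[derive(Debug, Clone, PartialEq)]
enum MaybeConstant {
    NonConstant,
    ReferenceConstant { value: i64, must_collapse: bool },
    CollapsedConstant(i64),
}

/// Simplified stand-in for a node input.
#[derive(Debug, Clone, PartialEq)]
enum NodeInput {
    Reference { node: usize, output_idx: u32 },
    Constant(i64),
}

#[derive(Debug, Clone, Copy, PartialEq)]
enum Op {
    I32Add,
    I32Mul,
}

/// Example processor for a hypothetical ISA with an add-immediate
/// instruction: collapse the second operand of `i32.add` if it fits in a
/// 12-bit signed immediate.
fn example_processor(op: Op, inputs: &mut [MaybeConstant]) {
    if op != Op::I32Add {
        return;
    }
    if let Some(MaybeConstant::ReferenceConstant { value, must_collapse }) = inputs.get_mut(1) {
        if *value >= -2048 && *value <= 2047 {
            *must_collapse = true;
        }
    }
}

/// Pass side: replaces flagged references with inline constants and returns
/// how many inputs were collapsed.
fn apply_collapse(inputs: &mut [NodeInput], flags: &[MaybeConstant]) -> usize {
    let mut collapsed = 0;
    for (input, flag) in inputs.iter_mut().zip(flags) {
        if let MaybeConstant::ReferenceConstant { value, must_collapse: true } = flag {
            *input = NodeInput::Constant(*value);
            collapsed += 1;
        }
    }
    collapsed
}
```

Splitting the decision (processor) from the rewrite (pass) keeps ISA knowledge out of the DAG code: the processor only flips flags, and the pass does the actual input replacement.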

## Example

Before collapse:
```
Node 0: Inputs → [x]
Node 1: i32.const 5 → [5]
Node 2: i32.add ← [(0,0), (1,0)] → [result]
```

If the ISA processor recognizes that `i32.add` with a constant second operand
can become an "add immediate" instruction, it sets `must_collapse = true` for
input 1. After collapse:

```
Node 0: Inputs → [x]
Node 1: i32.const 5 → [5] (may now be unused)
Node 2: i32.add ← [(0,0), Constant(5)] → [result]
```

Node 1 is now potentially dangling (no references to it). The dangling removal
pass will clean it up later.

## Recursion Into Blocks

The pass recurses into block sub-DAGs. For non-loop blocks, it propagates
knowledge of which block inputs are constants, so that constants flowing through
block boundaries can also be collapsed inside the block.

For loops, constant inputs are **not** propagated, because a loop input might be
constant on the first iteration but different on subsequent iterations (it could
be updated by a break back to the loop header). In practice, optimized WASM
rarely has constant loop inputs anyway.

## Statistics

The pass returns the total count of collapsed constants, which is aggregated in
`Statistics::constants_collapsed`.