format: add transform context for compiler optimizations #212

Open
gnidan wants to merge 3 commits into transform-context from architect-transform-context

gnidan commented Apr 16, 2026

Summary

Adds a new context type, transform, annotating instructions
with the compiler transformations that produced them. The value
is an array of short identifiers; the list may repeat the same
identifier when the transformation has been applied multiple
times (e.g., ["inline", "inline"] for doubly-inlined code).

Design

Role: additional annotation. A transform context does not
replace semantic contexts. When the compiler inlines a function,
the caller's debug info still carries invoke/return contexts
naming the inlined callee at the call boundary — so the
debugger's logical call stack reflects the source-level
structure. The transform context is additional information
telling the debugger how the call was realized.

Consumers that ignore transform contexts get a sound
source-level view from the invoke/return contexts alone.
Consumers that understand them can offer optimization-aware
presentations (collapsible inlined blocks, TCO-aware call
stacks, collapsed coalesce sequences, etc.).
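
As a rough illustration of the two kinds of consumer described above, the sketch below labels a call-stack frame with and without looking at the transform context. The `SketchContext` shape and both function names are hypothetical simplifications for this comment, not types from the format package.

```typescript
// Hypothetical, simplified context shape: only the keys discussed here.
interface SketchContext {
  invoke?: { identifier: string };
  return?: { identifier: string };
  transform?: string[];
}

// A consumer that ignores transform contexts: source-level view only.
function plainFrameLabel(context: SketchContext): string | undefined {
  return context.invoke?.identifier;
}

// A consumer that understands transform contexts adds a presentation hint
// on top of the same source-level name.
function annotatedFrameLabel(context: SketchContext): string | undefined {
  const name = context.invoke?.identifier;
  if (name === undefined) {
    return undefined;
  }
  const transforms = context.transform ?? [];
  if (transforms.indexOf("inline") !== -1) {
    return `${name} (inlined)`;
  }
  if (transforms.indexOf("tailcall") !== -1) {
    return `${name} (tail call)`;
  }
  return name;
}

// Example: an invoke for an inlined callee (the name "add" is made up).
const inlinedCall: SketchContext = {
  invoke: { identifier: "add" },
  transform: ["inline"],
};
```

Both functions agree on the callee name; the second only adds a hint, which is the sense in which the transform context is additional rather than a replacement.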

v1 identifiers (based on bugc optimizer's audit):

  • "inline" — marked instruction is part of an inlined function
    body; surrounding invoke/return contexts name the inlined
    callee.
  • "tailcall" — marked instruction is a tail-call-optimized
    back-edge JUMP or continuation, where the call was realized
    without pushing/popping a full activation.
  • "fold" — marked instruction carries the result of a
    compile-time constant fold (typically a PUSH replacing a
    compute sequence from source).
  • "coalesce" — marked instruction is part of a read-write
    merging sequence (e.g., SHL/OR packing narrower fields into a
    wider word) the user did not explicitly write.

"propagate" was considered for v1 and deferred as borderline.

The identifier set is extensible. Debuggers unfamiliar with a
given identifier should preserve it as an opaque label. Order
in the array is not semantically significant — the multiset is
what matters.
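
A minimal sketch of how a debugger might honor these rules: unknown identifiers pass through unchanged, and two transform arrays compare equal whenever they hold the same identifiers with the same repetition counts. The function names and the display labels are assumptions made for illustration only.

```typescript
// The four v1 identifiers from the list above.
type KnownTransform = "inline" | "tailcall" | "fold" | "coalesce";

// Display labels are illustrative wording, not part of the format.
const knownLabels: Record<KnownTransform, string> = {
  inline: "inlined function body",
  tailcall: "tail-call-optimized jump",
  fold: "constant-folded value",
  coalesce: "merged read/write sequence",
};

function isKnownTransform(id: string): id is KnownTransform {
  return Object.prototype.hasOwnProperty.call(knownLabels, id);
}

// Unknown identifiers are preserved as opaque labels.
function labelFor(id: string): string {
  return isKnownTransform(id) ? knownLabels[id] : id;
}

// How many times an identifier appears, e.g. inlining depth.
function occurrences(ids: readonly string[], id: string): number {
  return ids.filter((candidate) => candidate === id).length;
}

// Multiset equality: order is ignored, repetition is not.
function sameTransforms(a: readonly string[], b: readonly string[]): boolean {
  if (a.length !== b.length) {
    return false;
  }
  const sortedA = [...a].sort();
  const sortedB = [...b].sort();
  return sortedA.every((id, index) => id === sortedB[index]);
}
```

Sorting copies of both arrays keeps the comparison short; a counting map would work equally well for long lists.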

Composing with other contexts

For instructions carrying transform alongside other contexts,
use gather. A TCO back-edge JUMP typically combines three
facts:

gather:
  - return: { identifier: "fact", declaration: { ... } }
  - invoke: { jump: true, identifier: "fact", target: { ... } }
  - transform: ["tailcall"]

The return and invoke state the source-level facts; the
transform explains how the compiler realized that pair as a
single JUMP.
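
For a consumer, one way to read the transform out of a context like this is to walk any gather entries and collect the identifiers found along the way. The sketch below assumes a simplified, hypothetical `SketchContext` shape and leaves out the elided `declaration` and `target` fields.

```typescript
// Hypothetical, simplified context shape; real contexts carry more keys.
interface SketchContext {
  gather?: SketchContext[];
  transform?: string[];
  [key: string]: unknown;
}

// Collect transform identifiers from a context and any nested gather entries.
function collectTransforms(context: SketchContext): string[] {
  const result: string[] = [...(context.transform ?? [])];
  for (const child of context.gather ?? []) {
    result.push(...collectTransforms(child));
  }
  return result;
}

// The TCO back-edge example, with elided fields omitted.
const tcoBackEdge: SketchContext = {
  gather: [
    { return: { identifier: "fact" } },
    { invoke: { jump: true, identifier: "fact" } },
    { transform: ["tailcall"] },
  ],
};

const found = collectTransforms(tcoBackEdge);
```

Because the walk also checks the top-level `transform` key, the same helper works when the keys sit side by side on one context object instead of inside gather.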

Changes

  • schemas/program/context/transform.schema.yaml: new schema.
  • schemas/program/context.schema.yaml: wire the new schema into
    the if/$ref union.
  • packages/format/src/types/program/context.ts:
    Context.Transform interface, isTransform guard, and
    Transform.Identifier union
    ("inline" | "tailcall" | "fold" | "coalesce" | (string & {})),
    preserving autocomplete for known values while allowing
    compiler-emitted extensions.
  • packages/format/src/types/program/context.test.ts: register
    Context.isTransform in the schema guard test harness.
  • packages/web/spec/program/context/transform.mdx: spec page
    covering role, v1 identifiers (with EVM-level examples for
    each), repetition/composition, and interaction with gather.
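
For readers who have not seen the `(string & {})` pattern, here is a rough sketch of what the type and guard could look like. It is not the code from context.ts: the flattened names `TransformIdentifier`, `TransformContext`, and `isTransformContext` are hypothetical stand-ins, and the guard checks only the basic shape, not any extra constraints the schema may impose.

```typescript
// Known literals stay visible to editor autocomplete, while `string & {}`
// keeps the union from collapsing to plain `string`, so any other
// compiler-emitted identifier is still accepted.
type TransformIdentifier =
  | "inline"
  | "tailcall"
  | "fold"
  | "coalesce"
  | (string & {});

interface TransformContext {
  transform: TransformIdentifier[];
}

// Shape-only guard: an object whose `transform` key is an array of strings.
function isTransformContext(value: unknown): value is TransformContext {
  if (typeof value !== "object" || value === null) {
    return false;
  }
  const candidate = (value as { transform?: unknown }).transform;
  return (
    Array.isArray(candidate) &&
    candidate.every((item) => typeof item === "string")
  );
}

// Both a known identifier and an extension type-check.
const doublyInlined: TransformContext = { transform: ["inline", "inline"] };
const extended: TransformContext = { transform: ["vendor-specific-pass"] };
```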

Test plan

  • yarn build passes
  • yarn test passes (new: transform guard test + schema
    validity/examples suites auto-pick up the new schema)
  • yarn lint clean (only pre-existing warnings)
  • Schema guard test exercises all 8 examples in
    transform.schema.yaml covering single identifiers and
    mixed combinations

gnidan added 2 commits April 16, 2026 03:51

Adds a new context type annotating instructions with the
compiler transformations that produced them. The value is an
array of short identifiers; the list may repeat the same
identifier when the transformation has been applied multiple
times (e.g., ["inline", "inline"] for doubly-inlined code).

Transform is *additional* annotation. The invoke/return contexts
for the logical call are still emitted at the call boundary so
debuggers see the source-level call stack; the transform context
tells debuggers how the call was physically realized. Consumers
that ignore transform contexts get a sound source-level view
from the semantic contexts alone.

v1 identifiers:
  - "inline": marked instruction is part of an inlined function
    body; surrounding invoke/return contexts name the inlined
    callee.
  - "tailcall": marked instruction is a tail-call-optimized
    back-edge JUMP or continuation, where the call was realized
    without pushing/popping a full activation.

The identifier set is extensible. Debuggers unfamiliar with a
given identifier should preserve it as an opaque label. Order
in the array is not semantically significant — the multiset is
what matters.

Unblocks the final shape of TCO back-edge annotations in
bugc (#210): a tail-call-optimized JUMP can now carry
`gather: [return, invoke, transform: ["tailcall"]]`.

Includes:
- schemas/program/context/transform.schema.yaml
- schemas/program/context.schema.yaml: wire into the if/$ref
  union.
- packages/format/src/types/program/context.ts: Context.Transform
  interface, isTransform guard, and Transform.Identifier union
  preserving autocomplete for known values.
- packages/format/src/types/program/context.test.ts: register
  Context.isTransform with the schema guard test harness.
- packages/web/spec/program/context/transform.mdx: spec page
  covering role, v1 identifiers, repetition/composition, and
  interaction with gather.

Adds two more identifiers to the v1 transform context
vocabulary, based on bugc optimizer's audit of transformations
the compiler currently performs or will perform:

  - "fold" — compile-time constant folding. The marked
    instruction carries the result (typically a PUSH) replacing
    a compute sequence that appeared in source.
  - "coalesce" — read-write merging. The marked instruction is
    part of a SHL/OR sequence (or similar) introduced by the
    compiler to combine adjacent source-level reads or writes,
    such as packing narrower fields into a single storage slot.

Together with the previously-defined "inline" and "tailcall",
this covers the four transformations bugc emits today or will
emit in the near term (inline once a function inlining pass
lands). Propagate was considered for v1 and deferred as
borderline.

Updates:
- transform.schema.yaml: description enumerates the four v1
  identifiers; examples include single-identifier cases for
  each plus combinations ["inline", "fold"], ["coalesce",
  "coalesce"].
- context.ts: Transform.Identifier union extended with "fold"
  and "coalesce" (still keeps `string & {}` for extensibility
  and autocomplete).
- transform.mdx: subsection for each identifier with a concrete
  EVM-level example, updated repetition/composition section
  with new combinations.

github-actions bot commented Apr 16, 2026

PR Preview Action v1.8.1

View preview at
https://ethdebug.github.io/format/pr-preview/pr-212/

Built to branch gh-pages at 2026-04-16 09:10 UTC.
Preview will be ready when the GitHub Pages deployment is complete.

gnidan changed the base branch from main to transform-context April 16, 2026 08:05

The context schema's discriminator keys combine via allOf of
if/then rules, so a single context object can carry multiple
keys at once (e.g., `invoke`, `return`, and `transform` all
side by side). Use gather only when two contexts would collide
on the same key.
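
The rule above can be sketched as a small helper: merge two contexts into one flat object when their keys are disjoint, and fall back to gather only on a collision. `SketchContext`, `collidingKeys`, and `compose` are hypothetical names for illustration, and the example values are heavily simplified.

```typescript
// Hypothetical, simplified context: an object keyed by context type.
type SketchContext = Record<string, unknown>;

// Keys present on both contexts.
function collidingKeys(a: SketchContext, b: SketchContext): string[] {
  return Object.keys(a).filter((key) =>
    Object.prototype.hasOwnProperty.call(b, key),
  );
}

// Prefer the flat form; use gather only when a key would collide.
function compose(a: SketchContext, b: SketchContext): SketchContext {
  if (collidingKeys(a, b).length === 0) {
    return { ...a, ...b };
  }
  return { gather: [a, b] };
}

// Flat: invoke, return, and transform sit side by side.
const callFacts: SketchContext = {
  return: { identifier: "fact" },
  invoke: { jump: true, identifier: "fact" },
};
const tailcall: SketchContext = { transform: ["tailcall"] };
const flat = compose(callFacts, tailcall);

// Collision: two variables blocks need gather (contents are made up).
const outerVariables: SketchContext = { variables: [{ identifier: "x" }] };
const innerVariables: SketchContext = { variables: [{ identifier: "y" }] };
const gathered = compose(outerVariables, innerVariables);
```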

- transform spec: switch the TCO back-edge example from gather
  to the flat form; revise the tailcall bullet accordingly
- transform schema: note in the description that flat
  composition is preferred; gather is for key collisions
- gather spec: add a "When to use" section flagging the flat
  form as the default and listing the canonical collision
  cases (multiple frames, multiple variables blocks)

gnidan added a commit that referenced this pull request Apr 16, 2026

Pair with #212's flat-form guidance: when an inlined body's
first instruction carries both an invoke and a transform,
those belong as sibling keys on a single context; gather
isn't needed because `invoke` and `transform` don't collide.