diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml index 3f831bf7c..31d5fb0c1 100644 --- a/.github/workflows/ci.yml +++ b/.github/workflows/ci.yml @@ -211,6 +211,31 @@ jobs: fail_ci_if_error: false token: ${{ secrets.CODECOV_TOKEN }} + bc-lower-coverage: + name: bc-lowering coverage (@target==:bc arms) + runs-on: ubuntu-latest + timeout-minutes: 20 + env: + COVERAGE: "1" + steps: + - uses: actions/checkout@v4 + - uses: ruby/setup-ruby@v1 + with: + ruby-version: ${{ env.RUBY_VERSION }} + bundler-cache: true + # Pure Ruby: re-lowers the existing .cht corpus with target: :bc + # to cover the @target==:bc lowering arms. No Zig / no clear build + # (lowering runs before the bytecode VM; the incomplete _bc_runner + # is never executed). + - run: bundle exec ruby tools/bc_lower_coverage.rb + - run: bundle exec ruby spec/collate_coverage.rb + - uses: codecov/codecov-action@v5 + with: + files: ./coverage/coverage.xml + flags: ruby,bc-lower + fail_ci_if_error: false + token: ${{ secrets.CODECOV_TOKEN }} + module-integration: name: transpile-tests/module-integration (zig build test) runs-on: ubuntu-latest diff --git a/docs/agents/deslop-bugs.md b/docs/agents/deslop-bugs.md new file mode 100644 index 000000000..3e852409d --- /dev/null +++ b/docs/agents/deslop-bugs.md @@ -0,0 +1,280 @@ +# deslop-bugs + +Findings from the nil-kill / SlopCop complexity-reduction pass +(tracker items #45-#64). Records CLEAR transpiler bugs encountered and +methodological findings. + +## CLEAR transpiler bugs encountered + +None. Every change made (and every change considered) was validated +against `bundle exec prspec spec/`, `./clear test transpile-tests/` +(548/548, 0 leaks), and the stable fuzz matrix (141/141, 0 fail / 0 +leak / 0 mir-error). No transpiler miscompilation, leak, or +MIR-checker regression was observed. 
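The distinction this pass turns on (developed under the methodological finding below) fits in a few lines of plain Ruby. This is an illustrative model only — `Type`, `always_type_guard?`, and `coerce_type_info` are hypothetical stand-ins, not CLEAR's actual API:

```ruby
# Illustrative model of the two guard classes (hypothetical names).
Type = Struct.new(:sym)

# Class 1 -- dead guard: runtime evidence shows the producer is always a
# non-nil Type, so the is_a?(Type) test is vacuous and deleting it is
# behavior-preserving (a safe standalone deslop commit).
def always_type_guard?(other_type)
  other_type.is_a?(Type) # provably true at every call site: collapses
end

# Class 2 -- load-bearing coercion: the producer is nil | Symbol | Type,
# so the same-looking test discriminates and coerces. Deleting it turns
# a Symbol into a NoMethodError at the next Type-method call.
def coerce_type_info(ti)
  return nil if ti.nil?
  ti.is_a?(Type) ? ti : Type.new(ti)
end
```

Only Class 1 is a blind collapse; Class 2 dies only after its producer is tightened to be non-nilable.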
+ +## Pre-existing flaky spec (not introduced here) + +`spec/fmt_verifier_spec.rb` fails exactly one (nondeterministic) +example under parallel `prspec` but passes 12/12 when run serially +(`bundle exec rspec spec/fmt_verifier_spec.rb`). Pre-exists on the +`origin/nil-kill-prod` base. Out of scope for this pass; flagged so it +is not mistaken for a regression. The per-item gate used here is +"prspec failures confined to that one flaky fmt example; serial run of +related specs green; transpile-tests + fuzz unchanged." + +## Methodological finding: only "always Type" verdicts are safe blind collapses + +nil-kill's Union Decomplexity list ranks contracts by how many +`is_a?(Type)` guards collapse if the contract is given a concrete +type. Two distinct verdict classes appear, and only one is a safe +*standalone* deslop commit: + +1. **"always `Type`: collapse, all N die"** (runtime evidence: the + producer is non-nilable `Type`). The guards are provably dead; + deleting them is behavior-preserving. SAFE standalone commit. + - #56 `Type#accepts_fn_type?` (`other_type`) -- done, commit + 916cd5caf. + - #55 `MIRLowering#build_drop_entry!` (`ti`) -- done, commit + d4507ea99. + +2. **Nilable / union producers** (`{NilClass, Type}`, + `T.nilable(Type)`, heterogeneous) **or "producers unattributed"** + (no runtime trace). The `is_a?(Type)` check is a *correct + nil/Type discriminator* or a *load-bearing coercion*, NOT a dead + guard. 
Verified by static inspection -- these sites source from the + nilable `.type_info` / `.full_type` contract, e.g.: + - `ti = node.type_info rescue nil; ti.provenance = :heap if + ti.is_a?(Type)` (EscapeAnalysis#per_fn_scan!, #52) + - `ti = source.type_info rescue nil; ti = Type.new(ti) if ti && + !ti.is_a?(Type)` (BorrowChecker#_collect_share_moves, #58) + - `inner_ti = Type.new(inner_ti) unless inner_ti.is_a?(Type)` + (CleanupClassifier, #54 -- the guard IS the coercion) + + Deleting these guards introduces NoMethodError-on-nil at compile + time. They are NOT standalone deslop commits. + +### Why #45-#54, #57-#64 are deferred (not done) + +These reduce to a single root: the `.type_info` / `.full_type` / +`.type` / `.return_type` / `:type` contracts are legitimately +`T.nilable` (a node has no `type_info` before Pass 1 annotation). The +guards are correct. The genuine complexity reduction is to **tighten +the producer** so the contract is non-nilable at every post-annotation +read site -- nil-kill's PropagationGap program. That is a multi-commit +*typing program per contract* (make every producer assign a `Type`, +prove no pre-annotation read, then the guards become provably dead and +collapse mechanically), not 18 quick guard deletions. Forcing the +deletions to "complete 20 items" would be metric-gaming that ships +compiler bugs -- precisely the anti-pattern in +`docs/retrospective`. + +Recommended next step for these: run them as the dedicated +contract-tightening program (one contract at a time: `.type_info` +first, 59 guards), each contract its own series of producer-side +commits ending in the mechanical guard collapse, full gates between. + +## Source-fix attempt: producers passing bare Symbols to full_type= + +The correct strategy (per the user) is to fix the *source*: 120 +sites across 5 files do `node.full_type = :Sym`, which `full_type=` +(ast.rb:309) silently launders via `Type.new(val)`. 
Passing `Type` +at the producer is runtime-identical *iff* the receiver's +`full_type=` is the laundering `AST::Locatable` setter. + +- **SAFE / landed**: `src/backends/pipeline_rewriter.rb` (62 sites). + Receivers are uniformly freshly-built `AST::Locatable` nodes -> + `.full_type = :Sym` -> `.full_type = Type.new(:Sym)` is provably + identical. All gates green. Commit f29524a10. +- **UNSAFE / reverted**: `annotator.rb` (35), `pipe_analysis.rb` + (14), `test_annotation.rb` (8), `function_analysis.rb` (1). A + blanket `:Sym -> Type.new(:Sym)` here regressed 1799 specs + + collapsed transpile-tests. Root cause: `.full_type` in these files + has **heterogeneous receivers** and many readers compare the value + with `== :Sym` / `case ... when :Sym`. (Note `full_type=` already + normalized symbols, so symbol-equality readers were *already* + reading a `Type` for Locatable nodes -- meaning the breaking sites + are receivers whose `full_type`/`full_type=` is NOT the laundering + setter: a plain accessor / Struct / Hash-shape that genuinely + stores and reads the raw Symbol.) + +Conclusion: the source fix is correct in principle but cannot be a +blanket caller rewrite. It requires per-receiver typing: identify +which `full_type` carriers are `AST::Locatable` (laundering setter, +safe to convert) vs other carriers (raw-Symbol contract, must +instead be typed at *their* definition or left). That per-receiver +discrimination is the actual program -- the mechanical transform is +not a substitute for it. + +## Outcome of the 20-item pass + +**Done (11 items), all gate-verified standalone commits** (specs: +pre-existing flaky fmt only; transpile 548/548 0 leaks; fuzz 141/141 +0 fail/leak/mir-error): + +- #45 `.type_info` -- 22 producers -> Type at the Locatable seam + + `visit_Slice` returns(Symbol)->.void slop fix + 14 reader guards + collapsed. (3b90fd4b6, aba4b1f26, c79bf07d6, 6544881b4) +- #46 `.full_type` -- same @type_object seam; 14 reader guards + collapsed. 
(b8e60bab8) +- #52(partial),#58,#60,#61,#62,#63 -- `.type_info`-sourced + single-method locals; dead coercions removed, guards -> nil-safe. + (e658b0622) +- #54 `.wrapped_type` -- structurally nil|Type; 2 dead coercions + removed. (c5749215e) +- #55,#56 -- nil-kill "always Type" param collapses. (d4507ea99, + 916cd5caf) + +The unifying safe pattern: a contract whose **producer is +structurally `nil|Type`** (the `Locatable#full_type=` laundering +seam, or `wrapped_type`'s own ctor) -- there the `is_a?(Type)` is a +redundant nil-check and collapses behavior-preservingly. + +**MAJOR BLOCKER -- remaining 9 (#47,#48,#49,#50,#51,#53,#57,#59,#64).** +These contracts are *genuinely heterogeneous*; `is_a?(Type)` is a +real, load-bearing discriminator, NOT a redundant nil-check: + +- `.type` (#47): producers `{Type, Symbol, NilClass, + T.nilable(Type), FunctionSignature, String}`. `node.type.is_a?(Type) + ? node.type : Type.new(node.type)` legitimately coerces a Symbol; + `FunctionSignature`/`String` are NOT `Type.new`-able. Collapsing + changes behavior / crashes. +- `.return_type` (#48): `{T.nilable(Type), Type, Symbol, Hash, Proc}` + -- `Hash`/`Proc` are not Types. +- `final_type` (#50): `Symbol|Type` *by design* -- finalize_storage! + normalizes a raw type spec; the discriminator is the whole point. +- `:type`/`:resolved_type` hash-keys (#49,#53), match-binding + (#51): heterogeneous hash values. +- `expected_type` (#57), `source_type` (#59): genuinely nilable / + no runtime evidence of always-Type. + +Collapsing any of these is **not behavior-preserving**. Each needs +its own deep per-contract retype program (find the `@ivar=` / hash +writer, give it a real `Type`, handle the non-Type members like +`FunctionSignature`/`Proc`/`Hash` explicitly) -- a #45-scale-or-larger +*semantic* change per contract, with real miscompilation risk. That +is the major blocker: forcing these collapses to "finish 20" would +ship compiler bugs (the exact anti-pattern in docs/retrospective). 
+They are left as pending, scoped, with this rationale, rather than +faked. No CLEAR transpiler bugs were introduced anywhere in the pass. + +## #47 `.type` -- deep analysis (user-directed re-attack) + +The user correctly rejected the first "blocker" framing. Full +analysis confirms their model AND pins the real obstacle: + +- VALIDATED: target contract for `VarDecl#type` / `BindExpr#type` is + `nil | Type | FunctionSignature`; `String`/`Symbol` are slop; + consumers should be `T.any(Type, FunctionSignature)` not + `T.untyped`. +- The `.type` accessor is overloaded across Structs. `Literal#type` + is a lexical token kind (`:NUMBER`/`:STRING`) -- a Symbol by + design, a *different field*. Every `case node.type` / `node.type + == :Sym` reader in src is on `Literal` (lower_literal, + int_lit_value, literal_source_length, visit_Literal), NOT on + VarDecl/BindExpr. So there is no Symbol-comparison blast radius on + the declared-type carrier -- the earlier fear was unfounded. +- Clean seam: a memoizing-normalizing reader on VarDecl/BindExpr + (`Symbol|String -> Type.new`, pass nil/Type/FunctionSignature + through). No `FunctionSignature` constant reference needed. + +REAL obstacle (semantic, not mechanical): the 11 +`node.type.is_a?(Type)` sites tangle two roles: + 1. pure laundering (`is_a?(Type) ? t : Type.new(t)`) -- collapses + cleanly once the seam normalizes; + 2. a *resolved-vs-unresolved gate* (`return unless + node.type.is_a?(Type) && node.type.future?`) -- normalizing the + seam changes which declarations get processed (a previously + skipped unresolved/Symbol-typed decl now proceeds). That is a + behavior change, and `is_a?(Type)` also still legitimately + discriminates `Type` from `FunctionSignature` (which has no + `.future?`). + +Therefore #47 is a reviewed *semantic* refactor: per-site decide +whether an unresolved / fn-typed decl should proceed or skip, add +the seam, retype `T.untyped -> T.any(Type, FunctionSignature)`. 
It +is bounded and the analysis above is its spec, but it requires +intent decisions across annotator/escape-analysis that must not be +made unilaterally under "gates green" (gates green != provably +correct for semantic change). #48/#49/#50/#51/#53/#57/#59 share this +shape. Recommended: do #47 as a focused reviewed PR using this +section as the spec; do not auto-run it. + +## EPIC #65 stdlib_def migration — measured scope & execution finding + +Steps 1-2 landed (IntrinsicEmit T::Struct + total converter + +idempotent IntrinsicRegistry.fs), all gate-clean, inert. Step 3+ +(actually wiring it) was fully measured before changing consumers: + +`stdlib_def`/`matched_stdlib_def` is a pervasive untyped-Hash contract, +NOT a per-registry or single-seam thing: +- ~6 stamp sites (method_analysis:114 `defn.merge(zig:).merge(alloc:)` + — override-by-merge semantics; pipeline_rewriter x4; pipeline_host + forwarding). +- carried on InlineBc/InlineZig/RawZig/RawBc/ShardedMapPut/Get. +- ~15 ad-hoc literal writes (`iz.stdlib_def = {allocates:false, + borrows:[]}` etc. in mir_lowering/test_lowering). +- ~26 matched_stdlib_def + ~24 stdlib_def reads across mir_emitter, + mir_checker, mir_lowering, fsm_transform x3, annotator-helpers x4, + mir_pass, pipeline_host — as [:zig]/[:return]/.dig(:allocates)/ + [:return_alloc]/[:bc_op]/op[kind]/op.keys/.merge. + +Total ~100 edits, ~12 files, including the 40k-line mir_lowering +codegen core, with per-site semantic adaptation (`:return` Symbol -> +Type.void?; dynamic `op[kind]`; `.merge` override; `.dig` chains). + +FINDING: a no-shim flag-day (rewrite all ~60 readers + all writers in +one commit, suite as only net) is not correctly/reviewably executable +in one pass on this hot path — the exact "huge change, tests pass, +compiler subtly broken" anti-pattern this repo's retrospective and +CLAUDE.md forbid. At ~100 sites the scale makes the no-shim flag-day +qualitatively infeasible, not merely "riskier". 
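One shape that makes such a flip atomic without becoming a permanent band-aid: the typed carrier transiently answers the legacy Hash read forms by delegating to its own fields, and that scaffold is deleted as the migration's last commit. A hypothetical sketch — the field set and this `FunctionSignature` surface are illustrative, not the real class:

```ruby
# Sketch: typed carrier with a transient Hash-compatible facade so
# Hash-shaped readers keep working while they migrate in gated batches.
class FunctionSignature
  attr_reader :zig, :allocates, :borrows

  def initialize(zig:, allocates: false, borrows: [])
    @zig = zig
    @allocates = allocates
    @borrows = borrows
  end

  # Legacy Hash keys -> typed readers. Migration scaffold only:
  # deleted as the epic's final commit, proving no untyped read survived.
  LEGACY = { zig: :zig, allocates: :allocates, borrows: :borrows }.freeze

  def [](key)
    public_send(LEGACY.fetch(key) { raise KeyError, "untyped read: #{key.inspect}" })
  end

  def dig(*keys)
    keys.reduce(self) { |acc, k| acc && acc[k] }
  end

  def merge(overrides)
    LEGACY.keys.to_h { |k| [k, self[k]] }.merge(overrides)
  end
end
```

With this in place the writer flip is one green commit, reader batches move from `fs[:zig]` to `fs.zig` independently, and any read of a key the typed object does not carry fails loudly (`KeyError`) instead of silently returning `nil`.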
+ +RECOMMENDED execution (contract-level "whole stdlib at once", landed +safely): (a) writes -> IntrinsicRegistry.fs uniformly; (b) +FunctionSignature transiently exposes typed-delegating []/dig/merge so +the flip is atomic and green in one commit; (c) readers migrated to +the pure typed API in gated batches; (d) the delegating scaffold +deleted as the epic's final commit (so it is a migration scaffold, +not a permanent band-aid). Awaiting direction on adopting (b). + +## EPIC #65 — stdlib_def FLAG-DAY executed (no backdoor), 267 -> 20 + +Per explicit direction ("rather have all tests fail and we know what's +left than a backdoor; do it all now"), the hard flip was executed in +one coordinated change -- NO compatibility/delegation shim: + +WRITE SEAM (single point): AST `matched_stdlib_def=` + a prepended +`StdlibDefFsCoercion` on RawZig/InlineZig/InlineBc/RawBc/ShardedMap* +coerce via `IntrinsicRegistry.fs` on both setter AND positional +`initialize` (Struct ctor bypasses setters). Every carried stdlib_def +is now a FunctionSignature (+ typed IntrinsicEmit). + +READERS migrated to the typed API (~25 sites): capabilities, effects, +generic_analysis, mir_pass, fsm_transform(+segments), mir_checker +(`:return` -> `return_type.void?`), mir_lowering, suspend_resolvers, +mir_emitter. Two silent-regression `matched_def.is_a?(Hash)` guards +(annotator resolve_borrow_source / cleanup provenance) fixed. +CONVERTER totality completed: added IntrinsicEmit props bc_op, +error_kind, error_type, elem, fallible_clauses; fsm_* are FsmOps +op-object arrays -> passthrough (not stringified). 5 specs asserting +the old Hash shape migrated to the typed shape. + +Result: 4786 examples, 267 -> **20 failures** (-92.5%), no shim. + +REMAINING 20 (the precise "properly finish" worklist): +1. Pool/sharded codegen (~7): Pool#insert/get/remove, @pool:sharded, + @pool.contains? -- the InlineBc/`pool_get_def` Zig emit path. +2. 
FSM-IO SuspendResolvers (4): resolve_io / fsm_setup / + fsm_state_decls rendering -- verify FsmOps op-objects flow through + `emit.fsm_*` correctly into `lower_stmts`. +3. ZigTranspiler OG move-emission / COPY-union / heap-cleanup (~6): + the mir_checker `stdlib_owned_return?` / `return_type.void?` + semantic migration shifted some cleanup/move decisions -- audit + owned_return_init? vs the old `:return == :Void` logic. +4. collections.md doc example (1, downstream of #1); FmtVerifier (1, + pre-existing parallel flake, not from this work). + +These are bounded and categorized; transpile-tests/fuzz NOT yet run +(blocked until #1/#3 resolved). This is the intended honest state: +the contract is genuinely flipped with zero backdoor, and exactly +what remains to finish is enumerated above. diff --git a/docs/agents/fuzz-matrix-surfaced-bugs.md b/docs/agents/fuzz-matrix-surfaced-bugs.md new file mode 100644 index 000000000..54ad80b4a --- /dev/null +++ b/docs/agents/fuzz-matrix-surfaced-bugs.md @@ -0,0 +1,131 @@ +# Bugs surfaced by the 6 mir_lowering fuzz matrices + +Status: OPEN. Not fixed (deliberately — the task was to surface, not +fix). Each is reproduced by a `:pass` fuzz cell that currently fails; +the red cell is the live ticket. + +All three are the **same family**: the catch / OR-rescue path +(`expr OR fallback`) mishandling allocator identity / cleanup across +the success vs error split. This is invariant #9 ("error paths +preserve allocator identity") and is exactly the decision +`branch_gap_triage` flagged as the P0 — `infer_catch_value_allocator` +was 12/12 dark and `lower_or_rescue` / `walk_catch_body_for_reassigns` +heavily fuzz_axis. The modality plan predicted this cluster; the +targeted matrices confirmed real bugs there. + +## B1 — invalid free: OR fallback is a frame value, success is heap +Template `catch_allocator_matrix`, cell +`{value: string, fallback: frame_var, taken: failure}`. 
+ +``` +FN maybe(s: String) RETURNS !String -> + IF s.length() == 0_i64 THEN RAISE "empty"; END + RETURN COPY s; +END +FN main() RETURNS Void -> + fbv: String = "fb"; + r = maybe("") OR fbv; # raises -> r = fbv (frame String) + ASSERT r.length() >= 0_i64, "fallback value live"; + RETURN; +END +``` +`maybe("")` raises, so `r` takes the frame-allocated `fbv`. But the +OR-rescue lowering binds `r`'s cleanup to the success path's heap +allocator (`COPY s`). Scope-end frees a frame value with the heap +allocator → `thread panic: Invalid free`. + +## B2 — leak: reassign an outer binding through OR on the success path +Template `catch_reassign_matrix`, cell +`{var: local, value: string, taken: success}`. + +``` +MUTABLE acc = "init"; +acc = maybe("X") OR acc; # success -> acc = COPY-heap value +``` +The prior value of `acc` (or the new heap temp) is not cleaned across +the reassignment-through-OR; debug allocator reports leaked memory. + +## B3 — segfault: struct field reassigned from a fallible expr whose +fallback is the field itself +Template `catch_reassign_matrix`, cell +`{var: struct_field, value: string, taken: failure}`. + +``` +MUTABLE h = Holder{ acc: "init" }; +h.acc = maybe("") OR h.acc; # raises -> fallback reads h.acc while + # the reassignment is mid-cleanup +``` +`Segmentation fault` — use-after-free: the error path reads `h.acc` +for the fallback after the field's old value has been freed by the +reassignment cleanup. + +## Coverage note (the other half of the result) + +The 6 matrices (68 cells) moved mir_lowering branch coverage by **2 +arms (673 -> 671)** — essentially zero, despite exercising maps, +catch, match, capabilities, binary ops, and indexed assignment as +features. This reproduces, more starkly, the earlier "92 example +programs -> 50/1005 arms" result. Interpretation (both likely true): + +1. 
The dark arms need the *exact* triggering `type_info` shape + (the `dispatch_key x value_transforms x shard_direct` cross, the + `:dupe_borrowed_union` borrowed-union-into-map path, etc.), not + surface-level feature coverage. Feature fuzzing retreads + already-covered common arms. +2. The `fuzz_axis` bucket is likely over-assigned: many of those 590 + arms are closer to `accept_defensive` (reachable only by shapes a + valid program does not produce). The bucketer is a proposed + structural classification, not a verdict — this is the human- + confirm signal firing. + +Conclusion: feature-level fuzz matrices are high value for *finding +bugs* (3 real memory-safety bugs in the predicted P0 cluster) but, as +built, are NOT a branch-coverage-closure lever. Closing the branch gap +requires shape-specific cells driven off the actual dark `type_info`, +or re-triaging the fuzz_axis bucket against reachability. + +## B4 — invalid Zig: @indirect:atomic + WITH EXCLUSIVE has no `ctrl` +Template `capability_wrap_matrix` (enumerated), cell `{mode: atomic}`. + +``` +STRUCT Counter { value: Int64 } +FN main() RETURNS Void -> + MUTABLE c = Counter{ value: 1_i64 } @indirect:atomic; + WITH EXCLUSIVE c AS x { x.value = 2_i64; ASSERT x.value == 2_i64; } + RETURN; +END +``` +Both forms are the compiler's OWN guidance (it rejected `@atomic` on a +struct telling us to use `@indirect:atomic`; it rejected +`WITH POLYMORPHIC` telling us to use plain `WITH`). CLEAR then accepts +this and emits invalid Zig: `no field named 'ctrl' in AtomicPtr(...)`. +The `is_atomic_ptr -> atomicPtrCreate` arm of compose_capability_wrap +(a dark arm) is broken. OPEN; not fixed. + +## Enumeration result (the decisive coverage finding) + +`binary_op_matrix` and `capability_wrap_matrix` were rebuilt from +SAMPLED axes to EXHAUSTIVE enumeration of the dispatch's own `when` +labels (every comparison op incl. LTE/GT, POW int+float, every +ft.sync/ownership mode; symbol-path excluded — no surface literal). 
+binary_op went 21->30 cells, all clean; capability 7 enumerated cells +(6 pass, 1 = B4). + +Branch-gap delta from the *provably complete* enumeration method: +mir_lowering 656 -> 653 (3 arms). Four independent attempts now: + + 92 example programs -> 50 arms + 6 sampled fuzz matrices (68p) -> 2 arms + bc-lower whole corpus (0 new) -> 15 arms + exhaustive dispatch enumeration-> 3 arms + +Conclusion (now ironclad): mir_lowering branch coverage is NOT +closable by test generation, even by the theoretically-complete +enumeration. The ~650 dark arms are overwhelmingly invariant-guarded +/ nil-defensive / internal-state branches no source program in any +backend can toggle; the earlier ~22% "genuine dispatch" estimate +(eyeballed from 36 lines) was itself too optimistic. The strategy is +reachability-aware re-triage to remove the impossible arms from the +denominator + a tiny enumerated set for the genuine handful. Fuzz's +delivered value here is bug-finding (B1-B4: 4 real memory-safety / +codegen bugs on dark arms), not coverage. diff --git a/gems/decomplex/lib/decomplex.rb b/gems/decomplex/lib/decomplex.rb index 3ac8b23fc..97a0eede2 100644 --- a/gems/decomplex/lib/decomplex.rb +++ b/gems/decomplex/lib/decomplex.rb @@ -10,6 +10,7 @@ require_relative "decomplex/sequence_mine" require_relative "decomplex/derived_state" require_relative "decomplex/type3_clone" +require_relative "decomplex/decision_pressure" # Decomplex: decision-level duplication + neglected-condition detector. # See decomplex.gemspec for the rationale. 
v0 scope is exact-match diff --git a/gems/decomplex/lib/decomplex/decision_pressure.rb b/gems/decomplex/lib/decomplex/decision_pressure.rb new file mode 100644 index 000000000..509c74358 --- /dev/null +++ b/gems/decomplex/lib/decomplex/decision_pressure.rb @@ -0,0 +1,159 @@ +# frozen_string_literal: true + +require_relative "ast" + +module Decomplex + # Decision-pressure: attribute every defensive type/nil guard to the + # canonical ROOT CONTRACT its subject comes from, then rank contracts + # by how many re-derived decisions they drive. + # + # This is the project's primary goal made concrete: not "this decision + # is duplicated N times" (scatter) but "THIS loosely-typed contract + # (`.full_type`, `[:type]`, `@schema`) is the SOURCE of N conditionals + # -- fix the contract once, the cluster dies." Pressure, decomplex- + # scoped: intra-procedural only (a local is resolved to the accessor + # it was assigned from IN THE SAME METHOD). Cross-procedure pressure + # is nil-kill's, by the recorded boundary -- not re-implemented here. + # + # A "decision" = a guard whose subject is type/nil-tested: + # x.is_a?(T) / kind_of? / instance_of? / x.nil? / x.respond_to? / + # x&.m (safe-nav: an implicit nil decision on x). + class DecisionPressure + GUARD_MIDS = %i[is_a? kind_of? instance_of? nil? respond_to?].freeze + Hit = Struct.new(:contract, :file, :defn, :line, keyword_init: true) + + def self.scan(files) + hits = [] + files.each do |f| + root, lines = Ast.parse(f) + e = new(f, lines) + e.walk(root, [], {}) + hits.concat(e.hits) + end + Report.new(hits) + end + + attr_reader :hits + + def initialize(file, lines) + @file = file + @lines = lines + @hits = [] + end + + def walk(node, defstack, asgmap) + return unless Ast.node?(node) + + if %i[DEFN DEFS].include?(node.type) + name = node.children[node.type == :DEFS ? 
1 : 0].to_s + defstack = defstack + [name] + asgmap = build_asgmap(node) + end + + record_guard(node, defstack, asgmap) + node.children.each { |c| walk(c, defstack, asgmap) } + end + + private + + # name => rhs-source-node, for `name = ` LASGNs in + # this method (intra-procedural only). First simple assignment wins. + def build_asgmap(defn_node) + map = {} + stack = Ast.body_stmts(defn_node).dup + until stack.empty? + n = stack.pop + next unless Ast.node?(n) + + if n.type == :LASGN + nm = n.children[0].to_s + src = n.children[1] + map[nm] ||= src if !map.key?(nm) && simple_source?(src) + end + n.children.each { |c| stack << c } + end + map + end + + def simple_source?(n) + return false unless Ast.node?(n) + + case n.type + when :IVAR then true + when :CALL, :QCALL + recv, mid, args = n.children + recv && (args.nil? || mid == :[]) + else false + end + end + + def record_guard(node, defstack, asgmap) + return unless %i[CALL QCALL].include?(node.type) + + recv, mid, _args = node.children + is_guard = + (node.type == :CALL && GUARD_MIDS.include?(mid)) || + node.type == :QCALL # safe-nav = implicit nil decision on recv + return unless is_guard && recv + + c = contract_of(recv, asgmap) + return unless c + + @hits << Hit.new(contract: c, file: @file, + defn: defstack.last || "(top-level)", + line: node.first_lineno) + end + + # Canonical root contract of a subject node, resolving locals + # through the intra-method assignment map. + def contract_of(n, asgmap, depth = 0) + return nil unless Ast.node?(n) && depth < 8 + + case n.type + when :LVAR, :DVAR + nm = n.children[0].to_s + src = asgmap[nm] + src ? contract_of(src, asgmap, depth + 1) : "~local" + when :IVAR + n.children[0].to_s # already includes the leading @ + when :CALL, :QCALL + recv, mid, args = n.children + if mid == :[] + key = args && Ast.node?(args) ? args.children.compact.first : nil + kt = (Ast.node?(key) ? Ast.slice(key, @lines) : key.inspect) + "[#{kt}]" + elsif args.nil? 
&& recv + ".#{mid}" # no-arg accessor: the contract + end + when :VCALL + ".#{n.children[0]}" + end + end + + class Report + def initialize(hits) + @hits = hits + end + + # [{ contract:, decisions:, methods:, sites:[...] }, ...] + # ranked by decisions; the low-signal "~local" (unresolved + # proximate local -- needs cross-proc pressure = nil-kill) is + # reported last regardless of count. + def ranked + by = @hits.group_by(&:contract) + rows = by.map do |contract, hs| + { + contract: contract, + decisions: hs.size, + methods: hs.map { |h| [h.file, h.defn] }.uniq.size, + sites: hs.map { |h| "#{h.file}:#{h.defn}:#{h.line}" } + } + end + named = rows.reject { |r| r[:contract] == "~local" } + .sort_by { |r| [-r[:decisions], -r[:methods]] } + local = rows.select { |r| r[:contract] == "~local" } + named + local + end + end + end +end diff --git a/gems/decomplex/lib/decomplex/report.rb b/gems/decomplex/lib/decomplex/report.rb index a08aa6c52..13f57b1ab 100644 --- a/gems/decomplex/lib/decomplex/report.rb +++ b/gems/decomplex/lib/decomplex/report.rb @@ -32,6 +32,7 @@ def run @broken = sm.broken_protocol @derived = DerivedState.scan(@files) @clones = Type3Clone.scan(@files) + @pressure = DecisionPressure.scan(@files).ranked end # tier = signal quality (1 = highest signal / lowest false-positive, @@ -40,6 +41,7 @@ def run # must not outrank a precise one. Within a section, items are # frequency-ranked (support / scatter / confidence, descending). 
SECTIONS = [ + ["Decision Pressure", :@pressure, 1, "loose contract -> N defensive type/nil decisions; fix the contract once, the cluster dies (intra-proc; cross-proc = nil-kill)"], ["Missing Abstractions", :@miss, 1, "guard tuple recomputed across >=2 decision units"], ["Reification Misses", :@reif, 1, "an existing predicate reinvented inline -- invariant #16"], ["Semantic Predicate Aliases", :@salias, 1, "one decision, multiple names (receiver/polarity folded)"], @@ -126,6 +128,10 @@ def to_markdown def render(out, title, v) v.first(25).each do |h| out << case title + when "Decision Pressure" + "- `#{h[:contract]}` drives **#{h[:decisions]}** defensive " \ + "type/nil decisions across #{h[:methods]} method(s)\n" \ + " - #{h[:sites].first(4).map { |s| nav(s) }.join(' ; ')}\n" when "Missing Abstractions" "- **[#{h[:kind]}]** support=#{h[:support]} scatter=#{h[:scatter]} " \ "rank=#{h[:rank]}\n - tuple: `#{h[:members].join(' | ')}`\n" \ diff --git a/gems/decomplex/report.md b/gems/decomplex/report.md index 60be7bc67..b016b1796 100644 --- a/gems/decomplex/report.md +++ b/gems/decomplex/report.md @@ -9,6 +9,7 @@ ## Table of Contents - [Project Prioritization](#project-prioritization) +- [Decision Pressure (256)](#decision-pressure-256) - [Missing Abstractions (217)](#missing-abstractions-217) - [Reification Misses (129)](#reification-misses-129) - [Semantic Predicate Aliases (3)](#semantic-predicate-aliases-3) @@ -24,6 +25,7 @@ ## Project Prioritization _Ordered by signal tier (1 = highest signal / lowest FP), then by volume._ +- **[tier 1]** [Decision Pressure (256)](#decision-pressure-256): loose contract -> N defensive type/nil decisions; fix the contract once, the cluster dies (intra-proc; cross-proc = nil-kill) - **[tier 1]** [Missing Abstractions (217)](#missing-abstractions-217): guard tuple recomputed across >=2 decision units - **[tier 1]** [Reification Misses (129)](#reification-misses-129): an existing predicate reinvented inline -- invariant #16 - **[tier 
1]** [Exact Predicate Aliases (7)](#exact-predicate-aliases-7): identical one-line predicate body under >=2 names @@ -35,6 +37,61 @@ _Ordered by signal tier (1 = highest signal / lowest FP), then by volume._ - **[tier 3]** [Neglected Path Conditions (2203)](#neglected-path-conditions-2203): nested-if/&& guard set minus one atom -- *POSSIBLE* bug (noisy) - **[tier 3]** [Broken Protocols (1730)](#broken-protocols-1730): co-called pair, one site does A without B -- *POSSIBLE* bug (noisy) +## Decision Pressure (256) +_loose contract -> N defensive type/nil decisions; fix the contract once, the cluster dies (intra-proc; cross-proc = nil-kill)_ + +- `.type_info` drives **274** defensive type/nil decisions across 94 method(s) + - `src/annotator-helpers/function_analysis.rb:243` (resolve_call) ; `src/annotator-helpers/function_analysis.rb:247` (resolve_call) ; `src/annotator-helpers/function_analysis.rb:248` (resolve_call) ; `src/annotator-helpers/function_analysis.rb:248` (resolve_call) +- `.value` drives **110** defensive type/nil decisions across 54 method(s) + - `src/annotator-helpers/auto_inference.rb:760` (walk_binops) ; `src/annotator-helpers/capabilities.rb:1055` (_unified_capture_walk) ; `src/annotator-helpers/capabilities.rb:1059` (_unified_capture_walk) ; `src/annotator-helpers/capabilities.rb:1067` (_unified_capture_walk) +- `.symbol` drives **63** defensive type/nil decisions across 44 method(s) + - `src/annotator-helpers/capabilities.rb:93` (cap_var_sync) ; `src/annotator-helpers/capabilities.rb:118` (cap_var_layout) ; `src/annotator-helpers/capabilities.rb:142` (validate_capability) ; `src/annotator-helpers/capabilities.rb:164` (validate_capability) +- `.target` drives **60** defensive type/nil decisions across 35 method(s) + - `src/annotator-helpers/auto_inference.rb:655` (record_index_assign) ; `src/annotator-helpers/capabilities.rb:746` (cap_var_name) ; `src/annotator-helpers/function_analysis.rb:909` (verify_return) ; 
`src/annotator-helpers/generic_analysis.rb:645` (find_container_source) +- `.name` drives **55** defensive type/nil decisions across 37 method(s) + - `src/annotator-helpers/auto_inference.rb:653` (record_index_assign) ; `src/annotator-helpers/capabilities.rb:1040` (_unified_capture_walk) ; `src/annotator-helpers/capabilities.rb:1292` (_bg_walk) ; `src/annotator-helpers/generic_analysis.rb:630` (register_container_borrow!) +- `.right` drives **53** defensive type/nil decisions across 17 method(s) + - `src/annotator-helpers/pipe_analysis.rb:24` (visit_Smooth) ; `src/annotator-helpers/pipe_analysis.rb:26` (visit_Smooth) ; `src/annotator-helpers/pipe_analysis.rb:263` (analyze_select_family_op) ; `src/annotator-helpers/pipe_analysis.rb:263` (analyze_select_family_op) +- `.current_fn_ctx` drives **35** defensive type/nil decisions across 23 method(s) + - `src/annotator-helpers/capabilities.rb:1148` (_unified_capture_walk) ; `src/annotator-helpers/capabilities.rb:1185` (_unified_capture_walk) ; `src/annotator-helpers/capabilities.rb:1329` (record_capability_binding) ; `src/annotator-helpers/capabilities.rb:1337` (record_capability_binding) +- `.full_type` drives **33** defensive type/nil decisions across 20 method(s) + - `src/annotator-helpers/capabilities.rb:95` (cap_var_sync) ; `src/annotator-helpers/capabilities.rb:104` (cap_var_storage) ; `src/annotator-helpers/capabilities.rb:120` (cap_var_layout) ; `src/annotator-helpers/capabilities.rb:186` (validate_capability) +- `[:type]` drives **29** defensive type/nil decisions across 20 method(s) + - `src/annotator-helpers/function_analysis.rb:326` (verify_function_signature!) ; `src/annotator-helpers/function_analysis.rb:553` (atomic_cell_to_atomic_param?) ; `src/annotator-helpers/function_analysis.rb:693` (verify_lifetime_source!) 
; `src/annotator-helpers/function_analysis.rb:726` (declare_and_verify_params) +- `.left` drives **29** defensive type/nil decisions across 18 method(s) + - `src/annotator-helpers/pipe_analysis.rb:63` (stamp_observable_terminal!) ; `src/annotator-helpers/pipe_analysis.rb:240` (analyze_collect_op) ; `src/annotator-helpers/pipe_analysis.rb:588` (analyze_limit_op) ; `src/annotator-helpers/pipe_analysis.rb:1335` (analyze_shard_op) +- `.type` drives **28** defensive type/nil decisions across 21 method(s) + - `src/annotator-helpers/auto_inference.rb:210` (record_local) ; `src/annotator-helpers/auto_inference.rb:504` (stamp_map_pairs!) ; `src/annotator-helpers/auto_inference.rb:505` (stamp_map_pairs!) ; `src/annotator-helpers/auto_inference.rb:572` (walk_for_shape_decls) +- `.return_type` drives **27** defensive type/nil decisions across 15 method(s) + - `src/annotator-helpers/capabilities.rb:566` (visit_post_clauses!) ; `src/annotator-helpers/function_analysis.rb:170` (resolve_call) ; `src/annotator-helpers/reentrance.rb:162` (validate_not_logical_return!) ; `src/annotator-helpers/reentrance.rb:164` (validate_not_logical_return!) +- `.last` drives **25** defensive type/nil decisions across 6 method(s) + - `src/annotator.rb:5619` (expr_result_type) ; `src/annotator.rb:5621` (expr_result_type) ; `src/annotator.rb:5628` (expr_result_type) ; `src/annotator.rb:5628` (expr_result_type) +- `.token` drives **22** defensive type/nil decisions across 19 method(s) + - `src/annotator-helpers/capabilities.rb:1341` (record_capability_binding) ; `src/annotator-helpers/capabilities.rb:1342` (record_capability_binding) ; `src/mir/concurrency_checks.rb:73` (check_hold_across_yield!) ; `src/mir/concurrency_checks.rb:171` (check_reentrant!) 
+- `.capture_analysis` drives **22** defensive type/nil decisions across 17 method(s) + - `src/mir/control_flow.rb:675` (transfer_stmt) ; `src/mir/control_flow.rb:755` (collect_ownership_transfers) ; `src/mir/control_flow.rb:841` (_walk_bg_captures_in_expr) ; `src/mir/control_flow.rb:870` (collect_bg_body_gives) +- `[name]` drives **21** defensive type/nil decisions across 20 method(s) + - `src/annotator-helpers/effects.rb:985` (max_tier_for_calls) ; `src/annotator-helpers/fixable_helpers.rb:310` (emit_use_of_moved_error!) ; `src/annotator-helpers/fixable_helpers.rb:997` (emit_with_materialized_needs_tense!) ; `src/annotator-helpers/fixable_helpers.rb:1200` (build_decl_cap_insert_fix) +- `.tail` drives **21** defensive type/nil decisions across 6 method(s) + - `src/mir/fsm_transform/emit.rb:341` (build_recursive) ; `src/mir/fsm_transform/emit.rb:375` (build_recursive) ; `src/mir/fsm_transform/emit.rb:376` (build_recursive) ; `src/mir/fsm_transform/emit.rb:444` (build_recursive) +- `.element_type` drives **19** defensive type/nil decisions across 15 method(s) + - `src/annotator-helpers/generic_analysis.rb:182` (validate_type_annotation!) ; `src/annotator-helpers/method_analysis.rb:42` (narrow_collection_type!) ; `src/annotator-helpers/method_analysis.rb:121` (resolve_typed_method) ; `src/annotator.rb:4305` (infer_element_type) +- `@union_schemas` drives **18** defensive type/nil decisions across 13 method(s) + - `src/mir/mir_lowering.rb:262` (owned_value_temp_needs_cleanup?) ; `src/mir/mir_lowering.rb:263` (owned_value_temp_needs_cleanup?) ; `src/mir/mir_lowering.rb:291` (copy_container_borrow_if_needed) ; `src/mir/mir_lowering.rb:1297` (lower_function_def) +- `[:var_node]` drives **18** defensive type/nil decisions across 12 method(s) + - `src/annotator-helpers/capabilities.rb:668` (acquire_capability!) ; `src/annotator-helpers/capabilities.rb:679` (acquire_capability!) ; `src/annotator-helpers/capabilities.rb:709` (acquire_capability!) 
; `src/annotator-helpers/capabilities.rb:714` (acquire_capability!) +- `.payload_type` drives **17** defensive type/nil decisions across 5 method(s) + - `src/annotator-helpers/function_analysis.rb:192` (resolve_call) ; `src/annotator-helpers/function_analysis.rb:195` (resolve_call) ; `src/annotator-helpers/function_analysis.rb:199` (resolve_call) ; `src/annotator-helpers/function_analysis.rb:200` (resolve_call) +- `.reg` drives **15** defensive type/nil decisions across 11 method(s) + - `src/annotator-helpers/fixable_helpers.rb:1000` (emit_with_materialized_needs_tense!) ; `src/annotator-helpers/fixable_helpers.rb:1201` (build_decl_cap_insert_fix) ; `src/annotator-helpers/fixable_helpers.rb:1229` (build_decl_cap_replace_fix) ; `src/annotator-helpers/function_analysis.rb:970` (return_is_borrow?) +- `.arms` drives **14** defensive type/nil decisions across 8 method(s) + - `src/annotator-helpers/capabilities.rb:1221` (_unified_capture_walk) ; `src/annotator-helpers/effects.rb:1178` (scan_for_raises) ; `src/annotator.rb:4523` (visit_WithBlock) ; `src/annotator.rb:4682` (visit_WithBlock) +- `@og` drives **14** defensive type/nil decisions across 6 method(s) + - `src/annotator.rb:1199` (analyze_control_flow_branches) ; `src/annotator.rb:1206` (analyze_control_flow_branches) ; `src/annotator.rb:1212` (analyze_control_flow_branches) ; `src/annotator.rb:1223` (analyze_control_flow_branches) +- `.sync` drives **13** defensive type/nil decisions across 12 method(s) + - `src/annotator-helpers/function_analysis.rb:204` (resolve_call) ; `src/annotator-helpers/generic_analysis.rb:453` (generic_type_has_capabilities?) 
; `src/annotator-helpers/pipe_analysis.rb:1147` (collect_sharded_names) ; `src/annotator-helpers/pipe_analysis.rb:1170` (pre_scan_node_for_sharded) +- ...(+231 more) + ## Missing Abstractions (217) _guard tuple recomputed across >=2 decision units_ @@ -343,6 +400,6 @@ _co-called pair, one site does A without B -- *POSSIBLE* bug (noisy)_ ## Run Summary - Files analyzed: 93 -- Detectors: 10 (all shipped, self-tested) -- Total candidates: 10662 +- Detectors: 11 (all shipped, self-tested) +- Total candidates: 10918 - Method: stdlib AST only, intra-procedural, zero deps, no CFG / no points-to (see docs/agents/design.md) diff --git a/gems/decomplex/test/decision_pressure_test.rb b/gems/decomplex/test/decision_pressure_test.rb new file mode 100644 index 000000000..1b8dfe494 --- /dev/null +++ b/gems/decomplex/test/decision_pressure_test.rb @@ -0,0 +1,76 @@ +# frozen_string_literal: true + +require "minitest/autorun" +require "tempfile" +require_relative "../lib/decomplex" + +class DecisionPressureTest < Minitest::Test + def rank(ruby) + f = Tempfile.new(["dp", ".rb"]) + f.write(ruby) + f.close + Decomplex::DecisionPressure.scan([f.path]).ranked + ensure + f&.unlink + end + + def test_local_is_resolved_to_the_accessor_it_came_from + # Two methods, both `ti = node.full_type; ... ti.is_a?(Type)`. + # The proximate local `ti` must attribute to `.full_type`, and the + # contract must aggregate across both methods. + r = rank(<<~RB) + def a(node) + ti = node.full_type + return 1 if ti.is_a?(Type) + end + def b(node) + ti = node.full_type + return 2 if ti.is_a?(Type) + end + RB + top = r.first + assert_equal ".full_type", top[:contract] + assert_equal 2, top[:decisions] + assert_equal 2, top[:methods] + end + + def test_hash_key_and_ivar_contracts_are_distinct_and_ranked + r = rank(<<~RB) + def a(p) + return 1 if p[:type].is_a?(Type) + return 2 if p[:type].nil? 
+ return 3 if @schema.respond_to?(:x) + end + RB + by = r.to_h { |x| [x[:contract], x[:decisions]] } + assert_equal 2, by["[:type]"] + assert_equal 1, by["@schema"] + end + + def test_safe_nav_counts_as_a_nil_decision_on_its_receiver + r = rank(<<~RB) + def a(node) + x = node.type_info&.heap? + return x + end + RB + assert_equal ".type_info", r.first[:contract] + assert_equal 1, r.first[:decisions] + end + + def test_unresolved_local_is_low_signal_and_sorts_last + r = rank(<<~RB) + def a(thing) + return 1 if thing.is_a?(Type) + return 2 if other.full_type.is_a?(Type) + end + RB + # .full_type is a named contract -> ranked above the ~local bucket + assert_equal ".full_type", r.first[:contract] + assert_equal "~local", r.last[:contract] + end + + def test_no_guards_no_rows + assert_empty rank("def a(x); return x + 1; end\n") + end +end diff --git a/gems/slopcop/README.md b/gems/slopcop/README.md new file mode 100644 index 000000000..23e6aafcf --- /dev/null +++ b/gems/slopcop/README.md @@ -0,0 +1,60 @@ +# SlopCop: catches the slop your tests miss. + +A flat "673/2732 uncovered" is unactionable. **SlopCop** categorizes +every dark branch arm and gives you the one thing you want: **the top +true gaps to test, ranked by fix-churn.** + +It is a **general engine** — it categorizes uncovered branches and +ranks the genuine ones by consumed fix-cache churn. It ships **no +project lexicon**; the only project-specific input (your +external/boundary method names) is caller-supplied via `--ffi`. + +## The report + +1. **Top True Gaps** — every genuine reachable gap, repo-relative + + linked, ranked by the file's fix-cache churn score. This is the + list: "test these, in this order." +2. 
**Category Summary** — the rest of the dark arms, so you can see + why most are *not* test targets: + +| category | meaning | +|---|---| +| `type_norm` | type/nil guard — likely dead if the contract were strictly typed | +| `dead` | decision never executes — audit as dead code | +| `defensive` | inert / invariant-pinned — accept, exclude from denominator | +| `ffi` | a caller-declared external/boundary call — needs an integration test | +| `diagnostic` | error/raise path — reachable only by invalid input | +| `genuine` | the real reachable gap — **test it** (these are ranked above) | + +## Usage + +``` +slopcop report --repo=. --coverage=coverage/.resultset.json \ + --output=report.md \ + --files=src/a.rb,src/b.rb \ + --ffi=my_extern_call,my_boundary_method +``` + +Needs `coverage/.resultset.json` (SimpleCov `enable_coverage :branch`) +and a git repo (for the fix-cache churn overlay). See +[report.md](report.md) for a demo. + +## Boundary + +SlopCop **owns** gap-categorization. It **consumes** the sibling +`fix-cache` gem for churn (it does not compute churn itself) and an +optional nil-kill verdict for the `type_norm` bucket. It re-derives +nothing. + +## Not a verdict + +Categories are ranked candidates (Engler discipline). v0 precision +caveats — `diagnostic` is over-greedy, `type_norm` under-counted (no +intra-procedural local→accessor resolution yet) — are documented in +[docs/agents/design.md](docs/agents/design.md). The Top-True-Gaps +ranking is the sound, validated part. + +## Links + + * [Design, categories, boundary, caveats](docs/agents/design.md) + * [Demo report](report.md) diff --git a/gems/slopcop/docs/agents/design.md b/gems/slopcop/docs/agents/design.md new file mode 100644 index 000000000..cfdd87bbb --- /dev/null +++ b/gems/slopcop/docs/agents/design.md @@ -0,0 +1,81 @@ +# SlopCop — design + +## What it is (and why it is a general gem) + +A flat uncovered-count is unactionable: gaps are not equal. 
SlopCop is +a **general engine**: it categorizes every dark branch arm by +reachability class and ranks the genuine ones by consumed fix-cache +churn — "the top true gaps to test, in order." It is a gem for the +same reason fix-cache is: a coherent, reusable, versioned product +with its own identity. "It's an aggregation" argues for *consuming* +the others, not against gem status. + +## Generality / no baked-in lexicon + +The earlier objection was correct: the first cut baked CLEAR jargon +(`.cht`, `fuzz`, `nil-kill`) and CLEAR's FFI method names into the +gem. Fixed: + +- **Vocabulary is generic.** Category actions are + testing-strategy-neutral ("error/raise path — invalid input only", + "external/boundary — integration test"). No project jargon. +- **The project lexicon is caller-supplied.** `ffi_boundary:` (the + external/boundary method names) defaults to empty in the gem; the + consuming project passes its own (CLEAR's set lives in the CLI + `exe/slopcop`, not the library). `DIAGNOSTIC_MIDS` is general Ruby + (`raise`/`fail`/`abort`). + +The *engine* — categorize uncovered branches, rank genuine by churn — +is general to any Ruby project with branch coverage + git history. + +## Boundary + +OWNS gap-categorization (AST-structural per-arm classifier, dead/live +decision split, category rollup, the gap ranking). CONSUMES the +sibling `fix-cache` gem for churn (require_relative, not re-derived) +and an optional nil-kill verdict for `type_norm`. Re-derives nothing. + +## Categories + +| category | meaning | not a test target? 
| +|---|---|---| +| `type_norm` | type/nil guard (`is_a?`/`kind_of?`/`nil?`/`respond_to?`/safe-nav) | yes — likely dead if the contract were strictly typed | +| `dead` | no sibling arm ever taken: decision never executes | yes — audit/delete | +| `defensive` | live, inert/invariant-pinned | yes — accept | +| `ffi` | caller-declared external/boundary method | special — integration test | +| `diagnostic` | arm raises/diagnoses | special — invalid-input only | +| `genuine` | live, reachable, input-determined | **NO — this is the gap; ranked by churn** | + +## Report shape (per the user's ask) + +Leads with **Top True Gaps**: every `genuine` arm, repo-relative + +markdown-linked (`[src/x.rb:226](src/x.rb#L226)`), ranked by the +file's normalized fix-cache churn. Then a compact category summary +(not a per-file %-table — that was unhelpful). The actionable list +is the headline; the rest is context. + +## AST-structural, never a line regex + +SimpleCov parent tuple → decision kind; arm `(line,col)` span → AST +node; the decision's *condition* (parent first child, where a +type-guard lives) and the arm body are inspected. + +## Honest v0 precision caveats (Engler: ranked, refine) + +- `diagnostic` over-greedy: tags any arm whose subtree contains + `raise`/`fail`/`abort` anywhere, not "the arm IS primarily a + raise." Over-counts. Refine: require it to be the dominant outcome. +- `type_norm` under-counted: no intra-procedural `local = + recv.accessor` resolution, so a guard on a local sourced from + `.type_info` is missed unless syntactically on the accessor. Refine + by consuming decomplex's local→contract resolution (don't re-derive). +- The Top-True-Gaps ranking and the categorization *shape* are sound + and validated (top genuine sites are the exact cleanup/ownership + methods that produced real bugs B1–B4); the per-category + percentages are candidates to tighten, not verdicts. + +## Self-tested + +`test/classifier_test.rb` (incl. 
a real stdlib-`Coverage` resultset +integration), `test/rollup_test.rb` (real temp git repo + churn +overlay). 6 runs / 30 assertions / 0 failures. diff --git a/gems/slopcop/exe/slopcop b/gems/slopcop/exe/slopcop new file mode 100644 index 000000000..ae9a63981 --- /dev/null +++ b/gems/slopcop/exe/slopcop @@ -0,0 +1,59 @@ +#!/usr/bin/env ruby +# frozen_string_literal: true + +require_relative "../lib/slopcop" + +def usage + warn <<~U + slopcop -- top true coverage gaps, ranked by fix-churn + + slopcop report [--repo=.] [--coverage=coverage/.resultset.json] \\ + [--output=report.md] [--files=a.rb,b.rb] \\ + [--ffi=meth1,meth2] [--top=N] + + --files repo-relative .rb to triage (default: CLEAR's lowering passes) + --ffi project external-boundary method names (the per-project + lexicon; the gem ships none -- it is general) + U + exit 1 +end + +usage if ARGV.empty? || %w[-h --help].include?(ARGV[0]) +usage unless ARGV[0] == "report" + +# CLEAR's project lexicon lives HERE (the caller), not in the gem. 
+CLEAR_FFI_BOUNDARY = %w[ + build_extern_trampoline_call build_extern_trampoline_method + build_extern_trampoline_common lower_extern_direct_call + lower_require lower_module +].freeze + +opts = { repo: ".", coverage: "coverage/.resultset.json", output: nil, + top: 50, ffi: CLEAR_FFI_BOUNDARY, + files: %w[src/mir/mir_lowering.rb src/mir/control_flow.rb + src/mir/escape_analysis.rb] } +ARGV[1..].each do |a| + case a + when /\A--repo=(.+)/ then opts[:repo] = Regexp.last_match(1) + when /\A--coverage=(.+)/ then opts[:coverage] = Regexp.last_match(1) + when /\A--output=(.+)/ then opts[:output] = Regexp.last_match(1) + when /\A--files=(.+)/ then opts[:files] = Regexp.last_match(1).split(",") + when /\A--ffi=(.+)/ then opts[:ffi] = Regexp.last_match(1).split(",") + when /\A--top=(\d+)/ then opts[:top] = Regexp.last_match(1).to_i + else usage + end +end + +md = SlopCop::Report.new( + files: opts[:files], repo: opts[:repo], resultset: opts[:coverage], + ffi_boundary: opts[:ffi], top: opts[:top], + # links must resolve from wherever the report is written. + link_base: (opts[:output] ? File.dirname(opts[:output]) : nil) +).to_markdown + +if opts[:output] + File.write(opts[:output], md) + warn "wrote #{opts[:output]}" +else + puts md +end diff --git a/gems/slopcop/lib/slopcop.rb b/gems/slopcop/lib/slopcop.rb new file mode 100644 index 000000000..5865aef46 --- /dev/null +++ b/gems/slopcop/lib/slopcop.rb @@ -0,0 +1,13 @@ +# frozen_string_literal: true + +require_relative "slopcop/classifier" +require_relative "slopcop/rollup" +require_relative "slopcop/report" + +# slopcop: categorical coverage-gap synthesis (the capstone). +# Owns the gap-categorization analysis; consumes the sibling fix-cache +# gem for churn and an optional nil-kill verdict for type_norm +# removability. See docs/agents/design.md. 
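+#
+# Library usage sketch (paths illustrative, not shipped defaults; see
+# exe/slopcop for the real CLI entry point):
+#
+#   md = SlopCop::Report.new(
+#     files: %w[src/a.rb], repo: ".",
+#     resultset: "coverage/.resultset.json"
+#   ).to_markdown
+#   File.write("report.md", md)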
+module SlopCop + VERSION = "0.0.1" +end diff --git a/gems/slopcop/lib/slopcop/classifier.rb b/gems/slopcop/lib/slopcop/classifier.rb new file mode 100644 index 000000000..b26aec2fa --- /dev/null +++ b/gems/slopcop/lib/slopcop/classifier.rb @@ -0,0 +1,189 @@ +# frozen_string_literal: true + +require "json" + +module SlopCop + # Classifies every never-taken branch arm in a target file into ONE + # actionable category. AST-structural, never a regex over the arm + # line. General -- no project lexicon baked in (see ffi_boundary:). + # + # Categories (not all gaps are equal): + # :type_norm arm/decision guards a type/nil check (is_a?/kind_of?/ + # nil?/respond_to?/safe-nav). Likely dead if the + # contract were strictly typed. + # :dead no sibling arm of the decision is ever taken: the + # decision never executes. Audit as dead code. + # :defensive live decision, inert/pinned polarity (empty else, + # nil, invariant-guaranteed). Accept. + # :ffi a caller-declared external/boundary method -> needs + # an integration test. + # :diagnostic arm raises/diagnoses -> invalid-input only. + # :genuine live, reachable, input-determined, none of the above. + # The real gap. Ranked by fix-churn downstream. + module Classifier + # The gem ships NO project lexicon -- it is general. The consuming + # project supplies its external/boundary method names via + # `ffi_boundary:` (CLEAR passes its set from the CLI). Empty here + # by design. + DIAGNOSTIC_MIDS = %i[raise fail abort].freeze # general Ruby + GUARD_MIDS = %i[is_a? kind_of? instance_of? nil? 
respond_to?].freeze + + Arm = Struct.new(:file, :defn, :line, :category, keyword_init: true) + + module_function + + def merged_branches(resultset, abspath) + m = {} + JSON.parse(File.read(resultset)).each_value do |e| + (e["coverage"] || {}).each do |p, c| + next unless p == abspath && c.is_a?(Hash) && c["branches"] + + c["branches"].each do |par, arms| + d = (m[par] ||= Hash.new(0)) + arms.each { |a, n| d[a] = d[a] + (n || 0) } + end + end + end + m + end + + def method_index(lines) + idx = {} + stack = [] + lines.each_with_index do |raw, i| + ln = i + 1 + if (mm = raw.match(/^(\s*)def\s+(self\.)?([A-Za-z0-9_?!]+)/)) + ind = mm[1].length + stack.pop while stack.any? && stack.last[0] >= ind + stack.push([ind, mm[3], ln]) + elsif (e = raw.match(/^(\s*)end\b/)) + ind = e[1].length + stack.pop if stack.any? && stack.last[0] == ind + end + idx[ln] = stack.last ? stack.last[1] : "(top-level)" + end + idx + end + + def ast_nodes(abspath) + root = RubyVM::AbstractSyntaxTree.parse(File.read(abspath), keep_script_lines: true) + acc = [] + w = ->(n) { return unless n.is_a?(RubyVM::AbstractSyntaxTree::Node); acc << n; n.children.each { |c| w.call(c) } } + w.call(root) + acc + rescue SyntaxError, StandardError + [] + end + + def node_for(nodes, sl, sc, el, ec) + sp = ->(n) { [n.first_lineno, n.first_column, n.last_lineno, n.last_column] } + ex = nodes.find { |n| sp.call(n) == [sl, sc, el, ec] } + return ex if ex + + cov = nodes.select do |n| + a = sp.call(n) + (a[0] < sl || (a[0] == sl && a[1] <= sc)) && (a[2] > el || (a[2] == el && a[3] >= ec)) + end + cov.min_by { |n| (n.last_lineno - n.first_lineno) * 1000 + n.children.size } + end + + def subtree(node, types: nil, mids: nil) + st = [node] + until st.empty? + n = st.pop + next unless n.is_a?(RubyVM::AbstractSyntaxTree::Node) + return true if types&.include?(n.type) + + if mids && %i[CALL FCALL VCALL QCALL OPCALL].include?(n.type) + mid = n.children[%i[CALL OPCALL QCALL].include?(n.type) ? 
1 : 0] + return true if mids.include?(mid) + return true if n.type == :QCALL # safe-nav = nil decision + end + n.children.each { |c| st << c } + end + false + end + + def trivial?(node) + return true if node.nil? + return true if node.type == :NIL + return true if node.type == :BEGIN && node.children.compact.empty? + return false if has_any_call?(node) + return false if subtree(node, types: %i[LASGN IASGN OP_ASGN ATTRASGN MASGN GASGN CVASGN RETURN NEXT BREAK YIELD]) + + !subtree(node, types: %i[LIT STR SYM INTEGER FLOAT LVAR IVAR DVAR CONST ARRAY HASH TRUE FALSE]) + end + + def has_any_call?(node) + subtree(node, types: %i[CALL FCALL VCALL OPCALL QCALL]) + end + + # -> [Arm, ...] for every dark arm in abspath. + def classify_file(resultset, abspath, ffi_boundary: []) + branches = merged_branches(resultset, abspath) + return [] if branches.empty? + + lines = File.readlines(abspath) + midx = method_index(lines) + nodes = ast_nodes(abspath) + out = [] + + branches.each do |parent, arms| + p = parent.gsub(/[\[\]:]/, "").split(",").map(&:strip) + pkind = p[0].to_sym + # The decision's CONDITION (where a type/nil guard lives) is the + # parent node's first child, not the dark arm's body. + pnode = node_for(nodes, p[2].to_i, p[3].to_i, p[4].to_i, p[5].to_i) + cond = if pnode && %i[IF UNLESS WHILE UNTIL CASE].include?(pnode.type) + pnode.children[0] + else + pnode + end + any_taken = arms.values.any? { |v| v.to_i.positive? } + arms.each do |arm, count| + next unless count.to_i.zero? 
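+          # Arm keys stringify like "[:then, 3, 12, 6, 14, 9]" --
+          # [kind, id, start_line, start_col, end_line, end_col] --
+          # and are recovered positionally by the gsub/split below.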
+ + a = arm.gsub(/[\[\]:]/, "").split(",").map(&:strip) + sl, sc, el, ec = a[2].to_i, a[3].to_i, a[4].to_i, a[5].to_i + meth = midx[sl] || "(top-level)" + anode = node_for(nodes, sl, sc, el, ec) + cat = categorize(meth, pkind, anode, any_taken, cond, ffi_boundary) + out << Arm.new(file: abspath, defn: meth, line: sl, category: cat) + end + end + out + end + + def categorize(method, pkind, anode, sibling_taken, cond = nil, ffi_boundary = []) + return :ffi if ffi_boundary.include?(method) + return :diagnostic if anode && subtree(anode, mids: DIAGNOSTIC_MIDS) + # type/nil guard family: check the decision's CONDITION and the + # arm body -> the decomplex DecisionPressure class. + return :type_norm if (cond && type_guard?(cond)) || (anode && type_guard?(anode)) + return :dead unless sibling_taken # decision never executes + return :defensive if trivial?(anode) + + if %i[case when & |].include?(pkind) || %i[if unless ternary while until for].include?(pkind) + :genuine + else + :defensive + end + end + + def type_guard?(node) + st = [node] + until st.empty? + n = st.pop + next unless n.is_a?(RubyVM::AbstractSyntaxTree::Node) + return true if n.type == :QCALL # x&.m : implicit nil decision + + if %i[CALL OPCALL].include?(n.type) && GUARD_MIDS.include?(n.children[1]) + return true + end + + n.children.each { |c| st << c } + end + false + end + end +end diff --git a/gems/slopcop/lib/slopcop/report.rb b/gems/slopcop/lib/slopcop/report.rb new file mode 100644 index 000000000..ac1558999 --- /dev/null +++ b/gems/slopcop/lib/slopcop/report.rb @@ -0,0 +1,72 @@ +# frozen_string_literal: true + +require_relative "rollup" +require "pathname" + +module SlopCop + # Markdown report. Leads with the actionable artifact: the top true + # gaps, repo-relative + linked, ranked by fix-cache churn score. + class Report + # link_base: the directory the markdown will be SAVED in, so link + # hrefs resolve correctly (a report at gems/slopcop/report.md must + # link ../../src/x.rb, not src/x.rb). 
Defaults to repo root + # (correct for stdout / a root-level report). + def initialize(files:, repo:, resultset:, ffi_boundary: [], top: 50, + link_base: nil) + @repo = File.realpath(repo) + @top = top + @link_root = Pathname.new(File.expand_path(link_base || @repo)) + @r = Rollup.run(files: files, repo: repo, resultset: resultset, + ffi_boundary: ffi_boundary) + end + + # href from the report's directory to a repo-relative source file. + def href(rel_file) + Pathname.new(File.join(@repo, rel_file)) + .relative_path_from(@link_root).to_s + end + + def to_markdown + gaps = @r[:top_gaps] + g = @r[:grand] + o = +"# SlopCop Report\n\n" + o << "> Top true coverage gaps to test, ranked by fix-churn.\n" \ + "> Every dark branch arm is categorized; only the GENUINE\n" \ + "> reachable ones are gaps worth testing. Owns\n" \ + "> categorization; consumes fix-cache for churn.\n\n" + + o << "## Top True Gaps (#{gaps.size}) — test these, ranked by fix-churn\n\n" + if gaps.empty? + o << "None.\n\n" + else + o << "| # | gap | method | churn |\n|---|---|---|---|\n" + gaps.first(@top).each_with_index do |x, i| + link = "[`#{x[:file]}:#{x[:line]}`](#{href(x[:file])}#L#{x[:line]})" + o << "| #{i + 1} | #{link} | `#{x[:method]}` | #{x[:churn]} |\n" + end + o << "\n- ...(+#{gaps.size - @top} more genuine gaps)\n" if gaps.size > @top + o << "\n" + end + + o << "## Category Summary\n" + o << "_#{g} dark arms; only #{gaps.size} are genuine gaps. " \ + "The rest are not test targets:_\n\n" + o << "| category | arms | % | what it means |\n|---|---|---|---|\n" + Rollup::CATS.each do |c| + n = @r[:totals][c].to_i + pct = g.zero? ? 0 : (100.0 * n / g).round(1) + o << "| #{c} | #{n} | #{pct}% | #{Rollup::ACTION[c]} |\n" + end + + o << "\n## Run Summary\n" + o << "- Repo: `#{@repo}`\n" + o << "- Files: #{@r[:per_file].size}; dark arms: #{g}; " \ + "genuine gaps: #{gaps.size}\n" + o << "- General engine: categorizes uncovered branches, ranks " \ + "genuine gaps by consumed fix-cache churn. 
Project lexicon " \ + "(external-boundary methods) is caller-supplied, not baked " \ + "in (see docs/agents/design.md).\n" + o + end + end +end diff --git a/gems/slopcop/lib/slopcop/rollup.rb b/gems/slopcop/lib/slopcop/rollup.rb new file mode 100644 index 000000000..29ff4e3d2 --- /dev/null +++ b/gems/slopcop/lib/slopcop/rollup.rb @@ -0,0 +1,74 @@ +# frozen_string_literal: true + +require_relative "classifier" +# Consume the sibling fix-cache gem for the churn signal -- do NOT +# re-derive it (boundary: own categorization, consume churn). +require_relative "../../../fix-cache/lib/fix_cache" + +module SlopCop + # Per-file categorical totals + the headline artifact: every GENUINE + # reachable gap, repo-relative, ranked by the file's fix-cache churn + # score. "Here are the top N true gaps to test." + module Rollup + # Generic vocabulary -- NO repo jargon. Recommended-action text is + # testing-strategy-neutral; the consuming project decides what a + # "negative test" / "integration test" concretely is. + ACTION = { + type_norm: "type/nil guard -- likely dead if the contract were strictly typed", + dead: "decision never executes -- audit as dead code, delete", + defensive: "inert / invariant-pinned -- accept, exclude from denominator", + ffi: "external/boundary call -- needs an integration test", + diagnostic: "error/raise path -- reachable only by invalid input (negative test)", + genuine: "real reachable gap -- test it; ranked by fix-churn below" + }.freeze + CATS = ACTION.keys.freeze + + module_function + + # files: repo-relative .rb paths. repo: absolute root. resultset: + # SimpleCov json. ffi_boundary: caller-supplied lexicon (the gem + # ships NONE -- it is general; the consuming repo provides its own). + def run(files:, repo:, resultset:, ffi_boundary: []) + repo = File.realpath(repo) + churn = begin + FixCache::Bugspots.from_git(repo) + rescue StandardError + {} + end + mx = churn.values.max + mx = 1.0 if mx.nil? || mx.zero? 
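+      # Worked example (values illustrative): churn {"a.rb" => 4.0,
+      # "b.rb" => 1.0} -> mx 4.0 -> normalized scores 1.0 / 0.25.
+      # With no usable git history the rescue above yields {} and
+      # every file ranks equally at churn 0.0.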
+ + per_file = {} + gaps = [] + files.each do |rel| + abs = File.join(repo, rel) + next unless File.exist?(abs) + + arms = Classifier.classify_file(resultset, abs, ffi_boundary: ffi_boundary) + next if arms.empty? + + counts = Hash.new(0) + arms.each { |a| counts[a.category] += 1 } + cn = ((churn[rel] || 0.0) / mx).round(4) + per_file[rel] = { total: arms.size, counts: counts, churn: cn } + + arms.each do |a| + next unless a.category == :genuine + + gaps << { file: rel, line: a.line, method: a.defn, churn: cn } + end + end + + totals = Hash.new(0) + per_file.each_value { |h| h[:counts].each { |c, n| totals[c] += n } } + { + per_file: per_file, + totals: totals, + grand: totals.values.sum, + # the headline: true gaps ranked by fix-cache score, then + # file/line for stable order. + top_gaps: gaps.sort_by { |g| [-g[:churn], g[:file], g[:line]] } + } + end + end +end diff --git a/gems/slopcop/report.md b/gems/slopcop/report.md new file mode 100644 index 000000000..1502b0dff --- /dev/null +++ b/gems/slopcop/report.md @@ -0,0 +1,80 @@ +# SlopCop Report + +> Top true coverage gaps to test, ranked by fix-churn. +> Every dark branch arm is categorized; only the GENUINE +> reachable ones are gaps worth testing. Owns +> categorization; consumes fix-cache for churn. 
+ +## Top True Gaps (273) — test these, ranked by fix-churn + +| # | gap | method | churn | +|---|---|---|---| +| 1 | [`src/mir/mir_lowering.rb:226`](../../src/mir/mir_lowering.rb#L226) | `hoist_alloc` | 1.0 | +| 2 | [`src/mir/mir_lowering.rb:244`](../../src/mir/mir_lowering.rb#L244) | `hoist_owned_value_temp` | 1.0 | +| 3 | [`src/mir/mir_lowering.rb:255`](../../src/mir/mir_lowering.rb#L255) | `owned_value_temp_needs_cleanup?` | 1.0 | +| 4 | [`src/mir/mir_lowering.rb:260`](../../src/mir/mir_lowering.rb#L260) | `owned_value_temp_needs_cleanup?` | 1.0 | +| 5 | [`src/mir/mir_lowering.rb:261`](../../src/mir/mir_lowering.rb#L261) | `owned_value_temp_needs_cleanup?` | 1.0 | +| 6 | [`src/mir/mir_lowering.rb:270`](../../src/mir/mir_lowering.rb#L270) | `container_borrow_expr?` | 1.0 | +| 7 | [`src/mir/mir_lowering.rb:289`](../../src/mir/mir_lowering.rb#L289) | `copy_container_borrow_if_needed` | 1.0 | +| 8 | [`src/mir/mir_lowering.rb:290`](../../src/mir/mir_lowering.rb#L290) | `copy_container_borrow_if_needed` | 1.0 | +| 9 | [`src/mir/mir_lowering.rb:361`](../../src/mir/mir_lowering.rb#L361) | `cleanup_entry_for_heap_result` | 1.0 | +| 10 | [`src/mir/mir_lowering.rb:362`](../../src/mir/mir_lowering.rb#L362) | `cleanup_entry_for_heap_result` | 1.0 | +| 11 | [`src/mir/mir_lowering.rb:369`](../../src/mir/mir_lowering.rb#L369) | `cleanup_entry_for_heap_result` | 1.0 | +| 12 | [`src/mir/mir_lowering.rb:435`](../../src/mir/mir_lowering.rb#L435) | `lower` | 1.0 | +| 13 | [`src/mir/mir_lowering.rb:462`](../../src/mir/mir_lowering.rb#L462) | `lower` | 1.0 | +| 14 | [`src/mir/mir_lowering.rb:499`](../../src/mir/mir_lowering.rb#L499) | `lower` | 1.0 | +| 15 | [`src/mir/mir_lowering.rb:517`](../../src/mir/mir_lowering.rb#L517) | `lower_body` | 1.0 | +| 16 | [`src/mir/mir_lowering.rb:529`](../../src/mir/mir_lowering.rb#L529) | `lower_body` | 1.0 | +| 17 | [`src/mir/mir_lowering.rb:544`](../../src/mir/mir_lowering.rb#L544) | `lower_body_with_break` | 1.0 | +| 18 | 
[`src/mir/mir_lowering.rb:552`](../../src/mir/mir_lowering.rb#L552) | `lower_body_with_break` | 1.0 | +| 19 | [`src/mir/mir_lowering.rb:557`](../../src/mir/mir_lowering.rb#L557) | `lower_body_with_break` | 1.0 | +| 20 | [`src/mir/mir_lowering.rb:583`](../../src/mir/mir_lowering.rb#L583) | `lower_program` | 1.0 | +| 21 | [`src/mir/mir_lowering.rb:589`](../../src/mir/mir_lowering.rb#L589) | `lower_program` | 1.0 | +| 22 | [`src/mir/mir_lowering.rb:674`](../../src/mir/mir_lowering.rb#L674) | `alloc_expr` | 1.0 | +| 23 | [`src/mir/mir_lowering.rb:681`](../../src/mir/mir_lowering.rb#L681) | `alloc_from_sym` | 1.0 | +| 24 | [`src/mir/mir_lowering.rb:682`](../../src/mir/mir_lowering.rb#L682) | `alloc_from_sym` | 1.0 | +| 25 | [`src/mir/mir_lowering.rb:700`](../../src/mir/mir_lowering.rb#L700) | `coerce_stdlib_arg` | 1.0 | +| 26 | [`src/mir/mir_lowering.rb:709`](../../src/mir/mir_lowering.rb#L709) | `coerce_stdlib_arg` | 1.0 | +| 27 | [`src/mir/mir_lowering.rb:750`](../../src/mir/mir_lowering.rb#L750) | `resolve_alloc_sym` | 1.0 | +| 28 | [`src/mir/mir_lowering.rb:774`](../../src/mir/mir_lowering.rb#L774) | `alloc_zig_str` | 1.0 | +| 29 | [`src/mir/mir_lowering.rb:775`](../../src/mir/mir_lowering.rb#L775) | `alloc_zig_str` | 1.0 | +| 30 | [`src/mir/mir_lowering.rb:893`](../../src/mir/mir_lowering.rb#L893) | `resolve_decl_stdlib_alloc` | 1.0 | +| 31 | [`src/mir/mir_lowering.rb:915`](../../src/mir/mir_lowering.rb#L915) | `lower_promote` | 1.0 | +| 32 | [`src/mir/mir_lowering.rb:951`](../../src/mir/mir_lowering.rb#L951) | `lower_struct_def` | 1.0 | +| 33 | [`src/mir/mir_lowering.rb:1061`](../../src/mir/mir_lowering.rb#L1061) | `lower_union_def` | 1.0 | +| 34 | [`src/mir/mir_lowering.rb:1245`](../../src/mir/mir_lowering.rb#L1245) | `lower_function_def` | 1.0 | +| 35 | [`src/mir/mir_lowering.rb:1434`](../../src/mir/mir_lowering.rb#L1434) | `lower_function_def` | 1.0 | +| 36 | [`src/mir/mir_lowering.rb:1519`](../../src/mir/mir_lowering.rb#L1519) | `build_post_outer_fn` | 1.0 | 
+| 37 | [`src/mir/mir_lowering.rb:1528`](../../src/mir/mir_lowering.rb#L1528) | `build_post_outer_fn` | 1.0 | +| 38 | [`src/mir/mir_lowering.rb:1602`](../../src/mir/mir_lowering.rb#L1602) | `build_catch_clauses` | 1.0 | +| 39 | [`src/mir/mir_lowering.rb:1647`](../../src/mir/mir_lowering.rb#L1647) | `collect_catch_reassigns` | 1.0 | +| 40 | [`src/mir/mir_lowering.rb:1662`](../../src/mir/mir_lowering.rb#L1662) | `walk_catch_body_for_reassigns` | 1.0 | +| 41 | [`src/mir/mir_lowering.rb:1672`](../../src/mir/mir_lowering.rb#L1672) | `walk_catch_body_for_reassigns` | 1.0 | +| 42 | [`src/mir/mir_lowering.rb:1675`](../../src/mir/mir_lowering.rb#L1675) | `walk_catch_body_for_reassigns` | 1.0 | +| 43 | [`src/mir/mir_lowering.rb:1856`](../../src/mir/mir_lowering.rb#L1856) | `lower_method_call` | 1.0 | +| 44 | [`src/mir/mir_lowering.rb:1982`](../../src/mir/mir_lowering.rb#L1982) | `lower_intrinsic` | 1.0 | +| 45 | [`src/mir/mir_lowering.rb:2177`](../../src/mir/mir_lowering.rb#L2177) | `extern_call_args_zig` | 1.0 | +| 46 | [`src/mir/mir_lowering.rb:2274`](../../src/mir/mir_lowering.rb#L2274) | `lower_lambda` | 1.0 | +| 47 | [`src/mir/mir_lowering.rb:2361`](../../src/mir/mir_lowering.rb#L2361) | `lower_list_lit` | 1.0 | +| 48 | [`src/mir/mir_lowering.rb:2373`](../../src/mir/mir_lowering.rb#L2373) | `lower_list_lit` | 1.0 | +| 49 | [`src/mir/mir_lowering.rb:2379`](../../src/mir/mir_lowering.rb#L2379) | `lower_list_lit` | 1.0 | +| 50 | [`src/mir/mir_lowering.rb:2402`](../../src/mir/mir_lowering.rb#L2402) | `lower_hash_lit` | 1.0 | + +- ...(+223 more genuine gaps) + +## Category Summary +_935 dark arms; only 273 are genuine gaps. 
The rest are not test targets:_
+
+| category | arms | % | what it means |
+|---|---|---|---|
+| type_norm | 229 | 24.5% | type/nil guard -- likely dead if the contract were strictly typed |
+| dead | 68 | 7.3% | decision never executes -- audit as dead code, delete |
+| defensive | 14 | 1.5% | inert / invariant-pinned -- accept, exclude from denominator |
+| ffi | 46 | 4.9% | external/boundary call -- needs an integration test |
+| diagnostic | 305 | 32.6% | error/raise path -- reachable only by invalid input (negative test) |
+| genuine | 273 | 29.2% | real reachable gap -- test it; ranked by fix-churn above |
+
+## Run Summary
+- Repo: `/home/yahn/cheat`
+- Files: 3; dark arms: 935; genuine gaps: 273
+- General engine: categorizes uncovered branches, ranks genuine gaps by consumed fix-cache churn. Project lexicon (external-boundary methods) is caller-supplied, not baked in (see docs/agents/design.md).
diff --git a/gems/slopcop/slopcop.gemspec b/gems/slopcop/slopcop.gemspec
new file mode 100644
index 000000000..6f42b919d
--- /dev/null
+++ b/gems/slopcop/slopcop.gemspec
@@ -0,0 +1,25 @@
+# frozen_string_literal: true
+
+Gem::Specification.new do |s|
+  s.name = "slopcop"
+  s.version = "0.0.1"
+  s.summary = "Categorical coverage-gap synthesis: not all gaps are equal"
+  s.description = <<~DESC
+    The capstone. A flat "673/2732 uncovered" is unactionable because
+    gaps are not equal. SlopCop classifies every dark branch arm by
+    category -- type-normalization (likely removable, confirm with
+    nil-kill), defensive/invariant-pinned (accept), dead-decision
+    (delete: complexity down), or GENUINE reachable gap -- then overlays
+    fix-churn so the genuine arms in churn-hot code surface as "bugs
+    highly likely HERE." It OWNS the gap-categorization analysis and
+    CONSUMES fix-cache (churn) + an optional nil-kill verdict; it does
+    not re-derive them. Promotes the branch-gap triage probe to a
+    first-class product. Zero runtime deps beyond the sibling fix-cache.
+ DESC + s.authors = ["CLEAR"] + s.license = "MIT" + s.files = Dir["lib/**/*.rb", "exe/*"] + s.bindir = "exe" + s.executables = ["slopcop"] + s.required_ruby_version = ">= 3.1" +end diff --git a/gems/slopcop/test/classifier_test.rb b/gems/slopcop/test/classifier_test.rb new file mode 100644 index 000000000..bc8ee220f --- /dev/null +++ b/gems/slopcop/test/classifier_test.rb @@ -0,0 +1,83 @@ +# frozen_string_literal: true + +require "minitest/autorun" +require "tempfile" +require "json" +require "coverage" +require_relative "../lib/slopcop" + +class ClassifierTest < Minitest::Test + C = SlopCop::Classifier + + def node(expr) + RubyVM::AbstractSyntaxTree.parse(expr).children.last + end + + def test_type_guard_detects_is_a_nil_respond_and_safe_nav + assert C.type_guard?(node("x.is_a?(Type)")) + assert C.type_guard?(node("x.nil?")) + assert C.type_guard?(node("x.respond_to?(:y)")) + assert C.type_guard?(node("x&.foo")) + refute C.type_guard?(node("x + 1")) + refute C.type_guard?(node("x.bar(1)")) + end + + def test_trivial_is_the_narrow_inert_residue + assert C.trivial?(nil) + assert C.trivial?(node("nil")) + refute C.trivial?(node("foo(1)")) # a call + refute C.trivial?(node("return 5")) # an outcome + refute C.trivial?(node("x = 1")) # an assignment + end + + def test_categorize_priority_order + g = node("x.is_a?(Type)") + # FFI method name wins first + assert_equal :ffi, C.categorize("lower_require", :if, g, true, nil, ["lower_require"]) + # diagnostic (raise) before type_norm + assert_equal :diagnostic, C.categorize("m", :if, node("raise 'x'"), true) + # type_norm before dead/defensive + assert_equal :type_norm, C.categorize("m", :if, g, false) + # no sibling taken + not type/diag/ffi -> dead + assert_equal :dead, C.categorize("m", :if, node("foo(1)"), false) + # live + trivial -> defensive + assert_equal :defensive, C.categorize("m", :if, node("nil"), true) + # live + real body + branch kind -> genuine + assert_equal :genuine, C.categorize("m", :case, 
node("foo(1)"), true) + end + + # Real resultset via stdlib Coverage (same branch-tuple shape SimpleCov + # uses), so classify_file runs the true path on real dark arms. + def test_classify_file_on_real_coverage + src = <<~RB + def shape(x, n) + return 0 if x.is_a?(String) # type_norm (dark: never String) + if n > 0 + a = 1 + else + a = 2 # genuine-ish (dark else, sibling taken) + end + a + end + shape(7, 5) + RB + f = Tempfile.new(["cov", ".rb"]) + f.write(src) + f.close + Coverage.start(branches: true) + load f.path + res = Coverage.result + rs = { "T" => { "coverage" => { f.path => { "branches" => res.dig(f.path, :branches) } } } } + rsf = Tempfile.new(["rs", ".json"]) + rsf.write(JSON.dump(rs)) + rsf.close + + arms = C.classify_file(rsf.path, f.path) + cats = arms.map(&:category) + assert_includes cats, :type_norm, "the never-true String guard" + refute_empty arms + ensure + f&.unlink + rsf&.unlink + end +end diff --git a/gems/slopcop/test/rollup_test.rb b/gems/slopcop/test/rollup_test.rb new file mode 100644 index 000000000..a38e664a1 --- /dev/null +++ b/gems/slopcop/test/rollup_test.rb @@ -0,0 +1,62 @@ +# frozen_string_literal: true + +require "minitest/autorun" +require "tmpdir" +require "json" +require "coverage" +require "fileutils" +require_relative "../lib/slopcop" + +class RollupTest < Minitest::Test + def test_rollup_categorizes_and_surfaces_genuine_with_churn_overlay + Dir.mktmpdir do |dir| + FileUtils.mkdir_p("#{dir}/src") + src = <<~RB + def shape(x, n) + return 0 if x.is_a?(String) + case n + when 1 then 10 + when 2 then 20 + else 30 + end + end + shape(7, 1) + RB + path = "#{dir}/src/m.rb" + File.write(path, src) + # real git repo so fix-cache churn is computable (no fix commit -> + # churn 0, score 0, but the genuine bucket still lists). 
+ system("git", "-C", dir, "init", "-q", out: File::NULL, err: File::NULL) + system("git", "-C", dir, "config", "user.email", "t@t") + system("git", "-C", dir, "config", "user.name", "t") + system("git", "-C", dir, "add", "-A", out: File::NULL, err: File::NULL) + system("git", "-C", dir, "commit", "-qm", "add", out: File::NULL, err: File::NULL) + + Coverage.start(branches: true) + load path + res = Coverage.result + rs = { "T" => { "coverage" => { path => { "branches" => res.dig(path, :branches) } } } } + rsf = "#{dir}/.resultset.json" + File.write(rsf, JSON.dump(rs)) + + out = SlopCop::Rollup.run(files: ["src/m.rb"], repo: dir, resultset: rsf) + assert out[:per_file].key?("src/m.rb") + fh = out[:per_file]["src/m.rb"] + assert fh[:total].positive?, "should find dark arms" + assert(fh[:counts][:type_norm].positive?, "the never-String is_a? guard") + assert_equal fh[:total], fh[:counts].values.sum, "every arm categorized" + assert_equal out[:grand], out[:totals].values.sum + end + end + + def test_missing_file_is_skipped_not_crashed + Dir.mktmpdir do |dir| + system("git", "-C", dir, "init", "-q", out: File::NULL, err: File::NULL) + File.write("#{dir}/rs.json", JSON.dump({ "T" => { "coverage" => {} } })) + out = SlopCop::Rollup.run(files: ["nope.rb"], repo: dir, + resultset: "#{dir}/rs.json") + assert_empty out[:per_file] + assert_empty out[:top_gaps] + end + end +end diff --git a/sorbet/config b/sorbet/config index 14c784d5b..6264c77b8 100644 --- a/sorbet/config +++ b/sorbet/config @@ -12,6 +12,7 @@ --ignore=docs/ --ignore=tools/ --ignore=gems/decomplex/ +--ignore=gems/slopcop/ --ignore=gems/fix-cache/ --ignore=gems/nil-kill/ diff --git a/spec/fsm_suspend_resolvers_spec.rb b/spec/fsm_suspend_resolvers_spec.rb index af947eb54..153cf46c7 100644 --- a/spec/fsm_suspend_resolvers_spec.rb +++ b/spec/fsm_suspend_resolvers_spec.rb @@ -7,6 +7,7 @@ require_relative '../src/mir/fsm_ops' require_relative '../src/mir/fsm_transform/segments' require_relative 
'../src/mir/fsm_transform/suspend_resolvers' +require_relative '../src/annotator-helpers/intrinsic_registry' # Tests for FsmTransform::SuspendResolvers, the per-suspend-kind # resolvers that turn a Segments::*Suspend tail into a @@ -27,8 +28,10 @@ def lower(node); node; end describe "resolve_io" do # Build a fake stdlib_def with a sleep-like fsm_setup template. + # Production stamps go through IntrinsicRegistry.fs -> a typed + # FunctionSignature; this unit test constructs the same shape. let(:stdlib_def) { - { + IntrinsicRegistry.fs({ suspends: true, fsm_setup: [ FsmOps::StmtCall.new( @@ -43,7 +46,7 @@ def lower(node); node; end fsm_state_decls: [ FsmOps::StateFieldDecl.new("rf_fd", "i32", "-1"), ], - } + }) } let(:call_node) { diff --git a/spec/higher_order_spec.rb b/spec/higher_order_spec.rb index f610485a4..392aace4f 100644 --- a/spec/higher_order_spec.rb +++ b/spec/higher_order_spec.rb @@ -1853,7 +1853,11 @@ def transpile_fn(src) END CLEAR expect(out).to include("__res") - expect(out).to include("+ 1") + # Typed Int64 counter increment now lowers through the + # overflow-safe add (CheatLib.intAdd), not a raw `+`. The old + # `+ 1` assertion only held because the synthesized increment + # node was UNTYPED — the bug the AST→MIR invariant fixed. 
+ expect(out).to match(/__res\d+ = CheatLib\.intAdd\(__res\d+, 1\)/) end it "wraps the predicate in an if condition in the loop" do diff --git a/spec/intrinsic_registry_spec.rb b/spec/intrinsic_registry_spec.rb new file mode 100644 index 000000000..390715586 --- /dev/null +++ b/spec/intrinsic_registry_spec.rb @@ -0,0 +1,83 @@ +# frozen_string_literal: true + +require_relative "spec_helper" +require_relative "../src/ast/type" +require_relative "../src/mir/mir" +require_relative "../src/ast/std_lib" +require_relative "../src/annotator-helpers/intrinsic_registry" + +# Totality + fidelity: every real registry entry must convert without +# error (T::Struct raises on any mistyped IntrinsicEmit prop, so this +# proves the typed model fits the real authoring data), and key +# semantics must round-trip. +RSpec.describe IntrinsicRegistry do + REGISTRIES = { + STD_LIB: STD_LIB, POOL_METHODS: POOL_METHODS, SET_METHODS: SET_METHODS, + MAP_METHODS: MAP_METHODS, INDEX_OPS: INDEX_OPS, BUILTIN_OPS: BUILTIN_OPS + }.freeze + + it "converts every entry in every registry without error (totality)" do + REGISTRIES.each do |rname, reg| + reg.each do |mname, entry| + next unless entry.is_a?(Hash) + + expect { IntrinsicRegistry.convert_entry(mname, entry, REGISTRIES) } + .not_to(raise_error, "#{rname}[#{mname.inspect}] failed to convert") + end + end + end + + it "yields a pure Type return_type and a typed FunctionReturn (no Proc/Hash)" do + REGISTRIES.each_value do |reg| + reg.each do |mname, entry| + next unless entry.is_a?(Hash) + + fs = IntrinsicRegistry.convert_entry(mname, entry, REGISTRIES) + expect(fs.return_type).to be_a(Type) + expect(fs.return_def).to be_a(FunctionReturn) + src = entry.key?(:return_type) ? entry[:return_type] : entry[:return] + # No Proc/Hash leakage: every descriptor maps to a closed + # FunctionReturn variant, and the static return_type matches + # the FunctionReturn for the Fixed case. 
+ expect(src).not_to be_a(Proc) + if fs.return_def.kind == FunctionReturn::Kind::Fixed + expect(fs.return_type).to eq(fs.return_def.fixed) + else + expect(fs.return_type.resolved).to eq(:Any) + end + case src + when :r_element_of + expect(fs.return_def.kind).to eq(FunctionReturn::Kind::ElementOf) + when :r_id_element + expect(fs.return_def.kind).to eq(FunctionReturn::Kind::IdOfElement) + when :r_optional_value + expect(fs.return_def.kind).to eq(FunctionReturn::Kind::OptionalOfValue) + end + expect(fs.emit).to be_a(IntrinsicEmit).or be_nil + expect(fs.intrinsic).to be(true) + end + end + end + + it "round-trips representative emit fields incl. recursion" do + fs = IntrinsicRegistry.convert_entry( + "insert", POOL_METHODS["insert"], REGISTRIES + ) + expect(fs.emit.tag).to eq(:pool_method) + expect(fs.emit.is_method).to be(true) + expect(fs.emit.zig).to be_a(String) + # POOL_METHODS["insert"] returns `Id` -> IdOfElement variant. + expect(fs.return_def).to be_a(FunctionReturn) + expect(fs.return_def.kind).to eq(FunctionReturn::Kind::IdOfElement) + + # Nested recursive sub-descriptor (eql/cleanup/... 
-> IntrinsicEmit) + nested = REGISTRIES.each_value.flat_map(&:values) + .select { |e| e.is_a?(Hash) } + .find { |e| e[:eql].is_a?(Hash) || e[:cleanup].is_a?(Hash) } + if nested + fe = IntrinsicRegistry.convert_entry("x", nested, REGISTRIES) + sub = fe.emit.eql || fe.emit.cleanup + expect(sub).to be_a(IntrinsicEmit) + end + end +end diff --git a/spec/mir_lowering_spec.rb b/spec/mir_lowering_spec.rb index a907c3048..a909fd00a 100644 --- a/spec/mir_lowering_spec.rb +++ b/spec/mir_lowering_spec.rb @@ -437,7 +437,7 @@ def collect_mir_nodes(root, klass) node = make_binop(left, :ADD, right) result = lowering.lower(node) expect(result).to be_a(MIR::InlineZig) - expect(result.stdlib_def).to include(borrows: :all) + expect(result.stdlib_def.emit.borrows).to eq(:all) expect(emit(result)).to eq("CheatLib.intAdd(a, b)") end @@ -456,7 +456,7 @@ def collect_mir_nodes(root, klass) node = make_binop(left, :EQ, right) result = lowering.lower(node) expect(result).to be_a(MIR::InlineZig) - expect(result.stdlib_def).to include(borrows: :all) + expect(result.stdlib_def.emit.borrows).to eq(:all) expect(emit(result)).to include("CheatLib.eql(name,") end @@ -474,7 +474,7 @@ def collect_mir_nodes(root, klass) node = make_binop(left, :WRAP_ADD, right) result = lowering.lower(node) expect(result).to be_a(MIR::InlineZig) - expect(result.stdlib_def).to include(borrows: :all) + expect(result.stdlib_def.emit.borrows).to eq(:all) expect(emit(result)).to eq("CheatLib.wrapAdd(a, b)") end @@ -923,7 +923,7 @@ def node.reassign_cleanup; @reassign_cleanup; end expect(result).to be_a(MIR::ExprStmt) expect(result.expr).to be_a(MIR::InlineZig) - expect(result.expr.stdlib_def).to include(:value_transforms) + expect(result.expr.stdlib_def.emit.value_transforms).not_to be_nil expect(emit(result)).to include("CheatLib.setAt(items, 0,") end @@ -1467,7 +1467,7 @@ def node.reassign_cleanup; @reassign_cleanup; end node.full_type = :Void result = lowering.lower(node) expect(result).to be_a(MIR::InlineZig) - 
expect(result.stdlib_def).to include(borrows: :all) + expect(result.stdlib_def.emit.borrows).to eq(:all) expect(emit(result)).to include("CheatLib.assert(true,") end @@ -1867,7 +1867,7 @@ def make_fn(name, params: [], return_type: :Void, body: [], visibility: nil, # Structs with no heap provenance live on the stack. Zig/LLVM SROAs them # into registers. Do NOT pass by *const T — that would prevent SROA. sig = Struct.new(:needs_rt, :can_fail, :params, :return_type) - .new(false, false, [{ name: "p", type: :Point, mutable: false, takes: false }], :Int64) + .new(false, false, [AST::Param.new(name: "p", type: :Point, mutable: false, takes: false)], :Int64) l = lowering( fn_sigs: { "sum3" => sig }, struct_schemas: { Point: { x: :Int64, y: :Int64 } } @@ -1911,7 +1911,7 @@ def make_fn(name, params: [], return_type: :Void, body: [], visibility: nil, node.full_type = :Void sig = FunctionSignature.new( params: [{ name: "count", type: Type.new(:Int64), mutable: true }], - return_type: :Void + return_type: Type.new(:Void) ) result = lowering(fn_sigs: { "bump" => sig }).lower(node) @@ -1927,7 +1927,7 @@ def make_fn(name, params: [], return_type: :Void, body: [], visibility: nil, node = AST::FuncCall.new(tok, "identity", [arg]) node.full_type = :Int64 node.generic_type_args = [:Int64] - sig = FunctionSignature.new(params: [{ name: "x", type: Type.new(:Int64) }], return_type: :Int64) + sig = FunctionSignature.new(params: [{ name: "x", type: Type.new(:Int64) }], return_type: Type.new(:Int64)) sig.needs_rt = true result = lowering(fn_sigs: { "identity" => sig }).lower(node) @@ -2003,7 +2003,7 @@ def make_fn(name, params: [], return_type: :Void, body: [], visibility: nil, body = make_lit(:NUMBER, 42, full_type: :Int64) body.coerced_type = :Int64 node = AST::LambdaLit.new(tok, [], nil, body, nil, nil) - node.full_type = FunctionSignature.new(params: [], return_type: :Int64) + node.full_type = FunctionSignature.new(params: [], return_type: Type.new(:Int64)) result = 
lowering.lower(node) expect(result).to be_a(MIR::LambdaExpr) zig = emit(result) diff --git a/spec/pipeline_rewriter_spec.rb b/spec/pipeline_rewriter_spec.rb index 0db08589a..753bb3972 100644 --- a/spec/pipeline_rewriter_spec.rb +++ b/spec/pipeline_rewriter_spec.rb @@ -1,13 +1,20 @@ require "rspec" require_relative "../src/ast/lexer" require_relative "../src/ast/parser" +require_relative "../src/backends/transpiler" # loads compiler, annotator, lexer, parser, ast require_relative "../src/backends/pipeline_rewriter" RSpec.describe PipelineRewriter do + # Real pipeline order: lex -> parse -> annotate -> rewrite. + # PipelineRewriter runs AFTER annotation in CompilerFrontend and + # relies on typed nodes (the AST→MIR invariant). The old helper + # skipped annotation, an unrealistic path that masked the contract. def parse_and_rewrite(src) tokens = Lexer.new(src).tokenize ast = Parser.new(tokens, src).parse - PipelineRewriter.new.rewrite!(ast) + annotator = SemanticAnnotator.new(source_code: src) + annotator.annotate!(ast) + PipelineRewriter.new(annotator).rewrite!(ast) ast end diff --git a/spec/propagate_caller_sync_spec.rb b/spec/propagate_caller_sync_spec.rb index 47743c780..9dd84f60f 100644 --- a/spec/propagate_caller_sync_spec.rb +++ b/spec/propagate_caller_sync_spec.rb @@ -185,9 +185,9 @@ def annotate(source) ast, annotator = annotate(src) sig = annotator.scope_stack.first.locals["bumpIt"].type expect(sig).to be_a(FunctionSignature) - # The field is present (key exists in the param hash). - expect(sig.params.first).to have_key(:sync) - expect(sig.params.first[:sync]).to be_nil + # The field is present on the Param struct (defaulting to nil). 
+ expect(sig.params.first).to be_a(AST::Param) + expect(sig.params.first.sync).to be_nil end it "leaves :sync nil for params with no sync annotation" do diff --git a/src/annotator-helpers/auto_inference.rb b/src/annotator-helpers/auto_inference.rb index 9b3ee65a7..c231c6196 100644 --- a/src/annotator-helpers/auto_inference.rb +++ b/src/annotator-helpers/auto_inference.rb @@ -77,11 +77,11 @@ def collect!(program_node) def register_signature_slots @fn_nodes.each do |name, fn| (fn.params || []).each_with_index do |param, i| - next unless auto?(param[:type]) + next unless auto?(param.type) @slots[[:param, name, i]] = Slot.new( kind: :param, fn_name: name, index: i, decl_node: fn, sources: [], - auto_token: param[:type].auto_token, + auto_token: param.type.auto_token, ) end if auto?(fn.return_type) @@ -150,7 +150,7 @@ def record_call_site(call_node) callee = @fn_nodes[call_node.name] return unless callee (callee.params || []).each_with_index do |param, i| - next unless auto?(param[:type]) + next unless auto?(param.type) arg = call_node.args && call_node.args[i] next unless arg slot = @slots[[:param, callee.name, i]] @@ -569,7 +569,7 @@ def walk_for_shape_decls(node, &block) return if node.nil? case node when AST::BindExpr, AST::VarDecl - yield node if node.type.is_a?(Type) && node.type.auto? + yield node if node.type&.auto? walk_for_shape_decls(node.value, &block) when AST::FunctionDef # Don't recurse into nested function definitions. 
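The hunks above migrate hash-shaped params (`param[:type]`, `param[:name]`) to struct accessors (`param.type`, `param.name`). A minimal standalone sketch of why the struct form is stricter -- the `Param` defined here is illustrative only, NOT the project's `AST::Param`:

```ruby
# Illustrative sketch (this Param is not the project's AST::Param):
# a keyword_init Struct fails fast on a misspelled field, while a
# Hash silently returns nil and lets the typo propagate.
Param = Struct.new(:name, :type, :default, :mutable, :takes, keyword_init: true)

hash_param   = { name: "x", type: :Int64 }
struct_param = Param.new(name: "x", type: :Int64)

hash_param[:tyep]   # typo: silently nil, no error
struct_param.type   # member accessor, typos cannot hide

typo_detected =
  begin
    struct_param.tyep # not a member -> NoMethodError
    false
  rescue NoMethodError
    true
  end
```

Unset members (e.g. `takes` above) still read back as `nil` from the struct accessor, which is the "present, defaulting to nil" behavior the `propagate_caller_sync_spec.rb` change asserts.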
@@ -714,7 +714,7 @@ def build_name_map(fn) map = {} (fn.params || []).each_with_index do |param, i| slot_id = [:param, fn.name, i] - map[param[:name]] = slot_id if @slots.key?(slot_id) + map[param.name] = slot_id if @slots.key?(slot_id) end walk_for_local_decls(fn.body) do |decl| slot_id = [:local, decl.object_id] diff --git a/src/annotator-helpers/capabilities.rb b/src/annotator-helpers/capabilities.rb index 0e783868b..b70325921 100644 --- a/src/annotator-helpers/capabilities.rb +++ b/src/annotator-helpers/capabilities.rb @@ -92,7 +92,7 @@ def cap_var_sync(var_node) T.bind(self, SemanticAnnotator) rescue nil sym_sync = var_node.symbol&.sync return sym_sync if sym_sync - return var_node.full_type.sync if var_node.full_type.is_a?(Type) + return var_node.full_type.sync if var_node.full_type nil end @@ -101,7 +101,7 @@ def cap_var_storage(var_node) T.bind(self, SemanticAnnotator) rescue nil sym = var_node.symbol return sym.storage if sym - if var_node.full_type.is_a?(Type) + if var_node.full_type case T.must(var_node.full_type).ownership when :shared then return :shared when :multiowned then return :multiowned @@ -117,7 +117,7 @@ def cap_var_layout(var_node) T.bind(self, SemanticAnnotator) rescue nil sym_layout = var_node.symbol&.layout return sym_layout if sym_layout - return T.must(var_node.full_type).layout if var_node.full_type.is_a?(Type) + return var_node.full_type.layout if var_node.full_type nil end @@ -409,10 +409,10 @@ def predicate_impurity_reason(call, callee) return "can fail" if call.respond_to?(:can_fail) && call.can_fail if call.matched_stdlib_def md = call.matched_stdlib_def - return "allocates" if md[:allocates] - return "can fail" if md[:can_fail] - return "suspends" if md[:suspends] - return "mutates its receiver" if md[:mutates_receiver] + return "allocates" if md.emit&.allocates + return "can fail" if md.can_fail + return "suspends" if md.emit&.suspends + return "mutates its receiver" if md.emit&.mutates_receiver return nil end @@ -510,7 +510,7 
@@ def visit_pre_clauses!(fn_node) end end - param_names = (fn_node.params || []).map { |p| p[:name].to_s } + param_names = (fn_node.params || []).map { |p| p.name.to_s } prev_ctx = @current_predicate_context begin pre_clauses.each do |entry| @@ -555,11 +555,11 @@ def visit_post_clauses!(fn_node) error!(fn_node, :DEBUG_POST_NOT_WITH_CATCH) end - param_names = (fn_node.params || []).map { |p| p[:name].to_s } + param_names = (fn_node.params || []).map { |p| p.name.to_s } rejected = (fn_node.params || []).filter_map do |p| - sym = current_scope.locals[p[:name].to_s] + sym = current_scope.locals[p.name.to_s] next unless sym && %i[locked write_locked versioned atomic].include?(sym.sync) - p[:name].to_s + p.name.to_s end.to_set rt = fn_node.return_type diff --git a/src/annotator-helpers/effects.rb b/src/annotator-helpers/effects.rb index d64d6bfde..2d6d24e03 100644 --- a/src/annotator-helpers/effects.rb +++ b/src/annotator-helpers/effects.rb @@ -359,12 +359,12 @@ def compute_needs_rt! @call_graph = T.let(@call_graph, T.untyped) needs_rt = {} @fn_nodes.each do |name, fn_node| - raw = fn_node.full_type.is_a?(Type) ? fn_node.full_type.raw : fn_node.full_type + raw = fn_node.full_type&.raw ret_type = raw.is_a?(FunctionSignature) ? raw.return_type : nil heap_return = ret_type.is_a?(Type) && (ret_type.heap? || ret_type.dynamic?) has_takes_heap = fn_node.params&.any? { |p| - next unless p[:takes] - ti = Type.new(p[:type] || :Any) + next unless p.takes + ti = Type.new(p.type || :Any) ti.string? || ti.array? || ti.list_collection? || ti.map? } has_catch = fn_node.catch_clauses.is_a?(Array) && fn_node.catch_clauses.any? @@ -690,7 +690,7 @@ def scan_suspend_points(node, fn_node, points) node.each_pair { |_, v| scan_suspend_points(v, fn_node, points) } when AST::FuncCall, AST::MethodCall if func_call_suspends?(node) - kind = node.matched_stdlib_def && node.matched_stdlib_def[:suspends] ? :io : :call + kind = node.matched_stdlib_def&.emit&.suspends ? 
:io : :call points << { id: points.size, kind: kind, node: node } end node.each_pair { |_, v| scan_suspend_points(v, fn_node, points) } @@ -718,7 +718,7 @@ def with_block_suspends?(node) def func_call_suspends?(node) T.bind(self, SemanticAnnotator) rescue nil @fn_nodes = T.let(@fn_nodes, T.untyped) - return true if node.matched_stdlib_def && node.matched_stdlib_def[:suspends] + return true if node.matched_stdlib_def&.emit&.suspends return false if node.respond_to?(:fn_var_call) && node.fn_var_call callee = @fn_nodes[node.name] return false unless callee diff --git a/src/annotator-helpers/fixable_helpers.rb b/src/annotator-helpers/fixable_helpers.rb index f63ade491..378cc5b56 100644 --- a/src/annotator-helpers/fixable_helpers.rb +++ b/src/annotator-helpers/fixable_helpers.rb @@ -1481,7 +1481,7 @@ def emit_auto_resolved_finding!(resolution) sig { params(decl: T.untyped, slot: AutoConstraintCollector::Slot).returns(T.untyped) } def emit_auto_shape_resolved_finding!(decl, slot) T.bind(self, SemanticAnnotator) rescue nil - return unless decl && decl.type.is_a?(Type) + return unless decl&.type return if decl.type.auto? # not yet wrapped — skip type_str = auto_type_source_form(decl.type) name = decl.respond_to?(:name) ? 
decl.name : "" @@ -1625,7 +1625,7 @@ def auto_slot_label(slot) case slot.kind when :param param = slot.decl_node.params[slot.index] - "parameter '#{param[:name]}' of `#{slot.fn_name}`" + "parameter '#{param.name}' of `#{slot.fn_name}`" when :return "return type of `#{slot.fn_name}`" when :local diff --git a/src/annotator-helpers/function_analysis.rb b/src/annotator-helpers/function_analysis.rb index 8326339ac..e6d4fc457 100644 --- a/src/annotator-helpers/function_analysis.rb +++ b/src/annotator-helpers/function_analysis.rb @@ -64,21 +64,21 @@ def analyze_routine(node, body, declared_return, is_implicit) return_type end - sig { params(params: T::Array[T::Hash[Symbol, T.untyped]], return_type: Symbol).returns(FunctionSignature) } + sig { params(params: T::Array[AST::Param], return_type: Symbol).returns(FunctionSignature) } def build_lambda_signature(params, return_type) T.bind(self, SemanticAnnotator) rescue nil normalized_params = params.map do |param| { - name: param[:name], - type: param[:type], - required: param[:default].nil?, - default: param[:default], - mutable: param[:mutable] || false, - takes: param[:takes] || false + name: param.name, + type: param.type, + required: param.default.nil?, + default: param.default, + mutable: param.mutable || false, + takes: param.takes || false } end - FunctionSignature.new(params: normalized_params, return_type: return_type) + FunctionSignature.new(params: normalized_params, return_type: Type.new(return_type)) end # Resolve a function call: look up the function, dispatch based on type @@ -135,7 +135,7 @@ def resolve_call(node, args) comptime_type_args = [] params = func_type.params || [] params.each_with_index do |p, i| - if p[:comptime] && args[i].is_a?(AST::Identifier) + if p.comptime && args[i].is_a?(AST::Identifier) comptime_type_args << args[i].name.to_sym args[i].full_type = :Type # Mark as type-value, not a variable end @@ -178,7 +178,7 @@ def resolve_call(node, args) # The original `!T` is stashed on 
`error_union_type` so # OR-RESCUE handlers (which read the LHS's union to pick # `catch`/`orelse`) can still see the un-stripped form. - if node.full_type.is_a?(Type) && node.full_type.respond_to?(:error_union?) && + if node.full_type.respond_to?(:error_union?) && node.full_type.error_union? node.error_union_type = node.full_type if node.respond_to?(:error_union_type=) outer = node.full_type @@ -240,13 +240,13 @@ def resolve_call(node, args) # String returns only get heap_promoted_call from callee.returns_promoted # (not from type alone) because stdlib string functions like readFile use # frameAlloc internally — the caller shouldn't try to free those. - if node.type_info.is_a?(Type) + if node.type_info callee_node = @fn_nodes[func_name] sig_return_heap = func_type.is_a?(FunctionSignature) && func_type.return_provenance == :heap if callee_node&.return_provenance == :heap || sig_return_heap - node.type_info.provenance = :heap if node.type_info.is_a?(Type) + node.type_info&.provenance = :heap elsif node.type_info&.needs_escape_promotion? && !node.type_info&.string? - node.type_info.provenance = :heap if node.type_info.is_a?(Type) + node.type_info&.provenance = :heap else # Union return types with heap variants need heap_promoted_call # when the callee allocates at all (frame, heap, or alloc). @@ -258,7 +258,7 @@ def resolve_call(node, args) has_heap = (schema[:variants] || {}).any? 
         { |_, vt| Type.variant_has_heap?(vt) }
       callee_allocates = callee_node&.return_provenance == :heap ||
         callee_node&.uses_frame || callee_node&.uses_heap || callee_node&.uses_alloc
       if has_heap && callee_allocates
-        node.type_info.provenance = :heap if node.type_info.is_a?(Type)
+        node.type_info&.provenance = :heap
       end
     end
   end
@@ -266,12 +266,12 @@ def resolve_call(node, args)
     end
   end

-  sig { params(config: T::Hash[Symbol, T.untyped]).returns(T.nilable(FunctionSignature)) }
+  sig { params(config: FunctionSignature).returns(T.nilable(FunctionSignature)) }
   def normalize_intrinsic_signature(config)
     T.bind(self, SemanticAnnotator) rescue nil
-    return nil if config[:args] == :Varargs
+    return nil if config.arg_spec == :Varargs

-    params = config[:args].each_with_index.map do |arg_def, i|
+    params = config.arg_spec.each_with_index.map do |arg_def, i|
       if arg_def.is_a?(Hash)
         # Extended format: { type: :Int64, mutable: true, takes: false }
         {
@@ -295,9 +295,9 @@ def normalize_intrinsic_signature(config)
     FunctionSignature.new(
       params: params,
-      return_type: config[:return],
+      return_type: config.return_type,
       intrinsic: true,
-      zig_pattern: config[:zig]
+      zig_pattern: config.emit&.zig
     )
   end

@@ -305,7 +305,7 @@ def verify_function_signature!(node, signature)
     T.bind(self, SemanticAnnotator) rescue nil
     params = signature.params
-    min_args = params.count { |param| param[:required] }
+    min_args = params.count { |param| param.required }
     max_args = params.size
     given_args = node.args.size

@@ -319,11 +319,11 @@ def verify_function_signature!(node, signature)
     if given_args < max_args
       params[given_args...max_args].each do |param|
-        next if param[:required]
-        default = param[:default]
+        next if param.required
+        default = param.default
         injected = case default
           when AST::DefaultLit
-            type_name = param[:type].is_a?(Symbol) ? param[:type].to_s : param[:type].to_s
+            type_name = param.type.to_s
            AST::StructLit.new(default.token, type_name, {}, nil)
          else
            default.dup
        end
@@ -339,16 +339,16 @@ def verify_function_signature!(node, signature)
     node.args.each_with_index do |arg_node, i|
       param = params[i]
-      next if param[:comptime] # comptime type params are not type-checked
+      next if param.comptime # comptime type params are not type-checked
       verify_param_lifetime!(arg_node, param, signature)

-      if param[:mutable]
+      if param.mutable
         if !arg_node.is_a?(AST::Identifier)
-          error!(arg_node, :IMMUTABLE_ARG_PASSED_AS_EXPRESSION, index: i+1, param: param[:name])
+          error!(arg_node, :IMMUTABLE_ARG_PASSED_AS_EXPRESSION, index: i+1, param: param.name)
         end
         if current_scope.is_immutable?(arg_node.name)
-          emit_immutable_arg_error!(arg_node, current_scope, i + 1, param[:name])
+          emit_immutable_arg_error!(arg_node, current_scope, i + 1, param.name)
         end

         # Mark only the SymbolEntry as mutated-through-call. The callee receives a mutable reference
@@ -368,7 +368,7 @@ def verify_function_signature!(node, signature)
       is_give = arg_node.is_a?(AST::MoveNode)
       inner_node = is_give ? arg_node.value : arg_node

-      if param[:takes] || is_give
+      if param.takes || is_give
         # Reject borrowed values passed to TAKES params.
         # Container index access (arr[i], map[key]) returns a borrow -
         # you cannot take ownership of data inside a container.
@@ -386,7 +386,7 @@ def verify_function_signature!(node, signature)
         # Ensure @list args to TAKES params are heap-owned (implicit COPY).
         if inner_node.is_a?(AST::Identifier)
-          owned = ensure_owned_value!(inner_node, param[:type])
+          owned = ensure_owned_value!(inner_node, param.type)
           node.args[i] = owned if owned
         end

@@ -401,7 +401,7 @@ def verify_function_signature!(node, signature)
         move_if_not_copyable!(
           inner_node,
           action: is_give ? :give : :takes,
-          consumer_param_type: param[:type],
+          consumer_param_type: param.type,
         )
         inner_node.was_moved = true
         arg_node.was_moved = true
@@ -414,16 +414,16 @@ def verify_function_signature!(node, signature)
       # Weak refs must be RESOLVE'd before passing to concrete params.
       arg_ti = arg_node.respond_to?(:type_info) ? arg_node.type_info : nil
-      expected_raw = param[:type]
+      expected_raw = param.type
       if arg_ti&.link? && expected_raw != :Any
         param_type_obj = expected_raw.is_a?(Type) ? expected_raw : nil
         unless param_type_obj&.link?
           arg_name = arg_node.respond_to?(:name) ? arg_node.name : "Expression"
-          error!(arg_node, :LINK_NEEDS_RESOLVE_FOR_CALL, name: arg_name, param: param[:name])
+          error!(arg_node, :LINK_NEEDS_RESOLVE_FOR_CALL, name: arg_name, param: param.name)
         end
       end

-      expected = param[:type]
+      expected = param.type
       actual = arg_node.resolved_type
       match = false

@@ -439,7 +439,7 @@ def verify_function_signature!(node, signature)
         elsif actual_type_obj.is_a?(Type) && actual_type_obj.fn_type? &&
               actual_type_obj.raw.reentrant && !expected_type_obj.raw.reentrant
           arg_name = arg_node.respond_to?(:name) ? arg_node.name : "Expression"
-          error!(arg_node, :REENTRANT_FN_TO_NON_REENTRANT_PARAM, name: arg_name, param: param[:name])
+          error!(arg_node, :REENTRANT_FN_TO_NON_REENTRANT_PARAM, name: arg_name, param: param.name)
         end
       end

@@ -506,7 +506,7 @@ def verify_function_signature!(node, signature)
       current_path = get_path_to_root(arg_node)
       next if current_path.nil?

-      is_mutable = param[:mutable]
+      is_mutable = param.mutable

       encountered_args.each_with_index do |prev, prev_index|
         # Mutable aliases conflict when their root paths overlap.
@@ -528,34 +528,34 @@ def verify_function_signature!(node, signature)
     warn_multi_atomic_bare_value_call!(node, atomic_bare_value_args)
   end

-  sig { params(arg_node: T.untyped, expected_type_obj: Type, param: T::Hash[Symbol, T.untyped]).returns(T::Boolean) }
+  sig { params(arg_node: T.untyped, expected_type_obj: Type, param: AST::Param).returns(T::Boolean) }
   def atomic_cell_to_bare_value_param?(arg_node, expected_type_obj, param)
     T.bind(self, SemanticAnnotator) rescue nil
     return false unless arg_node.is_a?(AST::Identifier)
     sym = arg_node.symbol
     return false unless sym&.sync == :atomic
     return false if sym.respond_to?(:layout) && sym.layout == :indirect
-    return false if param[:sync] == :atomic
-    return false if param[:symbol]&.respond_to?(:sync) && param[:symbol].sync == :atomic
+    return false if param.sync == :atomic
+    return false if param.symbol&.respond_to?(:sync) && param.symbol.sync == :atomic
     return false if expected_type_obj.any? || expected_type_obj.fn_type?
     return false if expected_type_obj.shared? || expected_type_obj.any_sync?
     expected_type_obj.primitive?
   end

-  sig { params(arg_node: T.untyped, param: T::Hash[Symbol, T.untyped], signature: FunctionSignature).returns(T::Boolean) }
+  sig { params(arg_node: T.untyped, param: AST::Param, signature: FunctionSignature).returns(T::Boolean) }
   def atomic_cell_to_atomic_param?(arg_node, param, signature)
     T.bind(self, SemanticAnnotator) rescue nil
     return false unless arg_node.is_a?(AST::Identifier)
     sym = arg_node.symbol
     return false unless sym&.sync == :atomic
-    ptype = param[:type]
+    ptype = param.type
     return true if ptype.is_a?(Type) && ptype.sync == :atomic
-    return true if param[:sync] == :atomic
-    return true if param[:symbol]&.respond_to?(:sync) && param[:symbol].sync == :atomic
+    return true if param.sync == :atomic
+    return true if param.symbol&.respond_to?(:sync) && param.symbol.sync == :atomic
     requires = signature.requires
-    families = requires && requires[param[:name].to_s]
+    families = requires && requires[param.name.to_s]
     families.respond_to?(:include?) && families.include?(:ATOMIC)
   end

@@ -587,13 +587,13 @@ def warn_multi_atomic_bare_value_call!(node, atomic_args)
          "will require an explicit @inconsistent call-site annotation.")
   end

-  sig { params(arg_node: T.untyped, param: T::Hash[Symbol, T.untyped], signature: FunctionSignature).returns(T.nilable(T::Boolean)) }
+  sig { params(arg_node: T.untyped, param: AST::Param, signature: FunctionSignature).returns(T.nilable(T::Boolean)) }
   def verify_param_lifetime!(arg_node, param, signature)
     T.bind(self, SemanticAnnotator) rescue nil
     return true if !arg_node.is_a?(AST::Identifier)
     @og = T.let(@og, T.untyped)

-    if param[:mutable] && !@og.can_write?(arg_node.name)
+    if param.mutable && !@og.can_write?(arg_node.name)
       error!(arg_node, :MUTABLE_ARG_RESTRICTED, name: arg_node.name)
     end

@@ -602,7 +602,7 @@ def verify_param_lifetime!(arg_node, param, signature)
     lifetime_paths = [lifetime_paths] unless lifetime_paths.is_a?(Array)
     return true if lifetime_paths.empty?

-    borrow_type = param[:mutable] ? :mutable : :immutable
+    borrow_type = param.mutable ? :mutable : :immutable
     return true if current_scope.is_immutable?(arg_node.name) || current_scope.is_restricted?(arg_node.name)

     # If `param` is named in the lifetime sources (any of the multi-
@@ -613,9 +613,9 @@ def verify_param_lifetime!(arg_node, param, signature)
       next [:wildcard] if p == :wildcard
       [p.to_s.split(".").first]
     end
-    return true unless base_paths.include?(:wildcard) || base_paths.include?(param[:name])
+    return true unless base_paths.include?(:wildcard) || base_paths.include?(param.name)

-    error!(arg_node, :MUTABLE_PARAM_NEEDS_RESTRICT, name: param[:name])
+    error!(arg_node, :MUTABLE_PARAM_NEEDS_RESTRICT, name: param.name)
   end

   # `node.return_lifetime` shapes:
@@ -682,14 +682,14 @@ def verify_lifetime_source!(node, source_node)
     T.bind(self, SemanticAnnotator) rescue nil
     path = get_path_to_root(source_node)
     root_param_name = T.must(path).first.to_s
-    param = node.params.find { |p| p[:name] == root_param_name }
+    param = node.params.find { |p| p.name == root_param_name }

     if param.nil?
       error!(node, :LIFETIME_ROOT_NOT_PARAM, name: root_param_name)
     end

     # Extract the resolved type name (Type objects from parse_type_annotation)
-    param_type = param[:type]
+    param_type = param.type
     current_type_name = param_type.is_a?(Type) ? param_type.resolved : param_type.to_sym

     T.must(path).drop(1).each do |field_sym|
@@ -720,13 +720,13 @@ def declare_and_verify_params(node)
     T.bind(self, SemanticAnnotator) rescue nil
     node.params.each do |param|
       # Validate Defaults
-      if param[:default]
-        if param[:default].is_a?(AST::DefaultLit)
+      if param.default
+        if param.default.is_a?(AST::DefaultLit)
           # DEFAULT is only valid for struct-type params
-          param_type_sym = param[:type].is_a?(Symbol) ? param[:type] : param[:type].to_sym rescue nil
+          param_type_sym = param.type&.resolved
           schema = lookup_type_schema(param_type_sym) if param_type_sym
           unless schema.is_a?(Hash) && !schema[:kind]
-            error!(node, :DEFAULT_NEEDS_STRUCT_PARAM, type: param[:type])
+            error!(node, :DEFAULT_NEEDS_STRUCT_PARAM, type: param.type)
           end
           # Validate all fields of the struct have defaults
           field_names = schema.keys.reject { |k| k.is_a?(Symbol) }
@@ -734,16 +734,16 @@ def declare_and_verify_params(node)
           field_defaults = schema[:field_defaults] || {}
           missing = field_names.reject { |f| field_defaults.key?(f) }
           if missing.any?
-            error!(node, :DEFAULT_STRUCT_MISSING_DEFAULTS, name: param[:name], type: param[:type], missing: missing.join(', '))
+            error!(node, :DEFAULT_STRUCT_MISSING_DEFAULTS, name: param.name, type: param.type, missing: missing.join(', '))
           end
         end
-        param[:default].full_type = param[:type].to_sym rescue param[:type]
+        param.default.full_type = param.type
       else
-        visit(param[:default])
-        def_type = param[:default].resolved_type
-        param_type = param[:type]
+        visit(param.default)
+        def_type = param.default.resolved_type
+        param_type = param.type
         unless is_safe_autocast?(def_type, param_type)
-          error!(node, :DEFAULT_VALUE_TYPE_MISMATCH, name: param[:name], expected: param_type, got: def_type)
+          error!(node, :DEFAULT_VALUE_TYPE_MISMATCH, name: param.name, expected: param_type, got: def_type)
         end
       end
     end
@@ -751,12 +751,12 @@ def declare_and_verify_params(node)
       # Seed sync for cross-module helpers where caller-sync propagation
       # cannot see call sites. Visible callers still override this later.
       param_sync = nil
-      if param[:sync]
-        param_sync = param[:sync]
-      elsif param[:type].is_a?(Type) && param[:type].any_sync?
-        param_sync = param[:type].sync
+      if param.sync
+        param_sync = param.sync
+      elsif param.type&.any_sync?
+        param_sync = param.type.sync
       elsif node.respond_to?(:requires) && node.requires
-        families = node.requires[param[:name].to_s]
+        families = node.requires[param.name.to_s]
         if families
           # Polymorphic family seeds are only defaults; visible callers
           # override them during caller-sync propagation.
@@ -779,33 +779,33 @@ def declare_and_verify_params(node)
       # the bare-cell form.
      param_layout = nil
      if param_sync == :atomic
-        param_t = param[:type].is_a?(Type) ? param[:type] : Type.new(param[:type])
+        param_t = param.type
         param_layout = :indirect if param_t.respond_to?(:struct?) && param_t.struct?
       end

       current_scope.declare(
-        param[:name], nil, param[:type], param[:mutable], false, nil, :stack,
+        param.name, nil, param.type, param.mutable, false, nil, :stack,
         Set.new, [], sync: param_sync, layout: param_layout
       )

-      # Stash the SymbolEntry on the param hash so downstream passes don't
+      # Stash the SymbolEntry on the Param so downstream passes don't
       # need to find an Identifier reference in the body.
-      param[:symbol] = current_scope.locals[param[:name]]
-      param[:symbol].is_param = true
-      param[:symbol].param_decl_token = param[:name_token]
+      param.symbol = current_scope.locals[param.name]
+      param.symbol.is_param = true
+      param.symbol.param_decl_token = param.name_token

       # Preserve REQUIRES disjunctions for call-site effect resolution.
       if node.respond_to?(:requires) && node.requires
-        fams = node.requires[param[:name].to_s]
-        param[:symbol].sync_families = fams if fams.is_a?(Set) && !fams.empty?
+        fams = node.requires[param.name.to_s]
+        param.symbol.sync_families = fams if fams.is_a?(Set) && !fams.empty?
       end

       # TAKES parameters own the data — force :affine so cleanup is emitted.
-      current_scope.locals[param[:name]].takes = true if param[:takes]
-      classify_ownership!(current_scope.locals[param[:name]])
-      og_declare(param[:name], nil, param[:type])
+      current_scope.locals[param.name].takes = true if param.takes
+      classify_ownership!(current_scope.locals[param.name])
+      og_declare(param.name, nil, param.type)

       # Non-TAKES parameters are implicit borrows. Mark in OG so the
       # annotator prevents storing borrowed data into owned containers.
-      unless param[:takes]
-        @og[param[:name]]&.kind = :borrowed
+      unless param.takes
+        @og[param.name]&.kind = :borrowed
       end

-      param[:type]
+      param.type
     end
   end

@@ -1003,10 +1003,10 @@ def reject_arg_type_matches?(arg, kind)
     pred.call(type)
   end

-  sig { params(definitions: T::Array[T.untyped], args: T::Array[T.untyped]).returns(T.nilable(T::Hash[Symbol, T.untyped])) }
+  sig { params(definitions: T::Array[T.untyped], args: T::Array[T.untyped]).returns(T.untyped) }
   def find_matching_intrinsic(definitions, args)
     T.bind(self, SemanticAnnotator) rescue nil
-    definitions.find do |config|
+    matched = definitions.find do |config|
       next true if config[:args] == :Varargs # Varargs accepts anything

       # Arity check
@@ -1022,7 +1022,7 @@ def find_matching_intrinsic(definitions, args)
         expected = spec[:type]
         next false unless is_safe_autocast?(arg.resolved_type, expected)
         # Check capability constraints (sync, ownership, etc.)
-        arg_type = arg.type_info.is_a?(Type) ? arg.type_info : nil
+        arg_type = arg.type_info
         next false if spec[:sync] && arg_type&.sync != spec[:sync]
         next false if spec[:ownership] && arg_type&.ownership != spec[:ownership]
         true
@@ -1031,6 +1031,7 @@ def find_matching_intrinsic(definitions, args)
         end
       end
     end
+    matched && IntrinsicRegistry.fs(matched)
   end

   # Formats intrinsic args for error messages
diff --git a/src/annotator-helpers/function_context.rb b/src/annotator-helpers/function_context.rb
index 1a1d7801d..66f7cd66e 100644
--- a/src/annotator-helpers/function_context.rb
+++ b/src/annotator-helpers/function_context.rb
@@ -6,16 +6,31 @@
 class FunctionContext
   extend T::Sig

-  attr_accessor :name, :return_type, :lifetime, :type_params,
+  attr_accessor :name, :lifetime, :type_params,
                 :frame_count, :heap_count, :alloc_count,
                 :needs_rt, # explicit "fn body references rt" flag (independent of allocation counters)
                 :loop_depth, :conditional_depth, :returns,
                 :stack_vars_bytes # accumulated bytes for stack-local variables

-  sig { params(name: String, return_type: T.untyped, lifetime: T.nilable(T::Array[String]), type_params: T::Array[Symbol]).void }
-  def initialize(name:, return_type:, lifetime: nil, type_params: [])
+  # Seam: the enclosing function's expected return is ALWAYS a Type
+  # (Void for "no value"). Coerced here so the producer may pass
+  # nil/Symbol without any return-check reader needing a Symbol/Type
+  # discriminator.
+  sig { returns(Type) }
+  attr_reader :return_type
+
+  sig { params(val: T.untyped).void }
+  def return_type=(val)
+    @return_type = T.let(
+      val.nil? ? Type.new(:Void) : (val.is_a?(Type) ? val : Type.new(val)),
+      Type
+    )
+  end
+
+  sig { params(name: String, return_type: T.nilable(Type), lifetime: T.nilable(T::Array[String]), type_params: T::Array[Symbol]).void }
+  def initialize(name:, return_type: nil, lifetime: nil, type_params: [])
     @name = name
-    @return_type = return_type
+    self.return_type = return_type
     @lifetime = lifetime
     @type_params = type_params
     @frame_count = T.let(0, Integer)
diff --git a/src/annotator-helpers/function_return.rb b/src/annotator-helpers/function_return.rb
new file mode 100644
index 000000000..312d946fe
--- /dev/null
+++ b/src/annotator-helpers/function_return.rb
@@ -0,0 +1,94 @@
+# typed: strict
+# Strongly-typed representation of a function's return.
+#
+# Replaces the untyped `return_type` union (Type | Symbol | nil |
+# Proc | Hash) and the std_lib `return_type: ->(recv){...}` Procs.
+# Every return is one of a closed set of variants; `resolve` always
+# yields a concrete non-nil Type. No Proc, no Hash, no nil.
+#
+#   Fixed             -> a concrete Type (covers all static returns,
+#                        incl. {type:,sync:} -> Type.new with caps,
+#                        and the implicit-Void case)
+#   ElementOf         -> receiver.element_type
+#   OptionalOfElement -> ?element_type
+#   IdOfElement       -> Id
+#   OptionalOfValue   -> ?value_type
+#   ValueList         -> value_type[]@list
+#   KeyList           -> key_type[]@list
+#   Infer             -> a host inference method (bounded Symbol set:
+#                        infer_element_type / infer_optional_element_type
+#                        / infer_map_return_type) -- a typed variant,
+#                        not a Proc; resolve dispatches via the host.
+require "sorbet-runtime"
+require_relative "../ast/type"
+
+class FunctionReturn < T::Struct
+  extend T::Sig
+
+  class Kind < T::Enum
+    enums do
+      Fixed = new("fixed")
+      ElementOf = new("element_of")
+      OptionalOfElement = new("optional_of_element")
+      IdOfElement = new("id_of_element")
+      OptionalOfValue = new("optional_of_value")
+      ValueList = new("value_list")
+      KeyList = new("key_list")
+      Infer = new("infer")
+    end
+  end
+
+  const :kind, Kind
+  # Payload for Fixed only (the concrete return Type). For every
+  # parametric variant this is nil because the Type is computed from
+  # the receiver at resolve time -- that is the variant's whole point,
+  # not an "untyped" hole.
+  const :fixed, T.nilable(Type), default: nil
+  # Payload for Infer only: the host inference method name (bounded).
+  const :infer, T.nilable(Symbol), default: nil
+
+  sig { params(t: Type).returns(FunctionReturn) }
+  def self.fixed(t) = new(kind: Kind::Fixed, fixed: t)
+
+  sig { params(m: Symbol).returns(FunctionReturn) }
+  def self.infer(m) = new(kind: Kind::Infer, infer: m)
+
+  # A receiver-parametric variant (ElementOf / OptionalOfElement /
+  # IdOfElement / OptionalOfValue / ValueList / KeyList) by Kind
+  # constant name. No payload -- the Type is computed from the
+  # receiver at resolve time.
+  sig { params(kind_name: Symbol).returns(FunctionReturn) }
+  def self.variant(kind_name) = new(kind: Kind.const_get(kind_name))
+
+  # Resolve to a concrete Type. receiver is the call's receiver type
+  # (for parametric shapes); args/host support the Infer variant's
+  # host-method dispatch. Always returns a Type, never nil.
+  sig do
+    params(receiver: T.nilable(Type), args: T::Array[T.untyped],
+           host: T.untyped).returns(Type)
+  end
+  def resolve(receiver, args = [], host = nil)
+    case kind
+    when Kind::Fixed
+      T.must(fixed)
+    when Kind::ElementOf
+      el = receiver&.element_type
+      el.is_a?(Type) ? el : Type.new(el || :Any)
+    when Kind::OptionalOfElement
+      Type.new(:"?#{T.must(receiver).element_type.resolved}")
+    when Kind::IdOfElement
+      Type.new(:"Id<#{T.must(receiver).element_type.resolved}>")
+    when Kind::OptionalOfValue
+      Type.new(:"?#{T.must(receiver).value_type.resolved}")
+    when Kind::ValueList
+      Type.new(:"#{T.must(receiver).value_type.resolved}[]@list")
+    when Kind::KeyList
+      Type.new(:"#{T.must(receiver).key_type.resolved}[]@list")
+    when Kind::Infer
+      r = host.send(T.must(infer), args, nil)
+      r.is_a?(Type) ? r : Type.new(r || :Any)
+    else
+      Type.new(:Any)
+    end
+  end
+end
diff --git a/src/annotator-helpers/function_signature.rb b/src/annotator-helpers/function_signature.rb
index 3cc6b4eec..689424c01 100644
--- a/src/annotator-helpers/function_signature.rb
+++ b/src/annotator-helpers/function_signature.rb
@@ -6,13 +6,34 @@
 # computed metadata (needs_rt, can_fail, return_provenance) that callers
 # need for code generation and cleanup planning.
 require "sorbet-runtime"
+require_relative "intrinsic_emit"
+require_relative "function_return"

 class FunctionSignature
   extend T::Sig

   # Static signature fields (set at creation)
-  attr_reader :params, :visibility, :type_params, :reentrant
-  attr_accessor :return_type, :return_lifetime, :return_strategy
+  attr_reader :visibility, :type_params, :reentrant
+  attr_accessor :return_lifetime, :return_strategy
+
+  # Always a list of AST::Param (coerced at the seam). No Hash.
+  sig { returns(T::Array[AST::Param]) }
+  attr_reader :params
+
+  # Seam: a function signature's return is ALWAYS a Type (Void for
+  # "no value"). Coerced here so callers may pass nil/Symbol during
+  # construction or late return-inference assignment without any
+  # reader ever needing a Symbol/Type/nil discriminator.
+  sig { returns(Type) }
+  attr_reader :return_type
+
+  sig { params(val: T.untyped).void }
+  def return_type=(val)
+    @return_type = T.let(
+      val.nil? ? Type.new(:Void) : (val.is_a?(Type) ? val : Type.new(val)),
+      Type
+    )
+  end

   # EXTERN function fields
   attr_accessor :extern, :module_alias, :extern_effects
@@ -24,6 +45,17 @@ class FunctionSignature
   # Intrinsic marker
   attr_accessor :intrinsic, :zig_pattern

+  # Intrinsic signature semantics (set by the registry converter; nil
+  # for ordinary user functions). `arg_validator` the custom arg
+  # type-checker; `arg_spec` the raw args shape; `emit` the typed
+  # codegen/dispatch metadata (IntrinsicEmit).
+  attr_accessor :arg_validator, :arg_spec, :arity, :emit
+  # Strongly-typed return (FunctionReturn). Non-nil; defaults to
+  # Fixed(Void). The single return facility -- resolve(receiver,
+  # args, host) always yields a concrete Type. Replaced the former
+  # untyped return_spec (Symbol|Hash|Proc|nil) / return_resolver Proc.
+  attr_accessor :return_def
+
   # P2: REQUIRES clause as { param_name_string => Set[Symbol] } or nil.
   # Mirrors FunctionDef#requires; needed at signature level so call-site
   # checks survive cross-module flow.
@@ -39,7 +71,7 @@ def self.from_function_def(fn)
     else
       FunctionSignature.new(
         params: fn.params || [],
-        return_type: fn.return_type || :Any,
+        return_type: fn.return_type || Type.new(:Any),
         return_lifetime: fn.return_lifetime,
         visibility: fn.visibility,
         type_params: fn.type_params,
@@ -62,14 +94,14 @@ def self.sync_from_function_def!(sig, fn)
     sig
   end

-  sig { params(params: T::Array[T::Hash[Symbol, T.untyped]], return_type: T.untyped, return_lifetime: T.untyped, visibility: T.nilable(Symbol), type_params: T.nilable(T::Array[Symbol]), reentrant: T::Boolean, extern: T::Boolean, module_alias: T.nilable(String), extern_effects: T.nilable(T::Hash[Symbol, Symbol]), fn_type_params: T.nilable(T::Array[Symbol]), owner_type: T.nilable(String), owner_type_params: T.nilable(T::Array[T.untyped]), intrinsic: T::Boolean, zig_pattern: T.nilable(String)).void }
-  def initialize(params:, return_type:, return_lifetime: nil, visibility: nil,
+  sig { params(params: T::Array[T.untyped], return_type: T.nilable(Type), return_lifetime: T.untyped, visibility: T.nilable(Symbol), type_params: T.nilable(T::Array[Symbol]), reentrant: T::Boolean, extern: T::Boolean, module_alias: T.nilable(String), extern_effects: T.nilable(T::Hash[Symbol, Symbol]), fn_type_params: T.nilable(T::Array[Symbol]), owner_type: T.nilable(String), owner_type_params: T.nilable(T::Array[T.untyped]), intrinsic: T::Boolean, zig_pattern: T.nilable(String)).void }
+  def initialize(params:, return_type: nil, return_lifetime: nil, visibility: nil,
                  type_params: nil, reentrant: false, extern: false,
                  module_alias: nil, extern_effects: nil, fn_type_params: nil,
                  owner_type: nil, owner_type_params: nil,
                  intrinsic: false, zig_pattern: nil)
-    @params = params
-    @return_type = return_type
+    @params = params.map { |p| AST::Param.coerce(p) }
+    self.return_type = return_type
     @return_lifetime = return_lifetime
     @visibility = visibility
     @type_params = type_params
@@ -89,8 +121,20 @@ def initialize(params:, return_type: nil, return_lifetime: nil, visibility: nil,
     @return_strategy = T.let(nil, T.untyped)
     @stack_tier = T.let(nil, T.untyped)
     @requires = T.let(nil, T.untyped)
+    @arg_validator = T.let(nil, T.nilable(Proc))
+    @arg_spec = T.let(nil, T.untyped)
+    @arity = T.let(nil, T.nilable(Integer))
+    @emit = T.let(nil, T.nilable(IntrinsicEmit))
+    @return_def = T.let(FunctionReturn.fixed(Type.new(:Void)),
+                        FunctionReturn)
   end

+  # True iff the return is a static Fixed Type (not receiver-parametric
+  # or host-inferred). Callers that only honor a statically-declared
+  # owned return (e.g. the MIR HPT_LEAK check) gate on this.
+  sig { returns(T::Boolean) }
+  def fixed_return? = @return_def.kind == FunctionReturn::Kind::Fixed
+
   sig { returns(FunctionSignature) }
   def dup
     FunctionSignature.new(
@@ -108,6 +152,11 @@ def dup
       s.return_strategy = @return_strategy
       s.stack_tier = @stack_tier
       s.requires = @requires
+      s.arg_validator = @arg_validator
+      s.arg_spec = @arg_spec
+      s.arity = @arity
+      s.emit = @emit
+      s.return_def = @return_def
     end
   end
 end
diff --git a/src/annotator-helpers/generic_analysis.rb b/src/annotator-helpers/generic_analysis.rb
index 974e96d20..aedc29d7a 100644
--- a/src/annotator-helpers/generic_analysis.rb
+++ b/src/annotator-helpers/generic_analysis.rb
@@ -268,8 +268,8 @@ def infer_generic_type_args!(node, signature, actual_args, type_params)
     signature.params.each_with_index do |param, i|
       arg = actual_args[i]
       next unless arg
-      param_type = param[:type].is_a?(Type) ? param[:type] : Type.new(param[:type] || :Any)
-      actual_type = if arg.respond_to?(:type_info) && arg.type_info.is_a?(Type)
+      param_type = param.type || Type.new(:Any)
+      actual_type = if arg.respond_to?(:type_info) && arg.type_info
                       arg.type_info
                     else
                       Type.new(arg.resolved_type || :Any)
@@ -292,16 +292,16 @@ def enforce_shared_family_call_sync!(node, signature, actual_args, type_params)
     signature.params.each_with_index do |param, i|
       arg = actual_args[i]
       next unless arg
-      param_type = param[:type].is_a?(Type) ? param[:type] : Type.new(param[:type] || :Any)
+      param_type = param.type || Type.new(:Any)
       next unless generic_shared_family_param?(param_type) && type_params.include?(param_type.resolved)

-      actual_type = if arg.respond_to?(:type_info) && arg.type_info.is_a?(Type)
+      actual_type = if arg.respond_to?(:type_info) && arg.type_info
                       arg.type_info
                     else
                       Type.new(arg.resolved_type || :Any)
                     end
       next unless actual_type.shared?

       shared_args << {
-        name: param[:name],
+        name: param.name,
         type: generic_shared_payload_binding(actual_type)
       }
     end
@@ -507,7 +507,7 @@ def shared_call_capability_display(type)
   def substitute_type_params(signature, subst)
     T.bind(self, SemanticAnnotator) rescue nil
     FunctionSignature.new(
-      params: signature.params.map { |p| p.merge(type: apply_type_subst(p[:type], subst)) },
+      params: signature.params.map { |p| p.dup.tap { |np| np.type = apply_type_subst(p.type, subst) } },
       return_type: apply_type_subst(signature.return_type, subst),
       return_lifetime: signature.return_lifetime,
       visibility: signature.visibility
@@ -522,7 +522,7 @@ def substitute_type_params(signature, subst)
   sig { params(node: T.untyped).returns(NilClass) }
   def validate_stream_type!(node)
     T.bind(self, SemanticAnnotator) rescue nil
-    return unless node.type.is_a?(Type) && node.type.future?
+    return unless node.type&.future?
     if node.type.multiowned?
       error!(node, :RC_PROMISE_NEEDS_SHARED)
     end
@@ -537,7 +537,7 @@ def validate_stream_type!(node)
   sig { params(node: T.untyped, final_type: T.untyped).returns(T.nilable(Type)) }
   def propagate_declared_type_to_value!(node, final_type)
     T.bind(self, SemanticAnnotator) rescue nil
-    return unless node.type.is_a?(Type)
+    return unless node.type

     # BgStreamBlock infers ~?T[]; declared ~T[INF] picks the runtime wrapper.
     if node.value.is_a?(AST::BgStreamBlock) && node.type.inf_stream?
@@ -567,7 +567,7 @@ def propagate_declared_type_to_value!(node, final_type)
   sig { params(node: T.untyped, final_type: T.untyped).returns(T.nilable(Symbol)) }
   def propagate_collection_metadata!(node, final_type)
     T.bind(self, SemanticAnnotator) rescue nil
-    coll_src = if (decl_t = node.type).is_a?(Type) && decl_t.collection
+    coll_src = if (decl_t = node.type) && decl_t.collection
                  decl_t
                elsif node.value.type_info&.collection
                  node.value.type_info
@@ -577,7 +577,7 @@ def propagate_collection_metadata!(node, final_type)
       node.type_info.provenance = :heap if coll_src.collection == :pool || coll_src.collection == :set
       node.type_info.shard_count = coll_src.shard_count if coll_src.shard_count
       node.type_info.soa = coll_src.soa if coll_src.respond_to?(:soa) && coll_src.soa
-      if node.full_type.is_a?(Type)
+      if node.full_type
         node.full_type.collection = coll_src.collection unless node.full_type.collection
         node.full_type.soa = coll_src.soa if coll_src.respond_to?(:soa) && coll_src.soa
         node.full_type.shard_count = coll_src.shard_count if coll_src.shard_count && !node.full_type.shard_count
@@ -585,24 +585,24 @@ def propagate_collection_metadata!(node, final_type)
     end

     # Standalone @soa on fixed arrays (no collection): propagate soa flag directly.
-    if !coll_src && (decl_t = node.type).is_a?(Type) && decl_t.soa
+    if !coll_src && (decl_t = node.type) && decl_t.soa
       node.type_info.soa = true if node.type_info
-      node.full_type.soa = true if node.full_type.is_a?(Type)
+      node.full_type&.soa = true
     end

     # Map-specific propagation: maps don't use :collection, so the above doesn't cover them.
-    if (decl_t = node.type).is_a?(Type)
+    if (decl_t = node.type)
       if decl_t.shard_count && !node.type_info&.shard_count
         node.type_info.shard_count = decl_t.shard_count if node.type_info
-        node.full_type.instance_variable_set(:@shard_count, decl_t.shard_count) if node.full_type.is_a?(Type)
+        node.full_type&.instance_variable_set(:@shard_count, decl_t.shard_count)
       end
       if decl_t.sync && node.type_info && !node.type_info.sync
         node.type_info.sync = decl_t.sync
-        node.full_type.sync = decl_t.sync if node.full_type.is_a?(Type)
+        node.full_type&.sync = decl_t.sync
       end
       if decl_t.ownership != :affine && node.type_info
         node.type_info.instance_variable_set(:@ownership, decl_t.ownership)
-        node.full_type.instance_variable_set(:@ownership, decl_t.ownership) if node.full_type.is_a?(Type)
+        node.full_type&.instance_variable_set(:@ownership, decl_t.ownership)
       end
     end
   end
@@ -615,7 +615,7 @@ def propagate_collection_metadata!(node, final_type)
   def propagate_call_flags!(node)
     T.bind(self, SemanticAnnotator) rescue nil
     if has_heap_promoted_call?(node.value)
-      node.type_info.provenance = :heap if node.type_info.is_a?(Type)
+      node.type_info&.provenance = :heap
     end
   end

@@ -706,7 +706,7 @@ def bg_exit_frame_string?(expr)
     # Check stdlib def for explicit frame allocation (provenance not yet set on expr).
     if expr.respond_to?(:matched_stdlib_def)
       msd = expr.matched_stdlib_def
-      return true if msd.is_a?(Hash) && msd[:return_alloc] == :frame
+      return true if msd && msd.emit&.return_alloc == :frame
     end
     false
   end
diff --git a/src/annotator-helpers/intrinsic_emit.rb b/src/annotator-helpers/intrinsic_emit.rb
new file mode 100644
index 000000000..ed16a3e8b
--- /dev/null
+++ b/src/annotator-helpers/intrinsic_emit.rb
@@ -0,0 +1,98 @@
+# typed: strict
+# Strongly-typed emission/dispatch metadata for an intrinsic.
+#
+# The std_lib registries (STD_LIB / POOL_METHODS / SET_METHODS /
+# MAP_METHODS / INDEX_OPS / BUILTIN_OPS) stay defined as Hash literals
+# (the readable authoring DSL). A startup converter (see EPIC) turns
+# each entry into a FunctionSignature whose intrinsic-only codegen
+# metadata lives HERE -- a typed value object, never an untyped Hash.
+#
+# Recursive: sub-descriptors (`eql:`, `cleanup:`, `pool:`,
+# `string_map:` ...) are themselves IntrinsicEmit; registry-pointer
+# forms (`{ registry: MAP_METHODS }`) carry the registry name in
+# `:registry`.
+require "sorbet-runtime"
+
+class IntrinsicEmit < T::Struct
+  extend T::Sig
+
+  # --- Zig codegen templates (String template, or Symbol macro
+  # directive like :macro_map in STD_LIB) ---
+  StrOrSym = T.type_alias { T.any(String, Symbol) }
+  prop :zig, T.nilable(StrOrSym), default: nil
+  prop :numeric_zig, T.nilable(StrOrSym), default: nil
+  prop :sharded_zig, T.nilable(StrOrSym), default: nil
+  prop :shard_direct_zig, T.nilable(StrOrSym), default: nil
+
+  # --- FSM emission fragments ---
+  # FsmOps DSL op-objects, not strings -- passthrough, no coercion.
+  prop :fsm_setup, T.nilable(T::Array[T.untyped]), default: nil
+  prop :fsm_state_decls, T.nilable(T::Array[T.untyped]), default: nil
+  prop :fsm_finish_block, T.nilable(T::Array[T.untyped]), default: nil
+  prop :fsm_state_finalize, T.nilable(T::Array[T.untyped]), default: nil
+  prop :fsm_finish_value, T.nilable(String), default: nil
+
+  # --- Dispatch flags ---
+  prop :bc, T::Boolean, default: false
+  prop :is_method, T::Boolean, default: false
+  prop :suspends, T::Boolean, default: false
+  prop :narrows_collection, T::Boolean, default: false
+  prop :mutates_receiver, T::Boolean, default: false
+  prop :allocates, T::Boolean, default: false
+  prop :takes_value, T::Boolean, default: false
+  prop :container_borrow, T::Boolean, default: false
+
+  # --- Symbol-valued dispatch / allocation ---
+  prop :tag, T.nilable(Symbol), default: nil
+  prop :builtin, T.nilable(Symbol), default: nil
+  prop :alloc, T.nilable(Symbol), default: nil
+  prop :return_alloc, T.nilable(Symbol), default: nil
+  prop :val_alloc, T.nilable(Symbol), default: nil
+  prop :key_alloc, T.nilable(Symbol), default: nil
+  prop :shard_alloc, T.nilable(Symbol), default: nil
+  prop :sharded_alloc, T.nilable(Symbol), default: nil
+  prop :borrows, T.nilable(T.any(Symbol, T::Array[T.untyped])),
+       default: nil
+  prop :reject_when, T.nilable(Symbol), default: nil
+  prop :bc_op, T.nilable(Symbol), default: nil
+  prop :error_kind, T.nilable(Symbol), default: nil
+  prop :error_type, T.nilable(Symbol), default: nil
+  prop :registry, T.nilable(Symbol), default: nil
+
+  # elem: transient element-type-name hint (merged at lowering, e.g.
+  # pool_get_def). fallible_clauses: internal with-block clause
+  # structure injected at lowering (not authoring DSL).
+  prop :elem, T.nilable(String), default: nil
+  prop :fallible_clauses, T.untyped, default: nil
+
+  # --- Strings ---
+  prop :lifetime, T.nilable(String), default: nil
+  prop :reject_error, T.nilable(String), default: nil
+
+  # --- Arg-shape (element typing deferred; union keeps it bounded) ---
+  prop :arity, T.nilable(Integer), default: nil
+  prop :takes_args, T.nilable(T::Array[Integer]), default: nil
+  prop :value_transforms,
+       T.nilable(T::Array[Symbol]), default: nil
+  prop :shard_direct_value_transforms,
+       T.nilable(T::Array[Symbol]), default: nil
+
+  # --- Procs (varying arity by role) ---
+  prop :label, T.nilable(Proc), default: nil
+
+  # --- Recursive sub-descriptors ---
+  prop :eql, T.nilable(IntrinsicEmit), default: nil
+  prop :strcmp, T.nilable(IntrinsicEmit), default: nil
+  prop :cleanup, T.nilable(IntrinsicEmit), default: nil
+  prop :assert, T.nilable(IntrinsicEmit), default: nil
+  prop :array, T.nilable(IntrinsicEmit), default: nil
+  prop :list, T.nilable(IntrinsicEmit), default: nil
+  prop :pool, T.nilable(IntrinsicEmit), default: nil
+  prop :set, T.nilable(IntrinsicEmit), default: nil
+  prop :get, T.nilable(IntrinsicEmit), default: nil
+  prop :string_raw, T.nilable(IntrinsicEmit), default: nil
+  prop :string_symbol, T.nilable(IntrinsicEmit), default: nil
+  prop :string_map, T.nilable(IntrinsicEmit), default: nil
+  prop :numeric_map, T.nilable(IntrinsicEmit), default: nil
+  prop :set_collection, T.nilable(IntrinsicEmit), default: nil
+end
diff --git a/src/annotator-helpers/intrinsic_registry.rb b/src/annotator-helpers/intrinsic_registry.rb
new file mode 100644
index 000000000..9294d2332
--- /dev/null
+++ b/src/annotator-helpers/intrinsic_registry.rb
@@ -0,0 +1,192 @@
+# typed: false
+# Startup converter: std_lib registry Hash entry -> FunctionSignature
+# (+ typed IntrinsicEmit). The Hash literals stay the authoring DSL;
+# this builds the typed objects consumers will read. Inert until
+# consumers are migrated (EPIC #65, per-registry slices).
+require_relative "function_signature"
+require_relative "intrinsic_emit"
+
+module IntrinsicRegistry
+  module_function
+
+  # Keys consumed at the FunctionSignature level (not IntrinsicEmit).
+  FS_KEYS = %i[args arity validate return return_type can_fail needs_rt].freeze
+
+  EMIT_BOOL = %i[bc is_method suspends narrows_collection mutates_receiver
+                 allocates takes_value container_borrow].freeze
+  EMIT_STRSYM = %i[zig numeric_zig sharded_zig shard_direct_zig].freeze
+  EMIT_STR = %i[lifetime reject_error fsm_finish_value elem].freeze
+  EMIT_SYM = %i[tag builtin alloc return_alloc val_alloc key_alloc
+                shard_alloc sharded_alloc reject_when bc_op
+                error_kind error_type].freeze
+  # Passthrough (no coercion): borrows (:all|Array), fallible_clauses
+  # (internal), fsm_* (FsmOps op-object arrays, not strings).
+  EMIT_PASS = %i[borrows fallible_clauses fsm_setup fsm_state_decls
+                 fsm_finish_block fsm_state_finalize].freeze
+  EMIT_SYMARR = %i[value_transforms shard_direct_value_transforms].freeze
+  EMIT_INTARR = %i[takes_args].freeze
+  EMIT_PROC = %i[label].freeze
+  EMIT_NESTED = %i[eql strcmp cleanup assert array list pool set get
+                   string_raw string_symbol string_map numeric_map
+                   set_collection].freeze
+
+  # registries: { Symbol => the registry Hash } (for {registry: X} ptrs)
+  def build_emit(h, registries)
+    return nil unless h.is_a?(Hash)
+    e = IntrinsicEmit.new
+    h.each do |k, v|
+      next if FS_KEYS.include?(k)
+      next if v.nil?
+      case k
+      when *EMIT_BOOL then e.public_send("#{k}=", !!v)
+      when *EMIT_STRSYM then e.public_send("#{k}=", v)
+      when *EMIT_STR then e.public_send("#{k}=", v.to_s)
+      when *EMIT_SYM then e.public_send("#{k}=", v.to_sym)
+      when *EMIT_PASS then e.public_send("#{k}=", v)
+      when *EMIT_SYMARR then e.public_send("#{k}=", Array(v).map(&:to_sym))
+      when *EMIT_INTARR then e.public_send("#{k}=", Array(v).map(&:to_i))
+      when *EMIT_PROC then e.public_send("#{k}=", v)
+      when *EMIT_NESTED
+        e.public_send("#{k}=", nested_emit(v, registries))
+      else
+        raise "IntrinsicRegistry: unmapped registry key #{k.inspect}"
+      end
+    end
+    e
+  end
+
+  # A nested sub-descriptor is either another emit Hash or a
+  # {registry: } pointer (resolved to that registry's name).
+  def nested_emit(v, registries)
+    return nil unless v.is_a?(Hash)
+    if (ptr = v[:registry])
+      name = registries.find { |_, r| r.equal?(ptr) }&.first
+      return IntrinsicEmit.new(registry: name || :unknown)
+    end
+    name = registries.find { |_, r| r.equal?(v) }&.first
+    return IntrinsicEmit.new(registry: name) if name
+
+    build_emit(v, registries)
+  end
+
+  # Best-effort STATIC view of the return, derived from the typed
+  # FunctionReturn (single source of truth).
Fixed -> its concrete + # Type; receiver-parametric / host-inferred -> polymorphic + # placeholder (the real resolution is consumer-side via + # return_def.resolve, gated by fixed_return?). + def to_return_type(rdef) + if rdef.kind == FunctionReturn::Kind::Fixed + rdef.fixed || Type.new(:Void) + else + Type.new(:Any) + end + end + + # Declarative receiver-parametric return directives (replace the old + # `return_type: ->(recv){...}` Procs). Mapped to FunctionReturn + # variants whose Type is computed from the receiver at resolve time. + RETURN_VARIANTS = { + r_element_of: :ElementOf, + r_optional_element: :OptionalOfElement, + r_id_element: :IdOfElement, + r_optional_value: :OptionalOfValue, + r_value_list: :ValueList, + r_key_list: :KeyList + }.freeze + + # Registry return descriptor -> FunctionReturn (strongly typed, + # non-nil). No Proc, no Hash, no bare nil escape: every form maps to + # Fixed(Type) | a receiver-parametric variant | Infer(host method). + def to_return_def(v) + return FunctionReturn.fixed(Type.new(:Void)) if v.nil? + return FunctionReturn.fixed(v) if v.is_a?(Type) + if v.is_a?(Hash) + return FunctionReturn.fixed( + v[:type] ? Type.new(v[:type], sync: v[:sync], ownership: v[:ownership]) + : Type.new(:Any) + ) + end + if v.is_a?(Proc) + raise "IntrinsicRegistry: Proc return descriptor is not allowed; " \ + "use a declarative directive (r_* variant or infer_* host method)" + end + if (kind = RETURN_VARIANTS[v]) + return FunctionReturn.variant(kind) + end + + s = v.to_s + return FunctionReturn.infer(v.to_sym) if s.start_with?("infer_", "macro_") + + FunctionReturn.fixed(Type.new(v)) + end + + def convert_entry(_name, h, registries) + ret = h.key?(:return_type) ? 
h[:return_type] : h[:return] + rdef = to_return_def(ret) + fs = FunctionSignature.new( + params: [], + return_type: to_return_type(rdef), + intrinsic: true + ) + fs.return_def = rdef + fs.arg_validator = h[:validate] if h[:validate].is_a?(Proc) + fs.arg_spec = h[:args] + fs.arity = h[:arity] + fs.can_fail = h[:can_fail] + fs.needs_rt = h[:needs_rt] + fs.emit = build_emit(h, registries) + fs + end + + # registries: { Symbol => Hash } + def convert_registry(reg, registries) + reg.each_with_object({}) do |(name, entry), out| + out[name] = convert_entry(name, entry, registries) if entry.is_a?(Hash) + end + end + + # Startup conversion (memoized, built once per registry on first + # access — the registries are frozen constants). The typed view of + # a whole registry: name -> FunctionSignature, or + # Array[FunctionSignature] for overload sets (e.g. + # STD_LIB["charAt"]). Consumers read THIS, never the raw Hash. + def sigs(reg) + (@sigs ||= {})[reg.object_id] ||= + reg.each_with_object({}) do |(name, entry), out| + out[name] = + if entry.is_a?(Array) + entry.map { |e| convert_entry(name, e, registries) } + elsif entry.is_a?(Hash) + convert_entry(name, entry, registries) + end + end + end + + # Typed lookup into a registry: reg[name] as FunctionSignature + # (or Array[FS] for overloads, or nil if absent). + def sig(reg, name) + sigs(reg)[name] + end + + # Memoized registry map (built lazily from the std_lib constants so + # there is no load-order coupling). Used by `fs` so call sites need + # not thread the map. + def registries + @registries ||= %i[STD_LIB POOL_METHODS SET_METHODS MAP_METHODS + INDEX_OPS BUILTIN_OPS].each_with_object({}) do |c, h| + h[c] = Object.const_get(c) if Object.const_defined?(c) + end + end + + # Idempotent normalizer for the flag-day migration: returns a + # FunctionSignature for a registry/ad-hoc entry Hash, passes a + # FunctionSignature through unchanged, and maps nil -> nil. 
Every
+  # `*.stdlib_def = X` / `matched_stdlib_def = X` site routes through
+  # this so the carried value is always a FunctionSignature.
+  def fs(x, name = "_inline")
+    return nil if x.nil?
+    return x if x.is_a?(FunctionSignature)
+
+    convert_entry(name, x, registries) if x.is_a?(Hash)
+  end
+end
diff --git a/src/annotator-helpers/method_analysis.rb b/src/annotator-helpers/method_analysis.rb
index 1fccedafa..26ae1e84d 100644
--- a/src/annotator-helpers/method_analysis.rb
+++ b/src/annotator-helpers/method_analysis.rb
@@ -28,10 +28,10 @@ def resolve_collection_method(node)
 #
-# @param matched_def [Hash] the STD_LIB definition that matched
+# @param matched_def [FunctionSignature] the STD_LIB definition that matched
 # @param args [Array] the resolved argument nodes
- sig { params(matched_def: T::Hash[Symbol, T.untyped], args: T::Array[T.untyped]).returns(T.nilable(Type)) }
+ sig { params(matched_def: FunctionSignature, args: T::Array[T.untyped]).returns(T.nilable(Type)) }
 def narrow_collection_type!(matched_def, args)
 T.bind(self, SemanticAnnotator) rescue nil
- return unless matched_def[:narrows_collection] && args.size >= 2
+ return unless matched_def.emit&.narrows_collection && args.size >= 2
 list_arg = args[0]
 val_arg = args[1]
@@ -57,7 +57,7 @@ def narrow_collection_type!(matched_def, args)
 sig { params(node: AST::MethodCall, obj_type: Type, registry: T::Hash[String, T::Hash[Symbol, T.untyped]], tag_field: Symbol, type_label: String).returns(T.nilable(T::Boolean)) }
 def resolve_typed_method(node, obj_type, registry, tag_field, type_label)
 T.bind(self, SemanticAnnotator) rescue nil
- defn = registry[node.name]
+ defn = IntrinsicRegistry.sig(registry, node.name)
 unless defn
 available = registry.keys.join(", ")
 emit_typo_suggestion!(
@@ -70,52 +70,57 @@ def resolve_typed_method(node, obj_type, registry, tag_field, type_label)
 end
 # Arity check
- if defn[:arity] >= 0 && node.args.length != defn[:arity]
- if defn[:arity] == 0
+ if defn.arity && defn.arity >= 0 && node.args.length != defn.arity
+ if defn.arity == 0
 error!(node,
:STDLIB_METHOD_NO_ARGS, label: type_label, method: node.name, got: node.args.length) else - error!(node, :STDLIB_METHOD_ARITY, label: type_label, method: node.name, expected: defn[:arity], got: node.args.length) + error!(node, :STDLIB_METHOD_ARITY, label: type_label, method: node.name, expected: defn.arity, got: node.args.length) end return true end # Type validation (optional) - if defn[:validate] - defn[:validate].call(node, node.args, obj_type, method(:error!)) + if defn.arg_validator + defn.arg_validator.call(node, node.args, obj_type, method(:error!)) end # Set tag and return type node.send(:"#{tag_field}=", node.name.to_sym) - node.full_type = defn[:return_type].call(obj_type) + node.full_type = defn.return_def.resolve(obj_type, [], self) # Resolve zig pattern -- pick variant based on receiver type. # Sharded takes priority over numeric: PartitionedNumericMap shares the # sharded API (count/keys/values/put/get) with PartitionedStringMap. - zig = if (obj_type.sharded? || obj_type.striped?) && defn[:sharded_zig] - defn[:sharded_zig] - elsif obj_type.numeric_map? && !obj_type.sharded? && !obj_type.striped? && defn[:numeric_zig] - defn[:numeric_zig] + em = defn.emit + zig = if (obj_type.sharded? || obj_type.striped?) && em&.sharded_zig + em.sharded_zig + elsif obj_type.numeric_map? && !obj_type.sharded? && !obj_type.striped? && em&.numeric_zig + em.numeric_zig else - defn[:zig] + em&.zig end # Resolve alloc variant for sharded types - alloc = if (obj_type.sharded? || obj_type.striped?) && defn[:sharded_alloc] - defn[:sharded_alloc] + alloc = if (obj_type.sharded? || obj_type.striped?) && em&.sharded_alloc + em.sharded_alloc else - defn[:alloc] + em&.alloc end - # Set zig_pattern and matched_stdlib_def so lower_intrinsic handles emission + # Set zig_pattern and matched_stdlib_def so lower_intrinsic handles + # emission. Override the zig/alloc on a dup'd FS (+ its emit) so + # the shared registry FS is never mutated. 
if zig - resolved_defn = defn.merge(zig: zig) - resolved_defn = resolved_defn.merge(alloc: alloc) if alloc + resolved_defn = defn.dup + resolved_defn.emit = (resolved_defn.emit ? resolved_defn.emit.dup : IntrinsicEmit.new) + resolved_defn.emit.zig = zig + resolved_defn.emit.alloc = alloc if alloc node.zig_pattern = zig node.matched_stdlib_def = resolved_defn end - node.stdlib_allocates = true if defn[:allocates] - node.mutates_receiver = true if defn[:mutates_receiver] + node.stdlib_allocates = true if em&.allocates + node.mutates_receiver = true if em&.mutates_receiver # Narrow Set element type on first insert (Any[] -> T[]) if tag_field == :set_method && node.name == "insert" && obj_type.element_type&.resolved == :Any && node.args.length == 1 @@ -132,8 +137,8 @@ def resolve_typed_method(node, obj_type, registry, tag_field, type_label) end # Ownership: mark TAKES args as moved (same as function_analysis.rb line 305-310) - if defn[:takes_args] - defn[:takes_args].each do |arg_idx| + if defn.emit&.takes_args + defn.emit.takes_args.each do |arg_idx| arg_node = node.args[arg_idx] next unless arg_node if arg_node.is_a?(AST::Identifier) @@ -144,12 +149,12 @@ def resolve_typed_method(node, obj_type, registry, tag_field, type_label) end # Methods that allocate on the heap -- record so needs_rt is computed correctly. 
- if defn[:allocates] && current_fn_ctx + if defn.emit&.allocates && current_fn_ctx current_fn_ctx.heap_count += 1 end - node.can_fail = true if defn[:can_fail] || defn[:allocates] - node.error_kind = defn[:error_kind] if defn[:error_kind] - node.error_type = defn[:error_type] if defn[:error_type] + node.can_fail = true if defn.can_fail || defn.emit&.allocates + node.error_kind = defn.emit&.error_kind if defn.emit&.error_kind + node.error_type = defn.emit&.error_type if defn.emit&.error_type true end diff --git a/src/annotator-helpers/pipe_analysis.rb b/src/annotator-helpers/pipe_analysis.rb index 40aabf695..cd51d5fea 100644 --- a/src/annotator-helpers/pipe_analysis.rb +++ b/src/annotator-helpers/pipe_analysis.rb @@ -220,6 +220,13 @@ def analyze_higher_order_op(node) when AST::CollectOp analyze_collect_op(node) end + + # Every analyze_* above stamps node.full_type with the pipeline's + # type AFTER this op. The op node itself evaluates to exactly that + # (a transform op -> the post-op stream type; a terminal -> its + # result / Void). Stamp it so no pipeline op reaches MIR untyped. + # Sole owner of a pipeline op node's type — assign unconditionally. 
+ node.right.full_type = node.full_type end # COLLECT: pipe-terminal that joins a `~T@observable` and returns @@ -300,11 +307,11 @@ def analyze_select_family_op(node) when AST::IndexOp # INDEX returns HashMap key_type = node.right.expression.resolved_type - node.full_type = :"HashMap<#{item_type}[]>" + node.full_type = Type.new(:"HashMap<#{item_type}[]>") node.right.full_type = key_type when AST::OrderByOp # ORDER_BY returns the same list type, sorted - node.full_type = :"#{item_type}[]" + node.full_type = Type.new(:"#{item_type}[]") node.right.full_type = node.right.expression.resolved_type end @@ -365,7 +372,7 @@ def analyze_window_op(node) # Result is a list of whatever the expression produces expr_type = node.right.expression.full_type || node.right.expression.resolved_type - node.full_type = :"#{expr_type}[]" + node.full_type = Type.new(:"#{expr_type}[]") node.storage = :frame current_fn_ctx.frame_count += 1 if current_fn_ctx end @@ -442,7 +449,7 @@ def analyze_batch_window_op(node) end expr_type = bw.expression.full_type || bw.expression.resolved_type - node.full_type = :"#{expr_type}[]" + node.full_type = Type.new(:"#{expr_type}[]") node.storage = :heap current_fn_ctx.frame_count += 1 if current_fn_ctx end @@ -477,6 +484,10 @@ def analyze_join_op(node) unless key_expr.body.resolved_type == :Bool error!(key_expr, :JOIN_LAMBDA_NEEDS_BOOL, got: key_expr.body.resolved_type) end + # The JOIN key lambda IS a predicate ((left,right)->Bool). Type + # the LambdaLit via the standard lambda-signature builder (same + # as visit_LambdaLit) — its return is the Bool just validated. 
+ key_expr.full_type = build_lambda_signature(params, :Bool) else # Shared key form: _.field applied to both sides # Validate the key expression with _ as left type @@ -500,7 +511,7 @@ def analyze_join_op(node) })) end - node.full_type = :"#{join_type_name}[]" + node.full_type = Type.new(:"#{join_type_name}[]") node.storage = :frame current_fn_ctx.frame_count += 1 if current_fn_ctx end @@ -608,7 +619,7 @@ def analyze_limit_op(node) end # Result type is a materialized list of the element type - node.full_type = :"#{item_type}[]" + node.full_type = Type.new(:"#{item_type}[]") node.storage = :frame end @@ -642,7 +653,7 @@ def analyze_unnest_op(node) # Result type is the element type of the nested array nested_element_type = T.must(expr_type.element_type).resolved - node.full_type = :"#{nested_element_type}[]" + node.full_type = Type.new(:"#{nested_element_type}[]") node.right.full_type = node.right.expression.full_type node.storage = :frame @@ -756,7 +767,7 @@ def analyze_pipe_to_named_function(node, sig, func_name) T.bind(self, SemanticAnnotator) rescue nil # 1. Validate Arity: Must accept exactly 1 argument (the pipe input) params = sig.params - min_args = params.count { |p| p[:required] } + min_args = params.count { |p| p.required } max_args = params.size if min_args < 1 || max_args > 1 @@ -770,12 +781,12 @@ def analyze_pipe_to_named_function(node, sig, func_name) # 2. Validate Type: The Input must match Parameter 1 if max_args >= 1 param = params[0] - expected = param[:type] + expected = param.type actual = node.left.resolved_type # Type.accepts? 
handles slice coercion (Number[3] -> Number[]) unless is_safe_autocast?(actual, expected) - error!(node.left, :ARGUMENT_TYPE_ERROR, fn: "Pipe Input '#{param[:name]}'", index: 1, expected: expected, got: actual) + error!(node.left, :ARGUMENT_TYPE_ERROR, fn: "Pipe Input '#{param.name}'", index: 1, expected: expected, got: actual) end end @@ -908,7 +919,7 @@ def analyze_find_op(node) error!(node.right, :PIPE_CLAUSE_NEEDS_BOOL, clause: "FIND", got: node.right.expression.resolved_type) end - node.full_type = :"?#{item_type}" + node.full_type = Type.new(:"?#{item_type}") node.storage = :stack mark_observable_terminal!(node, terminal: :find, raw: :"~?#{item_type}") end @@ -1484,6 +1495,22 @@ def analyze_concurrent_op(node) end end + # CONCURRENT option values that are bare identifiers (size: MICRO, + # etc.) are compile-time keyword selectors consumed structurally + # via .name — never runtime values. Same compile-time-marker + # category as a type-name ident; stamp the codebase's :Type marker + # (not a guess: it is not an evaluatable value). + options.each_value do |v| + if v.is_a?(AST::Identifier) + # Keyword selector (MICRO/STANDARD/...): compile-time marker. + v.full_type = Type.new(:Type) + else + # A real value option (workers: 2, parallel: TRUE) — annotate + # it normally so it gets its true type. + visit(v) + end + end + # Validate that only known option keys are used options.each_key do |key| unless VALID_CONCURRENT_OPTIONS.include?(key) @@ -1595,6 +1622,15 @@ def analyze_concurrent_op(node) node.full_type = proxy.full_type node.storage = (node.full_type == :Void) ? :stack : :heap end + + # CONCURRENT wraps an inner op (conc.op) and analyzes it through a + # throwaway proxy SMOOTH, bypassing analyze_higher_order_op's + # op-stamp. The wrapped op evaluates to the pipeline's post-op + # type just computed here — stamp it (and its WHERE/SELECT + # expression sub-node) so no wrapped op reaches MIR untyped. 
+ inner = conc.op
+ inner.full_type = node.full_type || proxy.full_type
+ nil # sig: returns(T.nilable(Symbol)) — don't leak the Type assignment
 end
 
 sig { params(node: AST::BinaryOp).returns(T.nilable(Integer)) }
@@ -1620,12 +1656,12 @@ def analyze_concurrent_bounded_select_family_op(node)
 error!(node.right.op, :WHERE_NEEDS_BOOL)
 end
 
- node.full_type = case node.right.op
+ node.full_type = Type.new(case node.right.op
 when AST::SelectOp
 :"#{node.right.op.expression.full_type}[]"
 when AST::WhereOp
 :"#{item_type}[]"
- end
+ end)
 node.storage = :heap
 current_fn_ctx.frame_count += 1 if current_fn_ctx
 end
@@ -1683,10 +1719,10 @@ def analyze_concurrent_stream_select_family_op(node)
 error!(node.right.op, :WHERE_NEEDS_BOOL)
 end
 
- node.full_type = case node.right.op
+ node.full_type = Type.new(case node.right.op
 when AST::SelectOp then :"#{node.right.op.expression.full_type}[]"
 when AST::WhereOp then :"#{item_type}[]"
- end
+ end)
 node.storage = :heap
 current_fn_ctx.frame_count += 1 if current_fn_ctx
 end
diff --git a/src/annotator-helpers/reentrance.rb b/src/annotator-helpers/reentrance.rb
index a4266991f..60d30f30c 100644
--- a/src/annotator-helpers/reentrance.rb
+++ b/src/annotator-helpers/reentrance.rb
@@ -688,9 +688,9 @@ def offer_unconstrained_fn_param_fix!(fn_node)
 existing = fn_node.requires_clauses || {}
 candidates = (fn_node.params || []).filter_map do |p|
- name = p[:name]
+ name = p.name
 next nil if existing.key?(name)
- type = p[:type]
+ type = p.type
 next nil unless type.respond_to?(:fn_type?) && type.fn_type?
 raw = type.respond_to?(:raw) ? type.raw : nil
 next nil if raw.is_a?(FunctionSignature) && raw.reentrant == true
@@ -733,7 +733,7 @@ def validate_requires_clauses!(fn_node)
 return if fn_node.requires_clauses.nil? || fn_node.requires_clauses.empty?
- # Params come from the parser as hashes ({ name:, type:, default:, ... }).
+ # Params come from the parser as AST::Param objects (name, type, default, ...).
 # See pre_register_function in annotator.rb.
- param_names = (fn_node.params || []).map { |p| p[:name] }.compact.to_set + param_names = (fn_node.params || []).map { |p| p.name }.compact.to_set fn_node.requires_clauses.each do |bound_name, _kind| next if param_names.include?(bound_name) error!(fn_node, :REQUIRES_NON_REENTRANT_NOT_PARAM, fn: fn_node.name, name: bound_name) diff --git a/src/annotator-helpers/union.rb b/src/annotator-helpers/union.rb index 69e67e4c4..987f4b023 100644 --- a/src/annotator-helpers/union.rb +++ b/src/annotator-helpers/union.rb @@ -36,7 +36,7 @@ def validate_union_methods!(node) if req[:body] # No concrete override — synthesize a top-level function from the default body. fn_params = req[:params].map { |rp| - { name: rp[:name], type: rp[:type], default: nil, mutable: false, takes: false } + AST::Param.new(name: rp[:name], type: rp[:type], default: nil, mutable: false, takes: false) } fn_node = AST::FunctionDef.new( req[:token], req[:name], fn_params, [], req[:return_type], @@ -81,7 +81,7 @@ def validate_union_methods!(node) # Return type check if req[:return_type] req_ret = to_type(req[:return_type]).resolved - sig_ret = to_type(sig.return_type).resolved + sig_ret = sig.return_type.resolved unless req_ret == sig_ret || req_ret == :Any || sig_ret == :Any error!(req_tok, :UNION_METHOD_RETURN_TYPE, union: union_name, method: fn_name, expected: req_ret, fn: fn_name, got: sig_ret) end diff --git a/src/annotator-helpers/with_match_check.rb b/src/annotator-helpers/with_match_check.rb index 78e744ab5..126676677 100644 --- a/src/annotator-helpers/with_match_check.rb +++ b/src/annotator-helpers/with_match_check.rb @@ -52,7 +52,7 @@ def self.poly_requires?(family_set) def self.check_function!(fn, error_handler, warn_handler: nil, policy_handlers: nil) return unless fn.respond_to?(:body) && fn.body requires_map = (fn.respond_to?(:requires) ? 
fn.requires : nil) || {} - param_names = fn.params.map { |p| p[:name].to_s }.to_set + param_names = fn.params.map { |p| p.name.to_s }.to_set AST.walk_body(fn.body) do |node| next unless node.is_a?(AST::WithBlock) @@ -231,7 +231,7 @@ def self.check_call_sites!(fn, sig_lookup, error_handler) sig = sig_lookup.call(call_node.name.to_s) next unless sig.is_a?(FunctionSignature) && sig.requires sig.params.each_with_index do |param, idx| - pname = (param[:name] || param["name"]).to_s + pname = param.name.to_s fams = sig.requires[pname] next unless fams && fams.empty? arg = (call_node.args || [])[idx] @@ -252,7 +252,7 @@ def self.check_call_sites!(fn, sig_lookup, error_handler) next unless sig.requires && !sig.requires.empty? sig.params.each_with_index do |param, idx| - param_name = param[:name].to_s + param_name = param.name.to_s disjunction = sig.requires[param_name] next unless disjunction && !disjunction.empty? diff --git a/src/annotator.rb b/src/annotator.rb index a83cf985b..b2a394799 100644 --- a/src/annotator.rb +++ b/src/annotator.rb @@ -262,7 +262,7 @@ def program_has_auto?(node) return true if node.respond_to?(:type) && node.type.is_a?(Type) && node.type.auto? return true if node.respond_to?(:return_type) && node.return_type.is_a?(Type) && node.return_type.auto? if node.respond_to?(:params) && node.params.is_a?(Array) - return true if node.params.any? { |p| p[:type].is_a?(Type) && p[:type].auto? } + return true if node.params.any? { |p| p.type&.auto? } end if node.respond_to?(:each_pair) return node.each_pair.any? 
{ |_, v| program_has_auto?(v) } @@ -541,13 +541,13 @@ def visit_RequireNode(node) def visit_ExternFnDecl(node) signature = FunctionSignature.new( params: node.params.map { |p| { - name: p[:name], - type: p[:type], - required: p[:default].nil?, - mutable: p[:mutable] || false, - comptime: p[:comptime] || false + name: p.name, + type: p.type, + required: p.default.nil?, + mutable: p.mutable || false, + comptime: p.comptime || false }}, - return_type: node.return_type || :Any, + return_type: node.return_type || Type.new(:Any), visibility: :pub, extern: true, module_alias: node.from_module, @@ -598,15 +598,15 @@ def visit_ExternStructDecl(node) def pre_register_function(node) signature = FunctionSignature.new( params: node.params.map { |p| { - name: p[:name], - type: p[:type], - required: p[:default].nil?, - default: p[:default], - mutable: p[:mutable], - takes: p[:takes] || false, - sync: (p[:type].is_a?(Type) && p[:type].any_sync?) ? p[:type].sync : nil + name: p.name, + type: p.type, + required: p.default.nil?, + default: p.default, + mutable: p.mutable, + takes: p.takes || false, + sync: p.type&.any_sync? ? p.type.sync : nil }}, - return_type: (node.return_type || :Any), + return_type: node.return_type || Type.new(:Any), return_lifetime: get_lifetime_path(node), visibility: node.visibility, reentrant: node.reentrant == :reentrant @@ -655,12 +655,12 @@ def visit_FunctionDef(node) lifetime_paths = get_lifetime_paths(node) fn_type_params = (node.type_params || []).map(&:to_sym) @function_context_stack.push(FunctionContext.new( - name: node.name, return_type: declared_return, + name: node.name, return_type: node.return_type || Type.new(:Any), lifetime: lifetime_paths, type_params: fn_type_params )) # 2. Validation & Lifetime - has_mutable_param = node.params.any? { |p| p[:mutable] } + has_mutable_param = node.params.any? 
{ |p| p.mutable } if has_mutable_param && !node.name.end_with?("!") emit_style_mutable_param_needs_bang!(node) end @@ -670,17 +670,17 @@ def visit_FunctionDef(node) validate_type_param_list!(node, node.type_params, "function") if fn_type_params.any? # Make type params visible during type annotation validation - node.params.each { |p| validate_type_annotation!(node, p[:type], is_param: true) if p[:type].is_a?(Type) } - validate_type_annotation!(node, node.return_type) if node.return_type.is_a?(Type) + node.params.each { |p| validate_type_annotation!(node, p.type, is_param: true) if p.type } + validate_type_annotation!(node, node.return_type) if node.return_type # 3. Pre-declaration (so the function can be recursive) signature = FunctionSignature.new( params: node.params.map { |p| { - name: p[:name], type: p[:type], required: p[:default].nil?, - default: p[:default], mutable: p[:mutable], takes: p[:takes], - sync: (p[:type].is_a?(Type) && p[:type].any_sync?) ? p[:type].sync : nil + name: p.name, type: p.type, required: p.default.nil?, + default: p.default, mutable: p.mutable, takes: p.takes, + sync: p.type&.any_sync? ? p.type.sync : nil }}, - return_type: declared_return, return_lifetime: lifetime_paths, + return_type: node.return_type || Type.new(:Any), return_lifetime: lifetime_paths, visibility: node.visibility, type_params: fn_type_params.any? ? fn_type_params : nil, reentrant: node.reentrant == :reentrant @@ -811,7 +811,7 @@ def visit_FunctionDef(node) # CATCH wrappers heap-dupe all string returns (both success and catch paths). # A fallible String return is an error union, so unwrap the payload # before classifying string-return ownership. - ret_type = node.return_type.is_a?(Type) ? node.return_type : Type.new(node.return_type || :Void) + ret_type = node.return_type || Type.new(:Void) bare_ret = if ret_type.respond_to?(:error_union?) && ret_type.error_union? 
&& ret_type.respond_to?(:payload_type) ret_type.payload_type || ret_type @@ -950,7 +950,7 @@ def collapse_errors_for_call(sig, args) require_relative 'annotator-helpers/with_match_check' unless defined?(WithMatchCheck) collapsed = Set.new sig.requires.each do |param_name, _families| - idx = sig.params.find_index { |p| p[:name].to_s == param_name } + idx = sig.params.find_index { |p| p.name.to_s == param_name } next unless idx arg = args[idx] next unless arg @@ -1114,6 +1114,18 @@ def visit_StructDef(node) field_defaults = node.fields.each_with_object({}) { |(k, f), h| h[k] = f[:default] if f[:default] } schema[:field_defaults] = field_defaults unless field_defaults.empty? + # A field default's type IS the field's declared type (it must be + # assignable to it) — not a guess. These default nodes (Literal / + # DefaultLit) are schema metadata never walked by the expression + # visitor, so type them here. + node.fields.each do |_, f| + d = f[:default] + next unless d + # In field-default position the value's type IS the field type; + # assign unconditionally (overrides a Literal's derived kind). + d.full_type = f[:type].is_a?(Type) ? f[:type] : Type.new(f[:type] || :Any) + end + # Track which fields are BORROWED (references, not owned). borrowed_fields = node.fields.select { |_, f| f[:borrowed] }.keys schema[:borrowed_fields] = borrowed_fields.to_set if borrowed_fields.any? @@ -1322,7 +1334,9 @@ def visit_IfBind(node) # Declare each binding in the then-scope with the unwrapped type. node.bindings.each do |b| unwrapped = b[:unwrapped_type] - sym = unwrapped.is_a?(Type) ? unwrapped.resolved : unwrapped + # Sole producer sets this from ti.wrapped_type (Type|nil; + # never a Symbol) — see the binding-annotation loop above. 
+ sym = unwrapped&.resolved current_scope.declare(b[:name], nil, unwrapped, false, false, nil, :stack) entry = current_scope.locals[b[:name]] b[:symbol] = entry @@ -1407,6 +1421,12 @@ def annotate_struct_pattern!(match_node, pat) end end end + + # A destructuring pattern's type IS the subject it destructures + # (the MATCH expr) — not a guess. + pat.full_type = match_node.expr.full_type || + Type.new(match_node.expr.resolved_type || :Any) + nil # sig: returns(T.nilable(T::Array[...])) — don't leak the Type end sig { params(node: AST::PassStmt).returns(Symbol) } @@ -1530,8 +1550,17 @@ def visit_MatchStatement(node) # patterns (tag identifiers), not constructors — they don't need field values. @match_pattern_context = true visit(c[:value]) - # Multi-pattern arm: visit + type-check each extra pattern too. - c[:extra_values]&.each { |ev| visit(ev) } + # Multi-pattern arm: visit + type-check each extra pattern + # too. A `{ field }` destructure goes through the SAME + # handler as a single :struct_pattern arm so it is typed + # (and its binds declared), not just visited. + c[:extra_values]&.each do |ev| + if ev.is_a?(AST::StructPattern) + annotate_struct_pattern!(node, ev) + else + visit(ev) + end + end @match_pattern_context = false expr_t2 = Type.new(node.expr.resolved_type || :Any) # Type-check the head pattern, then each extra. Patterns share @@ -1630,6 +1659,12 @@ def visit_MatchStatement(node) # the SAME payload (same inline-struct fields and types) — the # destructured names are shared across all patterns' bodies. if c[:destructure] && is_union + # The destructure pattern's type IS the subject it + # destructures (the MATCH union expr) — same principle as + # annotate_struct_pattern!; not a guess. Binds are declared + # below; this only types the pattern node itself. 
+ c[:destructure].full_type = + node.expr.full_type || Type.new(node.expr.resolved_type || :Any) variant_name = case c[:value] when AST::GetField then c[:value].field when AST::MethodCall then c[:value].name @@ -2198,7 +2233,7 @@ def visit_ReturnNode(node) # Promote non-identifier literals to heap when the expected return type requires it. unless node.value.is_a?(AST::Identifier) - expected_type = Type.new(expected) if expected + expected_type = expected if expected_type && (expected_type.heap? || expected_type.dynamic?) && node.value.respond_to?(:storage=) && node.value.type_info&.requires_move? @@ -2294,7 +2329,7 @@ def infer_implicit_type_params(fn_node) explicit = (fn_node.type_params || []).map(&:to_s) return explicit unless explicit.empty? inferred = [] - ([fn_node.return_type] + (fn_node.params || []).map { |p| p[:type] }).each do |type| + ([fn_node.return_type] + (fn_node.params || []).map { |p| p.type }).each do |type| collect_implicit_type_params(type, inferred, explicit) end (explicit + inferred).uniq @@ -2319,6 +2354,12 @@ def collect_implicit_type_params(type, out, explicit) def visit_StaticCall(node) node.args.each { |arg| visit(arg) } + # `File` in `File::open(...)` is a TYPE reference, not a runtime + # value. The codebase's established marker for a type-position + # identifier is :Type (cf. comptime type args in function_analysis) + # — not a guess. 
+    node.type_name.full_type = Type.new(:Type)
+
     type_name = node.type_name.name.to_sym
     schema = lookup_type_schema(type_name)
@@ -2331,7 +2372,7 @@ def visit_StaticCall(node)
     end

     static_methods = schema[:static_methods] || {}
-    method_def = static_methods[node.method_name]
+    method_def = IntrinsicRegistry.sig(static_methods, node.method_name)

     unless method_def
       available = static_methods.keys.join(", ")
@@ -2354,7 +2395,7 @@ def visit_StaticCall(node)
       end
     end

-    expected_args = method_def[:args]
+    expected_args = method_def.arg_spec
     if node.args.length != expected_args.length
       error!(node, :STATIC_ARITY, type: type_name, method: node.method_name, expected: expected_args.length, got: node.args.length)
     end
@@ -2366,17 +2407,17 @@ def visit_StaticCall(node)
       end
     end

-    node.zig_pattern = method_def[:zig]
-    node.full_type = method_def[:return]
+    node.zig_pattern = method_def.emit&.zig
+    node.full_type = method_def.return_def.resolve(nil, node.args, self)
     node.matched_stdlib_def = method_def
-    node.stdlib_allocates = true if method_def[:allocates]
-    node.mutates_receiver = true if method_def[:mutates_receiver]
-    node.can_fail = true if method_def[:can_fail]
-    node.error_kind = method_def[:error_kind] if method_def[:error_kind]
-    node.error_type = method_def[:error_type] if method_def[:error_type]
-    current_fn_ctx.alloc_count += 1 if current_fn_ctx && (method_def[:allocates] || method_def[:can_fail])
-
-    if method_def[:mutates_receiver] && node.is_a?(AST::MethodCall)
+    node.stdlib_allocates = true if method_def.emit&.allocates
+    node.mutates_receiver = true if method_def.emit&.mutates_receiver
+    node.can_fail = true if method_def.can_fail
+    node.error_kind = method_def.emit&.error_kind if method_def.emit&.error_kind
+    node.error_type = method_def.emit&.error_type if method_def.emit&.error_type
+    current_fn_ctx.alloc_count += 1 if current_fn_ctx && (method_def.emit&.allocates || method_def.can_fail)
+
+    if method_def.emit&.mutates_receiver && node.is_a?(AST::MethodCall)
       root = chain_root_name(node.object)
       mark_var_mutated(root) if root
     end
@@ -2458,7 +2499,7 @@ def visit_MethodCall(node)
       node.extern_call = true
       node.extern_effects = method_sig.extern_effects if method_sig.extern_effects
       node.instance_variable_set(:@extern_method, true)
-      node.full_type = method_sig.return_type || :Void
+      node.full_type = method_sig.return_type
       record_effect(EffectTracker::EXTERN)
       # Track allocator usage for EFFECTS :alloc methods.
       alloc_kind = method_sig.extern_effects&.dig(:alloc)
@@ -2547,8 +2588,8 @@ def visit_IntrinsicFunc(node, args)
     # `u32_val.negative?()` where Int64 autocast would otherwise mask
     # the bug. Generic — keyed by symbol so std_lib.rb stays
     # declarative and annotator.rb has no per-function logic.
-    if matched_def[:reject_when] && reject_arg_type_matches?(args.first, matched_def[:reject_when])
-      reason = matched_def[:reject_error] ||
+    if matched_def.emit&.reject_when && reject_arg_type_matches?(args.first, matched_def.emit.reject_when)
+      reason = matched_def.emit&.reject_error ||
                "#{node.name}() is not valid for #{args.first.resolved_type}"
       error!(node, :INTRINSIC_REJECTED, message: reason)
       return
@@ -2557,34 +2598,24 @@ def visit_IntrinsicFunc(node, args)
     # 3. Resolve return type (may be dynamic via method call).
     # Dynamic resolver methods are named `infer_*` to avoid collisions with
     # Ruby Kernel conversion methods (Integer, String, Array, etc.).
-    ret = matched_def[:return]
-    if ret.is_a?(Hash) && ret[:type]
-      # Structured return: { type: :String, sync: :raw } etc. — preserves capabilities.
-      node.full_type = Type.new(ret[:type], sync: ret[:sync], ownership: ret[:ownership])
-    elsif ret.is_a?(Symbol) && ret.to_s.start_with?("infer_") && respond_to?(ret, true)
-      node.full_type = send(ret, args, node)
-    elsif ret.respond_to?(:call)
-      node.full_type = ret.call(args.map(&:resolved_type), node)
-    else
-      node.full_type = ret
-    end
+    node.full_type = matched_def.return_def.resolve(nil, args, self)

     # 4. Store Zig pattern and stdlib metadata for transpiler
-    node.zig_pattern = matched_def[:zig]
+    node.zig_pattern = matched_def.emit&.zig
     node.matched_stdlib_def = matched_def
-    node.stdlib_allocates = true if matched_def[:allocates]
-    node.mutates_receiver = true if matched_def[:mutates_receiver]
-    node.can_fail = true if matched_def[:can_fail] || matched_def[:allocates]
-    node.error_kind = matched_def[:error_kind] if matched_def[:error_kind]
-    node.error_type = matched_def[:error_type] if matched_def[:error_type]
-    current_fn_ctx.alloc_count += 1 if current_fn_ctx && (matched_def[:allocates] || matched_def[:can_fail] || matched_def[:needs_rt])
-    record_effect(EffectTracker::SUSPENDS) if matched_def[:suspends]
+    node.stdlib_allocates = true if matched_def.emit&.allocates
+    node.mutates_receiver = true if matched_def.emit&.mutates_receiver
+    node.can_fail = true if matched_def.can_fail || matched_def.emit&.allocates
+    node.error_kind = matched_def.emit&.error_kind if matched_def.emit&.error_kind
+    node.error_type = matched_def.emit&.error_type if matched_def.emit&.error_type
+    current_fn_ctx.alloc_count += 1 if current_fn_ctx && (matched_def.emit&.allocates || matched_def.can_fail || matched_def.needs_rt)
+    record_effect(EffectTracker::SUSPENDS) if matched_def.emit&.suspends

     # 5. Flag mutable access through list indexing.
     # When a mutating intrinsic (e.g., append, remove) is called on a receiver
     # that chains through a GetIndex, the GetIndex must emit pointer access
     # instead of by-value getAt().
-    if matched_def[:mutates_receiver] && node.is_a?(AST::MethodCall)
+    if matched_def.emit&.mutates_receiver && node.is_a?(AST::MethodCall)
       mark_chain_needs_mut_ref!(node.object)
       root = chain_root_name(node.object)
       mark_var_mutated(root) if root
@@ -2610,7 +2641,7 @@ def visit_IntrinsicFunc(node, args)
   # CLEAR operations, so :stack is always safe here.
   sig { params(node: AST::VarDecl).void }
   def visit_VarDecl(node)
-    if node.value.is_a?(AST::ListLit) && node.type.is_a?(Type) && node.type.fixed?
+    if node.value.is_a?(AST::ListLit) && node.type&.fixed?
       node.value.storage = :stack
     end
     visit(node.value)
@@ -2636,7 +2667,7 @@ def visit_VarDecl(node)
   def promote_pipe_to_observable_dest!(node)
     return unless node.respond_to?(:type) && node.type
     return unless node.value
-    target = node.type.is_a?(Type) ? node.type : Type.new(node.type)
+    target = node.type
     return unless target.future? && target.observable?
     pipe = node.value
     return unless pipe.is_a?(AST::BinaryOp) && pipe.op == :SMOOTH
@@ -2647,9 +2678,9 @@ def promote_pipe_to_observable_dest!(node)
     # only the fold's analyzer knows whether this is :sum/:count/:max/...
     # Copying it onto node.type also propagates the kind to the binding's
     # symbol entry (so WITH VIEW / NEXT / cleanup all see it).
-    if pipe.full_type.is_a?(Type) && T.must(pipe.full_type).observable_terminal
+    if pipe.full_type&.observable_terminal
       pipe_terminal = T.must(pipe.full_type).observable_terminal
-      target_t = node.type.is_a?(Type) ? node.type : Type.new(node.type)
+      target_t = node.type
       # The pipe is the authority on terminal kind: only the fold's
       # analyzer knows whether this is :sum / :count / :max / ... .
       # The LHS annotation (`~Int64@observable`) never carries one, so
@@ -2671,7 +2702,7 @@ def promote_pipe_to_observable_dest!(node)
     # OBSERVABLE_WRAPPERS can find it. Without this, the binding's
     # emitted Zig wrapper would default-or-raise. Same mismatch
     # check as above.
-    if node.full_type.is_a?(Type) && node.full_type.observable?
+    if node.full_type&.observable?
       if node.full_type.observable_terminal && node.full_type.observable_terminal != pipe_terminal
         raise CompilerError.new(
           node.token,
@@ -2709,7 +2740,7 @@ def finalize_decl_node!(node, mutable_flag)
     # check is correct because promote_pipe_to_observable_dest! sets
     # `observable_dest` only when the RHS is a SMOOTH-pipe over a
     # tense source; any other shape leaves it false.
-    if node.type.is_a?(Type) && node.type.future? && node.type.observable?
+    if node.type&.future? && node.type.observable?
       pipe = node.value
       ok = pipe.is_a?(AST::BinaryOp) && pipe.op == :SMOOTH && pipe.observable_dest
       unless ok
@@ -2760,7 +2791,7 @@ def finalize_decl_node!(node, mutable_flag)
     # Empty collection literals annotated as Auto need a permissive
     # container type in scope so method dispatch works during the body walk;
     # the declaration annotation remains Auto for the later constraint pass.
-    if node.type.is_a?(Type) && node.type.auto? &&
+    if node.type&.auto? &&
        node.value.respond_to?(:type_object) && node.value.type_object && (
         (node.value.is_a?(AST::ListLit) && node.value.items.empty? &&
@@ -2872,7 +2903,7 @@ def finalize_decl_node!(node, mutable_flag)
   sig { params(node: AST::BindExpr).returns(T.nilable(T::Hash[Symbol, T::Array[SymbolEntry]])) }
   def visit_BindExpr(node)
     # Same pre-set as visit_VarDecl: mark fixed-array list literals as :stack before visiting.
-    if node.value.is_a?(AST::ListLit) && node.type.is_a?(Type) && node.type.fixed?
+    if node.value.is_a?(AST::ListLit) && node.type&.fixed?
       node.value.storage = :stack
     end
     visit(node.value)
@@ -3313,7 +3344,8 @@ def visit_GetIndex(node)
     if op
       # Registry-driven: type and ownership from INDEX_OPS
-      node.full_type = op[:return_type].call(target_type_info)
+      node.full_type = IntrinsicRegistry.to_return_def(op[:return_type])
+                                        .resolve(target_type_info, [], self)
       node.container_borrow = true if op[:container_borrow]

       # Validate key types for maps
@@ -3453,7 +3485,7 @@ def visit_GetField(node)
     end
   end

-  sig { params(node: AST::Slice).returns(Symbol) }
+  sig { params(node: AST::Slice).void }
   def visit_Slice(node)
     visit(node.target)
     visit(node.start) if node.start
@@ -3464,7 +3496,7 @@ def visit_Slice(node)
     target_type = node.target.type_info
     if target_type&.array?
       element = target_type.element_type.resolved
-      node.full_type = :"#{element}[]"
+      node.full_type = Type.new(:"#{element}[]")
     else
       node.full_type = :Any
     end
@@ -3523,7 +3555,7 @@ def visit_HashLit(node)
       end
     end

-    node.full_type = :"HashMap<#{first_val_type}>"
+    node.full_type = Type.new(:"HashMap<#{first_val_type}>")
     node.storage = :heap
     current_fn_ctx.heap_count += 1 if current_fn_ctx
     record_effect(EffectTracker::HEAP)
@@ -3744,7 +3776,7 @@ def visit_ListLit(node)
       if inner_types.size > 1
         error!(node, :BOUNDED_STREAM_MIXED_TYPES, types: inner_types.join(', '))
       end
-      node.full_type = :"~#{inner_types.first}[#{node.items.size}]"
+      node.full_type = Type.new(:"~#{inner_types.first}[#{node.items.size}]")
       node.storage = :stack
       return
     end
@@ -3787,7 +3819,7 @@ def visit_ListLit(node)
     end

     if node.storage == :stack
-      node.full_type = :"#{base_type}[#{node.items.size}]"
+      node.full_type = Type.new(:"#{base_type}[#{node.items.size}]")
     else
       t = Type.new(:"#{base_type}[]", location: :heap)
       t.provenance = :frame # makeList uses frameAlloc for backing
@@ -3837,8 +3869,8 @@ def visit_RangeLit(node)
   def visit_Literal(node)
     node.full_type =
       case node.type
-      when :NUMBER then :Float64
-      when :INT64 then :Int64
+      when :NUMBER then Type.new(:Float64)
+      when :INT64 then Type.new(:Int64)
       when :STRING
         # provenance auto-inferred from location: :rodata in Type constructor
         if node.storage == :stack
@@ -3849,17 +3881,17 @@ def visit_Literal(node)
       when :SYMBOL
         # Symbol literals: compile-time interned, static lifetime, O(1) equality by pointer.
         Type.new(Type::STRING_TYPE, sync: :symbol)
-      when :BYTE then :Byte
-      when :PREFIXED_INT then :Byte # Default; overflows checked after coercion context is known
-      when :INT8 then :Int8
-      when :INT16 then :Int16
-      when :INT32 then :Int32
-      when :UINT16 then :UInt16
-      when :UINT32 then :UInt32
-      when :UINT64 then :UInt64
-      when :FLOAT32 then :Float32
-      when :BOOLEAN then :Bool
-      when :NIL then :NIL
+      when :BYTE then Type.new(:Byte)
+      when :PREFIXED_INT then Type.new(:Byte) # Default; overflows checked after coercion context is known
+      when :INT8 then Type.new(:Int8)
+      when :INT16 then Type.new(:Int16)
+      when :INT32 then Type.new(:Int32)
+      when :UINT16 then Type.new(:UInt16)
+      when :UINT32 then Type.new(:UInt32)
+      when :UINT64 then Type.new(:UInt64)
+      when :FLOAT32 then Type.new(:Float32)
+      when :BOOLEAN then Type.new(:Bool)
+      when :NIL then Type.new(:NIL)
       else error!(node, :UNKNOWN_LITERAL)
       end
@@ -3949,6 +3981,10 @@ def visit_BindVar(node)
       :stack
     )
+
+    # The bound identifier ($u) IS the per-element binding — type it
+    # exactly as it was declared (binding_type), not a guess.
+    node.right.full_type = binding_type
+
     # The result of the operation is the collection itself (passthrough for pipeline)
     node.full_type = lhs_type
   end
@@ -4298,7 +4334,9 @@ def visit_CopyNode(node)
   end

   # Infer return type for list.remove(i) — returns the element type.
-  sig { params(args: T::Array[T.untyped], node: AST::MethodCall).returns(Symbol) }
+  # `node` is unused (the receiver is args.first); nilable because
+  # FunctionReturn#resolve dispatches without a call node.
+  sig { params(args: T::Array[T.untyped], node: T.untyped).returns(Symbol) }
   def infer_element_type(args, node)
     receiver = args.first
     ti = receiver&.type_info
@@ -4314,6 +4352,25 @@ def infer_optional_element_type(args, node)
     :"?#{elem}"
   end
+
+  # Infer return type for stream/list `.toList()` — an owned heap list
+  # of the receiver's element type (unwrapping stream/promise tenses).
+  sig { params(args: T::Array[T.untyped], node: T.untyped).returns(Type) }
+  def infer_to_list(args, node)
+    recv_t = Type.new(args[0].resolved_type)
+    elem_t = if recv_t.dynamic_stream? || recv_t.promise_list?
+               recv_t.tense_type.element_type
+             elsif recv_t.bounded_stream?
+               recv_t.stream_element_type
+             elsif recv_t.inf_stream?
+               recv_t.inf_stream_element_type
+             elsif recv_t.open_stream?
+               recv_t.open_stream_element_type
+             else
+               recv_t.element_type
+             end
+    Type.new(:"#{elem_t.resolved}[]", collection: :list, location: :heap)
+  end
+
   sig { params(node: AST::LinkNode).returns(T.nilable(Type)) }
   def visit_LinkNode(node)
     visit(node.value)
@@ -5140,7 +5197,7 @@ def visit_BgStreamBlock(node)
       error!(node, :BG_STREAM_INCONSISTENT_YIELD, types: elem_syms.join(', '))
     end

-    node.full_type = :"~?#{elem_syms.first}[]"
+    node.full_type = Type.new(:"~?#{elem_syms.first}[]")

     # Detect YIELD of frame strings: when any YIELD expression is frame-allocated,
     # the MIR pass will heap-dupe it before push. NEXT callers own the duped copy.
@@ -5186,7 +5243,7 @@ def visit_YieldExpr(node)
       error!(node, :YIELD_OUTSIDE_BG_STREAM)
     end
     visit(node.expr)
-    node.full_type = node.expr.full_type || :Void
+    node.full_type = Type.new(node.expr.full_type || :Void)
     T.must(@stream_yield_types) << Type.new(node.full_type)
     record_effect(EffectTracker::SUSPENDS)
   end
@@ -5221,7 +5278,7 @@ def visit_BgBlock(node)
     if last_type_str.start_with?('!')
       last_type = T.must(last_type_str[1..]).to_sym
     end
-    node.full_type = :"~#{last_type}"
+    node.full_type = Type.new(:"~#{last_type}")

     # Propagate returns_promoted through BG blocks: if the last expression
     # calls a function with returns_promoted, the BG block's promise carries
@@ -5369,7 +5426,7 @@ def visit_NextExpr(node)
       node.storage = :heap
     elsif promise_type.dynamic_stream?
       elem_sym = promise_type.tense_type.element_type.to_sym
-      node.full_type = :"?#{elem_sym}"
+      node.full_type = Type.new(:"?#{elem_sym}")
     elsif promise_type.bounded_stream?
       # NEXT on ~T[N]: returns T (the element type).
       # Does NOT mark the stream as moved — the stream can be NEXT'd up to N times.
@@ -5382,12 +5439,12 @@ def visit_NextExpr(node)
       # NEXT on ~?T[]@split: returns ?T — each handle advances independently through
       # the shared memoized sequence until exhaustion.
       elem_sym = promise_type.open_stream_element_type.to_sym
-      node.full_type = :"?#{elem_sym}"
+      node.full_type = Type.new(:"?#{elem_sym}")
     elsif promise_type.open_stream?
       # NEXT on ~?T[]: returns ?T — null signals stream exhaustion.
       # Does NOT mark as moved — stream is a resource cleaned up via deinit.
       elem_sym = promise_type.open_stream_element_type.to_sym
-      node.full_type = :"?#{elem_sym}"
+      node.full_type = Type.new(:"?#{elem_sym}")
     elsif promise_type.inf_stream?
       # NEXT on ~T[INF]: returns T (never nil — stream is infinite, rendezvous-style).
       # Does NOT mark as moved — stream is a resource cleaned up via deinit.
@@ -5554,14 +5611,14 @@ def handle_assign_borrow(node)
   def resolve_borrow_source(call_node)
     # Path 1: stdlib functions with lifetime: "self"
     matched_def = call_node.matched_stdlib_def
-    if matched_def.is_a?(Hash) && matched_def[:lifetime]
-      lifetime = matched_def[:lifetime]
+    if matched_def && matched_def.emit&.lifetime
+      lifetime = matched_def.emit.lifetime
       if lifetime == "self" && call_node.is_a?(AST::MethodCall)
         return call_node.object
       end
       # Named param lifetime -- find by index in args list
       args = call_node.is_a?(AST::MethodCall) ? [call_node.object] + call_node.args : call_node.args
-      arg_types = matched_def[:args]
+      arg_types = matched_def.arg_spec
       if arg_types.is_a?(Array)
         idx = arg_types.index { |a| a.is_a?(Hash) && a[:name] == lifetime }
         return args[idx] if idx && args[idx]
@@ -5592,7 +5649,7 @@ def resolve_borrow_source(call_node)
     return nil if primary == :wildcard

     primary_root = primary.to_s.split(".").first
-    param_index = func_type.params&.find_index { |p| p[:name] == primary_root }
+    param_index = func_type.params&.find_index { |p| p.name == primary_root }
     return nil unless param_index

     args = call_node.is_a?(AST::MethodCall) ? [call_node.object] + call_node.args : call_node.args
@@ -5957,7 +6014,7 @@ def lookup_source_name(sym)
     @fn_nodes.each_value do |fn|
       next unless fn.respond_to?(:params)
       fn.params.each do |p|
-        return p[:name].to_s if p[:symbol].equal?(sym)
+        return p.name.to_s if p.symbol.equal?(sym)
       end
     end
     nil
@@ -6491,16 +6548,16 @@ def set_cleanup_alloc!(node)
     val = node.value
     if val && (val.is_a?(AST::FuncCall) || val.is_a?(AST::MethodCall))
       matched_def = val.matched_stdlib_def
-      if matched_def.is_a?(Hash)
+      if matched_def
        # Borrow returns (lifetime:) need no cleanup -- the caller owns the data
-        if matched_def[:lifetime]
+        if matched_def.emit&.lifetime
          ti.provenance = :borrow
          return
        end
-        ret_alloc = matched_def[:return_alloc]
+        ret_alloc = matched_def.emit&.return_alloc
        # For allocating methods without explicit return_alloc, the method's
        # alloc IS the return alloc (e.g. map.values() on sharded maps).
-        ret_alloc ||= matched_def[:alloc] if matched_def[:allocates]
+        ret_alloc ||= matched_def.emit&.alloc if matched_def.emit&.allocates
         if ret_alloc
           ti.provenance ||= ret_alloc if [:heap, :frame].include?(ret_alloc)
           return
diff --git a/src/ast/ast.rb b/src/ast/ast.rb
index 9f7ee0271..a6fe4b3a9 100644
--- a/src/ast/ast.rb
+++ b/src/ast/ast.rb
@@ -3,6 +3,7 @@

 require_relative "type"
 require_relative "schemas"
+require_relative "../annotator-helpers/intrinsic_registry"

 # ==========================================
 # AST
@@ -10,6 +11,56 @@
 module AST
   extend T::Sig

+  # A node's value-type is, for these kinds, a pure function of its
+  # structure — so it is DERIVED, never stamped. The full_type getter
+  # below memoizes the derived Type into @type_object; the pre-MIR
+  # invariant walk (which calls .full_type on every node) materializes
+  # it, so type_info / resolved_type work downstream with no extra
+  # code. An annotator-set value always wins (`||=`).
+  LITERAL_VALUE_TYPE = {
+    STRING: :String, NUMBER: :Number, FLOAT64: :Float64,
+    INT64: :Int64, BOOLEAN: :Bool, SYMBOL: :Symbol, NIL: :Void
+  }.freeze
+  BOOL_BINOPS = %i[LT GT LTE GTE EQ NEQ AND OR].freeze
+
+  # Statements / control-flow evaluate to Void unless the annotator
+  # promoted them to an expression (IF/MATCH as a value), in which
+  # case @type_object is already set and wins.
+  module StatementVoidType
+    def full_type
+      @type_object ||= Type.new(:Void)
+    end
+  end
+
+  # A function/lambda/method parameter descriptor. Replaces the loose
+  # `{ name:, type:, ... }` Hash that flowed through FunctionDef#params
+  # and FunctionSignature#params. `type` is ALWAYS a Type (coerced;
+  # nil only when the param is unannotated/inferred — the inference
+  # signal). Strongly typed; no Hash-style access.
+  Param = Struct.new(:name, :type, :default, :mutable, :takes,
+                     :comptime, :name_token, :required, :sync, :symbol,
+                     keyword_init: true) do
+    extend T::Sig
+
+    def initialize(**kw)
+      super
+      t = self[:type]
+      self[:type] = Type.new(t) unless t.nil? || t.is_a?(Type)
+    end
+
+    sig { params(val: T.untyped).void }
+    def type=(val)
+      self[:type] = val.nil? || val.is_a?(Type) ? val : Type.new(val)
+    end
+
+    # Idempotent normalizer used at the FunctionDef / FunctionSignature
+    # seams. Accepts a Param (passthrough) or a Hash (legacy producer).
+    sig { params(p: T.any(Param, T::Hash[Symbol, T.untyped])).returns(Param) }
+    def self.coerce(p)
+      return p if p.is_a?(Param)
+      new(**p.slice(*members))
+    end
+  end
+
   # Walk all statements in a body, recursing into control flow branches.
   # Yields each statement node. Handles IfStatement, MatchStatement,
   # WhileLoop, ForRange, ForEach, and generic nodes with .body.
@@ -220,7 +271,9 @@ def zig_pattern=(val); @zig_pattern = T.let(val, T.untyped); end
     sig { returns(T.untyped) }
     def matched_stdlib_def; @matched_stdlib_def = T.let(@matched_stdlib_def, T.untyped); end
     sig { params(val: T.untyped).returns(T.untyped) }
-    def matched_stdlib_def=(val); @matched_stdlib_def = T.let(val, T.untyped); end
+    def matched_stdlib_def=(val)
+      @matched_stdlib_def = T.let(IntrinsicRegistry.fs(val), T.untyped)
+    end

     sig { void }
     def stdlib_allocates; @stdlib_allocates = T.let(@stdlib_allocates, T.untyped); end
@@ -556,6 +609,30 @@ def metatype
     include HasBodies
     sig { returns(T::Array[T::Array[T.untyped]]) }
     def child_bodies = [body].compact
+
+    # Seam: a function's declared/inferred return is always a Type
+    # (or nil when undeclared — the implicit-return signal that
+    # inference consumes). Coerced at BOTH construction (positional
+    # Struct init from parser/synthetic builders) and post-parse
+    # assignment (return inference, auto-infer) so no reader needs
+    # an `is_a?(Type)` Symbol/Type discriminator.
+    def initialize(*)
+      super
+      rt = self[:return_type]
+      self[:return_type] = Type.new(rt) unless rt.nil? || rt.is_a?(Type)
+      self[:params] = (self[:params] || []).map { |p| Param.coerce(p) }
+    end
+
+    sig { params(val: T.untyped).void }
+    def return_type=(val)
+      self[:return_type] = val.nil? || val.is_a?(Type) ? val : Type.new(val)
+    end
+
+    sig { params(val: T.nilable(T::Array[T.untyped])).void }
+    def params=(val)
+      self[:params] = (val || []).map { |p| Param.coerce(p) }
+    end
+
     attr_accessor :type_params # Array of type param name strings, e.g. ["T", "K"], or nil
     # True when the user wrote RETURNS explicitly; fallible-signature checks
     # only enforce on user-authored return types.
@@ -665,9 +742,25 @@ def child_bodies = [body].compact
   VarDecl = Struct.new(:token, :name, :type, :value, :mutable) do
     include Locatable
     attr_accessor :mir_binding_entry # stamped by CleanupClassifier: per-node cleanup entry (avoids same-name collision)
+
+    # Seam: a declaration's annotated/inferred type is always a Type
+    # (or nil when unannotated — the inference signal). Coerced at
+    # construction (positional Struct init) and post-parse assignment
+    # (auto-infer, propagation) so no reader needs an `is_a?(Type)`
+    # Symbol/Type discriminator.
+    def initialize(*)
+      super
+      t = self[:type]
+      self[:type] = Type.new(t) unless t.nil? || t.is_a?(Type)
+    end
+
+    def type=(val)
+      self[:type] = val.nil? || val.is_a?(Type) ? val : Type.new(val)
+    end
   end
   Assignment = Struct.new(:token, :name, :value) do
     include Locatable
+    include StatementVoidType
     attr_accessor :auto_lock # set by annotator when target is @locked/@writeLocked (inline guard)
     attr_accessor :field_pre_cleanup # stamped by MIRPass: { zig_type:, alloc: } for field overwrite cleanup
     attr_accessor :container_promote_zig_type # stamped by MIRPass: Zig type string when indexed store needs frame-to-heap promote
@@ -686,10 +779,33 @@ def child_bodies = [body].compact
     attr_accessor :mir_binding_entry # stamped by CleanupClassifier: per-node cleanup entry (avoids same-name collision)
     attr_accessor :compound_op
     attr_accessor :auto_atomic_op
+
+    # Seam: same contract as VarDecl#type — annotated/inferred type is
+    # always a Type (or nil when unannotated). Coerced at construction
+    # and post-parse assignment so no reader needs an `is_a?(Type)`
+    # Symbol/Type discriminator.
+    def initialize(*)
+      super
+      t = self[:type]
+      self[:type] = Type.new(t) unless t.nil? || t.is_a?(Type)
+    end
+
+    def type=(val)
+      self[:type] = val.nil? || val.is_a?(Type) ? val : Type.new(val)
+    end
   end
   BinaryOp = Struct.new(:token, :left, :op, :right) do
     extend T::Sig
     include Locatable
+    # Derived: comparison/logical -> Bool; otherwise an operand's type.
+    def full_type
+      @type_object ||=
+        if BOOL_BINOPS.include?(op)
+          Type.new(:Bool)
+        else
+          Type.new(left&.full_type&.resolved || right&.full_type&.resolved || :Any)
+        end
+    end
     attr_accessor :string_concat # true when this is string + (stamped by annotator)
     attr_accessor :storage # :heap when carry-var concat is promoted to heap
     attr_accessor :or_fallback_dupe # true when OR_RESCUE fallback struct needs string-field heap dupe
@@ -709,7 +825,13 @@ def lazy_fields = (op == :OR_RESCUE ? [:right] : [])
     # pipeline lowering switches to fiber-spawn-with-accumulator codegen.
     attr_accessor :observable_dest
   end
-  UnaryOp = Struct.new(:token, :op, :right) { include Locatable }
+  UnaryOp = Struct.new(:token, :op, :right) do
+    include Locatable
+    # Derived: NOT -> Bool; otherwise the operand's type.
+    def full_type
+      @type_object ||= op == :NOT ? Type.new(:Bool) : Type.new(right&.full_type&.resolved || :Any)
+    end
+  end
   # Parser-only placeholder for call-site override syntax; the annotator
   # rejects it until runtime semantics are implemented.
   CallSiteOverride = Struct.new(:token, :kind, :n, :inner) { include Locatable }
@@ -723,7 +845,14 @@ def lazy_fields = (op == :OR_RESCUE ? [:right] : [])
     def wildcard?; false end
     def name; self[:name].to_s end
   end
-  Literal = Struct.new(:token, :type, :value, :storage) { include Locatable }
+  Literal = Struct.new(:token, :type, :value, :storage) do
+    include Locatable
+    # Derived: a literal's value-type is a pure function of its token
+    # kind. Never nil, never stamped.
+    def full_type
+      @type_object ||= Type.new(LITERAL_VALUE_TYPE.fetch(self[:type], :Any))
+    end
+  end
   ListLit = Struct.new(:token, :items, :storage) {
     extend T::Sig
     include Locatable
@@ -752,10 +881,22 @@ def coerce!(declared_type)
     # misspelled field-name for a fixable edit span.
     attr_accessor :field_tokens
   end
-  LambdaLit = Struct.new(:token, :params, :captures, :body, :storage, :deferred_drops) { include Locatable }
+  LambdaLit = Struct.new(:token, :params, :captures, :body, :storage, :deferred_drops) do
+    include Locatable
+    # Same params seam as FunctionDef: always Array.
+    def initialize(*)
+      super
+      self[:params] = (self[:params] || []).map { |p| Param.coerce(p) }
+    end
+
+    def params=(val)
+      self[:params] = (val || []).map { |p| Param.coerce(p) }
+    end
+  end
   IfStatement = Struct.new(:token, :condition, :then_branch, :else_branch, :then_drops, :else_drops) do
     extend T::Sig
     include Locatable
+    include StatementVoidType
     include HasBodies
     sig { returns(T::Array[T.untyped]) }
     def child_bodies = [then_branch, else_branch].compact
@@ -773,6 +914,7 @@ def child_bodies = [then_branch, else_branch].compact
   WhileLoop = Struct.new(:token, :condition, :do_branch, :deferred_drops) do
     extend T::Sig
     include Locatable
+    include StatementVoidType
     include HasBodies
     sig { returns(T::Array[T.untyped]) }
     def child_bodies = [do_branch].compact
@@ -782,14 +924,21 @@ def child_bodies = [do_branch].compact
   WhileBindLoop = Struct.new(:token, :condition, :binding_name, :binding_token, :do_branch, :deferred_drops) do
     extend T::Sig
     include Locatable
+    include StatementVoidType
     include HasBodies
     sig { returns(T::Array[T.untyped]) }
     def child_bodies = [do_branch].compact
     attr_accessor :mark_per_iter
     attr_accessor :tight
   end
-  BreakNode = Struct.new(:token) { include Locatable }
-  ContinueNode = Struct.new(:token) { include Locatable }
+  BreakNode = Struct.new(:token) do
+    include Locatable
+    include StatementVoidType
+  end
+  ContinueNode = Struct.new(:token) do
+    include Locatable
+    include StatementVoidType
+  end
   FuncCall = Struct.new(:token, :name, :args) do
     extend T::Sig
     include Locatable
@@ -1044,12 +1193,22 @@ def name; target.respond_to?(:name) ? target.name : nil end
   # ExternFnDecl: EXTERN FN name(params) RETURNS type [EFFECTS :alloc] FROM "module"
   # Or method: EXTERN FN TypeName.method(params) RETURNS type FROM "module"
   # Declares a native Zig/C function importable via @import("module").
-  ExternFnDecl = Struct.new(:token, :name, :params, :return_type, :from_module, :effects) {
+  ExternFnDecl = Struct.new(:token, :name, :params, :return_type, :from_module, :effects) do
     include Locatable
     attr_accessor :owner_type # "TypeName" for method declarations (nil for free functions)
     attr_accessor :owner_type_params # [:T, :U] for TypeName.method
     attr_accessor :fn_type_params # [:T] for fnName(...)
-  }
+
+    # Same params seam as FunctionDef/LambdaLit: always Array.
+    def initialize(*)
+      super
+      self[:params] = (self[:params] || []).map { |p| Param.coerce(p) }
+    end
+
+    def params=(val)
+      self[:params] = (val || []).map { |p| Param.coerce(p) }
+    end
+  end
   # ExternStructDecl: EXTERN STRUCT Name { fields } [CLOSE "method"] FROM "module"
   # Declares a native Zig/C struct type for CLEAR type-checking purposes.
   # CLOSE registers the type as a resource with auto-defer cleanup (RAII).
@@ -1192,6 +1351,7 @@ def child_bodies
   ForRange = Struct.new(:token, :var_name, :start_expr, :end_expr, :inclusive, :body, :deferred_drops, :mark_per_iter) do
     extend T::Sig
     include Locatable
+    include StatementVoidType
     include HasBodies
     sig { returns(T::Array[T::Array[T.untyped]]) }
     def child_bodies = [body].compact
@@ -1202,6 +1362,7 @@ def child_bodies = [body].compact
   ForEach = Struct.new(:token, :var_name, :collection, :body, :deferred_drops, :is_mutable) do
     extend T::Sig
     include Locatable
+    include StatementVoidType
     include HasBodies
     sig { returns(T::Array[T::Array[T.untyped]]) }
     def child_bodies = [body].compact
diff --git a/src/ast/diagnostic_registry.rb b/src/ast/diagnostic_registry.rb
index 151386282..9f21a13c4 100644
--- a/src/ast/diagnostic_registry.rb
+++ b/src/ast/diagnostic_registry.rb
@@ -2456,7 +2456,7 @@ module DiagnosticRegistry
   INTRINSIC_REJECTED: {
     severity: :error, category: :type,
     template: "%{message}",
-    summary: "Stdlib intrinsic rejected this call (matched_def[:reject_when] fired).",
+    summary: "Stdlib intrinsic rejected this call (the matched signature's reject_when fired).",
     cause: "A stdlib intrinsic (`.negative?`, `.zero?`, ...) rejected this call because the argument type isn't allowed. The stdlib uses `reject_when` patterns to rule out call shapes that look valid but produce wrong results — e.g. `.negative?` on an unsigned int.",
     fix_hint: "Check the message for the specific reject reason. Often the fix is to remove the call entirely (the answer is statically known) or use a different intrinsic.",
   },
diff --git a/src/ast/parser.rb b/src/ast/parser.rb
index cf494eca5..065ebea87 100644
--- a/src/ast/parser.rb
+++ b/src/ast/parser.rb
@@ -872,6 +872,9 @@ def parse_argument_list()
         end
       end

+      # Plain Hash: this comma-seq block is shared by FN-param and
+      # USE-capture parsing. Params are coerced to AST::Param at the
+      # FunctionDef/LambdaLit seam; captures stay Hashes.
       { name: p_name, type: p_type, default: default_val, mutable: is_mutable,
         takes: takes, comptime: is_comptime, name_token: name_tok }
     end
     .last # always ignore the first token
diff --git a/src/ast/scope.rb b/src/ast/scope.rb
index ccd6b9681..012e16483 100644
--- a/src/ast/scope.rb
+++ b/src/ast/scope.rb
@@ -54,7 +54,7 @@ def declare(name, reg, type, is_mutable = true, is_rebindable = false, size = ni
   #
   # The cost: storage / sync / type changes that happen AFTER the body
   # has been visited (notably `EscapeAnalysis.propagate_caller_sync!`,
-  # which mutates `param[:symbol]`) do NOT propagate to the deep-copied
+  # which mutates `param.symbol`) do NOT propagate to the deep-copied
   # entries inside nested scopes. A pass that reads `node.symbol.storage`
   # off an Identifier inside a nested scope sees the pre-propagation
   # value.
@@ -62,7 +62,7 @@ def declare(name, reg, type, is_mutable = true, is_rebindable = false, size = ni # The rule for any post-annotation pass that needs a param's CURRENT # storage / sync: # - # * mutate `param[:symbol]` (the function-level entry) + # * mutate `param.symbol` (the function-level entry) # * read against `Scope.live_param_syms(fn)` to refresh stale # references # @@ -101,7 +101,7 @@ def initialize_copy(original) # Build a {param_name => live SymbolEntry} map from a FunctionDef. # - # The "live" entry is the one stored on `param[:symbol]` -- the entry + # The "live" entry is the one stored on `param.symbol` -- the entry # that lives at the function scope and that `propagate_caller_sync!` # mutates in place. Any pass that has a `capture_symbols` (or similar) # cache of SymbolEntry references collected during annotation should @@ -114,7 +114,7 @@ def initialize_copy(original) def self.live_param_syms(fn) return {} unless fn.respond_to?(:params) (fn.params || []).each_with_object({}) do |p, h| - h[p[:name].to_s] = p[:symbol] if p[:symbol] + h[p.name.to_s] = p.symbol if p.symbol end end diff --git a/src/ast/std_lib.rb b/src/ast/std_lib.rb index ae7bffde3..e88b71a08 100644 --- a/src/ast/std_lib.rb +++ b/src/ast/std_lib.rb @@ -209,21 +209,7 @@ "toList" => [ { args: [:"Any[]"], - return: lambda { |args, _node| - recv_t = Type.new(args[0]) - elem_t = if recv_t.dynamic_stream? || recv_t.promise_list? - recv_t.tense_type.element_type - elsif recv_t.bounded_stream? - recv_t.stream_element_type - elsif recv_t.inf_stream? - recv_t.inf_stream_element_type - elsif recv_t.open_stream? 
- recv_t.open_stream_element_type - else - recv_t.element_type - end - Type.new(:"#{elem_t.resolved}[]", collection: :list, location: :heap) - }, + return: :infer_to_list, zig: "try ({0}).toList({rt}.heapAlloc())", bc: true, bc_op: :to_list, allocates: true, @@ -1007,10 +993,12 @@ # ============================================================================ # Method Registry — type-specific method definitions for Pool and HashMap # ============================================================================ -# Each entry: { arity: N, validate: lambda, return_type: lambda, tag: symbol } +# Each entry: { arity: N, validate: lambda, return_type: directive, tag: symbol } # arity: expected arg count (-1 = any) # validate: lambda(node, args, obj_type, error_fn) — type-check args -# return_type: lambda(obj_type) — compute return type from receiver type +# return_type: declarative return directive (a type Symbol/Hash, an +# r_* receiver-parametric variant, or an infer_* host +# method) -> FunctionReturn via IntrinsicRegistry # tag: symbol to set on the node (pool_method / map_method) POOL_METHODS = T.let({ @@ -1028,14 +1016,14 @@ error_fn.call(node, "Pool.insert: argument type #{arg_type} does not match pool element type #{elem.resolved}") end }, - return_type: ->(obj_type) { Type.new(:"Id<#{obj_type.element_type.resolved}>") }, + return_type: :r_id_element, is_method: true, }, "get" => { arity: 1, tag: :pool_method, bc: true, zig: "{0}.get({1})", - return_type: ->(obj_type) { Type.new(:"?#{obj_type.element_type.resolved}") }, + return_type: :r_optional_element, borrows: :all, # returns borrowed pointer into pool storage, is_method: true, }, @@ -1044,7 +1032,7 @@ bc: true, zig: "{0}.remove({1})", mutates_receiver: true, - return_type: ->(_) { :Void }, + return_type: :Void, borrows: :all, # pool frees the slot internally, is_method: true, }, @@ -1052,7 +1040,7 @@ arity: 0, tag: :pool_method, bc: true, zig: "{0}.length()", - return_type: ->(_) { Type.new(:Int64) }, + return_type: 
:Int64, borrows: :all, is_method: true, }, @@ -1060,7 +1048,7 @@ arity: 1, tag: :pool_method, bc: true, zig: "({0}.get({1}) != null)", - return_type: ->(_) { :Bool }, + return_type: :Bool, borrows: :all, is_method: true, }, @@ -1068,7 +1056,7 @@ arity: 0, tag: :pool_method, bc: true, zig: "({0}.length() == 0)", - return_type: ->(_) { :Bool }, + return_type: :Bool, borrows: :all, is_method: true, }, @@ -1076,7 +1064,7 @@ arity: 0, tag: :pool_method, bc: true, zig: "({0}.length() > 0)", - return_type: ->(_) { :Bool }, + return_type: :Bool, borrows: :all, is_method: true, }, @@ -1098,14 +1086,14 @@ error_fn.call(node, "Set.insert: argument type #{arg_type} does not match set element type #{elem.resolved}") end }, - return_type: ->(_) { :Void }, + return_type: :Void, is_method: true, }, "contains?" => { arity: 1, tag: :set_method, zig: "{0}.contains({1})", bc: true, - return_type: ->(_) { :Bool }, + return_type: :Bool, borrows: :all, is_method: true, }, @@ -1115,7 +1103,7 @@ bc: true, alloc: :heap, mutates_receiver: true, - return_type: ->(_) { :Void }, + return_type: :Void, borrows: :all, # set frees the element internally, is_method: true, }, @@ -1123,7 +1111,7 @@ arity: 0, tag: :set_method, zig: "{0}.length()", bc: true, - return_type: ->(_) { Type.new(:Int64) }, + return_type: :Int64, borrows: :all, is_method: true, }, @@ -1131,7 +1119,7 @@ arity: 0, tag: :set_method, zig: "({0}.length() == 0)", bc: true, - return_type: ->(_) { :Bool }, + return_type: :Bool, borrows: :all, is_method: true, }, @@ -1139,7 +1127,7 @@ arity: 0, tag: :set_method, zig: "({0}.length() > 0)", bc: true, - return_type: ->(_) { :Bool }, + return_type: :Bool, borrows: :all, is_method: true, }, @@ -1163,7 +1151,7 @@ error_fn.call(node, "HashMap.put: key must be a String, got #{args[0].resolved_type}") unless key_type.string? 
end }, - return_type: ->(_) { :Void }, + return_type: :Void, is_method: true, }, "delete" => { @@ -1181,7 +1169,7 @@ error_fn.call(node, "HashMap.delete: key must be a String, got #{args[0].resolved_type}") unless arg_type.string? end }, - return_type: ->(_) { :Void }, + return_type: :Void, borrows: :all, # map frees key+value internally, is_method: true, }, @@ -1198,7 +1186,7 @@ error_fn.call(node, "HashMap.contains?: key must be a String, got #{args[0].resolved_type}") unless arg_type.string? end }, - return_type: ->(_) { :Bool }, + return_type: :Bool, borrows: :all, is_method: true, }, @@ -1207,7 +1195,7 @@ zig: "{0}.count()", bc: true, numeric_zig: "CheatLib.numericMapCount({key_zig}, {val_zig}, {0})", - return_type: ->(_) { Type.new(:Int64) }, + return_type: :Int64, borrows: :all, is_method: true, }, @@ -1216,7 +1204,7 @@ zig: "{0}.count()", bc: true, numeric_zig: "CheatLib.numericMapCount({key_zig}, {val_zig}, {0})", - return_type: ->(_) { Type.new(:Int64) }, + return_type: :Int64, borrows: :all, is_method: true, }, @@ -1237,7 +1225,7 @@ # type-mismatch in CheatLib.cleanup at the binding's defer site. # For string-keyed HashMap, key_type defaults to String; # numeric maps return e.g. `Int64[]@list`. 
- return_type: ->(obj_type) { :"#{obj_type.key_type.resolved}[]@list" }, + return_type: :r_key_list, borrows: :all, # borrows map; returns new owned list, is_method: true, }, @@ -1249,7 +1237,7 @@ zig: "({0}.count() == 0)", bc: true, numeric_zig: "(CheatLib.numericMapCount({key_zig}, {val_zig}, {0}) == 0)", - return_type: ->(_) { :Bool }, + return_type: :Bool, borrows: :all, is_method: true, }, @@ -1258,7 +1246,7 @@ zig: "({0}.count() > 0)", bc: true, numeric_zig: "(CheatLib.numericMapCount({key_zig}, {val_zig}, {0}) > 0)", - return_type: ->(_) { :Bool }, + return_type: :Bool, borrows: :all, is_method: true, }, @@ -1272,7 +1260,7 @@ numeric_zig: "try CheatLib.numericMapValues({key_zig}, {val_zig}, {alloc}, {0})", # See the matching note on `keys`: this allocates an owned list, # so the declared type must be `T[]@list`, not the bare slice. - return_type: ->(obj_type) { :"#{obj_type.value_type.resolved}[]@list" }, + return_type: :r_value_list, borrows: :all, # borrows map; returns new owned list, is_method: true, }, @@ -1284,7 +1272,7 @@ # Keyed by container kind (:string_map, :numeric_map, :array, :pool, :set_collection). # Each entry has :get and/or :set with: # zig: Zig pattern string ({target}, {index}, {value}, {alloc}, {key_alloc}, etc.) 
-# return_type: lambda(container_type) -> return type for get +# return_type: declarative return directive (r_* variant / type) for get # container_borrow: true if get returns a borrowed view (no cleanup) # takes_value: true if set takes ownership of the value # allocates: true if set requires an allocator @@ -1304,7 +1292,7 @@ get: { zig: "{target}.get({index})", shard_direct_zig: "{target}.getDirect({shard_idx}, {shard_key})", - return_type: ->(ct) { :"?#{ct.value_type.resolved}" }, + return_type: :r_optional_value, container_borrow: true, bc: true, bc_op: :map_get, }, @@ -1326,7 +1314,7 @@ zig: "CheatLib.numericMapGet({key_zig}, {val_zig}, {target}, {index})", sharded_zig: "{target}.get({index})", shard_direct_zig: "{target}.getDirect({shard_idx}, {shard_key})", - return_type: ->(ct) { :"?#{ct.value_type.resolved}" }, + return_type: :r_optional_value, container_borrow: true, bc: true, bc_op: :map_get, }, @@ -1346,7 +1334,7 @@ array: { get: { zig: "CheatLib.getAt({target}, {index})", - return_type: ->(ct) { ct.element_type }, + return_type: :r_element_of, container_borrow: true, }, set: { @@ -1358,7 +1346,7 @@ list: { get: { zig: "CheatLib.getAt({target}, {index})", - return_type: ->(ct) { ct.element_type }, + return_type: :r_element_of, container_borrow: true, }, set: { @@ -1370,7 +1358,7 @@ pool: { get: { zig: "{target}.get({index})", - return_type: ->(ct) { :"?#{ct.element_type.resolved}" }, + return_type: :r_optional_element, container_borrow: false, }, set: { @@ -1384,7 +1372,7 @@ set_collection: { get: { zig: "if ({target}.contains({index})) {index} else null", - return_type: ->(ct) { Type.new(:"?#{ct.element_type.resolved}") }, + return_type: :r_optional_element, container_borrow: true, }, set: { @@ -1399,7 +1387,7 @@ get: { # O(1) byte access on String@raw. No allocation. builtin: :charAt, - return_type: ->(_t) { Type.new(:String, sync: :raw) }, + return_type: {type: :String, sync: :raw}, container_borrow: true, }, # No :set — strings are immutable. 
@@ -1408,7 +1396,7 @@ get: { # Byte indexing on String@symbol — same as @raw, returns @symbol slice. builtin: :charAt, - return_type: ->(_t) { Type.new(:String, sync: :symbol) }, + return_type: {type: :String, sync: :symbol}, container_borrow: true, }, # No :set — symbols are immutable. diff --git a/src/ast/type.rb b/src/ast/type.rb index e90500d1f..72081f55f 100644 --- a/src/ast/type.rb +++ b/src/ast/type.rb @@ -1,8 +1,6 @@ # typed: strict require "sorbet-runtime" -require_relative "../annotator-helpers/function_signature" - # Result struct for binary operation type resolution BinaryOpResult = Struct.new(:type, :left_coercion, :right_coercion, :storage, :error, keyword_init: true) @@ -1742,23 +1740,21 @@ def schema_union_any?(schema, &blk) # Structural match for function/lambda types. Called by accepts? when self.fn_type?. sig { params(other_type: Type).returns(T::Boolean) } def accepts_fn_type?(other_type) - return true if other_type.is_a?(Type) && other_type.any? - return false unless other_type.is_a?(Type) && other_type.fn_type? + return true if other_type.any? + return false unless other_type.fn_type? other_raw = other_type.raw self_params = @raw.params || [] other_params = other_raw.params || [] return false unless self_params.length == other_params.length - self_ret = @raw.return_type - other_ret = other_raw.return_type - self_ret_t = self_ret.is_a?(Type) ? self_ret : Type.new(self_ret || :Any) - other_ret_t = other_ret.is_a?(Type) ? other_ret : Type.new(other_ret || :Any) - return false unless self_ret_t.accepts?(other_ret_t) + # @raw / other_raw are FunctionSignature (fn_type? gate); their + # return_type is a non-nil Type by the FunctionSignature seam. + return false unless @raw.return_type.accepts?(other_raw.return_type) self_params.zip(other_params).each do |sp, op| - sp_t = sp[:type].is_a?(Type) ? sp[:type] : Type.new(sp[:type] || :Any) - op_t = op[:type].is_a?(Type) ? 
op[:type] : Type.new(op[:type] || :Any) + sp_t = sp.type || Type.new(:Any) + op_t = op.type || Type.new(:Any) return false unless sp_t.accepts?(op_t) end @@ -2098,11 +2094,10 @@ def compute_zig_type(is_param: false, is_field: false) # 2c. Function type: FN(T, ...) -> R => *const fn(*Runtime, T, ...) anyerror!R if fn_type? param_types_zig = @raw.params.map do |p| - t = p[:type] + t = p.type t.is_a?(Type) ? t.zig_type(is_param: true) : Type.new(t).zig_type(is_param: true) end - ret = @raw.return_type - ret_zig = ret.is_a?(Type) ? ret.zig_type : Type.new(ret).zig_type + ret_zig = @raw.return_type.zig_type all_params = ["*Runtime"] + param_types_zig ret_str = ret_zig.start_with?("!") ? ret_zig : "anyerror!#{ret_zig}" return "*const fn(#{all_params.join(', ')}) #{ret_str}" @@ -2332,3 +2327,11 @@ def check_prefixed_int_range!(node, effective_type) end end + +# Loaded after `class Type` is fully defined so the +# function_signature -> function_return -> type require cycle resolves +# with `Type` already present (function_return's `const :fixed, +# T.nilable(Type)` evaluates at class-body time). All Type refs to +# FunctionSignature are runtime-lazy (method bodies), so deferring +# this require is safe. +require_relative "../annotator-helpers/function_signature" diff --git a/src/backends/compiler_frontend.rb b/src/backends/compiler_frontend.rb index 732714008..dba362c9b 100644 --- a/src/backends/compiler_frontend.rb +++ b/src/backends/compiler_frontend.rb @@ -16,6 +16,7 @@ require_relative "pipeline_rewriter" require_relative "string_concat_rewriter" require_relative "../mir/control_flow" +require_relative "../mir/pre_mir_type_check" class CompilerFrontend extend T::Sig @@ -62,6 +63,11 @@ def self.compile(cheat_code, importer:, source_dir:, strict_test: false) # generation -- mir_lowering still walks the TestBlock directly. synthesize_test_body_wrappers!(T.must(ast), fn_nodes) + # AST→MIR boundary invariant: every evaluatable node must carry a + # resolved type by now. 
A nil full_type here is a compiler bug + # (annotator failed to stamp it), surfaced before MIR consumes it. + PreMirTypeCheck.verify!(T.must(ast)) + mir_pass = MIRPass.new(fn_nodes: fn_nodes, schema_lookup: schema_lookup) mir_pass.transform!(T.must(ast)) diff --git a/src/backends/importer.rb b/src/backends/importer.rb index ddfb1bb27..bc8d3f960 100644 --- a/src/backends/importer.rb +++ b/src/backends/importer.rb @@ -140,7 +140,7 @@ def reject_auto_in_public_signatures!(ast, abs_path) offending = [] stmt.params.each do |p| - offending << "param '#{p[:name]}'" if auto_type?(p[:type]) + offending << "param '#{p.name}'" if auto_type?(p.type) end offending << "return type" if auto_type?(stmt.return_type) next if offending.empty? diff --git a/src/backends/pipeline_host.rb b/src/backends/pipeline_host.rb index d3557a394..0c6b879cc 100644 --- a/src/backends/pipeline_host.rb +++ b/src/backends/pipeline_host.rb @@ -785,7 +785,7 @@ def build_soa_scalar_fold_block(site, fold_node, label, source_mir, expr_mir, fi false, nil, nil) end - result_type = smooth_node.respond_to?(:full_type) && smooth_node.full_type ? transpile_type(smooth_node.full_type.to_s) : "f64" + result_type = transpile_type(smooth_node.full_type.to_s) init_stmts = [] loop_body = [] final_expr = nil @@ -1106,7 +1106,7 @@ def lower_limit(site, limit_node) # cases @channel_slots) and accumulates into a list. Producer fibers # whose body terminates early push Nil; the for-loop's nil-guard # ends the drain. - if bc_target? && list_node.full_type&.inf_stream? + if bc_target? && list_node.full_type.inf_stream? 
label = next_pipe_label source_mir = visit_mir(list_node) @current_pipe_label = label @@ -1337,7 +1337,7 @@ def lower_reduce(site, reduce_node) def lower_window(site, window_node) list_node = site.list smooth_node = site.options - expr_type_str = (window_node.expression.full_type || window_node.expression.resolved_type).to_s + expr_type_str = window_node.expression.full_type.to_s res_zig = transpile_type(expr_type_str) alloc = pipeline_alloc(smooth_node) size_mir = visit_mir(window_node.size) @@ -1409,7 +1409,7 @@ def batch_window_timeout_ns(bw_node) def lower_batch_window(site, bw_node) list_node = site.list smooth_node = site.options - expr_type_str = (bw_node.expression.full_type || bw_node.expression.resolved_type).to_s + expr_type_str = bw_node.expression.full_type.to_s res_zig = transpile_type(expr_type_str) lhs_type = list_node.type_info @@ -1446,7 +1446,7 @@ def lower_batch_window(site, bw_node) # genuinely-infinite producers stay blocked at the next YIELD until # exec! shutdown closes the channel. if bc_target? && list_node.is_a?(AST::Identifier) && - list_node.type_info&.inf_stream? + list_node.type_info.inf_stream? label = next_pipe_label drain_label = next_pipe_label source_mir = visit_mir(list_node) @@ -1684,7 +1684,7 @@ def lower_index(site, expr_node) lhs_ti.open_stream_element_type.resolved elsif lhs_ti&.dynamic_stream? || lhs_ti&.bounded_stream? lhs_ti.tense_type.element_type.resolved - elsif range_chain[:source].type_info&.inf_stream? + elsif range_chain[:source].type_info.inf_stream? # list_node is a SMOOTH chain (e.g. counter |> LIMIT 9); lhs_ti is the # materialized list type so tense_type is unavailable. Pull element type # from the inf stream source directly. @@ -2095,6 +2095,9 @@ def lower_sharded_each(site, each_op) # structural MIR. There is no user-facing CONCURRENT wrapper here; # sharded EACH has always implied one worker per shard. 
conc = AST::ConcurrentOp.new(each_op.token, each_op, {}) + # Synthesized post-annotation: inherit the wrapped EachOp's type + # so the AST->MIR type-resolution invariant holds. + conc.full_type = each_op.full_type if each_op.full_type cb = build_bounded_concurrent_callback_pointer(conc, item_t) source_mir = visit_mir(list_node) @@ -2129,9 +2132,9 @@ def lower_sharded_each(site, each_op) sig { params(node: T.untyped).returns(T::Boolean) } def finite_stream_source_node?(node) - node.is_a?(AST::RangeLit) || node.type_info&.dynamic_stream? || - node.type_info&.open_stream? || - node.type_info&.bounded_stream? || node.type_info&.inf_stream? + node.is_a?(AST::RangeLit) || node.type_info.dynamic_stream? || + node.type_info.open_stream? || + node.type_info.bounded_stream? || node.type_info.inf_stream? end # Walk a BinaryOp(SMOOTH) left-spine looking for a finite stream source @@ -3220,8 +3223,8 @@ def lower_range_fold(range_lit, stages, fold_op, smooth_node) ]) end if bc_target? && range_lit.is_a?(AST::Identifier) && - (range_lit.type_info&.dynamic_stream? || range_lit.type_info&.bounded_stream? || - range_lit.type_info&.inf_stream?) + (range_lit.type_info.dynamic_stream? || range_lit.type_info.bounded_stream? || + range_lit.type_info.inf_stream?) return MIR::BlockExpr.new(label, [ *p[:outer_stmts], *acc_init_stmts, MIR::ForStmt.new(visit_mir(range_lit), capture_name, @@ -3278,8 +3281,8 @@ def lower_range_reduce(range_lit, stages, reduce_op, smooth_node = nil) ]) end if bc_target? && range_lit.is_a?(AST::Identifier) && - (range_lit.type_info&.dynamic_stream? || range_lit.type_info&.bounded_stream? || - range_lit.type_info&.inf_stream?) + (range_lit.type_info.dynamic_stream? || range_lit.type_info.bounded_stream? || + range_lit.type_info.inf_stream?) 
return MIR::BlockExpr.new(label, [ *p[:outer_stmts], MIR::Let.new("acc", init_mir, true, acc_zig, nil), @@ -3328,7 +3331,7 @@ def lower_concurrent(site, conc_op) return lower_concurrent_bc(smooth_node.left, conc_op, smooth_node) unless stream_lhs end - if !smooth_node.left.is_a?(AST::RangeLit) && smooth_node.left.type_info&.bounded_stream? + if !smooth_node.left.is_a?(AST::RangeLit) && smooth_node.left.type_info.bounded_stream? return lower_concurrent_bounded_stream(smooth_node.left, conc_op) end @@ -3554,7 +3557,7 @@ def lower_shard_concurrent_each(lhs, conc_op, smooth_node) def lower_shard_concurrent_each_zig(id, range_node, conc_op, each_op, ctx, map_node, map_var_name, idx_var, key_var, sh_var, map_ptr, start_mir, end_mir) - shard_count = ctx[:shard_count] || map_node.type_info&.shard_count + shard_count = ctx[:shard_count] || map_node.type_info.shard_count raise "SHARD target missing shard_count" unless shard_count map_t = map_node.type_info @@ -4288,7 +4291,7 @@ def list_concurrent_source_setup_iz(lhs) list_zig = visit(lhs) src_needs_cleanup = lhs.is_a?(AST::MethodCall) && %w[values keys].include?(lhs.name.to_s) && - lhs.object.type_info&.sharded? + lhs.object.type_info.sharded? cleanup_line = src_needs_cleanup ? "defer pipe_src_list.deinit(rt.heapAlloc());\n" : "" src_decl = src_needs_cleanup ? "var pipe_src_list" : "const pipe_src_list" items_block = build_pipe_items_block(lhs_type, "rt.heapAlloc()") @@ -4300,8 +4303,11 @@ def list_concurrent_source_setup_iz(lhs) sig { params(lhs: T.untyped).returns(Type) } def concurrent_list_item_type(lhs) if lhs.is_a?(AST::RangeLit) - elem = lhs.type_info&.tense_type&.element_type&.resolved || - lhs.start.full_type || :Int64 + # tense_type is legitimately nil for a non-tense source; the + # range start is an evaluatable node, so its full_type is the + # invariant-guaranteed fallback (no dead :Int64 guard). 
+ elem = lhs.type_info.tense_type&.element_type&.resolved || + lhs.start.full_type return Type.new(elem) end Type.new(lhs.type_info.element_type.resolved) diff --git a/src/backends/pipeline_rewriter.rb b/src/backends/pipeline_rewriter.rb index ddb8fba5a..7afdbd76c 100644 --- a/src/backends/pipeline_rewriter.rb +++ b/src/backends/pipeline_rewriter.rb @@ -141,11 +141,11 @@ def rewrite_pipeline(node) TERMINAL_FOLDS.any? { |t| terminal.is_a?(t) } # Infinite streams (~T[INF]) are included only when a LimitOp stage is present: # they require LIMIT to be finite. Other stream types bypass unconditionally. - inf_with_limit = real_source.type_info&.inf_stream? && + inf_with_limit = real_source.type_info.inf_stream? && stages.any? { |s| s.is_a?(AST::LimitOp) } - if (real_source.is_a?(AST::RangeLit) || real_source.type_info&.dynamic_stream? || - real_source.type_info&.open_stream? || - real_source.type_info&.bounded_stream? || inf_with_limit) && is_range_fold_terminal && + if (real_source.is_a?(AST::RangeLit) || real_source.type_info.dynamic_stream? || + real_source.type_info.open_stream? || + real_source.type_info.bounded_stream? || inf_with_limit) && is_range_fold_terminal && stages.all? { |s| FUSIBLE_STAGES.any? { |t| s.is_a?(t) } } patch_chain_source!(node, real_source) unless real_source.equal?(chain[:source]) return node @@ -166,8 +166,8 @@ def rewrite_pipeline(node) # handles it as a lazy while loop (lower_stream_index via unwrap_range_chain). # inf_with_limit reuses the variable already computed above. is_stream_index = terminal.is_a?(AST::IndexOp) && - (real_source.type_info&.dynamic_stream? || real_source.type_info&.open_stream? || - real_source.type_info&.bounded_stream? || inf_with_limit) && + (real_source.type_info.dynamic_stream? || real_source.type_info.open_stream? || + real_source.type_info.bounded_stream? || inf_with_limit) && stages.all? { |s| FUSIBLE_STAGES.any? 
{ |t| s.is_a?(t) } } if is_stream_index patch_chain_source!(node, real_source) unless real_source.equal?(chain[:source]) @@ -234,9 +234,10 @@ def rewrite_pipeline(node) call = AST::FuncCall.new(rhs.token, rhs.name, [lhs_node]) call.full_type = node.full_type call.storage = node.storage - config = STD_LIB[rhs.name] + config = IntrinsicRegistry.sig(STD_LIB, rhs.name) if config - call.zig_pattern = config.is_a?(Array) ? config.first[:zig] : config[:zig] + sig0 = config.is_a?(Array) ? config.first : config + call.zig_pattern = sig0.emit&.zig end return call end @@ -313,6 +314,8 @@ def fuse_pipeline(smooth_node, source, stages, terminal) body = [] # 1. Initialize result container or accumulator(s) + # Inner Literal/BinaryOp/UnaryOp self-derive their type from + # structure (AST::*#full_type); statements derive Void. No stamping. init_nodes = build_init(terminal, res_var, token, smooth_node) body.concat(init_nodes) @@ -334,22 +337,24 @@ def fuse_pipeline(smooth_node, source, stages, terminal) stage_inits = [] res_type = smooth_node.full_type + # Loop-body nodes self-derive their type from structure + # (Literal/BinaryOp/UnaryOp via AST::*#full_type; statements Void). loop_body = build_recursive_body(stages, terminal, current_it, res_var, token, stage_inits, res_type) body.concat(stage_inits) # 3. Create ForEach loop is_each = terminal.is_a?(AST::EachOp) foreach = AST::ForEach.new(token, it_var, source.dup, loop_body, nil, is_each) - foreach.full_type = :Void + foreach.full_type = Type.new(:Void) foreach.instance_variable_set(:@var_used, true) body << foreach # 4. 
Post-loop guards if terminal.is_a?(AST::MinOp) || terminal.is_a?(AST::MaxOp) found_ident = AST::Identifier.new(token, "#{res_var}_found") - found_ident.full_type = :Bool + found_ident.full_type = Type.new(:Bool) guard = AST::Assert.new(token, found_ident, "MIN/MAX applied to empty list") - guard.full_type = :Void + guard.full_type = Type.new(:Void) body << guard end @@ -359,7 +364,7 @@ def fuse_pipeline(smooth_node, source, stages, terminal) # Return the ForEach directly (or wrap in a sequence if there are init nodes). return foreach if body.length == 1 wrapper = AST::BlockExpr.new(token, body, nil) - wrapper.full_type = :Void + wrapper.full_type = Type.new(:Void) return wrapper end @@ -368,9 +373,9 @@ def fuse_pipeline(smooth_node, source, stages, terminal) if terminal.is_a?(AST::AverageOp) avg_var = "#{res_var}_avg" zero = AST::Literal.new(token, :NUMBER, 0.0) - zero.full_type = :Float64 + zero.full_type = Type.new(:Float64) avg_decl = AST::VarDecl.new(token, avg_var, nil, zero.dup, true) - avg_decl.full_type = :Float64 + avg_decl.full_type = Type.new(:Float64) avg_decl.storage = :stack avg_decl.slot_size = 1 avg_decl.instance_variable_set(:@var_used, true) @@ -379,20 +384,20 @@ def fuse_pipeline(smooth_node, source, stages, terminal) sum_id = AST::Identifier.new(token, "#{res_var}_sum") cnt_id = AST::Identifier.new(token, "#{res_var}_cnt") - sum_id.full_type = :Float64 - cnt_id.full_type = :Float64 + sum_id.full_type = Type.new(:Float64) + cnt_id.full_type = Type.new(:Float64) cond = AST::BinaryOp.new(token, cnt_id.dup, :GT, zero.dup) - cond.full_type = :Bool + cond.full_type = Type.new(:Bool) div = AST::BinaryOp.new(token, sum_id, :DIV, cnt_id) - div.full_type = :Float64 + div.full_type = Type.new(:Float64) avg_assign = AST::Assignment.new(token, avg_var, div) - avg_assign.full_type = :Float64 + avg_assign.full_type = Type.new(:Float64) guard = AST::IfStatement.new(token, cond, [avg_assign], nil) - guard.full_type = :Void + guard.full_type = Type.new(:Void) 
body << guard result = AST::Identifier.new(token, avg_var) - result.full_type = :Float64 + result.full_type = Type.new(:Float64) else result = build_final_result(terminal, res_var, token, smooth_node) end @@ -420,14 +425,14 @@ def build_init(terminal, res_var, token, smooth_node) when AST::AverageOp # Two accumulators: sum and count sum_decl = AST::VarDecl.new(token, "#{res_var}_sum", nil, AST::Literal.new(token, :NUMBER, 0.0), true) - sum_decl.full_type = :Float64 + sum_decl.full_type = Type.new(:Float64) sum_decl.storage = :stack sum_decl.slot_size = 1 sum_decl.instance_variable_set(:@var_used, true) sum_decl.var_mutated = true cnt_decl = AST::VarDecl.new(token, "#{res_var}_cnt", nil, AST::Literal.new(token, :NUMBER, 0.0), true) - cnt_decl.full_type = :Float64 + cnt_decl.full_type = Type.new(:Float64) cnt_decl.storage = :stack cnt_decl.slot_size = 1 cnt_decl.instance_variable_set(:@var_used, true) @@ -437,9 +442,9 @@ def build_init(terminal, res_var, token, smooth_node) when AST::AnyOp, AST::AllOp init_val = terminal.is_a?(AST::AllOp) val = AST::Literal.new(token, :BOOLEAN, init_val) - val.full_type = :Bool + val.full_type = Type.new(:Bool) decl = AST::VarDecl.new(token, res_var, nil, val, true) - decl.full_type = :Bool + decl.full_type = Type.new(:Bool) decl.storage = :stack decl.slot_size = 1 decl.instance_variable_set(:@var_used, true) @@ -455,7 +460,7 @@ def build_init(terminal, res_var, token, smooth_node) [decl] when AST::FindOp val = AST::Literal.new(token, :NIL, nil) - val.full_type = :NIL + val.full_type = Type.new(:NIL) decl = AST::VarDecl.new(token, res_var, nil, val, true) decl.full_type = smooth_node.full_type decl.storage = :stack @@ -466,18 +471,18 @@ def build_init(terminal, res_var, token, smooth_node) when AST::MinOp, AST::MaxOp # Found-flag pattern: first element always sets result, subsequent compare zero = AST::Literal.new(token, :NUMBER, 0.0) - zero.full_type = :Float64 + zero.full_type = Type.new(:Float64) val_decl = AST::VarDecl.new(token, 
res_var, nil, zero, true)
-    val_decl.full_type = :Float64
+    val_decl.full_type = Type.new(:Float64)
     val_decl.storage = :stack
     val_decl.slot_size = 1
     val_decl.instance_variable_set(:@var_used, true)
     val_decl.var_mutated = true
     found_init = AST::Literal.new(token, :BOOLEAN, false)
-    found_init.full_type = :Bool
+    found_init.full_type = Type.new(:Bool)
     found_decl = AST::VarDecl.new(token, "#{res_var}_found", nil, found_init, true)
-    found_decl.full_type = :Bool
+    found_decl.full_type = Type.new(:Bool)
     found_decl.storage = :stack
     found_decl.slot_size = 1
     found_decl.instance_variable_set(:@var_used, true)
@@ -513,7 +518,7 @@ def build_recursive_body(stages, terminal, current_val, res_var, token, stage_in
       pred = replace_placeholder(stage.expression, current_val)
       then_branch = build_recursive_body(T.must(remaining), terminal, current_val, res_var, token, stage_inits, res_type)
       if_stmt = AST::IfStatement.new(stage.token, pred, then_branch, nil)
-      if_stmt.full_type = :Void
+      if_stmt.full_type = Type.new(:Void)
       [if_stmt]
     when AST::SelectOp
       expr = replace_placeholder(stage.expression, current_val)
@@ -536,14 +541,14 @@ def build_recursive_body(stages, terminal, current_val, res_var, token, stage_in
       pred = replace_placeholder(stage.expression, current_val)
       then_branch = build_recursive_body(T.must(remaining), terminal, current_val, res_var, token, stage_inits, res_type)
       if_stmt = AST::IfStatement.new(stage.token, pred, then_branch, [AST::BreakNode.new(stage.token)])
-      if_stmt.full_type = :Void
+      if_stmt.full_type = Type.new(:Void)
       [if_stmt]
     when AST::SkipOp
       cnt_var = next_var("__skip_cnt")
       zero = AST::Literal.new(token, :INT64, 0)
-      zero.full_type = :Int64
+      zero.full_type = Type.new(:Int64)
       cnt_decl = AST::VarDecl.new(token, cnt_var, nil, zero, true)
-      cnt_decl.full_type = :Int64
+      cnt_decl.full_type = Type.new(:Int64)
       cnt_decl.storage = :stack
       cnt_decl.slot_size = 1
       cnt_decl.instance_variable_set(:@var_used, true)
@@ -551,26 +556,26 @@ def build_recursive_body(stages, terminal, current_val, res_var, token, stage_in
       stage_inits << cnt_decl
       cnt_ident = AST::Identifier.new(token, cnt_var)
-      cnt_ident.full_type = :Int64
+      cnt_ident.full_type = Type.new(:Int64)
       one = AST::Literal.new(token, :INT64, 1)
-      one.full_type = :Int64
+      one.full_type = Type.new(:Int64)
       increment = AST::Assignment.new(token, cnt_ident, AST::BinaryOp.new(token, cnt_ident.dup, :ADD, one))
-      increment.full_type = :Void
+      increment.full_type = Type.new(:Void)
       skip_n = stage.count.dup
       cond = AST::BinaryOp.new(token, cnt_ident.dup, :LTE, skip_n)
-      cond.full_type = :Bool
+      cond.full_type = Type.new(:Bool)
       skip_if = AST::IfStatement.new(token, cond, [AST::ContinueNode.new(token)], nil)
-      skip_if.full_type = :Void
+      skip_if.full_type = Type.new(:Void)
       rest = build_recursive_body(T.must(remaining), terminal, current_val, res_var, token, stage_inits, res_type)
       [increment, skip_if] + rest
     when AST::LimitOp
       cnt_var = next_var("__lim_cnt")
       zero = AST::Literal.new(token, :INT64, 0)
-      zero.full_type = :Int64
+      zero.full_type = Type.new(:Int64)
       cnt_decl = AST::VarDecl.new(token, cnt_var, nil, zero, true)
-      cnt_decl.full_type = :Int64
+      cnt_decl.full_type = Type.new(:Int64)
       cnt_decl.storage = :stack
       cnt_decl.slot_size = 1
       cnt_decl.instance_variable_set(:@var_used, true)
@@ -578,17 +583,17 @@ def build_recursive_body(stages, terminal, current_val, res_var, token, stage_in
       stage_inits << cnt_decl
       cnt_ident = AST::Identifier.new(token, cnt_var)
-      cnt_ident.full_type = :Int64
+      cnt_ident.full_type = Type.new(:Int64)
       one = AST::Literal.new(token, :INT64, 1)
-      one.full_type = :Int64
+      one.full_type = Type.new(:Int64)
       increment = AST::Assignment.new(token, cnt_ident, AST::BinaryOp.new(token, cnt_ident.dup, :ADD, one))
-      increment.full_type = :Void
+      increment.full_type = Type.new(:Void)
       limit_n = stage.count.dup
       cond = AST::BinaryOp.new(token, cnt_ident.dup, :GT, limit_n)
-      cond.full_type = :Bool
+      cond.full_type = Type.new(:Bool)
       limit_if = AST::IfStatement.new(token, cond, [AST::BreakNode.new(token)], nil)
-      limit_if.full_type = :Void
+      limit_if.full_type = Type.new(:Void)
       rest = build_recursive_body(T.must(remaining), terminal, current_val, res_var, token, stage_inits, res_type)
       [increment, limit_if] + rest
@@ -601,74 +606,78 @@ def build_recursive_body(stages, terminal, current_val, res_var, token, stage_in
   def build_terminal_action(terminal, current_val, res_var, token, res_type = nil)
     res_ident = AST::Identifier.new(token, res_var)
     res_ident.full_type = res_type if res_type
-    case terminal
+    actions = case terminal
     when AST::SumOp
       expr = replace_placeholder(terminal.expression, current_val)
       assign = AST::Assignment.new(token, res_ident, AST::BinaryOp.new(token, res_ident, :ADD, expr))
-      assign.full_type = :Void
+      assign.full_type = Type.new(:Void)
       [assign]
     when AST::CountOp
       expr = replace_placeholder(terminal.expression, current_val)
      one = AST::Literal.new(token, :INT64, 1)
      increment = AST::Assignment.new(token, res_ident, AST::BinaryOp.new(token, res_ident, :ADD, one))
-      increment.full_type = :Void
+      increment.full_type = Type.new(:Void)
      if_stmt = AST::IfStatement.new(token, expr, [increment], nil)
-      if_stmt.full_type = :Void
+      if_stmt.full_type = Type.new(:Void)
      [if_stmt]
     when AST::AverageOp
      expr = replace_placeholder(terminal.expression, current_val)
      sum_ident = AST::Identifier.new(token, "#{res_var}_sum")
      cnt_ident = AST::Identifier.new(token, "#{res_var}_cnt")
+      # AVERAGE's sum/cnt accumulators are Float64 by the desugar's own
+      # definition (same as build_init / build_final_result type them).
+      sum_ident.full_type = Type.new(:Float64)
+      cnt_ident.full_type = Type.new(:Float64)
      [AST::Assignment.new(token, sum_ident, AST::BinaryOp.new(token, sum_ident, :ADD, expr)),
       AST::Assignment.new(token, cnt_ident, AST::BinaryOp.new(token, cnt_ident, :ADD, AST::Literal.new(token, :NUMBER, 1.0)))]
     when AST::AnyOp
      expr = replace_placeholder(terminal.expression, current_val)
      set_true = AST::Assignment.new(token, res_ident, AST::Literal.new(token, :BOOLEAN, true))
-      set_true.full_type = :Void
+      set_true.full_type = Type.new(:Void)
      if_stmt = AST::IfStatement.new(token, expr, [set_true, AST::BreakNode.new(token)], nil)
-      if_stmt.full_type = :Void
+      if_stmt.full_type = Type.new(:Void)
      [if_stmt]
     when AST::AllOp
      expr = replace_placeholder(terminal.expression, current_val)
      set_false = AST::Assignment.new(token, res_ident, AST::Literal.new(token, :BOOLEAN, false))
-      set_false.full_type = :Void
+      set_false.full_type = Type.new(:Void)
      if_stmt = AST::IfStatement.new(token, AST::UnaryOp.new(token, :NOT, expr), [set_false, AST::BreakNode.new(token)], nil)
-      if_stmt.full_type = :Void
+      if_stmt.full_type = Type.new(:Void)
      [if_stmt]
     when AST::ReduceOp
      expr = replace_placeholder(terminal.expression, current_val)
      expr = replace_named_placeholder(expr, "acc", res_ident)
      assign = AST::Assignment.new(token, res_ident, expr)
-      assign.full_type = :Void
+      assign.full_type = Type.new(:Void)
      [assign]
     when AST::FindOp
      expr = replace_placeholder(terminal.expression, current_val)
      assign = AST::Assignment.new(token, res_ident, current_val.dup)
-      assign.full_type = :Void
+      assign.full_type = Type.new(:Void)
      if_stmt = AST::IfStatement.new(token, expr, [assign, AST::BreakNode.new(token)], nil)
-      if_stmt.full_type = :Void
+      if_stmt.full_type = Type.new(:Void)
      [if_stmt]
     when AST::MinOp, AST::MaxOp
      expr = replace_placeholder(terminal.expression, current_val)
      op = terminal.is_a?(AST::MinOp) ? :LT : :GT
      found_ident = AST::Identifier.new(token, "#{res_var}_found")
-      found_ident.full_type = :Bool
+      found_ident.full_type = Type.new(:Bool)
      # if !found || expr < res { res = expr; found = true }
      not_found = AST::UnaryOp.new(token, :NOT, found_ident.dup)
-      not_found.full_type = :Bool
+      not_found.full_type = Type.new(:Bool)
      cmp = AST::BinaryOp.new(token, expr, op, res_ident.dup)
-      cmp.full_type = :Bool
+      cmp.full_type = Type.new(:Bool)
      cond = AST::BinaryOp.new(token, not_found, :OR, cmp)
-      cond.full_type = :Bool
+      cond.full_type = Type.new(:Bool)
      assign_val = AST::Assignment.new(token, res_ident, expr.dup)
-      assign_val.full_type = :Void
+      assign_val.full_type = Type.new(:Void)
      set_found = AST::Assignment.new(token, found_ident, AST::Literal.new(token, :BOOLEAN, true))
-      set_found.full_type = :Void
+      set_found.full_type = Type.new(:Void)
      if_stmt = AST::IfStatement.new(token, cond, [assign_val, set_found], nil)
-      if_stmt.full_type = :Void
+      if_stmt.full_type = Type.new(:Void)
      [if_stmt]
     when AST::EachOp
      terminal.body.map { |s| replace_placeholder(s, current_val) }
@@ -678,16 +687,22 @@ def build_terminal_action(terminal, current_val, res_var, token, res_type = nil)
      inner_it_var = next_var("__it")
      inner_it = AST::Identifier.new(token, inner_it_var)
+      # inner_it iterates inner_expr's elements — its type IS that
+      # element type (not a guess; derived from the flattened array).
+      if inner_expr.full_type
+        et = Type.new(inner_expr.full_type).element_type
+        inner_it.full_type = et if et
+      end
      append = AST::MethodCall.new(token, res_ident, "append", [inner_it.dup])
-      append.full_type = :Void
-      append.zig_pattern = STD_LIB["append"][:zig]
-      append.matched_stdlib_def = STD_LIB["append"]
+      append.full_type = Type.new(:Void)
+      append.zig_pattern = IntrinsicRegistry.sig(STD_LIB, "append").emit&.zig
+      append.matched_stdlib_def = IntrinsicRegistry.sig(STD_LIB, "append")
      # Iterate directly over the expression (avoids ArrayList/slice confusion).
      # Mark collection as a slice so the transpiler uses &expr, not .items.
      inner_foreach = AST::ForEach.new(token, inner_it_var, inner_expr, [append], nil, false)
-      inner_foreach.full_type = :Void
+      inner_foreach.full_type = Type.new(:Void)
      inner_foreach.instance_variable_set(:@var_used, true)
      [inner_foreach]
@@ -695,23 +710,24 @@ def build_terminal_action(terminal, current_val, res_var, token, res_type = nil)
      # Set insert: result is a T[]@set; insert deduplicates in O(1).
      key_expr = replace_placeholder(terminal.expression, current_val)
      insert_call = AST::MethodCall.new(token, res_ident.dup, "insert", [key_expr])
-      insert_call.full_type = :Void
+      insert_call.full_type = Type.new(:Void)
      insert_call.zig_pattern = "try {0}.insert({alloc}, {1})"
-      insert_call.matched_stdlib_def = STD_LIB["insert"] if STD_LIB.key?("insert")
+      insert_call.matched_stdlib_def = IntrinsicRegistry.sig(STD_LIB, "insert") if STD_LIB.key?("insert")
      [insert_call]
     when nil, AST::SelectOp, AST::WhereOp, AST::TapOp, AST::TakeWhileOp
      # Produces a list
      call = AST::MethodCall.new(token, res_ident, "append", [current_val.dup])
-      call.full_type = :Void
-      call.zig_pattern = STD_LIB["append"][:zig]
-      call.matched_stdlib_def = STD_LIB["append"]
+      call.full_type = Type.new(:Void)
+      call.zig_pattern = IntrinsicRegistry.sig(STD_LIB, "append").emit&.zig
+      call.matched_stdlib_def = IntrinsicRegistry.sig(STD_LIB, "append")
      [call]
     else
      call = AST::MethodCall.new(token, res_ident, "append", [current_val.dup])
-      call.zig_pattern = STD_LIB["append"][:zig]
-      call.matched_stdlib_def = STD_LIB["append"]
+      call.zig_pattern = IntrinsicRegistry.sig(STD_LIB, "append").emit&.zig
+      call.matched_stdlib_def = IntrinsicRegistry.sig(STD_LIB, "append")
      [call]
     end
+    actions
   end

   sig { params(terminal: T.untyped, res_var: String, token: Lexer::Token, smooth_node: AST::BinaryOp).returns(T.any(AST::BinaryOp, AST::Identifier)) }
@@ -719,10 +735,10 @@ def build_final_result(terminal, res_var, token, smooth_node)
     if terminal.is_a?(AST::AverageOp)
       sum_ident = AST::Identifier.new(token, "#{res_var}_sum")
       cnt_ident = AST::Identifier.new(token, "#{res_var}_cnt")
-      sum_ident.full_type = :Float64
-      cnt_ident.full_type = :Float64
+      sum_ident.full_type = Type.new(:Float64)
+      cnt_ident.full_type = Type.new(:Float64)
       div = AST::BinaryOp.new(token, sum_ident, :DIV, cnt_ident)
-      div.full_type = :Float64
+      div.full_type = Type.new(:Float64)
       div
     else
       res = AST::Identifier.new(token, res_var)
diff --git a/src/mir/concurrency_checks.rb b/src/mir/concurrency_checks.rb
index 6457ded04..9c707b84a 100644
--- a/src/mir/concurrency_checks.rb
+++ b/src/mir/concurrency_checks.rb
@@ -157,7 +157,7 @@ def check_reentrant!(fn, sig_lookup, error_handler)
       next unless sig.is_a?(FunctionSignature) && sig.requires && !sig.requires.empty?
       sig.params.each_with_index do |param, idx|
-        pname = param[:name].to_s
+        pname = param.name.to_s
         next unless sig.requires.key?(pname)
         arg = node.args[idx]
         next unless arg
@@ -229,7 +229,7 @@ def walk_scope_for_nested_with(stmts, &blk)
   sig { params(with_block: T.untyped, fn: T.untyped).returns(T::Set[T.untyped]) }
   def collect_held_params(with_block, fn)
     return Set.new unless fn.respond_to?(:params)
-    param_names = fn.params.map { |p| p[:name].to_s }.to_set
+    param_names = fn.params.map { |p| p.name.to_s }.to_set
     out = Set.new
     (with_block.capabilities || []).each do |cap|
       n = cap_var_name(cap[:var_node])
diff --git a/src/mir/control_flow.rb b/src/mir/control_flow.rb
index 2b611c96f..a802a666a 100644
--- a/src/mir/control_flow.rb
+++ b/src/mir/control_flow.rb
@@ -507,9 +507,9 @@ def self.analyze(fn_node, can_fail_fns: nil, schema_lookup: nil)
   def init_entry_state
     state = {}
     (@fn_node.params || []).each do |p|
-      next unless p[:takes]
-      name = p[:name].to_s
-      ti = p[:type].is_a?(Type) ? p[:type] : (Type.new(p[:type] || :Any) rescue nil)
+      next unless p.takes
+      name = p.name.to_s
+      ti = p.type || Type.new(:Any)
       needs = ti ? ti.needs_explicit_cleanup?(:heap, @schema_lookup) : true
       state[name] = OwnerEntry.new(state: OWNED, allocator: :heap, needs_cleanup: needs)
     end
@@ -1970,8 +1970,7 @@ def _collect_share_moves(node, names)
     if source.is_a?(AST::Identifier)
       ti = source.type_info rescue nil
-      ti = Type.new(ti) if ti && !ti.is_a?(Type)
-      return if ti.is_a?(Type) && ti.shared?
+      return if ti&.shared?
       names << source.name.to_s
       return
     end
diff --git a/src/mir/escape_analysis.rb b/src/mir/escape_analysis.rb
index ab1a80c1e..028473c4f 100644
--- a/src/mir/escape_analysis.rb
+++ b/src/mir/escape_analysis.rb
@@ -235,7 +235,7 @@ def self.analyze!(fn_nodes, heap_fns:, promotion_plans: {})
       end
       if callee_name && heap_fns.include?(callee_name)
         ti = node.type_info rescue nil
-        ti.provenance = :heap if ti.is_a?(Type) && !ti.heap_provenance?
+        ti.provenance = :heap if ti && !ti.heap_provenance?
       end
     end
@@ -313,7 +313,7 @@ def self.analyze!(fn_nodes, heap_fns:, promotion_plans: {})
       args = call.args || []
       callee_fn.params.each_with_index do |param, idx|
-        next unless param[:takes]
+        next unless param.takes
         arg = args[idx]
         next unless arg
@@ -360,8 +360,8 @@ def self.analyze!(fn_nodes, heap_fns:, promotion_plans: {})
       args = call.args || []
       callee_fn.params.each_with_index do |param, idx|
-        next unless param[:mutable]
-        param_t = param[:type]
+        next unless param.mutable
+        param_t = param.type
         param_t = param_t.is_a?(Type) ? param_t : (Type.new(param_t) rescue nil)
         next unless param_t && param_t.list_collection?
@@ -541,8 +541,7 @@ def self.analyze!(fn_nodes, heap_fns:, promotion_plans: {})
       next unless (local_decl.is_a?(AST::VarDecl) || (local_decl.is_a?(AST::BindExpr) && local_decl.mode == :decl)) && local_decl.name.to_s == rhs.name
       local_decl.storage = :heap
       decl_ti = local_decl.type_info rescue nil
-      decl_ti = Type.new(decl_ti) if decl_ti && !decl_ti.is_a?(Type)
-      decl_ti.provenance = :heap if decl_ti.is_a?(Type)
+      decl_ti&.provenance = :heap
       if rhs.symbol
         rhs.symbol.storage = :heap
         sym_reg = rhs.symbol.reg
@@ -555,8 +554,7 @@ def self.analyze!(fn_nodes, heap_fns:, promotion_plans: {})
       next unless (outer_decl.is_a?(AST::VarDecl) || (outer_decl.is_a?(AST::BindExpr) && outer_decl.mode == :decl)) && outer_decl.name.to_s == outer_name
       outer_decl.storage = :heap
       outer_ti = outer_decl.type_info rescue nil
-      outer_ti = Type.new(outer_ti) if outer_ti && !outer_ti.is_a?(Type)
-      outer_ti.provenance = :heap if outer_ti.is_a?(Type)
+      outer_ti&.provenance = :heap
       outer_decl.symbol.storage = :heap if outer_decl.symbol
     end
   end
@@ -670,18 +668,26 @@ def self.tag_transitive_provenance!(fn_nodes, heap_fns)
         val = node.value
         callee_name = val.is_a?(AST::FuncCall) ? val.name.to_s : nil
         next unless callee_name && heap_fns.include?(callee_name)
-        T.must(node.type_info).provenance = :heap if node.type_info.is_a?(Type)
+        node.type_info.provenance = :heap
         if node.is_a?(AST::BindExpr) && node.mode == :assign
+          # decl is a lookup result (legitimately nil on miss); its
+          # type_info is invariant-guaranteed non-nil when found. The
+          # old `decl&.type_info.provenance=` was a BROKEN chain (the
+          # `&.` short-circuited to nil, then `.provenance=` ran on
+          # nil). Guard decl only; the type_info guard is dead.
           decl = e3_find_decl(fn.body, node.name)
-          decl.type_info.provenance = :heap if decl&.type_info.is_a?(Type)
+          decl.type_info.provenance = :heap if decl
         end
       when AST::Assignment
         val = node.value
         callee_name = val.is_a?(AST::FuncCall) ? val.name.to_s : nil
         next unless callee_name && heap_fns.include?(callee_name)
-        sym = node.name.symbol
+        # Same as above: sym.reg is a lookup back-pointer (nil on
+        # miss); type_info is invariant-guaranteed when present.
+        # Guard decl only — no type_info guard, no broken `&.` chain.
+        sym = node.name.symbol
         decl = sym&.reg
-        decl.type_info.provenance = :heap if decl&.type_info.is_a?(Type)
+        decl.type_info.provenance = :heap if decl
       end
     end
   end
@@ -725,7 +731,7 @@ def self.propagate_caller_sync!(fn_nodes)
       next if sites.empty?
       callee_fn.params.each_with_index do |param, idx|
-        entry = param[:symbol]
+        entry = param.symbol
         next unless entry

         # ── sync axis ────────────────────────────────────────────────
@@ -802,20 +808,20 @@ def self.propagate_caller_sync!(fn_nodes)
   # True when the param's declared type carried explicit sync (so the
   # entry.sync currently reflects an annotation, not a propagated value).
-  sig { params(param: T::Hash[Symbol, T.untyped]).returns(T.nilable(T::Boolean)) }
+  sig { params(param: AST::Param).returns(T.nilable(T::Boolean)) }
   private_class_method def self.param_sync_was_declared?(param)
-    t = param[:type]
+    t = param.type
     t.is_a?(Type) && t.any_sync?
   end

-  sig { params(fn_node: AST::FunctionDef, param: T::Hash[Symbol, T.untyped], sync: Symbol).returns(T::Boolean) }
+  sig { params(fn_node: AST::FunctionDef, param: AST::Param, sync: Symbol).returns(T::Boolean) }
   private_class_method def self.param_accepts_caller_sync?(fn_node, param, sync)
-    t = param[:type]
+    t = param.type
     return true if t.is_a?(Type) && (t.shared? || t.any_sync?)
     return true unless sync == :atomic
     requires = fn_node.respond_to?(:requires) ? fn_node.requires : nil
-    families = requires && requires[param[:name].to_s]
+    families = requires && requires[param.name.to_s]
     return false unless families.respond_to?(:include?)
     case sync
@@ -903,13 +909,13 @@ def self.tag_carry_call_sites!(fn_nodes)
       when AST::FuncCall
         if carry_fns.include?(node.name.to_s)
           ti = node.type_info rescue nil
-          ti.provenance = :heap if ti.is_a?(Type) && !ti.heap_provenance?
+          ti.provenance = :heap if ti && !ti.heap_provenance?
         end
         node.args.each { |a| e3_mark_carry_expr!(a, carry_fns) }
       when AST::MethodCall
         if carry_fns.include?(node.name.to_s)
           ti = node.type_info rescue nil
-          ti.provenance = :heap if ti.is_a?(Type) && !ti.heap_provenance?
+          ti.provenance = :heap if ti && !ti.heap_provenance?
         end
         e3_mark_carry_expr!(node.object, carry_fns)
         node.args.each { |a| e3_mark_carry_expr!(a, carry_fns) }
diff --git a/src/mir/fsm_lowering.rb b/src/mir/fsm_lowering.rb
index 73724f699..a1f959645 100644
--- a/src/mir/fsm_lowering.rb
+++ b/src/mir/fsm_lowering.rb
@@ -119,7 +119,7 @@ def lower_step_stmts(stmts, no_result:, ctx_id: nil, exit_promote: nil)
     result_mir.concat(last_pending)
     last_is_assign = last_step[:expr].is_a?(AST::Assignment)
-    expr_type = last_step[:expr].full_type || :Void
+    expr_type = last_step[:expr].full_type
     is_step_void = expr_type.nil? || expr_type == :Void ||
       (expr_type.respond_to?(:to_s) && Type.new(expr_type).zig_type == "void")
@@ -198,7 +198,7 @@ def wrap_step_as_stmt(step, mir)
       return MIR::Let.new(step[:binding], mir, false, nil, nil)
     end
     return mir if mir.respond_to?(:stmt?) && mir.stmt?
-    expr_type = step[:expr].full_type || :Void
+    expr_type = step[:expr].full_type
     is_void_step = expr_type.nil? || expr_type == :Void ||
       (expr_type.respond_to?(:to_s) && Type.new(expr_type).zig_type == "void")
     MIR::ExprStmt.new(mir, !is_void_step)
diff --git a/src/mir/fsm_transform.rb b/src/mir/fsm_transform.rb
index 7e04673aa..f6d645501 100644
--- a/src/mir/fsm_transform.rb
+++ b/src/mir/fsm_transform.rb
@@ -141,7 +141,7 @@ def collect_body_locals(stmts)
       when Array
         node.each { |n| visit.call(n) }
       when AST::VarDecl
-        entry = local_entry(node.name, node.full_type || node.type || node.value&.full_type)
+        entry = local_entry(node.name, node.full_type)
         if entry && !seen[entry[:name]]
           seen[entry[:name]] = true
           out << entry
@@ -149,7 +149,7 @@ def collect_body_locals(stmts)
         visit.call(node.value) if node.value
       when AST::BindExpr
         if node.mode == :decl
-          entry = local_entry(node.name, node.full_type || node.type || node.value&.full_type)
+          entry = local_entry(node.name, node.full_type)
           if entry && !seen[entry[:name]]
             seen[entry[:name]] = true
             # Mark suspend-result decls so the caller can include
@@ -257,7 +257,7 @@ def suspend_value?(value)
     return true if value.is_a?(AST::NextExpr)
     return false unless value.is_a?(AST::FuncCall) || value.is_a?(AST::MethodCall)
     md = value.matched_stdlib_def
-    !!(md && md[:suspends] && md[:fsm_setup])
+    !!(md && md.emit&.suspends && md.emit&.fsm_setup)
   end

   sig { params(name: T.untyped, type_obj: T.untyped).returns(T.nilable(T::Hash[T.untyped, T.untyped])) }
diff --git a/src/mir/fsm_transform/segments.rb b/src/mir/fsm_transform/segments.rb
index a0e1b1472..7fa43ddf4 100644
--- a/src/mir/fsm_transform/segments.rb
+++ b/src/mir/fsm_transform/segments.rb
@@ -324,7 +324,7 @@ def classify_suspend(stmt)
   def io_suspending_call?(call_node)
     T.bind(self, T.untyped) rescue nil
     md = call_node.matched_stdlib_def
-    !!(md && md[:suspends] && md[:fsm_setup])
+    !!(md && md.emit&.suspends && md.emit&.fsm_setup)
   end

   sig { params(expr: T.untyped).returns(T::Boolean) }
diff --git a/src/mir/fsm_transform/suspend_resolvers.rb b/src/mir/fsm_transform/suspend_resolvers.rb
index 5e987680c..9656cc9d8 100644
--- a/src/mir/fsm_transform/suspend_resolvers.rb
+++ b/src/mir/fsm_transform/suspend_resolvers.rb
@@ -61,11 +61,12 @@ def resolve_io(io_tail, ctx, lowering)
     id = ctx[:id]
     bg_rt = ctx[:bg_rt]
-    setup_ops = stdlib_def[:fsm_setup] || []
-    finish_block = stdlib_def[:fsm_finish_block] || []
-    finish_value = stdlib_def[:fsm_finish_value]
-    state_decls = stdlib_def[:fsm_state_decls] || []
-    state_finalize = stdlib_def[:fsm_state_finalize] || []
+    em = stdlib_def.emit
+    setup_ops = em&.fsm_setup || []
+    finish_block = em&.fsm_finish_block || []
+    finish_value = em&.fsm_finish_value
+    state_decls = em&.fsm_state_decls || []
+    state_finalize = em&.fsm_state_finalize || []
     # Lower call args via the surrounding capture-map context.
     arg_mirs = (io_tail.call_node.respond_to?(:args) ?
diff --git a/src/mir/mir.rb b/src/mir/mir.rb
index 01b8d1e1a..3ade9db40 100644
--- a/src/mir/mir.rb
+++ b/src/mir/mir.rb
@@ -17,6 +17,7 @@
 # New nodes here use distinct names to coexist during migration.

 require "sorbet-runtime"
+require_relative "../annotator-helpers/intrinsic_registry"

 module MIR
   # Common interface for all MIR nodes.
@@ -1794,4 +1795,25 @@
                 :resolved_allocs, :template_kind) do
     include Expr
   end
+
+  # Hard flip (EPIC #65): every stdlib_def carrier coerces its payload
+  # to a FunctionSignature on write. No Hash backdoor -- readers still
+  # doing entry[:zig]/.dig(:...) will fail loudly, which is the
+  # intended map of remaining reader-migration work.
+  module StdlibDefFsCoercion
+    def stdlib_def=(v)
+      super(IntrinsicRegistry.fs(v))
+    end
+
+    # Struct positional construction (`InlineBc.new(op, args, hash)`)
+    # assigns the member directly, bypassing the setter -- re-run it
+    # through the coercing setter so the carrier is always FS.
+    def initialize(*)
+      super
+      self.stdlib_def = stdlib_def
+    end
+  end
+
+  [RawZig, InlineZig, InlineBc, RawBc, ShardedMapPut, ShardedMapGet].each do |k|
+    k.prepend(StdlibDefFsCoercion)
+  end
 end
diff --git a/src/mir/mir_checker.rb b/src/mir/mir_checker.rb
index 378c1393f..5b2b6272a 100644
--- a/src/mir/mir_checker.rb
+++ b/src/mir/mir_checker.rb
@@ -168,16 +168,23 @@ def owned_return_init?(init)
     return true if init.is_a?(MIR::TryCatch) && init.heap_provenance
     if init.is_a?(MIR::InlineZig) || init.is_a?(MIR::RawZig)
       return false unless stdlib_owned_return?(init)
-      ret = init.stdlib_def[:return]
-      return !(ret == :Void || ret.nil?)
+      # Receiver-dependent (Proc-resolved) returns -- collection
+      # intrinsics like pool.insert/get -- are not a static owned-
+      # return declaration; their ownership is governed by
+      # allocates/borrows, handled elsewhere. Only a static return
+      # type counts here (matches pre-FS behavior, which read only
+      # the static `:return` key).
+      return false unless init.stdlib_def.fixed_return?
+      ret = init.stdlib_def.return_type
+      return !ret.void?
     end
     false
   end

   sig { params(node: T.untyped).returns(T::Boolean) }
   def stdlib_owned_return?(node)
-    return false unless node.stdlib_def&.dig(:allocates)
-    return true if node.stdlib_def[:return_alloc] == :heap
+    return false unless node.stdlib_def&.emit&.allocates
+    return true if node.stdlib_def.emit&.return_alloc == :heap
     return false unless node.is_a?(MIR::InlineZig)
     allocs = node.allocs
@@ -390,9 +397,10 @@ def scan_expr_for_hpt_leak!(node, leaks)
       leaks << error(:HPT_LEAK, "try-catch", "heap-returning try/catch result not bound to variable (leak)")
     end
-    if (node.is_a?(MIR::InlineZig) || node.is_a?(MIR::RawZig)) && stdlib_owned_return?(node)
-      ret = node.stdlib_def[:return]
-      unless ret == :Void || ret.nil?
+    if (node.is_a?(MIR::InlineZig) || node.is_a?(MIR::RawZig)) && stdlib_owned_return?(node) &&
+       node.stdlib_def.fixed_return?
+      ret = node.stdlib_def.return_type
+      unless ret.void?
         label = node.is_a?(MIR::RawZig) ? "RawZig block" : "stdlib call"
         leaks << error(:HPT_LEAK, node.reason,
                        "#{label} with allocates:true result not bound to variable (leak)")
@@ -507,7 +515,7 @@ def verify_alloc_cleanup_match!(allocs, cleanups, errdefer_destroy_names = Set.n
       # INV-COPY-CLEANUP: primitives and Id (value types that can never own
       # heap memory) must not get a Cleanup node. If they do, needs_explicit_cleanup?
       # or visit_CopyNode missed the gate.
-      if (ti = alloc_marks.first.type_info).is_a?(Type)
+      if (ti = alloc_marks.first.type_info)
         no_caps = !ti.any_sync? && !ti.multiowned? && !ti.shared?
         if no_caps && (ti.primitive? || (ti.generic_instance? && ti.generic_base == :Id))
           @errors << error(:COPY_CLEANUP, name,
@@ -750,7 +758,7 @@ def expr_has_frame_alloc?(expr)
     return false unless expr
     case expr
     when MIR::InlineZig
-      return false if expr.stdlib_def&.dig(:mutates_receiver)
+      return false if expr.stdlib_def&.emit&.mutates_receiver
       expr.allocs&.any? { |_k, v| v == :frame }
     when MIR::DupeSlice, MIR::ConcatStr, MIR::HeapCreate, MIR::AllocSlice,
          MIR::ContainerInit, MIR::MakeList, MIR::DeepCopy, MIR::CapWrap
diff --git a/src/mir/mir_emitter.rb b/src/mir/mir_emitter.rb
index e7ebd445b..a53aebce1 100644
--- a/src/mir/mir_emitter.rb
+++ b/src/mir/mir_emitter.rb
@@ -178,8 +178,8 @@ def emit(node)
   sig { params(node: MIR::InlineBc).returns(String) }
   def emit_inline_bc_as_zig(node)
     entry = node.stdlib_def
-    raise "emit_inline_bc_as_zig: node has no stdlib_def (:#{node.op})" unless entry && entry[:zig]
-    pattern = entry[:zig].dup
+    raise "emit_inline_bc_as_zig: node has no stdlib_def (:#{node.op})" unless entry && entry.emit&.zig
+    pattern = entry.emit.zig.to_s.dup
     node.args.each_with_index { |a, i| pattern = pattern.gsub("{#{i}}") { emit(a) } }
     pattern
   end
@@ -216,7 +216,7 @@ def emit_sharded_map_get(node)
   def sharded_map_template(node)
     op = node.stdlib_def
     kind = node.template_kind || :zig
-    op[kind] or raise "ShardedMap: op has no :#{kind} template (op keys=#{op.keys})"
+    op.emit&.public_send(kind) or raise "ShardedMap: op has no :#{kind} template (emit=#{op.emit.inspect})"
   end

   sig { params(pattern: String, node: T.untyped).returns(String) }
@@ -242,8 +242,8 @@ def sharded_map_substitute_common(pattern, node)
   sig { params(node: T.untyped).returns(String) }
   def emit_raw_bc_as_zig(node)
     entry = node.stdlib_def
-    raise "emit_raw_bc_as_zig: node has no stdlib_def" unless entry && entry[:zig]
-    pattern = entry[:zig].dup
+    raise "emit_raw_bc_as_zig: node has no stdlib_def" unless entry && entry.emit&.zig
+    pattern = entry.emit.zig.to_s.dup
     node.args.each_with_index { |a, i| pattern = pattern.gsub("{#{i}}") { emit(a) } }
     pattern
   end
diff --git a/src/mir/mir_lowering.rb b/src/mir/mir_lowering.rb
index 16ef48bbf..b6d4a6e74 100644
--- a/src/mir/mir_lowering.rb
+++ b/src/mir/mir_lowering.rb
@@ -200,7 +200,7 @@ def mir_allocates?(node)
     when MIR::InlineZig
       # Only hoist if the node heap-allocates (stdlib_def allocates: true AND
       # allocs contains a :heap entry -- frame-only intrinsics are excluded).
-      return false unless node.stdlib_def&.dig(:allocates)
+      return false unless node.stdlib_def&.emit&.allocates
       return true unless node.allocs
       node.allocs.any? { |_k, v| v == :heap }
     else
@@ -848,8 +848,6 @@ def mir_cast(mir_node, from_type, to_type)
   # (TAKES params) to avoid deferring type resolution to the emitter.
   sig { params(entry: T::Hash[Symbol, T.untyped], ti: Type, source_node: T.nilable(AST::VarDecl)).returns(T.nilable(T::Boolean)) }
   def build_drop_entry!(entry, ti, source_node)
-    ti = Type.new(ti) if ti && !ti.is_a?(Type)
-    ti = nil unless ti.is_a?(Type)
     zig_type =
       case entry[:kind]
       when :heap_slice
@@ -1168,12 +1166,12 @@ def lower_function_def(node)
       # original name from MIR-level checks (notably the new
       # INV-CROSS-FRAME-PARAM-ALLOC verifier in mir_checker.rb).
       mutable_scalar_params = (node.params || []).select { |p|
-        next false unless p[:mutable]
-        p_type_obj = p[:type].is_a?(Type) ? p[:type] : (Type.new(p[:type] || :Any) rescue nil)
+        next false unless p.mutable
+        p_type_obj = p.type || Type.new(:Any)
        next false if p_type_obj && (p_type_obj.collection? ||
          (p_type_obj.respond_to?(:needs_pointer_passing?) && p_type_obj.needs_pointer_passing?))
-        !transpile_type(p[:type], is_param: true).start_with?("[]", "*")
-      }.map { |p| p[:name] }.to_set
+        !transpile_type(p.type, is_param: true).start_with?("[]", "*")
+      }.map { |p| p.name }.to_set
       @current_fn_mutable_scalar_params = T.let(mutable_scalar_params, T.nilable(T::Set[T.untyped]))

       # Collection params: already passed by pointer, skip & at recursive
@@ -1183,23 +1181,23 @@ def lower_function_def(node)
       # callee adds a second `&`, producing `**ArrayList` which Zig's
       # one-level method auto-deref can't unwrap.
       @current_fn_collection_params = (node.params || []).select { |p|
-        p_type_obj = p[:type].is_a?(Type) ? p[:type] : Type.new(p[:type] || :Any)
+        p_type_obj = p.type || Type.new(:Any)
         p_type_obj.needs_pointer_passing? ||
-          (p[:mutable] && p_type_obj.list_collection?)
-      }.map { |p| p[:name] }.to_set
+          (p.mutable && p_type_obj.list_collection?)
+      }.map { |p| p.name }.to_set

       # All param names: used to distinguish params (slices) from locals (ArrayLists)
-      @current_fn_param_names = (node.params || []).map { |p| p[:name] }.to_set
+      @current_fn_param_names = (node.params || []).map { |p| p.name }.to_set

       # Build param list
       params_mir = (node.params || []).map { |param|
-        p_name = mutable_scalar_params.include?(param[:name]) ? "_m_#{param[:name]}" : param[:name]
-        p_type_sym = param[:type].is_a?(Type) ? param[:type].resolved : param[:type]
-        p_type_obj = param[:type].is_a?(Type) ? param[:type] : Type.new(param[:type] || :Any)
+        p_name = mutable_scalar_params.include?(param.name) ? "_m_#{param.name}" : param.name
+        p_type_sym = param.type&.resolved
+        p_type_obj = param.type || Type.new(:Any)
         is_user_struct = @struct_schemas&.key?(p_type_sym)
         # Atomic params need `anytype` so call sites pass the cell itself,
         # allowing WITH MATCH comptime probes to dispatch by actual family.
-        sym = param[:symbol]
+        sym = param.symbol
         atomic_sync = sym && (sym.sync == :atomic ||
           (sym.sync_families && sym.sync_families.include?(:ATOMIC)))
         zig_t = if p_type_obj.shared? && p_type_obj.resolved.to_s.match?(/\A[A-Z]\z/)
@@ -1211,17 +1209,17 @@ def lower_function_def(node)
         elsif atomic_sync
           "anytype"
         else
-          transpile_type(param[:type], is_param: true)
+          transpile_type(param.type, is_param: true)
         end
-        zig_t = "*#{zig_t}" if mutable_scalar_params.include?(param[:name]) && zig_t != "anytype"
+        zig_t = "*#{zig_t}" if mutable_scalar_params.include?(param.name) && zig_t != "anytype"
         # `pointer_passed`: this param's receiver is a pointer-to-T at the
         # Zig level, so allocations made inside this function on its behalf
         # outlive the function. Mirrors `@current_fn_collection_params`'s
         # criteria so the MIR checker can independently verify the
         # allocator-routing decision (see INV-CROSS-FRAME-PARAM-ALLOC).
         pointer_passed = p_type_obj.needs_pointer_passing? ||
-          (param[:mutable] && p_type_obj.list_collection?) ||
-          mutable_scalar_params.include?(param[:name])
+          (param.mutable && p_type_obj.list_collection?) ||
+          mutable_scalar_params.include?(param.name)
         MIR::Param.new(p_name, zig_t, pointer_passed)
       }
@@ -1283,7 +1281,7 @@ def lower_function_def(node)
         node.uses_frame
       end
       uses_frame_or_alloc = has_frame_bindings || node.uses_alloc
-      ret_type_obj = node.return_type.is_a?(Type) ? node.return_type : Type.new(node.return_type || :Void)
+      ret_type_obj = node.return_type || Type.new(:Void)
       # Unwrap `!T` so value-type and string-return classification sees the
       # payload; otherwise frame save/restore is skipped for error-union returns.
       bare_ret = if ret_type_obj.respond_to?(:error_union?) && ret_type_obj.error_union? &&
@@ -1358,8 +1356,8 @@ def lower_function_def(node)
       # Param suppressions for unused params
       (node.params || []).each do |p|
-        next if used_names.include?(p[:name])
-        suppress_name = mutable_scalar_params.include?(p[:name]) ? "_m_#{p[:name]}" : p[:name]
+        next if used_names.include?(p.name)
+        suppress_name = mutable_scalar_params.include?(p.name) ? "_m_#{p.name}" : p.name
         prologue << MIR::Suppress.new(suppress_name)
       end
@@ -1376,14 +1374,14 @@ def lower_function_def(node)
       # Emit AllocMark + Cleanup for TAKES parameters (replaces insert_takes_drops! from MIRPass).
       # TAKES params own their value from function entry; cleanup is always defer (Cleanup, not ErrCleanup).
       takes_mir = []
-      (node.params || []).select { |p| p[:takes] }.each do |p|
-        entry = @current_bindings[p[:name].to_s]
+      (node.params || []).select { |p| p.takes }.each do |p|
+        entry = @current_bindings[p.name.to_s]
         next unless entry && entry[:needs_cleanup]
-        ti = p[:type].is_a?(Type) ? p[:type] : Type.new(p[:type] || :Any)
+        ti = p.type || Type.new(:Any)
         drop_entry = entry.dup
         build_drop_entry!(drop_entry, ti, nil)
-        takes_mir << MIR::AllocMark.new(p[:name].to_s, entry[:alloc], ti)
-        takes_mir << MIR::Cleanup.new(zig_safe_name(p[:name].to_s), drop_entry)
+        takes_mir << MIR::AllocMark.new(p.name.to_s, entry[:alloc], ti)
+        takes_mir << MIR::Cleanup.new(zig_safe_name(p.name.to_s), drop_entry)
       end

       # Lower body (track snapshot types for catch blocks)
@@ -1431,7 +1429,7 @@ def lower_function_def(node)
                            prologue + body_mir, :private, false, comptime_params)

       # Outer function: calls inner, catches errors
-      call_args = fn_needs_rt ? ["rt"] + (node.params || []).map { |p| p[:name] } : (node.params || []).map { |p| p[:name] }
+      call_args = fn_needs_rt ? ["rt"] + (node.params || []).map { |p| p.name } : (node.params || []).map { |p| p.name }
       inner_call = "#{inner_name}(#{call_args.join(', ')})"
       catch_zig, catch_clause_bodies = build_catch_clauses(node, fn_can_fail)
@@ -1483,9 +1481,9 @@ def build_post_outer_fn(node, params_mir, return_type_str, fn_needs_rt, vis, com
     # names verbatim. Forwarding the user-level name would produce
     # "use of undeclared identifier" at the wrapper's call site.
     mutable_scalar = (node.params || []).select { |p|
-      p[:mutable] && !transpile_type(p[:type], is_param: true).start_with?("[]", "*")
-    }.map { |p| p[:name] }.to_set
-    forward_name = ->(p) { mutable_scalar.include?(p[:name]) ? "_m_#{p[:name]}" : p[:name] }
+      p.mutable && !transpile_type(p.type, is_param: true).start_with?("[]", "*")
+    }.map { |p| p.name }.to_set
+    forward_name = ->(p) { mutable_scalar.include?(p.name) ? "_m_#{p.name}" : p.name }
     arg_idents = (node.params || []).map { |p| MIR::Ident.new(forward_name.call(p)) }
     arg_idents = [MIR::Ident.new("rt")] + arg_idents if fn_needs_rt
@@ -1494,7 +1492,7 @@ def build_post_outer_fn(node, params_mir, return_type_str, fn_needs_rt, vis, com
     # `anyerror!T`, and any whitespace variants the formatter might
     # emit. Type#error_union? / Type#void? / Type#payload_type are
     # the single source of truth.
-    rt_obj = node.return_type.is_a?(Type) ? node.return_type : (node.return_type ? Type.new(node.return_type) : nil)
+    rt_obj = node.return_type
     is_error_union = !!(rt_obj && rt_obj.error_union?)
     payload_type = is_error_union ? rt_obj.payload_type : rt_obj
     is_void = !!(payload_type && payload_type.respond_to?(:void?) && payload_type.void?)
@@ -1711,7 +1709,7 @@ def lower_func_call(node)
     callee_sig = @fn_sigs&.dig(node.name) || @fn_sigs&.dig(node.name.to_s)
     args_mir = node.args.each_with_index.map { |a, idx|
       # The annotator stamps was_moved when the callee takes ownership of this
-      # arg on success (param[:takes] || GIVE). That is the SINGLE source of
+      # arg on success (param.takes || GIVE). That is the SINGLE source of
       # truth for "callee takes" — the lowering must not re-derive it from
       # CopyNode/MoveNode wrappers (a COPY into a borrow param is NOT a take).
       takes = a.was_moved
@@ -1735,14 +1733,14 @@ def lower_func_call(node)
         callee_param = params_list[idx]
       end
       callee_wants_mutable_list =
-        callee_param && callee_param[:mutable] &&
-        callee_param[:type].respond_to?(:list_collection?) &&
-        callee_param[:type].list_collection?
+        callee_param && callee_param.mutable &&
+        callee_param.type.respond_to?(:list_collection?) &&
+        callee_param.type.list_collection?
       callee_param_type = if callee_param
-        callee_param[:type].is_a?(Type) ? callee_param[:type] : (Type.new(callee_param[:type] || :Any) rescue nil)
+        callee_param.type || Type.new(:Any)
       end
       callee_wants_mutable_value =
-        callee_param && callee_param[:mutable] && a.is_a?(AST::Identifier) &&
+        callee_param && callee_param.mutable && a.is_a?(AST::Identifier) &&
         !callee_wants_mutable_list &&
         !(callee_param_type&.respond_to?(:needs_pointer_passing?) &&
           callee_param_type.needs_pointer_passing?)
@@ -1945,8 +1943,8 @@ def lower_intrinsic(node)
       # When the entry has an explicit :bc_op, prefer it over the AST name so
       # the BC dispatch key is decoupled from CLEAR's surface naming
       # (e.g. fileReadAll -> :file_read_all).
-      if @target == :bc && node.matched_stdlib_def&.dig(:bc)
-        op_name = node.matched_stdlib_def[:bc_op] || node.name.to_s.to_sym
+      if @target == :bc && node.matched_stdlib_def&.emit&.bc
+        op_name = node.matched_stdlib_def.emit&.bc_op || node.name.to_s.to_sym
         return MIR::InlineBc.new(op_name, mir_args, node.matched_stdlib_def)
       end
@@ -1956,7 +1954,7 @@ def lower_intrinsic(node)
      # The {alloc} PLACEHOLDER stays in the pattern -- the emitter substitutes it.
      resolved_allocs = {}
      if pattern.include?("{alloc}")
-        alloc_sym = node.matched_stdlib_def&.dig(:alloc) || :node_storage
+        alloc_sym = node.matched_stdlib_def&.emit&.alloc || :node_storage
        # Resolve receiver type: MethodCall -> receiver object; UFCS FuncCall -> first arg
        receiver_type = if node.is_a?(AST::MethodCall)
          ti = node.object.type_info rescue nil
@@ -1969,7 +1967,7 @@ def lower_intrinsic(node)
        resolved_allocs[:alloc] = resolved
      # Wrap non-heap strings at TAKES positions in DupeSlice (visible to MIR checker)
-      stdlib_args = node.matched_stdlib_def&.dig(:args)
+      stdlib_args = node.matched_stdlib_def&.arg_spec
      if stdlib_args.is_a?(Array)
        raw_args = node.is_a?(AST::MethodCall) ? node.args : node.args[1..]
        raw_args&.each_with_index do |arg_node, ai|
@@ -2001,7 +1999,7 @@ def lower_intrinsic(node)
      # non-literal args pay nothing. We skip it for `:Any` (anytype) and
      # for arg specs without a concrete declared type (Hash forms whose
      # `:type` is missing or :Any).
-      stdlib_args = node.matched_stdlib_def&.dig(:args)
+      stdlib_args = node.matched_stdlib_def&.arg_spec
      if stdlib_args.is_a?(Array)
        args_zig = args_zig.each_with_index.map do |arg_zig, i|
          coerce_stdlib_arg(arg_zig, stdlib_args[i])
@@ -2117,13 +2115,13 @@ def build_extern_trampoline_call(node)
      # Skip extern/module functions: their CLEAR types (e.g. String -> []const u8) may differ
      # from the actual Zig/C types (e.g. [*:0]const u8), breaking implicit coercions.
      sig = @fn_sigs&.dig(node.name) || @fn_sigs&.dig(node.name.to_sym) || @fn_sigs&.dig(node.name.to_s)
-      sig_params = (sig&.params || sig&.dig(:params) || []).reject { |p| p[:comptime] }
+      sig_params = (sig&.params || sig&.dig(:params) || []).reject { |p| p.comptime }
      arg_field_types = if sig&.module_alias
        nil
      else
        types = sig_params.each_with_index.map do |p, i|
          next nil unless i < runtime_ast_args.length
-          pt = p[:type]
+          pt = p.type
          pt.is_a?(Type) ? pt.zig_type(is_param: true) : (Type::ZIG_TYPE_MAP[pt] || nil)
        end
        types.empty? || types.all?(&:nil?) ?
nil : types @@ -2181,7 +2179,7 @@ def extern_call_args_zig(argc, alloc_kind) sig { params(id: Integer, prefix: String, args_tuple_name: String, frame_name: String, arg_codes: T::Array[T.untyped], arg_field_types: NilClass, arg_tuple: String, alloc_kind: T.nilable(Symbol), return_type: Type, call_zig: String, receiver_field: T.nilable(String)).returns(MIR::InlineZig) } def build_extern_trampoline_common(id:, prefix:, args_tuple_name:, frame_name:, arg_codes:, arg_field_types:, arg_tuple:, alloc_kind:, return_type:, call_zig:, receiver_field:) - ret_t = return_type.is_a?(Type) ? return_type : Type.new(return_type || :Void) + ret_t = return_type can_fail = ret_t.error_union? payload_t = can_fail ? ret_t.payload_type : ret_t returns_void = payload_t.resolved == :Void @@ -2260,16 +2258,15 @@ def lower_lambda(node) params_list = sig.params || [] params_mir = [MIR::Param.new("_rt", "*Runtime", false)] + params_list.map { |p| - p_type = p[:type] + p_type = p.type type_str = p_type.is_a?(Type) ? p_type.zig_type(is_param: true) : transpile_type(p_type || :Any, is_param: true) pt_obj = p_type.is_a?(Type) ? p_type : (Type.new(p_type) rescue nil) pp = !!(pt_obj && (pt_obj.respond_to?(:needs_pointer_passing?) && pt_obj.needs_pointer_passing? || - (p[:mutable] && pt_obj.respond_to?(:list_collection?) && pt_obj.list_collection?))) - MIR::Param.new(p[:name], type_str, pp) + (p.mutable && pt_obj.respond_to?(:list_collection?) && pt_obj.list_collection?))) + MIR::Param.new(p.name, type_str, pp) } - ret = sig.return_type || :Void - ret_zig = ret.is_a?(Type) ? 
ret.zig_type : transpile_type(ret) + ret_zig = sig.return_type.zig_type ret_str = if ret_zig.start_with?("!") || ret_zig.include?("anyerror!") || ret_zig.include?("error{") ret_zig else @@ -2279,7 +2276,7 @@ def lower_lambda(node) # Build body: suppressions + return expr body_mir = [] body_mir << MIR::Suppress.new("_rt") - params_list.each { |p| body_mir << MIR::Suppress.new(p[:name]) } + params_list.each { |p| body_mir << MIR::Suppress.new(p.name) } body_mir << MIR::ReturnStmt.new(lower(node.body)) fn_def = MIR::FnDef.new(fn_name, params_mir, ret_str, body_mir, nil, false, nil) @@ -2301,7 +2298,7 @@ def lower_lambda(node) sig { params(node: AST::ListLit).returns(T.untyped) } def lower_list_lit(node) - ti = node.coerced_type_info || node.type_info || Type.new(node.full_type || :Any) + ti = node.coerced_type_info || node.type_info # Bounded stream: ~T[N] - emit BoundedStream struct with Promise items if ti.respond_to?(:bounded_stream?) && ti.bounded_stream? @@ -2384,7 +2381,7 @@ def lower_list_lit(node) sig { params(node: AST::HashLit).returns(T.untyped) } def lower_hash_lit(node) # HashMaps are always heap-allocated - ti = node.coerced_type_info || node.type_info || Type.new(node.full_type || :Any) + ti = node.coerced_type_info || node.type_info rt_name = @rt_name alloc_str = "#{rt_name}.heapAlloc()" @@ -2514,7 +2511,7 @@ def with_cap_var_name(var_node) # only works for non-Arc parameters. sig { params(var_node: AST::Identifier).returns(T::Array[T.nilable(Symbol)]) } def with_cap_sync_storage(var_node) - if var_node.is_a?(AST::GetField) && var_node.full_type.is_a?(Type) + if var_node.is_a?(AST::GetField) && var_node.full_type ft = var_node.full_type sync = ft.sync storage = case ft.ownership @@ -3368,7 +3365,9 @@ def emit_snapshot_mutable_call(node, with_label) # *Arc, and Arc by value (the BG-capture # case). Mirrors the read-mode SNAPSHOT path. source_unwrap = with_match_unwrap_value(T.must(source_zig)) - st = cap[:resolved_type].is_a?(Type) ? 
cap[:resolved_type] : Type.new(cap[:resolved_type]) + # cap[:resolved_type] sole producer is var_node.full_type + # (Type|nil via the full_type seam; never a Symbol). + st = cap[:resolved_type] || Type.new(:Any) bare_t_zig = st.bare_data_type.zig_type # AtomicPtr commits surface AtomicConflict; Versioned commits surface # MvccConflict. @@ -3730,7 +3729,7 @@ def lower_do_block(node) elsif code.strip.end_with?(";") code elsif code.strip.end_with?("}") - expr_type = expr.full_type || :Void + expr_type = expr.full_type is_void_expr = expr_type.nil? || expr_type == :Void || (expr_type.respond_to?(:to_s) && Type.new(expr_type).zig_type == "void") is_void_expr = false if mir.is_a?(MIR::BgBlock) @@ -3787,7 +3786,7 @@ def lower_bg_block(node) @bg_block_counter = (@bg_block_counter || 0) + 1 id = @bg_block_counter - 1 - tense_t = Type.new(node.full_type || :"~Void") + tense_t = Type.new(node.full_type) inner_t = Type.new(tense_t.tense_type) inner_zig = inner_t.zig_type promise_zig = tense_t.zig_type @@ -3907,7 +3906,7 @@ def lower_bg_block(node) elsif code.strip.end_with?(";") || code.strip.end_with?("}") code else - expr_type = step[:expr].full_type || :Void + expr_type = step[:expr].full_type is_void_step = expr_type.nil? || expr_type == :Void || (expr_type.respond_to?(:to_s) && Type.new(expr_type).zig_type == "void") is_void_step ? "#{code};" : "_ = #{code};" end @@ -4114,7 +4113,7 @@ def lower_bg_stream_block(node) @stream_gen_counter = (@stream_gen_counter || 0) + 1 id = @stream_gen_counter - 1 - tense_t = Type.new(node.full_type || :"~?Void[]") + tense_t = Type.new(node.full_type) is_inf = tense_t.inf_stream? stream_zig = tense_t.zig_type @@ -4271,7 +4270,7 @@ def lower_yield(node) sig { params(node: AST::NextExpr, alloc_sym: Symbol).returns(T.untyped) } def lower_next_expr(node, alloc_sym = :frame) - promise_type = Type.new(node.expr.full_type || :Void) + promise_type = Type.new(node.expr.full_type) if promise_type.promise_list? 
# NEXT on ~T[]@list: iterate the promise list, await each promise, collect results. @@ -4329,9 +4328,9 @@ def lower_static_call(node) # bc:true. Both backends consume the same node: Zig emits via # emit_inline_bc_as_zig (substituting {0}, {1}, ... from stdlib_def[:zig]), # BC dispatches by op symbol in compile_inline_bc. - if node.matched_stdlib_def&.dig(:bc) + if node.matched_stdlib_def&.emit&.bc mir_args = node.args.map { |a| hoist_alloc(lower(a), a) } - return MIR::InlineBc.new(node.matched_stdlib_def[:bc_op], + return MIR::InlineBc.new(node.matched_stdlib_def.emit&.bc_op, mir_args, node.matched_stdlib_def) end @@ -5012,11 +5011,7 @@ def lower_smooth(node) # COLLECT only needs to call .next() to read the final value. if rhs.is_a?(AST::CollectOp) left = lower(node.left) - ft = if node.left.full_type - node.left.full_type.is_a?(Type) ? node.left.full_type : Type.new(node.left.full_type) - else - nil - end + ft = node.left.full_type # Collection observable (DISTINCT producing `~T[]@set:observable`): # COLLECT must yield an owned ArrayList(T), not the StreamSetSnapshot # that `next()` returns. Mirrors lower_next_expr's collection branch @@ -5101,7 +5096,7 @@ def lower_smooth(node) elsif rhs.is_a?(AST::FuncCall) # x |> f(y) -> f(x, y) synthetic = AST::FuncCall.new(rhs.token, rhs.name, [node.left] + rhs.args) - synthetic.full_type = node.full_type || rhs.full_type + synthetic.full_type = node.full_type synthetic.storage = node.storage synthetic.zig_pattern = rhs.zig_pattern if rhs.zig_pattern synthetic.coerced_type = rhs.coerced_type if rhs.coerced_type @@ -5124,7 +5119,7 @@ def lower_smooth(node) sig { params(node: AST::BinaryOp).returns(T.untyped) } def lower_or_rescue(node) - t_left = node.left.full_type ? Type.new(node.left.full_type) : nil + t_left = Type.new(node.left.full_type) # CLEAR's auto-propagate strips `!T` from a fallible call's # full_type (so `x = call()` is x: T at the binding level). The # original `!T` is stashed on `error_union_type`. 
OR-RESCUE needs @@ -5370,7 +5365,9 @@ def lower_get_index(node) # `IF pool[id] AS env`. elem_t = (ti.is_a?(Type) ? ti : Type.new(ti)).element_type elem_name = elem_t.respond_to?(:resolved) ? T.must(elem_t).resolved.to_s : elem_t.to_s - pool_get_def = POOL_METHODS["get"].merge(elem: elem_name) + pool_get_def = IntrinsicRegistry.sig(POOL_METHODS, "get").dup + pool_get_def.emit = (pool_get_def.emit ? pool_get_def.emit.dup : IntrinsicEmit.new) + pool_get_def.emit.elem = elem_name return MIR::InlineBc.new(:get, [target, index], pool_get_def) elsif ti&.set_collection? # @set[item]: membership check — returns ?T (item if present, null otherwise) @@ -5408,7 +5405,7 @@ def lower_struct_lit(node) # err_cleanup: struct owns its fields on success; only clean up on error. hoist_alloc(lower(v), v, err_cleanup: true) end - vt = v.type_info.is_a?(Type) ? v.type_info : nil + vt = v.type_info needs_items = vt&.list_collection? && !v.is_a?(AST::CopyNode) && !(v.respond_to?(:target_is_list_field) && v.target_is_list_field) # BORROWED fields: source may be ArrayList but field expects slice @@ -5423,7 +5420,7 @@ def lower_struct_lit(node) # @indirect field: hoist HeapCreate to a named temp so it is a Let-init, # not an anonymous sub-expression (INV-H). if v.needs_heap_create - zig_t = v.type_info ? 
transpile_type(v.type_info.resolved.to_s) : "UNKNOWN" + zig_t = transpile_type(v.type_info.resolved.to_s) @block_expr_counter += 1 temp = "__ind_#{@block_expr_counter}_#{k}" hc = MIR::HeapCreate.new(zig_t, val, :heap, "blk_#{k}") @@ -5560,7 +5557,7 @@ def lower_block_expr(node) def lower_range_lit(node) s = lower(node.start) e = lower(node.finish) - elem_type = node.type_info&.tense_type&.element_type&.resolved + elem_type = node.type_info.tense_type&.element_type&.resolved if node.inclusive MIR::RangeLit.new(s, MIR::BinOp.new("+", e, MIR::Lit.new("1")), elem_type) else @@ -5587,7 +5584,7 @@ def lower_slice(node) MIR::BinOp.new("+", MIR::Cast.new(end_expr, "usize", :intCast), MIR::Lit.new("1")) end - elem_zig = node.target.type_info&.element_type ? Type.new(node.target.type_info.element_type).zig_type : "u8" + elem_zig = node.target.type_info.element_type ? Type.new(node.target.type_info.element_type).zig_type : "u8" MIR::SliceExpr.new(target, start_cast, end_cast, elem_zig) end @@ -5925,7 +5922,7 @@ def tied_shared_family_return_param(node, mutable_scalar_params) return nil unless ret.is_a?(Type) && ret.polymorphic_shared? return nil unless ret.resolved.to_s.match?(/\A[A-Z]\z/) params = (node.params || []).select do |p| - pt = p[:type].is_a?(Type) ? p[:type] : Type.new(p[:type] || :Any) + pt = p.type || Type.new(:Any) pt.shared? && pt.resolved == ret.resolved end return nil unless params.size == 1 @@ -5992,7 +5989,7 @@ def compose_capability_wrap(inner_mir, bare_zig_t, ft, alloc) sig { params(node: AST::VarDecl).returns(T.untyped) } def lower_var_decl(node) is_mutable = node.respond_to?(:mutable) && node.mutable - ft = Type.new(node.full_type || :Void) + ft = Type.new(node.full_type) is_mutable ||= ft.dynamic_stream? || ft.bounded_stream? || ft.shared_promise? || ft.open_stream? || ft.inf_stream? is_mutable ||= ft.collection? is_mutable ||= ft.any_sync? 
@@ -6077,7 +6074,6 @@ def lower_var_decl(node) lower(node.value) elsif (rhs_unwrapped.is_a?(AST::CopyNode) || rhs_unwrapped.is_a?(AST::CloneNode)) && rhs_unwrapped.value.respond_to?(:type_info) && - rhs_unwrapped.value.type_info.is_a?(Type) && rhs_unwrapped.value.type_info.list_collection? # `let dest: T[]@list = COPY src;` where src is also @list. # The default lower_copy path returns a slice (via :list_shallow + @@ -6172,7 +6168,7 @@ def lower_var_decl(node) # use per-declaration storage. This avoids using a stale alloc from a same-named # heap variable in a different scope. All other cleanup allocs (:frame, :cleanup, # :heap on heap-backing types) are preserved verbatim from cleanup_bindings. - ft = node.type_info ? (Type.new(node.type_info) rescue nil) : nil + ft = Type.new(node.type_info) node_alloc = if mir_allocates?(init) :heap elsif node.respond_to?(:storage) && node.storage == :heap @@ -6210,8 +6206,8 @@ def owned_return_transfer_binding?(binding_entry, init) end if init.is_a?(MIR::InlineZig) || init.is_a?(MIR::RawZig) - return false unless init.stdlib_def&.dig(:allocates) - return true if init.stdlib_def[:return_alloc] == :heap + return false unless init.stdlib_def&.emit&.allocates + return true if init.stdlib_def.emit&.return_alloc == :heap return false unless init.is_a?(MIR::InlineZig) allocs = init.allocs @@ -7228,7 +7224,7 @@ def lower_return(node) ], T::Array[T.untyped]) MIR::ScopeBlock.new(stmts) elsif needs_string_dupe && value - ret_type = node.value.full_type ? Type.new(node.value.full_type) : nil + ret_type = Type.new(node.value.full_type) if ret_type&.string? 
MIR::ScopeBlock.new([ MIR::AllocMark.new("__ret_dupe", :heap, nil), @@ -7291,7 +7287,7 @@ def universal_poly_arg_needs_addr?(arg_node, callee_sig, idx) return false unless callee_sig.requires param = callee_sig.params[idx] return false unless param - pname = (param[:name] || param["name"]).to_s + pname = param.name.to_s fams = callee_sig.requires[pname] # Universal poly: REQUIRES key present AND the family-set is empty. return false unless fams && fams.empty? @@ -7395,12 +7391,12 @@ def collect_identifier_names(nodes) # with stdlib_def attached so the MIR checker can verify ownership. sig { params(name: Symbol, args: T::Array[T.untyped]).returns(T.any(MIR::InlineBc, MIR::InlineZig)) } def emit_builtin(name, args) - entry = BUILTIN_OPS[name] + entry = IntrinsicRegistry.sig(BUILTIN_OPS, name) raise "emit_builtin: unknown builtin :#{name}" unless entry - if @target == :bc && entry[:bc] + if @target == :bc && entry.emit&.bc return MIR::InlineBc.new(name, args, entry) end - pattern = entry[:zig].dup + pattern = entry.emit&.zig.to_s.dup # Use block form of gsub so backslashes in Zig code (e.g. "\\" for a literal # backslash) are not interpreted as replacement specials by String#gsub. args.each_with_index { |a, i| code = emit_expr(a); pattern = pattern.gsub("{#{i}}") { code } } diff --git a/src/mir/mir_pass.rb b/src/mir/mir_pass.rb index 4a46f12e8..976e55fec 100644 --- a/src/mir/mir_pass.rb +++ b/src/mir/mir_pass.rb @@ -551,7 +551,7 @@ def bg_exit_needs_string_dupe?(expr, t) return true if t.frame? # No explicit provenance: check the stdlib def for frame allocation. msd = expr.matched_stdlib_def - msd.is_a?(Hash) && msd[:return_alloc] == :frame + !!(msd && msd.emit&.return_alloc == :frame) end # Annotate YieldExpr nodes inside a BgStreamBlock that yield frame-allocated strings. 
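The hunks above repeatedly migrate stdlib-def access from Hash lookups (`msd[:return_alloc]`, `entry[:bc]`) to typed struct readers reached through `&.`, with `!!` collapsing the resulting `nil`s into a strict Boolean. A minimal sketch of that idiom follows; `Emit`, `Sig`, and `frame_alloc?` are hypothetical stand-ins, not the project's real `IntrinsicEmit` / `FunctionSignature`:

```ruby
# Hypothetical stand-ins for the real stdlib-def structs (illustrative only).
Emit = Struct.new(:return_alloc, keyword_init: true)
Sig  = Struct.new(:emit, keyword_init: true)

def frame_alloc?(msd)
  # `msd&.emit&.return_alloc` is nil when msd is nil, when the def carries
  # no emit metadata, or when return_alloc is unset; `!!` collapses every
  # one of those nils to false, so the predicate returns a strict Boolean
  # (what a Sorbet T::Boolean return signature requires).
  !!(msd && msd.emit&.return_alloc == :frame)
end

p frame_alloc?(nil)                                           # => false
p frame_alloc?(Sig.new(emit: nil))                            # => false
p frame_alloc?(Sig.new(emit: Emit.new(return_alloc: :heap)))  # => false
p frame_alloc?(Sig.new(emit: Emit.new(return_alloc: :frame))) # => true
```

Without the `!!`, the first three cases would return `nil` or `false` inconsistently; normalizing at the predicate boundary keeps callers honest.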
diff --git a/src/mir/pre_mir_type_check.rb b/src/mir/pre_mir_type_check.rb
new file mode 100644
index 000000000..88ea360d0
--- /dev/null
+++ b/src/mir/pre_mir_type_check.rb
@@ -0,0 +1,81 @@
+# typed: false
+# pre_mir_type_check.rb -- AST→MIR boundary invariant.
+#
+# Every evaluatable AST node (anything that includes the typed-node
+# mixin, i.e. responds to :full_type) MUST have a resolved type by the
+# time annotation is done -- Void for statements / void expressions,
+# a concrete Type for everything else. A nil full_type reaching MIR is
+# the type system failing to stamp the node: a COMPILER bug, never a
+# user syntax error.
+#
+# Structural / declaration nodes (StructDef, EnumDef, UnionDef,
+# RequireNode, Extern*) do not evaluate to a value and do not include
+# the typed-node mixin, so they are excluded automatically: the mixin
+# presence IS the "this node should have a type" classifier.
+#
+# Runs once, right before MIRPass. With PREMIR_SURVEY=1 it collects and
+# prints the distinct offending node classes instead of raising (used
+# to inventory annotator holes); default behavior raises the ICE.
+module PreMirTypeCheck
+  class InternalTypeResolutionError < StandardError; end
+
+  LEAVES = [Symbol, String, Numeric, TrueClass, FalseClass, NilClass].freeze
+
+  module_function
+
+  def verify!(program)
+    violations = []
+    walk(program, violations, {})
+    return if violations.empty?
+
+    if ENV["PREMIR_SURVEY"] == "1"
+      by_class = violations.group_by { |v| v[:cls] }
+                           .transform_values(&:size)
+                           .sort_by { |_, n| -n }
+      warn "[pre-mir-survey] #{violations.size} untyped node(s):"
+      by_class.each { |c, n| warn format("  %5d  %s", n, c) }
+      return
+    end
+
+    sample = violations.first(30)
+                       .map { |v| "  - #{v[:cls]} @ #{v[:loc]}" }
+                       .join("\n")
+    more = violations.size > 30 ? "\n  ... (+#{violations.size - 30} more)" : ""
+    raise InternalTypeResolutionError, <<~MSG
+      Internal Compiler Error: #{violations.size} AST node(s) reached MIR
+      lowering without a resolved type (full_type == nil). Every
+      evaluatable node must be typed (Void for statements / void
+      expressions) by the end of annotation.
+
+      This is a bug in the CLEAR compiler, not an error in your
+      program. Sorry for the inconvenience -- please report it.
+
+      #{sample}#{more}
+    MSG
+  end
+
+  # Generic structural recursion: AST nodes are Structs (each_pair),
+  # bodies are Arrays, some carry Hashes. Type / Token and scalars are
+  # leaves. object_id memo guards shared-reference cycles.
+  def walk(node, violations, seen)
+    return if node.nil? || LEAVES.any? { |k| node.is_a?(k) }
+    return if defined?(Type) && node.is_a?(Type)
+    oid = node.object_id
+    return if seen[oid]
+    seen[oid] = true
+
+    if node.respond_to?(:full_type) && node.full_type.nil?
+      tok = node.respond_to?(:token) ? node.token : nil
+      loc = tok && tok.respond_to?(:line) ? "#{tok.line}:#{tok.column}" : "?"
+      violations << { cls: node.class.name.to_s.split("::").last, loc: loc }
+    end
+
+    if node.is_a?(Array)
+      node.each { |c| walk(c, violations, seen) }
+    elsif node.is_a?(Hash)
+      node.each_value { |v| walk(v, violations, seen) }
+    elsif node.respond_to?(:each_pair) # Struct AST node
+      node.each_pair { |_, v| walk(v, violations, seen) }
+    end
+  end
+end
diff --git a/src/mir/promotion_plan.rb b/src/mir/promotion_plan.rb
index 967a4f19d..89cdae672 100644
--- a/src/mir/promotion_plan.rb
+++ b/src/mir/promotion_plan.rb
@@ -320,8 +320,6 @@ def self.stamp_field_pre_cleanups!(body, bindings, schema_lookup: nil)
       target_node = stmt.name.target
       field_ti = stmt.name.type_info rescue nil
-      field_ti = Type.new(field_ti) if field_ti && !field_ti.is_a?(Type)
-      field_ti = nil unless field_ti.is_a?(Type)
       # Auto-lock string fields: locked/always_mutable structs heap-dupe
       # string fields, so overwriting needs explicit free of the old value.
@@ -474,9 +472,9 @@ def self.stamp_field_pre_cleanups!(body, bindings, schema_lookup: nil)
   # *T; cleanup must NOT re-apply &.
   sig { params(fn_node: AST::FunctionDef, schema_lookup: Proc, bindings: T::Hash[String, T::Hash[Symbol, T.untyped]]).returns(T.nilable(T::Array[T::Hash[Symbol, T.untyped]])) }
   private_class_method def self.walk_takes_params(fn_node, schema_lookup, bindings)
-    (fn_node.params || []).select { |p| p[:takes] }.each do |p|
-      ti = p[:type].is_a?(Type) ? p[:type] : Type.new(p[:type] || :Any)
-      name = p[:name].to_s
+    (fn_node.params || []).select { |p| p.takes }.each do |p|
+      ti = p.type || Type.new(:Any)
+      name = p.name.to_s
       base = takes_param_base_entry(ti, schema_lookup)
       next unless base
@@ -598,7 +596,6 @@ def self.stamp_field_pre_cleanups!(body, bindings, schema_lookup: nil)
       ti = nil unless ti.is_a?(Type)
       inner_ti = ti&.wrapped_type
       next unless inner_ti
-      inner_ti = Type.new(inner_ti) unless inner_ti.is_a?(Type)
       e = classify_binding(node.binding_name.to_s, inner_ti, node, promoted_fns, schema_lookup)
       next unless e
       e[:zig_type] ||= (Type.new(inner_ti.resolved).zig_type rescue inner_ti.resolved.to_s)
@@ -623,7 +620,6 @@ def self.stamp_field_pre_cleanups!(body, bindings, schema_lookup: nil)
       ti = nil unless ti.is_a?(Type)
       inner_ti = ti&.wrapped_type
       next unless inner_ti
-      inner_ti = Type.new(inner_ti) unless inner_ti.is_a?(Type)
       e = classify_binding(b[:name].to_s, inner_ti, node, promoted_fns, schema_lookup)
       next unless e
       e[:zig_type] ||= (Type.new(inner_ti.resolved).zig_type rescue inner_ti.resolved.to_s)
diff --git a/src/tools/method_rewriter.rb b/src/tools/method_rewriter.rb
index c836e2a88..425ea7324 100644
--- a/src/tools/method_rewriter.rb
+++ b/src/tools/method_rewriter.rb
@@ -1,4 +1,6 @@
 # typed: strict
+require "sorbet-runtime"
+
 require 'set'
 require_relative '../ast/lexer'
 require_relative '../ast/parser'
@@ -20,8 +22,11 @@
 # Nested METHOD calls (`length(filter(xs, p))` with both METHODs)
 # rewrite inside-out to method chains (`xs.filter(p).length()`).
 module MethodRewriter
+  extend T::Sig
+
   module_function
+  sig { params(source: String).returns(String) }
   def rewrite(source)
     tokens = ::Lexer.new(source).tokenize
     ast = ::Parser.new(tokens, source).parse
@@ -45,6 +50,7 @@ def rewrite(source)
   # User declarations always take precedence over stdlib — if the
   # user wrote `FN length(xs) -> ...`, calls to `length(xs)` stay in
   # prefix form regardless of stdlib's flag.
+  sig { params(ast: AST::Program).returns(Set) }
   def collect_method_names(ast)
     user_methods = Set.new
     user_fns = Set.new
@@ -85,17 +91,18 @@ def walk_collect_user_decls(node, methods, fns)
     -> { MAP_METHODS rescue nil },
   ].freeze

+  sig { returns(Set) }
   def stdlib_method_names
     @stdlib_method_names ||= begin
       names = Set.new
       STDLIB_REGISTRIES.each do |loader|
         registry = loader.call
         next unless registry.is_a?(Hash)
-        registry.each do |name, defs|
+        IntrinsicRegistry.sigs(registry).each do |name, defs|
           list = defs.is_a?(Array) ? defs : [defs]
           list.each do |d|
-            next unless d.is_a?(Hash)
-            next unless d[:is_method]
+            next unless d.is_a?(FunctionSignature)
+            next unless d.emit&.is_method
             # Skip stdlib functions whose Zig lowering is FSM-based
             # (suspending I/O calls like readFile / writeFile / accept).
             # Their MIR/FSM lowering reads the call's positional args
@@ -119,9 +126,12 @@ def stdlib_method_names
   # yields, and the `fsm_*` keys carry the templates the FSM emitter
   # reads. Either alone wouldn't be enough — `suspends: true` is also
   # set on plain async helpers that don't go through FSM.
+  sig { params(defn: FunctionSignature).returns(T::Boolean) }
   def fsm_lowered?(defn)
-    return false unless defn[:suspends]
-    defn.keys.any? { |k| k.to_s.start_with?("fsm_") }
+    em = defn.emit
+    return false unless em&.suspends
+    !!(em.fsm_setup || em.fsm_state_decls || em.fsm_finish_block ||
+       em.fsm_state_finalize || em.fsm_finish_value)
   end

   # Post-order walk: collect edits for inner calls first so outer
@@ -156,6 +166,7 @@ def walk_collect_edits(node, methods, source, edits)
   # (e.g., contains a comment we'd rather not move). Source span is
   # the byte range from the start of the callee name to the closing
   # `)`, inclusive.
+  sig { params(call: AST::FuncCall, source: String).returns(T.nilable(Hash)) }
   def compute_edit(call, source)
     start_off = offset_for(source, call.token.line, call.token.column)
     return nil unless start_off
@@ -242,6 +253,7 @@ def needs_parens?(node, text)

   # ---- Source / span helpers ----

+  sig { params(source: String, line: Integer, col: Integer).returns(Integer) }
   def offset_for(source, line, col)
     return nil if line < 1 || col < 1
     off = 0
@@ -257,6 +269,7 @@ def offset_for(source, line, col)
     target
   end

+  sig { params(source: String, off: Integer).returns(Integer) }
   def next_non_ws(source, off)
     while off < source.length && (source[off] == ' ' || source[off] == "\t")
       off += 1
@@ -267,6 +280,7 @@ def next_non_ws(source, off)
   # Find matching ')' for '(' at `open_off`, respecting nested parens,
   # brackets, braces, and string literals. Returns the byte offset of
   # the matching ')' or nil if unbalanced.
+  sig { params(source: String, open_off: Integer).returns(Integer) }
   def match_paren(source, open_off)
     depth = 0
     i = open_off
@@ -313,6 +327,7 @@ def match_paren(source, open_off)

   # Split args_text into [start, end_exclusive] spans by top-level
   # commas. Respects nested parens / brackets / braces and strings.
+  sig { params(args_text: String).returns(Array) }
   def split_args_by_comma(args_text)
     spans = []
     depth = 0
@@ -368,6 +383,7 @@ def split_args_by_comma(args_text)
   # produces inside-out edits which are nested (overlapping). To get
   # the chain rewrite (`xs.filter(p).length()`) we apply the inner
   # edit first to the *replacement string* of the outer edit.
+  sig { params(source: String, edits: Array).returns(String) }
   def apply_edits(source, edits)
     # Post-order has inner edits first. Group: an inner edit is one
     # whose span is strictly inside an outer edit's span. Process by
@@ -385,6 +401,7 @@ def apply_edits(source, edits)
   # the inner's original source text). Returns a flat list of
   # non-overlapping outer edits with replacements that include all
   # inner rewrites embedded.
+  sig { params(edits: Array, source: String).returns(Array) }
   def resolve_nested_edits(edits, source)
     outers = []
     edits.each do |e|
@@ -404,6 +421,7 @@ def resolve_nested_edits(edits, source)
     outers
   end

+  sig { params(source: String, edits: Array).returns(String) }
   def apply_flat_edits(source, edits)
     return source if edits.empty?
     # Apply right-to-left so unaffected positions remain valid.
diff --git a/tools/bc_lower_coverage.rb b/tools/bc_lower_coverage.rb
new file mode 100644
index 000000000..7fe06ca56
--- /dev/null
+++ b/tools/bc_lower_coverage.rb
@@ -0,0 +1,60 @@
+#! /usr/bin/env ruby
+# Drive src/ branch coverage of the `@target == :bc` lowering arms by
+# re-lowering the EXISTING corpus with target: :bc. Zero new programs.
+#
+# Feasibility: the `@target == :bc` branches in mir_lowering fire during
+# MIRLowering#lower_program (Ruby), which runs BEFORE the bytecode VM.
+# The MiniVM (_bc_runner) is incomplete, but that is irrelevant here --
+# we never execute, never even require BcEmitter to succeed. A program
+# that hits `raise Unimplemented` inside a :bc arm still EXECUTED that
+# arm (coverage is recorded up to the raise). So every per-file failure
+# is rescued and counted as "lowering attempted".
+#
+# Usage:
+#   COVERAGE=1 ruby tools/bc_lower_coverage.rb
+#   bundle exec ruby spec/collate_coverage.rb
+#   ruby tools/branch_gap_triage.rb

+require 'bundler/setup'
+require_relative '../spec/coverage_bootstrap'
+CoverageBootstrap.start('bc-lower')

+require_relative '../src/backends/transpiler'

+ROOT = File.expand_path('..', __dir__)
+files = (
+  Dir.glob(File.join(ROOT, 'transpile-tests', '**', '*.cht')) +
+  Dir.glob(File.join(ROOT, '{examples,benchmarks}', '**', '*.cht')) +
+  Dir.glob(File.join(ROOT, 'transpile-tests', 'fuzz', '*.cht'))
+).reject { |f| File.basename(f).start_with?('._') }
+ .uniq.sort

+lowered = 0
+raised = 0
+files.each do |path|
+  dir = File.dirname(path)
+  begin
+    imp = ModuleImporter.new(base_dir: dir, use_mir: true)
+    fe = CompilerFrontend.compile(File.read(path), importer: imp, source_dir: dir)
+    lo = MIRLowering.new(
+      struct_schemas: fe.struct_schemas,
+      enum_schemas: fe.enum_schemas,
+      union_schemas: fe.union_schemas,
+      fn_sigs: fe.fn_sigs,
+      moved_guard_info: fe.moved_guard_info,
+      importer: imp,
+      source_dir: dir,
+      target: :bc
+    )
+    lo.lower_program(fe.ast)
+    lowered += 1
+  rescue StandardError, ScriptError
+    # A raise inside a :bc arm still covered that arm -- that is the
+    # point. Count and continue; do not let the incomplete VM / a
+    # bc-Unimplemented stop the batch.
+    raised += 1
+  end
+end

+puts "bc-lower coverage: #{lowered} lowered cleanly, #{raised} raised " \
+     "(still covered up to the raise), #{files.size} total"
diff --git a/tools/branch_gap_triage.rb b/tools/branch_gap_triage.rb
index 784e96ee9..fdbbd56eb 100644
--- a/tools/branch_gap_triage.rb
+++ b/tools/branch_gap_triage.rb
@@ -1,26 +1,33 @@
 #! /usr/bin/env ruby
-# Branch-gap TRIAGE: collapse never-taken arms to their enclosing method.
+# Branch-gap TRIAGE + MODALITY BUCKETING.
 #
-# You do not triage 955 branches. You triage the ~N methods that contain
-# them. A never-taken arm's fill-modality is a property of the DECISION
-# the enclosing method makes, not of the arm:
-#
-# - a dark arm in an escape / frame / cleanup / move decision is a
-#   latent UAF / double-free / leak, reachable only by a VALID program
-#   of some shape the corpus never wrote -> fuzz template axis (scales
-#   combinatorially; one axis value covers a whole arm family) + mutant.
-# - a dark arm in a diagnostic / error builder is reachable only by an
-#   INVALID program. Fuzz emits valid self-checking programs by
-#   construction, so fuzz can NEVER reach it -> negative unit spec
-#   (deterministic, one per error cluster).
-# - a dark arm guarding an impossible / defensive case (raise
-#   unreachable, exhaustive-when else) -> accept + annotate. No test.
-# - whole-program integration .cht almost never fills a branch gap:
-#   92 real programs moved this set 50/1005 arms. Not the tool.
+# You do not triage N branches. You triage the ~M methods that contain
+# them, and for each dark arm you decide WHICH testing modality can
+# reach it. A never-taken arm is exactly one of four things:
+#
+#   fuzz-axis         reachable by a VALID program of an unseen shape
+#                     (case-on-AST dispatch, &&/|| clause gap, a live
+#                     if/while body). One fuzz template axis covers a
+#                     whole arm family combinatorially. + a mutant.
+#   negative-spec     the arm raises / diagnoses -> reachable only by an
+#                     INVALID program. Fuzz emits valid self-checking
+#                     programs by construction and can NEVER reach it.
+#                     One deterministic negative unit spec per cluster.
+#   ffi-integration   the arm is in the extern/require/module boundary
+#                     -> needs a real external artifact a fuzzer cannot
+#                     synthesize. A handful of targeted .cht. (Whole-
+#                     program .cht is otherwise the WRONG lever: 92 real
+#                     programs moved this set 50/1005 arms.)
+#   accept-defensive  an effect-free else / impossible guard -> annotate
+#                     and remove from the denominator. No test. (Human
+#                     confirms; this bucket is proposed, not decided.)
# -# This script does the collapse and the ranking. It does NOT classify by -# regexing arm source lines (that is the fake-value grep) — it groups by -# enclosing `def` so a human reads the DECISION, then assigns modality. +# Classification is AST-STRUCTURAL, never a regex over the arm's source +# line (that is the fake-value grep): the SimpleCov parent tuple gives +# the decision kind, and the arm's (line,col)-(line,col) span is +# matched to an AST node whose subtree is then inspected for raise / +# FFI. The two PER-PROJECT LEXICON constants below are the only +# project-specific knobs (generalizable: swap them per codebase). # # Usage: ruby tools/branch_gap_triage.rb [src/file.rb ...] @@ -34,6 +41,18 @@ src/mir/mir_lowering.rb ].freeze +# --- PER-PROJECT LEXICON (the only project-specific config) --- +# Methods that ARE the FFI / package boundary: a dark arm here needs a +# real external module + oracle no fuzzer can synthesize. +FFI_BOUNDARY = %w[ + build_extern_trampoline_call build_extern_trampoline_method + build_extern_trampoline_common lower_extern_direct_call + lower_require lower_module +].freeze +# Message names that mean "this arm is an error/diagnostic path" +# (reachable only by an invalid program). +DIAGNOSTIC_MIDS = %i[raise fail abort].freeze + abort "no #{RESULTSET}" unless File.exist?(RESULTSET) merged = Hash.new @@ -68,24 +87,158 @@ def method_index(lines) idx end +# All AST nodes of a file, for span -> node resolution. +def ast_nodes(abspath) + root = RubyVM::AbstractSyntaxTree.parse(File.read(abspath), + keep_script_lines: true) + acc = [] + walk = lambda do |n| + return unless n.is_a?(RubyVM::AbstractSyntaxTree::Node) + + acc << n + n.children.each { |c| walk.call(c) } + end + walk.call(root) + acc +rescue SyntaxError, StandardError + [] +end + +# Smallest AST node whose span covers the arm span (sl,sc)-(el,ec); +# prefers an exact match. nil if none (then we fall back to the +# decision kind alone, flagged low-confidence). 
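The span matching rests on two concrete data shapes: SimpleCov's stringified branch tuples (`"[:then, 1, 12, 4, 14, 7]"` -> kind plus `(id, sl, sc, el, ec)`) and the 0-indexed line/column spans `RubyVM::AbstractSyntaxTree` nodes carry. A minimal sketch of both (the toy source string and `parse_arm` helper name are illustrative, not part of the script):

```ruby
# 1) A SimpleCov branch/arm key is the String form of an Array; strip
#    brackets, colons, and whitespace, then split on commas.
def parse_arm(key)
  parts = key.gsub(/[\[\]:\s]/, '').split(',')
  [parts[0].to_sym, *parts[1..].map(&:to_i)]
end

p parse_arm("[:then, 1, 12, 4, 14, 7]") # => [:then, 1, 12, 4, 14, 7]

# 2) Every AST node exposes the matching span fields, so an arm span
#    can be resolved to the smallest covering node.
node = RubyVM::AbstractSyntaxTree.parse("x = 1\n")
span = [node.first_lineno, node.first_column, node.last_lineno, node.last_column]
p span # 1-indexed lines, 0-indexed columns
```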
+def node_for(nodes, sl, sc, el, ec) + span = ->(n) { [n.first_lineno, n.first_column, n.last_lineno, n.last_column] } + exact = nodes.find { |n| span.call(n) == [sl, sc, el, ec] } + return exact if exact + + covering = nodes.select do |n| + a = span.call(n) + (a[0] < sl || (a[0] == sl && a[1] <= sc)) && + (a[2] > el || (a[2] == el && a[3] >= ec)) + end + covering.min_by { |n| (n.last_lineno - n.first_lineno) * 1000 + n.children.size } +end + +def subtree_calls(node) + mids = [] + stack = [node] + until stack.empty? + n = stack.pop + next unless n.is_a?(RubyVM::AbstractSyntaxTree::Node) + + case n.type + when :FCALL, :VCALL then mids << n.children[0] + when :CALL, :OPCALL, :QCALL then mids << n.children[1] + end + n.children.each { |c| stack << c } + end + mids +end + +# accept-defensive is the NARROW residue: an arm that produces no +# observable outcome -- the synthetic implicit `else` SimpleCov still +# counts, an empty body, a bare `nil`. Anything that calls, assigns, +# returns/breaks, or yields a value IS a reachable valid-program +# decision outcome and defaults to fuzz_axis (human triage may later +# demote a genuinely-impossible one; we never auto-accept a reachable +# arm). +def trivial?(node) + return true if node.nil? + return true if node.type == :NIL + return true if node.type == :BEGIN && node.children.compact.empty? + return false if subtree_calls(node).any? + return false if has_type?(node, %i[LASGN IASGN OP_ASGN ATTRASGN MASGN + GASGN CVASGN RETURN NEXT BREAK YIELD]) + + # a bare value (literal / lvar / ivar) IS the branch's outcome -> + # reachable, not inert. + !has_type?(node, %i[LIT STR SYM INTEGER FLOAT LVAR IVAR DVAR CONST + ARRAY HASH TRUE FALSE]) +end + +def has_type?(node, types) + stack = [node] + until stack.empty? 
+ n = stack.pop + next unless n.is_a?(RubyVM::AbstractSyntaxTree::Node) + return true if types.include?(n.type) + + n.children.each { |c| stack << c } + end + false +end + +DISPATCH_KINDS = %i[case when].freeze +CONJ_KINDS = %i[& |].freeze +COND_KINDS = %i[if unless ternary while until for].freeze + +# decision_kind: Symbol from the SimpleCov parent tuple ([:if,...] etc). +# Returns [bucket, confidence]. +def classify(method_name, decision_kind, arm_node) + return [:ffi_integration, :high] if FFI_BOUNDARY.include?(method_name) + + if arm_node && (subtree_calls(arm_node) & DIAGNOSTIC_MIDS).any? + return [:negative_spec, :high] + end + + if DISPATCH_KINDS.include?(decision_kind) || CONJ_KINDS.include?(decision_kind) + return [:fuzz_axis, arm_node ? :high : :low] + end + + if COND_KINDS.include?(decision_kind) + return [:accept_defensive, :med] if trivial?(arm_node) + + return [:fuzz_axis, arm_node ? :high : :low] + end + + [:accept_defensive, :low] +end + +ACTION = { + fuzz_axis: 'fuzz template axis (+ mutant)', + negative_spec: 'negative unit spec (fuzz cannot reach)', + ffi_integration: 'targeted FFI/package .cht', + accept_defensive: 'annotate + accept (human-confirm)' +}.freeze + targets = ARGV.empty? ? DEFAULT_FILES : ARGV +grand = Hash.new(0) + targets.each do |rel| abspath = File.join(ROOT, rel) branches = merged[abspath] next unless branches + lines = File.readlines(abspath) midx = method_index(lines) + nodes = ast_nodes(abspath) by_method = Hash.new { |h, k| h[k] = [] } total_by_method = Hash.new(0) - branches.each do |_p, arms| + bucket_by_method = Hash.new { |h, k| h[k] = Hash.new(0) } + file_bucket = Hash.new(0) + + branches.each do |parent, arms| + pkind = parent.gsub(/[\[\]:\s]/, '').split(',').first.to_s.to_sym arms.each do |arm, count| a = arm.gsub(/[\[\]:]/, '').split(',').map(&:strip) line = a[2].to_i meth, mstart = midx[line] || ['(top-level)', 0] key = [meth, mstart] total_by_method[key] += 1 - by_method[key] << [line, a[0]] if count.to_i.zero? 
+ next unless count.to_i.zero? + + sl = a[2].to_i + sc = a[3].to_i + el = a[4].to_i + ec = a[5].to_i + anode = node_for(nodes, sl, sc, el, ec) + bucket, conf = classify(meth, pkind, anode) + by_method[key] << [line, a[0], bucket, conf] + bucket_by_method[key][bucket] += 1 + file_bucket[bucket] += 1 + grand[bucket] += 1 end end @@ -93,12 +246,24 @@ def method_index(lines) .sort_by { |(_, _), v| -v.size } puts "\n##### #{rel} — #{ranked.size} methods carry dark arms " \ "(#{by_method.values.sum(&:size)} arms)" - puts format(' %-42s %5s %5s %s', 'method', 'dark', 'tot', 'dark-arm lines') + puts ' buckets: ' + file_bucket.sort_by { |_, n| -n } + .map { |b, n| "#{b}=#{n}" }.join(' ') + puts format(' %-40s %4s %4s %-16s %s', + 'method', 'dark', 'tot', 'dominant', 'bucket mix') ranked.each do |(meth, mstart), arms| tot = total_by_method[[meth, mstart]] - ls = arms.map(&:first).uniq.sort - shown = ls.first(12).join(',') - shown += ",+#{ls.size - 12}" if ls.size > 12 - puts format(' %-42s %5d %5d %s', "#{meth}@#{mstart}", arms.size, tot, shown) + mix = bucket_by_method[[meth, mstart]] + dom = mix.max_by { |_, n| n }.first + mixs = mix.sort_by { |_, n| -n }.map { |b, n| "#{b}:#{n}" }.join(' ') + puts format(' %-40s %4d %4d %-16s %s', + "#{meth}@#{mstart}", arms.size, tot, dom, mixs) end end + +puts "\n##### MODALITY WORK PLAN (all targets)" +grand.sort_by { |_, n| -n }.each do |bucket, n| + puts format(' %-18s %5d arms -> %s', bucket, n, ACTION[bucket]) +end +puts "\n Triage order: fuzz_axis (combinatorial, memory-safety) first," \ + " then negative_spec, then ffi_integration; accept_defensive is" \ + " human-confirmed and leaves the denominator." diff --git a/tools/fuzz/templates/binary_op_matrix.rb b/tools/fuzz/templates/binary_op_matrix.rb new file mode 100644 index 000000000..bef6fec4e --- /dev/null +++ b/tools/fuzz/templates/binary_op_matrix.rb @@ -0,0 +1,81 @@ +# Template: binary-operator lowering matrix — ENUMERATED, not sampled. 
+# +# Targets src/mir/mir_lowering.rb#lower_binary_op + #lower_or_rescue. +# The cell set is the dispatch's OWN discriminant set read from the +# source: the string-compare `case node.op` has arms +# {EQ,NEQ,LT,LTE,GT,GTE}; POW (**), MOD, concat (+), OR (OR_RESCUE) +# are the other op branches. Every comparison arm x every operand +# type is emitted -- exhaustive by construction, not a guessed axis. +# +# Surface syntax confirmed from lexer/transpile-tests: +# == != < <= > >= ; **=POW ; MOD ; + (concat) ; OR (rescue). +# The symbol-path `case node.op {EQ,NEQ}` is EXCLUDED: CLEAR has no +# surface symbol literal (only a union *variant* named Symbol), so +# those 2 arms are not source-reachable -> accept/invariant_guarded, +# correctly not chased here. +# +# lhs() is ALWAYS strictly less than rhs() for every type, so the +# oracle is uniform: EQ false, NEQ true, LT true, LTE true, GT false, +# GTE false. expected :pass; a failing :pass cell is a SURFACED bug. + +BOM_CELLS = [] +BOM_CMP = %i[eq neq lt lte gt gte] +BOM_TYPES = %i[int float str_lit heap_str] + +BOM_CMP.each do |op| + BOM_TYPES.each { |t| BOM_CELLS << { op: op, type: t } } +end +# Non-comparison op branches, each at its valid operand type(s). 
+BOM_CELLS << { op: :mod, type: :int } +BOM_CELLS << { op: :pow, type: :int } +BOM_CELLS << { op: :pow, type: :float } +BOM_CELLS << { op: :concat, type: :str_lit } +BOM_CELLS << { op: :concat, type: :heap_str } +BOM_CELLS << { op: :or_fallback, type: :heap_str } + +def bom_provider(t) + case t + when :int then "FN lhs() RETURNS Int64 -> RETURN 3_i64; END\nFN rhs() RETURNS Int64 -> RETURN 10_i64; END" + when :float then "FN lhs() RETURNS Float64 -> RETURN 1.5; END\nFN rhs() RETURNS Float64 -> RETURN 2.5; END" + when :str_lit then "FN lhs() RETURNS String -> RETURN \"abc\"; END\nFN rhs() RETURNS String -> RETURN \"abd\"; END" + when :heap_str then "FN lhs() RETURNS !String -> RETURN COPY \"abc\"; END\nFN rhs() RETURNS !String -> RETURN COPY \"abd\"; END" + end +end + +def bom_lhs(t) = (t == :heap_str ? "(lhs())" : "lhs()") +def bom_rhs(t) = (t == :heap_str ? "(rhs())" : "rhs()") + +def bom_body(op, t) + l = bom_lhs(t) + r = bom_rhs(t) + case op + when :eq then " ASSERT (#{l} == #{r}) == FALSE, \"eq #{t}\";" + when :neq then " ASSERT (#{l} != #{r}), \"neq #{t}\";" + when :lt then " ASSERT (#{l} < #{r}), \"lt #{t}\";" + when :lte then " ASSERT (#{l} <= #{r}), \"lte #{t}\";" + when :gt then " ASSERT (#{l} > #{r}) == FALSE, \"gt #{t}\";" + when :gte then " ASSERT (#{l} >= #{r}) == FALSE, \"gte #{t}\";" + when :mod then " ASSERT (10_i64 MOD 3_i64) == 1_i64, \"mod\";" + when :pow + t == :int ? " ASSERT (2_i64 ** 3_i64) == 8_i64, \"pow int\";" \ + : " ASSERT (2.0 ** 3.0) == 8.0, \"pow float\";" + when :concat + " t: String = #{l} + #{r};\n ASSERT t.length() == 6_i64, \"concat #{t}\";" + when :or_fallback + " v: String = mightFail() OR \"fb\";\n ASSERT v.length() >= 2_i64, \"or fallback\";" + end +end + +def bom_extra_fn(op) + op == :or_fallback ? 
"FN mightFail() RETURNS !String ->\n RETURN COPY \"ok\";\nEND\n" : "" +end + +FuzzGenerator.register(:binary_op_matrix, cells: BOM_CELLS) do |p| + <<~CHT + #{bom_provider(p[:type])} + #{bom_extra_fn(p[:op])}FN main() RETURNS Void -> + #{bom_body(p[:op], p[:type])} + RETURN; + END + CHT +end diff --git a/tools/fuzz/templates/capability_wrap_matrix.rb b/tools/fuzz/templates/capability_wrap_matrix.rb new file mode 100644 index 000000000..0bb6cfd13 --- /dev/null +++ b/tools/fuzz/templates/capability_wrap_matrix.rb @@ -0,0 +1,99 @@ +# Template: capability-wrap composition matrix — ENUMERATED. +# +# Targets src/mir/mir_lowering.rb#compose_capability_wrap. The +# discriminant set is read from the dispatch itself: +# sync_fn = case ft.sync {locked, write_locked, always_mutable, +# versioned, atomic} +# own_fn = case ft.ownership {shared->arc, multiowned->rc} +# + the 4-way sync_fn&&own_fn / sync_only / own_only / else. +# One cell per sync mode + one per ownership wrap = exhaustive over +# the dispatch labels. Every surface form is CONFIRMED from +# transpile-tests (all sigils occur there); nothing is :in_dev. +# +# expected :pass; a failing/leaking :pass cell is a SURFACED bug. + +CWM_CELLS = %i[ + locked write_locked always_mutable versioned atomic + multiowned shared_locked +].map { |m| { mode: m } } + +FuzzGenerator.register(:capability_wrap_matrix, cells: CWM_CELLS) do |p| + case p[:mode] + when :locked + <<~CHT + STRUCT Counter { value: Int64 } + FN main() RETURNS Void -> + MUTABLE c = Counter{ value: 1_i64 } @locked; + WITH EXCLUSIVE c AS r { + ASSERT r.value == 1_i64, "locked wrap read"; + } + RETURN; + END + CHT + when :write_locked + <<~CHT + STRUCT Counter { value: Int64 } + FN main() RETURNS Void -> + MUTABLE c = Counter{ value: 1_i64 } @writeLocked; + WITH EXCLUSIVE c AS r { + ASSERT r.value == 1_i64, "writeLocked wrap read"; + } + RETURN; + END + CHT + when :always_mutable + # Interior mutability: immutable binding, mutable data, direct. 
+ <<~CHT + STRUCT Counter { value: Int64 } + FN main() RETURNS Void -> + c = Counter{ value: 1_i64 } @alwaysMutable; + c.value = 2_i64; + ASSERT c.value == 2_i64, "alwaysMutable interior mutate"; + RETURN; + END + CHT + when :versioned + <<~CHT + STRUCT Counter { value: Int64 } + FN main() RETURNS Void -> + MUTABLE c = Counter{ value: 1_i64 } @versioned; + WITH SNAPSHOT c AS r { + ASSERT r.value == 1_i64, "versioned snapshot read"; + } + RETURN; + END + CHT + when :atomic + <<~CHT + STRUCT Counter { value: Int64 } + FN main() RETURNS Void -> + MUTABLE c = Counter{ value: 1_i64 } @indirect:atomic; + WITH EXCLUSIVE c AS x { + x.value = 2_i64; + ASSERT x.value == 2_i64, "atomic-ptr exclusive mutate"; + } + RETURN; + END + CHT + when :multiowned + <<~CHT + STRUCT Counter { value: Int64 } + FN main() RETURNS Void -> + p = Counter{ value: 1_i64 } @multiowned; + ASSERT p.value == 1_i64, "multiowned (rc) read"; + RETURN; + END + CHT + when :shared_locked + <<~CHT + STRUCT Counter { value: Int64 } + FN main() RETURNS Void -> + MUTABLE t = Counter{ value: 1_i64 } @shared:locked; + WITH EXCLUSIVE t AS r { + ASSERT r.value == 1_i64, "shared:locked (arc+lock) read"; + } + RETURN; + END + CHT + end +end diff --git a/tools/fuzz/templates/catch_allocator_matrix.rb b/tools/fuzz/templates/catch_allocator_matrix.rb new file mode 100644 index 000000000..686393753 --- /dev/null +++ b/tools/fuzz/templates/catch_allocator_matrix.rb @@ -0,0 +1,86 @@ +# Template: catch / OR-rescue allocator-identity matrix (the P0). +# +# Targets src/mir/mir_lowering.rb#infer_catch_value_allocator (12/12 +# dark -- invariant #9: error paths must preserve allocator identity) +# + #lower_or_rescue. The decision is: when `v = mayFail() OR fallback`, +# the success value and the fallback value may have DIFFERENT +# allocators (heap COPY vs frame literal vs primitive). If lowering +# binds one allocator on success and a different one on the error path, +# that is a UAF / double-free / leak. 
The corpus never crossed +# success_alloc x fallback_alloc. +# +# Both success and failure paths are exercised (call arg "" forces the +# RAISE -> fallback path; non-empty forces success). expected :pass; +# any leak / mir-error on a :pass cell is the invariant-#9 bug class. + +CAM_CELLS = [] +CAM_VALUE = %i[string int] # value type flowing out +CAM_SUCCESS = %i[heap] # success path: COPY -> heap +CAM_FALLBK = %i[heap_empty literal frame_var] # fallback allocator shape +CAM_TAKEN = %i[success failure] # which path the input forces + +CAM_VALUE.each do |vt| + CAM_FALLBK.each do |fb| + CAM_TAKEN.each do |taken| + # int value: only the primitive fallback shapes make sense. + next if vt == :int && fb == :heap_empty + CAM_CELLS << { value: vt, fallback: fb, taken: taken } + end + end +end + +def cam_inner(vt) + if vt == :string + "FN maybe(s: String) RETURNS !String ->\n" \ + " IF s.length() == 0_i64 THEN RAISE \"empty\"; END\n" \ + " RETURN COPY s;\nEND" + else + "FN maybe(s: String) RETURNS !Int64 ->\n" \ + " IF s.length() == 0_i64 THEN RAISE \"empty\"; END\n" \ + " RETURN s.length();\nEND" + end +end + +def cam_fallback_expr(vt, fb) + if vt == :string + case fb + when :heap_empty then "\"\"" + when :literal then "\"fb\"" + when :frame_var then "fbv" + end + else + fb == :frame_var ? "fbv" : "0_i64" + end +end + +def cam_fallback_setup(vt, fb) + return "" unless fb == :frame_var + + vt == :string ? " fbv: String = \"fb\";" : " fbv: Int64 = 0_i64;" +end + +def cam_call_arg(taken) = (taken == :success ? "\"X\"" : "\"\"") + +def cam_assert(vt, taken) + if vt == :string + taken == :success ? "ASSERT r.length() == 1_i64, \"success heap value\";" \ + : "ASSERT r.length() >= 0_i64, \"fallback value live\";" + else + taken == :success ? 
"ASSERT r == 1_i64, \"success int value\";" \ + : "ASSERT r >= 0_i64, \"fallback int live\";" + end +end + +FuzzGenerator.register(:catch_allocator_matrix, cells: CAM_CELLS) do |p| + setup = cam_fallback_setup(p[:value], p[:fallback]) + setup_line = setup.empty? ? "" : "#{setup}\n" + <<~CHT + #{cam_inner(p[:value])} + + FN main() RETURNS Void -> + #{setup_line} r = maybe(#{cam_call_arg(p[:taken])}) OR #{cam_fallback_expr(p[:value], p[:fallback])}; + #{cam_assert(p[:value], p[:taken])} + RETURN; + END + CHT +end diff --git a/tools/fuzz/templates/catch_reassign_matrix.rb b/tools/fuzz/templates/catch_reassign_matrix.rb new file mode 100644 index 000000000..ab3591f06 --- /dev/null +++ b/tools/fuzz/templates/catch_reassign_matrix.rb @@ -0,0 +1,82 @@ +# Template: reassign-through-fallible-expression matrix. +# +# Targets src/mir/mir_lowering.rb#walk_catch_body_for_reassigns (12/13 +# fuzz_axis dark). The decision: an outer MUTABLE binding is reassigned +# from a fallible expression `acc = maybe(...) OR acc;`. On the error +# path the binding keeps its OLD value/allocator; on success it takes +# the new one. If lowering mishandles the reassignment cleanup across +# the success/error split, that is a double-free or leak (invariant +# #9). The corpus never reassigned an outer binding through OR-rescue. +# +# var_kind x value_type x path-taken. Both paths exercised. expected +# :pass; a leak / mir-error on a :pass cell is the SURFACED bug. + +CRM_CELLS = [] +CRM_VARKIND = %i[local struct_field] +CRM_VALUE = %i[string int] +CRM_TAKEN = %i[success failure] + +CRM_VARKIND.each do |vk| + CRM_VALUE.each do |vt| + CRM_TAKEN.each do |tk| + CRM_CELLS << { var: vk, value: vt, taken: tk } + end + end +end + +def crm_ret(vt) = (vt == :string ? "!String" : "!Int64") +def crm_succ(vt) = (vt == :string ? "RETURN COPY s;" : "RETURN s.length();") +def crm_arg(tk) = (tk == :success ? 
"\"X\"" : "\"\"") + +def crm_inner(vt) + "FN maybe(s: String) RETURNS #{crm_ret(vt)} ->\n" \ + " IF s.length() == 0_i64 THEN RAISE \"empty\"; END\n" \ + " #{crm_succ(vt)}\nEND" +end + +def crm_init(vt) = (vt == :string ? "\"init\"" : "7_i64") + +def crm_assert(vt, tk) + if vt == :string + tk == :success ? "ASSERT acc.length() == 1_i64, \"reassigned to success\";" \ + : "ASSERT acc.length() == 4_i64, \"kept old value on failure\";" + else + tk == :success ? "ASSERT acc == 1_i64, \"reassigned to success\";" \ + : "ASSERT acc == 7_i64, \"kept old value on failure\";" + end +end + +FuzzGenerator.register(:catch_reassign_matrix, cells: CRM_CELLS) do |p| + if p[:var] == :local + <<~CHT + #{crm_inner(p[:value])} + + FN main() RETURNS Void -> + MUTABLE acc = #{crm_init(p[:value])}; + acc = maybe(#{crm_arg(p[:taken])}) OR acc; + #{crm_assert(p[:value], p[:taken])} + RETURN; + END + CHT + else + field_t = p[:value] == :string ? "String" : "Int64" + rd = p[:value] == :string ? "h.acc.length()" : "h.acc" + exp = if p[:value] == :string + p[:taken] == :success ? "1_i64" : "4_i64" + else + p[:taken] == :success ? "1_i64" : "7_i64" + end + <<~CHT + #{crm_inner(p[:value])} + + STRUCT Holder { acc: #{field_t} } + + FN main() RETURNS Void -> + MUTABLE h = Holder{ acc: #{crm_init(p[:value])} }; + h.acc = maybe(#{crm_arg(p[:taken])}) OR h.acc; + ASSERT #{rd} == #{exp}, "struct field reassign #{p[:taken]}"; + RETURN; + END + CHT + end +end diff --git a/tools/fuzz/templates/indexed_assignment_matrix.rb b/tools/fuzz/templates/indexed_assignment_matrix.rb new file mode 100644 index 000000000..756f46150 --- /dev/null +++ b/tools/fuzz/templates/indexed_assignment_matrix.rb @@ -0,0 +1,106 @@ +# Template: indexed-assignment lowering matrix. 
+# +# Targets src/mir/mir_lowering.rb#lower_indexed_assignment (the largest +# fuzz_axis dark cluster: `kind = ti.dispatch_key` -> +# INDEX_OPS.dig(kind,:set) crossed with value_transforms +# {:dupe_string_literal, :dupe_borrowed_union, :container_promote} and +# shard_direct). The corpus only ever wrote `lst[i] = int` into a plain +# list; this enumerates container_shape x key_kind x value_ownership x +# map_wrap so every dispatch/transform arm lowers. +# +# Value type DERIVES from the container (int containers take an Int64 +# value; String-valued maps take string / COPY-string values -- the +# :dupe_string_literal transform arm). Mixing them would be an invalid +# program, not a surfaced bug. +# +# expected :pass with a self-checking ASSERT. A :pass cell that fails, +# leaks, or mir-errors is a SURFACED lowering bug (do not fix here). + +IAM_CELLS = [] + +# container => :seq (int-indexed) | :map_int | :map_str +IAM_CONTAINERS = { + array: :seq, + list: :seq, + map_int: :map_int, + map_int_sharded: :map_int, + map_str: :map_str, + map_str_sharded: :map_str +} + +IAM_CONTAINERS.each do |container, family| + if family == :seq + IAM_CELLS << { container: container, key: :index, value: :primitive } + else + keys = %i[literal variable concat] + values = family == :map_int ? %i[primitive] : %i[str_literal copy_str] + keys.each do |k| + values.each { |v| IAM_CELLS << { container: container, key: k, value: v } } + end + end +end + +def iam_decl(c) + { + array: "MUTABLE box: Int64[] = [0_i64, 0_i64, 0_i64];", + list: "MUTABLE box: Int64[]@list = [];", + map_int: "MUTABLE box: HashMap = {};", + map_int_sharded: "MUTABLE box: HashMap@sharded(2) = {};", + map_str: "MUTABLE box: HashMap = {};", + map_str_sharded: "MUTABLE box: HashMap@sharded(2) = {};" + }[c] +end + +def iam_prep(c) + c == :list ? 
" box.append(0_i64);" : "" +end + +def iam_key_expr(c, k) + return "0_i64" if %i[array list].include?(c) + + case k + when :literal then "\"kk\"" + when :variable then "kvar" + when :concat then "(\"k\" + \"k\")" + end +end + +def iam_key_setup(c, k) + (k == :variable && !%i[array list].include?(c)) ? " kvar: String = \"kk\";" : "" +end + +def iam_value_expr(v) + case v + when :primitive then "9_i64" + when :str_literal then "\"vv\"" + when :copy_str then "COPY sval" + end +end + +def iam_value_setup(v) + v == :copy_str ? " sval: String = \"vv\";" : "" +end + +def iam_expected_read(c, key_e, v) + if %i[array list].include?(c) + "ASSERT box[#{key_e}] == 9_i64, \"seq indexed set\";" + elsif v == :primitive + "ASSERT (box[#{key_e}] OR 0_i64) == 9_i64, \"map int set\";" + else + "ASSERT (box[#{key_e}] OR \"\") == \"vv\", \"map str set\";" + end +end + +FuzzGenerator.register(:indexed_assignment_matrix, cells: IAM_CELLS) do |p| + key_e = iam_key_expr(p[:container], p[:key]) + val_e = iam_value_expr(p[:value]) + parts = ["FN main() RETURNS Void ->", " #{iam_decl(p[:container])}"] + prep = iam_prep(p[:container]); parts << prep unless prep.empty? + ks = iam_key_setup(p[:container], p[:key]); parts << ks unless ks.empty? + vs = iam_value_setup(p[:value]); parts << vs unless vs.empty? + parts << " box[#{key_e}] = #{val_e};" + parts << " #{iam_expected_read(p[:container], key_e, p[:value])}" + parts << " RETURN;" + parts << "END" + parts.join("\n") + "\n" +end diff --git a/tools/fuzz/templates/match_matrix.rb b/tools/fuzz/templates/match_matrix.rb new file mode 100644 index 000000000..0e566c42b --- /dev/null +++ b/tools/fuzz/templates/match_matrix.rb @@ -0,0 +1,76 @@ +# Template: MATCH lowering matrix. +# +# Targets src/mir/mir_lowering.rb#lower_match. Dark arms = the subject +# shape x arm kind x default-presence cross-product: union payload +# variant, union unit variant, enum, with and without DEFAULT. The +# corpus exercised only a couple of these shapes. 
+# +# Confirmed syntax (transpile-tests/52_union.cht): UNION with payload +# and unit variants, `Type{ Variant: v }` construction, `PARTIAL MATCH +# subj START Type.Variant -> ...; DEFAULT -> ...; END`. +# +# expected :pass; a failing/leaking :pass cell is a SURFACED bug. + +MM_CELLS = [] +MM_SUBJECT = %i[union_payload union_unit enum] +MM_DEFAULT = %i[with_default no_default] + +MM_SUBJECT.each do |s| + MM_DEFAULT.each do |d| + MM_CELLS << { subject: s, default: d } + end +end + +def mm_default_arm(d) + d == :with_default ? " DEFAULT -> got = 9.0;\n" : "" +end + +FuzzGenerator.register(:match_matrix, cells: MM_CELLS) do |p| + case p[:subject] + when :union_payload + <<~CHT + UNION Shape { Circle: Float64, Rect: Float64, Empty } + + FN main() RETURNS Void -> + s = Shape{ Circle: 2.0 }; + MUTABLE got: Float64 = 0.0; + PARTIAL MATCH s START + Shape.Circle -> got = 1.0;, + Shape.Rect -> got = 2.0;, + #{mm_default_arm(p[:default])} END + ASSERT got == 1.0, "union payload match"; + RETURN; + END + CHT + when :union_unit + <<~CHT + UNION Shape { Circle: Float64, Rect: Float64, Empty } + + FN main() RETURNS Void -> + s = Shape.Empty; + MUTABLE got: Float64 = 0.0; + PARTIAL MATCH s START + Shape.Empty -> got = 5.0;, + Shape.Circle -> got = 1.0;, + #{mm_default_arm(p[:default])} END + ASSERT got == 5.0, "union unit match"; + RETURN; + END + CHT + when :enum + <<~CHT + ENUM Dir { North, South, East } + + FN main() RETURNS Void -> + d: Dir = Dir.South; + MUTABLE got: Float64 = 0.0; + PARTIAL MATCH d START + Dir.North -> got = 1.0;, + Dir.South -> got = 2.0;, + #{mm_default_arm(p[:default])} END + ASSERT got == 2.0, "enum match"; + RETURN; + END + CHT + end +end
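All of the templates above share one contract: `FuzzGenerator.register(:name, cells:)` takes an enumerated cell list and a block, and emits one self-checking program per cell, with no sampling. The real `FuzzGenerator` is outside this diff; a minimal stand-in showing just that contract (class name and method bodies are illustrative assumptions):

```ruby
# Hedged sketch of the enumerate-every-cell pattern the templates rely
# on: register a cell matrix + generator block, then expand to exactly
# one program per cell.
class TinyGenerator
  @templates = {}
  class << self
    def register(name, cells:, &blk)
      @templates[name] = { cells: cells, gen: blk }
    end

    # Exhaustive by construction: cells.size programs out, no RNG.
    def programs(name)
      t = @templates.fetch(name)
      t[:cells].map { |cell| t[:gen].call(cell) }
    end
  end
end

TinyGenerator.register(:demo, cells: [{ op: :lt }, { op: :gt }]) do |p|
  "ASSERT uses #{p[:op]};"
end

p TinyGenerator.programs(:demo).size # => 2
```

The point of the enumerated shape is the oracle: because every cell is expected `:pass`, any failing, leaking, or mir-erroring cell is a surfaced bug rather than fuzzer noise.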