4 changes: 3 additions & 1 deletion CLAUDE.md
@@ -8,7 +8,7 @@ This file provides guidance to Claude Code (claude.ai/code) when working with co

Perry is a native TypeScript compiler written in Rust that compiles TypeScript source code directly to native executables. It uses SWC for TypeScript parsing and LLVM for code generation.

**Current Version:** 0.5.164
**Current Version:** 0.5.166

## TypeScript Parity Status

@@ -153,6 +153,8 @@ First-resolved directory cached in `compile_package_dirs`; subsequent imports re

Keep entries to 1-2 lines max. Full details in CHANGELOG.md.

- **v0.5.166** — Fix #145: README's "Perry beats Node.js and Bun on every benchmark" claim wasn't defensible — `benchmarks/baseline.json` already recorded `bench_json_roundtrip` as Perry 591 ms vs Node 375 ms (~1.58× slower), and that benchmark wasn't in the README table. Verified by rerunning `bench_json_roundtrip` on v0.5.165 locally: Perry best-of-5 588 ms, Node 369 ms, Bun 245 ms → ~1.6× slower than Node, ~2.4× slower than Bun. Also spot-checked `bench_gc_pressure` (baseline showed Perry 16 vs Node 13, which would have been the other ceiling; fresh runs show Perry 16 vs Node 20 — Perry now wins, so the baseline row was stale and json_roundtrip is genuinely the only current exception). Amended the claim in `README.md:53` to name the exception rather than weaseling ("beats … on every benchmark below **except `json_roundtrip`**, where Node is ~1.6× faster and Bun ~2.4× faster — tracked as a stdlib JSON perf bug (#149)"); added a `json_roundtrip | 588ms | 369ms | 245ms | **1.6x slower** | 50× JSON.parse + JSON.stringify on a ~1MB, 10K-item blob` row to the public comparison table so the README and artifact agree. Dated the new row as `rerun 2026-04-23 on v0.5.165` (existing rows keep their `2026-04-22 on v0.5.164` footer). Filed #149 as the tracking issue for the underlying perf work (candidate directions: shape-caching for parse fast path; arena/allocator scoping — baseline shows Perry at 307 MB RSS vs Node 157 MB, so allocator pressure is part of the gap, not just parse throughput). No code change in this bump — followup is the actual json.rs perf work, tracked separately.
- **v0.5.165** — Fix #144: TypeScript decorators were parsed into HIR but silently dropped at codegen — programs using `@Component` / `@log` / `@Module` etc. compiled without error and the decorator body never ran. Verified the ticket's claims hold: `lower_decorators` in `crates/perry-hir/src/lower_types.rs:760` exists and populates `Decorator` structs on functions (HIR field at `crates/perry-hir/src/ir.rs:654`), but the only consumers in `perry-codegen` are two `decorators: Vec::new()` constructions for imported-class stubs — no path reads the real vec. `monomorph.rs` clones the field forward but feeds nothing downstream. Empirical repro: `@logClass class Greeter { @logMethod greet(...) }` compiled clean and the `console.log("DECORATOR RAN")` bodies never executed. README.md:488 listed `| Decorators | ✅ |` while docs/src/language/limitations.md has a dedicated "No Decorators" section — the Limitations page is correct. Fix: added `reject_decorators(class, name)` helper at `crates/perry-hir/src/lower_decl.rs:394` that walks every decoration point on a `&ast::Class` — class-level (`class.decorators`), method-level (`ClassMember::Method.function.decorators`), private-method (`PrivateMethod.function.decorators`), property (`ClassProp.decorators` + `PrivateProp.decorators`), constructor parameters (`ParamOrTsParamProp::{Param, TsParamProp}.decorators`), and instance-method parameters (`method.function.params[].decorators`) — and `bail!`s on the first non-empty vec with a message naming the decorator, the owning class/method, and pointing at `docs/src/language/limitations.md#no-decorators`. Called from both `lower_class_decl` (class declarations) and `lower_class_from_ast` (anonymous class expressions like `new (class { ... })()`). Helpers `decorator_name_hint` and `method_key_hint` extract readable names from SWC AST (`@log` → `"log"`, `@Cached({ ttl: 60 })` → `"Cached"`, falling back to `"<decorator>"` for non-identifier decorator expressions). 
Same failure-mode reasoning as v0.5.119's warn→bail upgrade: silent no-ops are the worst compile outcome because the executable appears to work. Flipped README.md row to `| Decorators | ❌ ([not supported](docs/src/language/limitations.md#no-decorators)) |`. Deleted `test-files/test_decorators.ts` (its header comment claimed "Perry implements @log as a compile-time transformation" — never true; runtime output never included the "Calling <method>" prefix the file was testing for). Stripped the `Calculator` class + "Test 2: Decorators" block from `test-files/test_integration_app.ts` (the rest — private fields, file I/O, strings, arrays — was unrelated and still passes). Verified against three repros: `@logClass class Greeter` → "decorators are not supported (found `@logClass` on class `Greeter`)"; `@logMethod add(...)` → "found `@logMethod` on method `Calc.add`"; `doThing(@inject arg: string)` → "parameter decorators are not supported (found `@inject` on a parameter of `Svc.doThing`)". Non-decorator classes (`test_getters_setters.ts`, `test_gap_class_advanced.ts`, rewritten `test_integration_app.ts`) compile unchanged. Side discovery: `example-code/nestjs-typeorm/src/**` uses `@Module`/`@Controller`/`@Injectable`/`@Entity` heavily, so that example was never actually functional — before this fix it compiled but produced an executable that did nothing NestJS-like (NestJS's entire behavior is decorator-driven); after this fix it fails loudly at compile. Left untouched for now (separate from the #144 scope), but the README's "nestjs-typeorm" example row is misleading and should be removed or labeled as non-working in a follow-up. The `decorators: Vec<Decorator>` HIR field and `lower_decorators` function are left in place as dead-but-harmless code — preserving the shape for a future real implementation is cheaper than deleting and re-adding.
- **v0.5.164** — Fix #140: restore autovectorization of pure-accumulator loops regressed between v0.5.22 and v0.5.162. Two compounding changes had kicked `for (let i=0; i<N; i++) sum+=1;` off the `<2 x double>` parallel-accumulator reduction path that v0.5.22's LLVM -O3 pipeline used to widen across 4 interleaved lanes. (1) Issue #48/#49's i32 shadow slot for integer-valued mutable locals was gated only on `integer_locals`, not on *usage*: every `let sum = 0` that only ever participates in `sum = sum + 1` writes (no array indexing) still got a parallel `i32` alloca, so the Let-emission path wrote `add i32 %shadow, 1; store i32; sitofp to double; store double (dead)` where the old v0.5.22 path had a clean `load/fadd/store` chain. Even after DSE eliminates the dead double store, the vectorizer bails on the dual-slot reduction pattern. Fix: added `collect_index_used_locals` walker to `crates/perry-codegen/src/collectors.rs` that collects LocalIds appearing in any `index` subtree of `IndexGet`/`IndexSet`/`IndexUpdate`/`BufferIndex{Get,Set}`/`Uint8Array{Get,Set}`/`ArrayAt`/`ArrayWith`/`StringAt`/`StringCodePointAt` (conservative over-approximation — `arr[i+1]`, `arr[(i|0)]`, `buf[k*4+j]` all mark their inner locals, so real loop counters keep the optimization). Threaded through `FnCtx.index_used_locals` at all 6 `compile_*` sites; stmt.rs's Let-emission now AND-gates `needs_i32_slot` on `index_used_locals.contains(id)`. The `lower_for` counter-specific i32 slot (the `classify_for_length_hoist` path for `for(...;i<arr.length;...)`) is untouched — it runs after Let lowering and still allocates its own slot for bounded-index fast-path loops. (2) Issue #74's `asm sideeffect "", ""()` barrier was emitted at the end of every "LLVM-pure" body, including bodies whose only operation is a `LocalSet` to an outer-scope local. 
But for accumulator loops the outer local (`sum`) is already observed by `console.log("sum:" + sum)` after the loop, so the barrier is redundant — and its sideeffect-semantics kill vectorization. Refined `body_is_observably_side_effect_free` → new `body_needs_asm_barrier`: body must be pure AND must not write to any local declared outside the loop body. Truly-empty `for (;;) {}` / `while (cond) {}` bodies still trigger the barrier (the #74 repro case), but `for (;;) sum+=1` with outer `sum` no longer does. Verified via `-Rpass=loop-vectorize`: post-opt IR now shows `<2 x double>` vec.phi with interleave count 4 (exact shape from the issue). Benchmark deltas (best-of-5 on M-series, default per-module pipeline): `loop_overhead` 32ms→**12ms** (matches v0.5.22 baseline), `math_intensive` 48ms→**14ms** (matches), `accumulate` 97ms→**24ms** (matches). Array benchmarks that depend on the i32-shadow path for counter-as-index are unchanged: `array_write` 3ms, `array_read` 4ms, `nested_loops` 9ms (the `index_used_locals` set marks their counters, preserving the fast-path). Issue #74's empty-loop protection verified intact: `for (let i=0; i<100M; i++) {}` still runs for ~34ms on both default and bitcode-link pipelines (was the original bug: 0ms). Ran gap tests (`test_gap_array_methods`, `test_gap_closures`) — no regressions.

- **v0.5.163** — docs+chore (#139, tracking #140): respond to polyglot-benchmark scrutiny and audit the suite. Issue #139 cited `benchmarks/bench_loop_only.ts` (a scratchpad file that does 100×100K iterations — not 100M like the Rust comparator in `benchmarks/polyglot/bench.rs`) as evidence the `loop_overhead` comparison was inflating Perry's 8x Rust win. Scratchpad file was a January-era dev artifact — never referenced by `benchmarks/polyglot/run_all.sh`, which uses `benchmarks/suite/02_loop_overhead.ts` (a flat 100M loop with matching checksum to Rust). Cleanup: deleted 17 stale `bench_*.ts` + `test_inline*.ts` + `bench_loop_only.ts` scratchpads in `benchmarks/` root, 15 January `benchmarks/results/*.txt` run logs, and `benchmarks/simple_loop` (a committed compiled binary). The four files `benchmarks/run_benchmarks.sh` still references (`bench_fibonacci`, `bench_array_ops`, `bench_string_ops`, `bench_bitwise`) are kept; the `*.ts` pattern guard already exists in `.gitignore`. Reran `polyglot/run_all.sh 5` on current main — 8 polyglot cells confirmed workload-parity with Rust/C++/Go/Swift/Java/Node/Bun/Python via checksum agreement, but three cells regressed vs the v0.5.22 (e1cbd37) baseline in `RESULTS.md`: `loop_overhead` 12→32 ms, `math_intensive` 14→48 ms, `accumulate` 24→97 ms. IR-level bisect attributes the regression to two compounding changes — #74's `asm sideeffect` loop-body barrier at v0.5.91 and an over-eager i32 shadow counter for integer-valued accumulator locals (not just loop counters) — both of which kick the LLVM default pipeline off the vectorization path it was on at v0.5.22. Filed as #140 with the pre-opt/post-opt IR diff and three candidate fixes. 
Tightened the comparison narrative: in response to @MaxGraey's follow-up on #139, reran `g++ -O3 -ffast-math bench.cpp` → C++ drops from 96 ms to 11 ms, confirming the entire `loop_overhead` gap is the default fast-math flag choice (Perry emits `reassoc contract` on f64 ops because TS `number` semantics allow it; Rust/C++/Go/Swift default to strict-IEEE fadd and hit the 3-cycle latency wall). Updated `README.md` Perry-vs-{Node,Bun} + Perry-vs-compiled-languages tables with fresh best-of-3 numbers from the full suite run, kept the historical `LLVM backend progress` column honest (method_calls 2→1, fibonacci 310→302, factorial 24→96, closure 8→15, etc. — the regressed cells are shown at their current values, not hidden), and rewrote the narrative in `benchmarks/polyglot/RESULTS.md` sections `loop_overhead` / `math_intensive` / `accumulate` to lead with the fast-math-default explanation (linking `RESULTS_OPT.md`, which already documented the `-ffast-math` opt-sweep back at v0.5.22) and point to #140 for the vectorization regression specifically. No codegen or runtime changes in this bump; fix for the regression will land separately against #140.
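The `json_roundtrip` workload described in the v0.5.166 entry above can be sketched in plain TypeScript. This is a hypothetical reconstruction — the entry only specifies 50 rounds of `JSON.parse` + `JSON.stringify` over a ~1MB, 10K-item blob; the item shape and function names here are assumptions:

```typescript
// Hypothetical sketch of the json_roundtrip benchmark shape (item fields assumed).
function makeBlob(items: number): { id: number; name: string; tags: string[] }[] {
  const blob: { id: number; name: string; tags: string[] }[] = [];
  for (let i = 0; i < items; i++) {
    blob.push({ id: i, name: "item-" + i, tags: ["alpha", "beta", "gamma"] });
  }
  return blob;
}

// Round-trip the serialized blob `rounds` times; return a length checksum so
// the work cannot be optimized away.
function roundtrip(rounds: number, items: number): number {
  let text = JSON.stringify(makeBlob(items));
  for (let i = 0; i < rounds; i++) {
    text = JSON.stringify(JSON.parse(text));
  }
  return text.length;
}
```

The published row would then correspond to roughly `roundtrip(50, 10_000)`.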
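The silent-drop failure mode fixed in v0.5.165 is easiest to see by desugaring: a class decorator is just a function applied at class-definition time, and Perry's codegen skipped that application. A minimal sketch without decorator syntax (names hypothetical):

```typescript
// Roughly what `@logClass class Greeter { ... }` means: the decorator function
// runs once when the class is defined and may observe or replace the class.
function logClass<T extends new (...args: any[]) => object>(ctor: T): T {
  console.log("DECORATOR RAN: " + ctor.name);
  return ctor;
}

class Greeter {
  greet(name: string): string {
    return "hello, " + name;
  }
}

// This application step is what Perry silently omitted for `@logClass` before
// v0.5.165 — the program compiled, but "DECORATOR RAN" never printed.
const DecoratedGreeter = logClass(Greeter);
```

After the fix, the decorator-syntax form is rejected at compile time instead of compiling to the undecorated class.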
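The two loop shapes the v0.5.164 fix distinguishes can be sketched as follows (iteration counts reduced from the 100M used in the benchmarks):

```typescript
// Pure accumulator: `sum` never appears inside an array index, so after
// v0.5.164 it skips the i32 shadow slot and keeps the clean load/fadd/store
// chain that LLVM can widen into a <2 x double> reduction.
function accumulate(n: number): number {
  let sum = 0;
  for (let i = 0; i < n; i++) {
    sum += 1;
  }
  return sum;
}

// Counter-as-index: `i` appears in `arr[i]`, so collect_index_used_locals
// marks it and the i32 shadow-slot fast path for bounded array access stays.
function fill(n: number): number {
  const arr: number[] = [];
  for (let i = 0; i < n; i++) {
    arr[i] = i;
  }
  return arr[n - 1];
}

// The outer observation that makes the asm barrier redundant for `sum`.
console.log("sum: " + accumulate(100_000));
```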
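The workload-parity discipline in the v0.5.163 entry — every language implementation computes and prints the same checksum — can be sketched as below. This is a guess at the general shape, not the contents of `benchmarks/suite/02_loop_overhead.ts`:

```typescript
// Flat loop with a running checksum: the printed value must agree across the
// Rust/C++/Go/Swift/Java/Node/Bun/Python comparators, which also guards
// against any compiler deleting the loop as dead code.
function loopOverhead(n: number): number {
  let checksum = 0;
  for (let i = 0; i < n; i++) {
    checksum = (checksum + i) % 1_000_000_007;
  }
  return checksum;
}

console.log("checksum: " + loopOverhead(1_000_000));
```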
53 changes: 27 additions & 26 deletions Cargo.lock


2 changes: 1 addition & 1 deletion Cargo.toml
@@ -104,7 +104,7 @@ opt-level = "s" # Optimize for size in stdlib
opt-level = 3

[workspace.package]
version = "0.5.164"
version = "0.5.166"
edition = "2021"
license = "MIT"
repository = "https://github.com/PerryTS/perry"
5 changes: 3 additions & 2 deletions README.md
@@ -50,7 +50,7 @@ People are building real apps with Perry today. Here are some highlights:

## Performance

Perry beats Node.js and Bun on every benchmark. Best of 5 runs, macOS ARM64 (Apple Silicon), Node.js v25, Bun 1.3, rerun 2026-04-22 on v0.5.164.
Perry beats Node.js and Bun on every benchmark below **except `json_roundtrip`**, where Node is ~1.6× faster and Bun ~2.4× faster — tracked as a stdlib JSON perf bug ([#149](https://github.com/PerryTS/perry/issues/149)). Best of 5 runs, macOS ARM64 (Apple Silicon), Node.js v25, Bun 1.3, rerun 2026-04-22 on v0.5.164 (json_roundtrip row rerun 2026-04-23 on v0.5.165).

| Benchmark | Perry | Node.js | Bun | vs Node | What it tests |
|-----------|-------|---------|-----|---------|---------------|
@@ -69,6 +69,7 @@ Perry beats Node.js and Bun on every benchmark. Best of 5 runs, macOS ARM64 (App
| prime_sieve | 3ms | 7ms | 7ms | **2.3x faster** | Sieve of Eratosthenes |
| mandelbrot | 21ms | 24ms | 29ms | **1.1x faster** | Complex f64 iteration (800x800) |
| matrix_multiply | 19ms | 33ms | 33ms | **1.7x faster** | 256x256 matrix multiply |
| json_roundtrip | 588ms | 369ms | 245ms | **1.6x slower** | 50× `JSON.parse` + `JSON.stringify` on a ~1MB, 10K-item blob ([#149](https://github.com/PerryTS/perry/issues/149)) |

Perry compiles to native machine code via LLVM — no JIT warmup, no interpreter overhead. Key optimizations: **scalar replacement** of non-escaping objects (escape analysis eliminates heap allocation entirely — object fields become registers), inline bump allocator for objects that do escape, i32 loop counters for bounded array access, `reassoc contract` fast-math flags, integer-modulo fast path (`fptosi → srem → sitofp` instead of `fmod`), elimination of redundant `js_number_coerce` calls on numeric function returns, i64 specialization for pure numeric recursive functions, and `<2 x double>` parallel-accumulator vectorization on pure-fadd reduction loops (restored in v0.5.164 via [#140](https://github.com/PerryTS/perry/issues/140)).
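As an illustration of the scalar-replacement claim above — whether Perry scalarizes this exact shape is an assumption; the text only describes the pass for non-escaping objects generally:

```typescript
// `p` never escapes distanceSquared: it is not returned, stored to a
// longer-lived location, or captured by a call. Escape analysis can promote
// p.x and p.y to plain registers and emit no heap allocation at all.
function distanceSquared(ax: number, ay: number, bx: number, by: number): number {
  const p = { x: ax - bx, y: ay - by };
  return p.x * p.x + p.y * p.y;
}
```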

@@ -485,7 +486,7 @@ perry publish macos # or: ios / android / linux
| Spread operator in calls and literals | ✅ |
| RegExp (test, match, replace) | ✅ |
| BigInt (256-bit) | ✅ |
| Decorators | |
| Decorators | ❌ ([not supported](docs/src/language/limitations.md#no-decorators)) |

### Standard Library
