4 changes: 3 additions & 1 deletion CLAUDE.md
@@ -8,7 +8,7 @@ This file provides guidance to Claude Code (claude.ai/code) when working with co

Perry is a native TypeScript compiler written in Rust that compiles TypeScript source code directly to native executables. It uses SWC for TypeScript parsing and LLVM for code generation.

**Current Version:** 0.5.164
**Current Version:** 0.5.166

## TypeScript Parity Status

@@ -153,6 +153,8 @@ First-resolved directory cached in `compile_package_dirs`; subsequent imports re

Keep entries to 1-2 lines max. Full details in CHANGELOG.md.

- **v0.5.166** — Fix #145: README's "Perry beats Node.js and Bun on every benchmark" claim wasn't defensible — `benchmarks/baseline.json` already recorded `bench_json_roundtrip` as Perry 591 ms vs Node 375 ms (~1.58× slower), and that benchmark wasn't in the README table. Verified by rerunning `bench_json_roundtrip` on v0.5.165 locally: Perry best-of-5 588 ms, Node 369 ms, Bun 245 ms → ~1.6× slower than Node, ~2.4× slower than Bun. Also spot-checked `bench_gc_pressure` (baseline showed Perry 16 vs Node 13, which would have been the other ceiling; fresh runs show Perry 16 vs Node 20 — Perry now wins, so the baseline row was stale and json_roundtrip is genuinely the only current exception). Amended the claim in `README.md:53` to name the exception rather than weaseling ("beats … on every benchmark below **except `json_roundtrip`**, where Node is ~1.6× faster and Bun ~2.4× faster — tracked as a stdlib JSON perf bug (#149)"); added a `json_roundtrip | 588ms | 369ms | 245ms | **1.6x slower** | 50× JSON.parse + JSON.stringify on a ~1MB, 10K-item blob` row to the public comparison table so the README and artifact agree. Dated the new row as `rerun 2026-04-23 on v0.5.165` (existing rows keep their `2026-04-22 on v0.5.164` footer). Filed #149 as the tracking issue for the underlying perf work (candidate directions: shape-caching for parse fast path; arena/allocator scoping — baseline shows Perry at 307 MB RSS vs Node 157 MB, so allocator pressure is part of the gap, not just parse throughput). No code change in this bump — followup is the actual json.rs perf work, tracked separately.
- **v0.5.165** — Fix #144: TypeScript decorators were parsed into HIR but silently dropped at codegen — programs using `@Component` / `@log` / `@Module` etc. compiled without error and the decorator body never ran. Verified the ticket's claims hold: `lower_decorators` in `crates/perry-hir/src/lower_types.rs:760` exists and populates `Decorator` structs on functions (HIR field at `crates/perry-hir/src/ir.rs:654`), but the only consumers in `perry-codegen` are two `decorators: Vec::new()` constructions for imported-class stubs — no path reads the real vec. `monomorph.rs` clones the field forward but feeds nothing downstream. Empirical repro: `@logClass class Greeter { @logMethod greet(...) }` compiled clean and the `console.log("DECORATOR RAN")` bodies never executed. README.md:488 listed `| Decorators | ✅ |` while docs/src/language/limitations.md has a dedicated "No Decorators" section — the Limitations page is correct. Fix: added `reject_decorators(class, name)` helper at `crates/perry-hir/src/lower_decl.rs:394` that walks every decoration point on a `&ast::Class` — class-level (`class.decorators`), method-level (`ClassMember::Method.function.decorators`), private-method (`PrivateMethod.function.decorators`), property (`ClassProp.decorators` + `PrivateProp.decorators`), constructor parameters (`ParamOrTsParamProp::{Param, TsParamProp}.decorators`), and instance-method parameters (`method.function.params[].decorators`) — and `bail!`s on the first non-empty vec with a message naming the decorator, the owning class/method, and pointing at `docs/src/language/limitations.md#no-decorators`. Called from both `lower_class_decl` (class declarations) and `lower_class_from_ast` (anonymous class expressions like `new (class { ... })()`). Helpers `decorator_name_hint` and `method_key_hint` extract readable names from SWC AST (`@log` → `"log"`, `@Cached({ ttl: 60 })` → `"Cached"`, falling back to `"<decorator>"` for non-identifier decorator expressions). 
Same failure-mode reasoning as v0.5.119's warn→bail upgrade: silent no-ops are the worst compile outcome because the executable appears to work. Flipped README.md row to `| Decorators | ❌ ([not supported](docs/src/language/limitations.md#no-decorators)) |`. Deleted `test-files/test_decorators.ts` (its header comment claimed "Perry implements @log as a compile-time transformation" — never true; runtime output never included the "Calling <method>" prefix the file was testing for). Stripped the `Calculator` class + "Test 2: Decorators" block from `test-files/test_integration_app.ts` (the rest — private fields, file I/O, strings, arrays — was unrelated and still passes). Verified against three repros: `@logClass class Greeter` → "decorators are not supported (found `@logClass` on class `Greeter`)"; `@logMethod add(...)` → "found `@logMethod` on method `Calc.add`"; `doThing(@inject arg: string)` → "parameter decorators are not supported (found `@inject` on a parameter of `Svc.doThing`)". Non-decorator classes (`test_getters_setters.ts`, `test_gap_class_advanced.ts`, rewritten `test_integration_app.ts`) compile unchanged. Side discovery: `example-code/nestjs-typeorm/src/**` uses `@Module`/`@Controller`/`@Injectable`/`@Entity` heavily, so that example was never actually functional — before this fix it compiled but produced an executable that did nothing NestJS-like (NestJS's entire behavior is decorator-driven); after this fix it fails loudly at compile. Left untouched for now (separate from the #144 scope), but the README's "nestjs-typeorm" example row is misleading and should be removed or labeled as non-working in a follow-up. The `decorators: Vec<Decorator>` HIR field and `lower_decorators` function are left in place as dead-but-harmless code — preserving the shape for a future real implementation is cheaper than deleting and re-adding.
- **v0.5.164** — Fix #140: restore autovectorization of pure-accumulator loops regressed between v0.5.22 and v0.5.162. Two compounding changes had kicked `for (let i=0; i<N; i++) sum+=1;` off the `<2 x double>` parallel-accumulator reduction path that v0.5.22's LLVM -O3 pipeline used to widen across 4 interleaved lanes. (1) Issue #48/#49's i32 shadow slot for integer-valued mutable locals was gated only on `integer_locals`, not on *usage*: every `let sum = 0` that only ever participates in `sum = sum + 1` writes (no array indexing) still got a parallel `i32` alloca, so the Let-emission path wrote `add i32 %shadow, 1; store i32; sitofp to double; store double (dead)` where the old v0.5.22 path had a clean `load/fadd/store` chain. Even after DSE eliminates the dead double store, the vectorizer bails on the dual-slot reduction pattern. Fix: added `collect_index_used_locals` walker to `crates/perry-codegen/src/collectors.rs` that collects LocalIds appearing in any `index` subtree of `IndexGet`/`IndexSet`/`IndexUpdate`/`BufferIndex{Get,Set}`/`Uint8Array{Get,Set}`/`ArrayAt`/`ArrayWith`/`StringAt`/`StringCodePointAt` (conservative over-approximation — `arr[i+1]`, `arr[(i|0)]`, `buf[k*4+j]` all mark their inner locals, so real loop counters keep the optimization). Threaded through `FnCtx.index_used_locals` at all 6 `compile_*` sites; stmt.rs's Let-emission now AND-gates `needs_i32_slot` on `index_used_locals.contains(id)`. The `lower_for` counter-specific i32 slot (the `classify_for_length_hoist` path for `for(...;i<arr.length;...)`) is untouched — it runs after Let lowering and still allocates its own slot for bounded-index fast-path loops. (2) Issue #74's `asm sideeffect "", ""()` barrier was emitted at the end of every "LLVM-pure" body, including bodies whose only operation is a `LocalSet` to an outer-scope local. 
But for accumulator loops the outer local (`sum`) is already observed by `console.log("sum:" + sum)` after the loop, so the barrier is redundant — and its sideeffect-semantics kill vectorization. Refined `body_is_observably_side_effect_free` → new `body_needs_asm_barrier`: body must be pure AND must not write to any local declared outside the loop body. Truly-empty `for (;;) {}` / `while (cond) {}` bodies still trigger the barrier (the #74 repro case), but `for (;;) sum+=1` with outer `sum` no longer does. Verified via `-Rpass=loop-vectorize`: post-opt IR now shows `<2 x double>` vec.phi with interleave count 4 (exact shape from the issue). Benchmark deltas (best-of-5 on M-series, default per-module pipeline): `loop_overhead` 32ms→**12ms** (matches v0.5.22 baseline), `math_intensive` 48ms→**14ms** (matches), `accumulate` 97ms→**24ms** (matches). Array benchmarks that depend on the i32-shadow path for counter-as-index are unchanged: `array_write` 3ms, `array_read` 4ms, `nested_loops` 9ms (the `index_used_locals` set marks their counters, preserving the fast-path). Issue #74's empty-loop protection verified intact: `for (let i=0; i<100M; i++) {}` still runs for ~34ms on both default and bitcode-link pipelines (was the original bug: 0ms). Ran gap tests (`test_gap_array_methods`, `test_gap_closures`) — no regressions.

- **v0.5.163** — docs+chore (#139, tracking #140): respond to polyglot-benchmark scrutiny and audit the suite. Issue #139 cited `benchmarks/bench_loop_only.ts` (a scratchpad file that does 100×100K iterations — not 100M like the Rust comparator in `benchmarks/polyglot/bench.rs`) as evidence the `loop_overhead` comparison was inflating Perry's 8x Rust win. Scratchpad file was a January-era dev artifact — never referenced by `benchmarks/polyglot/run_all.sh`, which uses `benchmarks/suite/02_loop_overhead.ts` (a flat 100M loop with matching checksum to Rust). Cleanup: deleted 17 stale `bench_*.ts` + `test_inline*.ts` + `bench_loop_only.ts` scratchpads in `benchmarks/` root, 15 January `benchmarks/results/*.txt` run logs, and `benchmarks/simple_loop` (a committed compiled binary). The four files `benchmarks/run_benchmarks.sh` still references (`bench_fibonacci`, `bench_array_ops`, `bench_string_ops`, `bench_bitwise`) are kept; the `*.ts` pattern guard already exists in `.gitignore`. Reran `polyglot/run_all.sh 5` on current main — 8 polyglot cells confirmed workload-parity with Rust/C++/Go/Swift/Java/Node/Bun/Python via checksum agreement, but three cells regressed vs the v0.5.22 (e1cbd37) baseline in `RESULTS.md`: `loop_overhead` 12→32 ms, `math_intensive` 14→48 ms, `accumulate` 24→97 ms. IR-level bisect attributes the regression to two compounding changes — #74's `asm sideeffect` loop-body barrier at v0.5.91 and an over-eager i32 shadow counter for integer-valued accumulator locals (not just loop counters) — both of which kick the LLVM default pipeline off the vectorization path it was on at v0.5.22. Filed as #140 with the pre-opt/post-opt IR diff and three candidate fixes. 
Tightened the comparison narrative: in response to @MaxGraey's follow-up on #139, reran `g++ -O3 -ffast-math bench.cpp` → C++ drops from 96 ms to 11 ms, confirming the entire `loop_overhead` gap is the default fast-math flag choice (Perry emits `reassoc contract` on f64 ops because TS `number` semantics allow it; Rust/C++/Go/Swift default to strict-IEEE fadd and hit the 3-cycle latency wall). Updated `README.md` Perry-vs-{Node,Bun} + Perry-vs-compiled-languages tables with fresh best-of-3 numbers from the full suite run, kept the historical `LLVM backend progress` column honest (method_calls 2→1, fibonacci 310→302, factorial 24→96, closure 8→15, etc. — the regressed cells are shown at their current values, not hidden), and rewrote the narrative in `benchmarks/polyglot/RESULTS.md` sections `loop_overhead` / `math_intensive` / `accumulate` to lead with the fast-math-default explanation (linking `RESULTS_OPT.md`, which already documented the `-ffast-math` opt-sweep back at v0.5.22) and point to #140 for the vectorization regression specifically. No codegen or runtime changes in this bump; fix for the regression will land separately against #140.
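The `json_roundtrip` workload described in the v0.5.166 entry above can be sketched in plain TypeScript. This is a hypothetical reconstruction — the entry only specifies 50 rounds of `JSON.parse` + `JSON.stringify` over a ~1MB, 10K-item blob; the item shape and function names here are assumptions:

```typescript
// Hypothetical sketch of the json_roundtrip benchmark shape (item fields assumed).
function makeBlob(items: number): { id: number; name: string; tags: string[] }[] {
  const blob: { id: number; name: string; tags: string[] }[] = [];
  for (let i = 0; i < items; i++) {
    blob.push({ id: i, name: "item-" + i, tags: ["alpha", "beta", "gamma"] });
  }
  return blob;
}

// Round-trip the serialized blob `rounds` times; return a length checksum so
// the work cannot be optimized away.
function roundtrip(rounds: number, items: number): number {
  let text = JSON.stringify(makeBlob(items));
  for (let i = 0; i < rounds; i++) {
    text = JSON.stringify(JSON.parse(text));
  }
  return text.length;
}
```

The published row would then correspond to roughly `roundtrip(50, 10_000)`.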
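The silent-drop failure mode fixed in v0.5.165 is easiest to see by desugaring: a class decorator is just a function applied at class-definition time, and Perry's codegen skipped that application. A minimal sketch without decorator syntax (names hypothetical):

```typescript
// Roughly what `@logClass class Greeter { ... }` means: the decorator function
// runs once when the class is defined and may observe or replace the class.
function logClass<T extends new (...args: any[]) => object>(ctor: T): T {
  console.log("DECORATOR RAN: " + ctor.name);
  return ctor;
}

class Greeter {
  greet(name: string): string {
    return "hello, " + name;
  }
}

// This application step is what Perry silently omitted for `@logClass` before
// v0.5.165 — the program compiled, but "DECORATOR RAN" never printed.
const DecoratedGreeter = logClass(Greeter);
```

After the fix, the decorator-syntax form is rejected at compile time instead of compiling to the undecorated class.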
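The two loop shapes the v0.5.164 fix distinguishes can be sketched as follows (iteration counts reduced from the 100M used in the benchmarks):

```typescript
// Pure accumulator: `sum` never appears inside an array index, so after
// v0.5.164 it skips the i32 shadow slot and keeps the clean load/fadd/store
// chain that LLVM can widen into a <2 x double> reduction.
function accumulate(n: number): number {
  let sum = 0;
  for (let i = 0; i < n; i++) {
    sum += 1;
  }
  return sum;
}

// Counter-as-index: `i` appears in `arr[i]`, so collect_index_used_locals
// marks it and the i32 shadow-slot fast path for bounded array access stays.
function fill(n: number): number {
  const arr: number[] = [];
  for (let i = 0; i < n; i++) {
    arr[i] = i;
  }
  return arr[n - 1];
}

// The outer observation that makes the asm barrier redundant for `sum`.
console.log("sum: " + accumulate(100_000));
```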
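The workload-parity discipline in the v0.5.163 entry — every language implementation computes and prints the same checksum — can be sketched as below. This is a guess at the general shape, not the contents of `benchmarks/suite/02_loop_overhead.ts`:

```typescript
// Flat loop with a running checksum: the printed value must agree across the
// Rust/C++/Go/Swift/Java/Node/Bun/Python comparators, which also guards
// against any compiler deleting the loop as dead code.
function loopOverhead(n: number): number {
  let checksum = 0;
  for (let i = 0; i < n; i++) {
    checksum = (checksum + i) % 1_000_000_007;
  }
  return checksum;
}

console.log("checksum: " + loopOverhead(1_000_000));
```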
53 changes: 27 additions & 26 deletions Cargo.lock


2 changes: 1 addition & 1 deletion Cargo.toml
@@ -104,7 +104,7 @@ opt-level = "s" # Optimize for size in stdlib
opt-level = 3

[workspace.package]
version = "0.5.164"
version = "0.5.166"
edition = "2021"
license = "MIT"
repository = "https://github.com/PerryTS/perry"
5 changes: 3 additions & 2 deletions README.md
@@ -50,7 +50,7 @@ People are building real apps with Perry today. Here are some highlights:

## Performance

Perry beats Node.js and Bun on every benchmark. Best of 5 runs, macOS ARM64 (Apple Silicon), Node.js v25, Bun 1.3, rerun 2026-04-22 on v0.5.164.
Perry beats Node.js and Bun on every benchmark below **except `json_roundtrip`**, where Node is ~1.6× faster and Bun ~2.4× faster — tracked as a stdlib JSON perf bug ([#149](https://github.com/PerryTS/perry/issues/149)). Best of 5 runs, macOS ARM64 (Apple Silicon), Node.js v25, Bun 1.3, rerun 2026-04-22 on v0.5.164 (json_roundtrip row rerun 2026-04-23 on v0.5.165).

| Benchmark | Perry | Node.js | Bun | vs Node | What it tests |
|-----------|-------|---------|-----|---------|---------------|
@@ -69,6 +69,7 @@ Perry beats Node.js and Bun on every benchmark. Best of 5 runs, macOS ARM64 (App
| prime_sieve | 3ms | 7ms | 7ms | **2.3x faster** | Sieve of Eratosthenes |
| mandelbrot | 21ms | 24ms | 29ms | **1.1x faster** | Complex f64 iteration (800x800) |
| matrix_multiply | 19ms | 33ms | 33ms | **1.7x faster** | 256x256 matrix multiply |
| json_roundtrip | 588ms | 369ms | 245ms | **1.6x slower** | 50× `JSON.parse` + `JSON.stringify` on a ~1MB, 10K-item blob ([#149](https://github.com/PerryTS/perry/issues/149)) |

Perry compiles to native machine code via LLVM — no JIT warmup, no interpreter overhead. Key optimizations: **scalar replacement** of non-escaping objects (escape analysis eliminates heap allocation entirely — object fields become registers), inline bump allocator for objects that do escape, i32 loop counters for bounded array access, `reassoc contract` fast-math flags, integer-modulo fast path (`fptosi → srem → sitofp` instead of `fmod`), elimination of redundant `js_number_coerce` calls on numeric function returns, i64 specialization for pure numeric recursive functions, and `<2 x double>` parallel-accumulator vectorization on pure-fadd reduction loops (restored in v0.5.164 via [#140](https://github.com/PerryTS/perry/issues/140)).
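As an illustration of the scalar-replacement claim above — whether Perry scalarizes this exact shape is an assumption; the text only describes the pass for non-escaping objects generally:

```typescript
// `p` never escapes distanceSquared: it is not returned, stored to a
// longer-lived location, or captured by a call. Escape analysis can promote
// p.x and p.y to plain registers and emit no heap allocation at all.
function distanceSquared(ax: number, ay: number, bx: number, by: number): number {
  const p = { x: ax - bx, y: ay - by };
  return p.x * p.x + p.y * p.y;
}
```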

@@ -485,7 +486,7 @@ perry publish macos # or: ios / android / linux
| Spread operator in calls and literals | ✅ |
| RegExp (test, match, replace) | ✅ |
| BigInt (256-bit) | ✅ |
| Decorators | |
| Decorators | ❌ ([not supported](docs/src/language/limitations.md#no-decorators)) |

### Standard Library
