Skip to content

v1.0.3: Fix critical DMA bugs in crt0 startup code#105

Open
CTalkobt wants to merge 37 commits into
mainfrom
dev_v1.0.3
Open

v1.0.3: Fix critical DMA bugs in crt0 startup code#105
CTalkobt wants to merge 37 commits into
mainfrom
dev_v1.0.3

Conversation

@CTalkobt

@CTalkobt CTalkobt commented Jun 2, 2026

Copy link
Copy Markdown
Owner

Summary

Both fixes applied to crt0.s (stack convention) and crt0_zp.s (ZP convention).

Test plan

  • All unit tests pass (make test)
  • All mmemu execution tests pass
  • Disassembly verified correct register write sequence
  • Test on real MEGA65 hardware (ZP save/restore round-trip)

🤖 Generated with Claude Code

CTalkobt and others added 2 commits June 2, 2026 15:10
The crt0 startup code uses STZ to clear DMA registers ($D700, $D703),
but the Z register is not guaranteed to be zero at program entry. If Z
holds a non-zero value from the prior context, the DMA list address and
trigger register receive incorrect values, causing silent corruption
during ZP save/restore.

Fix: add `ldz #0` before the first DMA invocation in both crt0.s
(stack convention) and crt0_zp.s (ZP convention).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The DMA list address was written to the wrong F018B registers:
- MSB was going to $D702 (bank register)
- $D703 was being cleared (control register, should not be used as trigger)
- LSB was going to $D701 (MSB register)
- $D700 was cleared to 0 (triggering DMA with wrong list address)

Fixed to use the correct F018B sequence:
1. $D703 ← EN018B=1 (enable F018B enhanced DMA list format)
2. $D702 ← 0 (bank)
3. $D701 ← MSB of DMA list address
4. $D700 ← LSB of DMA list address (triggers DMA)

Applied to both crt0.s (stack convention) and crt0_zp.s (ZP convention).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@MassiveBattlebotsFan

Copy link
Copy Markdown

The DMA lists are each one byte short of an F018B list (modulo is implemented as a .byte, but it should be a .word), so the MSB of the modulo on the restore list consists of the first byte of the zeropage save area.
This shouldn't break anything, but is non-compliant, and could cause future issues if/when the modulo field actually does something.

CTalkobt and others added 27 commits June 2, 2026 17:03
…, optimizer BRK corruption

Four bugs caused the game_of_life clear_grid/step crash:

1. Proc parameter offset assignment reversed (AssemblerParser.cpp)
   The proc directive assigned stack offsets in reverse iteration order,
   causing the first declared param to get the highest offset. With the
   compiler's right-to-left push, the first param is closest to SP
   (lowest offset). This swapped memset's dest/count params, filling
   10832 bytes from $03E8 and overwriting the code segment.
   Fix: changed loop from reverse to forward iteration.

2. rtn #0 skips callee stack cleanup (AssemblerGenerator.cpp, AssemblerParser.cpp)
   Hand-written stdlib functions use rtn #0 which emitted plain RTS,
   but the callee-cleanup convention requires RTS #N to pop params.
   Fix: rtn #N now auto-adds currentProc->totalParamSize.

3. memcpy return value offset wrong after PHA restore (memcpy.s)
   After PLZ/STZ restored saved ZP bytes, the return value loading
   still used +2 offsets for the (now-popped) PHA saves.
   Fix: use __sp_base+_p_dest (no +2) for post-restore access.

4. Optimizer tail-dedup BRK corruption (cherry-pick of 8514aed from dev_v1.1)
   Optimizer-created BRA/label/RTS statements had empty segmentName,
   causing the generator to skip them (segment filter mismatch),
   leaving BRK gap bytes. This corrupted step()'s PLZ frame cleanup.

Also: palette_fade Makefile now auto-builds lib as a dependency.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
memset used `cpy $05` (compare Y with count_lo) to detect loop end,
but Y tracks page offset while count decrements independently. When
count reaches 0, Y != count_lo in most cases, so the loop overflows —
writing ~65000 extra bytes and corrupting memory.

Fix: changed `cpy $05` to `ldx $05` to check if count_lo is zero
(when count_hi is already zero). Applied to both stack and ZP variants.

Added examples/c/memset_screen: fills screen with each character
value 0-255 in a loop, exercising memset with a 1000-byte fill.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
RTS #N ($62 nn) is unreliable on some 45GS02 hardware, causing stack
leaks that corrupt return addresses after repeated function calls.

Changed calling convention from callee-cleanup (RTS #N) to
caller-cleanup (PLZ x N after JSR):

- IRCodeGen.cpp: emit PLZ instructions after each stack-convention
  call to pop argument bytes, instead of relying on callee's RTS #N
- AssemblerGenerator.cpp: endproc now always emits plain RTS ($60);
  reverted rtn auto-add of procParamSize
- AssemblerParser.cpp: endproc sizing always 1 byte; reverted rtn
  sizing changes
- memcpy.s: use plain rts instead of rtn #0

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add AZ-pair frame load/store ops (ldaz.fp, staz.fp) that avoid ZP scratch
usage. IRCodeGen now loads I16 hi byte into Z for frame destinations.
stax.fp rewritten to use pha/txa/taz/pla instead of ZP scratch.

NOTE: Multiple mmemu execution tests currently failing — I16 frame
store changes need debugging before this is ready.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The previous staz.fp changes broke I16 frame stores by:
1. Loading hi byte into Z instead of X for frame-destined constants,
   but canSkipTransfer optimization left X unloaded even when b1!=b0
2. storeVreg checked if Z was "known" (any value) rather than whether
   Z actually held the hi byte

Fix: Add valueByte_[4] tracking to IRCodeGen that records which register
was actually loaded with each value byte. CONST always loads X for I16
hi byte (standard AX convention). storeVreg checks valueByte_[1] to
pick staz.fp (when Z holds hi) vs stax.fp (safe default).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Root cause: The linked crt0 uses F018B DMA to save/restore ZP ($08-$FF).
The mmemu emulator's DMA handler executes the copy correctly but then
terminates the program instead of resuming CPU execution, so _main()
never runs.

The direct-compile path (cc45 → ca45) uses an inline startup stub with
a loop-based ZP save — no DMA — which works fine in mmemu.

Fix:
- Add compile_direct_test() helper to test_mmemu.sh
- Switch all 5 mega65.h tests from compile_link_test to compile_direct_test
- Keyboard test: inline C reimplementation of key_pressed() since direct
  compile doesn't link the stdlib
- Align DMA job data in crt0.s as defensive measure for linked mode

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Restore linked compilation path for mega65.h hardware register tests
so they exercise the full crt0 + stdlib pipeline. These tests will
fail until mmemu#45 (F018B DMA CPU halt bug) is resolved.

The compile_direct_test helper is retained for future use.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
When a CONST I16 is immediately followed by STORE to a frame-allocated
local, fuse into lda #lo / ldz #hi / staz.fp — loads hi byte into Z
directly, avoiding the pha/txa/taz/pla transfer that stax.fp requires.

Also set valueByte_[0..1] in the STORE handler's store-forwarding path
so non-CONST frame stores correctly identify A:X as the value source.

Saves ~5 bytes per constant frame initialization.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
New AY-pair frame load/store ops for use when the Z register holds a
value that must be preserved (e.g., loop counter, I32 byte 3):

  lday.fp offset  — load 16-bit from frame into A (lo) and Y (hi)
  stay.fp offset  — store A (lo) and Y (hi) to frame

Completes the register-pair frame access family:
  ldax.fp/stax.fp  — AX pair (standard, X→Z transfer in stax.fp)
  lday.fp/stay.fp  — AY pair (Z-preserving)
  ldaz.fp/staz.fp  — AZ pair (X-free, preferred for constants)
  ldaxyz.fp/staxyz.fp — AXYZ quad (32-bit)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed 32-bit right shifts by 8, 16, or 24 bits now use byte shuffles
with sign extension instead of looping through single-bit asr.32 ops.

Before: >> 16 emitted 16 iterations of asr.32 .AXYZ (~160 bytes)
After:  >> 16 emits 10 instructions (~15 bytes) with sign extension

Sign extension uses CMP #$80; LDA #0; SBC #0 to produce $FF (negative)
or $00 (positive) for the vacated high bytes.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Document that the O(N²×Z) expiry loop is a theoretical concern only:
N rarely exceeds 200 vregs, Z is capped at 64 ZP slots, so worst case
is ~12K iterations — microseconds in practice.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
New tool and common library for creating, reading, and manipulating
Commodore disk images:

Library (DiskImage base + 4 format implementations):
  - D64: C64 1541, 35 tracks, variable sectors/track (170KB)
  - D71: C128 1571, 70 tracks, double-sided D64 (340KB)
  - D81: C65/MEGA65 1581, 80×40 uniform (800KB)
  - D65: MEGA65 native, 162 tracks, double-sided D81 (1.6MB)

Common operations: format, add/remove/extract files, list directory,
BAM management, PETSCII filename conversion, sector chain traversal.

CLI tool (disk45):
  disk45 create <image> [-n name] [-i id]
  disk45 list|info <image>
  disk45 add <image> <file> [cbm_name]
  disk45 extract <image> <cbm_name> <file>
  disk45 remove <image> <cbm_name>

All four formats verified with round-trip data integrity tests.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
GZ compression (transparent):
  - Any disk image can be compressed: .d64.gz, .d81.gz, .d65.gz etc.
  - Auto-detected on load (magic bytes 1F 8B), .gz extension on save
  - Uses system zlib for inflate/deflate
  - 819KB D81 → ~900 bytes when empty, proportional with content

ARK (Arkive) archive format:
  - Uncompressed CBM file collection (29-byte directory entries)
  - Full read/write/add/remove/extract support
  - Block-aligned (254 bytes) data storage

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Commodore ARC archive format with full decompression:
  - Mode 0: Stored (uncompressed)
  - Mode 1: Packed (RLE with configurable control byte)
  - Mode 2: Squeezed (Huffman coding)
  - Mode 3: Crunched (LZW 12-bit, Terry Welch algorithm)
  - Mode 4: Squeezed + Packed (Huffman + RLE)
  - Mode 5: Crunched one-pass (LZW with trailing checksum)

Write support uses stored mode (mode 0). SDA (Self-Dissolving Archive)
headers are detected and skipped automatically.

Format details based on Peter Schepers' ARC.TXT specification and the
cbmconvert unarc.c reference (clean-room reimplementation).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Comprehensive documentation covering:
- All commands (create, list, info, add, extract, remove)
- All disk formats (D64, D71, D81, D65) with capacity/layout details
- Archive formats (ARK, ARC/SDA) with compression mode table
- GZ transparent compression
- PETSCII filename handling
- Makefile integration examples
- C++ library API reference

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Document disk45 in the main codebase reference: supported formats,
commands, usage examples, and library API.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
LNX (Lynx) archive format:
  - BASIC stub with "LYNX" signature + ASCII directory + 254-byte
    block-aligned data
  - Full read/write/add/remove/extract support
  - Directory entries stored as ASCII (CR-terminated fields)

New dump command for disk images:
  disk45 dump <image>                # BAM/header hex dump
  disk45 dump <image> <track>        # all sectors on a track
  disk45 dump <image> <track> <sec>  # single sector hex dump

Output includes hex bytes + ASCII printable characters.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
LNX (Lynx) archive format:
  - BASIC stub + ASCII header + 254-byte block-aligned file data
  - Full read/write/add/remove/extract support

New commands:
  disk45 rename <image> <old> <new>      — rename file in directory
  disk45 label <image> [-n name] [-i id] — change disk name/ID
  disk45 validate <image>                — check BAM consistency
    Reports: cross-linked sectors, orphaned sectors, broken chains
  disk45 bam <image>                     — visual sector allocation map
    Shows per-track free/used with . and # characters

setDiskName/setDiskId added to all four disk format classes.
renameFile and validate added to DiskImage base class.
isSectorFree made public for BAM visualization.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
When adding or extracting SEQ files, -p converts between ASCII and
PETSCII text encoding:

  disk45 add image.d81 readme.seq "README" -p   # ASCII → PETSCII
  disk45 extract image.d81 "README" out.seq -p  # PETSCII → ASCII

Conversion handles:
  - Case mapping: a-z ↔ $41-$5A, A-Z ↔ $C1-$DA
  - Line endings: LF ↔ CR (strips \r for CRLF input)
  - Special chars: ~ ↔ π, | ↔ bar, \ ↔ £

Round-trip verified: ASCII → PETSCII → ASCII preserves content.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Complete documentation refresh covering all 11 commands, LNX format,
-p/--petscii flag, command summary table, and library API updates
for renameFile/setDiskName/setDiskId/validate.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
  disk45 lock <image> <name>     Set bit 6 of file type (locked)
  disk45 unlock <image> <name>   Clear bit 6 (unlocked)

Locked files display with '<' suffix in directory listing (CBM convention).
lockFile() method added to DiskImage base class.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Adapted from SDCC's GCC 8.2 torture test subset (gte/ directory),
filtered for cc45 compatibility (no float, long long, printf, malloc).

Current status: 39/480 compile, remainder blocked by:
  - #106: type specifier combinations (long int, short int) — ~15 tests
  - #107: multi-variable declarations (int a, b, c) — ~70 tests
  - #108: implicit int return type / K&R functions — ~140 tests
  - #109: anonymous/inline struct declarations — ~31 tests

Tests use testfwk.h adapter which maps abort() → $4000=0xFF and
exit(0) → $4000=0xAA for mmemu validation.

adapt_tests.sh can re-generate from a fresh SDCC SVN checkout.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…tc.)

Accept standard C type specifier combinations where multiple keywords
form a single type:
  long int, short int, signed int, unsigned int,
  signed long, unsigned long, signed short, unsigned short,
  signed char, unsigned char

After matching LONG or SHORT, an optional trailing INT token is consumed.
Applied to all type-parsing locations: variable declarations, function
return types, parameters, casts, sizeof, alignof, va_arg, typedef, and
function pointer signatures.

Fixes #106. Unblocks GTE torture tests using combined type specifiers.
GTE compile count: 39 → 40 (remainder blocked by #107/#108/#109).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Standard C library functions for program termination:

  abort()  — weak, calls __abort (BRK). Override for pre-abort hooks.
  exit()   — weak, calls __exit (ZP restore + RTS). Override for atexit.
  _exit()  — strong, always calls __exit directly. Non-overridable.
  __abort  — core abort implementation in crt0 (BRK instruction).
  __exit   — core exit implementation in crt0 (ZP restore + SP restore + RTS).

Design: users override the weak abort()/exit() for pre-termination hooks
(cleanup, logging), then call __abort/__exit for actual termination.

Both stack and zpCall convention versions provided.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
testfwk.h now just includes <stdlib.h> and <string.h>. No macro
overrides of abort/exit — tests use the real library functions.

GTE tests should be compiled with -c and linked with c45.lib to
get abort() and exit() from the standard library.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
CTalkobt and others added 8 commits June 13, 2026 12:09
Parse comma-separated declarators after the type specifier:
  int a, b = 3, *c, d[4];
  static int x = 1, y = 0;
  unsigned char m = 0xAA, n = 0xBB;

Each additional declarator reuses the base type, qualifiers, and
signedness. Pointer levels, array dims, and initializers are parsed
independently per declarator. Multiple declarations are wrapped in
a CompoundStatement.

Global multi-var declarations propagate isGlobal/isStatic/isExtern/
isSigned flags to all declarators in the compound.

Fixes #107. GTE compile count: 54 → 72 (+18 tests unblocked).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Implicit int (C89):
  foo() { return 42; }         — return type defaults to int
  static bar(int x) { ... }   — works with storage class specifiers

K&R parameter declarations:
  int add(a, b)
      int a;
      int b;
  { return a + b; }

Parameters in the K&R list default to int if no type declaration
follows. Type declarations after ')' update matching parameter
names with the declared type, pointer level, and qualifiers.

Also fixed: the top-level parser now consumes extern/static/inline
tokens before calling parseFunctionDeclaration() for the implicit-int
code path.

Fixes #108. GTE compile count: 72 → 80 (+8 tests unblocked).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Support inline struct/union definitions in all contexts:

  struct { int x; int y; } point;           — anonymous struct variable
  struct RGB { int r,g,b; } color;          — named struct + variable
  typedef struct { int x; } Point;          — typedef with anon struct
  typedef struct Vec { int dx; } Vec;       — typedef with named struct
  void foo(struct { int a; } *p) { ... }    — in parameters/returns/casts

Anonymous structs get auto-generated names (<anon_struct_N>).
Inline definitions are registered via pendingDefinitions and emitted
before the variable declaration that uses them.

Applied to all 9 type-parsing locations: variable declarations, function
return types, parameters, casts, sizeof, alignof, va_arg, typedef, and
function pointer signatures.

Also handles struct Name { ... } var; at both global and local scope
(previously only struct Name { ... }; was accepted).

Fixes #109. GTE compile count: 80 → 87 (+7 tests unblocked).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…hers

Parse GCC-style __attribute__((...)) in function declarations, variable
declarations, and after struct definitions.

Recognized attributes (silently accepted):
  - noinline: prevent function inlining (no-op, cc45 doesn't auto-inline aggressively)
  - noclone: prevent function cloning (no-op, cc45 doesn't clone)
  - packed: packed struct layout (already default in cc45)

Warned and skipped (#110-#114):
  - noipa: no interprocedural analysis (cc45 has no IPA)
  - aligned: alignment control (use _Alignas instead)
  - mode: force type width (QI/HI/SI/byte)
  - vector_size: SIMD vectors (not applicable to 8-bit)
  - __may_alias__: type punning (cc45 has no TBAA)

Unknown attributes emit a warning and are skipped.

Both __attribute__ and __attribute forms accepted. Handles double-paren
syntax ((attr)), comma-separated lists, and parenthesized arguments.

GTE compile count: 87 → 91 (+4 tests unblocked).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Compiler-level (parsed in parsePrimary):
  __builtin_constant_p(x) — returns 1 for literals/constant expressions,
    0 for variables/calls. Evaluates at parse time using AST node types.
  __builtin_expect(x, v) — returns x (branch hint, no-op)
  __builtin_trap() — maps to BRK via __abort
  __builtin_unreachable() — no-op (UB if reached)

Library-level (individual source files in lib/stdlib/):
  String: __builtin_memcpy, __builtin_memset, __builtin_memmove,
    __builtin_memcmp, __builtin_strlen, __builtin_strcpy, __builtin_strcmp
    — each wraps the corresponding stdlib function
  Math: __builtin_abs, __builtin_labs — inline C
  Bit ops: __builtin_ffs, __builtin_clz, __builtin_ctz, __builtin_popcount
    — hand-written 45GS02 assembly
  Other: __builtin_bswap16 (C), __builtin_trap (asm → __abort)

Both stack and zpCall convention versions provided.
Declarations added to <stdlib.h> and <string.h>.

GTE compile count: 91 → 97 (+6 tests unblocked).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Unnamed function parameters (valid in prototypes):
  void foo(int, char *);
  int bar(int, int, ...);
  Auto-generates internal names (__unnamed_N) for unnamed params.

Multi-variable struct/union member declarations:
  struct Point { long p_x, p_y; };
  struct RGB { unsigned char r, g, b; };
  struct Mixed { int a, *b, c[3]; };
  Each additional member after comma reuses the base type/qualifiers.
  Supports pointer levels, array dims, and bitfield widths per member.

GTE compile count: 97 → 116 (+19 tests unblocked).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
__attribute__ now accepted in all standard GCC positions:
  - Between return type and function name: void __attribute__((noinline)) foo()
  - After struct/union keyword: struct __attribute__((packed)) S { }
  - Before/after typedef alias: typedef int __attribute__((aligned)) T;
  - After variable name: int x __attribute__((unused));
  - After struct member type: struct { int __attribute__((packed)) x; }
  - After struct def before variable: struct S { } __attribute__((packed)) var;

Implicit function declarations (C89):
  - Calling undeclared functions now emits a warning instead of an error
  - Function is assumed to return int with unspecified parameters
  - Enables compilation of code that relies on C89 implicit declarations

GTE compile count: 116 → 202 (+86 tests, 42.1%).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ositions

__extension__ (GCC):
  Recognized as a token and silently skipped in expressions, declarations,
  and qualifier loops. No semantic effect — just suppresses pedantic warnings
  in GCC, which cc45 doesn't need.

void* local declarations:
  void *p = ...; now accepted in local scope (void was missing from the
  statement-level type list that triggers parseVariableDeclaration).
  Also added void to parseVariableDeclaration's type matching.

GTE compile count: 202 → 208 (+6 tests, 43.3%).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

v1.0.3: crt0.s assumes Z register is zero without validation v1.0.3: crt0.s DMA list address written to wrong registers

2 participants