
fix(codegen): length-prefix composite entity_id; NULL-safe; no collision #77

Merged
hyperpolymath merged 1 commit into main from fix/composite-pk-hash-id
May 14, 2026

Conversation

@hyperpolymath
Owner

Summary

Per V-L2-B2: the composite-PK entity_id expression joined columns with `'::'`, which collides whenever a PK value itself contains `::`, and (on Postgres) evaluated to NULL whenever any operand was NULL.

Switch to length-prefix encoding: each column emits `LENGTH(CAST(col AS TEXT))::text || ':' || CAST(col AS TEXT)`, and the parts are concatenated with no inter-column separator. Explicit lengths disambiguate column boundaries, so no separator string can collide. NULL gets the literal `'N'`, which is distinguishable from the empty string (`'0:'`) and from values starting with `N`, since those carry a length prefix.
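As a rough illustration of the expression shape described above, here is a minimal Rust sketch. The function name `sketch_entity_id_expr` is hypothetical and is not the project's `build_entity_id_expr`; it assumes each part is wrapped in `COALESCE(..., 'N')` as the commit message describes, that parts are joined with `||`, and that column names are already safe to embed without quoting.

```rust
/// Hypothetical sketch (not the project's `build_entity_id_expr`):
/// builds a length-prefixed entity_id SQL expression for the given
/// primary-key columns. Column names are embedded as-is, which assumes
/// they are already valid, trusted identifiers.
fn sketch_entity_id_expr(pk_columns: &[&str]) -> String {
    pk_columns
        .iter()
        .map(|col| {
            // One block per column: "<length>:<value>", or 'N' when the
            // column is NULL (the inner expression is NULL in that case).
            format!(
                "COALESCE(LENGTH(CAST({col} AS TEXT))::text || ':' || CAST({col} AS TEXT), 'N')"
            )
        })
        .collect::<Vec<_>>()
        // Parts are concatenated directly; no separator literal between columns.
        .join(" || ")
}
```

Calling `sketch_entity_id_expr(&["tenant_id", "order_id"])` yields two `COALESCE(...)` blocks joined by `||`, with no `'::'` literal anywhere in the output.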

The issue recommended SHA-256 hashing. Length-prefix encoding achieves the same "no collision risk" property without needing pgcrypto, SQLite hash extensions, or any Postgres-version dependency. The tradeoff is that output length is no longer uniform, a property the acceptance criteria do not require.

Closes

Test plan

  • cargo clippy --all-targets -- -D warnings clean
  • cargo test --lib --bins 34/34 pass (2 new)
  • test_entity_id_expr_composite_no_separator_collision checks distinct column counts produce distinct shapes
  • test_entity_id_expr_composite_mongodb_uses_plus_concat keeps MongoDB lane's + operator

…ollision

Closes #44.

`build_entity_id_expr` joined composite-PK columns with `'::'`. Any PK
value containing `::` could encode to the same string as a different PK
(the documented collision). Worse, on PostgreSQL `||` returns NULL for
the whole expression if any operand is NULL, so a NULL in any
composite-PK column silently produced a NULL entity_id.
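The collision is easy to reproduce outside SQL. The following is a minimal sketch that mimics the old separator-based join on plain strings; `join_with_separator` is a hypothetical helper written for illustration, not code from the repository.

```rust
/// Hypothetical stand-in for the old behaviour: join the PK parts
/// with the literal separator "::".
fn join_with_separator(parts: &[&str]) -> String {
    parts.join("::")
}

/// Returns true when two different PK tuples produce the same id.
fn collides(a: &[&str], b: &[&str]) -> bool {
    a != b && join_with_separator(a) == join_with_separator(b)
}
```

With this helper, the distinct keys `["a::b", "c"]` and `["a", "b::c"]` both encode to `a::b::c`, so `collides` returns `true` for that pair.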

Switch to **length-prefix encoding**: each column emits
`LENGTH(CAST(col AS TEXT))::text || ':' || CAST(col AS TEXT)` and the
parts are concatenated with no inter-column separator. Explicit lengths
disambiguate column boundaries, so distinct PK values across rows can
never produce the same encoding — regardless of what characters the
values contain.
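To see why explicit lengths remove the ambiguity, here is a small Rust sketch of the same encoding applied to plain strings, plus a decoder that recovers the original parts. Both function names are hypothetical, and counting characters with `chars().count()` is an assumption meant to mirror SQL `LENGTH` on text.

```rust
/// Encode each part as "<char count>:<value>" and concatenate the
/// results with no separator between parts.
fn encode_length_prefixed(parts: &[&str]) -> String {
    let mut out = String::new();
    for part in parts {
        out.push_str(&part.chars().count().to_string());
        out.push(':');
        out.push_str(part);
    }
    out
}

/// Decode an encoded string back into its parts.
/// Returns None if the input is not a well-formed encoding.
fn decode_length_prefixed(encoded: &str) -> Option<Vec<String>> {
    let mut parts = Vec::new();
    let mut rest = encoded;
    while !rest.is_empty() {
        // The length prefix runs up to the first ':'.
        let colon = rest.find(':')?;
        let len: usize = rest[..colon].parse().ok()?;
        let body = &rest[colon + 1..];
        // Find the byte offset just past the first `len` characters.
        let end = match body.char_indices().nth(len) {
            Some((idx, _)) => idx,
            None if body.chars().count() == len => body.len(),
            None => return None, // fewer characters than the prefix claims
        };
        parts.push(body[..end].to_string());
        rest = &body[end..];
    }
    Some(parts)
}
```

Because the encoding can be decoded unambiguously, two different tuples cannot share an encoding: `["a::b", "c"]` becomes `4:a::b1:c`, while `["a", "b::c"]` becomes `1:a4:b::c`.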

NULL handling: each part is wrapped in `COALESCE(..., 'N')` so NULL
encodes as the literal `'N'`. This is distinguishable from the empty
string (which encodes as `'0:'`) and from values starting with `N`
(those carry a length prefix, e.g. `'1:N'`). Side effect: this also
fixes the Postgres NULL-propagation bug.
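The three cases named above can be checked with a value-level sketch in Rust, where `Option<&str>` stands in for a nullable column. `encode_part` is a hypothetical helper for illustration only.

```rust
/// Encode one nullable part: None models SQL NULL and becomes "N";
/// any present value becomes "<char count>:<value>".
fn encode_part(value: Option<&str>) -> String {
    match value {
        Some(v) => format!("{}:{}", v.chars().count(), v),
        None => "N".to_string(),
    }
}
```

Here `encode_part(None)` gives `N`, `encode_part(Some(""))` gives `0:`, and `encode_part(Some("N"))` gives `1:N`, so the NULL marker never coincides with the encoding of a real value, which always starts with a digit.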

The issue recommended SHA-256 hashing. Length-prefix achieves the same
"no collision risk" property using only plain SQL, no extensions
(pgcrypto / SQLite hash extension) and no Postgres-version dependency.
The "uniform length" property is sacrificed but isn't needed for
correctness — only as an indexing hint, which isn't covered by the
acceptance criteria.

Tests:

  - `test_entity_id_expr_composite_pk`: asserts the new shape (LENGTH,
    COALESCE, no '::').
  - `test_entity_id_expr_composite_no_separator_collision`: distinct
    column counts produce distinct shapes; each column gets exactly
    one length-prefix block; no '::' anywhere.
  - `test_entity_id_expr_composite_mongodb_uses_plus_concat`: MongoDB
    branch uses `+` (not `||`) per the existing convention.

`cargo clippy --all-targets -- -D warnings` clean; 34 unit tests pass.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@hyperpolymath hyperpolymath merged commit 0c9b766 into main May 14, 2026
16 of 18 checks passed
@hyperpolymath hyperpolymath deleted the fix/composite-pk-hash-id branch May 14, 2026 14:28


Development

Successfully merging this pull request may close these issues.

V-L2-B2: composite entity_id ':: '-separator collides; switch to hash-derived id
