Skip to content

fix(codegen): validate every identifier before interpolation into DDL#83

Merged
hyperpolymath merged 1 commit into
mainfrom
fix/validate-ddl-identifiers
May 14, 2026
Merged

fix(codegen): validate every identifier before interpolation into DDL#83
hyperpolymath merged 1 commit into
mainfrom
fix/validate-ddl-identifiers

Conversation

@hyperpolymath
Copy link
Copy Markdown
Owner

Summary

Per V-L2-G1: generate_sidecar_schema was a SQL-injection sink. A table name like posts''); DROP TABLE x;-- would have been interpolated verbatim into the metadata seed INSERT.

  • Add crate::codegen::ident::validate_identifier accepting ^[A-Za-z_][A-Za-z0-9_]*$ with a 63-char cap. Error message names the offending identifier + kind.
  • Gate every table/column name at the entry of generate_sidecar_schema. Return becomes anyhow::Result<String>.
  • Update main.rs, in-module tests, and tests/integration_test.rs to handle the Result.

Closes

Test plan

  • cargo clippy --all-targets -- -D warnings clean
  • cargo test --lib --bins 42/42 (4 new in codegen::ident::tests)
  • 16 attack strings tested (canonical injection + semicolons, quotes, comments, whitespace, non-ASCII, dots, parens)
  • Overlong (>63 char) identifiers rejected
  • Error message includes both the bad identifier and the kind label

Closes #39.

`generate_sidecar_schema` interpolated `table.name` and PK column names
directly into raw SQL via `format!()`. A table named
`posts'); DROP TABLE x;--` would be injected verbatim into the metadata
seed INSERT. The DDL generator was a SQL-injection sink wide open to
whatever the schema parser handed it.

Add a `crate::codegen::ident::validate_identifier` helper:

  - Accepts `^[A-Za-z_][A-Za-z0-9_]*$` with a 63-character length cap
    (Postgres' NAMEDATALEN minus null terminator).
  - Returns `anyhow::Error` whose message names the offending
    identifier (truncated to 60 chars for legibility) and the kind
    label ("table name" / "column name") so users can find it in
    their schema input.

Gate every identifier at the entry of `generate_sidecar_schema` —
every `table.name` and every `column.name` — so downstream `format!()`
sites are safe by construction. Return type becomes
`anyhow::Result<String>`.

Update call sites:

  - `main.rs` Generate command: `?` propagates the error to the user.
  - In-module tests: `.expect("test schema must validate")`.
  - `tests/integration_test.rs`: same.

New tests in `codegen::ident::tests`:

  - `valid_identifiers_pass`: 7 happy-path cases.
  - `injection_strings_rejected`: 16 attack strings including the
    canonical `posts'); DROP TABLE x;--`, semicolons, quotes,
    comments, whitespace, non-ASCII, dots, parens.
  - `overlong_identifiers_rejected`: 64-char input fails.
  - `error_names_offending_identifier`: error message includes both
    the bad identifier and the kind label.

`cargo clippy --all-targets -- -D warnings` clean; 42 unit tests pass
(up from 38).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@hyperpolymath hyperpolymath merged commit 8db7095 into main May 14, 2026
16 of 18 checks passed
@hyperpolymath hyperpolymath deleted the fix/validate-ddl-identifiers branch May 14, 2026 14:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

V-L2-G1: validate + escape every user-controlled identifier in generated DDL

1 participant