migration-mvp: DM-18 migration verification in CI #60

Merged
Born14 merged 8 commits into main from migration-mvp on Apr 12, 2026

Born14 (Owner) commented Apr 12, 2026

Summary

  • Adds schema-grounded Postgres migration verification to verify
  • Parser (libpg-query WASM), schema loader with reverse FK index, grounding gate (DM-01..05), safety gate (DM-15..19)
  • GitHub Action detects .sql migration files in PRs, loads schema from prior migrations, runs gates, posts findings as PR comment
  • DM-18 (NOT NULL without default) blocks merge. DM-15 is warning-only.
  • Backtest on 761 real migrations across 3 repos: 19 true positives, 0 false positives
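
As a rough illustration of the DM-18 shape named in the summary, the following TypeScript sketch flags single-statement `ADD COLUMN ... NOT NULL` migrations that lack a `DEFAULT`. It is a simplified stand-in, not the PR's gate: the PR describes a libpg-query parse tree, whereas this sketch assumes regex matching over statements split naively on semicolons, and it does not cover the multi-step pattern. The names `findDm18` and `Dm18Finding` are hypothetical.

```typescript
// Hypothetical, simplified stand-in for a DM-18 check (NOT NULL without default).
// Assumption: regex matching on naively split statements, not a real SQL parse.
interface Dm18Finding {
  shape: "DM-18";
  statement: string;
}

function findDm18(sql: string): Dm18Finding[] {
  const findings: Dm18Finding[] = [];
  // Naive split: breaks on semicolons inside string literals or function bodies.
  for (const raw of sql.split(";")) {
    const statement = raw.trim();
    if (statement.length === 0) continue;
    // ALTER TABLE ... ADD [COLUMN] ..., ignoring ADD CONSTRAINT clauses.
    const addsColumn = /\balter\s+table\b[\s\S]*\badd\s+(?!constraint\b)/i.test(statement);
    const notNull = /\bnot\s+null\b/i.test(statement);
    const hasDefault = /\bdefault\b/i.test(statement);
    if (addsColumn && notNull && !hasDefault) {
      findings.push({ shape: "DM-18", statement });
    }
  }
  return findings;
}

// Table and column names below are made up for the example.
const risky = 'ALTER TABLE "User" ADD COLUMN "age" INTEGER NOT NULL;';
const safe = 'ALTER TABLE "User" ADD COLUMN "age" INTEGER NOT NULL DEFAULT 0;';
console.log(findDm18(risky).length, findDm18(safe).length);
```

A parse-tree check avoids the false matches that this regex approach will produce on comments, string literals, and unusual formatting.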

Measured claim

On a reviewed replay set of 761 real migrations across 3 repos, DM-18 produced 19 true positives and 0 false positives.

Test plan

  • 39 unit/integration test assertions pass (schema loader + gates)
  • Local Action simulation passes (5 scenarios: DM-18 blocks, safe passes, ack suppresses, hallucinated table blocks, file detection)
  • This PR includes a test migration (migrations/20260412_test_dm18.sql) that should trigger DM-18 — verify the Action comments correctly
  • Delete test migration after validation

🤖 Generated with Claude Code

Born14 and others added 4 commits April 12, 2026 09:29
Parser (libpg-query WASM), schema loader with reverse FK index,
grounding gate (DM-01..05), safety gate (DM-15..19), spec translator,
corpus replay engine, 7 golden fixtures, 39 test assertions.

Backtest on 761 real migrations (cal.com, formbricks, supabase):
DM-18 produced 19 true positives and 0 false positives.

Measured claim frozen in scripts/mvp-migration/MEASURED-CLAIMS.md.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
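
The reverse FK index mentioned in the commit above can be pictured as a map from each referenced table to the foreign keys that point at it, which is what a check such as DM-15 (DROP with FK dependents) would need to consult. This is a minimal sketch under assumptions: the PR does not show its schema types, so `ForeignKey`, `buildReverseFkIndex`, and `dependentsOf` are hypothetical names, and the tables in the usage lines are invented.

```typescript
// Hypothetical shape; the PR does not show its schema loader types.
interface ForeignKey {
  table: string;    // referencing (child) table
  column: string;   // referencing column
  refTable: string; // referenced (parent) table
}

// Reverse index: referenced table -> foreign keys that point at it.
function buildReverseFkIndex(fks: ForeignKey[]): Map<string, ForeignKey[]> {
  const index = new Map<string, ForeignKey[]>();
  for (const fk of fks) {
    const dependents = index.get(fk.refTable) ?? [];
    dependents.push(fk);
    index.set(fk.refTable, dependents);
  }
  return index;
}

// Tables with no incoming foreign keys yield an empty list.
function dependentsOf(index: Map<string, ForeignKey[]>, table: string): ForeignKey[] {
  return index.get(table) ?? [];
}

// Invented example schema.
const fkIndex = buildReverseFkIndex([
  { table: "Booking", column: "userId", refTable: "User" },
  { table: "Session", column: "userId", refTable: "User" },
]);
console.log(dependentsOf(fkIndex, "User").map((fk) => fk.table));
```

Building the index once while loading the schema makes the "who depends on this table" lookup a single map access per DROP statement.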
Adds migration verification to the PR Action:
- Detects .sql migration files in PR changed-file list
- Loads schema from prior migrations on the base branch
- Runs grounding + safety gates against each new migration
- Posts findings as a PR comment with shape IDs and ack instructions
- DM-18 (NOT NULL without default) blocks merge
- DM-15 (DROP with FK dependents) warning-only until fully calibrated

Includes libpg-query WASM binary in dist/action/ for the Action runtime.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
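
The severity policy in the commit above (DM-18 blocks, DM-15 warns) could be expressed roughly as follows. This sketch only models the two shapes the commit names; defaulting every other shape to a warning is purely an assumption for illustration, and `GateFinding`, `SEVERITY`, and `summarize` are hypothetical names.

```typescript
type Severity = "error" | "warning";

interface GateFinding {
  shape: string;   // shape ID such as "DM-18"
  message: string;
}

// Policy stated in the commit: DM-18 blocks merge, DM-15 is warning-only.
const SEVERITY: Record<string, Severity> = {
  "DM-18": "error",
  "DM-15": "warning",
};

function summarize(findings: GateFinding[]): { blocked: boolean; comment: string } {
  const lines = findings.map((f) => {
    // Assumption: shapes not listed above are reported as warnings.
    const severity: Severity = SEVERITY[f.shape] ?? "warning";
    return `- [${severity}] ${f.shape}: ${f.message}`;
  });
  const blocked = findings.some((f) => SEVERITY[f.shape] === "error");
  return { blocked, comment: lines.join("\n") };
}

const result = summarize([
  { shape: "DM-15", message: "DROP with FK dependents" },
  { shape: "DM-18", message: "NOT NULL without default" },
]);
console.log(result.blocked);
console.log(result.comment);
```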
Intentionally triggers DM-18 (ADD COLUMN NOT NULL without DEFAULT).
Delete after validating the Action comments and blocks correctly.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
import.meta.main is always falsy in esbuild CJS output, which prevented
run() from executing in the GitHub Action. Use process.env.GITHUB_ACTIONS
instead — set by GitHub Actions runtime, not set in local dev.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
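
The entrypoint fix in the commit above amounts to gating `run()` on an environment variable rather than on `import.meta.main`. A minimal sketch, assuming the environment is passed in as a plain object so the check can be exercised without Node typings; `shouldAutoRun` and `Env` are hypothetical names, and the wiring in the comment is illustrative only.

```typescript
// Environment is passed in so the check is testable without touching process.env.
type Env = Record<string, string | undefined>;

function shouldAutoRun(env: Env): boolean {
  // GitHub sets GITHUB_ACTIONS on its runners; any non-empty value counts here.
  return Boolean(env["GITHUB_ACTIONS"]);
}

// Illustrative wiring in the bundled Action entrypoint:
//   if (shouldAutoRun(process.env)) { run(); }
console.log(shouldAutoRun({ GITHUB_ACTIONS: "true" }), shouldAutoRun({}));
```

One trade-off of this approach is that the module will also auto-run if it is merely imported by other code executing inside a GitHub Actions job.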

github-actions Bot commented Apr 12, 2026

❌ Verify Agent Check

1 of 3 gates failed. Issues found in this PR.

| Gate | Status | Detail |
|------|--------|--------|
| Constraints (K5) | ✅ Pass | 0 active constraint(s), none violated |
| Containment (G5) | ✅ Pass | All 49 edit(s) traced to predicates (49 direct, 0 scaffolding) |
| Access Control | ❌ Fail | 1 error(s), 2 warning(s): 1× path traversal |

Issues

Access Control: 1 error(s), 2 warning(s): 1× path traversal

Details
  • Predicates checked: 114
  • Extraction tiers: diff, cross-file
  • Duration: 1.5s
  • Gates run: 3 (2 passed, 1 failed)
  • Timing: 49ms

Powered by @sovereign-labs/verify — deterministic verification of agent edits.
Questions? GitHub Discussions

Born14 and others added 4 commits April 12, 2026 09:36
The Action was detecting verify's own test fixtures and corpus files
as migration files, producing noise findings. Exclude paths under
scripts/, fixtures/, tests/, and corpus/.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
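
The exclusion rule in the commit above can be sketched as a filter over the PR's changed-file list. This assumes the four directory names are excluded at any depth in the path, which the commit message does not specify; `isMigrationCandidate` and `EXCLUDED_SEGMENTS` are hypothetical names.

```typescript
// Directory names taken from the commit message.
// Assumption: a match at any depth in the path excludes the file.
const EXCLUDED_SEGMENTS = new Set(["scripts", "fixtures", "tests", "corpus"]);

function isMigrationCandidate(path: string): boolean {
  if (!path.toLowerCase().endsWith(".sql")) return false;
  const directories = path.split("/").slice(0, -1);
  return !directories.some((dir) => EXCLUDED_SEGMENTS.has(dir));
}

const changedFiles = [
  "migrations/20260412_add_column.sql",
  "scripts/mvp-migration/example.sql",
  "README.md",
];
console.log(changedFiles.filter(isMigrationCandidate));
```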
…tion

Schema bootstrap now handles both layouts:
- Flat: migrations/TIMESTAMP_name.sql (Supabase, hand-written SQL)
- Prisma: migrations/TIMESTAMP_name/migration.sql (cal.com, formbricks)

Also handles multi-directory PRs by scanning each unique migration root
instead of stopping after the first.

Removes the test migration file (20260412_test_dm18.sql) that was added
to validate DM-18 detection. Validation passed on run 24309045630.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
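
The two layouts described in the commit above differ only in where the `.sql` file sits relative to the migration root, so the root can be derived from the path alone. A minimal sketch, assuming that any file named `migration.sql` inside a subdirectory is Prisma-style; `locate`, `uniqueRoots`, and the example paths are hypothetical.

```typescript
type Layout = "flat" | "prisma";

interface MigrationLocation {
  root: string;   // directory that holds all migrations
  layout: Layout;
}

function locate(path: string): MigrationLocation | null {
  const parts = path.split("/");
  const file = parts[parts.length - 1];
  if (file === undefined || !file.endsWith(".sql")) return null;
  // Prisma: <root>/<TIMESTAMP_name>/migration.sql
  if (file === "migration.sql" && parts.length >= 3) {
    return { root: parts.slice(0, -2).join("/"), layout: "prisma" };
  }
  // Flat: <root>/<TIMESTAMP_name>.sql
  if (parts.length >= 2) {
    return { root: parts.slice(0, -1).join("/"), layout: "flat" };
  }
  return null;
}

// Multi-directory PRs: collect every distinct root instead of stopping at the first.
function uniqueRoots(paths: string[]): string[] {
  const roots = new Set<string>();
  for (const path of paths) {
    const location = locate(path);
    if (location !== null) roots.add(location.root);
  }
  return Array.from(roots);
}

console.log(uniqueRoots([
  "supabase/migrations/20260412_add_column.sql",
  "packages/prisma/migrations/20260412_add_column/migration.sql",
]));
```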
Agent corpus (agent-corpus.ts):
  15 migration tasks weighted toward DM-18 and FK/drop cases.
  Calls LLM (Gemini/Claude), captures generated SQL, runs verify,
  compares agent hit rate vs human baseline.

  First run (Gemini 2.5 Flash, 15 tasks):
    4/14 parsed runs triggered DM-18 (28.6%) vs 2.5% human baseline.
    Agent is inconsistent — sometimes uses safe atomic pattern,
    sometimes uses risky multi-step pattern, depending on prompt phrasing.

Historical follow-up (historical-followup.ts):
  Scans subsequent migrations after each DM-18 TP for evidence of
  reverts, backfills, or cleanup. Of 19 TPs:
    8 strong evidence (3 explicit NOT NULL reverts, 5 same-migration backfills)
    4 weak evidence
    7 no evidence

  Strongest finding: cal.com shipped guestCompany NOT NULL on April 4,
  reverted to nullable the next day in a migration named
  "make_guest_company_and_email_optional".

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
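
One kind of strong evidence in the follow-up above, an explicit NOT NULL revert, can be detected by scanning later migrations for `ALTER COLUMN ... DROP NOT NULL` on the same table and column. The sketch below is an assumption about how such a scan might look, not the logic of historical-followup.ts: it uses a regex rather than a parse tree, `revertsNotNull` is a hypothetical name, and the table name `Booking` is a placeholder because the commit only names the column.

```typescript
// Escape user-supplied identifiers before embedding them in a RegExp.
function escapeRegExp(text: string): string {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// True if `sql` contains, within a single statement,
// ALTER TABLE <table> ... ALTER [COLUMN] <column> DROP NOT NULL.
function revertsNotNull(sql: string, table: string, column: string): boolean {
  const t = escapeRegExp(table);
  const c = escapeRegExp(column);
  const pattern = new RegExp(
    `alter\\s+table\\s+(?:only\\s+)?(?:"?\\w+"?\\.)?"?${t}"?\\s+[^;]*?` +
      `alter\\s+(?:column\\s+)?"?${c}"?\\s+drop\\s+not\\s+null`,
    "i",
  );
  return pattern.test(sql);
}

// Placeholder table name; only the column name comes from the commit message.
const laterMigration = 'ALTER TABLE "Booking" ALTER COLUMN "guestCompany" DROP NOT NULL;';
console.log(revertsNotNull(laterMigration, "Booking", "guestCompany"));
```

Restricting the match to `[^;]*?` keeps it inside one statement, so a `DROP NOT NULL` on a different table later in the same file does not count as a revert.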
Added OpenAI/GPT-4o support to agent-corpus.ts. First three-model run
on 15 migration tasks:

  Source            | Migrations | DM-18 hits | Hit rate
  ------------------|------------|------------|--------
  Human (backtest)  | 761        | 19         | 2.5%
  Claude Sonnet     | 15         | 0          | 0.0%
  Gemini 2.5 Flash  | 14         | 5          | 35.7%
  GPT-4o            | 15         | 3          | 20.0%

Key finding: model safety profiles vary dramatically.
Gemini uses risky multi-step pattern (ADD→UPDATE→SET NOT NULL).
GPT-4o uses raw NOT NULL without default on some tasks.
Claude consistently uses safe atomic pattern (ADD COLUMN ... NOT NULL DEFAULT).

All three models hallucinated the same nonexistent table on dm01-01
(DM-01: UserPreferences not found).

Sample size is 15 tasks — directional, not yet publishable externally.
Next session: expand to 50-100 tasks and split by task type.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
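
The hit rates in the table above are DM-18 hits divided by parsed migrations, rounded to one decimal place. A short sketch that reproduces the column from the counts reported in the commit; `CorpusRow` and `hitRate` are hypothetical names.

```typescript
interface CorpusRow {
  source: string;
  migrations: number; // migrations that parsed and were checked
  dm18Hits: number;
}

// Counts copied from the table in the commit message.
const rows: CorpusRow[] = [
  { source: "Human (backtest)", migrations: 761, dm18Hits: 19 },
  { source: "Claude Sonnet", migrations: 15, dm18Hits: 0 },
  { source: "Gemini 2.5 Flash", migrations: 14, dm18Hits: 5 },
  { source: "GPT-4o", migrations: 15, dm18Hits: 3 },
];

function hitRate(row: CorpusRow): string {
  return ((100 * row.dm18Hits) / row.migrations).toFixed(1) + "%";
}

for (const row of rows) {
  console.log(`${row.source}: ${hitRate(row)}`);
}
```

With 14 or 15 migrations per model, a single hit moves the rate by roughly seven percentage points, which is consistent with the commit's caveat that the numbers are directional.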
Born14 merged commit 626bf0d into main on Apr 12, 2026
1 check failed
Born14 deleted the migration-mvp branch on April 16, 2026, 18:40