migration-mvp: DM-18 migration verification in CI #60
Merged
Parser (libpg-query WASM), schema loader with reverse FK index, grounding gate (DM-01..05), safety gate (DM-15..19), spec translator, corpus replay engine, 7 golden fixtures, 39 test assertions.
Backtest on 761 real migrations (cal.com, formbricks, supabase): DM-18 produced 19 true positives and 0 false positives.
Measured claim frozen in scripts/mvp-migration/MEASURED-CLAIMS.md.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Adds migration verification to the PR Action:
- Detects .sql migration files in PR changed-file list
- Loads schema from prior migrations on the base branch
- Runs grounding + safety gates against each new migration
- Posts findings as a PR comment with shape IDs and ack instructions
- DM-18 (NOT NULL without default) blocks merge
- DM-15 (DROP with FK dependents) warning-only until fully calibrated
Includes libpg-query WASM binary in dist/action/ for the Action runtime.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
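The block/warn split described in this commit can be sketched as a small policy table. This is a minimal illustration only: the `Finding` type, `SEVERITY_BY_SHAPE`, and `shouldBlockMerge` are hypothetical names, not taken from the Action's source, and the default for unlisted shapes is an assumption.

```typescript
// Hypothetical types for illustration; not taken from the Action's source.
type Severity = "error" | "warning";

interface Finding {
  shapeId: string; // e.g. "DM-18"
  file: string;
  message: string;
}

// Policy mirroring the commit message: DM-18 blocks merge, DM-15 only warns.
const SEVERITY_BY_SHAPE: Record<string, Severity> = {
  "DM-18": "error",
  "DM-15": "warning",
};

function severityOf(finding: Finding): Severity {
  // Assumption: shapes not listed above are treated as warnings in this sketch.
  return SEVERITY_BY_SHAPE[finding.shapeId] ?? "warning";
}

// The merge is blocked when at least one finding maps to "error".
function shouldBlockMerge(findings: Finding[]): boolean {
  return findings.some((f) => severityOf(f) === "error");
}
```

Keeping the policy in one table means promoting DM-15 from warning to blocking later would be a one-line change.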
Intentionally triggers DM-18 (ADD COLUMN NOT NULL without DEFAULT). Delete after validating the Action comments and blocks correctly.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
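For readers unfamiliar with the shape this test file triggers, here is a rough text-based approximation of the ADD COLUMN case. It is a sketch under loud assumptions: the PR's actual check works on a libpg-query parse tree, whereas this hypothetical `findDm18Candidates` helper uses naive string splitting and regexes, requires the `COLUMN` keyword, and will misfire on commas inside types or defaults.

```typescript
// Simplified approximation of the DM-18 ADD COLUMN case:
// flag ADD COLUMN actions that are NOT NULL and carry no DEFAULT.
// Assumption: regexes on raw text are for illustration only; a real check
// should inspect a parsed AST instead.
function findDm18Candidates(sql: string): string[] {
  const hits: string[] = [];
  for (const statement of sql.split(";")) {
    if (!/\balter\s+table\b/i.test(statement)) continue;
    // ALTER TABLE actions are comma-separated (naive split; breaks on
    // commas inside types such as numeric(10,2) or inside default values).
    for (const action of statement.split(",")) {
      const isAddColumn = /\badd\s+column\b/i.test(action);
      const isNotNull = /\bnot\s+null\b/i.test(action);
      const hasDefault = /\bdefault\b/i.test(action);
      if (isAddColumn && isNotNull && !hasDefault) {
        hits.push(action.trim());
      }
    }
  }
  return hits;
}
```

The sketch does not cover the multi-step ADD, UPDATE, SET NOT NULL pattern mentioned later in this PR.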
import.meta.main is always falsy in esbuild CJS output, which prevented run() from executing in the GitHub Action. Use process.env.GITHUB_ACTIONS instead: it is set by the GitHub Actions runtime and not set in local dev.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
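A minimal sketch of the entry-point guard this commit describes. The `run()` body and the `isGitHubActions` helper are hypothetical stand-ins, and comparing against the string "true" is an assumption about how strict the check should be (GitHub Actions sets `GITHUB_ACTIONS=true`; the commit may simply test for truthiness).

```typescript
type Env = Record<string, string | undefined>;

// True when running under GitHub Actions, which sets GITHUB_ACTIONS=true.
function isGitHubActions(env: Env): boolean {
  return env.GITHUB_ACTIONS === "true";
}

// Hypothetical stand-in for the Action's real run() entry point.
async function run(): Promise<void> {
  console.log("running migration verification");
}

// Read process.env without requiring Node type definitions in this sketch.
const currentEnv: Env = (globalThis as any).process?.env ?? {};

// Unlike an import.meta.main check, this guard survives bundling to CJS.
if (isGitHubActions(currentEnv)) {
  void run();
}
```

Taking the environment as a parameter keeps the guard testable without mutating `process.env`.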
❌ Verify Agent Check
1 of 3 gates failed. Issues found in this PR.
Issues
Access Control: 1 error(s), 2 warning(s): 1× path traversal
Powered by @sovereign-labs/verify — deterministic verification of agent edits.
The Action was detecting verify's own test fixtures and corpus files as migration files, producing noise findings. Exclude paths under scripts/, fixtures/, tests/, and corpus/.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
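The exclusion rule above might look roughly like the filter below. This is a sketch, not the Action's code: `EXCLUDED_SEGMENTS` and `isMigrationCandidate` are hypothetical names, and matching the excluded directories at any depth (rather than only at the repository root) is an assumption.

```typescript
// Directory names whose contents should not be treated as migrations.
const EXCLUDED_SEGMENTS = new Set(["scripts", "fixtures", "tests", "corpus"]);

// Returns true for .sql files that do not live under an excluded directory.
function isMigrationCandidate(path: string): boolean {
  if (!path.toLowerCase().endsWith(".sql")) return false;
  const segments = path.split("/");
  // Inspect directory segments only, not the file name itself.
  const directories = segments.slice(0, -1);
  return !directories.some((segment) => EXCLUDED_SEGMENTS.has(segment));
}
```

Applied to a PR's changed-file list, this would be used as `changedFiles.filter(isMigrationCandidate)`.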
…tion
Schema bootstrap now handles both layouts:
- Flat: migrations/TIMESTAMP_name.sql (Supabase, hand-written SQL)
- Prisma: migrations/TIMESTAMP_name/migration.sql (cal.com, formbricks)
Also handles multi-directory PRs by scanning each unique migration root instead of stopping after the first.
Removes the test migration file (20260412_test_dm18.sql) that was added to validate DM-18 detection. Validation passed on run 24309045630.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
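The two layouts and the per-root scan can be illustrated as follows. This is a sketch under stated assumptions: `locateMigration` and `uniqueMigrationRoots` are hypothetical helpers, and treating any file named `migration.sql` as the Prisma layout is a simplification.

```typescript
type Layout = "flat" | "prisma";

interface MigrationLocation {
  root: string; // directory that holds all migrations
  layout: Layout;
}

// Classify a changed .sql path and derive its migration root.
function locateMigration(path: string): MigrationLocation | null {
  const parts = path.split("/");
  const file = parts[parts.length - 1] ?? "";
  if (!file.endsWith(".sql")) return null;
  // Prisma: <root>/<TIMESTAMP_name>/migration.sql
  if (file === "migration.sql" && parts.length >= 3) {
    return { root: parts.slice(0, -2).join("/"), layout: "prisma" };
  }
  // Flat: <root>/<TIMESTAMP_name>.sql
  if (parts.length >= 2) {
    return { root: parts.slice(0, -1).join("/"), layout: "flat" };
  }
  return null;
}

// Collect every distinct root so multi-directory PRs are fully scanned.
function uniqueMigrationRoots(paths: string[]): string[] {
  const roots = new Set<string>();
  for (const path of paths) {
    const location = locateMigration(path);
    if (location) roots.add(location.root);
  }
  return Array.from(roots);
}
```

Each root returned by `uniqueMigrationRoots` would then be bootstrapped separately, rather than stopping at the first one found.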
Agent corpus (agent-corpus.ts):
15 migration tasks weighted toward DM-18 and FK/drop cases.
Calls LLM (Gemini/Claude), captures generated SQL, runs verify,
compares agent hit rate vs human baseline.
First run (Gemini 2.5 Flash, 15 tasks):
4/14 parsed runs triggered DM-18 (28.6%) vs 2.5% human baseline.
Agent is inconsistent — sometimes uses safe atomic pattern,
sometimes uses risky multi-step pattern, depending on prompt phrasing.
Historical follow-up (historical-followup.ts):
Scans subsequent migrations after each DM-18 TP for evidence of
reverts, backfills, or cleanup. Of 19 TPs:
8 strong evidence (3 explicit NOT NULL reverts, 5 same-migration backfills)
4 weak evidence
7 no evidence
Strongest finding: cal.com shipped guestCompany NOT NULL on April 4,
reverted to nullable the next day in a migration named
"make_guest_company_and_email_optional".
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
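The "explicit NOT NULL revert" evidence class from the historical follow-up can be approximated with a text scan over later migrations. This is only a sketch: `Migration`, `escapeRegExp`, and `findNotNullReverts` are hypothetical names, and historical-followup.ts may detect reverts quite differently (it also looks for backfills and cleanup, which this does not attempt).

```typescript
interface Migration {
  name: string;
  sql: string;
}

// Escape regex metacharacters so a column name can be embedded in a pattern.
function escapeRegExp(text: string): string {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Return the names of later migrations that drop NOT NULL on the given column.
// Assumption: matching "ALTER COLUMN <col> DROP NOT NULL" on raw text is
// enough for illustration; it ignores which table the column belongs to.
function findNotNullReverts(column: string, later: Migration[]): string[] {
  const pattern = new RegExp(
    `alter\\s+column\\s+"?${escapeRegExp(column)}"?\\s+drop\\s+not\\s+null`,
    "i",
  );
  return later.filter((m) => pattern.test(m.sql)).map((m) => m.name);
}
```

A non-empty result for a DM-18 true positive would count as strong evidence in the sense used above.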
Added OpenAI/GPT-4o support to agent-corpus.ts. First three-model run on 15 migration tasks:
Source            | Migrations | DM-18 hits | Hit rate
------------------|------------|------------|---------
Human (backtest)  | 761        | 19         | 2.5%
Claude Sonnet     | 15         | 0          | 0.0%
Gemini 2.5 Flash  | 14         | 5          | 35.7%
GPT-4o            | 15         | 3          | 20.0%
Key finding: model safety profiles vary dramatically.
Gemini uses risky multi-step pattern (ADD→UPDATE→SET NOT NULL).
GPT-4o uses raw NOT NULL without default on some tasks.
Claude consistently uses safe atomic pattern (ADD COLUMN ... NOT NULL DEFAULT).
All three models hallucinated the same nonexistent table on dm01-01 (DM-01: UserPreferences not found).
Sample size is 15 tasks: directional, not yet publishable externally.
Next session: expand to 50-100 tasks and split by task type.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
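The hit rates in the table are DM-18 hits divided by parsed migrations, rounded to one decimal place. A small sketch that recomputes them from the counts reported above; `Row` and `hitRatePercent` are hypothetical names, not part of agent-corpus.ts.

```typescript
interface Row {
  source: string;
  migrations: number;
  hits: number;
}

// Hit rate as a percentage string with one decimal place.
function hitRatePercent(hits: number, migrations: number): string {
  if (migrations <= 0) throw new Error("migrations must be positive");
  return ((hits / migrations) * 100).toFixed(1) + "%";
}

// Counts copied from the table in this commit message.
const rows: Row[] = [
  { source: "Human (backtest)", migrations: 761, hits: 19 },
  { source: "Claude Sonnet", migrations: 15, hits: 0 },
  { source: "Gemini 2.5 Flash", migrations: 14, hits: 5 },
  { source: "GPT-4o", migrations: 15, hits: 3 },
];

for (const row of rows) {
  console.log(`${row.source}: ${hitRatePercent(row.hits, row.migrations)}`);
}
```

With denominators of 14 or 15, a single additional hit moves a model's rate by roughly seven percentage points, which is why the commit calls the result directional.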
Summary
- Detects .sql migration files in PRs, loads schema from prior migrations, runs gates, posts findings as a PR comment
Measured claim
On a reviewed replay set of 761 real migrations across 3 repos, DM-18 produced 19 true positives and 0 false positives.
Test plan
migrations/20260412_test_dm18.sql) that should trigger DM-18; verify the Action comments correctly
🤖 Generated with Claude Code