`promote_resolved_unmapped` helper + merge duplicate alpha-ketoglutamate → CHEBI:30915 #70
Merged
…utamate into CHEBI:30915

Backlog #3 (unmapped curation):

- New scripts/promote_resolved_unmapped.py automates the unmapped→mapped migration recipe end to end (move+transform between collections, canonical CHEBI label from the local OAK chebi.db, SSSOM row inserted in subject-label sort order, regenerate per-record + docs, verify reconcile/invariants). Default dry-run; --apply writes. Key safety feature: a PK-collision guard that refuses to create a duplicate CHEBI primary key. Handles exact/close (narrow/broad need registry rows, so hand-curate those).
- That guard immediately earned its keep: the ready-to-map alpha-ketoglutamate (UNMAPPED_0323) targets CHEBI:30915, which is ALREADY a mapped record ("alpha-ketoglutaric acid"), and so is the anion CHEBI:16810. So it's a DUPLICATE, not a new mapping. Merged it into the existing CHEBI:30915 record (added "alpha-ketoglutamate" as a synonym + a MERGED_FROM_UNMAPPED_DUPLICATE history entry, mirroring the prior a-Ketoglutaric_Acid merge) and removed UNMAPPED_0323. (unmapped 397 → 396)

Verified: reconcile_sssom in sync (GAP/ORPHAN/STALE 0); SSSOM invariants A/B1/B2/B3; validate-strict 0 errors; full suite 359 passed. No id/label changes.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
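The commit message says the new SSSOM row is inserted in subject-label sort order. One way to do that without re-sorting the whole file is a binary search over the existing labels. A minimal sketch, assuming the mapping rows are already loaded as dicts keyed by SSSOM column names and sorted by a case-folded `subject_label`; the function name, the sample rows, and the case-folding rule are hypothetical, not taken from the script.

```python
import bisect


def insert_sorted_by_subject_label(rows: list, new_row: dict) -> int:
    """Insert new_row into rows, keeping rows ordered by subject_label.

    Assumes rows is already sorted with the same key. The case-folded
    comparison is an assumption, not necessarily the script's rule.
    Returns the index at which new_row was inserted.
    """
    keys = [row["subject_label"].casefold() for row in rows]
    # bisect_right places the new row after any rows with an equal label.
    index = bisect.bisect_right(keys, new_row["subject_label"].casefold())
    rows.insert(index, new_row)
    return index


# Hypothetical rows: real SSSOM rows carry more columns (subject_id, object_id, ...).
rows = [
    {"subject_label": "acetate", "predicate_id": "skos:exactMatch"},
    {"subject_label": "citrate", "predicate_id": "skos:exactMatch"},
]
position = insert_sorted_by_subject_label(
    rows, {"subject_label": "Butyrate", "predicate_id": "skos:closeMatch"}
)
# position == 1; without casefold(), "B" sorts before "a" and it would land at 0.
```

Inserting at the bisected index keeps the diff to a single added line, which is easier to review than a full re-sort of the mapping file.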
Backlog #3 (unmapped curation): builds the promotion helper and resolves the second batch-1 "ready" record.
New helper:

`scripts/promote_resolved_unmapped.py` automates the multi-surface unmapped→mapped migration recipe end to end:

- move + transform the record between collections (canonical CHEBI label from the local OAK `chebi.db`, `PROMOTED_TO_MAPPED` history entry, header counts);
- insert the SSSOM `skos:<predicate>` row in subject-label sort order;
- regenerate per-record + docs;
- verify with `reconcile_sssom` + `validate_sssom_invariants`.

Default is dry-run; `--apply` writes. Key safety feature: a PK-collision guard that refuses to create a duplicate CHEBI primary key. It handles exact/close mappings; narrow/broad need registry SSSOM rows, so those are hand-curated.

The guard immediately earned its keep
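A PK-collision guard only has to ask whether the target CHEBI id is already a primary key among the mapped records; when it is, the candidate is a duplicate and should be folded into the existing record rather than promoted. A minimal sketch of that guard-then-merge path, using this PR's ids as sample data. Assumptions: records live in in-memory dicts keyed by id, an unmapped record is reduced to its label, and `PrimaryKeyCollision`, `MappedRecord`, and the history-entry keys are hypothetical; only the `MERGED_FROM_UNMAPPED_DUPLICATE` event name comes from the PR. The PR does not say the helper performs the merge itself, so treat the merge function as illustrative.

```python
from dataclasses import dataclass, field


class PrimaryKeyCollision(Exception):
    """Promotion would create a second mapped record with the same CHEBI id."""


@dataclass
class MappedRecord:
    # Hypothetical, simplified record shape; real records carry more fields.
    chebi_id: str
    label: str
    synonyms: list = field(default_factory=list)
    history: list = field(default_factory=list)


def check_pk_collision(mapped: dict, target_id: str) -> None:
    """Refuse to promote onto a CHEBI id that is already a mapped primary key."""
    if target_id in mapped:
        raise PrimaryKeyCollision(
            f"{target_id} is already mapped as {mapped[target_id].label!r}"
        )


def merge_unmapped_duplicate(
    mapped: dict, unmapped: dict, unmapped_id: str, target_id: str
) -> None:
    """Fold an unmapped duplicate into the surviving mapped record."""
    survivor = mapped[target_id]
    duplicate_label = unmapped.pop(unmapped_id)  # removes the unmapped record
    if duplicate_label not in survivor.synonyms:
        survivor.synonyms.append(duplicate_label)
    # Event name from the PR; the "event"/"from" keys are hypothetical.
    survivor.history.append(
        {"event": "MERGED_FROM_UNMAPPED_DUPLICATE", "from": unmapped_id}
    )


mapped = {"CHEBI:30915": MappedRecord("CHEBI:30915", "alpha-ketoglutaric acid")}
unmapped = {"UNMAPPED_0323": "alpha-ketoglutamate"}

try:
    check_pk_collision(mapped, "CHEBI:30915")
except PrimaryKeyCollision:
    # Duplicate, not a new mapping: merge instead of promoting.
    merge_unmapped_duplicate(mapped, unmapped, "UNMAPPED_0323", "CHEBI:30915")
```

Raising instead of silently overwriting leaves the merge-vs-new-mapping decision with the curator, in the same spirit as the dry-run default.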
The ready-to-map `alpha-ketoglutamate` (`UNMAPPED_0323`) targets `CHEBI:30915`, which is already a mapped record ("alpha-ketoglutaric acid"), and so is the anion `CHEBI:16810`. So it's a duplicate, not a new mapping.

→ Merged it into the existing `CHEBI:30915` record (added `alpha-ketoglutamate` as a synonym + a `MERGED_FROM_UNMAPPED_DUPLICATE` history entry, mirroring the prior `a-Ketoglutaric_Acid` merge) and removed `UNMAPPED_0323`. Unmapped: 397 → 396.

Verification
- `reconcile_sssom`: in sync (GAP/ORPHAN/STALE 0). The merge needed no SSSOM change (the survivor already has its row; the removed record had 0 occurrences).
- `validate-strict`: 0 errors over 2274 files; full suite 359 passed. No id/label changes.

🤖 Generated with Claude Code