
promote_resolved_unmapped helper + merge duplicate alpha-ketoglutamate → CHEBI:30915 #70

Merged: realmarcin merged 1 commit into main from feat/promote-alpha-ketoglutarate on Jun 16, 2026.

@realmarcin (Collaborator)

Backlog #3 (unmapped curation): builds the promotion helper and resolves the second batch-1 "ready" record.

New helper: scripts/promote_resolved_unmapped.py

Automates the multi-surface unmapped→mapped migration recipe end to end:

  • move the record from the unmapped collection to the mapped collection and transform it (canonical CHEBI label from the local OAK chebi.db, a PROMOTED_TO_MAPPED history entry, updated header counts);
  • regenerate the per-record files and docs;
  • insert the SSSOM skos:<predicate> row in subject-label sort order;
  • verify the result with reconcile_sssom and validate_sssom_invariants.
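The "insert in subject-label sort order" step amounts to a binary-search insert into a list that is already sorted. The sketch below is a minimal illustration, not the code in scripts/promote_resolved_unmapped.py: the row shape (a dict keyed by SSSOM column names), the function name, and the case-insensitive sort key are all assumptions.

```python
import bisect
from typing import Dict, List

# Hypothetical row shape: one SSSOM row as a dict keyed by column name.
Row = Dict[str, str]


def insert_sorted_by_subject_label(rows: List[Row], new_row: Row) -> int:
    """Insert new_row into rows, which must already be sorted by subject_label.

    Comparison is case-insensitive (an assumption about the real sort key).
    Returns the index at which the row was inserted.
    """
    keys = [row["subject_label"].casefold() for row in rows]
    # bisect_right places the new row after any existing rows with an equal label,
    # so rows that were already present keep their relative order.
    index = bisect.bisect_right(keys, new_row["subject_label"].casefold())
    rows.insert(index, new_row)
    return index
```

For example, inserting a row labelled "butyrate" into rows labelled "acetate" and "citrate" returns index 1. Keeping the file sorted on insert (rather than re-sorting the whole table) keeps diffs to a single added line.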

It runs as a dry run by default; --apply writes the changes. The key safety feature is a PK-collision guard that refuses to create a duplicate CHEBI primary key. The helper handles exact/close mappings; narrow/broad mappings need registry SSSOM rows and must be hand-curated.
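A dry-run-by-default CLI with a PK-collision guard could look roughly like the sketch below. It is an illustration only: the positional arguments, the --predicate option, the exception class, and the idea of passing the set of already-mapped CHEBI ids into main() are hypothetical and not taken from the real script. Only --apply and the exact/close versus narrow/broad split come from the description above.

```python
import argparse
from typing import AbstractSet, List, Optional

# Predicates the (hypothetical) helper promotes automatically; narrow/broad are hand-curated.
AUTO_PREDICATES = {"skos:exactMatch", "skos:closeMatch"}


class PKCollisionError(RuntimeError):
    """Promotion would create a second record with an existing CHEBI primary key."""


def check_pk_collision(target_curie: str, predicate: str, mapped_ids: AbstractSet[str]) -> None:
    """Raise instead of letting a promotion duplicate a primary key."""
    if predicate not in AUTO_PREDICATES:
        raise ValueError(f"{predicate} needs registry SSSOM rows; hand-curate this record")
    if target_curie in mapped_ids:
        raise PKCollisionError(f"{target_curie} is already a mapped record; merge instead")


def main(argv: Optional[List[str]], mapped_ids: AbstractSet[str]) -> int:
    parser = argparse.ArgumentParser(description="Promote one resolved unmapped record (sketch).")
    parser.add_argument("record_id")
    parser.add_argument("target_curie")
    parser.add_argument("--predicate", default="skos:exactMatch")
    parser.add_argument("--apply", action="store_true", help="write changes; default is a dry run")
    args = parser.parse_args(argv)
    try:
        check_pk_collision(args.target_curie, args.predicate, mapped_ids)
    except (PKCollisionError, ValueError) as exc:
        print(f"refusing {args.record_id}: {exc}")
        return 1
    # A real implementation would only touch files when args.apply is True.
    mode = "applying" if args.apply else "dry run, nothing written"
    print(f"{mode}: {args.record_id} -> {args.predicate} {args.target_curie}")
    return 0
```

With mapped_ids containing CHEBI:30915, `main(["UNMAPPED_0323", "CHEBI:30915"], mapped_ids)` prints a refusal and returns 1, which is the situation described in the next section. Running the guard before any write means a dry run surfaces the collision without side effects.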

The guard immediately earned its keep

The ready-to-map alpha-ketoglutamate (UNMAPPED_0323) targets CHEBI:30915, which is already a mapped record ("alpha-ketoglutaric acid"); its anion, CHEBI:16810, is also already mapped. So this is a duplicate, not a new mapping.

Merged it into the existing CHEBI:30915 record (added alpha-ketoglutamate as a synonym plus a MERGED_FROM_UNMAPPED_DUPLICATE history entry, mirroring the prior a-Ketoglutaric_Acid merge) and removed UNMAPPED_0323. Unmapped records: 397 → 396.
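The merge itself can be pictured as the sketch below. The record fields (id, label, synonyms, history), the history-entry keys, and the counts dict are hypothetical stand-ins for the real collection format; only the event name MERGED_FROM_UNMAPPED_DUPLICATE and the "no id/label change on the survivor" rule come from this PR.

```python
from typing import Any, Dict, List

Record = Dict[str, Any]  # hypothetical record shape


def merge_unmapped_duplicate(
    survivor: Record, unmapped: List[Record], dup_id: str, counts: Dict[str, int]
) -> Record:
    """Fold an unmapped duplicate into an existing mapped record.

    The survivor keeps its id and label; the duplicate's label becomes a synonym.
    """
    for i, record in enumerate(unmapped):
        if record["id"] == dup_id:
            duplicate = unmapped.pop(i)  # safe: we stop iterating right after popping
            break
    else:
        raise KeyError(f"{dup_id} not found in the unmapped collection")

    synonyms = survivor.setdefault("synonyms", [])
    if duplicate["label"] not in synonyms:
        synonyms.append(duplicate["label"])
    survivor.setdefault("history", []).append(
        {"event": "MERGED_FROM_UNMAPPED_DUPLICATE", "from_id": dup_id, "from_label": duplicate["label"]}
    )
    counts["unmapped"] -= 1  # keep the collection header count in step
    return survivor
```

Applied to a survivor `{"id": "CHEBI:30915", "label": "alpha-ketoglutaric acid"}` and an unmapped list holding UNMAPPED_0323, this leaves the unmapped list empty, adds "alpha-ketoglutamate" to the survivor's synonyms, and drops the count from 397 to 396.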

Verification

  • reconcile_sssom: in sync (GAP/ORPHAN/STALE all 0). The merge needed no SSSOM change: the surviving record already has its row, and the removed record had 0 occurrences.
  • SSSOM invariants: Rules A/B1/B2/B3 pass.
  • validate-strict: 0 errors across 2274 files; full test suite: 359 passed. No id or label changes.
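The PR does not define GAP, ORPHAN, or STALE, so the reconciliation sketch below rests on assumed meanings: GAP is a mapped record with no SSSOM row, ORPHAN is a row whose object_id has no mapped record, and STALE is a row whose object_label disagrees with the record's canonical label. The function names and the records/rows shapes are hypothetical; this is not reconcile_sssom's actual logic.

```python
from typing import Dict, List


def reconcile(records: Dict[str, str], rows: List[Dict[str, str]]) -> Dict[str, List[str]]:
    """Compare mapped records (CHEBI id -> canonical label) against SSSOM rows.

    Assumed meanings: GAP = record with no row; ORPHAN = row whose object_id has
    no record; STALE = row whose object_label disagrees with the record's label.
    """
    row_ids = {row["object_id"] for row in rows}
    gap = sorted(cid for cid in records if cid not in row_ids)
    orphan = sorted({row["object_id"] for row in rows if row["object_id"] not in records})
    stale = sorted(
        {
            row["object_id"]
            for row in rows
            if row["object_id"] in records and row.get("object_label") != records[row["object_id"]]
        }
    )
    return {"GAP": gap, "ORPHAN": orphan, "STALE": stale}


def in_sync(report: Dict[str, List[str]]) -> bool:
    """True when every category is empty, i.e. GAP/ORPHAN/STALE are all 0."""
    return all(not ids for ids in report.values())
```

Under these assumptions, removing UNMAPPED_0323 cannot change the report: the survivor CHEBI:30915 still has its row (no GAP), and the removed record contributed no rows (no ORPHAN).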

🤖 Generated with Claude Code

…utamate into CHEBI:30915

Backlog #3 (unmapped curation):

- New scripts/promote_resolved_unmapped.py automates the unmapped→mapped migration
  recipe end to end (move+transform between collections, canonical CHEBI label from
  the local OAK chebi.db, SSSOM row inserted in subject-label sort order, regenerate
  per-record + docs, verify reconcile/invariants). Default dry-run; --apply writes.
  Key safety feature: a PK-collision guard that refuses to create a duplicate CHEBI
  primary key. Handles exact/close (narrow/broad need registry rows — hand-curate).

- That guard immediately earned its keep: the ready-to-map alpha-ketoglutamate
  (UNMAPPED_0323) targets CHEBI:30915, which is ALREADY a mapped record
  ("alpha-ketoglutaric acid") — and the anion CHEBI:16810 is too. So it's a
  DUPLICATE, not a new mapping. Merged it into the existing CHEBI:30915 record
  (added "alpha-ketoglutamate" as a synonym + a MERGED_FROM_UNMAPPED_DUPLICATE
  history entry, mirroring the prior a-Ketoglutaric_Acid merge) and removed
  UNMAPPED_0323. (unmapped 397 → 396)

Verified: reconcile_sssom in sync (GAP/ORPHAN/STALE 0); SSSOM invariants A/B1/B2/B3 pass;
validate-strict 0 errors; full suite 359 passed. No id/label changes.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
realmarcin merged commit d363fa9 into main on Jun 16, 2026.
5 checks passed.
realmarcin deleted the feat/promote-alpha-ketoglutarate branch on June 16, 2026 at 07:46.