feat(annoont): AnnotationOntology.from_prd_input + fix msbuilder get_feature lookup by jplfaria · Pull Request #26 · cshenry/ModelSEEDpy

jplfaria · 2026-06-10T18:18:03Z

Two related changes that unblock the modelseed-api bulk-reconstruction endpoint (Phase 3 per the PRD).

Changes

1. New factory `AnnotationOntology.from_prd_input`

Mirrors from_kbase_data but for inputs whose ontology terms have NOT been pre-translated to ModelSEED reaction IDs.

@staticmethod
def from_prd_input(
    genome_id,
    annotations,        # {gene_id: {ontology_type: [{term, score}, ...]}}
    data_dir,
    translator,         # Callable[[namespaced_term: str], list[msrxn_id: str]]
    method="PRD",
    method_version="1.0",
    timestamp=None,
):
    ...  # signature only; body omitted

The translator callable is injected so the factory stays decoupled from any specific translation backend. The canonical impl is KBUtilLib.KBAnnotationUtils.translate_term_to_modelseed; tests use a small in-memory fake (no data files needed).

The factory synthesizes one AnnotationOntologyEvent per ontology type and leaves the priority-list logic untouched. Unmapped terms (translator returns []) are retained with an empty msrxns set, per the PRD requirement that unmapped genes never silently disappear.
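The unmapped-retention rule can be illustrated with a minimal, self-contained sketch. It does not call ModelSEEDpy. `translate_annotations`, `fake_translator`, the `"<ontology_type>:<term>"` namespacing, the sample reaction IDs, and the default score of 1.0 are all assumptions made for illustration, not the factory's actual implementation.

```python
from typing import Callable, Dict, List

# Hypothetical in-memory lookup table, similar in spirit to the fake used in tests.
FAKE_TABLE: Dict[str, List[str]] = {
    "EC:1.1.1.1": ["rxn00001"],
    "KO:K00001": ["rxn00001", "rxn00002"],
}


def fake_translator(namespaced_term: str) -> List[str]:
    """Return ModelSEED reaction IDs for a term, or [] when it is unmapped."""
    return FAKE_TABLE.get(namespaced_term, [])


def translate_annotations(
    annotations: dict,
    translator: Callable[[str], List[str]],
    default_score: float = 1.0,  # assumed default; the real value may differ
) -> dict:
    """Translate every term, keeping unmapped ones with an empty msrxns set."""
    out: dict = {}
    for gene_id, by_type in annotations.items():
        for ontology_type, terms in by_type.items():
            for entry in terms:
                # Assumed namespacing scheme: "<ontology_type>:<term>".
                namespaced = f"{ontology_type}:{entry['term']}"
                out.setdefault(gene_id, {}).setdefault(ontology_type, []).append(
                    {
                        "term": namespaced,
                        "score": entry.get("score", default_score),
                        # An empty set marks an unmapped term; the gene is still kept.
                        "msrxns": set(translator(namespaced)),
                    }
                )
    return out


annotations = {
    "gene_1": {"EC": [{"term": "1.1.1.1", "score": 0.9}]},
    "gene_2": {"EC": [{"term": "9.9.9.9"}]},  # unmapped, no explicit score
}
result = translate_annotations(annotations, fake_translator)
```

Here `gene_2` stays in `result` with an empty `msrxns` set rather than being dropped, which is the behavior the PRD requirement describes.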

2. Fix latent bug at `msbuilder.py:789` in `build_from_annotaton_ontology`

The line called anno_ont.get_feature(gene.id), but AnnotationOntology has no such method (features are keyed in the genes or cdss dicts). Any call path that reached this line would raise AttributeError the moment it tried to attach evidence to a built reaction. Replaced with the correct accessor:

annoont_gene = anno_ont.genes.get(gene.id) or anno_ont.cdss.get(gene.id)
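A small sketch of how the replacement accessor behaves, using a stand-in class rather than the real AnnotationOntology. `StubOntology` and `lookup_feature` are hypothetical names; only the `genes`/`cdss` dicts and the lookup expression come from the description above.

```python
class StubOntology:
    """Hypothetical stand-in exposing only the two dicts the PR mentions."""

    def __init__(self, genes=None, cdss=None):
        self.genes = genes or {}
        self.cdss = cdss or {}


def lookup_feature(anno_ont, feature_id):
    # Same expression as the fix: try genes first, then fall back to cdss.
    return anno_ont.genes.get(feature_id) or anno_ont.cdss.get(feature_id)


ont = StubOntology(genes={"g1": "gene-feature"}, cdss={"c1": "cds-feature"})
found_gene = lookup_feature(ont, "g1")
found_cds = lookup_feature(ont, "c1")
missing = lookup_feature(ont, "nope")

# The old call fails here because the stub, like the class described above,
# defines no get_feature method.
try:
    ont.get_feature("g1")
    old_call_failed = False
except AttributeError:
    old_call_failed = True
```

Note that an ID present in neither dict yields `None`, so callers still need to handle a missing feature. The `or` also falls through to `cdss` if the `genes` entry is falsy.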

Tests

9 unit tests in tests/core/test_annotationontology.py:

- 7 cases for from_prd_input: happy path, multi-gene/multi-ontology, score recording, default score, unmapped retention, namespaced-term passing, empty input
- 2 regression tests for the msbuilder fix: one positive (the accessor works), one negative lock (asserts AnnotationOntology still has no get_feature method, so if it ever gets added back the msbuilder accessor must be reconciled)

All 9 pass locally.
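The "negative lock" idea can be sketched in a few lines. This is an illustration of the pattern only, not the test from the PR; `Current`, `Future`, and `has_no_get_feature` are hypothetical names.

```python
def has_no_get_feature(cls) -> bool:
    """The lock condition: the class must not define get_feature."""
    return not hasattr(cls, "get_feature")


class Current:
    """Hypothetical stand-in for the class as it is today."""

    def __init__(self):
        self.genes = {}
        self.cdss = {}


class Future(Current):
    """Hypothetical later version where someone re-adds the method."""

    def get_feature(self, feature_id):
        return self.genes.get(feature_id) or self.cdss.get(feature_id)


lock_holds_today = has_no_get_feature(Current)
lock_trips_later = not has_no_get_feature(Future)
```

A test asserting the lock condition passes today and starts failing as soon as the method reappears, which prompts whoever adds it to revisit the msbuilder line.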

Context

This branch is the upstream half of Phase 3 (bulk reconstruction endpoint) in ModelSEED/modelseed-api. The endpoint's per-genome loop calls MSBuilder.build_from_annotaton_ontology directly, so the msbuilder.py:789 bug is a hard prereq for that work. from_prd_input is the entry point into AnnotationOntology for the PRD's {gene_id: {ontology_type: [{term, score}]}} input shape, replacing the KBase-shaped from_kbase_data path on this code path.

🤖 Generated with Claude Code

…feature lookup

Two related changes that unblock the modelseed-api bulk-reconstruction endpoint (Phase 3 per Chris Henry's PRD).

1. New factory: AnnotationOntology.from_prd_input(genome_id, annotations, data_dir, translator, ...). Mirrors from_kbase_data but for inputs whose ontology terms have NOT been pre-translated to ModelSEED reaction IDs. The translator callable is injected so the factory stays decoupled from any specific translation backend (KBUtilLib.KBAnnotationUtils.translate_term_to_modelseed is the canonical impl; tests use a small in-memory fake). Input shape: {gene_id: {ontology_type: [{term, score}, ...]}}. Synthesizes one AnnotationOntologyEvent per ontology type, keeps priority-list logic untouched. Unmapped terms (translator returns []) are retained with an empty msrxns set - per PRD, unmapped genes must never silently disappear.

2. Fix latent bug at msbuilder.py:789 in build_from_annotaton_ontology. The line called anno_ont.get_feature(gene.id) but AnnotationOntology has no such method (features are keyed in genes or cdss dicts). Any call path that reached this line would AttributeError as soon as it tried to attach evidence to a built reaction. Replaced with the correct accessor: anno_ont.genes.get(gene.id) or anno_ont.cdss.get(gene.id).

9 unit tests cover both changes:
- 7 cases for from_prd_input (happy path, multi-gene/multi-ontology, score recording, default score, unmapped retention, namespaced-term passing, empty input)
- 2 regression tests for the msbuilder fix (one positive, one negative lock to flag if get_feature is ever added back without reconciling the msbuilder line)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Builds N metabolic models in one job from probabilistic-annotation inputs. Per-genome COBRApy JSON models plus combined reactions.csv + genes.csv land in the user's workspace. Implements Chris Henry's Phase 3 PRD.

Request shape: list of genomes, each carrying {gene_id: {ontology_type: [{term, score}]}}. Hard cap 100 per call (Pydantic). gapfill OFF by default, FVA ON by default; both flippable per call.

Pipeline per genome (sequential, in-process):

1. AnnotationOntology.from_prd_input(...) - the new upstream factory in cshenry/ModelSEEDpy#26. Translator callable injected from KBUtilLib.KBAnnotationUtils so the factory stays decoupled.
2. MSReconstructionUtils.compute_ontology_model_changes(...) to compute reactions_to_add for the build helper.
3. MSReconstructionUtils.build_metabolic_model(..., reactions_to_add) (Chris confirmed: NOT kb_build_metabolic_models, which is KBase-wrapper code we don't want.)
4. Optional MSReconstructionUtils.gapfill_metabolic_model(...) when gapfill=true.
5. Optional FVA (rich + minimal media) when fva=true. Uses bulk_export.compute_fva_classes.
6. Per-genome cobra JSON written to workspace at /<user>/modelseed/bulk_<job_id>/model_<genome_id>.json
7. Rows accumulated for the combined CSVs (one workspace write per CSV at the end, not per-genome appends).

Per-genome try/except so one bad genome surfaces as {status: "failed", error: "..."} in result.per_genome without aborting the rest of the batch.

CSV column specs mirror KBDatalakeApps' canonical genome_reaction + genome_gene_reaction_essentially_test tables. genes.csv carries an extra `disposition` column (`mapped`|`unmapped`) per the PRD requirement that unmapped genes never silently disappear.

Workflow body lives in tasks._run_bulk_reconstruct so the Celery task (thin wrapper that adds progress callbacks) and the subprocess job-script entry point share one implementation and can't drift.
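The per-genome error isolation and end-of-batch row accumulation described above might look roughly like the following. This is a sketch under stated assumptions: `run_batch`, `fake_build`, the `genome_id` key, the `"ok"` status value, and the row fields are hypothetical; only the `{status: "failed", error: "..."}` shape comes from the commit message.

```python
def run_batch(genomes, build_one):
    """Build each genome in turn; a failure is recorded instead of raised."""
    per_genome = {}
    reaction_rows = []  # accumulated here, written once after the loop
    for genome in genomes:
        genome_id = genome["genome_id"]
        try:
            reaction_rows.extend(build_one(genome))
            per_genome[genome_id] = {"status": "ok"}  # assumed success marker
        except Exception as exc:
            # One bad genome must not abort the rest of the batch.
            per_genome[genome_id] = {"status": "failed", "error": str(exc)}
    return per_genome, reaction_rows


def fake_build(genome):
    """Hypothetical builder: fails on empty annotations, else returns one row."""
    if not genome["annotations"]:
        raise ValueError("no annotations")
    return [{"genome_id": genome["genome_id"], "reaction_id": "rxn00001"}]


genomes = [
    {"genome_id": "g_ok", "annotations": {"gene_1": {}}},
    {"genome_id": "g_bad", "annotations": {}},
    {"genome_id": "g_ok2", "annotations": {"gene_2": {}}},
]
per_genome, reaction_rows = run_batch(genomes, fake_build)
```

The failing genome in the middle is reported in `per_genome` while the genomes before and after it still contribute rows.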
Schema, route, dispatcher, task, subprocess script, CSV builders + FVA helpers, 16 unit tests for the CSV layer (full row-shape + column-order + aggregation coverage), and a user-facing docs/BULK_RECONSTRUCT.md.

Live deploy is gated on cshenry/ModelSEEDpy#26 being merged + poplar rebuilt; locally everything imports against the branched fork.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
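As a rough illustration of the genes.csv `disposition` column and a fixed column order, the sketch below renders rows with the standard-library `csv` module. The column list, the `;` separator, and the helper names are assumptions; the real column specs follow the KBDatalakeApps tables and are not reproduced here.

```python
import csv
import io

# Hypothetical column order; only `disposition` is named in the commit message.
GENE_COLUMNS = ["genome_id", "gene_id", "reactions", "disposition"]


def gene_row(genome_id, gene_id, msrxns):
    """Build one genes.csv row; a gene with no reactions is kept as unmapped."""
    return {
        "genome_id": genome_id,
        "gene_id": gene_id,
        "reactions": ";".join(sorted(msrxns)),
        "disposition": "mapped" if msrxns else "unmapped",
    }


def render_csv(rows, columns):
    """Render all rows in one pass so the workspace gets a single write."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=columns, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


rows = [
    gene_row("g1", "gene_1", {"rxn00002", "rxn00001"}),
    gene_row("g1", "gene_2", set()),  # unmapped gene still gets a row
]
genes_csv = render_csv(rows, GENE_COLUMNS)
```

Passing the column list explicitly to `DictWriter` pins the header order, which is the kind of property a column-order test can assert on.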

jplfaria wants to merge 1 commit into cshenry:main from jplfaria:feat/annotation-ontology-from-prd-input
