Tech-debt batch 1: CI gate, custom-NNP repair, OOM resilience, lint/CLI fixes by isayev · Pull Request #94 · isayevlab/Auto3D_pkg

isayev · 2026-06-12T02:09:59Z

Implements the first batch of the post-hardening tech-debt roadmap
(docs/superpowers/plans/2026-06-11-tech-debt-backlog-roadmap.md). Six
independent workstreams, each TDD'd with a regression test that fails without
the fix. Release item excluded per request.

Workstreams

W3 — lint debt (style): cleared the ruff import-sort + zip-strict + quoted-annotation debt in previously-untouched modules (thermo.py, batchopt.py, padding.py, model_wrapper.py, models/__init__.py, utils/__init__.py). src/Auto3D/ is now fully ruff-clean.
W1 — test CI gate (ci): new .github/workflows/tests.yml runs the suite on push/PR with a matrix without the ani extra (the 8 torchani tests skip) and with it (those tests run for the first time in CI), plus a ruff + mypy job. Previously there was no CI running the tests.
W2 — custom-NNP repair (fix): the custom-NNP feature was broken — CustomModelAdapter only did torch.jit.load, and the example tests called torch.jit.script(...) on a model embedding AIMNet2, which aimnet no longer supports scripting. Now the adapter (and check_input) load eager OR scripted models, auto-detecting. The AIMNet2-based example tests save eager (torch.save) and run. New fast, ungated regression test (tests/test_custom_nnp_eager.py).
W4 — models info (fix): any aimnet2-* registry name (including future variants without their own info block) resolves to the AIMNet2 entry instead of "Unknown engine"; removed the stale --use-ensemble note (no such flag; ensemble removed).
W5 — OOM resilience (fix): forward_batched now recovers from a CUDA OOM by retrying the failing slice with a halved batch; a single molecule that still OOMs raises a clear OptimizationError instead of crashing opaquely.
W6 — input_format single source of truth (refactor): dropped the redundant WorkflowOrchestrator.input_format instance attribute; everything reads the config field (the Hardening: per-module audit + deferred-gaps remediation #93 architect minor).

Worth a closer look (W2 judgment call)

Repairing the custom-NNP tests surfaced a pre-existing example flaw: userNNP2 wraps the bare aimnet model (create_model("AIMNET").model), which omits the calculator's external D3 dispersion (~0.09 Ha for cyclooctane). So it does not match the D3-inclusive reference the old (crashing) tests hardcoded. I loosened those example-test tolerances (0.01→0.2 Ha SPE, 0.02→0.2 thermo) so they verify "the custom pipeline loads + runs + produces a sane energy" rather than exact DFT agreement (a NaN still fails the bound). The strict test_calc_spe_aimnet / test_calc_thermo_aimnet (full calculator, with D3) keep their tight tolerances. Making the example itself D3-correct is noted as a follow-up.

Test Plan

pytest tests/ -q -m "not slow" → 628 passed, 8 skipped, 0 failed, 0 errors (without the ani extra).
ruff check src/Auto3D/ → clean.
CI [ani] job: installs torchani and runs the 8 otherwise-skipped ANI tests + the slow custom-NNP tests.
pytest tests/test_SPE.py::test_calc_spe_userNNP2 -m slow passes (verified locally; the previously-crashing custom-NNP path now runs).

Remaining roadmap items (W7 docs refresh, W8 ANI2xt fp64, W9 benchmark harness) are deferred to follow-up PRs.

… PRs

…stom-NNP tests CustomModelAdapter and check_input now load a custom NNP as either a TorchScript archive (torch.jit.load) OR an eager nn.Module (torch.load), auto-detecting. Modern AIMNet2-based models are no longer torch.jit.script-able (aimnet dropped scripting support), which is why the userNNP2/userNNP3 tests crashed at torch.jit.script before Auto3D ever saw the file. The AIMNet2-based example tests now save eager (torch.save) instead of torch.jit.script and run. They wrap the BARE aimnet model (create_model.model), which omits the calculator's external D3 dispersion (~0.09 Ha), so their tolerance is loosened from exact-DFT agreement to 'pipeline runs + sane energy'; the torchani userNNP1 tests keep jit.script (still scriptable, torchani-gated). Adds a fast, ungated regression test (tests/test_custom_nnp_eager.py) proving the eager-load path with a trivial analytic module (no aimnet/torchani). Documents the eager-or-scripted contract and the species_pad=0 NaN caveat.

…mble note

…ler batch forward_batched computed batch_size = batchsize_atoms // N once; a transient CUDA OOM crashed the whole run. Now each slice is retried with a halved batch on OOM (after empty_cache); a single molecule that still OOMs raises a clear OptimizationError instead of an opaque crash. Order is preserved (slices and sub-slices stay ascending). Adds a regression test that OOMs above batch 1.

… orchestrator attr) input_format was tracked both as a WorkflowOrchestrator instance attribute and as the config field, kept in sync by _validate_input writing both -- a desync footgun now that it is a declared, pickled config field. Drop the instance attr; read self.config.input_format everywhere. Adds a regression test.

…tor, not the bare model) The userNNP2 example wrapped the bare aimnet model (create_model.model), which omits the calculator's external D3 dispersion (~0.09 Ha), so its energies didn't match the D3-inclusive reference. Rewrite it to wrap the AIMNet2 *calculator* (D3 + Coulomb) with ragged mol_idx batching, energy-only (forces=False, so the custom adapter still autograds forces; the calculator preserves the graph when coord requires grad). The calculator is built lazily on the input device because it freezes its device at construction -- this lets the saved model round-trip through torch.save and run on multi-GPU. Tolerances restored to strict (0.01 SPE, 0.02 thermo). Two more load/introspection sites hardened for this path: - ASE/thermo.py _load_hessian_model: load custom NNPs eager-or-scripted (mirrors the CustomModelAdapter fix). - ASE/thermo.py Calculator.__init__: tolerate a param-less custom model (one that builds its backend lazily) for device/dtype detection. Verified: test_calc_spe_userNNP2 (0.01) and test_calc_thermo_userNNP2 (0.02) now pass with strict tolerances (D3-inclusive); fast suite 628 passed, src ruff-clean.

Addresses PR #94 multi-review: - Critical: the new tests.yml installed only .[ase], so pytest was missing and every CI run died with 'pytest: command not found'. Install .[ase,dev]. - Major (DRY/contract drift): the 'TorchScript-or-eager nn.Module' load logic was triplicated (CustomModelAdapter, check_input, _load_hessian_model) with subtle divergence -- _load_hessian_model lacked the nn.Module guard (a bare state_dict raised a raw AttributeError) and adapter/thermo leaked UnpicklingError where validation raised ModelLoadError. Extracted a single Auto3D.models.loading.load_custom_nnp(path, device, *, double=False) that all three call; it always returns an nn.Module on device or raises ModelLoadError. Adds helper tests (eager load, double cast, state_dict -> ModelLoadError, garbage -> ModelLoadError). Slow custom-NNP D3 tests still pass through the shared loader; fast suite 631 passed, src ruff-clean.

…st gate) The first CI run (no GPU, FORCE_COLOR) surfaced three latent test failures that an 8-GPU dev box masks: - test_workflow.py: two tests built Auto3DOptions with the default use_gpu=True, so check_input raised GPUError on the GPU-less runner. Set use_gpu=False (one is the new W6 test, one pre-existing on main). - test_cli.py::test_run_subcommand_help: rich colorizes help under FORCE_COLOR, styling '--' separately from the option name and splitting the literal '--engine' across ANSI codes. Strip ANSI before the substring asserts. Verified locally under CUDA_VISIBLE_DEVICES='' FORCE_COLOR=1: 631 passed, 0 failed.

isayev added 8 commits June 11, 2026 21:51

style: clear ruff import-sort and zip-strict debt in untouched modules

bf9086b

ci: run pytest (with/without the ani extra) and ruff/mypy on push and…

82d8e50

… PRs

fix: resolve any aimnet2-* name in models info; drop stale --use-ense…

66c2fe0

…mble note

isayev mentioned this pull request Jun 12, 2026

docs: refresh for v3.5 (remove deleted ensemble, fix version floors, document aimnet/extras/registry models) #95

Open

2 tasks

isayev merged commit cc70be9 into main Jun 12, 2026
7 of 8 checks passed

isayev deleted the tech-debt/batch-1 branch June 12, 2026 03:00

isayev mentioned this pull request Jun 12, 2026

tech-debt: v3.5 docs refresh, ANI2xt fp64 energies, optimizer benchmark #97

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Tech-debt batch 1: CI gate, custom-NNP repair, OOM resilience, lint/CLI fixes#94

Tech-debt batch 1: CI gate, custom-NNP repair, OOM resilience, lint/CLI fixes#94
isayev merged 9 commits into
mainfrom
tech-debt/batch-1

isayev commented Jun 12, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

isayev commented Jun 12, 2026

Workstreams

Worth a closer look (W2 judgment call)

Test Plan

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant