Skip to content

Hardening: per-module audit + deferred-gaps remediation#93

Merged
isayev merged 17 commits into
mainfrom
hardening/audit-round2
Jun 11, 2026
Merged

Hardening: per-module audit + deferred-gaps remediation#93
isayev merged 17 commits into
mainfrom
hardening/audit-round2

Conversation

@isayev

@isayev isayev commented Jun 11, 2026

Copy link
Copy Markdown
Contributor

Summary

Two-phase hardening pass over the Auto3D pipeline. Every change is a bug fix, robustness guard, or non-destructive perf improvement; no functional API changes. Each fix ships a regression test that fails without it.

Phase 1 — per-module audit (8 commits)

  • Optimizer core: guard FIRE against float32 force-norm underflow (was injecting inf/NaN); report fmax at the final geometry instead of one FIRE step stale.
  • Orchestration: isolate each optimizer chunk so one bad chunk no longer kills the worker and silently drops all remaining chunks; stop leaking the encoded temp file when a setup phase fails; reject unsupported extensions before encoding.
  • Config/validation: declare input_format as a real Auto3DOptions field (survives dataclasses.replace()); tolerate ragged .smi rows; accept aimnet registry engine names.
  • Isomer/embedding: make parallel embedding robust (broaden the per-worker except so RDKit ArgumentError no longer aborts the batch) and deterministic (submission order); guard the serial path against unparseable SMILES.
  • Stereochemistry: stop SMILES string-surgery from corrupting non-tetrahedral centers (@SP/@TH/@OH/@TB) into invalid SMILES that aborted enumeration; tolerate differing stereo-marker counts.
  • Logging: route library logging to stderr so auto3d run --json stdout stays clean, parseable JSON.
  • CLI/tautomer: resolve aimnet2 in models info; reject tauto_k < 1 (silently dropped all tautomers); add a not-found guard to config validate.
  • torch_config/ASE: make deterministic mode reversible and non-fatal (warn_only=True); guard the post-optimization ASE geometry re-read against None/missing-E_tot records.

Phase 2 — deferred-gaps remediation (8 commits)

  • Decouple the test suite from the optional torchani extra: lazy import in ANI2xt_no_rep so Auto3D.ASE.thermo (and the pure-Python thermo helpers) import without torchani; pytest.importorskip on the genuinely-ANI tests. The suite now runs clean without the extra (previously 11 failures + 1 collection abort).
  • TF32 precision: pin via the modern fp32_precision API on torch ≥2.9 (legacy allow_tf32 is deprecated), guarded by hasattr.
  • O(n²) → O(n) RMSD dedup: strip Hs once per conformer instead of per comparison, in both filtering.py and chemistry.py. Non-destructive — returned conformers keep explicit hydrogens and exact H coordinates (verified independently and asserted in tests, since the MLIP and final geometries need explicit H).
  • Concurrency: per-file guard on the omega/flipper temp-file sweep in housekeeping.
  • Docs: clarify ANI2xt float32 energy precision (self-energies cancel across conformers, so ranking is unaffected).

Implementation plan: docs/superpowers/plans/2026-06-11-hardening-gaps-remediation.md.

Test Plan

  • pytest tests/ -q → 623 passed, 8 skipped (the ANI/torchani tests skip gracefully when the optional ani extra is absent), 0 failed, 0 errors.
  • ruff check src/Auto3D/ — no new lint introduced by these changes.
  • With the ani extra installed, the 8 skipped tests run and pass (ANI2xt construction/validation paths).

isayev added 17 commits June 11, 2026 14:21
…ometry

batch_opt/fire_optimizer.py: clamp the force-norm denominator in the
velocity mix. A conformer can be 'progressing' (v.f > 0) while its float32
force norm underflows to exactly 0 (forces ~1e-24, squares round to zero),
so forces / f_norm produced inf/NaN that was selected into the velocity and
permanently corrupted that conformer (and could even pass the fmax > tol
convergence test as NaN). clamp_min(1e-30) is a no-op for any real force.

batch_opt/optimization_engine.py: recompute fmax at the final geometry
alongside the already-recomputed energy. The loop measures forces at the
pre-step geometry then always takes one more FIRE step, so the reported
fmax lagged the reported coordinates by a full step (verified: 6.73 vs 6.58
on a still-moving conformer). Padded-atom slots are zeroed before the
reduction, matching the in-loop convergence check.

Adds regression tests for both; also clears pre-existing lint in the two
touched test files.
…etup failure

workflow_workers.py: wrap optim_rank_wrapper's per-chunk loop body in
try/except. Previously a single failing chunk (a molecule the optimizer
chokes on, a CUDA OOM, an mkdir collision, or an isomer step that produced
an empty/missing SDF -> RDKit OSError when ranking opens it) killed the
whole optimizer process and silently dropped every chunk still queued
behind it, while the parent went on to finalize the partial output as
success. Each chunk is now isolated and logged; the worker drains the rest.
Also skip ranking with a clear warning when optimizing produced no output,
rather than letting RDKit raise on the missing file.

workflow.py: move phase-1 setup (validate/job-dir/logging) inside run()'s
try so the *_encoded temp file written by encode_ids is cleaned up when a
later setup step raises; cleanup now uses is_file() so the Path() default
(the cwd) is never unlinked. Validate the input suffix before encoding so an
unsupported extension raises FileFormatError instead of a generic ValueError
and leaves no orphaned encoded file.

Adds regression tests for chunk isolation, pre-encode format rejection, and
encoded-file cleanup on setup failure (all three fail without the fix).
… .smi rows, accept aimnet registry names

config.py: promote input_format to a declared Auto3DOptions field. It was
set as a dynamic attribute via __setitem__, which dataclasses.replace()
silently dropped -- a latent AttributeError for any consumer reading it off
a replace()'d copy. Now it survives replace()/pickling and is part of the
dict-like API.

utils/validation.py:
- check_smi_format: tolerate ragged rows. The line was unpacked with
  'smiles, id = line.split()', which crashed with 'too many values to
  unpack' on any line with a trailing column even though the chunk loader
  deliberately reads only the first two columns (usecols=[0,1]). Now take
  the first two tokens and raise the structured InputValidationError (not a
  raw ValueError) when a line lacks an ID. Also fix the SMILES count, which
  counted blank lines.
- check_input: add an else-branch raising FileFormatError for an unexpected
  input_format instead of falling through to an UnboundLocalError.
- check_valid_configuration: accept aimnet registry names (aimnet2-2025,
  aimnet2-nse, ...), which model_factory and the CLI schema already accept;
  the validator was wrongly rejecting them as unknown engines.

Updates two tests that asserted the old buggy contract (extra columns must
raise; missing-ID raised ValueError) to the corrected behavior.
…mbed against bad SMILES

isomers/parallel_embed.py:
- Broaden the per-future except from (ValueError, RuntimeError, KeyError,
  PicklingError) to Exception. RDKit raises Boost.Python.ArgumentError (a
  TypeError), which escaped the narrow except, propagated out of the
  generator, and aborted the whole batch -- silently dropping every
  not-yet-completed molecule (a nondeterministic set under as_completed).
  This is a per-molecule boundary, so the wider except cannot mask unrelated
  bugs.
- Iterate futures in submission order instead of as_completed so the emitted
  molecule order is deterministic and matches the serial path; all futures
  are already running concurrently, so no parallelism is lost.

isomer_engine.py: embed_conformer now returns None on an unparseable SMILES
(instead of crashing on AddHs(None)) and _run_serial_embedding skips+warns,
mirroring the parallel worker so a single bad SMILES no longer truncates the
serial run.

Adds regression tests for input-order preservation and the serial None-guard
(all fail without the fix). The broadened except is exercised by the existing
mixed valid/invalid test; direct subprocess fault injection is omitted as too
fragile across mp start methods.
…ral centers and aborting enumeration

utils/stereochemistry.py:
- get_stereo_info: exclude multi-letter stereo-class descriptors (@sp, @th,
  @oh, @tb, @al) from the @ match via a negative lookahead. The old regex
  treated the '@' of @SP1/@th1 as a bare tetrahedral '@', and create_enantiomer
  then spliced in a second '@' to make the unparseable token @@sp1 -- which
  RDKit rejects, silently aborting the molecule's whole stereo enumeration.
  Such tokens (square-planar/octahedral/etc. metal centers) are now left
  intact; plain tetrahedral @/@@ inversion is unchanged.
- no_enantiomer: skip comparisons between stereo-info lists of different
  lengths instead of calling no_enantiomer_helper (which raises on a length
  mismatch). A differing marker count means a different set of stereocenters,
  so the two cannot be enantiomers; the previous ValueError bubbled up and
  made amend_configuration abandon the molecule's enumeration. no_enantiomer_helper
  keeps its strict (and separately tested) contract.

Adds regression tests for non-tetrahedral token preservation and the
length-mismatch path (both fail without the fix).
logging_config.py attached the root Auto3D logger's StreamHandler to stdout.
The workflow emits INFO lines (Output path, energy unit, timing) through
child loggers during every run, so 'auto3d run --json' produced stdout that
interleaved those log lines with the JSON document and was not parseable.
Diagnostics belong on stderr regardless of --json; move the handler there.

Adds a regression test asserting the handler targets stderr.
… validate on missing file

cli/commands/models.py: 'auto3d models info aimnet2' (the canonical default
engine name) reported 'Unknown engine' because ENGINE_INFO keys it as AIMNET;
map AIMNET2 -> AIMNET.

cli/commands/config.py: 'config validate' on a missing file fell through to
'Check YAML syntax and field values', a misleading hint for a not-found file;
add an existence check mirroring 'config show'.

tautomer.py: select_tautomers(k=0) silently dropped every tautomer
(out_mols0[:0]); reject tauto_k < 1. Also use splitext for the output basename
so an input like 'mol.v2.sdf' is not truncated to 'mol' (collision risk).

Note: the '@' separator in select_tautomers/count_from_output is the real
tautomer-grouping separator (id@tautN), not stale -- left intact (see
test_select_tautomers_groups_by_id).

Adds regression tests for all three (each fails without the fix).
…ometry re-read

torch_config.py: the deterministic flags were only ever set to True, so a
process that enabled a reproducible run could never restore fast mode. Set
them unconditionally from config.deterministic, and pass warn_only=True to
torch.use_deterministic_algorithms so AIMNet2/ANI scatter and masked
index-put ops -- which have no deterministic CUDA kernel -- warn instead of
raising and aborting the optimization loop the option is meant to make
reproducible.

ASE/geometry.py: the post-optimization eV->Hartree re-read called
mol.GetProp('E_tot') with no guard, so a None record (failed re-parse) or one
missing E_tot raised and discarded the entire already-completed optimization.
Skip such records. Also use splitext for the output stem so 'batch.v2.sdf' is
not truncated to 'batch'.

Adds regression tests for both (both fail without the fix).
…lowing it

The broad per-future 'except Exception' (added to catch RDKit's
Boost.Python.ArgumentError) also caught concurrent.futures BrokenProcessPool.
If a worker is killed (e.g. OOM on a large embedding) the pool breaks and
every remaining future.result() raises BrokenProcessPool, each logged as a
per-molecule warning -- silently dropping the whole tail of the batch with no
failure signal. Catch BrokenProcessPool separately and re-raise so pool-level
failures surface loudly; the broad except still isolates per-molecule errors.

Adds a fork-guarded regression test that kills a worker and asserts the pool
break propagates (fails without the fix: both molecules swallowed as warnings).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant