
Harden core pipeline: input-robustness and correctness fixes (ranking, file IO, isomer/SPE) #92

Merged
isayev merged 7 commits into main from hardening/module-audit
Jun 11, 2026

@isayev isayev commented Jun 11, 2026

Contributor

Summary

A systematic per-module hardening pass over the core conformer-generation pipeline. The recurring root cause addressed: newer functions had input guards (None/blank/duplicate) that their older siblings lacked, so a single bad record (a None molecule, a blank line, a missing property, or a salt/metal atom) could crash an entire run or silently drop data. This PR fixes 16 confirmed bugs across 6 modules, each with a regression test.

Ranking & filtering

  • top_k with k=1 (the default --k=1) now validates connectivity instead of emitting a structurally-broken lowest-energy conformer.
  • run() no longer aborts the whole ranking when one record lacks the Converged property.
  • check_connectivity is salt/metal-safe (previously KeyError on elements outside its radii table).
  • Legacy filter_unique treats an RMSD-comparison failure as "distinct" (keep) rather than "duplicate" (drop), matching the optimized filter.
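To illustrate the last point, here is a minimal sketch of a greedy uniqueness filter in which a failed comparison counts as "distinct". It is not the project's `filter_unique`; the name `filter_unique_sketch`, the `rmsd` callable, and the `threshold` parameter are hypothetical and stand in for whatever the real code uses.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def filter_unique_sketch(
    items: Sequence[T],
    rmsd: Callable[[T, T], float],
    threshold: float,
) -> List[T]:
    """Greedy filter: keep an item unless it is within `threshold` of a kept one.

    Hypothetical helper for illustration only. `rmsd` is any callable that
    returns a distance for a pair of items and may raise on bad input.
    """
    kept: List[T] = []
    for candidate in items:
        is_duplicate = False
        for reference in kept:
            try:
                distance = rmsd(candidate, reference)
            except Exception:
                # The comparison itself failed. Treat this pair as distinct,
                # so a comparison error can never cause data to be dropped.
                continue
            if distance < threshold:
                is_duplicate = True
                break
        if not is_duplicate:
            kept.append(candidate)
    return kept
```

The design choice is about which failure mode is cheaper: keeping a possible duplicate costs some extra compute downstream, while dropping a record on a comparison error loses data without any signal.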

File IO (file_ops, chunk_manager)

  • None-molecule guards in decode_ids, reorder_sdf, find_smiles_not_in_sdf, count_sdf.
  • reorder_sdf is now crash-safe (temp file + atomic replace) and data-preserving — this also closes a real smiles2mols bug where duplicate input SMILES silently dropped a conformer.
  • SDF2chunks keeps a terminator-less trailing molecule; smiles2smi raises an actionable error on invalid SMILES.
  • Blank-line, ragged-.smi, and degenerate-chunk_size guards.
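The "temp file + atomic replace" pattern mentioned for reorder_sdf can be sketched with the standard library alone. This is a generic illustration, not the code in this PR: `atomic_write_text` and its parameters are hypothetical names, and the real function writes SDF records rather than arbitrary text chunks.

```python
import os
import tempfile
from pathlib import Path
from typing import Iterable


def atomic_write_text(path: Path, chunks: Iterable[str]) -> None:
    """Write `chunks` to `path` so readers see the old or the new file, never a partial one.

    Hypothetical helper for illustration only.
    """
    path = Path(path)
    # Create the temp file in the destination directory so that the final
    # rename stays on one filesystem.
    fd, tmp_name = tempfile.mkstemp(
        dir=str(path.parent), prefix=path.name + ".", suffix=".tmp"
    )
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            for chunk in chunks:
                handle.write(chunk)
            handle.flush()
            os.fsync(handle.fileno())
        # os.replace overwrites the destination in a single step.
        os.replace(tmp_name, path)
    except BaseException:
        # On any failure, remove the temp file and leave the original untouched.
        try:
            os.unlink(tmp_name)
        except FileNotFoundError:
            pass
        raise
```

If the writer fails partway through, the original file is still intact and the temporary file is cleaned up, which is the crash-safety property described above.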

Isomer enumeration & single-point energy (isomer_engine, SPE)

  • One invalid SMILES no longer aborts the whole batch (guards at four sites).
  • Stereoisomer enumeration no longer silently truncates at 1024 isomers (explicit cap + truncation warning).
  • Clash filter: closed the boundary dead-band and added an MMFF→UFF fallback (rescues B/Se-containing molecules).
  • Conformer-count budget unified across the SMILES and SDF paths on the with-H representation (restores richer hydroxyl/amine sampling) and floored at 1.
  • SPE filters None/conformerless records and fixes an output index-misalignment; actionable error for unsupported ANI2xt elements.
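The "explicit cap + truncation warning" idea for stereoisomer enumeration can be sketched generically as follows. The PR text does not show the actual cap value or the enumeration call, so `take_with_cap` and its arguments are hypothetical, and a plain iterable stands in for the isomer generator.

```python
import itertools
import warnings
from typing import Iterable, List, TypeVar

T = TypeVar("T")


def take_with_cap(items: Iterable[T], cap: int) -> List[T]:
    """Return at most `cap` items and warn if the source had more.

    Hypothetical helper for illustration only.
    """
    if cap < 1:
        raise ValueError("cap must be at least 1")
    iterator = iter(items)
    taken = list(itertools.islice(iterator, cap))
    # Probe for one extra item to learn whether the cap cut the stream short.
    sentinel = object()
    if next(iterator, sentinel) is not sentinel:
        warnings.warn(
            f"enumeration truncated at {cap} items; results are incomplete",
            RuntimeWarning,
            stacklevel=2,
        )
    return taken
```

Probing for one extra item distinguishes "exactly `cap` results" from "more than `cap` results", so the warning fires only when data was actually left out.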

Test plan

  • Full fast suite: 595 passed, 45 deselected (was 563 on main; +32 regression tests, all fail-before/pass-after).
  • ruff check src/ --select F401,F841,UP007 clean.
  • Each fix independently reviewed (chemistry/correctness) before commit.

Notes

  • Branches from main (post-3.5.0). No behavior change on the normal pipeline path beyond the documented correctness fixes; reorder_sdf's normal (all-matched) ordering is byte-identical.
  • Separate, pre-existing issue (not in this PR): the slow-gated test_calc_spe_userNNP2 fails because torch.jit.script cannot compile the aimnet AEV module when wrapping a torchani-based custom NNP — worth a dedicated follow-up.
  • Remaining audited-but-unfixed items queued for follow-up: stereochemistry public-API SMILES surgery on explicit stereo tags, ANI2xt fp64 energy-shift precision, and pad_from_mols host→device transfer batching.

@isayev isayev merged commit be75413 into main Jun 11, 2026
2 of 4 checks passed
@isayev isayev deleted the hardening/module-audit branch June 11, 2026 18:07