Harden core pipeline: input-robustness and correctness fixes (ranking, file IO, isomer/SPE) #92
Merged
… salt-safe connectivity, RMSD-fail keeps distinct
…es, ragged smi, and data loss in reorder_sdf
…lent isomer/conformer loss, unify conformer count, align SPE indices
…icher polyol sampling)
…ng/reorder_sdf naming and docs
…ead mols2lists, tidy connectivity docs/RMSD consistency
Summary
A systematic per-module hardening pass over the core conformer-generation pipeline. The recurring root cause addressed: newer functions had input guards (None/blank/duplicate) that their older siblings lacked, so a single bad record (a `None` molecule, a blank line, a missing property, or a salt/metal atom) could crash an entire run or silently drop data. 16 confirmed bugs fixed across 6 modules, each with a regression test.

Ranking & filtering
- `top_k` with `k=1` (the default `--k=1`) now validates connectivity instead of emitting a structurally-broken lowest-energy conformer.
- `run()` no longer aborts the whole ranking when one record lacks the `Converged` property.
- `check_connectivity` is salt/metal-safe (previously `KeyError` on elements outside its radii table).
- `filter_unique` treats an RMSD-comparison failure as "distinct" (keep) rather than "duplicate" (drop), matching the optimized filter.

File IO (`file_ops`, `chunk_manager`)
- `None`-molecule guards in `decode_ids`, `reorder_sdf`, `find_smiles_not_in_sdf`, `count_sdf`.
- `reorder_sdf` is now crash-safe (temp file + atomic replace) and data-preserving; this also closes a real `smiles2mols` bug where duplicate input SMILES silently dropped a conformer.
- `SDF2chunks` keeps a terminator-less trailing molecule; `smiles2smi` raises an actionable error on invalid SMILES.
- … `.smi`, and degenerate-`chunk_size` guards.

Isomer enumeration & single-point energy (`isomer_engine`, `SPE`)
- `SPE` filters `None`/conformerless records and fixes an output index-misalignment; actionable error for unsupported ANI2xt elements.

Test plan
- … `main`; +32 regression tests, all fail-before/pass-after).
- `ruff check src/ --select F401,F841,UP007` clean.

Notes
- … `main` (post-3.5.0). No behavior change on the normal pipeline path beyond the documented correctness fixes; `reorder_sdf`'s normal (all-matched) ordering is byte-identical.
- `test_calc_spe_userNNP2` fails because `torch.jit.script` cannot compile the `aimnet` AEV module when wrapping a torchani-based custom NNP; worth a dedicated follow-up.
- … `stereochemistry` public-API SMILES surgery on explicit stereo tags, ANI2xt fp64 energy-shift precision, and `pad_from_mols` host→device transfer batching.
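
For reviewers unfamiliar with the "temp file + atomic replace" pattern mentioned for `reorder_sdf`, here is a minimal stdlib-only sketch of the general technique. It is not the code from this PR: the helper name `atomic_write_text` is hypothetical, and the real function presumably writes molecules rather than a plain string.

```python
import os
import tempfile


def atomic_write_text(path, text):
    """Replace `path` with `text` without ever exposing a half-written file.

    Hypothetical helper for illustration only. The temp file is created in
    the same directory as the target so that os.replace stays on one
    filesystem, which is what makes the final rename atomic.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as handle:
            handle.write(text)
            handle.flush()
            os.fsync(handle.fileno())  # push the data to disk before renaming
        os.replace(tmp_path, path)  # atomically swap the new file into place
    except BaseException:
        # On any failure the original file is untouched; drop the temp file.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

If the process dies before `os.replace`, the original file is still intact and only a stray `.tmp` file may remain, which is the property a crash-safe rewrite relies on.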
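
The "RMSD failure keeps distinct" rule described for `filter_unique` can be illustrated with a small greedy filter. This is a sketch under assumptions, not the project's implementation: `keep_distinct`, its `distance` callback, and the `threshold` parameter are hypothetical names, and plain numbers stand in for conformers.

```python
def keep_distinct(items, distance, threshold):
    """Greedy uniqueness filter that never drops data on a failed comparison.

    An item is dropped only when `distance` succeeds and reports a value
    below `threshold` against an already kept item. If the comparison
    raises, the pair is treated as distinct and the item survives.
    """
    kept = []
    for item in items:
        is_duplicate = False
        for reference in kept:
            try:
                value = distance(item, reference)
            except Exception:
                continue  # comparison failed: assume distinct, keep looking
            if value < threshold:
                is_duplicate = True
                break
        if not is_duplicate:
            kept.append(item)
    return kept


def toy_distance(a, b):
    """Stand-in for an RMSD call; raises TypeError if either input is None."""
    return abs(a - b)
```

With `toy_distance`, a `None` entry makes every comparison raise, so it is kept rather than silently discarded. The opposite default (treat failure as duplicate) would lose that record without any signal.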
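
The `SDF2chunks` fix concerns a final molecule that is not followed by the `$$$$` record terminator. A minimal text-level sketch of that edge case follows, assuming records are split purely on `$$$$` lines. The function `split_sdf_records` is hypothetical and does not reflect how `SDF2chunks` groups records into chunk files.

```python
def split_sdf_records(text):
    """Split SDF text into records, keeping a trailing unterminated record.

    Each record normally ends with a line containing only "$$$$". A naive
    splitter that emits a record only when it sees the terminator loses the
    last molecule if the file was truncated or written without it.
    """
    records = []
    current = []
    for line in text.splitlines(keepends=True):
        current.append(line)
        if line.strip() == "$$$$":
            records.append("".join(current))
            current = []
    # Keep leftover lines as a record unless they are only whitespace.
    if any(line.strip() for line in current):
        records.append("".join(current))
    return records
```

Trailing blank lines after the last terminator are ignored, so a well-formed file yields the same records as before.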