feat: add Parsl-based parallel build orchestration#17
Conversation
Implementation PlanProblemBuild orchestration requires Nextflow ( ApproachUse Extend
DeliverablesNew module:
|
| File | Purpose |
|---|---|
__init__.py |
Public API: ExecutorConfig, SlurmExecutorConfig, build_systems, build_systems_dry_run |
config.py |
Pydantic models for executor configuration → parsl.Config objects |
apps.py |
@python_app definition: build_system_app(build_input_dict) wrapping run_build_from_dict |
build.py |
build_systems() — loads Parsl, submits parallel builds, returns futures. build_systems_dry_run() — prints what would run without submitting |
New test files
| File | Coverage |
|---|---|
mdfactory/tests/test_orchestration_config.py |
Config model construction, validation, to_parsl_config() with mocked Parsl |
mdfactory/tests/test_orchestration_build.py |
Build dispatch with mocked Parsl, dry-run output, CSV/YAML/directory input handling, error paths |
Modified files
| File | Change |
|---|---|
pyproject.toml |
Add parsl = ["parsl>=2024.1.1"] to [project.optional-dependencies] |
mdfactory/cli.py |
Extend build command: accept CSV input, add --config (executor YAML) and --dry-run flags. Reuse prepare.py internals for CSV→BuildInput conversion |
Design Details
config.py — Executor configuration:
class ExecutorConfig(BaseModel):
"""Base executor configuration."""
provider: Literal["local", "slurm"] = "local"
worker_init: str = "" # module loads, conda/pixi activation
working_directory: Path | None = None
class SlurmExecutorConfig(ExecutorConfig):
"""SLURM-specific executor configuration."""
provider: Literal["slurm"] = "slurm"
account: str # required for SLURM
partition: str = "gpu"
walltime: str = "2:00:00"
nodes_per_block: int = 1
cores_per_node: int = 12
gpus_per_node: int = 1
scheduler_options: str = "" # extra #SBATCH lines
def to_parsl_config(self) -> "parsl.Config":
"""Build a Parsl Config with HighThroughputExecutor + SlurmProvider."""
...Follows the SlurmConfig dataclass pattern from analysis/submit.py.
Config can be loaded from a YAML file for reuse across campaigns.
apps.py — Parsl app:
@python_app
def build_system_app(build_input_dict: dict):
"""Run a single system build inside a Parsl worker.
All heavy imports (OpenMM, MDAnalysis, OpenFF) happen inside the
worker process — only the plain dict crosses the serialization boundary.
"""
from mdfactory.workflows import run_build_from_dict
return run_build_from_dict(build_input_dict)build.py — Orchestration:
build_systems(build_inputs, config, *, wait=True)— accepts list ofBuildInputor dicts, calls.model_dump()on each, submits viabuild_system_app, optionally waits for all futuresbuild_systems_dry_run(build_inputs, config)— prints each system's hash + simulation type + config summary without loading Parsl- Conditional
import parslinside function bodies (same pattern assubmititinsubmit.py)
CLI changes (cli.py):
The existing build command currently accepts a single YAML path. Extend it to:
mdfactory build <input> # single YAML → local build (unchanged)
mdfactory build <input.csv> # CSV → sequential local builds
mdfactory build <input.csv> --config e.yaml # CSV → parallel Parsl builds
mdfactory build <input.csv> --dry-run # CSV → print what would run
mdfactory build <summary.yaml> --config ... # summary YAML → parallel builds
Input type detection by file extension (.csv vs .yaml/.yml). For CSV input, reuse df_to_build_input_models() from prepare.py and directory creation logic from prepare_build.
Testing Approach
test_orchestration_config.py:
ExecutorConfigandSlurmExecutorConfigconstruction with defaults- Validation: missing
accountfor SLURM raises error to_parsl_config()with Parsl mocked viamonkeypatch.setitem(sys.modules, "parsl", ...)— verify correct provider/executor types- Config round-trip: write YAML → load → compare
worker_initstring propagation into config
test_orchestration_build.py:
- Mock Parsl with
DummyDataFlowKernelcapturing submitted app calls build_systems()submits correct number of tasks- Each submitted task receives the correct
build_input_dict build_systems_dry_run()returns descriptions without loading Parsl- CSV input: creates directories + dispatches builds
- Error handling: missing files, invalid BuildInput, failed futures
- Import guard:
mdfactory.orchestrationimportable without parsl installed
Acceptance Criteria
-
pip install 'mdfactory[parsl]'installs parsl - Importing
mdfactorywithout parsl installed does not error -
ExecutorConfig/SlurmExecutorConfigproduce valid Parsl Config objects for local and SLURM providers -
build_systems()submits parallel builds via@python_appwrappingrun_build_from_dict -
build_systems_dry_run()prints resolved build descriptions without submitting -
mdfactory build input.csv --config executor.yamlreads CSV, creates directories, dispatches parallel builds -
mdfactory build input.csv --dry-runprints what would be built without executing -
mdfactory build single.yamlstill works unchanged (backward compatible) - All new code has NumPy-style docstrings, passes
ruff check+ruff format - Tests pass in CI without parsl installed (mocked)
Risks
- Worker environment —
module load+ pixi activation may behave differently inside Parsl workers. First HPC test will reveal this;worker_initconfig provides the escape hatch. - Parsl version stability — Config API occasionally changes between major versions. Pin
>=2024.1.1. - Shared filesystem assumption — design assumes build outputs land on shared FS (standard for HPC). No file staging.
prepare-buildoverlap — the CSV handling inbuildpartially duplicatesprepare-build. We should refactor shared logic intoprepare.pyhelpers but keepprepare-buildas a standalone command for users who want the two-step flow.
Plan created by mach6
- New mdfactory/orchestration/ module (config, apps, build) - ExecutorConfig/SlurmExecutorConfig with SLURM-native field names - build_systems() dispatches parallel builds via @python_app - build_systems_dry_run() previews without loading Parsl - Rich live progress display with activity log - Ctrl+C triggers parsl.clear() to cancel SLURM jobs - Extended CLI: mdfactory build accepts CSV/YAML with --config/--dry-run - Added parsl to optional-dependencies and pixi dev env - 19 tests covering config, build dispatch, and error handling
Progress UpdateImplementation completeNew module
CLI extension (
Tests (19 passing):
Infrastructure:
Commit: Progress tracked by mach6 |
- _shutdown_parsl() extracts SLURM job IDs from DFK and runs scancel - Ctrl+C now reliably cancels SLURM jobs instead of relying on parsl.clear() - Error messages no longer truncated in activity log - Increased activity log to 12 entries
When species.resname is not 'SOL' (e.g. 'WAT'), Interchange would generate atom types with the wrong prefix (WAT_0 instead of SOL_0), causing a KeyError in OpenMM's GromacsTopFile parser. Force molecule.name='SOL' before Interchange export so atom types are always consistent (SOL_0, SOL_1, SOL_2). Also invalidate stale SOL_params.itp when the ITP is regenerated.
Progress UpdateBug fix: Force Problem: When Fix: Override Commit: Progress tracked by mach6 |
Code ReviewCriticalNo critical findings. ImportantFinding 1 — Parsl DFK leaks on exceptions after Finding 2 — Top-level parsl imports break the optional-dependency pattern (confidence: 90) Finding 3 — Finding 4 — CLI Finding 5 — SuggestionsFinding 6 — Finding 7 — Inconsistent output directory for single-YAML builds (confidence: 85) Finding 8 — Test files fail collection without parsl installed (confidence: 92) Finding 9 — SLURM cleanup and KeyboardInterrupt path never exercised (confidence: 90) Finding 10 — Finding 11 — Finding 12 — Finding 13 — Strengths
Agents run: code-reviewer, error-auditor, test-reviewer, completeness-checker Reviewed by mach6 |
Review AssessmentClassifications
Action PlanMust fix before merge:
Should ship with this PR:
Can follow up:
Assessment by mach6 |
- Move parsl imports inside functions (optional-dependency pattern)
- Wrap build_systems() in try/finally for DFK cleanup on exceptions
- Split _shutdown_parsl() into independent blocks, log at WARNING
- Normalize single-YAML output directory to output/{hash}/
- Add pytest.importorskip guard to orchestration test files
- Add tests: _build_system_impl, SLURM cleanup, KeyboardInterrupt,
dict input path, CLI integration (9 tests), parametrize regression
- Add PLC0415 to ruff ignore (lazy imports intentional throughout)
Progress UpdateReview findings addressed (findings 1–10, 12)Source fixes:
Test additions (18 new tests):
Infrastructure:
All 37 new/modified tests pass. Commit: Progress tracked by mach6 |
Code Review (Re-review after fixes)CriticalNo critical findings. ImportantFinding 1 — Double Finding 2 — Finding 3 — Finding 4 — Empty/null executor config YAML crashes with opaque Finding 5 — Finding 6 — SuggestionsFinding 7 — Empty single-build YAML produces opaque Finding 8 — Finding 9 — Finding 10 — Finding 11 — Strengths
Agents run: code-reviewer, error-auditor, test-reviewer, completeness-checker Reviewed by mach6 |
Review AssessmentClassifications
Action PlanMust fix (correctness bugs):
Should ship (test gaps for new code):
Optional:
Assessment by mach6 |
Progress UpdateRe-review findings addressed (findings 1–11)Source fixes:
Test additions (10 new tests, 46 total):
All 46 orchestration tests pass. Ruff check + format clean. Commit: Progress tracked by mach6 |
Progress UpdateAPI simplification: unified
|
Progress UpdateSummary YAML generation + dry-run API simplificationSummary YAML from CSV builds: n_systems: 10
input: /path/to/input.csv
output: /path/to/output
hash: [hash1, hash2, ...]
simulation_type: [mixedbox, bilayer, ...]
system_directory: [/path/to/output/hash1, ...]
date: 2026-06-08 ...This means Dry-run API simplified:
48 tests pass. Commits: Progress tracked by mach6 |
Closes #16 (PR 1 of 4)
Add Parsl as a Python-native workflow manager for parallel build orchestration on SLURM GPU nodes. Extends
mdfactory buildto accept CSV files directly and dispatch builds in parallel via Parsl's@python_app+HighThroughputExecutor.Implementation plan posted as a comment below.