Add PhenoGraph GPU refactor: new transform module, CPU/GPU templates,…#429
Open
fangliu117 wants to merge 5 commits into dev
… Galaxy tools, Dockerfiles, and GPU orchestrator
- Dockerfile (CPU v0.9.2): switch from COPY to curl+git install for CCBR/Dockers2 build context compatibility; add UMAP verification test
- Dockerfile.gpu (GPU v1.0.0): switch from COPY to git-based install; add matplotlib to preprocess env; add orchestration script verification
- phenograph_clustering_gpu.xml: use run_gpu.sh orchestrator instead of direct run_from_json (required for RAPIDS 22.08 two-env architecture)
- run_gpu.sh: fix comment typo (.csv -> .npy)
- umap_transformation.xml: remove Run_on_HPC, Batch_Mode, Partition, Number_of_CPUs, Memory_GB, Request_Time params and HPC Configuration section; drop --bool-values for removed HPC booleans; bump to v2.1.0
- umap_transformation_template.py: update HPC comment to note params were removed in v2.1.0
Refer to the PR history on the CCBR/Dockers2 GitHub repo:
- CPU: https://github.com/CCBR/Dockers2/blob/dev/spac/spac/Dockerfile.v0.9.2
- GPU: https://github.com/CCBR/Dockers2/blob/dev/spac/spac-gpu/Dockerfile.v1.0.0
Adds GPU-accelerated PhenoGraph clustering (grapheno/RAPIDS backend) alongside a standalone CPU path, organized under a new spac.transform.clustering.phenograph module tree. The legacy spac.transformations.phenograph_clustering and phenograph_clustering.xml Galaxy tool are untouched.
What's new
New module: phenograph
preprocess.py — shared prepare_features() used by both CPU and GPU paths (identical inputs guaranteed)
cpu.py — standalone CPU PhenoGraph via the phenograph Python package (independent of legacy path)
gpu/grapheno.py — GPU PhenoGraph ported from grapheno_dmap (RAPIDS cuml + cugraph), with numpy-array and AnnData entry points, and cugraph 22.08/24.x API compatibility
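For reviewers, the shared-preprocessing idea can be sketched roughly as below. This is an illustration only: the real prepare_features() signature is not shown in this PR description, so prepare_features_sketch, matrix_fingerprint, run_cpu_path, and run_gpu_path are hypothetical names, and plain Python lists stand in for whatever array type the module actually uses.

```python
import hashlib
import json


def prepare_features_sketch(rows, feature_names):
    """Hypothetical stand-in: select the requested columns, in order, as floats."""
    return [[float(row[name]) for name in feature_names] for row in rows]


def matrix_fingerprint(matrix):
    """Stable hash of a feature matrix, used here to check that two paths agree."""
    payload = json.dumps(matrix).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def run_cpu_path(rows, feature_names):
    # A real CPU path would cluster the matrix; here we only fingerprint it.
    matrix = prepare_features_sketch(rows, feature_names)
    return matrix_fingerprint(matrix)


def run_gpu_path(rows, feature_names):
    # Same preprocessing call as the CPU path, so the input matrix is identical.
    matrix = prepare_features_sketch(rows, feature_names)
    return matrix_fingerprint(matrix)
```

Because both paths go through one function, any change to feature selection or casting applies to CPU and GPU runs alike, which is what makes their clusterings comparable.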
New Galaxy templates
phenograph_clustering_cpu_template.py
phenograph_clustering_gpu_template.py (supports profiling mode: multi-resolution sweep reusing KNN+Jaccard)
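The profiling-mode idea (build the KNN graph and Jaccard weights once, then sweep resolutions over the same graph) can be sketched in plain Python as follows. This is a minimal sketch, not the template's code: all function names are hypothetical, the KNN search is brute force, and the community step is a deliberately simple placeholder (connected components over edges whose Jaccard weight meets a threshold) rather than the Louvain/Leiden step a PhenoGraph backend would run.

```python
from math import dist


def knn_indices(points, k):
    """Brute-force k nearest neighbours (excluding self) for each point."""
    neighbours = []
    for i, p in enumerate(points):
        order = sorted((j for j in range(len(points)) if j != i),
                       key=lambda j: dist(p, points[j]))
        neighbours.append(set(order[:k]))
    return neighbours


def jaccard_graph(neighbours):
    """Weight each KNN edge by the Jaccard overlap of the two neighbour sets."""
    edges = {}
    for i, nbrs in enumerate(neighbours):
        for j in nbrs:
            a, b = neighbours[i], neighbours[j]
            edges[(min(i, j), max(i, j))] = len(a & b) / len(a | b)
    return edges


def threshold_components(edges, n_nodes, resolution):
    """Placeholder clusterer: union nodes joined by edges with weight >= resolution."""
    parent = list(range(n_nodes))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (i, j), weight in edges.items():
        if weight >= resolution:
            parent[find(i)] = find(j)
    roots = {}
    return [roots.setdefault(find(i), len(roots)) for i in range(n_nodes)]


def sweep(points, k, resolutions, cluster_fn=threshold_components):
    """Build the graph once, then reuse it for every resolution."""
    neighbours = knn_indices(points, k)   # computed once
    edges = jaccard_graph(neighbours)     # computed once
    return {r: cluster_fn(edges, len(points), r) for r in resolutions}
```

The point of the structure is that only the cheap final step runs per resolution; the expensive neighbour search and Jaccard weighting are shared across the sweep.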
New Galaxy tools
phenograph_clustering_cpu.xml — CPU tool using nciccbr/spac:v0.9.2
phenograph_clustering_gpu.xml — GPU tool using nciccbr/spac-gpu:v1.0.0
Docker
Dockerfile — updated CPU Dockerfile (v0.9.2), adds GIT_BRANCH/GIT_COMMIT build args and import tests for the new module
Dockerfile.gpu — new GPU Dockerfile on RAPIDS 22.08, two-conda-env pattern (preprocess + rapids)
spac_gpu — three-stage GPU orchestrator (run_gpu.sh, run_grapheno.py, merge_labels.py)
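The file-based handoff between the two conda environments might look roughly like the sketch below. It assumes the three stages are "export features", "cluster", and "merge labels back", which this description implies but does not spell out. All function names are hypothetical, JSON files stand in for the .npy arrays so the sketch needs only the standard library, and the clustering rule is a placeholder rather than grapheno.

```python
import json
import tempfile
from pathlib import Path


def stage_export(cells, workdir):
    """Stage 1 (assumed: preprocess env): write the feature matrix to disk."""
    path = Path(workdir) / "features.json"
    path.write_text(json.dumps([cell["features"] for cell in cells]))
    return path


def stage_cluster(features_path, workdir):
    """Stage 2 (assumed: rapids env): read features, write one label per row.

    Placeholder rule: label 0 if the first feature is negative, else 1.
    """
    features = json.loads(Path(features_path).read_text())
    labels = [0 if row[0] < 0 else 1 for row in features]
    path = Path(workdir) / "labels.json"
    path.write_text(json.dumps(labels))
    return path


def stage_merge(cells, labels_path, key="phenograph"):
    """Stage 3 (assumed: preprocess env): attach labels back onto the cells."""
    labels = json.loads(Path(labels_path).read_text())
    if len(labels) != len(cells):
        raise ValueError("label count does not match cell count")
    return [{**cell, key: label} for cell, label in zip(cells, labels)]


def run_pipeline(cells):
    """Run the three stages in order, sharing only files in a work directory."""
    with tempfile.TemporaryDirectory() as workdir:
        features_path = stage_export(cells, workdir)
        labels_path = stage_cluster(features_path, workdir)
        return stage_merge(cells, labels_path)
```

Keeping the stages coupled only through files is what lets each one run in a different environment; the length check in the merge stage guards against a stale or truncated label file.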
Design decisions
Two independent Galaxy tools, two containers — no HPC toggle, no Run_on_HPC parameter
CPU path is standalone, not a wrapper around legacy code
Shared preprocessing in preprocess.py — both paths get identical feature matrices
RAPIDS 22.08 base for GPU image — matches Biowulf/Foundry production for reproducibility
Branch-aware Docker builds — COPY . . + pip install -e . means the checked-out branch is what runs in the image
What's unchanged
transformations.py (legacy phenograph_clustering)
phenograph_clustering_template.py (legacy template)
phenograph_clustering.xml (legacy Galaxy tool — deprecate in v3.1)