Add PhenoGraph GPU refactor: new transform module, CPU/GPU templates,… #429

Open
fangliu117 wants to merge 5 commits into dev from feature/phenograph-gpu-refactor

@fangliu117
Collaborator

… Galaxy tools, Dockerfiles, and GPU orchestrator
Adds GPU-accelerated PhenoGraph clustering (grapheno/RAPIDS backend) alongside a standalone CPU path, organized under a new spac.transform.clustering.phenograph module tree. The legacy spac.transformations.phenograph_clustering and phenograph_clustering.xml Galaxy tool are untouched.

What's new
New module: phenograph
preprocess.py — shared prepare_features() used by both CPU and GPU paths (identical inputs guaranteed)
cpu.py — standalone CPU PhenoGraph via the phenograph Python package (independent of legacy path)
gpu/grapheno.py — GPU PhenoGraph ported from grapheno_dmap (RAPIDS cuml + cugraph), with numpy-array and AnnData entry points, and cugraph 22.08/24.x API compatibility
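
One generic way to support two versions of a library whose function signatures differ is to pass only the keyword arguments the installed version accepts. The sketch below illustrates that idea only. The PR does not say which cugraph arguments changed between 22.08 and 24.x, so `cluster_v1`, `cluster_v2`, and their parameter names are hypothetical stand-ins, not cugraph's real signatures, and `call_with_supported_kwargs` is not a function from this PR.

```python
import inspect


def call_with_supported_kwargs(func, *args, **kwargs):
    """Call func, dropping keyword arguments its signature does not accept."""
    params = inspect.signature(func).parameters
    # If func itself takes **kwargs, pass everything through unchanged.
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return func(*args, **kwargs)
    supported = {k: v for k, v in kwargs.items() if k in params}
    return func(*args, **supported)


# Hypothetical stand-ins for an older and a newer API of the same function.
def cluster_v1(graph, max_iter=100, resolution=1.0):
    return ("v1", max_iter, resolution)


def cluster_v2(graph, max_level=100, resolution=1.0, threshold=1e-7):
    return ("v2", max_level, resolution, threshold)


# The caller supplies the union of options; each version receives only its own.
options = {"max_iter": 10, "max_level": 10, "resolution": 0.5}
result_v1 = call_with_supported_kwargs(cluster_v1, None, **options)
result_v2 = call_with_supported_kwargs(cluster_v2, None, **options)
```

Filtering on the signature avoids hard-coding version checks, at the cost of silently ignoring misspelled options, so an explicit version branch may be preferable where the differences are known.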

New Galaxy templates
phenograph_clustering_cpu_template.py
phenograph_clustering_gpu_template.py (supports profiling mode: multi-resolution sweep reusing KNN+Jaccard)
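
The saving from reusing KNN+Jaccard in a resolution sweep can be sketched on the CPU with NumPy alone: the neighbour search and Jaccard edge weights do not depend on the resolution, so they are computed once and only the community-detection step is repeated. This is a minimal illustration, not the template's code. `knn_indices`, `jaccard_edges`, `sweep_resolutions`, and the `cluster_fn` callback (a placeholder for a Louvain or Leiden call) are assumed names.

```python
import numpy as np


def knn_indices(X, k):
    """Brute-force k nearest neighbours (excluding self), Euclidean distance."""
    X = np.asarray(X, dtype=float)
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    np.fill_diagonal(d2, np.inf)  # a point is never its own neighbour
    return np.argsort(d2, axis=1)[:, :k]


def jaccard_edges(nbrs):
    """Weight each kNN edge (i, j) by the Jaccard index of the two neighbour sets."""
    sets = [set(row.tolist()) for row in nbrs]
    edges = {}
    for i, row in enumerate(nbrs):
        for j in row.tolist():
            inter = len(sets[i] & sets[j])
            union = len(sets[i] | sets[j])
            edges[(i, j)] = inter / union
    return edges


def sweep_resolutions(X, k, resolutions, cluster_fn):
    """Build the weighted graph once, then cluster once per resolution.

    cluster_fn(edges, n_nodes, resolution) is a hypothetical callback that
    stands in for the community-detection step.
    """
    nbrs = knn_indices(X, k)
    edges = jaccard_edges(nbrs)
    return {r: cluster_fn(edges, len(nbrs), r) for r in resolutions}
```

For a sweep over many resolutions the graph construction cost is paid once rather than once per resolution, which is the part that dominates on large cell counts.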
New Galaxy tools
phenograph_clustering_cpu.xml — CPU tool using nciccbr/spac:v0.9.2
phenograph_clustering_gpu.xml — GPU tool using nciccbr/spac-gpu:v1.0.0

Docker
Dockerfile — updated CPU Dockerfile (v0.9.2), adds GIT_BRANCH/GIT_COMMIT build args and import tests for the new module
Dockerfile.gpu — new GPU Dockerfile on RAPIDS 22.08, two-conda-env pattern (preprocess + rapids)
spac_gpu — three-stage GPU orchestrator (run_gpu.sh, run_grapheno.py, merge_labels.py)
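
A staged pipeline that spans two environments typically hands data between stages through files. The sketch below shows that pattern in a single Python process, assuming a preprocess, cluster, merge ordering and `.npy` files as the handoff format. The function names, file names, and the threshold-based placeholder clustering are all assumptions for illustration. The actual behaviour of run_gpu.sh, run_grapheno.py, and merge_labels.py is not described in this PR text.

```python
import tempfile
from pathlib import Path

import numpy as np


def stage_preprocess(features, workdir):
    """Stage 1: write the feature matrix to disk for the next stage."""
    path = Path(workdir) / "features.npy"
    np.save(path, np.ascontiguousarray(features, dtype=np.float32))
    return path


def stage_cluster(features_path, workdir):
    """Stage 2: load features, assign labels, write labels to disk.

    The split on the first column's mean is a placeholder for real clustering.
    """
    X = np.load(features_path)
    labels = (X[:, 0] > X[:, 0].mean()).astype(np.int64)
    path = Path(workdir) / "labels.npy"
    np.save(path, labels)
    return path


def stage_merge(cell_ids, labels_path):
    """Stage 3: attach labels back to the original observations."""
    labels = np.load(labels_path)
    if len(labels) != len(cell_ids):
        raise ValueError("label count does not match number of cells")
    return dict(zip(cell_ids, labels.tolist()))


def run_pipeline(features, cell_ids):
    """Run the three stages in order inside a temporary working directory."""
    with tempfile.TemporaryDirectory() as workdir:
        features_path = stage_preprocess(features, workdir)
        labels_path = stage_cluster(features_path, workdir)
        return stage_merge(cell_ids, labels_path)
```

Because each stage reads and writes only files, the stages can run under different interpreters, which is what a two-environment layout requires. The length check in the merge stage guards against a mismatch between the two sides.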

Design decisions
Two independent Galaxy tools, two containers — no HPC toggle, no Run_on_HPC parameter
CPU path is standalone, not a wrapper around legacy code
Shared preprocessing in preprocess.py — both paths get identical feature matrices
RAPIDS 22.08 base for GPU image — matches Biowulf/Foundry production for reproducibility
Branch-aware Docker builds — COPY . . + pip install -e . means the checked-out branch is what runs in the image

What's unchanged
transformations.py (legacy phenograph_clustering)
phenograph_clustering_template.py (legacy template)
phenograph_clustering.xml (legacy Galaxy tool; to be deprecated in v3.1)

… Galaxy tools, Dockerfiles, and GPU orchestrator
- Dockerfile (CPU v0.9.2): switch from COPY to curl+git install for
  CCBR/Dockers2 build context compatibility, add UMAP verification test
- Dockerfile.gpu (GPU v1.0.0): switch from COPY to git-based install,
  add matplotlib to preprocess env, add orchestration script verification
- phenograph_clustering_gpu.xml: use run_gpu.sh orchestrator instead of
  direct run_from_json (required for RAPIDS 22.08 two-env architecture)
- run_gpu.sh: fix comment typo (.csv -> .npy)
- umap_transformation.xml: remove Run_on_HPC, Batch_Mode, Partition,
  Number_of_CPUs, Memory_GB, Request_Time params and HPC Configuration
  section; drop --bool-values for removed HPC booleans; bump to v2.1.0
- umap_transformation_template.py: update HPC comment to note params
  were removed in v2.1.0