Add HLA-HD v1.7.1 module and BAM-input subworkflow #241
Open
Module runs HLA-HD for HLA typing from paired-end FASTQ input. Container-only (not available on conda/bioconda). Private container built from JFrog-hosted binary. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Stub test and real test using HLA-region FASTQ from test-datasets hlahd branch. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Composes samtools/view, gatk4/revertsam (optional), samtools/fastq, and hlahd modules into a BAM-to-HLA-typing pipeline. Tests cover both skip_revert_sam paths plus stub test. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Also add the nf-test snapshot file that was missing from prior commits. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…pshots
- Fix output globs: results are at <prefix>/result/, not <prefix>/
- result: ${prefix}/result/${prefix}_final.result.txt
- result_per_locus: ${prefix}/result/${prefix}_*.est.txt
- Switch container URL to dev registry while awaiting next prod release
- Add nextflow.config for subworkflow tests (ext.prefix per process to
avoid GATK4_REVERTSAM input/output name collision)
- Regenerate all snapshots against new HLA class I test data that
produces actual allele calls (A*01:01:01, B*08:01:01, C*07:01:01)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
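The corrected output layout above can be checked with a small helper. This is a minimal sketch, not part of the PR: the function name `check_hlahd_outputs` and its two arguments (an output directory and the sample prefix) are hypothetical, and only the two path patterns come from the commit message.

```shell
# Hypothetical helper: verify that HLA-HD outputs sit under <prefix>/result/.
# Usage: check_hlahd_outputs <outdir> <prefix>
check_hlahd_outputs() {
    local outdir="$1" prefix="$2"
    local result_dir="${outdir}/${prefix}/result"

    # Final allele calls: <prefix>/result/<prefix>_final.result.txt
    if [ ! -f "${result_dir}/${prefix}_final.result.txt" ]; then
        echo "missing ${result_dir}/${prefix}_final.result.txt" >&2
        return 1
    fi

    # Per-locus estimates: <prefix>/result/<prefix>_*.est.txt
    # compgen -G succeeds only when the glob matches at least one path.
    if ! compgen -G "${result_dir}/${prefix}_*.est.txt" > /dev/null; then
        echo "no per-locus .est.txt files in ${result_dir}" >&2
        return 1
    fi
}
```

Run against a task work directory, it returns non-zero if the files are still being looked for one level too high.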
Add docker/login-action step to authenticate with mskcc.jfrog.io before running tests. Login is conditional on docker profile and credentials being present, so conda/singularity profiles are unaffected. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Switch from default HLA_gene.split.txt symlink to the explicit 3.50.0 dictionary release file. Refresh module and subworkflow snapshots to account for the additional loci rows (E, G, H, J, K, L, V) emitted by the newer split; class I calls are unchanged.
…onventions
The hlahd_from_bam subworkflow imports three nf-core modules
(samtools/view, gatk4/revertsam, samtools/fastq) that live under
.gitignored modules/nf-core/, so CI checkouts cannot resolve the
includes. Add a tiny bash + yq + git sparse-checkout installer that
reads components: from each subworkflow meta.yml and fetches foreign
components into modules/<org_path>/<name>/ before nf-test runs. No
nf-core/tools dependency, no modules.json.
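The installer described above might look roughly like the following. This is a sketch under stated assumptions, not the script in the PR: it assumes mikefarah yq v4 and git 2.25 or newer, that each dict-shaped `components:` entry has the keys `name`, `git_remote`, and `org_path`, and that the remote stores the component at `modules/<org_path>/<name>/`. All function names are hypothetical.

```shell
# Print "name<TAB>git_remote<TAB>org_path" for every dict-shaped component
# in a meta.yml; bare-string entries (local components) are skipped.
list_foreign_components() {
    yq -r '.components[] | select(tag == "!!map") | [.name, .git_remote, .org_path] | @tsv' "$1"
}

# Destination directory for a component: modules/<org_path>/<name>
component_dest() {
    local org_path="$1" name="$2"
    printf 'modules/%s/%s\n' "$org_path" "$name"
}

# Fetch a single component directory from its remote via sparse checkout.
install_component() {
    local name="$1" remote="$2" org_path="$3"
    local dest tmp
    dest="$(component_dest "$org_path" "$name")"
    if [ -d "$dest" ]; then
        return 0                      # already present, nothing to do
    fi
    tmp="$(mktemp -d)"
    git clone --quiet --depth 1 --filter=blob:none --sparse "$remote" "$tmp"
    git -C "$tmp" sparse-checkout set "$dest"
    mkdir -p "$(dirname "$dest")"
    cp -r "$tmp/$dest" "$dest"
    rm -rf "$tmp"
}

# Walk every subworkflow meta.yml and install its foreign components.
install_all() {
    local meta name remote org_path
    for meta in subworkflows/*/*/meta.yml; do
        while IFS=$'\t' read -r name remote org_path; do
            install_component "$name" "$remote" "$org_path"
        done < <(list_foreign_components "$meta")
    done
}
```

Calling `install_all` from the repository root before nf-test runs would populate the .gitignored directory; a real script would likely also pin a commit rather than take the remote's default branch.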
Also modernize the subworkflow against current nf-core/modules
conventions: SAMTOOLS_VIEW now takes 5 inputs (added bed channel);
samtools/{view,fastq} and gatk4/revertsam emit versions via topic
channels rather than emit: versions, so drop the corresponding
ch_versions.mix() lines. HLAHD itself still uses classic emit, so
its mix line stays.
meta.yml components: upgraded to the dict shape (name + git_remote +
org_path) so the installer can resolve the foreign three; hlahd
stays bare-string for local resolution.
Snapshot regeneration is intentionally deferred -- the new run
produces 1 versions.yml hash per test (HLAHD only) where the old
snap had 3 or 4. Will be updated in a follow-up commit using the
hashes CI reports.
Re-recorded all 3 tests against the modernized subworkflow (5-arg
SAMTOOLS_VIEW; topic-versions for samtools/{view,fastq} and
gatk4/revertsam):
- with revert sam: result.txt md5 unchanged (e51e94f4...) -- HLA
calls byte-identical to prior typing. versions: 4 hashes -> 1
(HLAHD only).
- stub: versions: 4 hashes -> 1.
- skip revert sam: result.txt md5 changed (e51e94f4 -> f2b54c8b),
versions: 3 hashes -> 1. The new md5 is "all Not typed" output --
the previous snap matched with-revert-sam by coincidence and
masked a known issue: when GATK4_REVERTSAM is bypassed, samtools
fastq runs on a coord-sorted BAM and emits singletons, so HLAHD
cannot type. Tracked as a follow-up in the project README; not a
blocker for #241.
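The singleton problem above comes from running samtools fastq on reads whose mates are not adjacent. One common way to handle it, shown here only as an illustration and not as the fix planned in the follow-up, is to check the header's sort order and group mates with samtools collate first. The function names and file arguments are hypothetical; the collate-then-fastq pipeline follows the pattern in the samtools manual.

```shell
# Print the SO: (sort order) value from SAM header text on stdin,
# or "unknown" when no @HD line carries one.
sort_order_from_header() {
    awk -F'\t' '
        $1 == "@HD" {
            for (i = 2; i <= NF; i++)
                if ($i ~ /^SO:/) { print substr($i, 4); found = 1 }
        }
        END { if (!found) print "unknown" }
    '
}

# Convert a BAM to paired FASTQ. Coordinate-sorted input is collated so
# that mates are adjacent; other input is assumed to be grouped by read
# name already (for example, RevertSam output).
bam_to_paired_fastq() {
    local bam="$1" r1="$2" r2="$3"
    if [ "$(samtools view -H "$bam" | sort_order_from_header)" = "coordinate" ]; then
        samtools collate -u -O "$bam" \
            | samtools fastq -1 "$r1" -2 "$r2" -0 /dev/null -s /dev/null -n -
    else
        samtools fastq -1 "$r1" -2 "$r2" -0 /dev/null -s /dev/null -n "$bam"
    fi
}
```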
Local validation: nf-test 0.9.4, nextflow 25.10.4, docker profile,
public docker.io/orgeraj/hlahd:1.7.1 stand-in (HLAHD binary is
identical to the JFrog image; container URL in modules/msk/hlahd
unchanged).
Add HLA-HD v1.7.1 module and BAM-input subworkflow
Summary
- modules/msk/hlahd: nf-core-style module for HLA-HD v1.7.1 (high-resolution HLA typing from paired FASTQ)
- subworkflows/msk/hlahd_from_bam: end-to-end BAM-to-HLA-typing workflow
- tests/config/test_data.config (data on hlahd branch of test-datasets repo)
Module: modules/msk/hlahd
- Container: mskcc.jfrog.io/omicswf-docker-dev-local/mskcc-omics-workflows/hlahd:1.7.1
- Input: [ meta, fastq_1, fastq_2 ]
- Outputs: result (final allele calls), result_per_locus (per-gene .est.txt files), versions
- ext.args2 (default: 100)
Subworkflow: subworkflows/msk/hlahd_from_bam
Chains four modules to go from coordinate-sorted BAM to HLA allele calls: samtools/view, gatk4/revertsam (optional), samtools/fastq, hlahd.
The skip_revert_sam parameter controls whether GATK4 RevertSam runs. Set to true when the input BAM has no BQSR applied.
Test data
- 6:29910247-29913661
- 6:31321649-31324989
- 6:31236526-31239913
~21k reads, ~3.3 MB across 4 files (BAM + BAI + paired FASTQ).
Example output (test_sample_final.result.txt)
Class II loci are "Not typed" as expected, because only class I regions are included in the test data.
Tests
All 5 nf-test tests pass with deterministic snapshot matching:
Module tests (2):
- hlahd - fastq pair - result txt: real HLA-HD run, verifies final result md5
- hlahd - fastq pair - stub: stub run, verifies versions output
Subworkflow tests (3):
- hlahd_from_bam - bam - with revert sam - result: full pipeline with GATK4 RevertSam
- hlahd_from_bam - bam - skip revert sam - result: pipeline skipping RevertSam
- hlahd_from_bam - bam - stub: stub run
Both revert/skip-revert paths produce identical final calls (md5: 6f83fc8ac5bd3b9f56853b583595e2a0).
Checklist
- hlahd branch in test-datasets repo
- meta.yml complete for both module and subworkflow