55 changes: 55 additions & 0 deletions README.md
@@ -38,3 +38,58 @@ nextflow run KarchinLab/TCRtoolkit \
--input_format adaptive \
--max_memory 10GB --max_cpus 4
```
## Input Formats

`TCRtoolkit` accepts three input formats, specified via `--input_format`:

| Format | Description |
|---|---|
| `adaptive` | Adaptive Biotechnologies output files |
| `cellranger` | 10x Genomics CellRanger 'airr_rearrangement.tsv' output files (single-cell pseudo-bulk) |
| `airr` | AIRR-compliant tab-separated files |

## Workflow Levels

The pipeline supports multiple levels of analysis, controlled by `--workflow_level`:

| Level | Description |
|---|---|
| `sample` | Per-sample QC and repertoire statistics |
| `patient` | Patient-level clonotype aggregation and comparison |
| `compare` | Cross-cohort repertoire comparison and overlap |

Levels can be combined: `--workflow_level sample,patient,compare`

## HTML Reports

After the pipeline finishes, `TCRtoolkit` generates interactive HTML reports using [Quarto](https://quarto.org/). Four main report notebooks are rendered automatically:

| Notebook | Description |
|---|---|
| `template_qc.qmd` | Quality control metrics and filtering summary |
| `template_discovery_brief.qmd` | Brief summary of the most relevant repertoire discovery results |
| `template_details_part1.qmd` | Detailed repertoire analysis, part 1 |
| `template_details_part2.qmd` | Detailed repertoire analysis, part 2 |

### Conditional Report Sections

Certain sub-reports are automatically appended based on input and workflow options:

- `--input_format cellranger` → includes single-cell phenotype report
- `--input_format adaptive` → includes bulk phenotype report
- `--workflow_level sample,patient,compare` (Patient workflow enabled) → includes patient-level clonotype analysis
Comment on lines +79 to +80
Copilot AI Apr 27, 2026

The “Conditional Report Sections” bullets are too narrow compared to the described behavior: (1) bulk phenotype should apply to --input_format adaptive and airr (per PR description), and (2) patient-level sections should trigger whenever --workflow_level includes patient (not only the exact string sample,patient,compare). Updating these bullets will prevent users from misconfiguring reports based on the README.

Suggested change
- - `--input_format adaptive` → includes bulk phenotype report
- - `--workflow_level sample,patient,compare` (Patient workflow enabled) → includes patient-level clonotype analysis
+ - `--input_format adaptive` or `airr` → includes bulk phenotype report
+ - `--workflow_level` includes `patient` → includes patient-level clonotype analysis

- `--use_gliph2` → additionally includes GLIPH2 clustering report
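
The conditional rules in this list, read together with the review suggestion (bulk phenotype for `adaptive` or `airr`, patient sections whenever `patient` is among the levels), can be sketched in Python as below. This is only an illustration of the selection logic, not the pipeline's actual implementation: the function name `conditional_reports` and the report labels are hypothetical, and the real decision is presumably made inside the Nextflow workflow.

```python
def conditional_reports(input_format: str, workflow_level: str,
                        use_gliph2: bool = False) -> list[str]:
    """Return the sub-reports appended for a configuration (labels are illustrative)."""
    # Split the comma-separated level string so that membership is tested
    # per level rather than by comparing against one exact string.
    levels = {lvl.strip() for lvl in workflow_level.split(",") if lvl.strip()}

    reports = []
    if input_format == "cellranger":
        reports.append("single-cell phenotype")
    elif input_format in ("adaptive", "airr"):  # assumption: follows the review suggestion
        reports.append("bulk phenotype")
    if "patient" in levels:
        reports.append("patient-level clonotype analysis")
    if use_gliph2:
        reports.append("GLIPH2 clustering")
    return reports


print(conditional_reports("airr", "patient,compare", use_gliph2=True))
```

Testing set membership means that `sample,patient`, `patient`, and `sample,patient,compare` all enable the patient-level section, which is the behavior the review comment asks the README to describe.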

## Key Parameters

| Parameter | Default | Description |
|---|---|---|
| `--samplesheet` | — | Path or URL to sample sheet CSV |
| `--outdir` | `out` | Output directory |
| `--input_format` | `airr` | Input format: `airr`, `adaptive`, or `cellranger` |
| `--workflow_level` | `sample,compare` | Analysis level(s): `sample`, `patient`, `compare` |
| `--use_gliph2` | `false` | Enable GLIPH2 CDR3 motif clustering |
| `--sobject_gex` | — | Path to TSV file containing cell-barcode phenotypes for pseudo-bulk phenotyping |
| `--max_memory` | `768.GB` | Maximum memory allocation |
| `--max_cpus` | `192` | Maximum CPU allocation |
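
As a rough illustration of how the defaults in this table interact with overrides, the Python sketch below assembles a `nextflow run` argument list and emits only the parameters that differ from their defaults. The helper `build_command` and the `DEFAULTS` dictionary are hypothetical conveniences and not part of `TCRtoolkit`. Passing booleans as an explicit `true`/`false` value is an assumption about how the flag would be supplied on the command line.

```python
# Defaults copied from the Key Parameters table; --samplesheet and
# --sobject_gex have no default and are handled separately.
DEFAULTS = {
    "outdir": "out",
    "input_format": "airr",
    "workflow_level": "sample,compare",
    "use_gliph2": False,
    "max_memory": "768.GB",
    "max_cpus": 192,
}


def build_command(samplesheet: str, **overrides) -> list[str]:
    """Build an argument list for `nextflow run`, skipping values equal to the defaults."""
    unknown = set(overrides) - set(DEFAULTS) - {"sobject_gex"}
    if unknown:
        raise ValueError(f"unknown parameter(s): {sorted(unknown)}")

    cmd = ["nextflow", "run", "KarchinLab/TCRtoolkit", "--samplesheet", samplesheet]
    for name, value in overrides.items():
        if name in DEFAULTS and DEFAULTS[name] == value:
            continue  # same as the default, no need to pass it
        if isinstance(value, bool):
            value = str(value).lower()  # assumption: booleans passed as true/false
        cmd += [f"--{name}", str(value)]
    return cmd


print(" ".join(build_command("samples.csv", input_format="cellranger", use_gliph2=True)))
```

Rejecting unknown names early catches typos such as `workflow_levels` before the pipeline is launched.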

3 changes: 2 additions & 1 deletion env.yml
@@ -9,7 +9,7 @@ dependencies:
- numpy=1.25.2
- scipy=1.11.3
- seaborn=0.13.0
- - dash=2.14.1
+ - dash>=2.15.0
- matplotlib=3.8.1
- pip=23.2.1
- jupyterlab=4.0.8
@@ -26,6 +26,7 @@ dependencies:
- rpy2=3.6.4
- unzip
- openjdk=8
+ - upsetplot=0.9.0

# R and R packages
- r-base=4.4.2
10 changes: 5 additions & 5 deletions nextflow.config
@@ -34,11 +34,11 @@ params {
sample_stats_template = "${projectDir}/notebooks/sample_stats_template.qmd"
compare_stats_template = "${projectDir}/notebooks/compare_stats_template.qmd"

- // Sample stats metadata parameters
- samplechart_x_col = 'timepoint'
- samplechart_color_col = 'origin'
- vgene_subject_col = 'subject_id'
- vgene_x_cols = 'origin,timepoint'
+ // Notebooks parameters
+ timepoint_col = 'timepoint'
+ timepoint_order_col = 'timepoint_order'
+ alias_col = 'alias'
+ subject_col = 'subject_id'

Copilot AI Apr 27, 2026

This change removes samplechart_x_col, samplechart_color_col, vgene_subject_col, and vgene_x_cols, but modules/local/sample/sample_plot.nf still passes these params into quarto render for notebooks/sample_stats_template.qmd. As-is, sample report rendering will break (missing params / null values). Either keep the existing params for backward compatibility or update the module + sample_stats_template.qmd to use the new parameter names.

Suggested change
// Backward-compatible aliases for legacy sample report params
samplechart_x_col = timepoint_col
samplechart_color_col = alias_col
vgene_subject_col = subject_col
vgene_x_cols = [timepoint_col, alias_col]
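
The aliasing idea in this suggestion can be illustrated outside Nextflow with a small Python sketch: legacy keys are filled in from the new parameter names only when they are absent, so explicit legacy values still win. The function `with_legacy_aliases` is hypothetical and only mirrors the mapping proposed above. It is not code from the pipeline. One detail to keep in mind: the removed `vgene_x_cols` was a comma-separated string (`'origin,timepoint'`), whereas the suggestion makes it a list, so a consumer that expects a string may need the values joined.

```python
def with_legacy_aliases(params: dict) -> dict:
    """Return a copy of params with legacy sample-report keys derived from the new ones."""
    out = dict(params)  # do not mutate the caller's dictionary
    # setdefault keeps any legacy value the user set explicitly.
    out.setdefault("samplechart_x_col", params["timepoint_col"])
    out.setdefault("samplechart_color_col", params["alias_col"])
    out.setdefault("vgene_subject_col", params["subject_col"])
    # Mirrors the suggestion: a list rather than the old comma-separated string.
    out.setdefault("vgene_x_cols", [params["timepoint_col"], params["alias_col"]])
    return out


new_params = {
    "timepoint_col": "timepoint",
    "timepoint_order_col": "timepoint_order",
    "alias_col": "alias",
    "subject_col": "subject_id",
}
resolved = with_legacy_aliases(new_params)
print(resolved["samplechart_x_col"], ",".join(resolved["vgene_x_cols"]))
```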

// OLGA parameters
olga_chunk_length = 100000 // larger chunk size = less parallelization
190 changes: 0 additions & 190 deletions notebooks/compare_stats_template.qmd

This file was deleted.

56 changes: 0 additions & 56 deletions notebooks/gliph2_report_template.qmd

This file was deleted.
