Skip to content

Quarto notebooks#81

Open
KevinMLanderos wants to merge 6 commits into
mainfrom
quarto-notebooks
Open

Quarto notebooks#81
KevinMLanderos wants to merge 6 commits into
mainfrom
quarto-notebooks

Conversation

@KevinMLanderos
Copy link
Copy Markdown
Collaborator

Add Quarto Report Notebooks to Pipeline

This PR integrates Quarto report notebooks into the TCRtoolkit pipeline. Below are the implementation instructions.


1. New Files — notebooks/

Add the following .qmd files from /lab/home/kmlanderos/projects2/TCRtoolkit/notebooks/:

notebooks/template_qc.qmd
notebooks/template_discovery_brief.qmd
notebooks/template_details_part1.qmd
notebooks/template_details_part2.qmd
notebooks/template_gliph.qmd
notebooks/template_overlap.qmd
notebooks/template_pheno_bulk.qmd
notebooks/template_pheno_sc.qmd
notebooks/template_sample.qmd
notebooks/template_sharing.qmd
notebooks/template_giana.qmd

2. env.yml — Add Dependencies

- dash>=2.15.0
- upsetplot=0.9.0

3. Rendering — 4 Main Notebooks

After the pipeline finishes, render the 4 main notebooks using the following command (repeated for each):

quarto render <notebook.qmd> \
    -P project_name:$params.project_name \
    -P project_dir:$params.project_dir \
    -P workflow_cmd:'$workflow.commandLine' \
    -P sample_table:$sample_table \
    -P timepoint_col:${params.timepoint_col} \
    -P timepoint_order_col:${params.timepoint_order_col} \
    -P alias_col:${params.alias_col} \
    -P subject_col:${params.subject_col} \
    --to html

timepoint_col, timepoint_order_col, alias_col, and subject_col already have defaults defined in nextflow.config — no changes needed there.

The remaining notebooks (template_pheno_sc, template_pheno_bulk, template_gliph, etc.) are not rendered directly — they are embedded via Quarto's {{< include >}} directive.


4. Conditional Sub-Notebook Assembly

Before rendering, copy the relevant main notebook to a working directory and append include lines as needed. Do not modify the source templates directly.

template_discovery_brief.qmd

--input_format value Append to end of notebook copy
cellranger {{< include ./template_pheno_sc.qmd >}}
adaptive or airr {{< include ./template_pheno_bulk.qmd >}}

template_details_part2.qmd

Condition Append to end of notebook copy
--workflow_level includes patient (e.g. sample,patient,compare) {{< include ./template_giana.qmd >}}
Patient workflow enabled and --use_gliph2 is set Also append {{< include ./template_gliph.qmd >}}

Make sure the sub-notebook .qmd files are also present in the same working directory so relative {{< include >}} paths resolve correctly during rendering.

5. README.md

File updated to include patient workflow, pipeline parameters and HTML reports

… default parameters in nextflow.config file. Added dependencies to conda environment that are used in notebooks
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Integrates and refines Quarto notebook templates intended to generate HTML reports for the TCRtoolkit Nextflow pipeline, alongside config and documentation updates.

Changes:

  • Updated multiple Quarto .qmd templates (imports, narrative text, figure captions, and embedded sub-report includes).
  • Adjusted nextflow.config notebook-related params.
  • Expanded README.md with input/workflow/report documentation; renamed the conda environment in env.yml.

Reviewed changes

Copilot reviewed 13 out of 13 changed files in this pull request and generated 9 comments.

Show a summary per file
File Description
notebooks/template_sharing.qmd Removes unused imports/includes; minor wording updates in sharing/publicity sections.
notebooks/template_sample.qmd Cleans up comments/text and plot annotations in the sample report template.
notebooks/template_qc.qmd Cleans up imports/text and refines rarefaction plotting logic in QC report template.
notebooks/template_pheno_sc.qmd Refactors single-cell phenotype UpSet plotting section and chunk options.
notebooks/template_pheno_bulk.qmd Simplifies bulk phenotype template by removing unused CSV inputs and updating text.
notebooks/template_overlap.qmd Removes unused visualization imports and adjusts setup imports for overlap analysis.
notebooks/template_gliph.qmd Minor text/metadata cleanup in GLIPH2 sub-notebook template.
notebooks/template_discovery_brief.qmd Tweaks HTML TOC option, removes unused path var, and changes sub-notebook inclusion behavior.
notebooks/template_details_part1.qmd Minor grammar and punctuation fixes in the details notebook intro.
notebooks/template_details_part2.qmd Minor grammar fixes and changes sub-notebook inclusion behavior.
nextflow.config Replaces previously existing sample-report params with new generic notebook params.
env.yml Renames the conda environment.
README.md Adds sections describing input formats, workflow levels, report outputs, and parameters.
Comments suppressed due to low confidence (2)

notebooks/template_gliph.qmd:34

  • This notebook imports logomaker (and igraph) in the setup cell, but logomaker is not used anywhere in the current template. Keeping the import adds an unnecessary dependency and will break rendering if logomaker isn’t installed in the runtime environment. Please remove the unused logomaker import (or add logomaker explicitly to the environment if it’s intended to be used).
from IPython.display import Image, HTML, display
import os
import datetime
import pandas as pd
import matplotlib.pyplot as plt
import plotly.express as px
import plotly.graph_objects as go
import igraph as ig
import logomaker
import io

notebooks/template_pheno_sc.qmd:435

  • This section imports upsetplot (from upsetplot import ...), but the PR’s env.yml diff does not add upsetplot. If the runtime environment doesn’t already include it, Quarto rendering will fail with ModuleNotFoundError. Please ensure upsetplot is added to the environment (and consider removing the duplicate import matplotlib.pyplot as plt in this block to keep imports clean).
import matplotlib
import matplotlib.pyplot as plt
from upsetplot import from_contents, UpSet
import warnings


from IPython.display import display, Markdown
import matplotlib.pyplot as plt
import sys

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread README.md
Comment on lines +79 to +80
- `--input_format adaptive` → includes bulk phenotype report
- `--workflow_level sample,patient,compare` (Patient workflow enabled) → includes patient-level clonotype analysis
Copy link

Copilot AI Apr 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The “Conditional Report Sections” bullets are too narrow compared to the described behavior: (1) bulk phenotype should apply to --input_format adaptive and airr (per PR description), and (2) patient-level sections should trigger whenever --workflow_level includes patient (not only the exact string sample,patient,compare). Updating these bullets will prevent users from misconfiguring reports based on the README.

Suggested change
- `--input_format adaptive` → includes bulk phenotype report
- `--workflow_level sample,patient,compare` (Patient workflow enabled) → includes patient-level clonotype analysis
- `--input_format adaptive` or `airr` → includes bulk phenotype report
- `--workflow_level` includes `patient` → includes patient-level clonotype analysis

Copilot uses AI. Check for mistakes.
Comment thread nextflow.config
timepoint_order_col = 'timepoint_order'
alias_col = 'alias'
subject_col = 'subject_id'

Copy link

Copilot AI Apr 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change removes samplechart_x_col, samplechart_color_col, vgene_subject_col, and vgene_x_cols, but modules/local/sample/sample_plot.nf still passes these params into quarto render for notebooks/sample_stats_template.qmd. As-is, sample report rendering will break (missing params / null values). Either keep the existing params for backward compatibility or update the module + sample_stats_template.qmd to use the new parameter names.

Suggested change
// Backward-compatible aliases for legacy sample report params
samplechart_x_col = timepoint_col
samplechart_color_col = alias_col
vgene_subject_col = subject_col
vgene_x_cols = [timepoint_col, alias_col]

Copilot uses AI. Check for mistakes.
Comment on lines +32 to 40
import igraph as ig
import logomaker

import base64
import datetime
import glob
import h5py
import holoviews as hv
import igraph as ig
import io
Copy link

Copilot AI Apr 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the setup cell, igraph is imported twice (import igraph as ig at lines 32 and 39). Also logomaker is imported but (a) isn’t used anywhere in this notebook and (b) isn’t listed in env.yml, so Quarto execution will fail if it tries to import it. Remove the duplicate igraph import and either drop logomaker or add it as an explicit dependency.

Copilot uses AI. Check for mistakes.

::: {.callout-warning title="Warning"}
Be mindful that the y-axis of this plot uses a logarithmic scale, this visual compression means the height of the bars does not scale linearly with their value. A bar representing 1,000 clones will not appear ten times taller than a bar for 100 clones. **Always hover over the plot segments to see the precise numerical counts, specially when comparing categories.**
Be mindful that the y-axis of this plot uses a logarithmic scale; this visual compression means the height of the bars does not scale linearly with their value. A bar representing 1,000 clones will not appear ten times taller than a bar for 100 clones. **Always hover over the plot segments to see the precise numerical counts, specially when comparing categories.**
Copy link

Copilot AI Apr 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo in the warning callout: “specially when comparing categories” should be “especially when comparing categories.”

Suggested change
Be mindful that the y-axis of this plot uses a logarithmic scale; this visual compression means the height of the bars does not scale linearly with their value. A bar representing 1,000 clones will not appear ten times taller than a bar for 100 clones. **Always hover over the plot segments to see the precise numerical counts, specially when comparing categories.**
Be mindful that the y-axis of this plot uses a logarithmic scale; this visual compression means the height of the bars does not scale linearly with their value. A bar representing 1,000 clones will not appear ten times taller than a bar for 100 clones. **Always hover over the plot segments to see the precise numerical counts, especially when comparing categories.**

Copilot uses AI. Check for mistakes.
Comment thread env.yml Outdated
@@ -1,4 +1,4 @@
name: base
name: tcr_pipeline
Copy link

Copilot AI Apr 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR description says env.yml should add dash>=2.15.0 and upsetplot=0.9.0, but this diff only changes the environment name and does not add/adjust those dependencies (dash is still pinned to 2.14.1 and upsetplot is absent). Either update env.yml to include the required packages/versions or adjust the PR description so it matches what’s actually being shipped.

Copilot uses AI. Check for mistakes.
Comment on lines 2529 to +2531
{{< include ./template_pheno_sc.qmd >}}

<!-- {{< include ./template_pheno_bulk.qmd >}} --> No newline at end of file
{{< include ./template_pheno_bulk.qmd >}} No newline at end of file
Copy link

Copilot AI Apr 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

template_discovery_brief.qmd now unconditionally includes both template_pheno_sc.qmd and template_pheno_bulk.qmd. The PR description specifies conditional assembly (include SC only for --input_format cellranger, and bulk for adaptive/airr) by appending include lines to a copied notebook, not editing the source template. Please revert these unconditional includes and implement the conditional copy/append logic in the Nextflow rendering step (or otherwise ensure only the correct sub-notebook is included).

Copilot uses AI. Check for mistakes.
Comment on lines 104 to +108
{{< include ./template_sharing.qmd >}}

{{< include ./template_giana.qmd >}}

{{< include ./template_gliph.qmd >}}
Copy link

Copilot AI Apr 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

template_details_part2.qmd now always includes template_giana.qmd and template_gliph.qmd. Per the PR description, these sections should be appended only when the patient workflow is enabled (and GLIPH only when --use_gliph2 is set), and the source templates should not be modified directly. Keeping these unconditional includes will also break report rendering when the GIANA/GLIPH outputs are absent. Please remove these includes from the template and implement conditional assembly in the rendering step.

Copilot uses AI. Check for mistakes.
Comment on lines +1546 to 1548
import pandas as pd
import numpy as np
import itertools
Copy link

Copilot AI Apr 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This chunk re-imports pandas as pd, numpy, and itertools even though they’re already imported in the notebook’s main setup section. Re-importing is harmless at runtime but makes the notebook harder to maintain and can mask missing setup dependencies. Prefer relying on the initial imports (or, if you need optional imports, add a short comment explaining why they’re repeated here).

Suggested change
import pandas as pd
import numpy as np
import itertools

Copilot uses AI. Check for mistakes.

## Reading concatenated cdr3 file
concat_df = pd.read_csv(concat_csv, sep='\t')
concat_df = concat_df.merge(meta[['sample', subject_col, 'alias', 'timepoint', 'timepoint_order']], on='sample', how='left')
Copy link

Copilot AI Apr 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This merge hardcodes metadata column names ('alias', 'timepoint', 'timepoint_order') instead of using the parameterized variables (alias_col, timepoint_col, timepoint_order_col) that the main notebooks expose. If a user overrides these column names (as supported by nextflow.config), this will raise a KeyError and break report rendering. Please use the *_col variables consistently here (and in any downstream references) so the notebook honors pipeline parameters.

Suggested change
concat_df = concat_df.merge(meta[['sample', subject_col, 'alias', 'timepoint', 'timepoint_order']], on='sample', how='left')
concat_df = concat_df.merge(
meta[['sample', subject_col, alias_col, timepoint_col, timepoint_order_col]],
on='sample',
how='left'
)

Copilot uses AI. Check for mistakes.
@github-actions
Copy link
Copy Markdown

github-actions Bot commented Apr 27, 2026

Unit Test Results

10 tests   10 ✅  2m 48s ⏱️
 2 suites   0 💤
 1 files     0 ❌

Results for commit 693daa6.

♻️ This comment has been updated with latest results.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants