Quarto notebooks#81
Conversation
… default parameters in nextflow.config file. Added dependencies to conda environment that are used in notebooks
There was a problem hiding this comment.
Pull request overview
Integrates and refines Quarto notebook templates intended to generate HTML reports for the TCRtoolkit Nextflow pipeline, alongside config and documentation updates.
Changes:
- Updated multiple Quarto
.qmdtemplates (imports, narrative text, figure captions, and embedded sub-report includes). - Adjusted
nextflow.confignotebook-related params. - Expanded
README.mdwith input/workflow/report documentation; renamed the conda environment inenv.yml.
Reviewed changes
Copilot reviewed 13 out of 13 changed files in this pull request and generated 9 comments.
Show a summary per file
| File | Description |
|---|---|
| notebooks/template_sharing.qmd | Removes unused imports/includes; minor wording updates in sharing/publicity sections. |
| notebooks/template_sample.qmd | Cleans up comments/text and plot annotations in the sample report template. |
| notebooks/template_qc.qmd | Cleans up imports/text and refines rarefaction plotting logic in QC report template. |
| notebooks/template_pheno_sc.qmd | Refactors single-cell phenotype UpSet plotting section and chunk options. |
| notebooks/template_pheno_bulk.qmd | Simplifies bulk phenotype template by removing unused CSV inputs and updating text. |
| notebooks/template_overlap.qmd | Removes unused visualization imports and adjusts setup imports for overlap analysis. |
| notebooks/template_gliph.qmd | Minor text/metadata cleanup in GLIPH2 sub-notebook template. |
| notebooks/template_discovery_brief.qmd | Tweaks HTML TOC option, removes unused path var, and changes sub-notebook inclusion behavior. |
| notebooks/template_details_part1.qmd | Minor grammar and punctuation fixes in the details notebook intro. |
| notebooks/template_details_part2.qmd | Minor grammar fixes and changes sub-notebook inclusion behavior. |
| nextflow.config | Replaces previously existing sample-report params with new generic notebook params. |
| env.yml | Renames the conda environment. |
| README.md | Adds sections describing input formats, workflow levels, report outputs, and parameters. |
Comments suppressed due to low confidence (2)
notebooks/template_gliph.qmd:34
- This notebook imports
logomaker(andigraph) in the setup cell, butlogomakeris not used anywhere in the current template. Keeping the import adds an unnecessary dependency and will break rendering iflogomakerisn’t installed in the runtime environment. Please remove the unusedlogomakerimport (or addlogomakerexplicitly to the environment if it’s intended to be used).
from IPython.display import Image, HTML, display
import os
import datetime
import pandas as pd
import matplotlib.pyplot as plt
import plotly.express as px
import plotly.graph_objects as go
import igraph as ig
import logomaker
import io
notebooks/template_pheno_sc.qmd:435
- This section imports
upsetplot(from upsetplot import ...), but the PR’senv.ymldiff does not addupsetplot. If the runtime environment doesn’t already include it, Quarto rendering will fail withModuleNotFoundError. Please ensureupsetplotis added to the environment (and consider removing the duplicateimport matplotlib.pyplot as pltin this block to keep imports clean).
import matplotlib
import matplotlib.pyplot as plt
from upsetplot import from_contents, UpSet
import warnings
from IPython.display import display, Markdown
import matplotlib.pyplot as plt
import sys
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| - `--input_format adaptive` → includes bulk phenotype report | ||
| - `--workflow_level sample,patient,compare` (Patient workflow enabled) → includes patient-level clonotype analysis |
There was a problem hiding this comment.
The “Conditional Report Sections” bullets are too narrow compared to the described behavior: (1) bulk phenotype should apply to --input_format adaptive and airr (per PR description), and (2) patient-level sections should trigger whenever --workflow_level includes patient (not only the exact string sample,patient,compare). Updating these bullets will prevent users from misconfiguring reports based on the README.
| - `--input_format adaptive` → includes bulk phenotype report | |
| - `--workflow_level sample,patient,compare` (Patient workflow enabled) → includes patient-level clonotype analysis | |
| - `--input_format adaptive` or `airr` → includes bulk phenotype report | |
| - `--workflow_level` includes `patient` → includes patient-level clonotype analysis |
| timepoint_order_col = 'timepoint_order' | ||
| alias_col = 'alias' | ||
| subject_col = 'subject_id' | ||
|
|
There was a problem hiding this comment.
This change removes samplechart_x_col, samplechart_color_col, vgene_subject_col, and vgene_x_cols, but modules/local/sample/sample_plot.nf still passes these params into quarto render for notebooks/sample_stats_template.qmd. As-is, sample report rendering will break (missing params / null values). Either keep the existing params for backward compatibility or update the module + sample_stats_template.qmd to use the new parameter names.
| // Backward-compatible aliases for legacy sample report params | |
| samplechart_x_col = timepoint_col | |
| samplechart_color_col = alias_col | |
| vgene_subject_col = subject_col | |
| vgene_x_cols = [timepoint_col, alias_col] |
| import igraph as ig | ||
| import logomaker | ||
|
|
||
| import base64 | ||
| import datetime | ||
| import glob | ||
| import h5py | ||
| import holoviews as hv | ||
| import igraph as ig | ||
| import io |
There was a problem hiding this comment.
In the setup cell, igraph is imported twice (import igraph as ig at lines 32 and 39). Also logomaker is imported but (a) isn’t used anywhere in this notebook and (b) isn’t listed in env.yml, so Quarto execution will fail if it tries to import it. Remove the duplicate igraph import and either drop logomaker or add it as an explicit dependency.
|
|
||
| ::: {.callout-warning title="Warning"} | ||
| Be mindful that the y-axis of this plot uses a logarithmic scale, this visual compression means the height of the bars does not scale linearly with their value. A bar representing 1,000 clones will not appear ten times taller than a bar for 100 clones. **Always hover over the plot segments to see the precise numerical counts, specially when comparing categories.** | ||
| Be mindful that the y-axis of this plot uses a logarithmic scale; this visual compression means the height of the bars does not scale linearly with their value. A bar representing 1,000 clones will not appear ten times taller than a bar for 100 clones. **Always hover over the plot segments to see the precise numerical counts, specially when comparing categories.** |
There was a problem hiding this comment.
Typo in the warning callout: “specially when comparing categories” should be “especially when comparing categories.”
| Be mindful that the y-axis of this plot uses a logarithmic scale; this visual compression means the height of the bars does not scale linearly with their value. A bar representing 1,000 clones will not appear ten times taller than a bar for 100 clones. **Always hover over the plot segments to see the precise numerical counts, specially when comparing categories.** | |
| Be mindful that the y-axis of this plot uses a logarithmic scale; this visual compression means the height of the bars does not scale linearly with their value. A bar representing 1,000 clones will not appear ten times taller than a bar for 100 clones. **Always hover over the plot segments to see the precise numerical counts, especially when comparing categories.** |
| @@ -1,4 +1,4 @@ | |||
| name: base | |||
| name: tcr_pipeline | |||
There was a problem hiding this comment.
PR description says env.yml should add dash>=2.15.0 and upsetplot=0.9.0, but this diff only changes the environment name and does not add/adjust those dependencies (dash is still pinned to 2.14.1 and upsetplot is absent). Either update env.yml to include the required packages/versions or adjust the PR description so it matches what’s actually being shipped.
| {{< include ./template_pheno_sc.qmd >}} | ||
|
|
||
| <!-- {{< include ./template_pheno_bulk.qmd >}} --> No newline at end of file | ||
| {{< include ./template_pheno_bulk.qmd >}} No newline at end of file |
There was a problem hiding this comment.
template_discovery_brief.qmd now unconditionally includes both template_pheno_sc.qmd and template_pheno_bulk.qmd. The PR description specifies conditional assembly (include SC only for --input_format cellranger, and bulk for adaptive/airr) by appending include lines to a copied notebook, not editing the source template. Please revert these unconditional includes and implement the conditional copy/append logic in the Nextflow rendering step (or otherwise ensure only the correct sub-notebook is included).
| {{< include ./template_sharing.qmd >}} | ||
|
|
||
| {{< include ./template_giana.qmd >}} | ||
|
|
||
| {{< include ./template_gliph.qmd >}} |
There was a problem hiding this comment.
template_details_part2.qmd now always includes template_giana.qmd and template_gliph.qmd. Per the PR description, these sections should be appended only when the patient workflow is enabled (and GLIPH only when --use_gliph2 is set), and the source templates should not be modified directly. Keeping these unconditional includes will also break report rendering when the GIANA/GLIPH outputs are absent. Please remove these includes from the template and implement conditional assembly in the rendering step.
| import pandas as pd | ||
| import numpy as np | ||
| import itertools |
There was a problem hiding this comment.
This chunk re-imports pandas as pd, numpy, and itertools even though they’re already imported in the notebook’s main setup section. Re-importing is harmless at runtime but makes the notebook harder to maintain and can mask missing setup dependencies. Prefer relying on the initial imports (or, if you need optional imports, add a short comment explaining why they’re repeated here).
| import pandas as pd | |
| import numpy as np | |
| import itertools |
|
|
||
| ## Reading concatenated cdr3 file | ||
| concat_df = pd.read_csv(concat_csv, sep='\t') | ||
| concat_df = concat_df.merge(meta[['sample', subject_col, 'alias', 'timepoint', 'timepoint_order']], on='sample', how='left') |
There was a problem hiding this comment.
This merge hardcodes metadata column names ('alias', 'timepoint', 'timepoint_order') instead of using the parameterized variables (alias_col, timepoint_col, timepoint_order_col) that the main notebooks expose. If a user overrides these column names (as supported by nextflow.config), this will raise a KeyError and break report rendering. Please use the *_col variables consistently here (and in any downstream references) so the notebook honors pipeline parameters.
| concat_df = concat_df.merge(meta[['sample', subject_col, 'alias', 'timepoint', 'timepoint_order']], on='sample', how='left') | |
| concat_df = concat_df.merge( | |
| meta[['sample', subject_col, alias_col, timepoint_col, timepoint_order_col]], | |
| on='sample', | |
| how='left' | |
| ) |
Unit Test Results10 tests 10 ✅ 2m 48s ⏱️ Results for commit 693daa6. ♻️ This comment has been updated with latest results. |
Add Quarto Report Notebooks to Pipeline
This PR integrates Quarto report notebooks into the TCRtoolkit pipeline. Below are the implementation instructions.
1. New Files —
notebooks/Add the following
.qmdfiles from/lab/home/kmlanderos/projects2/TCRtoolkit/notebooks/:2.
env.yml— Add Dependencies3. Rendering — 4 Main Notebooks
After the pipeline finishes, render the 4 main notebooks using the following command (repeated for each):
The remaining notebooks (
template_pheno_sc,template_pheno_bulk,template_gliph, etc.) are not rendered directly — they are embedded via Quarto's{{< include >}}directive.4. Conditional Sub-Notebook Assembly
Before rendering, copy the relevant main notebook to a working directory and append include lines as needed. Do not modify the source templates directly.
template_discovery_brief.qmd--input_formatvaluecellranger{{< include ./template_pheno_sc.qmd >}}adaptiveorairr{{< include ./template_pheno_bulk.qmd >}}template_details_part2.qmd--workflow_levelincludespatient(e.g.sample,patient,compare){{< include ./template_giana.qmd >}}--use_gliph2is set{{< include ./template_gliph.qmd >}}Make sure the sub-notebook
.qmdfiles are also present in the same working directory so relative{{< include >}}paths resolve correctly during rendering.5. README.md
File updated to include patient workflow, pipeline parameters and HTML reports