
Docs tests#58

Open
csudre wants to merge 10 commits into main from docs_tests

Conversation


@csudre csudre commented Apr 10, 2026

Update of all documentation to ensure all parameters are explained and function listed

Summary by CodeRabbit

  • New Features

    • Added a “Normalised expected cost” option to multiclass pairwise metrics.
  • Bug Fixes

    • CI workflow trigger contains unresolved merge markers (may break CI).
    • Improved handling for empty references in several metrics (now yields NaN in some cases).
  • Documentation

    • Expanded and clarified docstrings across metrics, processes, and utilities.
  • Chores / Other

    • Removed or disabled several utility helpers; added some debug/log prints.
  • Tests

    • Updated and expanded tests for metrics, utilities, morphology, and processes.


coderabbitai bot commented Apr 10, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.


Walkthrough

This PR updates many docstrings, removes or comments out several utility exports, adjusts probabilistic threshold functions and empty-reference/missing-case handling, adds a "nec" metric key, changes some internal variable logic, updates tests accordingly, and leaves unresolved merge conflict markers in the GitHub Actions workflow.

Changes

Cohort / File(s) Summary
GitHub Workflow
​.github/workflows/python-app.yml
on.push.branches entry modified from docs_tests → docs_test, and the file contains unresolved Git merge conflict markers (<<<<<<<, =======, >>>>>>>) around the branches block (YAML likely invalid).
Calibration metrics
MetricsReloaded/metrics/calibration_measures.py
Expanded/corrected docstrings (math/LaTeX and params), renamed local binning variables (l,u) → (lo,up), removed/commented unused locals; no public API changes.
Pairwise measures
MetricsReloaded/metrics/pairwise_measures.py
Removed unused imports, added "nec": (self.normalised_expected_cost, "NEC") to MultiClassPairwiseMeasures registry, fixed n_pos_pred return, widened docstrings; core algorithms unchanged.
Probability pairwise measures
MetricsReloaded/metrics/prob_pairwise_measures.py
Added/expanded docstrings and altered internal logic: reordered conditionals in all_multi_threshold_values, empty-reference returns switch to np.nan for several thr/sensitivity/PPV functions, changed case iteration/indexing and prints in fppi_thr/froc, and changed the default value_fppi in sensitivity_at_fppi from 0.8 → 2. Review numeric behavior carefully.
Processes — overall & mixed
MetricsReloaded/processes/overall_process.py, MetricsReloaded/processes/mixed_measures_processes.py
Docstring expansion; ProcessEvaluation.__init__ now computes/stores self.list_empty_ref via identify_empty_ref() (category-aware). complete_missing_cases() now uses ref_missing_pred, renames loop var to lab, and returns early after assembling missing-case DataFrames. Minor removal of unused temporaries in mixed processes.
Utility — assignment & utils
MetricsReloaded/utility/assignment_localization.py, MetricsReloaded/utility/utils.py
Many docstrings added. New instance attrs ref_loc_mod/pred_loc_mod; added validation branches that set flag_usable=False and emit warnings/prints. Several utilities disabled/commented out and removed from __all__ (e.g., to_string_*, to_dict_meas_, MorphologyOps.border_map2). merge_list_df behavior/logging changed. Check for callers of removed helpers.
Tests
test/... (multiple files, e.g. test_metrics/*, test_utility/*, test_processes/*)
Tests updated to reflect API/behavior changes: fixture key renamed (ref_missing → ref_missing_pred), added/expanded tests for probability threshold/FROC behavior, pandas/dataframe coverage, morphology/3D point tests, assignment-localization tests, and numerous commented-out/adjusted assertions; some tests emit debug prints. Verify NaN and numeric expectation adjustments.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

🐰 I hopped through docs and fixed the math,

added "nec" and chased a path,
empty refs I learned to see,
thresholds changed their melody,
please mend the workflow’s merge—then clap with me! 🥕

🚥 Pre-merge checks | ✅ 1 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

  • Docstring Coverage (⚠️ Warning): Docstring coverage is 62.15%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.
  • Title check (❓ Inconclusive): The title 'Docs tests' is vague and generic, using non-descriptive terms that don't clearly convey the actual scope of changes (documentation updates, parameter documentation, function listing, testing improvements, and multiple code fixes across diverse modules). Resolution: use a more specific title that captures the primary objective, such as 'Improve documentation coverage and add parameter documentation' or 'Update documentation and expand test coverage for assignment localization'.
✅ Passed checks (1 passed)
  • Description Check (✅ Passed): Check skipped because CodeRabbit's high-level summary is enabled.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.



Comment @coderabbitai help to get the list of available commands and usage tips.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 6

🧹 Nitpick comments (1)
MetricsReloaded/processes/overall_process.py (1)

644-644: Replace mutable default argument {} with None in label_aggregation.

Mutable default arguments are shared across all function calls, leading to unexpected behavior. Initialize the default value inside the function instead.

Proposed fix
-def label_aggregation(self, option='average',dict_args={}):
+def label_aggregation(self, option='average', dict_args=None):
+    if dict_args is None:
+        dict_args = {}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@MetricsReloaded/processes/overall_process.py` at line 644, The function
label_aggregation currently uses a mutable default dict_args={} which can be
shared across calls; change the signature to use dict_args=None and inside
label_aggregation initialize dict_args = {} if dict_args is None, then proceed
to use dict_args as before (reference the label_aggregation function and its
option parameter) so behavior remains identical but avoids the mutable default
trap.
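As background for this comment, the trap is easy to reproduce. The sketch below is illustrative only (the names are invented, not taken from overall_process.py) and contrasts the shared-default behavior with the None idiom:

```python
def broken_aggregate(value, dict_args={}):
    # the default dict is created once, at function definition time,
    # so every call that omits dict_args mutates the same object
    dict_args.setdefault("count", 0)
    dict_args["count"] += value
    return dict_args["count"]

def fixed_aggregate(value, dict_args=None):
    # a fresh dict is created on every call that omits the argument
    if dict_args is None:
        dict_args = {}
    dict_args.setdefault("count", 0)
    dict_args["count"] += value
    return dict_args["count"]

print(broken_aggregate(1), broken_aggregate(1))  # 1 2  (state leaks across calls)
print(fixed_aggregate(1), fixed_aggregate(1))    # 1 1
```

The None idiom keeps the signature's intent (an optional dict of arguments) without sharing state between unrelated calls.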
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In @.github/workflows/python-app.yml:
- Line 12: The workflow push trigger lists branches: [ main, docs_test ] but the
repo’s active branch is named docs_tests (plural); update the branches array to
include the correct branch name (replace or add "docs_tests") so the push event
will trigger CI for that branch (look for the branches: [ main, docs_test ] line
in the workflow YAML and change "docs_test" to "docs_tests").

In `@MetricsReloaded/metrics/calibration_measures.py`:
- Around line 111-112: Several docstrings in
MetricsReloaded/metrics/calibration_measures.py contain LaTeX math with
unescaped backslashes (e.g., the cwECE formula), causing W605 errors; fix by
making those docstrings raw strings (prefix with r) or escaping backslashes
(e.g., \\dfrac → \\\\dfrac). Update the docstrings that include the formulas
referenced around the cwECE expression and the other occurrences noted (lines
near 167-168, 214-215, 262, 282, 295-296, 406, 444, 469, 501-502) so the
functions/classes that contain them (search for the docstrings in
calibration_measures.py and identifiers like cwECE) use r"..." or properly
escaped backslashes throughout. Ensure you change only the docstring delimiters
and escapes, not the formula text itself.

In `@MetricsReloaded/metrics/pairwise_measures.py`:
- Around line 94-95: normalised_expected_cost currently divides by naive_cost
without checking for zero/invalid values; update the function
normalised_expected_cost to validate naive_cost (the variable used in the
denominator) by ensuring it is finite and > 0 (e.g., via
math.isfinite/np.isfinite and a >0 check) and if invalid return a safe sentinel
(e.g., float('nan') or 0) instead of performing the division; ensure the guard
covers all code paths where naive_cost is used (see the division in
normalised_expected_cost and related lines around where naive_cost is computed).
- Around line 99-113: The docstrings in
MetricsReloaded/metrics/pairwise_measures.py that contain LaTeX math (e.g., the
Expected Cost docstring with EC and \sum, and other blocks referenced around
lines 739–747, 876–880, 1017–1018, 1410–1411) must be converted to raw string
literals so backslashes are not treated as escape sequences; update each
triple-quoted docstring ("""...""") that contains LaTeX commands like \sum,
\dfrac, \text, \cap, \cup, etc., to use the raw prefix r"""...""" (preserve the
exact content and formatting inside the docstring) throughout
pairwise_measures.py to eliminate W605 warnings.

In `@MetricsReloaded/processes/overall_process.py`:
- Around line 586-587: The code inconsistently uses the keys 'ref_missing' and
'ref_missing_pred', causing skipped processing or KeyError; update
complete_missing_cases() and all checks (e.g., the if guard that currently reads
len(self.data['ref_missing_pred']) == 0 and the other occurrences around the
598-599 area) to read the values via self.data.get('ref_missing') or
self.data.get('ref_missing_pred') (prefer a single unified key by normalizing:
value = self.data.get('ref_missing') or self.data.get('ref_missing_pred') or [])
and act on that normalized list; also update the constructor gate to call
complete_missing_cases() when either key exists (e.g., if 'ref_missing' in
self.data or 'ref_missing_pred' in self.data: self.complete_missing_cases()) so
both schemas are handled safely.

In `@pyproject.toml`:
- Line 4: The build requirement string in pyproject.toml is missing a closing
quote (the entry currently reads "setuptools < 82.0.0), causing TOML parsing to
fail; update the build-system.requires entry so the setuptools requirement is a
properly quoted string (close the trailing double quote on the "setuptools <
82.0.0" token) ensuring the value is a valid TOML string.

---

Nitpick comments:
In `@MetricsReloaded/processes/overall_process.py`:
- Line 644: The function label_aggregation currently uses a mutable default
dict_args={} which can be shared across calls; change the signature to use
dict_args=None and inside label_aggregation initialize dict_args = {} if
dict_args is None, then proceed to use dict_args as before (reference the
label_aggregation function and its option parameter) so behavior remains
identical but avoids the mutable default trap.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 6a4f87c8-270b-49c7-a690-7ff609bf0f31

📥 Commits

Reviewing files that changed from the base of the PR and between bfec35a and b0fa03e.

📒 Files selected for processing (9)
  • .github/workflows/python-app.yml
  • MetricsReloaded/metrics/calibration_measures.py
  • MetricsReloaded/metrics/pairwise_measures.py
  • MetricsReloaded/metrics/prob_pairwise_measures.py
  • MetricsReloaded/processes/mixed_measures_processes.py
  • MetricsReloaded/processes/overall_process.py
  • MetricsReloaded/utility/assignment_localization.py
  • MetricsReloaded/utility/utils.py
  • pyproject.toml

Comment thread .github/workflows/python-app.yml Outdated
Comment on lines +111 to 112
cwECE = \\dfrac{1}{K}\sum_{k=1}^{K}\sum_{i=1}^{N}\\dfrac{\\vert B_{i,k} \\vert}{N} \\left(y_{k}(B_{i,k}) - p_{k}(B_{i,k})\\right)



⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
ruff check MetricsReloaded/metrics/calibration_measures.py --select W605 -n

Repository: Project-MONAI/MetricsReloaded

Length of output: 22957


Resolve W605 invalid escape sequences in formula docstrings.

The math expressions in docstrings contain unescaped backslashes. Convert to raw strings (prefix with r) or escape backslashes (e.g., \\dfrac → \\\\dfrac) to resolve these violations.

Also applies to: 167-168, 214-215, 262, 282, 295-296, 406, 444, 469, 501-502

🧰 Tools
🪛 Ruff (0.15.9)

[warning] 111-111: Invalid escape sequence: \s

Add backslash to escape sequence

(W605)


[warning] 111-111: Invalid escape sequence: \s

Add backslash to escape sequence

(W605)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@MetricsReloaded/metrics/calibration_measures.py` around lines 111 - 112,
Several docstrings in MetricsReloaded/metrics/calibration_measures.py contain
LaTeX math with unescaped backslashes (e.g., the cwECE formula), causing W605
errors; fix by making those docstrings raw strings (prefix with r) or escaping
backslashes (e.g., \\dfrac → \\\\dfrac). Update the docstrings that include the
formulas referenced around the cwECE expression and the other occurrences noted
(lines near 167-168, 214-215, 262, 282, 295-296, 406, 444, 469, 501-502) so the
functions/classes that contain them (search for the docstrings in
calibration_measures.py and identifiers like cwECE) use r"..." or properly
escaped backslashes throughout. Ensure you change only the docstring delimiters
and escapes, not the formula text itself.
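For context on W605: Python leaves an unrecognized escape such as \d or \s in the string unchanged, which is why the code still runs, but linters flag it and Python 3.12+ emits a SyntaxWarning. The two compliant spellings are equivalent in value, as this sketch with a made-up formula (not the file's actual docstring) shows:

```python
# Illustrative strings only; not the actual docstring from the file.
escaped = "cwECE = \\dfrac{1}{K}\\sum_{k=1}^{K}"  # every backslash doubled
raw = r"cwECE = \dfrac{1}{K}\sum_{k=1}^{K}"       # raw string, same value
assert escaped == raw
```

For Sphinx math docstrings the raw form (r"""...""") is usually preferred because the LaTeX stays readable.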

Comment on lines +94 to +95
"nec": (self.normalised_expected_cost,"NEC"),



⚠️ Potential issue | 🟠 Major

Guard NEC against zero/invalid naive baseline before division.

By adding Line 94 ("nec"), normalised_expected_cost() becomes part of the default metric surface. Line 153 currently divides by naive_cost unguarded, which can yield inf/nan for degenerate cost setups.

💡 Proposed fix
 def normalised_expected_cost(self):
@@
     naive_cost = self.best_naive_ec()
     ec = self.expected_cost()
-    return ec / naive_cost
+    if np.isnan(naive_cost) or naive_cost == 0:
+        warnings.warn("Naive expected cost is zero/undefined; NEC is not defined")
+        return np.nan
+    return ec / naive_cost

Also applies to: 147-153

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@MetricsReloaded/metrics/pairwise_measures.py` around lines 94 - 95,
normalised_expected_cost currently divides by naive_cost without checking for
zero/invalid values; update the function normalised_expected_cost to validate
naive_cost (the variable used in the denominator) by ensuring it is finite and >
0 (e.g., via math.isfinite/np.isfinite and a >0 check) and if invalid return a
safe sentinel (e.g., float('nan') or 0) instead of performing the division;
ensure the guard covers all code paths where naive_cost is used (see the
division in normalised_expected_cost and related lines around where naive_cost
is computed).
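The proposed guard can be illustrated as a standalone function; the names below mirror the review's snippet, but this is a sketch written as a free function, not the actual class method from pairwise_measures.py:

```python
import warnings

import numpy as np

def normalised_expected_cost(ec, naive_cost):
    # guard: refuse to divide by a non-finite or non-positive naive baseline
    if not np.isfinite(naive_cost) or naive_cost <= 0:
        warnings.warn("Naive expected cost is zero/undefined; NEC is not defined")
        return np.nan
    return ec / naive_cost
```

With a degenerate baseline (naive_cost of 0 or inf) the caller now receives NaN plus a warning instead of a silent inf, which downstream aggregation code can detect with np.isnan.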

Comment on lines +99 to +113
"""
Calculates the expected cost defined as:

Luciana Ferrer - Analysis and comparison of classification metrics - https://arxiv.org/pdf/2209.05355

.. math::

EC = \sum_{r}\sum_p c_{rp} P_rD_{rp}

where :math: `c_{rp}` {is the cost of misclassifying class r as class p. :math: `P_r` is the probability of
class r in the reference data, :math: `D_{rp}` is the fraction of samples of class r that are classified as
class p


"""


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
ruff check MetricsReloaded/metrics/pairwise_measures.py --select W605 -n

Repository: Project-MONAI/MetricsReloaded

Length of output: 31191


Use raw docstrings (r"""...""") for all math blocks with LaTeX escape sequences to fix W605 errors across the file.

The file contains 57 invalid escape sequence warnings in docstrings with LaTeX math notation. While the specific ranges mentioned (99–113, 739–747, 876–880, 1017–1018, 1410–1411) are accurate, the problem is more widespread. All docstrings containing LaTeX patterns like \sum, \dfrac, \text, \cap, \cup, and similar should use raw string literals to prevent Python from interpreting backslashes as escape characters. This will eliminate all W605 violations and prevent lint-gated CI failures.

🧰 Tools
🪛 Ruff (0.15.9)

[warning] 106-106: Invalid escape sequence: \s

Use a raw string literal

(W605)


[warning] 106-106: Invalid escape sequence: \s

Use a raw string literal

(W605)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@MetricsReloaded/metrics/pairwise_measures.py` around lines 99 - 113, The
docstrings in MetricsReloaded/metrics/pairwise_measures.py that contain LaTeX
math (e.g., the Expected Cost docstring with EC and \sum, and other blocks
referenced around lines 739–747, 876–880, 1017–1018, 1410–1411) must be
converted to raw string literals so backslashes are not treated as escape
sequences; update each triple-quoted docstring ("""...""") that contains LaTeX
commands like \sum, \dfrac, \text, \cap, \cup, etc., to use the raw prefix
r"""...""" (preserve the exact content and formatting inside the docstring)
throughout pairwise_measures.py to eliminate W605 warnings.

Comment on lines +586 to 587
if len(self.data['ref_missing_pred']) == 0:
return


⚠️ Potential issue | 🟠 Major

Unify missing-case key handling to avoid skipped processing/KeyError.

Line 586 assumes ref_missing_pred, while the constructor gate still checks ref_missing. This can silently bypass completion or fail at runtime depending on input schema.

💡 Proposed fix
 def complete_missing_cases(self):
@@
-        if len(self.data['ref_missing_pred']) == 0:
+        missing_cases = self.data.get("ref_missing_pred", self.data.get("ref_missing", []))
+        if len(missing_cases) == 0:
             return
@@
-                for (i,f) in enumerate(self.data['ref_missing_pred']):
+                for (i, f) in enumerate(missing_cases):

Also update the constructor gate to call complete_missing_cases() when either key exists.

Also applies to: 598-599

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@MetricsReloaded/processes/overall_process.py` around lines 586 - 587, The
code inconsistently uses the keys 'ref_missing' and 'ref_missing_pred', causing
skipped processing or KeyError; update complete_missing_cases() and all checks
(e.g., the if guard that currently reads len(self.data['ref_missing_pred']) == 0
and the other occurrences around the 598-599 area) to read the values via
self.data.get('ref_missing') or self.data.get('ref_missing_pred') (prefer a
single unified key by normalizing: value = self.data.get('ref_missing') or
self.data.get('ref_missing_pred') or []) and act on that normalized list; also
update the constructor gate to call complete_missing_cases() when either key
exists (e.g., if 'ref_missing' in self.data or 'ref_missing_pred' in self.data:
self.complete_missing_cases()) so both schemas are handled safely.
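The normalization can be factored into a tiny helper; this is an illustrative sketch (the helper name is invented) matching the dict.get chain from the inline diff:

```python
def missing_cases(data):
    # prefer the newer key, fall back to the older one, default to empty
    return data.get("ref_missing_pred", data.get("ref_missing", []))
```

Note that the two snippets in this comment differ subtly: the dict.get chain above returns a present-but-empty list as-is, while the `or` chain in the AI prompt treats an empty list as missing and falls through to the other key. Either is defensible, but the process should commit to one behavior.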

Comment thread pyproject.toml Outdated

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
MetricsReloaded/metrics/calibration_measures.py (2)

463-491: ⚠️ Potential issue | 🟡 Minor

Parameter order mismatch in documentation.

The method signature is dirichlet_kernel(self, j, i) but the docstring documents parameters in the order (i, j). While not incorrect (both are documented), it may cause confusion.

📝 Suggested fix to align parameter order
-        :param i: index of first sample to consider
-        :param j: index of second sample to consider
+        :param j: index of first sample to consider
+        :param i: index of second sample to consider
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@MetricsReloaded/metrics/calibration_measures.py` around lines 463 - 491, The
docstring for dirichlet_kernel documents parameters as (i, j) but the method
signature is dirichlet_kernel(self, j, i); update the docstring to list
parameters in the same order (j then i) and adjust the :param descriptions to
match (e.g., :param j: index of first sample, :param i: index of second sample),
so the documented parameter names and order match the function signature
(dirichlet_kernel).

518-529: ⚠️ Potential issue | 🟡 Minor

Missing parameter documentation for fmt.

The docstring does not document the fmt parameter that appears in the method signature. Although it appears to be unused in the implementation (line 527 is commented out), it should still be documented.

📝 Suggested fix
     def to_dict_meas(self, fmt="{:.4f}"):
         """
         Given the selected metrics provides a dictionary with relevant metrics
         
+        :param fmt: format string for results (currently unused)
         :return: result_dict dictionary of results
         """
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@MetricsReloaded/metrics/calibration_measures.py` around lines 518 - 529, The
method to_dict_meas has an undocumented parameter fmt and currently doesn't use
it; update the to_dict_meas docstring to include a concise description of the
fmt parameter (e.g., expected format string like "{:.4f}" and that it is applied
to numeric metric results), and then either (a) re-enable and use fmt to format
each metric result before assigning to result_dict (apply fmt.format(result)) or
(b) remove the fmt parameter from the signature if formatting is not desired;
adjust the implementation in to_dict_meas accordingly so the docstring and
signature/behavior remain consistent with each other.
♻️ Duplicate comments (1)
MetricsReloaded/metrics/calibration_measures.py (1)

111-111: ⚠️ Potential issue | 🟡 Minor

Unresolved W605 invalid escape sequences in LaTeX formula.

The LaTeX formula contains unescaped backslashes that produce W605 warnings. This issue was previously flagged but remains unresolved. Convert to a raw string by prefixing with r or escape all backslashes.

🔧 Proposed fix
-            cwECE = \\dfrac{1}{K}\sum_{k=1}^{K}\sum_{i=1}^{N}\\dfrac{\\vert B_{i,k} \\vert}{N} \\left(y_{k}(B_{i,k}) - p_{k}(B_{i,k})\\right)
+            cwECE = \\dfrac{1}{K}\\sum_{k=1}^{K}\\sum_{i=1}^{N}\\dfrac{\\vert B_{i,k} \\vert}{N} \\left(y_{k}(B_{i,k}) - p_{k}(B_{i,k})\\right)

Alternatively (preferred), make the enclosing docstring a raw string by prefixing it with r (r"""..."""), which leaves the single-backslash formula text unchanged.

Note: The same fix pattern applies to all other formulas with invalid escape sequences in this file.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@MetricsReloaded/metrics/calibration_measures.py` at line 111, The LaTeX
formula string for cwECE uses unescaped backslashes causing W605; update the
literal containing "cwECE = \\dfrac{1}{K}..." in calibration_measures.py to be a
raw string (prefix with r) or escape every backslash so Python won't treat them
as invalid escape sequences, and apply the same fix to all other formula strings
in the file (look for other LaTeX strings like those referencing B_{i,k}, y_{k}
and p_{k}).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In @.github/workflows/python-app.yml:
- Around line 12-16: The workflow file contains unresolved Git conflict markers
(<<<<<<<, =======, >>>>>>>) around the branches array; remove the conflict
markers and keep a single valid YAML line for the branches key (e.g., branches:
[ main, docs_tests ] or branches: [ main, docs_test ] depending on the intended
branch name) so that the branches array is valid YAML and the GitHub Actions
workflow can parse; ensure there are no leftover markers anywhere in the file
and run a YAML lint or quick git diff to confirm only one branches entry
remains.
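After resolving, a quick scan confirms no markers remain anywhere in the file. An illustrative standalone check (not part of the repository):

```python
import re

# Conflict markers are exactly seven '<', '=' or '>' at the start of a line.
CONFLICT_RE = re.compile(r"^(?:<{7}(?: |$)|={7}$|>{7}(?: |$))", re.MULTILINE)

def has_conflict_markers(text):
    return bool(CONFLICT_RE.search(text))
```

Running this over the workflow YAML (or piping the file through grep with the same pattern) gives a fast yes/no before re-running CI.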

---

Outside diff comments:
In `@MetricsReloaded/metrics/calibration_measures.py`:
- Around line 463-491: The docstring for dirichlet_kernel documents parameters
as (i, j) but the method signature is dirichlet_kernel(self, j, i); update the
docstring to list parameters in the same order (j then i) and adjust the :param
descriptions to match (e.g., :param j: index of first sample, :param i: index of
second sample), so the documented parameter names and order match the function
signature (dirichlet_kernel).
- Around line 518-529: The method to_dict_meas has an undocumented parameter fmt
and currently doesn't use it; update the to_dict_meas docstring to include a
concise description of the fmt parameter (e.g., expected format string like
"{:.4f}" and that it is applied to numeric metric results), and then either (a)
re-enable and use fmt to format each metric result before assigning to
result_dict (apply fmt.format(result)) or (b) remove the fmt parameter from the
signature if formatting is not desired; adjust the implementation in
to_dict_meas accordingly so the docstring and signature/behavior remain
consistent with each other.

---

Duplicate comments:
In `@MetricsReloaded/metrics/calibration_measures.py`:
- Line 111: The LaTeX formula string for cwECE uses unescaped backslashes
causing W605; update the literal containing "cwECE = \\dfrac{1}{K}..." in
calibration_measures.py to be a raw string (prefix with r) or escape every
backslash so Python won't treat them as invalid escape sequences, and apply the
same fix to all other formula strings in the file (look for other LaTeX strings
like those referencing B_{i,k}, y_{k} and p_{k}).

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 11d7b665-8b1b-4a2c-8f7f-84ccaac29225

📥 Commits

Reviewing files that changed from the base of the PR and between b0fa03e and 2f49a90.

📒 Files selected for processing (2)
  • .github/workflows/python-app.yml
  • MetricsReloaded/metrics/calibration_measures.py

Comment thread .github/workflows/python-app.yml Outdated
Carole Sudre added 2 commits April 13, 2026 16:10

  • …docs_tests
  • Making all tests up to date


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 5

♻️ Duplicate comments (2)
MetricsReloaded/metrics/pairwise_measures.py (2)

94-95: ⚠️ Potential issue | 🟠 Major

Guard NEC denominator before division.

"nec" is now part of the selectable multiclass metrics, but NEC still divides by naive_cost without validating finite/non-zero denominator. This can return inf/nan for degenerate class/cost setups.

💡 Proposed fix
 def normalised_expected_cost(self):
     """
     Calculates the normalised expected cost as the ratio of the expected cost to the naive expected cost.
     """
     naive_cost = self.best_naive_ec()
     ec = self.expected_cost()
-    return ec / naive_cost
+    if not np.isfinite(naive_cost) or naive_cost <= 0:
+        warnings.warn("Naive expected cost is non-finite or <= 0; NEC is not defined")
+        return np.nan
+    return ec / naive_cost

Also applies to: 147-153

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@MetricsReloaded/metrics/pairwise_measures.py` around lines 94 - 95, The NEC
computation uses self.normalised_expected_cost which divides by naive_cost
without checking the denominator; update the code that computes
normalised_expected_cost (and the same block around the other NEC calculation at
lines ~147-153) to guard the division by testing that naive_cost is finite and
non-zero (e.g. use numpy.isfinite and naive_cost != 0); if the check fails, set
the normalized NEC result to a safe value (e.g. 0.0 or np.nan consistently)
instead of performing the division so you never return inf/nan for degenerate
class/cost setups.

99-113: ⚠️ Potential issue | 🟠 Major

Convert LaTeX docstrings to raw strings to fix W605.

These changed docstrings contain LaTeX backslashes in non-raw triple-quoted strings, triggering invalid escape warnings and risking lint/doc-check failures.

💡 Proposed fix (pattern)
-    def expected_cost(self):
-        """
+    def expected_cost(self):
+        r"""
@@
-    def expected_matching_ck(self):
-        """
+    def expected_matching_ck(self):
+        r"""
@@
-    def dsc(self):
-        """
+    def dsc(self):
+        r"""
@@
-    def intersection_over_reference(self):
-        """
+    def intersection_over_reference(self):
+        r"""
@@
-    def measured_masd(self):
-        """
+    def measured_masd(self):
+        r"""
🏁 Script executed:

#!/bin/bash
# Verify invalid escape warnings in the reviewed file only (read-only).
ruff check MetricsReloaded/metrics/pairwise_measures.py --select W605 -n

Also applies to: 740-748, 876-881, 1018-1018, 1411-1412

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@MetricsReloaded/metrics/pairwise_measures.py` around lines 99 - 113, The
docstrings containing LaTeX backslashes (e.g., the one starting "Calculates the
expected cost" and the other docstrings containing math backslashes) must be
converted to raw triple-quoted strings to avoid W605 invalid-escape warnings:
change the opening quotes from """ to r""" for those docstrings (the ones that
include sequences like \sum, \math::, or other LaTeX backslashes) and ensure no
other string concatenation is broken; after updating, run a W605 check (ruff
--select W605) to confirm all occurrences (including the other reported
docstrings) are fixed.
🧹 Nitpick comments (1)
test/test_utility/test_utils.py (1)

83-87: Drop these debug prints before merging.

They do not support any assertion and will just add noise to pytest output when this test runs.

🧹 Proposed cleanup
-    print(df_sensdscnsd)
-    print(val_merge4)
-    print(df3)
-    print(df5)
-    print(df4)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/test_utility/test_utils.py` around lines 83 - 87, Remove the leftover
debug print statements printing df_sensdscnsd, val_merge4, df3, df5, and df4 in
test_utility/test_utils.py so the test does not emit noisy output; locate the
prints (print(df_sensdscnsd), print(val_merge4), print(df3), print(df5),
print(df4)) and delete them, ensuring the test contains only assertions and
setup/teardown logic.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@MetricsReloaded/metrics/pairwise_measures.py`:
- Around line 307-315: Update the docstring entry for the existing :param empty:
in the pairwise_measures docblock to explicitly state what the parameter
represents, its expected type and accepted values, the default value, and how it
affects metric computation (e.g., how empty/missing pixels or regions are
treated or substituted when computing measures). Locate the docstring containing
the parameters pred, ref, measures, connectivity_type, pixdim, empty, dict_args
in pairwise_measures.py (the function or class where those params are declared)
and replace the blank :param empty: line with a concise description mentioning
its role and behavior.

In `@MetricsReloaded/utility/utils.py`:
- Around line 115-128: A public method was removed by commenting it out, causing
runtime AttributeError for callers; restore the original implementation of
MorphologyOps.border_map2 (the method that computes border using ndimage.shift
over self.binary_map and returns border) so the public API remains available, or
if this was intentional, revert the comment change and instead open a dedicated
breaking-change PR that either (a) removes border_map2 after announcing the
breaking change, or (b) adds a compatibility shim named
MorphologyOps.border_map2 that raises a clear DeprecationWarning and directs
users to the new API; locate the symbol MorphologyOps.border_map2 in
MetricsReloaded/utility/utils.py and either re-enable the original code block
(west/east/north/south/top/bottom, cumulative, border) or replace it with the
deprecation shim and ensure tests/documentation are updated accordingly.
- Around line 420-517: The module's __all__ includes to_string_count,
to_string_dist, to_string_mt, and to_dict_meas_ but their implementations are
commented out; either restore these functions (uncomment and ensure
signatures/behavior match what importers expect) or remove those four names from
__all__ so the public API is accurate; update the module accordingly and run
tests/linters to confirm no import-time failures referencing to_string_count,
to_string_dist, to_string_mt, or to_dict_meas_.
- Around line 557-568: The function merge_list_df currently emits raw prints
(e.g., print(f, ' not present'), print(len(list_fin))) which pollute runtime
output; update merge_list_df to remove these print() calls and instead use the
module logger: add an import logging and a logger = logging.getLogger(__name__)
(or reuse existing), replace the prints with logger.debug(...) containing the
same diagnostic text, and remove any stray commented print lines so the function
no longer writes to stdout during normal execution while preserving debug
visibility via logging.

In `@test/test_processes/test_overall_process.py`:
- Line 46: The test changed the input key to 'ref_missing_pred' which prevents
ProcessEvaluation (in MetricsReloaded/processes/overall_process.py) from
detecting ref_missing and calling complete_missing_cases(), causing the
assertion to reflect the broken path; modify the test to restore the original
key name 'ref_missing' for data_miss (i.e., revert data_miss['ref_missing_pred']
back to data_miss['ref_missing']) and restore the expected result to [3,8] so
the test asserts the completed-missing-cases behavior checked by
ProcessEvaluation.complete_missing_cases().

---

Duplicate comments:
In `@MetricsReloaded/metrics/pairwise_measures.py`:
- Around line 94-95: The NEC computation uses self.normalised_expected_cost
which divides by naive_cost without checking the denominator; update the code
that computes normalised_expected_cost (and the same block around the other NEC
calculation at lines ~147-153) to guard the division by testing that naive_cost
is finite and non-zero (e.g. use numpy.isfinite and naive_cost != 0); if the
check fails, set the normalized NEC result to a safe value (e.g. 0.0 or np.nan
consistently) instead of performing the division so you never return inf/nan for
degenerate class/cost setups.
- Around line 99-113: The docstrings containing LaTeX backslashes (e.g., the one
starting "Calculates the expected cost" and the other docstrings containing math
backslashes) must be converted to raw triple-quoted strings to avoid W605
invalid-escape warnings: change the opening quotes from """ to r""" for those
docstrings (the ones that include sequences like \sum, \math::, or other LaTeX
backslashes) and ensure no other string concatenation is broken; after updating,
run a W605 check (ruff --select W605) to confirm all occurrences (including the
other reported docstrings) are fixed.

---

Nitpick comments:
In `@test/test_utility/test_utils.py`:
- Around line 83-87: Remove the leftover debug print statements printing
df_sensdscnsd, val_merge4, df3, df5, and df4 in test_utility/test_utils.py so
the test does not emit noisy output; locate the prints (print(df_sensdscnsd),
print(val_merge4), print(df3), print(df5), print(df4)) and delete them, ensuring
the test contains only assertions and setup/teardown logic.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 1b463270-6054-4711-852c-81cfb4650109

📥 Commits

Reviewing files that changed from the base of the PR and between 2f49a90 and 52e57b4.

📒 Files selected for processing (4)
  • MetricsReloaded/metrics/pairwise_measures.py
  • MetricsReloaded/utility/utils.py
  • test/test_processes/test_overall_process.py
  • test/test_utility/test_utils.py

Comment on lines +307 to +315
Input includes:
:param pred: Prediction
:param ref: Reference
:param measures: list of measures to extract
:param connectivity_type: Type of connectivity to use
:param pixdim: list of pixel dimensions
:param empty:
:param dict_args: Dictionary with additional arguments for the different metrics
"""

⚠️ Potential issue | 🟡 Minor

Complete the empty parameter documentation.

The class docstring leaves :param empty: without any description, so the “all parameters explained” objective is still incomplete.

💡 Proposed fix
-    :param empty:
+    :param empty: Whether empty-reference/prediction handling flags are enabled for metric computation.
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
Input includes:
:param pred: Prediction
:param ref: Reference
:param measures: list of measures to extract
:param connectivity_type: Type of connectivity to use
:param pixdim: list of pixel dimensions
:param empty:
:param dict_args: Dictionary with additional arguments for the different metrics
"""
Input includes:
:param pred: Prediction
:param ref: Reference
:param measures: list of measures to extract
:param connectivity_type: Type of connectivity to use
:param pixdim: list of pixel dimensions
:param empty: Whether empty-reference/prediction handling flags are enabled for metric computation.
:param dict_args: Dictionary with additional arguments for the different metrics
"""
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@MetricsReloaded/metrics/pairwise_measures.py` around lines 307 - 315, Update
the docstring entry for the existing :param empty: in the pairwise_measures
docblock to explicitly state what the parameter represents, its expected type
and accepted values, the default value, and how it affects metric computation
(e.g., how empty/missing pixels or regions are treated or substituted when
computing measures). Locate the docstring containing the parameters pred, ref,
measures, connectivity_type, pixdim, empty, dict_args in pairwise_measures.py
(the function or class where those params are declared) and replace the blank
:param empty: line with a concise description mentioning its role and behavior.

Comment on lines +115 to +128
# def border_map2(self):
# """
# Creates the border for a 3D image
# :return:
# """
# west = ndimage.shift(self.binary_map, [-1, 0, 0], order=0)
# east = ndimage.shift(self.binary_map, [1, 0, 0], order=0)
# north = ndimage.shift(self.binary_map, [0, 1, 0], order=0)
# south = ndimage.shift(self.binary_map, [0, -1, 0], order=0)
# top = ndimage.shift(self.binary_map, [0, 0, 1], order=0)
# bottom = ndimage.shift(self.binary_map, [0, 0, -1], order=0)
# cumulative = west + east + north + south + top + bottom
# border = ((cumulative < 6) * self.binary_map) == 1
# return border

⚠️ Potential issue | 🟠 Major

Keep MorphologyOps.border_map2() live or remove it in a dedicated breaking-change PR.

Commenting out the method removes a public API from MorphologyOps, so any existing caller now gets AttributeError. That is a runtime regression, not a docs-only change.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@MetricsReloaded/utility/utils.py` around lines 115 - 128, A public method was
removed by commenting it out, causing runtime AttributeError for callers;
restore the original implementation of MorphologyOps.border_map2 (the method
that computes border using ndimage.shift over self.binary_map and returns
border) so the public API remains available, or if this was intentional, revert
the comment change and instead open a dedicated breaking-change PR that either
(a) removes border_map2 after announcing the breaking change, or (b) adds a
compatibility shim named MorphologyOps.border_map2 that raises a clear
DeprecationWarning and directs users to the new API; locate the symbol
MorphologyOps.border_map2 in MetricsReloaded/utility/utils.py and either
re-enable the original code block (west/east/north/south/top/bottom, cumulative,
border) or replace it with the deprecation shim and ensure tests/documentation
are updated accordingly.

Comment thread MetricsReloaded/utility/utils.py
Comment on lines +557 to +568
print(f, ' not present')
break
if flag_on:
list_fin.append(k)
if len(list_fin) == 0:
return None
elif len(list_fin) == 1:
return list_fin[0]
else:
print("list fin is ",list_fin)
#print("list fin is ",list_fin)
df_fin = list_fin[0]
print(len(list_fin))

⚠️ Potential issue | 🟡 Minor

Remove the raw print() calls from merge_list_df.

This helper now writes debug output on normal execution, which will leak into library consumers and test logs. If you still need diagnostics here, use a logger at debug level instead.

✂️ Proposed cleanup
             for f in on:
                 if f not in k.columns:
                     flag_on = False
-                    print(f, ' not present')
                     break
@@
     else:
-        print("list fin is ",list_fin)
-        #print("list fin is ",list_fin)
         df_fin = list_fin[0]
-        print(len(list_fin))
         for k in list_fin[1:]:
             df_fin = pd.merge(df_fin, k, on=on)
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
print(f, ' not present')
break
if flag_on:
list_fin.append(k)
if len(list_fin) == 0:
return None
elif len(list_fin) == 1:
return list_fin[0]
else:
print("list fin is ",list_fin)
#print("list fin is ",list_fin)
df_fin = list_fin[0]
print(len(list_fin))
break
if flag_on:
list_fin.append(k)
if len(list_fin) == 0:
return None
elif len(list_fin) == 1:
return list_fin[0]
else:
df_fin = list_fin[0]
for k in list_fin[1:]:
df_fin = pd.merge(df_fin, k, on=on)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@MetricsReloaded/utility/utils.py` around lines 557 - 568, The function
merge_list_df currently emits raw prints (e.g., print(f, ' not present'),
print(len(list_fin))) which pollute runtime output; update merge_list_df to
remove these print() calls and instead use the module logger: add an import
logging and a logger = logging.getLogger(__name__) (or reuse existing), replace
the prints with logger.debug(...) containing the same diagnostic text, and
remove any stray commented print lines so the function no longer writes to
stdout during normal execution while preserving debug visibility via logging.
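A minimal sketch of the suggested logging pattern; `check_columns` is a hypothetical helper standing in for the column check inside `merge_list_df`:

```python
import logging

logger = logging.getLogger(__name__)

def check_columns(columns, required):
    # Report absent merge keys via logger.debug rather than print(),
    # so diagnostics stay available but never pollute stdout by default.
    missing = [col for col in required if col not in columns]
    for col in missing:
        logger.debug("%s not present", col)
    return missing
```

Consumers who want the diagnostics opt in with `logging.basicConfig(level=logging.DEBUG)`.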

data_miss['list_values'] = [1]
data_miss['pred_prob'] = [None,None]
data_miss['ref_missing'] = [ref3]
data_miss['ref_missing_pred'] = [ref3]

⚠️ Potential issue | 🟠 Major

Test is currently codifying a regression in missing-case handling.

Line 46 switches to ref_missing_pred, but ProcessEvaluation still checks for ref_missing before calling complete_missing_cases() (in MetricsReloaded/processes/overall_process.py Lines 359-365).
Because of that, Line 76’s new expectation [2,8] likely reflects the broken path (missing case not appended) rather than intended behavior. Please keep this test asserting the missing-case completion result (previously [3,8]) and remove the temporary “to modify” expectation.

[suggested fix in test]

Patch to keep regression coverage strict
-    assert_allclose(pe.grouped_lab.shape,[2,8]) # to modify
+    assert_allclose(pe.grouped_lab.shape,[3,8])

Also applies to: 76-76

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/test_processes/test_overall_process.py` at line 46, The test changed the
input key to 'ref_missing_pred' which prevents ProcessEvaluation (in
MetricsReloaded/processes/overall_process.py) from detecting ref_missing and
calling complete_missing_cases(), causing the assertion to reflect the broken
path; modify the test to restore the original key name 'ref_missing' for
data_miss (i.e., revert data_miss['ref_missing_pred'] back to
data_miss['ref_missing']) and restore the expected result to [3,8] so the test
asserts the completed-missing-cases behavior checked by
ProcessEvaluation.complete_missing_cases().

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 8

♻️ Duplicate comments (4)
test/test_processes/test_overall_process.py (1)

39-39: ⚠️ Potential issue | 🟠 Major

Test expectation change may codify a regression in missing-case handling.

The fixture key was changed from ref_missing to ref_missing_pred (line 39), and the expected shape changed from [3,8] to [2,8] (line 69). Combined with the constructor gate still checking ref_missing in overall_process.py (line 360), this test now passes because complete_missing_cases() is never called.

The # to modify comment suggests this is a known interim state. Once the key inconsistency in overall_process.py is fixed, restore the expected shape to [3,8].

Also applies to: 69-69

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/test_processes/test_overall_process.py` at line 39, The test switched
the fixture key from "ref_missing" to "ref_missing_pred" and adjusted the
expected output shape to [2,8], which masks a bug: the constructor in
overall_process.py still checks for "ref_missing" so complete_missing_cases()
never runs; update overall_process.py to look for the new key "ref_missing_pred"
(or make both keys acceptable) in the constructor/gating logic that determines
whether to call complete_missing_cases(), then revert the test expectation back
to [3,8] to reflect the correct missing-case handling; ensure references to
"ref_missing" and "ref_missing_pred" and the method complete_missing_cases() are
consistently updated.
MetricsReloaded/processes/overall_process.py (1)

360-361: ⚠️ Potential issue | 🟠 Major

Key name inconsistency between constructor gate and complete_missing_cases().

The constructor at line 360 checks for 'ref_missing' in self.data.keys(), but complete_missing_cases() at line 585 uses self.data['ref_missing_pred']. This mismatch can cause:

  1. The method to be called when data contains ref_missing but then fail with KeyError because ref_missing_pred doesn't exist
  2. The method to be skipped entirely if only ref_missing_pred is provided (not ref_missing)
💡 Proposed fix
         if self.flag_valid:
             self.process_data()
-            if 'ref_missing' in self.data.keys():
+            if 'ref_missing_pred' in self.data.keys():
                 self.complete_missing_cases()

Or unify both keys with a fallback:

     def complete_missing_cases(self):
-        if len(self.data['ref_missing_pred']) == 0:
+        missing_cases = self.data.get('ref_missing_pred', self.data.get('ref_missing', []))
+        if len(missing_cases) == 0:
             return

Also applies to: 585-586

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@MetricsReloaded/processes/overall_process.py` around lines 360 - 361, The
constructor currently checks for 'ref_missing' in self.data but
complete_missing_cases() reads self.data['ref_missing_pred'], causing KeyError
or skipped execution; update the gate and the method to use a single canonical
key (e.g., 'ref_missing_pred') or add a fallback: modify the initial conditional
that decides to call complete_missing_cases() to check for either 'ref_missing'
or 'ref_missing_pred', and inside complete_missing_cases() replace direct
indexing self.data['ref_missing_pred'] with a safe lookup (e.g.,
self.data.get('ref_missing_pred', self.data.get('ref_missing'))) so the method
always finds the data regardless of which key was provided.
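The fallback option could be sketched like this (a standalone function for illustration; in the class it would read `self.data`):

```python
def get_missing_cases(data):
    # Prefer the newer 'ref_missing_pred' key, fall back to the legacy
    # 'ref_missing', and default to an empty list when neither is present.
    return data.get('ref_missing_pred', data.get('ref_missing', []))
```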
MetricsReloaded/metrics/pairwise_measures.py (2)

142-148: ⚠️ Potential issue | 🟠 Major

Guard normalised_expected_cost() against zero/invalid naive baseline before division.

The normalised_expected_cost() method divides by naive_cost at line 148 without checking if it's zero or NaN, which can produce inf or nan for degenerate cost setups. Now that "nec" is added to the measures_dict (line 89), this becomes part of the default metric surface.

💡 Proposed fix
     def normalised_expected_cost(self):
         """
         Calculates the normalised expected cost as the ratio of the expected cost to the naive expected cost.
         """
         naive_cost = self.best_naive_ec()
         ec = self.expected_cost()
+        if np.isnan(naive_cost) or naive_cost == 0:
+            warnings.warn("Naive expected cost is zero/undefined; NEC is not defined")
+            return np.nan
         return ec / naive_cost
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@MetricsReloaded/metrics/pairwise_measures.py` around lines 142 - 148,
normalised_expected_cost currently divides ec by naive_cost without guarding for
zero/invalid baselines; update the normalised_expected_cost method to check the
value returned by best_naive_ec() (and handle None) and guard against zero/NaN
before performing the division (use math.isnan or numpy.isnan as appropriate)
and return a safe sentinel (e.g. float('nan') or raise a clear ValueError)
instead of producing inf/nan; ensure this change keeps the existing call to
expected_cost() and note that "nec" in measures_dict will now behave safely when
the naive baseline is degenerate.
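The guard in isolation (`safe_normalised_cost` is a hypothetical free function; the real method computes both terms internally):

```python
import numpy as np

def safe_normalised_cost(expected_cost, naive_cost):
    # Guard against a zero or non-finite baseline so degenerate cost
    # setups return NaN instead of propagating inf through results.
    if not np.isfinite(naive_cost) or naive_cost == 0:
        return np.nan
    return expected_cost / naive_cost
```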

308-308: ⚠️ Potential issue | 🟡 Minor

Complete the empty parameter documentation.

The :param empty: entry is still missing its description, leaving the PR's "all parameters explained" objective incomplete.

💡 Proposed fix
-    :param empty:
+    :param empty: Flag indicating whether empty-reference/prediction handling is enabled for metric computation.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@MetricsReloaded/metrics/pairwise_measures.py` at line 308, Complete the
missing :param empty: docstring for the function in
MetricsReloaded/metrics/pairwise_measures.py by describing what the empty
parameter controls (how the function behaves when given empty inputs), enumerate
the accepted values/types for empty and the effect of each option (e.g., return
value, raise error, or fill with a sentinel), and state the default value;
reference the parameter name "empty" in the description so future readers can
locate it easily.
🧹 Nitpick comments (7)
MetricsReloaded/processes/overall_process.py (3)

643-643: Avoid mutable default argument dict_args={}.

Mutable default arguments are shared across all calls and can lead to unexpected behavior if modified.

♻️ Proposed fix
-    def label_aggregation(self, option='average',dict_args={}):
+    def label_aggregation(self, option='average', dict_args=None):
+        if dict_args is None:
+            dict_args = {}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@MetricsReloaded/processes/overall_process.py` at line 643, The function
label_aggregation currently uses a mutable default argument dict_args={} which
can be shared across calls; change the signature of label_aggregation (the
method named label_aggregation) to use dict_args=None and inside the method set
dict_args = {} if dict_args is None, updating any logic that expects dict_args
to exist; ensure any callers that rely on a persistent dict are updated
accordingly so no behavior changes occur.
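The pitfall the fix avoids, shown with toy functions (not the real `label_aggregation`):

```python
def buggy(value, acc={}):
    # The single default dict is created once and shared by every call.
    acc.setdefault('count', 0)
    acc['count'] += 1
    return acc

def fixed(value, acc=None):
    # A None sentinel gives each call its own fresh dict.
    if acc is None:
        acc = {}
    acc.setdefault('count', 0)
    acc['count'] += 1
    return acc
```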

597-597: Rename unused loop variable f to _f.

The loop variable f representing the filename/case is not used within the loop body. Prefix with underscore to indicate it's intentionally unused.

♻️ Proposed fix
-                for (i,f) in enumerate(self.data['ref_missing_pred']):
+                for (i, _f) in enumerate(self.data['ref_missing_pred']):
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@MetricsReloaded/processes/overall_process.py` at line 597, The loop in
overall_process.py iterates with (i, f) over self.data['ref_missing_pred'] but
never uses f; rename the unused variable to _f to signal it's intentionally
unused (i,_f) in the for loop header (the loop that currently reads "for (i,f)
in enumerate(self.data['ref_missing_pred']):") so linters and readers understand
it is unused.

561-562: Add stacklevel=2 to warnings.warn() call.

Without explicit stacklevel, the warning will point to this line rather than the caller's location, making debugging harder.

♻️ Proposed fix
-            warnings.warn("No need to identify empty reference with image level classification - only suitable for instance segmentation, object detection and image segmentation") 
+            warnings.warn("No need to identify empty reference with image level classification - only suitable for instance segmentation, object detection and image segmentation", stacklevel=2) 
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@MetricsReloaded/processes/overall_process.py` around lines 561 - 562, Update
the warnings.warn call so the warning originates at the caller by adding
stacklevel=2: in the block that currently calls warnings.warn("No need to
identify empty reference with image level classification - only suitable for
instance segmentation, object detection and image segmentation") (just before
returning list_empty), pass stacklevel=2 as an additional argument to
warnings.warn; keep the message and return behavior unchanged.
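A small demonstration of what `stacklevel=2` changes (`identify_empty` and the message are stand-ins, not the library code):

```python
import warnings

def identify_empty():
    # stacklevel=2 attributes the warning to the caller's line,
    # which is where the misuse actually happened.
    warnings.warn("only suitable for segmentation/detection tasks",
                  stacklevel=2)

def run_process():
    identify_empty()

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    run_process()
```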
test/test_utility/test_utils.py (2)

107-107: Prefix unused unpacked variables with underscore.

Variables list_ind_lab and list_com are unpacked but never used. Prefix with underscore to indicate intentional disuse.

♻️ Proposed fix
-    list_ind_lab, list_volumes, list_com = mo.list_foreground_component()
+    _list_ind_lab, list_volumes, _list_com = mo.list_foreground_component()
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/test_utility/test_utils.py` at line 107, The test unpacks three values
from mo.list_foreground_component() but only uses list_volumes; rename the
unused unpacked variables to start with an underscore (e.g., _list_ind_lab and
_list_com) to signal intentional non-use and silence linter warnings; update the
unpacking assignment where list_foreground_component() is called in the test
(replace list_ind_lab, list_volumes, list_com with _list_ind_lab, list_volumes,
_list_com).

82-86: Remove debug print statements from test.

These print statements add noise to test output and were likely added during development/debugging.

♻️ Proposed fix
-    print(df_sensdscnsd)
-    print(val_merge4)
-    print(df3)
-    print(df5)
-    print(df4)
-
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/test_utility/test_utils.py` around lines 82 - 86, Remove the debug print
statements in the test that print df_sensdscnsd, val_merge4, df3, df5, and df4;
locate the prints (print(df_sensdscnsd), print(val_merge4), print(df3),
print(df5), print(df4)) in test_utils.py and delete them so the test output is
clean, leaving any assertions or setup code intact.
test/test_metrics/test_pairwise_measures.py (1)

791-791: Consider fixing the dict_args key instead of commenting out.

The commented line uses dict_args={"fbeta": 2}, but per the implementation in pairwise_measures.py (line 908), the expected key is "beta", not "fbeta". Consider uncommenting with the correct key to maintain test coverage for custom beta values.

♻️ Proposed fix
-    # bpm2 = PM(p_pred, p_ref, dict_args={"fbeta": 2})
+    bpm2 = PM(p_pred, p_ref, dict_args={"beta": 2})
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/test_metrics/test_pairwise_measures.py` at line 791, Uncomment the test
line that constructs a PM with custom beta and fix the dict_args key from
"fbeta" to "beta" so it matches the PM implementation; specifically update the
commented call to PM(p_pred, p_ref, dict_args={"beta": 2}) in the test for
pairwise measures to restore coverage of custom-beta behavior.
MetricsReloaded/utility/utils.py (1)

521-533: Remove debug print() calls from combine_df().

Similar to merge_list_df(), these print statements will pollute library output.

♻️ Proposed fix
     if df1 is None or df1.shape[0]==0:
-        print('Nothing in first')
         if df2 is None:
             return None
@@
     else:
-        print("Performing concatenation")
         return pd.concat([df1, df2])
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@MetricsReloaded/utility/utils.py` around lines 521 - 533, The combine_df
function contains debug print() calls ("Nothing in first" and "Performing
concatenation") that should be removed to avoid polluting library output; edit
the combine_df implementation to delete those print() lines (leaving all
conditional logic and returns intact) or replace them with appropriate
logger.debug calls if project logging is required, ensuring behavior of
combine_df and interplay with merge_list_df remains unchanged.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@MetricsReloaded/processes/mixed_measures_processes.py`:
- Around line 850-854: Fix the typo in the method docstring that currently reads
"mutilabel"—update it to "multilabel" in the function whose docstring describes
creation of the multilabel results and returns pd_mcc, pd_cal so the summary
reads "Creation of the multilabel results with multilabel counting metrics and
calibration metrics returned as separate dictionaries".
- Around line 551-555: The docstring's return type is incorrect: it currently
lists "dict_det, dict_mt" but the function actually returns two pandas DataFrame
objects; update the Returns section to state that the function returns two
pd.DataFrame (e.g., "Returns: Tuple[pd.DataFrame, pd.DataFrame]" or a brief
description like "Two pandas DataFrames: dict_det and dict_mt") and mention the
variable names dict_det and dict_mt so readers can map names to types; adjust
any type annotations or import hints in the docstring accordingly.
- Around line 736-740: Update the docstring return type: the current return line
"dict_bin, dict_mt" is incorrect—these variables are pandas DataFrame objects,
not dicts; modify the docstring in the function that builds and returns dict_bin
and dict_mt to state it returns two pd.DataFrame objects (e.g., "Returns:
tuple[pd.DataFrame, pd.DataFrame]: dict_bin, dict_mt") and import or reference
pandas as pd in the docstring for clarity.
- Around line 313-318: Update the processing-steps docstring in
MetricsReloaded/processes/mixed_measures_processes.py by correcting the typo
"assigment" to "assignment" in the numbered list (the paragraph describing steps
1–5). Locate the triple-quoted docstring that contains the list including
"assigment" and replace that word so the entry reads "assignment procedure based
on the segmentation images".

In `@test/test_metrics/test_prob_pairwise_measures.py`:
- Line 32: The unpacking from ppm.all_multi_threshold_values() currently binds
unused variables (fp, se, sp, pp); rename those to underscore-prefixed names to
satisfy Ruff RUF059. Update the unpackings (the statement calling
ppm.all_multi_threshold_values()) so only the used variable remains
non-underscored (e.g., keep t) and prefix the other bindings with an underscore
(e.g., _se, _sp, _pp, _fp) at each occurrence (the lines where t, se, sp, pp are
assigned and the other two reported occurrences).
- Around line 79-80: Replace brittle exact-equality float assertions with
tolerant comparisons: instead of `assert value_test == expected_value` and
`assert value_test2 == expected_value`, use numpy.testing.assert_allclose (or
pytest.approx) to compare `value_test` and `value_test2` to `expected_value`
with a reasonable absolute tolerance (e.g., atol=1e-8) to stabilize tests; apply
the same change to the other listed assertions (lines referencing the same
pattern).
- Line 4: Remove the debug-only print() calls in test_prob_pairwise_measures.py
(the three prints mentioned) and remove the debug-only call to
precision_recall_curve (imported as prc) that exists solely to print results; if
that prc() call assigns to variables that are unused after removal, delete those
assignments as well so the test contains only assertions and necessary
computations (refer to the test functions in this module and the prc import to
locate the debug statements).

In `@test/test_utility/test_utils.py`:
- Around line 11-13: Rename the local variable named `map` to `test_map` to
avoid shadowing Python's builtin `map`, and update all references to it
(including in the tests `test_foreground_component` and
`test_list_foreground_component`) to use `test_map` so the tests use the new
variable name consistently.

---

Duplicate comments:
In `@MetricsReloaded/metrics/pairwise_measures.py`:
- Around line 142-148: normalised_expected_cost currently divides ec by
naive_cost without guarding for zero/invalid baselines; update the
normalised_expected_cost method to check the value returned by best_naive_ec()
(and handle None) and guard against zero/NaN before performing the division (use
math.isnan or numpy.isnan as appropriate) and return a safe sentinel (e.g.
float('nan') or raise a clear ValueError) instead of producing inf/nan; ensure
this change keeps the existing call to expected_cost() and note that "nec" in
measures_dict will now behave safely when the naive baseline is degenerate.
- Line 308: Complete the missing :param empty: docstring for the function in
MetricsReloaded/metrics/pairwise_measures.py by describing what the empty
parameter controls (how the function behaves when given empty inputs), enumerate
the accepted values/types for empty and the effect of each option (e.g., return
value, raise error, or fill with a sentinel), and state the default value;
reference the parameter name "empty" in the description so future readers can
locate it easily.
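The guard described for normalised_expected_cost can be sketched in isolation as follows; the function name and signature here are illustrative, not the actual MetricsReloaded API:

```python
import math

def normalised_cost(ec, naive_cost):
    """Divide an expected cost by a naive baseline, guarding degenerate cases.

    Returns NaN instead of inf (or an exception) when the baseline is
    None, zero, or NaN.
    """
    if naive_cost is None or math.isnan(naive_cost) or naive_cost == 0:
        return float("nan")
    return ec / naive_cost
```

The short-circuiting order matters: the None check must come before math.isnan, which would itself raise on None.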

In `@MetricsReloaded/processes/overall_process.py`:
- Around line 360-361: The constructor currently checks for 'ref_missing' in
self.data but complete_missing_cases() reads self.data['ref_missing_pred'],
causing KeyError or skipped execution; update the gate and the method to use a
single canonical key (e.g., 'ref_missing_pred') or add a fallback: modify the
initial conditional that decides to call complete_missing_cases() to check for
either 'ref_missing' or 'ref_missing_pred', and inside complete_missing_cases()
replace direct indexing self.data['ref_missing_pred'] with a safe lookup (e.g.,
self.data.get('ref_missing_pred', self.data.get('ref_missing'))) so the method
always finds the data regardless of which key was provided.
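A minimal sketch of the suggested fallback lookup (the helper name is hypothetical; the real code indexes self.data inside complete_missing_cases()):

```python
def lookup_missing_cases(data):
    """Prefer the new 'ref_missing_pred' key, fall back to legacy 'ref_missing'."""
    return data.get("ref_missing_pred", data.get("ref_missing"))
```

With this pattern the method finds the data whichever key the caller supplied, and returns None rather than raising KeyError when neither is present.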

In `@test/test_processes/test_overall_process.py`:
- Line 39: The test switched the fixture key from "ref_missing" to
"ref_missing_pred" and adjusted the expected output shape to [2,8], which masks
a bug: the constructor in overall_process.py still checks for "ref_missing" so
complete_missing_cases() never runs; update overall_process.py to look for the
new key "ref_missing_pred" (or make both keys acceptable) in the
constructor/gating logic that determines whether to call
complete_missing_cases(), then revert the test expectation back to [3,8] to
reflect the correct missing-case handling; ensure references to "ref_missing"
and "ref_missing_pred" and the method complete_missing_cases() are consistently
updated.

---

Nitpick comments:
In `@MetricsReloaded/processes/overall_process.py`:
- Line 643: The function label_aggregation currently uses a mutable default
argument dict_args={} which can be shared across calls; change the signature of
label_aggregation (the method named label_aggregation) to use dict_args=None and
inside the method set dict_args = {} if dict_args is None, updating any logic
that expects dict_args to exist; ensure any callers that rely on a persistent
dict are updated accordingly so no behavior changes occur.
- Line 597: The loop in overall_process.py iterates with (i, f) over
self.data['ref_missing_pred'] but never uses f; rename the unused variable to _f
to signal it's intentionally unused (i,_f) in the for loop header (the loop that
currently reads "for (i,f) in enumerate(self.data['ref_missing_pred']):") so
linters and readers understand it is unused.
- Around line 561-562: Update the warnings.warn call so the warning originates
at the caller by adding stacklevel=2: in the block that currently calls
warnings.warn("No need to identify empty reference with image level
classification - only suitable for instance segmentation, object detection and
image segmentation") (just before returning list_empty), pass stacklevel=2 as an
additional argument to warnings.warn; keep the message and return behavior
unchanged.
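The mutable-default pitfall behind the label_aggregation suggestion above can be demonstrated in a few lines (both functions below are illustrative, not project code):

```python
def count_bad(x, acc={}):
    # The default dict is created once at definition time and shared across calls
    acc[x] = acc.get(x, 0) + 1
    return acc

def count_good(x, acc=None):
    # A fresh dict is created per call unless the caller passes one in
    if acc is None:
        acc = {}
    acc[x] = acc.get(x, 0) + 1
    return acc
```

count_bad("a") returns {"a": 1} on the first call but {"a": 2} on the second, because state leaks through the shared default; count_good("a") returns {"a": 1} every time.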

In `@MetricsReloaded/utility/utils.py`:
- Around line 521-533: The combine_df function contains debug print() calls
("Nothing in first" and "Performing concatenation") that should be removed to
avoid polluting library output; edit the combine_df implementation to delete
those print() lines (leaving all conditional logic and returns intact) or
replace them with appropriate logger.debug calls if project logging is required,
ensuring behavior of combine_df and interplay with merge_list_df remains
unchanged.

In `@test/test_metrics/test_pairwise_measures.py`:
- Line 791: Uncomment the test line that constructs a PM with custom beta and
fix the dict_args key from "fbeta" to "beta" so it matches the PM
implementation; specifically update the commented call to PM(p_pred, p_ref,
dict_args={"beta": 2}) in the test for pairwise measures to restore coverage of
custom-beta behavior.

In `@test/test_utility/test_utils.py`:
- Line 107: The test unpacks three values from mo.list_foreground_component()
but only uses list_volumes; rename the unused unpacked variables to start with
an underscore (e.g., _list_ind_lab and _list_com) to signal intentional non-use
and silence linter warnings; update the unpacking assignment where
list_foreground_component() is called in the test (replace list_ind_lab,
list_volumes, list_com with _list_ind_lab, list_volumes, _list_com).
- Around line 82-86: Remove the debug print statements in the test that print
df_sensdscnsd, val_merge4, df3, df5, and df4; locate the prints
(print(df_sensdscnsd), print(val_merge4), print(df3), print(df5), print(df4)) in
test_utils.py and delete them so the test output is clean, leaving any
assertions or setup code intact.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: dc66db1f-4acd-4eac-887d-077b7aa97055

📥 Commits

Reviewing files that changed from the base of the PR and between 52e57b4 and f5dc43a.

📒 Files selected for processing (14)
  • MetricsReloaded/metrics/calibration_measures.py
  • MetricsReloaded/metrics/pairwise_measures.py
  • MetricsReloaded/metrics/prob_pairwise_measures.py
  • MetricsReloaded/processes/mixed_measures_processes.py
  • MetricsReloaded/processes/overall_process.py
  • MetricsReloaded/utility/assignment_localization.py
  • MetricsReloaded/utility/utils.py
  • test/test_metrics/test_calibration_metrics.py
  • test/test_metrics/test_pairwise_measures.py
  • test/test_metrics/test_prob_pairwise_measures.py
  • test/test_processes/test_mixed_measures_processes.py
  • test/test_processes/test_overall_process.py
  • test/test_utility/test_assignment_localization.py
  • test/test_utility/test_utils.py
💤 Files with no reviewable changes (1)
  • test/test_processes/test_mixed_measures_processes.py
✅ Files skipped from review due to trivial changes (3)
  • test/test_metrics/test_calibration_metrics.py
  • test/test_utility/test_assignment_localization.py
  • MetricsReloaded/utility/assignment_localization.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • MetricsReloaded/metrics/prob_pairwise_measures.py

Comment on lines +313 to +318

1. identification for each predicted case of the items considered as of the class specified by label considered
2. identification for each associated case, the items considered as of the class specified by the considered label
3. creation of the associated list of individual images of elements selected beforehand both in the prediction images and the reference images (the images are listed in the same order with one element per image)
4. assigment procedure based on the segmentation images
5. derivation of metrics either on a case by case basis or grouping all cases together.

⚠️ Potential issue | 🟡 Minor

Fix typo in processing steps docstring.

Line 317 has a typo (assigment → assignment).

Suggested doc fix
-        4. assigment procedure based on the segmentation images
+        4. assignment procedure based on the segmentation images
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
1. identification for each predicted case of the items considered as of the class specified by label considered
2. identification for each associated case, the items considered as of the class specified by the considered label
3. creation of the associated list of individual images of elements selected beforehand both in the prediction images and the reference images (the images are listed in the same order with one element per image)
4. assigment procedure based on the segmentation images
5. derivation of metrics either on a case by case basis or grouping all cases together.
1. identification for each predicted case of the items considered as of the class specified by label considered
2. identification for each associated case, the items considered as of the class specified by the considered label
3. creation of the associated list of individual images of elements selected beforehand both in the prediction images and the reference images (the images are listed in the same order with one element per image)
4. assignment procedure based on the segmentation images
5. derivation of metrics either on a case by case basis or grouping all cases together.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@MetricsReloaded/processes/mixed_measures_processes.py` around lines 313 -
318, Update the processing-steps docstring in
MetricsReloaded/processes/mixed_measures_processes.py by correcting the typo
"assigment" to "assignment" in the numbered list (the paragraph describing steps
1–5). Locate the triple-quoted docstring that contains the list including
"assigment" and replace that word so the entry reads "assignment procedure based
on the segmentation images".

Comment on lines +551 to +555
"""
Process allowing for the creation of dictionaries with the resulting measures for per label detection metrics and multi threshold metrics

:return: dict_det, dict_mt
"""

⚠️ Potential issue | 🟡 Minor

Return type in docstring is inaccurate.

Line 554 says dict_det, dict_mt, but this method returns two pd.DataFrame objects.

Suggested doc fix
-        :return: dict_det, dict_mt
+        :return: pd.DataFrame, pd.DataFrame
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@MetricsReloaded/processes/mixed_measures_processes.py` around lines 551 -
555, The docstring's return type is incorrect: it currently lists "dict_det,
dict_mt" but the function actually returns two pandas DataFrame objects; update
the Returns section to state that the function returns two pd.DataFrame (e.g.,
"Returns: Tuple[pd.DataFrame, pd.DataFrame]" or a brief description like "Two
pandas DataFrames: dict_det and dict_mt") and mention the variable names
dict_det and dict_mt so readers can map names to types; adjust any type
annotations or import hints in the docstring accordingly.

Comment on lines +736 to +740
"""
Process allowing for the creation of dictionaries with the resulting measures for per label binary pairwise metrics and multi threshold metrics

:return: dict_bin, dict_mt
"""

⚠️ Potential issue | 🟡 Minor

Return type in docstring is inaccurate.

Line 739 says dict_bin, dict_mt, but this method returns two pd.DataFrame objects.

Suggested doc fix
-        :return: dict_bin, dict_mt
+        :return: pd.DataFrame, pd.DataFrame
📝 Committable suggestion


Suggested change
"""
Process allowing for the creation of dictionaries with the resulting measures for per label binary pairwise metrics and multi threshold metrics
:return: dict_bin, dict_mt
"""
"""
Process allowing for the creation of dictionaries with the resulting measures for per label binary pairwise metrics and multi threshold metrics
:return: pd.DataFrame, pd.DataFrame
"""
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@MetricsReloaded/processes/mixed_measures_processes.py` around lines 736 -
740, Update the docstring return type: the current return line "dict_bin,
dict_mt" is incorrect—these variables are pandas DataFrame objects, not dicts;
modify the docstring in the function that builds and returns dict_bin and
dict_mt to state it returns two pd.DataFrame objects (e.g., "Returns:
tuple[pd.DataFrame, pd.DataFrame]: dict_bin, dict_mt") and import or reference
pandas as pd in the docstring for clarity.

Comment on lines +850 to +854
"""
Creation of the multilabel results with mutilabel counting metrics and calibration metrics returned as separate dictionaries

:return: pd_mcc, pd_cal
"""

⚠️ Potential issue | 🟡 Minor

Fix typo in method summary.

Line 851 uses mutilabel; should be multilabel.

Suggested doc fix
-        Creation of the multilabel results with mutilabel counting metrics and calibration metrics returned as separate dictionaries
+        Creation of the multilabel results with multilabel counting metrics and calibration metrics returned as separate dictionaries
📝 Committable suggestion


Suggested change
"""
Creation of the multilabel results with mutilabel counting metrics and calibration metrics returned as separate dictionaries
:return: pd_mcc, pd_cal
"""
"""
Creation of the multilabel results with multilabel counting metrics and calibration metrics returned as separate dictionaries
:return: pd_mcc, pd_cal
"""
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@MetricsReloaded/processes/mixed_measures_processes.py` around lines 850 -
854, Fix the typo in the method docstring that currently reads
"mutilabel"—update it to "multilabel" in the function whose docstring describes
creation of the multilabel results and returns pd_mcc, pd_cal so the summary
reads "Creation of the multilabel results with multilabel counting metrics and
calibration metrics returned as separate dictionaries".

from numpy.testing import assert_allclose
from sklearn.metrics import cohen_kappa_score as cks
from sklearn.metrics import matthews_corrcoef as mcc
from sklearn.metrics import precision_recall_curve as prc

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "== Candidate dependency files =="
fd -i 'pyproject.toml|setup.py|setup.cfg|requirements.*|environment.*\.ya?ml'

echo
echo "== Mentions of sklearn/scikit-learn in dependency files =="
rg -n --iglob 'pyproject.toml' --iglob 'setup.py' --iglob 'setup.cfg' --iglob 'requirements*' --iglob 'environment*.yml' --iglob 'environment*.yaml' 'scikit-learn|sklearn' || true

echo
echo "== precision_recall_curve + print usage in this test file =="
rg -n -C2 'precision_recall_curve|prc\(|print\(' test/test_metrics/test_prob_pairwise_measures.py

Repository: Project-MONAI/MetricsReloaded

Length of output: 1238


Remove debug-only print statements from this test module.

The three print() calls on lines 122, 140, and 151 serve only for debugging and add noise to CI output. Line 140's prc() call is exclusively used for debug printing. These can be safely removed.

♻️ Proposed cleanup
-from sklearn.metrics import precision_recall_curve as prc
@@
-    print(value_test)
@@
-    print("From SK", prc(ref, pred_proba))
@@
-    print(dict_ppm.keys())
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/test_metrics/test_prob_pairwise_measures.py` at line 4, Remove the
debug-only print() calls in test_prob_pairwise_measures.py (the three prints
mentioned) and remove the debug-only call to precision_recall_curve (imported as
prc) that exists solely to print results; if that prc() call assigns to
variables that are unused after removal, delete those assignments as well so the
test contains only assertions and necessary computations (refer to the test
functions in this module and the prc import to locate the debug statements).

list_ppv_exp = [1, 1, 1, 0.71, 0.5]
unique_thresh = [1.9, 0.9, 0.65, 0.45, 0.2]

t, se, sp, pp, fp = ppm.all_multi_threshold_values()

⚠️ Potential issue | 🟡 Minor

Address unused unpacked values flagged by Ruff (RUF059).

Unused unpacked variables (fp, se, sp, pp) should be prefixed with _ to keep lint clean.

🧹 Minimal lint-safe edits
-    t, se, sp, pp, fp = ppm.all_multi_threshold_values()
+    t, se, sp, pp, _fp = ppm.all_multi_threshold_values()
@@
-    t, se, sp, pp, fp = ppm.all_multi_threshold_values()
+    t, se, sp, pp, _fp = ppm.all_multi_threshold_values()
@@
-    t, se, sp, pp, fp = ppm.all_multi_threshold_values(max_number_samples=50, max_number_thresh=10)
+    t, _se, _sp, _pp, _fp = ppm.all_multi_threshold_values(max_number_samples=50, max_number_thresh=10)

Also applies to: 54-54, 108-108

🧰 Tools
🪛 Ruff (0.15.10)

[warning] 32-32: Unpacked variable fp is never used

Prefix it with an underscore or any other dummy variable pattern

(RUF059)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/test_metrics/test_prob_pairwise_measures.py` at line 32, The unpacking
from ppm.all_multi_threshold_values() currently binds unused variables (fp, se,
sp, pp); rename those to underscore-prefixed names to satisfy Ruff RUF059.
Update the unpackings (the statement calling ppm.all_multi_threshold_values())
so only the used variable remains non-underscored (e.g., keep t) and prefix the
other bindings with an underscore (e.g., _se, _sp, _pp, _fp) at each occurrence
(the lines where t, se, sp, pp are assigned and the other two reported
occurrences).

Comment on lines +79 to +80
assert value_test == expected_value
assert value_test2 == expected_value

⚠️ Potential issue | 🟡 Minor

Use tolerant assertions for floating-point metric outputs.

Direct == on floats is brittle; these should use assert_allclose(..., atol=...) for stability.

✅ Suggested assertion pattern
-    assert value_test == expected_value
-    assert value_test2 == expected_value
+    assert_allclose(value_test, expected_value, atol=1e-8)
+    assert_allclose(value_test2, expected_value, atol=1e-8)

Also applies to: 194-194, 206-206, 218-218, 230-230, 241-241, 253-253

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/test_metrics/test_prob_pairwise_measures.py` around lines 79 - 80,
Replace brittle exact-equality float assertions with tolerant comparisons:
instead of `assert value_test == expected_value` and `assert value_test2 ==
expected_value`, use numpy.testing.assert_allclose (or pytest.approx) to compare
`value_test` and `value_test2` to `expected_value` with a reasonable absolute
tolerance (e.g., atol=1e-8) to stabilize tests; apply the same change to the
other listed assertions (lines referencing the same pattern).
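numpy.testing.assert_allclose (or pytest.approx) is the form suggested above; the same idea can be shown with only the standard library:

```python
import math

# Ten additions of 0.1 accumulate rounding error and do not reach exactly 1.0
value_test = sum([0.1] * 10)
expected_value = 1.0

assert value_test != expected_value                        # exact == is brittle
assert math.isclose(value_test, expected_value, abs_tol=1e-8)  # tolerant check passes
```

The tolerant comparison accepts any result within the stated absolute tolerance, which is what metric tests should assert.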

Comment on lines +11 to +13
map = np.zeros([10,10])
map[3:5,5:8] = 1
map[7:9,0:3] = 1

⚠️ Potential issue | 🟡 Minor

Rename map to avoid shadowing Python builtin.

The variable name map shadows the built-in map() function, which could cause confusion and unexpected behavior if map() is used later in the file.

♻️ Proposed fix
-map = np.zeros([10,10])
-map[3:5,5:8] = 1
-map[7:9,0:3] = 1
+test_map = np.zeros([10,10])
+test_map[3:5,5:8] = 1
+test_map[7:9,0:3] = 1

Also update references in test_foreground_component and test_list_foreground_component to use test_map.

🧰 Tools
🪛 Ruff (0.15.10)

[error] 11-11: Variable map is shadowing a Python builtin

(A001)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/test_utility/test_utils.py` around lines 11 - 13, Rename the local
variable named `map` to `test_map` to avoid shadowing Python's builtin `map`,
and update all references to it (including in the tests
`test_foreground_component` and `test_list_foreground_component`) to use
`test_map` so the tests use the new variable name consistently.
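Why the shadowing matters can be shown in a few lines: once a local name `map` exists, the builtin is unreachable in that scope (both helpers below are illustrative):

```python
def with_shadowing():
    map = [0] * 4                  # local list hides the builtin map()
    try:
        return list(map(str, [1, 2]))
    except TypeError:              # 'list' object is not callable
        return "builtin shadowed"

def without_shadowing():
    test_map = [0] * 4             # renamed: the builtin stays usable
    return list(map(str, [1, 2])), test_map
```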

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
MetricsReloaded/utility/assignment_localization.py (1)

623-630: ⚠️ Potential issue | 🟠 Major

Operator precedence bug: condition may not work as intended.

Line 626 has an operator precedence issue: the `and` joining df_ambiguous_ref.shape[0] == 0 with df_ambiguous_pred.shape[0] == 0 binds more tightly than the preceding `or`, so the expression is grouped as `A or (B and C)`. The likely intent is to check whether either df_ambiguous_ref is None/empty OR df_ambiguous_pred is empty.

🐛 Proposed fix to add parentheses for clarity
         if (
             df_ambiguous_ref is None
             or df_ambiguous_ref.shape[0] == 0
-            and df_ambiguous_pred.shape[0] == 0
+            or df_ambiguous_pred.shape[0] == 0
         ):

Or if the intent is to require both to be empty:

         if (
             df_ambiguous_ref is None
-            or df_ambiguous_ref.shape[0] == 0
-            and df_ambiguous_pred.shape[0] == 0
+            or (df_ambiguous_ref.shape[0] == 0 and df_ambiguous_pred.shape[0] == 0)
         ):
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@MetricsReloaded/utility/assignment_localization.py` around lines 623 - 630,
The conditional has an operator-precedence bug: change the test so the checks
for df_ambiguous_ref being None/empty are grouped before the OR with
df_ambiguous_pred; e.g., replace the current condition with one that explicitly
groups (df_ambiguous_ref is None or df_ambiguous_ref.shape[0] == 0) or
df_ambiguous_pred.shape[0] == 0 so the intent "either df_ambiguous_ref is
missing/empty OR df_ambiguous_pred is empty" is enforced; update the if in
assignment_localization.py (the block that prints "No ambiguity in matching" and
returns df_matching_all) accordingly.
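The grouping difference is easy to check with plain booleans; the variable names below mirror the condition in question:

```python
ref_is_none, ref_empty, pred_empty = True, False, False

# 'and' binds tighter than 'or': A or B and C parses as A or (B and C)
implicit = ref_is_none or ref_empty and pred_empty      # short-circuits on the first True
explicit = (ref_is_none or ref_empty) and pred_empty    # grouping flips the result here
```

With these inputs the implicit form is True while the explicitly grouped form is False, which is exactly how the bug changes behaviour.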
🧹 Nitpick comments (6)
test/test_utility/test_assignment_localization.py (3)

415-415: Remove commented-out dead code.

If m22 is no longer needed for assertions, remove the commented line.

♻️ Suggested cleanup
     m12 = match1[match1["label"] == 1]
     m21 = match2[match2["label"] == 0]
-    # m22 = match2[match2["label"] == 1]
     print(m12)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/test_utility/test_assignment_localization.py` at line 415, Remove the
commented-out dead code line referencing m22 (the commented line "# m22 =
match2[match2[\"label\"] == 1]") from test_assignment_localization.py; locate
the occurrence of m22 in the test function or surrounding block and delete the
commented line so the test file contains no unused commented code.

374-376: Prefix unused variables with underscore.

The list_val, list_val_h, and list_val_p variables are unpacked but never used. Prefix them with _ to indicate they are intentionally unused.

♻️ Suggested fix
-        df_match, list_val = asm.resolve_ambiguities_matching()
-        df_match_h, list_val_h = asm_h.resolve_ambiguities_matching()
-        df_match_p, list_val_p = asm_p.resolve_ambiguities_matching()
+        df_match, _list_val = asm.resolve_ambiguities_matching()
+        df_match_h, _list_val_h = asm_h.resolve_ambiguities_matching()
+        df_match_p, _list_val_p = asm_p.resolve_ambiguities_matching()
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/test_utility/test_assignment_localization.py` around lines 374 - 376,
The unpacked return values list_val, list_val_h, and list_val_p from calls to
resolve_ambiguities_matching() are unused; rename them to _list_val,
_list_val_h, and _list_val_p (or simply _ ) where the assignments occur
(df_match, _list_val = asm.resolve_ambiguities_matching(), etc.) to indicate
intentional unused variables and silence linter warnings.

125-134: Remove unused commented-out variables.

The ref_com and ref_coms variables are commented out and not used in the test. Consider removing them entirely.

♻️ Suggested cleanup
 def test_check_localization():
     ref_box = [[2,2,4,4]]
-    # ref_com = [[3,3]]
     pred_box = [[2,2,4,4]]
     pred_com = [[3,3]]
     ref_mask = np.zeros([14,14])
     pred_mask = np.zeros([14,14])
     ref_mask[2:5,2:5]=1
     pred_mask[2:5,2:5]=1
     ref_boxes = np.vstack([ref_box])
     ref_masks = np.asarray([ref_mask])
-    # ref_coms = np.vstack([ref_com])
     pred_coms = np.vstack([pred_com])
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/test_utility/test_assignment_localization.py` around lines 125 - 134,
The test contains unused commented-out variables ref_com and ref_coms; remove
the commented lines referencing ref_com and ref_coms from
test_assignment_localization.py (the lines that define "# ref_com = [[3,3]]" and
"# ref_coms = np.vstack([ref_com])") so the test only keeps the active variables
pred_box, pred_com, ref_mask, pred_mask, ref_boxes, and ref_masks; ensure no
other references to ref_com or ref_coms remain in the surrounding test code.
MetricsReloaded/utility/assignment_localization.py (3)

379-380: Remove commented-out dead code.

These commented lines appear to be remnants of previous logic. If the code is no longer needed, consider removing it to improve readability. If it's temporarily disabled for debugging, add a TODO comment explaining when to restore it.

♻️ Suggested cleanup
         if self.flag_refmod:
             ref_boxes = self.ref_loc_mod
-        # if self.flag_predmod:
-        #     pred_points = self.pred_loc_mod
         matrix_pinb = np.zeros([pred_points.shape[0],ref_boxes.shape[0]])
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@MetricsReloaded/utility/assignment_localization.py` around lines 379 - 380,
Remove the dead commented lines referencing self.flag_predmod and pred_points =
self.pred_loc_mod in assignment_localization.py; either delete them outright to
clean up the codebase or, if they’re intentionally disabled temporarily, replace
the commented block with a short TODO comment explaining when/why to restore it
and referencing the symbols self.flag_predmod and pred_loc_mod so future readers
know the intent.

175-181: Add stacklevel=2 to warnings.warn() calls.

Without stacklevel, the warning message will point to this internal location rather than the user's call site, making debugging harder.

♻️ Suggested fix
-                warnings.warn("No adequate localization strategy chosen - not going ahead")
+                warnings.warn("No adequate localization strategy chosen - not going ahead", stacklevel=2)
         else:
                 print(' not valid localisation ')
                 self.flag_usable = False
                 self.df_matching = None
                 self.valid = None
-                warnings.warn("No adequate localization strategy chosen - not going ahead")
+                warnings.warn("No adequate localization strategy chosen - not going ahead", stacklevel=2)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@MetricsReloaded/utility/assignment_localization.py` around lines 175 - 181,
The two warnings.warn(...) calls in assignment_localization.py (the branch that
sets self.flag_usable = False, self.df_matching = None, self.valid = None and
the earlier else branch) should include stacklevel=2 so the warning points to
the user's call site; update both warnings.warn calls to warnings.warn("No
adequate localization strategy chosen - not going ahead", stacklevel=2). Ensure
you modify both occurrences that correspond to the block manipulating
self.flag_usable, self.df_matching, and self.valid.

396-399: Remove commented-out dead code.

Same as above - if this code is no longer needed, remove it for clarity.

♻️ Suggested cleanup
         ref_masks = self.ref_loc
         pred_points = self.pred_loc
-        # if self.flag_refmod:
-        #     ref_masks = self.ref_loc_mod
-        # if self.flag_predmod:
-        #     pred_points = self.pred_loc_mod
         matrix_pinm = np.zeros([pred_points.shape[0],ref_masks.shape[0]])
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@MetricsReloaded/utility/assignment_localization.py` around lines 396 - 399,
Remove the commented-out dead code relating to conditional assignments for
localization modification flags: delete the commented lines referencing
self.flag_refmod, ref_masks = self.ref_loc_mod, self.flag_predmod, and
pred_points = self.pred_loc_mod in assignment_localization.py; ensure no other
code depends on these commented branches (search for self.flag_refmod,
self.ref_loc_mod, self.flag_predmod, self.pred_loc_mod) and if needed replace
with a clear, minimal comment explaining current behavior.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@MetricsReloaded/utility/assignment_localization.py`:
- Around line 638-642: The DataFrame filter in assignment_localization.py is
wrong due to operator precedence: change the df_tmp assignment so both equality
comparisons are grouped before applying the bitwise AND, e.g. build the boolean
mask as (df_matching["pred"] == list_valid[r]) & (df_matching["ref"] == c) and
use that to index df_matching; update the expression that currently uses
list_valid[r] & (df_matching["ref"] == c) so df_tmp and subsequent list_matching
append use the corrected mask (variables: df_matching, list_valid, r, c, df_tmp,
list_matching).

---

Outside diff comments:
In `@MetricsReloaded/utility/assignment_localization.py`:
- Around line 623-630: The conditional has an operator-precedence bug: change
the test so the checks for df_ambiguous_ref being None/empty are grouped before
the OR with df_ambiguous_pred; e.g., replace the current condition with one that
explicitly groups (df_ambiguous_ref is None or df_ambiguous_ref.shape[0] == 0)
or df_ambiguous_pred.shape[0] == 0 so the intent "either df_ambiguous_ref is
missing/empty OR df_ambiguous_pred is empty" is enforced; update the if in
assignment_localization.py (the block that prints "No ambiguity in matching" and
returns df_matching_all) accordingly.

---

Nitpick comments:
In `@MetricsReloaded/utility/assignment_localization.py`:
- Around line 379-380: Remove the dead commented lines referencing
self.flag_predmod and pred_points = self.pred_loc_mod in
assignment_localization.py; either delete them outright to clean up the codebase
or, if they’re intentionally disabled temporarily, replace the commented block
with a short TODO comment explaining when/why to restore it and referencing the
symbols self.flag_predmod and pred_loc_mod so future readers know the intent.
- Around line 175-181: The two warnings.warn(...) calls in
assignment_localization.py (the branch that sets self.flag_usable = False,
self.df_matching = None, self.valid = None and the earlier else branch) should
include stacklevel=2 so the warning points to the user's call site; update both
warnings.warn calls to warnings.warn("No adequate localization strategy chosen -
not going ahead", stacklevel=2). Ensure you modify both occurrences that
correspond to the block manipulating self.flag_usable, self.df_matching, and
self.valid.
- Around line 396-399: Remove the commented-out dead code relating to
conditional assignments for localization modification flags: delete the
commented lines referencing self.flag_refmod, ref_masks = self.ref_loc_mod,
self.flag_predmod, and pred_points = self.pred_loc_mod in
assignment_localization.py; ensure no other code depends on these commented
branches (search for self.flag_refmod, self.ref_loc_mod, self.flag_predmod,
self.pred_loc_mod) and if needed replace with a clear, minimal comment
explaining current behavior.

In `@test/test_utility/test_assignment_localization.py`:
- Line 415: Remove the commented-out dead code line referencing m22 (the
commented line "# m22 = match2[match2[\"label\"] == 1]") from
test_assignment_localization.py; locate the occurrence of m22 in the test
function or surrounding block and delete the commented line so the test file
contains no unused commented code.
- Around line 374-376: The unpacked return values list_val, list_val_h, and
list_val_p from calls to resolve_ambiguities_matching() are unused; rename them
to _list_val, _list_val_h, and _list_val_p (or simply _ ) where the assignments
occur (df_match, _list_val = asm.resolve_ambiguities_matching(), etc.) to
indicate intentional unused variables and silence linter warnings.
- Around line 125-134: The test contains unused commented-out variables ref_com
and ref_coms; remove the commented lines referencing ref_com and ref_coms from
test_assignment_localization.py (the lines that define "# ref_com = [[3,3]]" and
"# ref_coms = np.vstack([ref_com])") so the test only keeps the active variables
pred_box, pred_com, ref_mask, pred_mask, ref_boxes, and ref_masks; ensure no
other references to ref_com or ref_coms remain in the surrounding test code.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 92a20a0d-c4a9-43d4-9945-1a0dfeab5047

📥 Commits

Reviewing files that changed from the base of the PR and between f5dc43a and 2160489.

📒 Files selected for processing (2)
  • MetricsReloaded/utility/assignment_localization.py
  • test/test_utility/test_assignment_localization.py

Comment on lines 638 to 642
         for (r, c) in zip(row, col):
             df_tmp = df_matching[
-                df_matching["seg"] == list_valid[r] & (df_matching["ref"] == c)
+                df_matching["pred"] == list_valid[r] & (df_matching["ref"] == c)
             ]
             list_matching.append(df_tmp)

⚠️ Potential issue | 🔴 Critical

Critical: Operator precedence bug in DataFrame filter.

Line 640 has incorrect operator precedence. The & operator binds more tightly than ==, so list_valid[r] & (df_matching["ref"] == c) is evaluated first, producing incorrect filtering results.

🐛 Proposed fix to add parentheses
                     df_tmp = df_matching[
-                        df_matching["pred"] == list_valid[r] & (df_matching["ref"] == c)
+                        (df_matching["pred"] == list_valid[r]) & (df_matching["ref"] == c)
                     ]
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-                    for (r, c) in zip(row, col):
-                        df_tmp = df_matching[
-                            df_matching["pred"] == list_valid[r] & (df_matching["ref"] == c)
-                        ]
-                        list_matching.append(df_tmp)
+                    for (r, c) in zip(row, col):
+                        df_tmp = df_matching[
+                            (df_matching["pred"] == list_valid[r]) & (df_matching["ref"] == c)
+                        ]
+                        list_matching.append(df_tmp)
🧰 Tools
🪛 Ruff (0.15.10)

[warning] 638-638: zip() without an explicit strict= parameter

Add explicit value for parameter strict=

(B905)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@MetricsReloaded/utility/assignment_localization.py` around lines 638 - 642,
The DataFrame filter in assignment_localization.py is wrong due to operator
precedence: change the df_tmp assignment so both equality comparisons are
grouped before applying the bitwise AND, e.g. build the boolean mask as
(df_matching["pred"] == list_valid[r]) & (df_matching["ref"] == c) and use that
to index df_matching; update the expression that currently uses list_valid[r] &
(df_matching["ref"] == c) so df_tmp and subsequent list_matching append use the
corrected mask (variables: df_matching, list_valid, r, c, df_tmp,
list_matching).
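The precedence trap is not specific to pandas; a stdlib-only sketch of the same parse, with scalar stand-ins for the Series involved:

```python
pred_value = 2    # stands in for a value of df_matching["pred"]
ref_match = True  # stands in for (df_matching["ref"] == c)

# Python parses `x == a & b` as `x == (a & b)` because & binds
# tighter than ==. Here 2 & True evaluates to 0, so the whole
# expression becomes 2 == 0, i.e. False.
unparenthesised = pred_value == 2 & ref_match

# Grouping each comparison first gives the intended result.
grouped = (pred_value == 2) & ref_match

assert unparenthesised is False
assert grouped is True
```

With pandas Series the failure mode is the same but noisier: the bitwise AND is applied to the scalar and the boolean mask first, producing a mask that compares against the wrong values.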

