feat: Add resumable vulnerability scanning with probe/attempt granularity #1589
Conversation
DCO Assistant Lite bot: All contributors have signed the DCO ✍️ ✅
jmartin-tech left a comment
This is a preliminary review; more will be added. The current implementation is a good start and looks to build off of #1531. Several of the concerns raised here mirror comments made on that other implementation.
Further review and thought will be forthcoming.
My comment from PR 1531 applies to this file's changes as well.
I still view modification of the input report as something that should be avoided as it introduces significant complexity to manage state and could cause corruption to the original input file if it is reused for output.
Understood. The report modification approach introduces complexity and corruption risks. This needs to be addressed as a part of the broader architectural discussion about how resume should work. I'll address this in a follow-up where we can properly design the file strategy.
```python
# RESUME SUPPORT: Restore report paths and override probe spec with probes from resumed run
from garak import resumeservice

if resumeservice.enabled():
    resumed_state = resumeservice.get_state()
    if resumed_state:
        # Restore report directory and prefix from resumed state
        if "report_dir" in resumed_state:
            _config.reporting.report_dir = resumed_state["report_dir"]
            logging.info(
                f"Restored report_dir from state: {resumed_state['report_dir']}"
            )
        if "report_prefix" in resumed_state:
            _config.reporting.report_prefix = resumed_state["report_prefix"]
            logging.info(
                f"Restored report_prefix from state: {resumed_state['report_prefix']}"
            )

        # Use the original run_id to maintain report filename consistency
        if "run_id" in resumed_state:
            # Extract UUID from full run_id format "garak-run-<uuid>-<timestamp>"
            full_run_id = resumed_state["run_id"]
            original_run_id = resumeservice.extract_uuid_from_run_id(full_run_id)
            _config.transient.run_id = original_run_id
            logging.info(f"Restored run_id from state: {original_run_id}")

        # Override probe spec with probes from resumed run
        if "probenames" in resumed_state:
            resumed_probes = resumed_state["probenames"]
            # Strip "probes." prefix if present for parse_plugin_spec compatibility
            resumed_probes_clean = [p.replace("probes.", "") for p in resumed_probes]
            # Convert probe list to comma-separated spec
            _config.plugins.probe_spec = ",".join(resumed_probes_clean)
            logging.info(f"Resuming run with probes from state: {resumed_probes}")
            print(
                f"🔄 Using probes from resumed run: {', '.join(resumed_probes_clean)}"
            )
```
The logic here does not belong in cli. This package is intended to parse and set up runtime _config state to allow launch of a run. The functionality here is too tightly coupled and likely should be deferred to a separate stage in the run.
Agreed, CLI should only parse arguments and set up config. The state restoration and probe spec override logic is too tightly coupled here and should happen at a different stage. This will be addressed as part of moving resume logic to the proper place in the run lifecycle.
```python
# Handle resume-related commands
if hasattr(args, "list_runs") and args.list_runs:
    from garak import resumeservice
    from datetime import datetime

    runs = resumeservice.list_runs()
    if not runs:
        print("\n📋 No unfinished runs found.")
        print("\nStart a new resumable scan with: garak --resumable [options]\n")
    else:
        print("\n📋 Resumable Runs\n")

        # Print header
        print(f"{'#':<4} {'Run ID':<38} {'Started':<20} {'Progress':<12} {'%':<6}")
        print("-" * 82)

        for idx, run in enumerate(runs, 1):
            # Calculate percentage
            percentage = (
                (run["progress"] / run["total"] * 100) if run["total"] > 0 else 0
            )

            # Format the timestamp more readably
            try:
                dt = datetime.fromisoformat(run["start_time"])
                formatted_time = dt.strftime("%Y-%m-%d %H:%M")
            except (ValueError, TypeError):
                formatted_time = run["start_time"][:16]

            # Progress format
            progress_str = f"{run['progress']}/{run['total']}"

            print(
                f"{idx:<4} {run['run_id']:<38} {formatted_time:<20} {progress_str:<12} {percentage:>5.1f}%"
            )

        print("-" * 82)
        print(f"\nTotal: {len(runs)} unfinished run(s)")
        print("\nTo resume: garak --resume <run_id>")
        print("To delete: garak --delete_run <run_id>\n")
    return

if hasattr(args, "delete_run") and args.delete_run:
    from garak import resumeservice

    try:
        resumeservice.delete_run(args.delete_run)
        print(f"✅ Deleted run: {args.delete_run}")
    except Exception as e:
        print(f"❌ Failed to delete run: {e}")
        import sys

        sys.exit(1)
    return
```
Parsing of resume related options should setup the global configuration for later actions.
Services are launched at harness initialization and should not be accessed directly by cli. The idea of managing existing runs on disk is novel and interesting; however, it seems to be a unique feature that should be independent of a resume-specific service. I had noted in PR #1531 that it might make sense to have a runservice that manages run state and also provides support for a resume feature.
You're right on multiple points. Services should be launched at harness initialization, not accessed directly by CLI. Making run management independent of a resume-specific service makes sense; your suggestion of a "runservice" that manages run state and also provides resume support seems like a better separation of concerns. This needs architectural rethinking about where run management lives and how CLI-only commands should work.
```python
class MinimalAttempt:
    def __init__(self, data):
        self.uuid = data.get("uuid", "")
        self.seq = data.get("seq", 0)
        self.status = 1  # Was executed
        self.probe_classname = data.get("probe_classname", "")
        self.probe_params = data.get("probe_params", {})
        self.targets = data.get("targets", [])
        self.notes = data.get("notes", {})
        self.goal = data.get("goal", "")
        self.detector_results = data.get("detector_results", {})
        self.conversations = data.get("conversations", [])
        self.reverse_translation_outputs = data.get("reverse_translation_outputs", [])
        # Store prompt and outputs as-is (already serialized)
        self._prompt_data = data.get("prompt", {})
        self._outputs_data = data.get("outputs", [])
        # Cache outputs as Message objects for evaluator
        self._outputs_cache = None
        self._prompt_cache = None

    @property
    def prompt(self):
        """Return prompt as Conversation object (for evaluator)"""
        if self._prompt_cache is None:
            from garak.attempt import Conversation

            # Reconstruct Conversation from the stored dict
            if isinstance(self._prompt_data, dict):
                self._prompt_cache = Conversation.from_dict(self._prompt_data)
            else:
                # Fallback to empty conversation
                self._prompt_cache = Conversation([])
        return self._prompt_cache

    @property
    def outputs(self):
        """Return output messages as a list (for evaluator)"""
        if self._outputs_cache is None:
            from garak.attempt import Message

            messages = []
            if isinstance(self._outputs_data, list):
                for output in self._outputs_data:
                    if isinstance(output, dict):
                        msg = Message(
                            text=output.get("text", ""),
                            lang=output.get("lang", "en"),
                            data_path=output.get("data_path"),
                            data_type=output.get("data_type"),
                            data_checksum=output.get("data_checksum"),
                            notes=output.get("notes", {}),
                        )
                        messages.append(msg)
                    elif isinstance(output, str):
                        messages.append(Message(text=output, lang="en"))
            self._outputs_cache = messages
        return self._outputs_cache

    def as_dict(self):
        """Return dictionary representation for report writing"""
        return {
            "entry_type": "attempt",
            "uuid": self.uuid,
            "seq": self.seq,
            "status": self.status,
            "probe_classname": self.probe_classname,
            "probe_params": self.probe_params,
            "targets": self.targets,
            "prompt": self._prompt_data,
            "outputs": self._outputs_data,
            "detector_results": self.detector_results,
            "notes": self.notes,
            "goal": self.goal,
            "conversations": self.conversations,
            "reverse_translation_outputs": self.reverse_translation_outputs,
        }

    def outputs_for(self, lang_spec):
        """Return output messages for detector evaluation"""
        # Reconstruct Message objects from outputs_data
        from garak.attempt import Message

        messages = []
        if isinstance(self._outputs_data, list):
            for output in self._outputs_data:
                if isinstance(output, dict):
                    msg = Message(
                        text=output.get("text", ""),
                        lang=output.get("lang", lang_spec),
                        data_path=output.get("data_path"),
                        data_type=output.get("data_type"),
                        data_checksum=output.get("data_checksum"),
                        notes=output.get("notes", {}),
                    )
                    messages.append(msg)
                elif isinstance(output, str):
                    messages.append(Message(text=output, lang=lang_spec))
        return messages


attempt = MinimalAttempt(entry)
incomplete_attempts.append(attempt)
```
The need to introduce this class suggests a level of complexity here that likely needs to be reworked.
Agreed, the need for this class indicates a design problem. I created it to bridge the gap between loading previous attempts from the report and what the evaluator expects, but this complexity suggests the approach needs reworking. This ties into the broader attempt-level resumption design issues.
Most of the changes in this class are tightly coupled to resume specifically and should not be the responsibility of the harness. The state machine for executing a run can be organized more cleanly to enable harnesses to request additional data from the service that is aware of the resume state, rather than having to build and manage that state.
Right. The harness is managing resume state when it should be requesting data from a service that's aware of the resume context. The state machine for run execution needs cleaner organization - harness should ask the service what work needs to be done rather than building and managing that state itself. This is a fundamental separation of concerns issue.
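The inversion described here is small in code terms. Rather than the harness filtering its own probe list against loaded state, it asks a resume-aware service what remains. The sketch below is illustrative; `ResumeAwareService` and `get_probes_to_run` are hypothetical names, not garak APIs:

```python
# Hypothetical harness/service split: the service knows what the saved state
# marks complete; the harness only asks what is left to run.
class ResumeAwareService:
    def __init__(self, requested: list[str], completed: set[str]):
        self._requested = requested  # probes the user asked for
        self._completed = completed  # probes the saved state marks done

    def get_probes_to_run(self) -> list[str]:
        """Return requested probes minus those already completed, in order."""
        return [p for p in self._requested if p not in self._completed]


# Harness side: no state files, no filtering logic, just a query.
# for probename in service.get_probes_to_run():
#     run_probe(probename)
```

A `get_attempts_to_run()` counterpart would serve attempt-level granularity the same way.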
```python
class AttemptMock:
    def __init__(self, outputs, probename, prompt=None, seq=0):
        self.all_outputs = [
            OutputMock(output)
            for output in (outputs if isinstance(outputs, list) else [outputs])
        ]
        self.probename = probename
        self.probe_classname = ".".join(
            probename.split(".")[1:]
        )  # Changed to "category.Class" format, e.g., "xss.ColabAIDataLeakage"
        self.prompt = (
            prompt
            if prompt
            else {
                "turns": [
                    {
                        "role": "user",
                        "content": {"text": prompt or "", "lang": "en", "notes": {}},
                    }
                ]
            }
        )
        self.status = "success"
        self.detector_results = {}
        self.notes = {"terms": ["summary", "conversation"]}
        self.outputs = [output.text for output in self.all_outputs]
        self.uuid = str(uuid.uuid4())
        self.seq = seq
        self.probe_params = {}
        self.targets = []
        self.conversations = [
            {
                "turns": [
                    {
                        "role": "user",
                        "content": {"text": prompt or "", "lang": "en", "notes": {}},
                    },
                    {
                        "role": "assistant",
                        "content": {"text": output.text, "lang": "en", "notes": {}},
                    },
                ]
            }
            for output in self.all_outputs
        ]
        self.reverse_translation_outputs = []
```
I suspect this is some sort of testing method; if so, it should not be in the package class.
You're correct, this is testing/debug code that shouldn't be in the package. I'll remove it.
Force-pushed 1100bbe to 47c10e0
…rity

Implements comprehensive resume functionality allowing interrupted garak scans to continue from where they stopped, with fine-grained control over resume granularity at probe or attempt levels.

Closes NVIDIA#141

## Overview

Long-running vulnerability scans can be interrupted by network issues, rate limits, or system crashes. This feature enables users to resume interrupted scans without starting from scratch, saving time and computational resources.

## Key Features

### Resume Service Architecture
- Complete resume service (resumeservice.py) with atomic state management
- Run state persistence in ~/.garak/runs/<run_id>/
- Automatic run ID generation with timestamp tracking
- Version compatibility validation
- Cross-platform state storage support
- Report file continuation (appends to existing reports on resume)

### Granularity Control
Two resumption levels with configurable granularity:
- **probe**: Skip entire completed probes (faster, coarser)
- **attempt**: Skip individual completed prompts (precise, fine-grained)

### CLI Commands
Five new CLI arguments for resume control:
- --resumable [true/false]: Enable/disable resumable scans (default: enabled)
- --resume <run_id>: Resume a previous interrupted run
- --list_runs: Display all unfinished runs with status
- --delete_run <run_id>: Clean up run state
- --resume_granularity {probe|attempt}: Set resume precision level

### State Management
- Atomic file writes prevent state corruption
- Configuration snapshots preserve run parameters
- Progress tracking at probe and attempt levels
- Probe metadata validation for resumability
- Run lifecycle management (init, save, load, cleanup)
- Report continuation preserves original run context

### Testing
- 19 probe metadata tests (all passed)
- Comprehensive integration tests available
- 100% pass rate on functional tests

## Modified Files

### Core Files
- garak/cli.py: Added 5 CLI arguments with str_to_bool() helper
- garak/command.py: Resume service initialization, cleanup, and report continuation
- garak/harnesses/probewise.py: Attempt/probe-level filtering
- garak/_config.py: Resume configuration fields
- garak/probes/base.py: Probe resumability metadata
- garak/exception.py: ResumeValidationError exception
- garak/harnesses/base.py: Probe-level skip logic
- garak/buffs/base.py: Import compatibility fix
- README.md: Resume documentation section
- .gitignore: Added .garak/runs/ for resume state storage

## New Files
- garak/resumeservice.py (1,047 lines): Complete resume service
- tests/test_resume_integration.py (403 lines): Integration tests
- tests/test_probe_resumability.py (274 lines): Metadata tests

## Compatibility
- Fully backward compatible (opt-in feature)
- No breaking changes to existing functionality
- No new dependencies required
- Works with all existing probes, detectors, and generators
- Resumed reports maintain original run context and append new results

## Testing
All tests pass:
- 19 probe resumability tests: PASSED
- 70 config tests: PASSED
- 7 requirements tests: PASSED
- Black formatting: PASSED (all modified files)

## Benefits
- Save time on interrupted long-running scans
- Reduce API costs by avoiding redundant prompts
- Survive system crashes, network failures, user cancellation
- Enable flexible scan scheduling (pause/resume)
- Two granularity levels: fast (probe) vs precise (attempt)
- Seamless report continuation maintains scan history

Signed-off-by: Shrikant Pachpor <shri.pachpor24@gmail.com>
Force-pushed 47c10e0 to 2949288
Addresses maintainer feedback on storage location (concern NVIDIA#6).

Changes:
- Use _config.transient.data_dir instead of hardcoded ~/.garak/runs/
- Respects XDG Base Directory Specification
- Honors user config and CLI options
- Updated documentation to reflect XDG paths

State now stored in:
- Linux: ~/.local/share/garak/runs/
- macOS: ~/Library/Application Support/garak/runs/
- Windows: %LOCALAPPDATA%\garak\runs

Signed-off-by: Shrikant Pachpor <shri.pachpor24@gmail.com>
…A#10)

- Simplified run_id format to just UUID (removed timestamp)
- Moved state files from report_dir to cache_dir
- Removed extract_uuid_from_run_id function
- Updated documentation to reflect cache directory usage

Fixes maintainer concerns about run ID format changes and state file location. State files now properly stored in the cache directory as per project conventions.

Signed-off-by: Shrikant Pachpor <shri.pachpor24@gmail.com>
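The commit messages above cite atomic file writes as the guard against state corruption. A common pattern for that guarantee is to write to a sibling temp file and swap it in with `os.replace`; the sketch below shows the general technique, not garak's actual implementation:

```python
# Write state to a temp file in the same directory, then atomically replace
# the real file, so readers never observe a partially written state.json.
import json
import os
import tempfile
from pathlib import Path


def save_state_atomic(state: dict, state_path: Path) -> None:
    """Atomically persist a state dict as JSON at state_path."""
    fd, tmp = tempfile.mkstemp(dir=state_path.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(state, f)
            f.flush()
            os.fsync(f.fileno())  # push bytes to disk before the swap
        os.replace(tmp, state_path)  # atomic on POSIX and Windows
    except BaseException:
        if os.path.exists(tmp):
            os.remove(tmp)  # don't leave stray temp files on failure
        raise
```

The temp file must live on the same filesystem as the destination (hence `dir=state_path.parent`) for the rename to stay atomic.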
@jmartin-tech I've addressed the straightforward fixes in commits 2949288 and 07556ad. However, your review has identified fundamental architectural issues that I think warrant discussion before implementation.

**Core Issues Identified:**

The feedback points to a consistent pattern: resume logic is scattered across CLI, harness, and service instead of being properly encapsulated.

- **CLI doing business logic:** Report setup, state restoration, and direct service access all violate the principle that CLI should only parse arguments and set up config.
- **Harness managing state:** The probewise.py changes are building and managing resume state when the harness should be requesting data from a service that's aware of the resume context.
- **Attempt-level design:** The seq-based matching doesn't work due to randomization, external state storage adds unnecessary complexity when Attempts already have state indicators, and the MinimalAttempt class suggests the approach needs rethinking. Your reference to PR #1531 indicates the correct approach requires prompt comparison and seed validation.
- **Report modification strategy:** The complexity and corruption risks around appending to input reports.
- **State management scope:** Questions about where storage belongs and what the service should be responsible for.
- **Service naming:** Your suggestion about a "runservice" managing run state and providing resume support makes sense as a cleaner separation.

**Path Forward:**

These issues are interconnected and need holistic design rather than piecemeal fixes. I see two approaches:

1. **Redesign in this PR:** Address all architectural concerns here, but this is 2-4 weeks of work with significant design discussion needed.
2. **Two-PR strategy:** The quick fixes are done. Create a new PR focused entirely on the architectural redesign with proper service/harness/CLI separation, potentially with simplified scope initially (probe-level only).

I'm leaning toward option 2 to keep the PRs focused and allow proper design discussion, but I'm open to your preference.

**Questions for design:**

1. **Service architecture:** Should the service provide get_probes_to_run() and get_attempts_to_run() APIs that harnesses query, rather than harnesses managing state?
2. **CLI commands:** For --list_runs and --delete_run that don't launch a full run, how should they access run management if services are only launched at harness initialization?
3. **Attempt resumption:** Should we simplify to probe-level only initially, or implement the full prompt-based comparison approach from PR #1531?
4. **Report strategy:** New output file per resume (read-only input) vs. appending to existing?
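On question 3, the prompt-comparison idea referenced from PR #1531 can be sketched briefly: match pending work to completed work by prompt content rather than by `seq`, since seq ordering is not stable under randomization. `prompt_key` and `filter_completed` are illustrative names:

```python
# Match attempts by a digest of their prompt text instead of by sequence
# number, so resume survives reordering of the prompt queue.
import hashlib


def prompt_key(prompt_text: str) -> str:
    """Stable content-based identity for a prompt."""
    return hashlib.sha256(prompt_text.encode("utf-8")).hexdigest()


def filter_completed(pending_prompts: list[str], completed_prompts: list[str]) -> list[str]:
    """Return pending prompts whose content was not already attempted."""
    done = {prompt_key(p) for p in completed_prompts}
    return [p for p in pending_prompts if prompt_key(p) not in done]
```

Seed validation would still be needed on top of this, since buffed or templated prompts may only be reproducible under the original run's seed.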