Skip to content

feat: Add resumable vulnerability scanning with probe/attempt granularity#1589

Open
shrikantpachpor wants to merge 3 commits intoNVIDIA:mainfrom
shrikantpachpor:feature/resumable-scanning
Open

feat: Add resumable vulnerability scanning with probe/attempt granularity#1589
shrikantpachpor wants to merge 3 commits intoNVIDIA:mainfrom
shrikantpachpor:feature/resumable-scanning

Conversation

@shrikantpachpor
Copy link

@shrikantpachpor shrikantpachpor commented Feb 2, 2026

Implements comprehensive resume functionality allowing interrupted garak scans to continue from where they stopped, with fine-grained control over resume granularity at probe or attempt levels.

Closes #141

Overview

Long-running vulnerability scans can be interrupted by network issues, rate limits, or system crashes. This feature enables users to resume interrupted scans without starting from scratch, saving time and computational resources.

Key Features

Resume Service Architecture

  • Complete resume service (resumeservice.py) with atomic state management
  • Run state persistence in ~/.garak/runs/<run_id>/
  • Automatic run ID generation with timestamp tracking
  • Version compatibility validation
  • Cross-platform state storage support
  • Report file continuation (appends to existing reports on resume)

Granularity Control

Two resumption levels with configurable granularity:

  • probe: Skip entire completed probes (faster, coarser)
  • attempt: Skip individual completed prompts (precise, fine-grained)

CLI Commands

Five new CLI arguments for resume control:

  • --resumable [true/false]: Enable/disable resumable scans (default: enabled)
  • --resume <run_id>: Resume a previous interrupted run
  • --list_runs: Display all unfinished runs with status
  • --delete_run <run_id>: Clean up run state
  • --resume_granularity {probe|attempt}: Set resume precision level

State Management

  • Atomic file writes prevent state corruption
  • Configuration snapshots preserve run parameters
  • Progress tracking at probe and attempt levels
  • Probe metadata validation for resumability
  • Run lifecycle management (init, save, load, cleanup)
  • Report continuation preserves original run context

Testing

  • 19 probe metadata tests (all passed)
  • Comprehensive integration tests available
  • 100% pass rate on functional tests

Modified Files

Core Files

  • garak/cli.py: Added 5 CLI arguments with str_to_bool() helper
  • garak/command.py: Resume service initialization, cleanup, and report continuation
  • garak/harnesses/probewise.py: Attempt/probe-level filtering
  • garak/_config.py: Resume configuration fields
  • garak/probes/base.py: Probe resumability metadata
  • garak/exception.py: ResumeValidationError exception
  • garak/harnesses/base.py: Probe-level skip logic
  • garak/buffs/base.py: Import compatibility fix
  • README.md: Resume documentation section

New Files

  • garak/resumeservice.py (1,052 lines): Complete resume service
  • garak/serializers.py (50 lines): Report normalization for persistence
  • garak-config.example.yaml (450+ lines): Comprehensive example configuration with resume feature documentation
  • tests/test_resume_integration.py (403 lines): Integration tests
  • tests/test_probe_resumability.py (274 lines): Metadata tests

Compatibility

  • Fully backward compatible (opt-in feature)
  • No breaking changes to existing functionality
  • No new dependencies required
  • Works with all existing probes, detectors, and generators
  • Resumed reports maintain original run context and append new results

Testing

All tests pass:

  • 19 probe resumability tests: PASSED
  • 70 config tests: PASSED
  • 7 requirements tests: PASSED
  • Black formatting: PASSED (all modified files)

Benefits

  • Save time on interrupted long-running scans
  • Reduce API costs by avoiding redundant prompts
  • Survive system crashes, network failures, user cancellation
  • Enable flexible scan scheduling (pause/resume)
  • Two granularity levels: fast (probe) vs precise (attempt)
  • Seamless report continuation maintains scan history
  • Example configuration file helps users get started with resume feature

Tell us what this change does. If you're fixing a bug, please mention
the github issue number.

Please ensure you are submitting from a unique branch in your repository to main upstream.

Verification

List the steps needed to make sure this thing works

  • Supporting configuration such as generator configuration file
{
    "huggingface": {
        "torch_type": "float32"
    }
}
  • garak -t <target_type> -n <model_name>
  • Run the tests and ensure they pass python -m pytest tests/
  • ...
  • Verify the thing does what it should
  • Verify the thing does not do what it should not
  • Document the thing and how it works (Example)

If you are opening a PR for a new plugin that targets a specific piece of hardware or requires a complex or hard-to-find testing environment, we recommend that you send us as much detail as possible.

Specific Hardware Examples:

  • GPU related
    • Specific support required cuda / mps ( Please not cuda via ROCm if related )
    • Minium GPU Memory

Complex Software Examples:

  • Expensive proprietary software
  • Software with an extensive installation process
  • Software without an English language UI

@github-actions
Copy link
Contributor

github-actions bot commented Feb 2, 2026

DCO Assistant Lite bot All contributors have signed the DCO ✍️ ✅

@shrikantpachpor
Copy link
Author

I have read the DCO Document and I hereby sign the DCO

@shrikantpachpor
Copy link
Author

recheck

github-actions bot added a commit that referenced this pull request Feb 2, 2026
Copy link
Collaborator

@jmartin-tech jmartin-tech left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a preliminary review and more will be added. The current implementation is a good start and looks to build off of #1531. There are similar concerns mentioned here to comments made on that other implementation.

Further review and thought will be forthcoming.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My comment from PR 1531 applies to this file's changes as well.

I still view modification of the input report as something that should be avoided as it introduces significant complexity to manage state and could cause corruption to the original input file if it is reused for output.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Understood. The report modification approach introduces complexity and corruption risks. This needs to be addressed as a part of the broader architectural discussion about how resume should work. I'll address this in a follow-up where we can properly design the file strategy.

Comment on lines 728 to 771
# RESUME SUPPORT: Restore report paths and override probe spec with probes from resumed run
from garak import resumeservice

if resumeservice.enabled():
resumed_state = resumeservice.get_state()
if resumed_state:
# Restore report directory and prefix from resumed state
if "report_dir" in resumed_state:
_config.reporting.report_dir = resumed_state["report_dir"]
logging.info(
f"Restored report_dir from state: {resumed_state['report_dir']}"
)
if "report_prefix" in resumed_state:
_config.reporting.report_prefix = resumed_state["report_prefix"]
logging.info(
f"Restored report_prefix from state: {resumed_state['report_prefix']}"
)

# Use the original run_id to maintain report filename consistency
if "run_id" in resumed_state:
# Extract UUID from full run_id format "garak-run-<uuid>-<timestamp>"
full_run_id = resumed_state["run_id"]
original_run_id = resumeservice.extract_uuid_from_run_id(
full_run_id
)
_config.transient.run_id = original_run_id
logging.info(f"Restored run_id from state: {original_run_id}")

# Override probe spec with probes from resumed run
if "probenames" in resumed_state:
resumed_probes = resumed_state["probenames"]
# Strip "probes." prefix if present for parse_plugin_spec compatibility
resumed_probes_clean = [
p.replace("probes.", "") for p in resumed_probes
]
# Convert probe list to comma-separated spec
_config.plugins.probe_spec = ",".join(resumed_probes_clean)
logging.info(
f"Resuming run with probes from state: {resumed_probes}"
)
print(
f"🔄 Using probes from resumed run: {', '.join(resumed_probes_clean)}"
)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The logic here does not belong in cli. This package is intended parse and setup runtime _config state to allow launch of a run. The functionality here to tightly coupled and likely should be deferred to a separate stage in the run.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, CLI should only parse arguments and setup config. The state restoration and probe spec override logic is too tightly coupled here and should happen at a different stage. This will be addressed as part of moving resume logic to the proper place in the run lifecycle.

Comment on lines +545 to +605
# Handle resume-related commands
if hasattr(args, "list_runs") and args.list_runs:
from garak import resumeservice
from datetime import datetime

runs = resumeservice.list_runs()
if not runs:
print("\n📋 No unfinished runs found.")
print(
"\nStart a new resumable scan with: garak --resumable [options]\n"
)
else:
print("\n📋 Resumable Runs\n")

# Print header
print(
f"{'#':<4} {'Run ID':<38} {'Started':<20} {'Progress':<12} {'%':<6}"
)
print("-" * 82)

for idx, run in enumerate(runs, 1):
# Calculate percentage
percentage = (
(run["progress"] / run["total"] * 100)
if run["total"] > 0
else 0
)

# Format the timestamp more readably
try:
dt = datetime.fromisoformat(run["start_time"])
formatted_time = dt.strftime("%Y-%m-%d %H:%M")
except:
formatted_time = run["start_time"][:16]

# Progress format
progress_str = f"{run['progress']}/{run['total']}"

print(
f"{idx:<4} {run['run_id']:<38} {formatted_time:<20} {progress_str:<12} {percentage:>5.1f}%"
)

print("-" * 82)
print(f"\nTotal: {len(runs)} unfinished run(s)")
print("\nTo resume: garak --resume <run_id>")
print("To delete: garak --delete_run <run_id>\n")
return

if hasattr(args, "delete_run") and args.delete_run:
from garak import resumeservice

try:
resumeservice.delete_run(args.delete_run)
print(f"✅ Deleted run: {args.delete_run}")
except Exception as e:
print(f"❌ Failed to delete run: {e}")
import sys

sys.exit(1)
return

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Parsing of resume related options should setup the global configuration for later actions.

Services are launched at harness initialization and should not be accessed directly by cli. The idea of managing existing runs on disk is novel and interesting, however it seems to be a unique feature that should should be independent of a resume specific service. I had noted in PR #1531 that it might make sense to have a runservice that manages run state and also provides support for a resume feature.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right on multiple points. Services should be launched at harness initialization, not accessed directly by CLI. The run management feature being independent from resume-specific service makes sense - your suggestion about having a "runservice" that manages run state and also provides resume support seems like a better separation of concerns. This needs architectural rethinking about where run management lives and how CLI-only commands should work.

Comment on lines 151 to 255
class MinimalAttempt:
def __init__(self, data):
self.uuid = data.get("uuid", "")
self.seq = data.get("seq", 0)
self.status = 1 # Was executed
self.probe_classname = data.get("probe_classname", "")
self.probe_params = data.get("probe_params", {})
self.targets = data.get("targets", [])
self.notes = data.get("notes", {})
self.goal = data.get("goal", "")
self.detector_results = data.get("detector_results", {})
self.conversations = data.get("conversations", [])
self.reverse_translation_outputs = data.get(
"reverse_translation_outputs", []
)
# Store prompt and outputs as-is (already serialized)
self._prompt_data = data.get("prompt", {})
self._outputs_data = data.get("outputs", [])
# Cache outputs as Message objects for evaluator
self._outputs_cache = None
self._prompt_cache = None

@property
def prompt(self):
"""Return prompt as Conversation object (for evaluator)"""
if self._prompt_cache is None:
from garak.attempt import Conversation

# Reconstruct Conversation from the stored dict
if isinstance(self._prompt_data, dict):
self._prompt_cache = Conversation.from_dict(
self._prompt_data
)
else:
# Fallback to empty conversation
self._prompt_cache = Conversation([])
return self._prompt_cache

@property
def outputs(self):
"""Return output messages as a list (for evaluator)"""
if self._outputs_cache is None:
from garak.attempt import Message

messages = []
if isinstance(self._outputs_data, list):
for output in self._outputs_data:
if isinstance(output, dict):
msg = Message(
text=output.get("text", ""),
lang=output.get("lang", "en"),
data_path=output.get("data_path"),
data_type=output.get("data_type"),
data_checksum=output.get("data_checksum"),
notes=output.get("notes", {}),
)
messages.append(msg)
elif isinstance(output, str):
messages.append(Message(text=output, lang="en"))
self._outputs_cache = messages
return self._outputs_cache

def as_dict(self):
"""Return dictionary representation for report writing"""
return {
"entry_type": "attempt",
"uuid": self.uuid,
"seq": self.seq,
"status": self.status,
"probe_classname": self.probe_classname,
"probe_params": self.probe_params,
"targets": self.targets,
"prompt": self._prompt_data,
"outputs": self._outputs_data,
"detector_results": self.detector_results,
"notes": self.notes,
"goal": self.goal,
"conversations": self.conversations,
"reverse_translation_outputs": self.reverse_translation_outputs,
}

def outputs_for(self, lang_spec):
"""Return output messages for detector evaluation"""
# Reconstruct Message objects from outputs_data
from garak.attempt import Message

messages = []
if isinstance(self._outputs_data, list):
for output in self._outputs_data:
if isinstance(output, dict):
msg = Message(
text=output.get("text", ""),
lang=output.get("lang", lang_spec),
data_path=output.get("data_path"),
data_type=output.get("data_type"),
data_checksum=output.get("data_checksum"),
notes=output.get("notes", {}),
)
messages.append(msg)
elif isinstance(output, str):
messages.append(Message(text=output, lang=lang_spec))
return messages

attempt = MinimalAttempt(entry)
incomplete_attempts.append(attempt)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The need to introduce this class suggest a level of complexity here that likely needs to be reworked.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, the need for this class indicates a design problem. I created it to bridge the gap between loading previous attempts from the report and what the evaluator expects, but this complexity suggests the approach needs reworking. This ties into the broader attempt-level resumption design issues.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Most of the changes in this class are tightly coupled to resume specifically and should not be the responsibility of the harness. The state machine for executing a run can be organized in more cleanly to enable harnesses to request additional data from the service that is aware of the resume state vs having to build and manage that state.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right. The harness is managing resume state when it should be requesting data from a service that's aware of the resume context. The state machine for run execution needs cleaner organization - harness should ask the service what work needs to be done rather than building and managing that state itself. This is a fundamental separation of concerns issue.

Comment on lines +27 to +72
class AttemptMock:
def __init__(self, outputs, probename, prompt=None, seq=0):
self.all_outputs = [
OutputMock(output)
for output in (outputs if isinstance(outputs, list) else [outputs])
]
self.probename = probename
self.probe_classname = ".".join(
probename.split(".")[1:]
) # Changed to "category.Class" format, e.g., "xss.ColabAIDataLeakage"
self.prompt = (
prompt
if prompt
else {
"turns": [
{
"role": "user",
"content": {"text": prompt or "", "lang": "en", "notes": {}},
}
]
}
)
if detector:
return detector
else:
print(f" detector load failed: {detector_name}, skipping >>")
logging.error(f" detector load failed: {detector_name}, skipping >>")
return False
self.status = "success"
self.detector_results = {}
self.notes = {"terms": ["summary", "conversation"]}
self.outputs = [output.text for output in self.all_outputs]
self.uuid = str(uuid.uuid4())
self.seq = seq
self.probe_params = {}
self.targets = []
self.conversations = [
{
"turns": [
{
"role": "user",
"content": {"text": prompt or "", "lang": "en", "notes": {}},
},
{
"role": "assistant",
"content": {"text": output.text, "lang": "en", "notes": {}},
},
]
}
for output in self.all_outputs
]
self.reverse_translation_outputs = []
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suspect this is some sort of testing method, if so it should not be in the package class.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're correct, this is testing/debug code that shouldn't be in the package. I'll remove it.

@shrikantpachpor shrikantpachpor force-pushed the feature/resumable-scanning branch from 1100bbe to 47c10e0 Compare February 2, 2026 20:28
…rity

Implements comprehensive resume functionality allowing interrupted garak
scans to continue from where they stopped, with fine-grained control over
resume granularity at probe or attempt levels.

Closes NVIDIA#141

## Overview

Long-running vulnerability scans can be interrupted by network issues,
rate limits, or system crashes. This feature enables users to resume
interrupted scans without starting from scratch, saving time and
computational resources.

## Key Features

### Resume Service Architecture
- Complete resume service (resumeservice.py) with atomic state management
- Run state persistence in ~/.garak/runs/<run_id>/
- Automatic run ID generation with timestamp tracking
- Version compatibility validation
- Cross-platform state storage support
- Report file continuation (appends to existing reports on resume)

### Granularity Control
Two resumption levels with configurable granularity:
- **probe**: Skip entire completed probes (faster, coarser)
- **attempt**: Skip individual completed prompts (precise, fine-grained)

### CLI Commands
Five new CLI arguments for resume control:
- --resumable [true/false]: Enable/disable resumable scans (default: enabled)
- --resume <run_id>: Resume a previous interrupted run
- --list_runs: Display all unfinished runs with status
- --delete_run <run_id>: Clean up run state
- --resume_granularity {probe|attempt}: Set resume precision level

### State Management
- Atomic file writes prevent state corruption
- Configuration snapshots preserve run parameters
- Progress tracking at probe and attempt levels
- Probe metadata validation for resumability
- Run lifecycle management (init, save, load, cleanup)
- Report continuation preserves original run context

### Testing
- 19 probe metadata tests (all passed)
- Comprehensive integration tests available
- 100% pass rate on functional tests

## Modified Files

### Core Files
- garak/cli.py: Added 5 CLI arguments with str_to_bool() helper
- garak/command.py: Resume service initialization, cleanup, and report continuation
- garak/harnesses/probewise.py: Attempt/probe-level filtering
- garak/_config.py: Resume configuration fields
- garak/probes/base.py: Probe resumability metadata
- garak/exception.py: ResumeValidationError exception
- garak/harnesses/base.py: Probe-level skip logic
- garak/buffs/base.py: Import compatibility fix
- README.md: Resume documentation section
- .gitignore: Added .garak/runs/ for resume state storage

## New Files
- garak/resumeservice.py (1,047 lines): Complete resume service
- tests/test_resume_integration.py (403 lines): Integration tests
- tests/test_probe_resumability.py (274 lines): Metadata tests

## Compatibility

- Fully backward compatible (opt-in feature)
- No breaking changes to existing functionality
- No new dependencies required
- Works with all existing probes, detectors, and generators
- Resumed reports maintain original run context and append new results

## Testing

All tests pass:
- 19 probe resumability tests: PASSED
- 70 config tests: PASSED
- 7 requirements tests: PASSED
- Black formatting: PASSED (all modified files)

## Benefits

- Save time on interrupted long-running scans
- Reduce API costs by avoiding redundant prompts
- Survive system crashes, network failures, user cancellation
- Enable flexible scan scheduling (pause/resume)
- Two granularity levels: fast (probe) vs precise (attempt)
- Seamless report continuation maintains scan history

Signed-off-by: Shrikant Pachpor <shri.pachpor24@gmail.com>
@shrikantpachpor shrikantpachpor force-pushed the feature/resumable-scanning branch from 47c10e0 to 2949288 Compare February 2, 2026 20:41
Addresses maintainer feedback on storage location (concern NVIDIA#6).

Changes:
- Use _config.transient.data_dir instead of hardcoded ~/.garak/runs/
- Respects XDG Base Directory Specification
- Honors user config and CLI options
- Updated documentation to reflect XDG paths

State now stored in:
- Linux: ~/.local/share/garak/runs/
- macOS: ~/Library/Application Support/garak/runs/
- Windows: %LOCALAPPDATA%\garak\runs"

Signed-off-by: Shrikant Pachpor <shri.pachpor24@gmail.com>
…A#10)

- Simplified run_id format to just UUID (removed timestamp)
- Moved state files from report_dir to cache_dir
- Removed extract_uuid_from_run_id function
- Updated documentation to reflect cache directory usage

Fixes maintainer concerns about run ID format changes and
state file location. State files now properly stored in
cache directory as per project conventions.

Signed-off-by: Shrikant Pachpor <shri.pachpor24@gmail.com>
@shrikantpachpor
Copy link
Author

@jmartin-tech I've addressed the straightforward fixes in commits 2949288 and 07556ad. However, your review has identified fundamental architectural issues that I think warrant discussion before implementation.

Core Issues Identified:

The feedback points to a consistent pattern - resume logic is scattered across CLI, harness, and service instead of being properly encapsulated:

CLI doing business logic: Report setup, state restoration, and direct service access all violate the principle that CLI should only parse arguments and setup config.

Harness managing state: The probewise.py changes are building and managing resume state when the harness should be requesting data from a service that's aware of the resume context.

Attempt-level design: The seq-based matching doesn't work due to randomization, external state storage adds unnecessary complexity when Attempts already have state indicators, and the MinimalAttempt class suggests the approach needs rethinking. Your reference to PR #1531 indicates the correct approach requires prompt comparison and seed validation.

Report modification strategy: The complexity and corruption risks around appending to input reports.

State management scope: Questions about where storage belongs and what the service should be responsible for.

Service naming: Your suggestion about "runservice" managing run state and providing resume support makes sense as a cleaner separation.

Path Forward:

These issues are interconnected and need holistic design rather than piecemeal fixes. I see two approaches:

Redesign in this PR: Address all architectural concerns here, but this is 2-4 weeks of work with significant design discussion needed.

Two-PR strategy: The quick fixes are done. Create a new PR focused entirely on the architectural redesign with proper service/harness/CLI separation, potentially simplified scope initially (probe-level only).

I'm leaning toward option 2 to keep the PRs focused and allow proper design discussion, but I'm open to your preference.

Questions for design:

Service architecture: Should the service provide get_probes_to_run() and get_attempts_to_run() APIs that harnesses query, rather than harnesses managing state?

CLI commands: For --list_runs and --delete_run that don't launch a full run, how should they access run management if services are only launched at harness initialization?

Attempt resumption: Should we simplify to probe-level only initially, or implement the full prompt-based comparison approach from PR #1531?

Report strategy: New output file per resume (read-only input) vs. appending to existing?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

support continuation / resume of runs

2 participants

Comments