feat: Add resumable vulnerability scanning with probe/attempt granularity #1589
Conversation
DCO Assistant Lite bot: All contributors have signed the DCO ✍️ ✅
jmartin-tech left a comment
This is a preliminary review; more will be added. The current implementation is a good start and looks to build off of #1531. Several of the concerns raised here mirror comments made on that other implementation.
Further review and thought will be forthcoming.
My comment from PR 1531 applies to this file's changes as well.
I still view modification of the input report as something that should be avoided as it introduces significant complexity to manage state and could cause corruption to the original input file if it is reused for output.
Understood. The report modification approach introduces complexity and corruption risks. This needs to be addressed as a part of the broader architectural discussion about how resume should work. I'll address this in a follow-up where we can properly design the file strategy.
```python
# RESUME SUPPORT: Restore report paths and override probe spec with probes from resumed run
from garak import resumeservice

if resumeservice.enabled():
    resumed_state = resumeservice.get_state()
    if resumed_state:
        # Restore report directory and prefix from resumed state
        if "report_dir" in resumed_state:
            _config.reporting.report_dir = resumed_state["report_dir"]
            logging.info(
                f"Restored report_dir from state: {resumed_state['report_dir']}"
            )
        if "report_prefix" in resumed_state:
            _config.reporting.report_prefix = resumed_state["report_prefix"]
            logging.info(
                f"Restored report_prefix from state: {resumed_state['report_prefix']}"
            )

        # Use the original run_id to maintain report filename consistency
        if "run_id" in resumed_state:
            # Extract UUID from full run_id format "garak-run-<uuid>-<timestamp>"
            full_run_id = resumed_state["run_id"]
            original_run_id = resumeservice.extract_uuid_from_run_id(full_run_id)
            _config.transient.run_id = original_run_id
            logging.info(f"Restored run_id from state: {original_run_id}")

        # Override probe spec with probes from resumed run
        if "probenames" in resumed_state:
            resumed_probes = resumed_state["probenames"]
            # Strip "probes." prefix if present for parse_plugin_spec compatibility
            resumed_probes_clean = [p.replace("probes.", "") for p in resumed_probes]
            # Convert probe list to comma-separated spec
            _config.plugins.probe_spec = ",".join(resumed_probes_clean)
            logging.info(f"Resuming run with probes from state: {resumed_probes}")
            print(
                f"🔄 Using probes from resumed run: {', '.join(resumed_probes_clean)}"
            )
```
The logic here does not belong in cli. This package is intended to parse and set up runtime _config state to allow launch of a run. The functionality here is too tightly coupled and likely should be deferred to a separate stage in the run.
Agreed, CLI should only parse arguments and set up config. The state restoration and probe spec override logic is too tightly coupled here and should happen at a different stage. This will be addressed as part of moving resume logic to the proper place in the run lifecycle.
```python
# Handle resume-related commands
if hasattr(args, "list_runs") and args.list_runs:
    from garak import resumeservice
    from datetime import datetime

    runs = resumeservice.list_runs()
    if not runs:
        print("\n📋 No unfinished runs found.")
        print("\nStart a new resumable scan with: garak --resumable [options]\n")
    else:
        print("\n📋 Resumable Runs\n")

        # Print header
        print(f"{'#':<4} {'Run ID':<38} {'Started':<20} {'Progress':<12} {'%':<6}")
        print("-" * 82)

        for idx, run in enumerate(runs, 1):
            # Calculate percentage
            percentage = (
                (run["progress"] / run["total"] * 100) if run["total"] > 0 else 0
            )

            # Format the timestamp more readably
            try:
                dt = datetime.fromisoformat(run["start_time"])
                formatted_time = dt.strftime("%Y-%m-%d %H:%M")
            except (ValueError, TypeError):
                formatted_time = run["start_time"][:16]

            # Progress format
            progress_str = f"{run['progress']}/{run['total']}"

            print(
                f"{idx:<4} {run['run_id']:<38} {formatted_time:<20} {progress_str:<12} {percentage:>5.1f}%"
            )

        print("-" * 82)
        print(f"\nTotal: {len(runs)} unfinished run(s)")
        print("\nTo resume: garak --resume <run_id>")
        print("To delete: garak --delete_run <run_id>\n")
    return

if hasattr(args, "delete_run") and args.delete_run:
    from garak import resumeservice

    try:
        resumeservice.delete_run(args.delete_run)
        print(f"✅ Deleted run: {args.delete_run}")
    except Exception as e:
        print(f"❌ Failed to delete run: {e}")
        import sys

        sys.exit(1)
    return
```
Parsing of resume related options should setup the global configuration for later actions.
Services are launched at harness initialization and should not be accessed directly by cli. The idea of managing existing runs on disk is novel and interesting; however, it seems to be a unique feature that should be independent of a resume-specific service. I had noted in PR #1531 that it might make sense to have a runservice that manages run state and also provides support for a resume feature.
You're right on multiple points. Services should be launched at harness initialization, not accessed directly by CLI. Making run management independent of a resume-specific service makes sense; your suggestion of a "runservice" that manages run state and also provides resume support seems like a better separation of concerns. This needs architectural rethinking about where run management lives and how CLI-only commands should work.
```python
class MinimalAttempt:
    def __init__(self, data):
        self.uuid = data.get("uuid", "")
        self.seq = data.get("seq", 0)
        self.status = 1  # Was executed
        self.probe_classname = data.get("probe_classname", "")
        self.probe_params = data.get("probe_params", {})
        self.targets = data.get("targets", [])
        self.notes = data.get("notes", {})
        self.goal = data.get("goal", "")
        self.detector_results = data.get("detector_results", {})
        self.conversations = data.get("conversations", [])
        self.reverse_translation_outputs = data.get("reverse_translation_outputs", [])
        # Store prompt and outputs as-is (already serialized)
        self._prompt_data = data.get("prompt", {})
        self._outputs_data = data.get("outputs", [])
        # Cache outputs as Message objects for evaluator
        self._outputs_cache = None
        self._prompt_cache = None

    @property
    def prompt(self):
        """Return prompt as Conversation object (for evaluator)"""
        if self._prompt_cache is None:
            from garak.attempt import Conversation

            # Reconstruct Conversation from the stored dict
            if isinstance(self._prompt_data, dict):
                self._prompt_cache = Conversation.from_dict(self._prompt_data)
            else:
                # Fallback to empty conversation
                self._prompt_cache = Conversation([])
        return self._prompt_cache

    @property
    def outputs(self):
        """Return output messages as a list (for evaluator)"""
        if self._outputs_cache is None:
            from garak.attempt import Message

            messages = []
            if isinstance(self._outputs_data, list):
                for output in self._outputs_data:
                    if isinstance(output, dict):
                        msg = Message(
                            text=output.get("text", ""),
                            lang=output.get("lang", "en"),
                            data_path=output.get("data_path"),
                            data_type=output.get("data_type"),
                            data_checksum=output.get("data_checksum"),
                            notes=output.get("notes", {}),
                        )
                        messages.append(msg)
                    elif isinstance(output, str):
                        messages.append(Message(text=output, lang="en"))
            self._outputs_cache = messages
        return self._outputs_cache

    def as_dict(self):
        """Return dictionary representation for report writing"""
        return {
            "entry_type": "attempt",
            "uuid": self.uuid,
            "seq": self.seq,
            "status": self.status,
            "probe_classname": self.probe_classname,
            "probe_params": self.probe_params,
            "targets": self.targets,
            "prompt": self._prompt_data,
            "outputs": self._outputs_data,
            "detector_results": self.detector_results,
            "notes": self.notes,
            "goal": self.goal,
            "conversations": self.conversations,
            "reverse_translation_outputs": self.reverse_translation_outputs,
        }

    def outputs_for(self, lang_spec):
        """Return output messages for detector evaluation"""
        # Reconstruct Message objects from outputs_data
        from garak.attempt import Message

        messages = []
        if isinstance(self._outputs_data, list):
            for output in self._outputs_data:
                if isinstance(output, dict):
                    msg = Message(
                        text=output.get("text", ""),
                        lang=output.get("lang", lang_spec),
                        data_path=output.get("data_path"),
                        data_type=output.get("data_type"),
                        data_checksum=output.get("data_checksum"),
                        notes=output.get("notes", {}),
                    )
                    messages.append(msg)
                elif isinstance(output, str):
                    messages.append(Message(text=output, lang=lang_spec))
        return messages


attempt = MinimalAttempt(entry)
incomplete_attempts.append(attempt)
```
The need to introduce this class suggests a level of complexity here that likely needs to be reworked.
Agreed, the need for this class indicates a design problem. I created it to bridge the gap between loading previous attempts from the report and what the evaluator expects, but this complexity suggests the approach needs reworking. This ties into the broader attempt-level resumption design issues.
Most of the changes in this class are tightly coupled to resume specifically and should not be the responsibility of the harness. The state machine for executing a run can be organized more cleanly to enable harnesses to request additional data from the service that is aware of the resume state, rather than having to build and manage that state.
Right. The harness is managing resume state when it should be requesting data from a service that's aware of the resume context. The state machine for run execution needs cleaner organization - harness should ask the service what work needs to be done rather than building and managing that state itself. This is a fundamental separation of concerns issue.
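The inversion described here is small in code terms. Rather than the harness filtering its own probe list against loaded state, it asks a resume-aware service what remains. The sketch below is illustrative; `ResumeAwareService` and `get_probes_to_run` are hypothetical names, not garak APIs:

```python
# Hypothetical harness/service split: the service knows what the saved state
# marks complete; the harness only asks what is left to run.
class ResumeAwareService:
    def __init__(self, requested: list[str], completed: set[str]):
        self._requested = requested  # probes the user asked for
        self._completed = completed  # probes the saved state marks done

    def get_probes_to_run(self) -> list[str]:
        """Return requested probes minus those already completed, in order."""
        return [p for p in self._requested if p not in self._completed]


# Harness side: no state files, no filtering logic, just a query.
# for probename in service.get_probes_to_run():
#     run_probe(probename)
```

A `get_attempts_to_run()` counterpart would serve attempt-level granularity the same way.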
```python
class AttemptMock:
    def __init__(self, outputs, probename, prompt=None, seq=0):
        self.all_outputs = [
            OutputMock(output)
            for output in (outputs if isinstance(outputs, list) else [outputs])
        ]
        self.probename = probename
        self.probe_classname = ".".join(
            probename.split(".")[1:]
        )  # Changed to "category.Class" format, e.g., "xss.ColabAIDataLeakage"
        self.prompt = (
            prompt
            if prompt
            else {
                "turns": [
                    {
                        "role": "user",
                        "content": {"text": prompt or "", "lang": "en", "notes": {}},
                    }
                ]
            }
        )
        self.status = "success"
        self.detector_results = {}
        self.notes = {"terms": ["summary", "conversation"]}
        self.outputs = [output.text for output in self.all_outputs]
        self.uuid = str(uuid.uuid4())
        self.seq = seq
        self.probe_params = {}
        self.targets = []
        self.conversations = [
            {
                "turns": [
                    {
                        "role": "user",
                        "content": {"text": prompt or "", "lang": "en", "notes": {}},
                    },
                    {
                        "role": "assistant",
                        "content": {"text": output.text, "lang": "en", "notes": {}},
                    },
                ]
            }
            for output in self.all_outputs
        ]
        self.reverse_translation_outputs = []
```
I suspect this is some sort of testing method; if so, it should not be in the package class.
You're correct, this is testing/debug code that shouldn't be in the package. I'll remove it.
Force-pushed 1100bbe to 47c10e0
…rity

Implements comprehensive resume functionality allowing interrupted garak scans to continue from where they stopped, with fine-grained control over resume granularity at probe or attempt levels.

Closes NVIDIA#141

## Overview

Long-running vulnerability scans can be interrupted by network issues, rate limits, or system crashes. This feature enables users to resume interrupted scans without starting from scratch, saving time and computational resources.

## Key Features

### Resume Service Architecture
- Complete resume service (resumeservice.py) with atomic state management
- Run state persistence in ~/.garak/runs/<run_id>/
- Automatic run ID generation with timestamp tracking
- Version compatibility validation
- Cross-platform state storage support
- Report file continuation (appends to existing reports on resume)

### Granularity Control
Two resumption levels with configurable granularity:
- **probe**: Skip entire completed probes (faster, coarser)
- **attempt**: Skip individual completed prompts (precise, fine-grained)

### CLI Commands
Five new CLI arguments for resume control:
- --resumable [true/false]: Enable/disable resumable scans (default: enabled)
- --resume <run_id>: Resume a previous interrupted run
- --list_runs: Display all unfinished runs with status
- --delete_run <run_id>: Clean up run state
- --resume_granularity {probe|attempt}: Set resume precision level

### State Management
- Atomic file writes prevent state corruption
- Configuration snapshots preserve run parameters
- Progress tracking at probe and attempt levels
- Probe metadata validation for resumability
- Run lifecycle management (init, save, load, cleanup)
- Report continuation preserves original run context

### Testing
- 19 probe metadata tests (all passed)
- Comprehensive integration tests available
- 100% pass rate on functional tests

## Modified Files

### Core Files
- garak/cli.py: Added 5 CLI arguments with str_to_bool() helper
- garak/command.py: Resume service initialization, cleanup, and report continuation
- garak/harnesses/probewise.py: Attempt/probe-level filtering
- garak/_config.py: Resume configuration fields
- garak/probes/base.py: Probe resumability metadata
- garak/exception.py: ResumeValidationError exception
- garak/harnesses/base.py: Probe-level skip logic
- garak/buffs/base.py: Import compatibility fix
- README.md: Resume documentation section
- .gitignore: Added .garak/runs/ for resume state storage

## New Files
- garak/resumeservice.py (1,047 lines): Complete resume service
- tests/test_resume_integration.py (403 lines): Integration tests
- tests/test_probe_resumability.py (274 lines): Metadata tests

## Compatibility
- Fully backward compatible (opt-in feature)
- No breaking changes to existing functionality
- No new dependencies required
- Works with all existing probes, detectors, and generators
- Resumed reports maintain original run context and append new results

## Testing
All tests pass:
- 19 probe resumability tests: PASSED
- 70 config tests: PASSED
- 7 requirements tests: PASSED
- Black formatting: PASSED (all modified files)

## Benefits
- Save time on interrupted long-running scans
- Reduce API costs by avoiding redundant prompts
- Survive system crashes, network failures, user cancellation
- Enable flexible scan scheduling (pause/resume)
- Two granularity levels: fast (probe) vs precise (attempt)
- Seamless report continuation maintains scan history

Signed-off-by: Shrikant Pachpor <shri.pachpor24@gmail.com>
Force-pushed 47c10e0 to 2949288
Addresses maintainer feedback on storage location (concern NVIDIA#6).

Changes:
- Use _config.transient.data_dir instead of hardcoded ~/.garak/runs/
- Respects XDG Base Directory Specification
- Honors user config and CLI options
- Updated documentation to reflect XDG paths

State now stored in:
- Linux: ~/.local/share/garak/runs/
- macOS: ~/Library/Application Support/garak/runs/
- Windows: %LOCALAPPDATA%\garak\runs

Signed-off-by: Shrikant Pachpor <shri.pachpor24@gmail.com>
…A#10)

- Simplified run_id format to just UUID (removed timestamp)
- Moved state files from report_dir to cache_dir
- Removed extract_uuid_from_run_id function
- Updated documentation to reflect cache directory usage

Fixes maintainer concerns about run ID format changes and state file location. State files now properly stored in the cache directory as per project conventions.

Signed-off-by: Shrikant Pachpor <shri.pachpor24@gmail.com>
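The commit messages above cite atomic file writes as the guard against state corruption. A common pattern for that guarantee is to write to a sibling temp file and swap it in with `os.replace`; the sketch below shows the general technique, not garak's actual implementation:

```python
# Write state to a temp file in the same directory, then atomically replace
# the real file, so readers never observe a partially written state.json.
import json
import os
import tempfile
from pathlib import Path


def save_state_atomic(state: dict, state_path: Path) -> None:
    """Atomically persist a state dict as JSON at state_path."""
    fd, tmp = tempfile.mkstemp(dir=state_path.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(state, f)
            f.flush()
            os.fsync(f.fileno())  # push bytes to disk before the swap
        os.replace(tmp, state_path)  # atomic on POSIX and Windows
    except BaseException:
        if os.path.exists(tmp):
            os.remove(tmp)  # don't leave stray temp files on failure
        raise
```

The temp file must live on the same filesystem as the destination (hence `dir=state_path.parent`) for the rename to stay atomic.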
@jmartin-tech I've addressed the straightforward fixes in commits 2949288 and 07556ad. However, your review has identified fundamental architectural issues that I think warrant discussion before implementation.

**Core Issues Identified:**

The feedback points to a consistent pattern: resume logic is scattered across CLI, harness, and service instead of being properly encapsulated.

- **CLI doing business logic:** Report setup, state restoration, and direct service access all violate the principle that CLI should only parse arguments and set up config.
- **Harness managing state:** The probewise.py changes are building and managing resume state when the harness should be requesting data from a service that's aware of the resume context.
- **Attempt-level design:** The seq-based matching doesn't work due to randomization, external state storage adds unnecessary complexity when Attempts already have state indicators, and the MinimalAttempt class suggests the approach needs rethinking. Your reference to PR #1531 indicates the correct approach requires prompt comparison and seed validation.
- **Report modification strategy:** The complexity and corruption risks around appending to input reports.
- **State management scope:** Questions about where storage belongs and what the service should be responsible for.
- **Service naming:** Your suggestion about a "runservice" managing run state and providing resume support makes sense as a cleaner separation.

**Path Forward:**

These issues are interconnected and need holistic design rather than piecemeal fixes. I see two approaches:

1. **Redesign in this PR:** Address all architectural concerns here, but this is 2-4 weeks of work with significant design discussion needed.
2. **Two-PR strategy:** The quick fixes are done. Create a new PR focused entirely on the architectural redesign with proper service/harness/CLI separation, potentially with simplified scope initially (probe-level only).

I'm leaning toward option 2 to keep the PRs focused and allow proper design discussion, but I'm open to your preference.

**Questions for design:**

1. **Service architecture:** Should the service provide get_probes_to_run() and get_attempts_to_run() APIs that harnesses query, rather than harnesses managing state?
2. **CLI commands:** For --list_runs and --delete_run that don't launch a full run, how should they access run management if services are only launched at harness initialization?
3. **Attempt resumption:** Should we simplify to probe-level only initially, or implement the full prompt-based comparison approach from PR #1531?
4. **Report strategy:** New output file per resume (read-only input) vs. appending to existing?
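On question 3, the prompt-comparison idea referenced from PR #1531 can be sketched briefly: match pending work to completed work by prompt content rather than by `seq`, since seq ordering is not stable under randomization. `prompt_key` and `filter_completed` are illustrative names:

```python
# Match attempts by a digest of their prompt text instead of by sequence
# number, so resume survives reordering of the prompt queue.
import hashlib


def prompt_key(prompt_text: str) -> str:
    """Stable content-based identity for a prompt."""
    return hashlib.sha256(prompt_text.encode("utf-8")).hexdigest()


def filter_completed(pending_prompts: list[str], completed_prompts: list[str]) -> list[str]:
    """Return pending prompts whose content was not already attempted."""
    done = {prompt_key(p) for p in completed_prompts}
    return [p for p in pending_prompts if prompt_key(p) not in done]
```

Seed validation would still be needed on top of this, since buffed or templated prompts may only be reproducible under the original run's seed.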