
Add unified CLI and handoff/SARIF support for pii-scan and verify #1

Merged
dkitchell merged 1 commit into main from claude/gap-analysis-gtm-MBjYl
Apr 18, 2026

@dkitchell

Summary

This PR introduces a unified command-line interface (@certifieddata/cli) and extends @certifieddata/pii-scan with sanitized handoff summaries and SARIF 2.1.0 output support. The changes enable a complete local-first workflow: detect PII → hand off to generation → verify certificates, all while maintaining strict privacy guarantees (no raw data transmission).

Key Changes

New Package: @certifieddata/cli

  • Unified top-level CLI dispatcher with subcommands: pii-scan, generate, verify, registry
  • pii-scan subcommand wraps @certifieddata/pii-scan with additional output formats
  • generate subcommand implements a browser-based handoff workflow (no file uploads, only aggregate counts in URL params)
  • verify subcommand wraps @certifieddata/verify for offline certificate validation
  • Comprehensive help system and flag parsing for all subcommands
  • Smoke tests (6) covering exit codes and handoff URL structure validation
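
For reviewers, here is a minimal sketch of the kind of URL the generate handoff is meant to produce. It is illustrative only: the `continueUrlSketch` name, the query parameter names, and the `example.invalid` base URL are assumptions, not the package's actual API. The only thing taken from this PR is the rule that the URL carries a risk level and aggregate counts, and nothing else.

```typescript
// Hypothetical input shape: aggregate fields only, mirroring what this PR
// says the continue URL may carry. No column names, no sample values.
interface HandoffCounts {
  risk: "HIGH" | "MEDIUM" | "LOW";
  findingCount: number;
  columnCount: number;
  rowCount: number;
}

// Placeholder base URL (assumption). The real default is not stated in this PR.
const DEFAULT_BASE_URL = "https://example.invalid/generate";

// Build a continue-generation URL from counts. Query parameter names are
// illustrative assumptions.
function continueUrlSketch(
  counts: HandoffCounts,
  baseUrl: string = DEFAULT_BASE_URL,
): string {
  const url = new URL(baseUrl);
  url.searchParams.set("risk", counts.risk);
  url.searchParams.set("findings", String(counts.findingCount));
  url.searchParams.set("columns", String(counts.columnCount));
  url.searchParams.set("rows", String(counts.rowCount));
  return url.toString();
}
```

Because the function accepts only a counts object, there is no code path by which a column name or cell value could reach the URL, which is the property the URL-structure tests are meant to check.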

@certifieddata/pii-scan Extensions

  • Handoff module (handoff.ts): Builds sanitized summary artifacts containing only aggregate counts, column names, and risk labels—never raw samples or values
    • buildHandoff() function creates a HandoffSummary with schema version pii-scan.handoff.v1
    • handoffContinueUrl() helper generates deeplinks with only counts and risk (no column names in URL)
    • Aggregate findings per column with risk promotion logic
  • SARIF output (sarif.ts): Generates SARIF 2.1.0-compliant logs for GitHub Code Scanning integration
    • Maps PII risk levels to SARIF severity levels (HIGH→error, MEDIUM→warning, LOW→note)
    • Includes rule metadata and help URIs without exposing raw sample values
    • Stable rule IDs keyed by pattern source and name
  • CLI enhancements (cli.ts):
    • --emit-handoff flag to print sanitized handoff JSON
    • --output-handoff <path> to write handoff to disk
    • --open-generate to launch browser with continue URL
    • --sarif flag for SARIF output
    • --base-url override for handoff generation
  • Library exports: buildHandoff, handoffContinueUrl, buildSarif
  • Comprehensive tests for handoff privacy guarantees and SARIF structure
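
To make the privacy rule concrete, the sketch below derives both outputs from the same in-memory findings. It is not the code in handoff.ts or sarif.ts. The `RawFinding` shape, the field names in the summary, the `pii/<pattern>` rule ID format, and the "highest risk wins" promotion rule are all assumptions for illustration. The HIGH→error, MEDIUM→warning, LOW→note mapping and the `pii-scan.handoff.v1` schema string come from this PR.

```typescript
type Risk = "LOW" | "MEDIUM" | "HIGH";
const RANK: Record<Risk, number> = { LOW: 0, MEDIUM: 1, HIGH: 2 };
const SARIF_LEVEL: Record<Risk, "note" | "warning" | "error"> = {
  LOW: "note",
  MEDIUM: "warning",
  HIGH: "error",
};

// Hypothetical in-memory finding. `sample` stands in for the raw value that
// must never appear in any emitted artifact.
interface RawFinding {
  file: string;
  column: string;
  pattern: string;
  risk: Risk;
  sample: string;
}

interface ColumnSummary {
  column: string;
  findingCount: number;
  risk: Risk;
}

// Aggregate per column and promote each column (and the overall summary) to
// the highest risk seen. The promotion rule is an assumption.
function buildHandoffSketch(findings: RawFinding[], rowCount: number, columnCount: number) {
  const byColumn = new Map<string, ColumnSummary>();
  for (const f of findings) {
    const cur = byColumn.get(f.column) ?? { column: f.column, findingCount: 0, risk: f.risk };
    cur.findingCount += 1;
    if (RANK[f.risk] > RANK[cur.risk]) cur.risk = f.risk;
    byColumn.set(f.column, cur);
  }
  const columns = Array.from(byColumn.values());
  const risk = columns.reduce<Risk | null>(
    (top, c) => (top === null || RANK[c.risk] > RANK[top] ? c.risk : top),
    null,
  );
  // `sample` is never copied into the returned object.
  return { schema: "pii-scan.handoff.v1", rowCount, columnCount, findingCount: findings.length, risk, columns };
}

// Emit a minimal SARIF 2.1.0 log. Rule IDs here are illustrative.
function buildSarifSketch(findings: RawFinding[]) {
  const ruleIds = Array.from(new Set(findings.map((f) => `pii/${f.pattern}`)));
  return {
    $schema: "https://json.schemastore.org/sarif-2.1.0.json",
    version: "2.1.0",
    runs: [
      {
        tool: { driver: { name: "pii-scan", rules: ruleIds.map((id) => ({ id })) } },
        results: findings.map((f) => ({
          ruleId: `pii/${f.pattern}`,
          level: SARIF_LEVEL[f.risk],
          message: { text: `Possible PII (${f.pattern}) in column "${f.column}"` },
          locations: [{ physicalLocation: { artifactLocation: { uri: f.file } } }],
        })),
      },
    ],
  };
}
```

A test in the spirit of the ones described here would serialize both results with `JSON.stringify` and assert that no `sample` string appears in either.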

@certifieddata/verify Extensions

  • Bundle verification (bundle.ts): Offline verification of certificate bundles in three formats
    • verifyManifestFile(): Verify a manifest JSON against a PEM public key
    • verifyBundleDirectory(): Verify an unpacked bundle directory with auto-discovery of manifest and key files
    • verifyBundleZip(): Verify a zipped bundle with support for stored and deflate compression
    • Flexible basename configuration for non-standard bundle layouts
  • Minimal zip reader (zip.ts): Custom implementation supporting STORE (0) and DEFLATE (8) compression
    • No external zip library dependency—keeps verification portable for long-term archival
    • Rejects unsupported formats (ZIP64, encryption, exotic compression)
  • Full test coverage with transient bundle fixtures and both compression methods
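
As a rough picture of what a dependency-free reader involves, the sketch below walks the zip central directory and extracts STORE and DEFLATE entries using only `node:zlib`. It is not the code in zip.ts. The `readZipEntriesSketch` name is hypothetical, ZIP64 is detected only through the sentinel values in the end-of-central-directory record, and CRC-32 checking is omitted for brevity.

```typescript
import { inflateRawSync } from "node:zlib";

// Read all entries of a small zip held in memory. Supports method 0 (STORE)
// and method 8 (DEFLATE) and rejects everything else.
function readZipEntriesSketch(zip: Buffer): Map<string, Buffer> {
  // Locate the End Of Central Directory record by scanning backwards for
  // its signature (0x06054b50). The record is at least 22 bytes long.
  let eocd = -1;
  for (let i = zip.length - 22; i >= 0; i--) {
    if (zip.readUInt32LE(i) === 0x06054b50) {
      eocd = i;
      break;
    }
  }
  if (eocd < 0) throw new Error("not a zip: end-of-central-directory not found");

  const count = zip.readUInt16LE(eocd + 10); // total entries
  let p = zip.readUInt32LE(eocd + 16); // central directory offset
  if (count === 0xffff || p === 0xffffffff) throw new Error("ZIP64 is not supported");

  const out = new Map<string, Buffer>();
  for (let n = 0; n < count; n++) {
    if (zip.readUInt32LE(p) !== 0x02014b50) throw new Error("bad central directory entry");
    const flags = zip.readUInt16LE(p + 8);
    const method = zip.readUInt16LE(p + 10);
    const compressedSize = zip.readUInt32LE(p + 20);
    const nameLen = zip.readUInt16LE(p + 28);
    const extraLen = zip.readUInt16LE(p + 30);
    const commentLen = zip.readUInt16LE(p + 32);
    const localOffset = zip.readUInt32LE(p + 42);
    const entryName = zip.toString("utf8", p + 46, p + 46 + nameLen);
    if (flags & 0x1) throw new Error(`encrypted entry not supported: ${entryName}`);

    // The local header repeats name/extra lengths, which may differ from the
    // central directory copy, so read them again to find the data start.
    if (zip.readUInt32LE(localOffset) !== 0x04034b50) throw new Error("bad local header");
    const localNameLen = zip.readUInt16LE(localOffset + 26);
    const localExtraLen = zip.readUInt16LE(localOffset + 28);
    const start = localOffset + 30 + localNameLen + localExtraLen;
    const raw = zip.subarray(start, start + compressedSize);

    if (method === 0) out.set(entryName, Buffer.from(raw));
    else if (method === 8) out.set(entryName, inflateRawSync(raw));
    else throw new Error(`unsupported compression method ${method}: ${entryName}`);

    p += 46 + nameLen + extraLen + commentLen;
  }
  return out;
}
```

Taking sizes from the central directory rather than the local headers means entries written with a trailing data descriptor still resolve correctly.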

Documentation

  • docs/compliance.md: Crosswalk showing how the tooling supports SOC 2, GDPR, HIPAA, CCPA, and EU AI Act workflows (support-only framing, no certification claims)
  • docs/pricing.md: Clear delineation of what is local/free vs. hosted/account-required
  • Updated README.md with three-step workflow example
  • Updated llms.txt with CLI package description

Notable Implementation Details

  • Privacy-first design: Handoff artifacts and deeplinks never include raw sample values, file contents, or column names in URLs—only aggregate counts and risk levels
  • Offline-by-default: pii-scan and verify subcommands make zero network calls; generate opens a browser but transmits only a risk level and aggregate counts (findings, columns, rows) in the URL

https://claude.ai/code/session_01V7ARryoR769vHVQpnUFnqH

…erify

Closes the middle of the three-point GTM flow (scan → generate → verify) by
adding the missing bridge layers while keeping the public repo free of any
proprietary generation-API surface.

@certifieddata/cli (new)
  - Unified top-level command: pii-scan, generate, verify, registry
  - generate is web-handoff only: it never uploads datasets and never embeds
    hosted-API details. Reads a sanitized handoff or a local file, builds a
    continue-generation URL with aggregate counts only, and opens a browser.

@certifieddata/pii-scan
  - --emit-handoff / --output-handoff <path> — sanitized handoff JSON
    (pii-scan.handoff.v1). No raw values, no redacted samples.
  - --open-generate — opens the continue-generation URL; URL carries only
    risk level, finding count, column count, row count.
  - --sarif — SARIF 2.1.0 output for GitHub Code Scanning.
  - Library exports: buildHandoff, handoffContinueUrl, buildSarif.
  - Tests lock in the privacy rule: raw row values never appear in handoff
    artifacts, SARIF logs, or deeplink URLs.

@certifieddata/verify
  - verifyManifestFile(path, opts) — offline manifest-file verification.
  - verifyBundleDirectory(dir, opts) — offline unpacked-bundle verification.
  - verifyBundleZip(zipPath, opts) — offline zipped-bundle verification.
  - Minimal built-in STORE+DEFLATE zip reader (no new runtime deps).
  - Fulfills the "if certifieddata.io goes away, the zip still verifies"
    promise end-to-end.

Docs
  - docs/pricing.md — local vs. hosted boundary, evaluation-without-account.
  - docs/compliance.md — SOC 2 / GDPR / HIPAA / CCPA / EU AI Act crosswalk.
    Support-only framing, no certification claims.
  - README hero flow updated to lead with the unified CLI three-step.
  - llms.txt updated with the new CLI surface.

All builds clean, 93 tests passing (22 verify, 44 pii-scan, 21 schemas,
6 cli). Lint clean across the workspace.

https://claude.ai/code/session_01V7ARryoR769vHVQpnUFnqH
@dkitchell dkitchell merged commit 058a7c8 into main Apr 18, 2026
2 of 3 checks passed