[REVIEW] hipaa-review: add de-identified dataset reidentification gates #2679

@stmr

Description

Skill Being Reviewed

Skill name: hipaa-review
Skill path: skills/compliance/hipaa-review/

False Positive Analysis

Benign code that triggers a false positive:

De-identification method documents expert determination or safe-harbor field removal.

Why this is a false positive:
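The current review guidance can push an agent to flag this pattern on sight. A dataset whose de-identification method is documented, whether by expert determination or by Safe Harbor field removal, is the compliant case. Before classifying it as a finding, the review should require evidence that the documented method is not what actually runs, or that quasi-identifiers or linkage paths remain in the released data.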
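One way to gather that evidence is to check whether the released rows match the documented Safe Harbor removal. The following is a minimal sketch, not the skill's implementation: the column names (`zip`, `admit_date`, and the entries in `DIRECT_IDENTIFIER_COLUMNS`) are hypothetical, and it covers only a few of the Safe Harbor identifier categories (it does not, for example, handle low-population three-digit ZIP codes or ages over 89).

```python
import re

# Hypothetical column names; adapt them to the dataset under review.
DIRECT_IDENTIFIER_COLUMNS = {"name", "ssn", "email", "phone", "mrn", "device_serial"}

ZIP3 = re.compile(r"\d{3}")   # Safe Harbor allows at most the first three ZIP digits
YEAR = re.compile(r"\d{4}")   # Safe Harbor allows only the year of a date


def safe_harbor_gaps(rows, zip_col="zip", date_cols=("admit_date",)):
    """Return a sorted list of gaps found; an empty list means this check found none."""
    gaps = set()
    for row in rows:
        # Any surviving direct identifier column is a gap.
        for col in row.keys() & DIRECT_IDENTIFIER_COLUMNS:
            gaps.add(f"direct identifier column present: {col}")
        zip_value = row.get(zip_col)
        if zip_value is not None and not ZIP3.fullmatch(str(zip_value)):
            gaps.add(f"{zip_col} is more specific than three digits")
        for col in date_cols:
            value = row.get(col)
            if value is not None and not YEAR.fullmatch(str(value)):
                gaps.add(f"{col} is more specific than year")
    return sorted(gaps)
```

An empty result supports treating the documented method as the benign case; it does not by itself prove the dataset is de-identified.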

Coverage Gaps

Missed variant 1:

Analytics dataset keeps rare ZIP/date/device combinations that re-identify patients.

Why it should be caught:
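This is a realistic failure mode for hipaa-review. Removing direct identifiers is not enough when quasi-identifiers survive: a rare combination of ZIP code, date, and device can single out one patient. The pattern weakens the de-identification boundary without matching the simpler examples the skill currently emphasizes.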
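A reviewer could ask for a group-size check over the quasi-identifier columns, in the style of k-anonymity. The sketch below is illustrative only: the function name `small_groups`, the column names in the usage example, and the threshold `k=5` are assumptions, not values taken from the skill or from HIPAA.

```python
from collections import Counter


def small_groups(rows, quasi_identifiers, k=5):
    """Return quasi-identifier combinations shared by fewer than k rows."""
    counts = Counter(
        tuple(row.get(col) for col in quasi_identifiers) for row in rows
    )
    return {combo: n for combo, n in counts.items() if n < k}


# Usage: five patients share one combination, one patient is unique.
rows = [{"zip3": "021", "year": "2023", "device": "pump"}] * 5
rows.append({"zip3": "946", "year": "2023", "device": "cgm"})
rare = small_groups(rows, ["zip3", "year", "device"], k=5)
```

Any combination returned here marks records that a recipient might single out, which is the evidence the finding should cite.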

Missed variant 2:

Supposed de-identified data is joinable with support tickets containing identifiers.

Why it should be caught:
This variant appears in production systems when a shared key survives in both stores, for example a record or account ID. It needs explicit reviewer prompts so agents do not stop at static configuration review.
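A linkage check of this kind can be sketched as a search for column pairs whose values overlap between the two stores. Everything named here (`column_values`, `linkable_columns`, `min_shared`, and the example columns) is hypothetical. Low-cardinality columns such as status codes will overlap by coincidence, so the output is a list of candidates to inspect, not a verdict.

```python
def column_values(rows):
    """Collect the set of non-null values seen in each column, as strings."""
    values = {}
    for row in rows:
        for col, value in row.items():
            if value is not None:
                values.setdefault(col, set()).add(str(value))
    return values


def linkable_columns(deid_rows, ticket_rows, min_shared=1):
    """Map (deid_column, ticket_column) to the number of values they share."""
    deid = column_values(deid_rows)
    tickets = column_values(ticket_rows)
    findings = {}
    for d_col, d_vals in deid.items():
        for t_col, t_vals in tickets.items():
            shared = len(d_vals & t_vals)
            if shared >= min_shared:
                findings[(d_col, t_col)] = shared
    return findings


# Usage: a record ID in the analytics data also appears in a support ticket.
deid_rows = [
    {"record_id": "r-17", "diagnosis": "J45"},
    {"record_id": "r-18", "diagnosis": "E11"},
]
ticket_rows = [{"ticket_ref": "r-17", "email": "a@example.org"}]
links = linkable_columns(deid_rows, ticket_rows)
```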

Edge Cases

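De-identification is contextual: whether a dataset counts as de-identified depends on what other data its recipient can link it to, so the review needs linkage analysis rather than a field-by-field check alone.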

Remediation Quality

  • Fix resolves the vulnerability
  • Fix doesn't introduce new security issues
  • Fix doesn't break functionality
  • Issues found: Remediation guidance should add an evidence gate for the specific boundary, a benign exception pattern, and a regression check. Without those, fixes may be either cosmetic or over-broad.

Comparison to Other Tools

| Tool | Catches this? | Notes |
| --- | --- | --- |
| Semgrep | Partial | Can catch static patterns, but not effective policy, ownership, or runtime propagation without custom rules. |
| CodeQL | Partial | Strong for code/data-flow cases, weaker for cloud/control-plane and process evidence unless modeled. |
| Other: manual review | Yes | Human review can verify effective behavior, exception ownership, and operational evidence. |

Overall Assessment

Strengths:
The skill gives useful practitioner framing and asks for concrete evidence instead of generic advice.

Needs improvement:
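The skill needs explicit re-identification risk gates: checks for surviving quasi-identifier combinations and for linkage to other datasets before a dataset is accepted as de-identified.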

Priority recommendations:

  1. Add a checklist item for de-identified dataset re-identification evidence.
  2. Add one benign exception example so reviewers avoid noisy findings.
  3. Add one regression or verification step that proves the effective boundary after remediation.

Bounty Info

  • I have read and agree to the CONTRIBUTING.md bounty terms
  • Preferred payment method: PayPal samik4184@gmail.com
