[REVIEW] hipaa-review: add de-identified dataset reidentification gates #2679

## Skill Being Reviewed
**Skill name:** hipaa-review
**Skill path:** `skills/compliance/hipaa-review/`

## False Positive Analysis

**Benign code that triggers a false positive:**
```text
De-identification method documents expert determination or safe-harbor field removal.
```

**Why this is a false positive:**
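The current review guidance can push an agent to flag this pattern without first proving the attacker-controlled path, tenant boundary, or effective runtime behavior. For de-identification, that means showing a concrete re-identification path, such as residual quasi-identifiers or a linkable dataset. A documented expert determination or safe-harbor field removal is evidence in favor of the control, so the review should require contrary evidence before classifying it as a finding.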

## Coverage Gaps

**Missed variant 1:**
```text
Analytics dataset keeps rare ZIP/date/device combinations that re-identify patients.
```
**Why it should be caught:**
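This is a realistic failure mode for hipaa-review. Removing direct identifiers does not help when a rare combination of quasi-identifiers (ZIP code, date, device) matches only one or a few patients. The dataset changes the effective security boundary without matching the simpler examples currently emphasized by the skill.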
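A reviewer prompt for this variant could ask for a small-cell count over the quasi-identifier columns. The sketch below is one way to produce that evidence, not the skill's method. The column names (`zip3`, `visit_month`, `device_model`), the helper name `small_cells`, and the threshold `k` are all hypothetical; HIPAA does not prescribe a specific value of `k`.

```python
from collections import Counter


def small_cells(rows, quasi_identifiers, k=5):
    """Return quasi-identifier combinations shared by fewer than k rows.

    rows: list of dicts, one per record (hypothetical schema).
    quasi_identifiers: column names to combine.
    k: minimum group size; an assumed review threshold, not a HIPAA value.
    """
    counts = Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)
    return {combo: n for combo, n in counts.items() if n < k}


# Illustrative data only: three records share one combination, one is unique.
rows = [
    {"zip3": "021", "visit_month": "2023-01", "device_model": "A"},
    {"zip3": "021", "visit_month": "2023-01", "device_model": "A"},
    {"zip3": "021", "visit_month": "2023-01", "device_model": "A"},
    {"zip3": "994", "visit_month": "2023-02", "device_model": "Z"},
]

risky = small_cells(rows, ["zip3", "visit_month", "device_model"], k=3)
```

An empty result supports the benign reading. A non-empty result gives the reviewer the specific combinations to cite in the finding.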

**Missed variant 2:**
```text
Supposed de-identified data is joinable with support tickets containing identifiers.
```
**Why it should be caught:**
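This variant commonly appears in production systems. The dataset can look clean in isolation yet become identifiable once it is joined on a shared key, so the skill needs explicit reviewer prompts that keep agents from stopping at static configuration review.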
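Linkage evidence for this variant could come from checking whether any column in the de-identified dataset shares values with an identified dataset. The following is a minimal sketch under assumed names: `linkable_keys`, `account_token`, and the ticket fields are hypothetical and stand in for whatever the reviewed system actually uses.

```python
def linkable_keys(deid_rows, other_rows, candidate_keys):
    """Return, per candidate key, the values present in both datasets.

    A non-empty overlap means the two datasets can be joined on that key.
    """
    overlaps = {}
    for key in candidate_keys:
        left = {r[key] for r in deid_rows if r.get(key) is not None}
        right = {r[key] for r in other_rows if r.get(key) is not None}
        shared = left & right
        if shared:
            overlaps[key] = shared
    return overlaps


# Illustrative data only: the analytics rows carry no names or emails,
# but one token also appears in a support ticket that does.
analytics = [
    {"account_token": "t-17", "visit_month": "2023-01"},
    {"account_token": "t-42", "visit_month": "2023-02"},
]
tickets = [
    {"account_token": "t-17", "email": "patient@example.com"},
]

overlaps = linkable_keys(analytics, tickets, ["account_token", "email"])
```

Exact-match overlap is only the simplest case. Hashed or truncated keys that can be recomputed from ticket data would need a separate check.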

## Edge Cases
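De-identification is contextual: whether a dataset stays de-identified depends on what other data its recipients can link it to. The review therefore needs a linkage analysis, not only a field-by-field check.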

## Remediation Quality

- [x] Fix resolves the vulnerability
- [x] Fix doesn't introduce new security issues
- [x] Fix doesn't break functionality
- **Issues found:** Remediation guidance should add an evidence gate for the specific boundary, a benign exception pattern, and a regression check. Without those, fixes may be either cosmetic or over-broad.

## Comparison to Other Tools

| Tool | Catches this? | Notes |
|------|:---:|-------|
| Semgrep | Partial | Can catch static patterns, but not effective policy, ownership, or runtime propagation without custom rules. |
| CodeQL | Partial | Strong for code/data-flow cases, weaker for cloud/control-plane and process evidence unless modeled. |
| Other: manual review | Yes | Human review can verify effective behavior, exception ownership, and operational evidence. |

## Overall Assessment

**Strengths:**
The skill gives useful practitioner framing and asks for concrete evidence instead of generic advice.

**Needs improvement:**
The skill needs explicit re-identification risk gates.

**Priority recommendations:**
1. Add a checklist item for de-identified dataset re-identification evidence.
2. Add one benign exception example so reviewers avoid noisy findings.
3. Add one regression or verification step that proves the effective boundary after remediation.
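The three recommendations above could be combined into a single evidence gate that a reviewer or agent fills in before classifying. This is a sketch of one possible shape, not existing skill behavior. The `DeidEvidence` fields, the `classify` function, and the three outcome labels are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional

# Methods named in the benign example; the string labels are hypothetical.
DOCUMENTED_METHODS = {"expert_determination", "safe_harbor"}


@dataclass
class DeidEvidence:
    documented_method: Optional[str]  # one of DOCUMENTED_METHODS, or None
    small_cell_count: int             # rare quasi-identifier combinations found
    linkable_key_count: int           # keys joinable with identified data


def classify(evidence):
    """Classify a de-identified dataset from collected evidence."""
    # Concrete re-identification evidence outranks any documentation.
    if evidence.small_cell_count > 0 or evidence.linkable_key_count > 0:
        return "finding"
    # Benign exception: documented method and no contrary evidence.
    if evidence.documented_method in DOCUMENTED_METHODS:
        return "benign"
    # Neither documentation nor contrary evidence: do not flag yet.
    return "needs_evidence"
```

Re-running the same gate after remediation, with fresh counts, can serve as the regression check: a dataset that moves from `finding` to `benign` has evidence that the effective boundary changed.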

## Bounty Info
- [x] I have read and agree to the [CONTRIBUTING.md](../../CONTRIBUTING.md) bounty terms
- **Preferred payment method:** PayPal `samik4184@gmail.com`

