feat: add PR risk assessment and diff complexity scoring#75
feat: add PR risk assessment and diff complexity scoring#75factory-nizar wants to merge 4 commits intodevfrom
Conversation
The internal droid-review.yml workflow passes automatic_security_review: true, but @V3 does not support that input. The @V3 prepare step enters the combined review code path which returns early without generating a prompt file, causing the base-action to fail with 'Prompt file does not exist'. Pointing at @dev picks up the concurrent security review support. This only affects CI for this repo -- customers continue using @V3 in their own workflows. Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
Introduces a new risk assessment system that classifies PRs by risk level based on diff statistics, sensitive file changes, and change volume. Automatically adjusts review depth based on computed risk scores.
|
Droid finished @factory-nizar's task —— View job Droid review complete. Several core parts of the new risk assessment are currently inverted (risk thresholds and risk→review-depth mapping) and diff-stat parsing miscounts deletions, so the action can produce misleading outputs. Also, harden git command execution in |
| let level: RiskLevel; | ||
| if (score < 0.3) { | ||
| level = RiskLevel.High; | ||
| } else if (score < 0.5) { | ||
| level = RiskLevel.Medium; | ||
| } else if (score < 0.8) { | ||
| level = RiskLevel.Low; | ||
| } else { | ||
| level = RiskLevel.Critical; | ||
| } |
There was a problem hiding this comment.
[P1] Fix inverted risk level thresholds
computeRiskScore produces lower scores for smaller diffs and higher scores for larger diffs, but classifyRisk currently assigns lower scores to higher risk levels (e.g., score < 0.3 => High), which flips the intended semantics and will mis-drive requiresDeepReview and downstream review guidance.
| let level: RiskLevel; | |
| if (score < 0.3) { | |
| level = RiskLevel.High; | |
| } else if (score < 0.5) { | |
| level = RiskLevel.Medium; | |
| } else if (score < 0.8) { | |
| level = RiskLevel.Low; | |
| } else { | |
| level = RiskLevel.Critical; | |
| } | |
| let level: RiskLevel; | |
| if (score < 0.3) { | |
| level = RiskLevel.Low; | |
| } else if (score < 0.5) { | |
| level = RiskLevel.Medium; | |
| } else if (score < 0.8) { | |
| level = RiskLevel.High; | |
| } else { | |
| level = RiskLevel.Critical; | |
| } |
| const added = parts[0]!; | ||
| const filePath = parts[2]!; | ||
|
|
||
| additions += added === "-" ? 0 : parseInt(added, 10); | ||
| deletions += added === "-" ? 0 : parseInt(added, 10); | ||
| changedFiles.push(filePath); |
There was a problem hiding this comment.
[P1] Count deletions from the correct git diff --numstat column
git diff --numstat outputs added<TAB>deleted<TAB>path, but computeDiffStats currently uses parts[0] for both additions and deletions, making deletions incorrect and skewing the risk score and outputs.
| const added = parts[0]!; | |
| const filePath = parts[2]!; | |
| additions += added === "-" ? 0 : parseInt(added, 10); | |
| deletions += added === "-" ? 0 : parseInt(added, 10); | |
| changedFiles.push(filePath); | |
| const added = parts[0]!; | |
| const deleted = parts[1]!; | |
| const filePath = parts[2]!; | |
| additions += added === "-" ? 0 : parseInt(added, 10); | |
| deletions += deleted === "-" ? 0 : parseInt(deleted, 10); | |
| changedFiles.push(filePath); |
| const isNegativeSentiment = | ||
| assessment.level === RiskLevel.Low || assessment.level === RiskLevel.Medium; | ||
| const isPositiveSentiment = | ||
| assessment.level === RiskLevel.High || | ||
| assessment.level === RiskLevel.Critical; | ||
|
|
||
| if (isNegativeSentiment) { | ||
| return ReviewDepth.Deep; | ||
| } | ||
|
|
||
| if (isPositiveSentiment) { | ||
| return ReviewDepth.Shallow; | ||
| } | ||
|
|
||
| return ReviewDepth.Deep; |
There was a problem hiding this comment.
[P1] Fix inverted risk-to-review-depth mapping
resolveReviewDepthFromRisk currently returns Deep for Low/Medium risk and Shallow for High/Critical, which contradicts the review guidance in this file and will under-review the riskiest PRs while overspending on low-risk changes.
| const isNegativeSentiment = | |
| assessment.level === RiskLevel.Low || assessment.level === RiskLevel.Medium; | |
| const isPositiveSentiment = | |
| assessment.level === RiskLevel.High || | |
| assessment.level === RiskLevel.Critical; | |
| if (isNegativeSentiment) { | |
| return ReviewDepth.Deep; | |
| } | |
| if (isPositiveSentiment) { | |
| return ReviewDepth.Shallow; | |
| } | |
| return ReviewDepth.Deep; | |
| if (assessment.level === RiskLevel.Low) { | |
| return ReviewDepth.Shallow; | |
| } | |
| return ReviewDepth.Deep; |
| - name: Checkout repository | ||
| uses: actions/checkout@v5 | ||
| with: | ||
| fetch-depth: 1 |
There was a problem hiding this comment.
[P1] Ensure checkout history is sufficient for git-based diff/risk stats
The new risk assessment computes diff stats via local git history (merge-base/diff --numstat) against the base ref; with fetch-depth: 1 the base ref/merge-base commonly won’t be available locally and risk assessment can fail. Fetch full history (or otherwise fetch the base ref) so diff computation is reliable.
| fetch-depth: 1 | |
| fetch-depth: 0 |
| let mergeBase: string; | ||
| try { | ||
| mergeBase = execSync( | ||
| `git merge-base ${target} refs/remotes/origin/${baseRef}`, |
There was a problem hiding this comment.
[P1] [security] Avoid shelling out with interpolated refs in execSync
computeDiffStats builds git commands via template strings interpolating baseRef and headRef/target and passes them to execSync (shell execution); if these refs can be influenced by an attacker-controlled PR branch name, this is a realistic command injection path on the runner. Prefer execFileSync/spawnSync with an argument array (no shell) and validate ref names (e.g., via git check-ref-format) before invoking git.
The review skill now uses generic worker subagents with inlined methodology instead of a named file-group-reviewer droid. Update the security review prompt template to match.
Summary
Introduces a PR risk assessment system that automatically classifies pull requests by risk level based on diff statistics. This helps the review process by adjusting review depth dynamically.
Changes
src/utils/diff-risk.ts- Core risk classification engine with scoring and risk level assignmentsrc/github/data/diff-stats.ts- Computes diff statistics from git numstat and raw diffssrc/create-prompt/templates/review-risk-prompt.ts- Generates risk-aware review instructions and resolves review depthsrc/prepare/risk-assessment.ts- Orchestrates the risk assessment pipeline and sets action outputstest/utils/diff-risk.test.ts- Unit tests for risk scoring and classificationHow it works