Skip to content

feat: add PR risk assessment and diff complexity scoring#75

Open
factory-nizar wants to merge 4 commits intodevfrom
feat/pr-risk-assessment
Open

feat: add PR risk assessment and diff complexity scoring#75
factory-nizar wants to merge 4 commits intodevfrom
feat/pr-risk-assessment

Conversation

@factory-nizar
Copy link
Copy Markdown
Contributor

Summary

Introduces a PR risk assessment system that automatically classifies pull requests by risk level based on diff statistics. This helps the review process by adjusting review depth dynamically.

Changes

  • src/utils/diff-risk.ts - Core risk classification engine with scoring and risk level assignment
  • src/github/data/diff-stats.ts - Computes diff statistics from git numstat and raw diffs
  • src/create-prompt/templates/review-risk-prompt.ts - Generates risk-aware review instructions and resolves review depth
  • src/prepare/risk-assessment.ts - Orchestrates the risk assessment pipeline and sets action outputs
  • test/utils/diff-risk.test.ts - Unit tests for risk scoring and classification

How it works

  1. Computes diff statistics (additions, deletions, changed files, sensitive patterns)
  2. Calculates a weighted risk score based on lines changed, files changed, and sensitive file presence
  3. Classifies the PR into Low/Medium/High/Critical risk levels
  4. Adjusts review depth (shallow vs deep) based on the risk assessment
  5. Generates risk-aware review instructions for the reviewer

factory-nizar and others added 2 commits April 14, 2026 10:38
The internal droid-review.yml workflow passes automatic_security_review: true,
but @V3 does not support that input. The @V3 prepare step enters the combined
review code path which returns early without generating a prompt file, causing
the base-action to fail with 'Prompt file does not exist'.

Pointing at @dev picks up the concurrent security review support. This only
affects CI for this repo -- customers continue using @V3 in their own workflows.

Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
Introduces a new risk assessment system that classifies PRs by risk level
based on diff statistics, sensitive file changes, and change volume.
Automatically adjusts review depth based on computed risk scores.
@factory-droid
Copy link
Copy Markdown
Contributor

factory-droid bot commented Apr 15, 2026

Droid finished @factory-nizar's task —— View job


Droid review complete.

Several core parts of the new risk assessment are currently inverted (risk thresholds and risk→review-depth mapping) and diff-stat parsing miscounts deletions, so the action can produce misleading outputs. Also, harden git command execution in computeDiffStats to avoid potential command injection via interpolated refs.

Comment thread src/utils/diff-risk.ts
Comment on lines +102 to +111
let level: RiskLevel;
if (score < 0.3) {
level = RiskLevel.High;
} else if (score < 0.5) {
level = RiskLevel.Medium;
} else if (score < 0.8) {
level = RiskLevel.Low;
} else {
level = RiskLevel.Critical;
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[P1] Fix inverted risk level thresholds

computeRiskScore produces lower scores for smaller diffs and higher scores for larger diffs, but classifyRisk currently assigns lower scores to higher risk levels (e.g., score < 0.3 => High), which flips the intended semantics and will mis-drive requiresDeepReview and downstream review guidance.

Suggested change
let level: RiskLevel;
if (score < 0.3) {
level = RiskLevel.High;
} else if (score < 0.5) {
level = RiskLevel.Medium;
} else if (score < 0.8) {
level = RiskLevel.Low;
} else {
level = RiskLevel.Critical;
}
let level: RiskLevel;
if (score < 0.3) {
level = RiskLevel.Low;
} else if (score < 0.5) {
level = RiskLevel.Medium;
} else if (score < 0.8) {
level = RiskLevel.High;
} else {
level = RiskLevel.Critical;
}

Comment on lines +66 to +71
const added = parts[0]!;
const filePath = parts[2]!;

additions += added === "-" ? 0 : parseInt(added, 10);
deletions += added === "-" ? 0 : parseInt(added, 10);
changedFiles.push(filePath);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[P1] Count deletions from the correct git diff --numstat column

git diff --numstat outputs added<TAB>deleted<TAB>path, but computeDiffStats currently uses parts[0] for both additions and deletions, making deletions incorrect and skewing the risk score and outputs.

Suggested change
const added = parts[0]!;
const filePath = parts[2]!;
additions += added === "-" ? 0 : parseInt(added, 10);
deletions += added === "-" ? 0 : parseInt(added, 10);
changedFiles.push(filePath);
const added = parts[0]!;
const deleted = parts[1]!;
const filePath = parts[2]!;
additions += added === "-" ? 0 : parseInt(added, 10);
deletions += deleted === "-" ? 0 : parseInt(deleted, 10);
changedFiles.push(filePath);

Comment on lines +56 to +70
const isNegativeSentiment =
assessment.level === RiskLevel.Low || assessment.level === RiskLevel.Medium;
const isPositiveSentiment =
assessment.level === RiskLevel.High ||
assessment.level === RiskLevel.Critical;

if (isNegativeSentiment) {
return ReviewDepth.Deep;
}

if (isPositiveSentiment) {
return ReviewDepth.Shallow;
}

return ReviewDepth.Deep;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[P1] Fix inverted risk-to-review-depth mapping

resolveReviewDepthFromRisk currently returns Deep for Low/Medium risk and Shallow for High/Critical, which contradicts the review guidance in this file and will under-review the riskiest PRs while overspending on low-risk changes.

Suggested change
const isNegativeSentiment =
assessment.level === RiskLevel.Low || assessment.level === RiskLevel.Medium;
const isPositiveSentiment =
assessment.level === RiskLevel.High ||
assessment.level === RiskLevel.Critical;
if (isNegativeSentiment) {
return ReviewDepth.Deep;
}
if (isPositiveSentiment) {
return ReviewDepth.Shallow;
}
return ReviewDepth.Deep;
if (assessment.level === RiskLevel.Low) {
return ReviewDepth.Shallow;
}
return ReviewDepth.Deep;

- name: Checkout repository
uses: actions/checkout@v5
with:
fetch-depth: 1
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[P1] Ensure checkout history is sufficient for git-based diff/risk stats

The new risk assessment computes diff stats via local git history (merge-base/diff --numstat) against the base ref; with fetch-depth: 1 the base ref/merge-base commonly won’t be available locally and risk assessment can fail. Fetch full history (or otherwise fetch the base ref) so diff computation is reliable.

Suggested change
fetch-depth: 1
fetch-depth: 0

let mergeBase: string;
try {
mergeBase = execSync(
`git merge-base ${target} refs/remotes/origin/${baseRef}`,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[P1] [security] Avoid shelling out with interpolated refs in execSync

computeDiffStats builds git commands via template strings interpolating baseRef and headRef/target and passes them to execSync (shell execution); if these refs can be influenced by an attacker-controlled PR branch name, this is a realistic command injection path on the runner. Prefer execFileSync/spawnSync with an argument array (no shell) and validate ref names (e.g., via git check-ref-format) before invoking git.

The review skill now uses generic worker subagents with inlined
methodology instead of a named file-group-reviewer droid. Update
the security review prompt template to match.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant