
feat: hardened AI review — strict JSON, slop filter, quote-or-die #24

Merged
avrabe merged 1 commit into main from feat/hardened-ai-review-prompt on Apr 26, 2026

Conversation

@avrabe
Contributor

@avrabe avrabe commented Apr 26, 2026

Why

The deployed AI review on PR #23 was textbook small-model slop: duplicated content, generic platitudes, hallucinated risks ("tokens" in a config-flip PR), zero file:line anchors. Root cause is the freeform 5-section prompt — it gave the 3B model permission to fill each section with whatever sounded reviewer-ish.

Per the user's brief: every escape hatch the model uses to avoid producing concrete output must be made mechanically invalid, not just discouraged.

What

New module src/ai-review-prompt.js containing four locked-down primitives:

| Primitive | Job |
|---|---|
| `STRICT_SYSTEM_PROMPT` | Forces JSON output, banned hedging words, banned slop phrases, NEVER refuse. One-shot example included. |
| `tryParseReview()` | Strict shape enforcement: enum verdict, array findings, all fields typed. On parse failure, return `{ok: false}` — no retry. |
| `filterFindings()` | Quote-or-die: drop any finding whose `quoted_line` isn't verbatim in the diff. Slop filter: drop any claim with hedging words or known filler phrases. |
| `computeVerdict()` | Deterministic from filtered findings (`approve` if empty, `comment` if any). Model's verdict is advisory only. |

renderReviewMarkdown() returns null when verdict is approve and findings are empty — the bot then does NOT post. Silence > slop.
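A minimal sketch of how three of these primitives compose (the hedging-word list, field names beyond `verdict`, `findings`, and `quoted_line`, and the exact return shapes are assumptions for illustration, not the module's actual code):

```js
// Sketch only: strict parse -> quote-or-die + slop filter -> deterministic verdict.
const HEDGING = ['might', 'may', 'could', 'consider', 'ensure proper']; // assumed list

function tryParseReview(raw) {
  try {
    const parsed = JSON.parse(raw);
    const verdictOk = ['approve', 'comment'].includes(parsed.verdict);
    if (!verdictOk || !Array.isArray(parsed.findings)) return { ok: false };
    return { ok: true, review: parsed };
  } catch {
    return { ok: false }; // unparseable output means silence, no retry
  }
}

function filterFindings(findings, diffText) {
  return findings.filter((f) => {
    // quote-or-die: the quoted line must appear verbatim in the diff
    if (!f.quoted_line || !diffText.includes(f.quoted_line)) return false;
    // slop filter: drop hedged or filler claims
    const claim = String(f.claim || '').toLowerCase();
    return !HEDGING.some((word) => new RegExp(`\\b${word}\\b`).test(claim));
  });
}

function computeVerdict(filteredFindings) {
  return filteredFindings.length === 0 ? 'approve' : 'comment';
}
```

When everything gets filtered out, `computeVerdict()` yields `approve` and `renderReviewMarkdown()` returns `null`, so nothing is posted.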

Behaviour change

  • Default ai_review.system_prompt flipped from the freeform 5-section string to STRICT_SYSTEM_PROMPT. Deployments that set system_prompt: explicitly in config.local.yml keep the legacy freeform path (back-compat); see the sketch after this list.
  • New review records carry assessment and findings count → dashboard's getReviewStats() finally has real data instead of unknown.
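To make the first bullet concrete, a minimal sketch of the prompt selection, assuming a CommonJS module and that the parsed config exposes `ai_review.system_prompt` directly (the helper name is hypothetical):

```js
const { STRICT_SYSTEM_PROMPT } = require('./ai-review-prompt');

// Sketch: an explicitly configured prompt keeps the legacy freeform path;
// otherwise the new strict-JSON contract is the default.
function resolveSystemPrompt(config) {
  const explicit = config && config.ai_review && config.ai_review.system_prompt;
  return explicit || STRICT_SYSTEM_PROMPT;
}
```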

Test plan

  • All 737 tests pass (was 698 — added 39 covering parser, slop filter, verdict computation, render-skip behavior)
  • eslint clean
  • After merge + self-update: AI review on this PR (or the next one) should produce one of: (a) a tight comment with verdict + summary + file:line-anchored findings, or (b) no comment at all (when nothing concrete to flag).
  • If the model keeps producing prose around the JSON, follow-up: tighten `temperature` (currently 0.3) or add an assistant-prefix pre-fill (`"verdict": "`).
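If that follow-up is needed, one way to wire the pre-fill looks roughly like this, using Ollama's `/api/chat` endpoint and Node 18+ global `fetch`. Whether the model actually continues a trailing assistant turn depends on its chat template, and the model name here is an assumption:

```js
// Sketch: pre-fill the assistant turn so generation starts inside the JSON object.
async function requestReview(systemPrompt, diff) {
  const prefill = '{"verdict": "';
  const res = await fetch('http://localhost:11434/api/chat', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({
      model: 'llama3.2:3b',           // assumed model
      stream: false,
      options: { temperature: 0.2 },  // tightened from the current 0.3
      messages: [
        { role: 'system', content: systemPrompt },
        { role: 'user', content: diff },
        { role: 'assistant', content: prefill },
      ],
    }),
  });
  const data = await res.json();
  // If the model only returns the continuation, re-attach the prefill before parsing.
  return prefill + data.message.content;
}
```

The reassembled string then goes through `tryParseReview()` like any other response.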

Risk & rollout

  • Risk: low. Worst case (Ollama produces unparseable output every time) is silent — no comments posted. Better than the current state where every comment is slop.
  • Rollout: self-update on merge. Confirmation = the next non-bot PR opened in any pulseengine repo gets either a strict-format review or no review.

Follow-up (PR-B)

Wire `rivet validate` / `rivet impact` as a prepended mechanical-oracle finding when the target repo has `rivet.yaml`. Findings from the oracle bypass the model entirely. Already prototyped — `rivet v0.4.3` against the rivet repo emits exactly the kind of name-anchored finding we want (e.g. `spar:SPAR-REQ-001 — no downstream artifacts`).
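A rough sketch of that wiring (the finding shape and the assumption that a validator may exit non-zero when it reports violations are illustrative; only `rivet validate` against a repo carrying `rivet.yaml` is confirmed above):

```js
const { execFile } = require('node:child_process');
const { promisify } = require('node:util');
const run = promisify(execFile);

// Sketch: turn rivet's name-anchored output lines into findings that are
// prepended ahead of anything the model produced and never slop-filtered.
async function mechanicalOracleFindings(repoDir) {
  let output = '';
  try {
    ({ stdout: output } = await run('rivet', ['validate'], { cwd: repoDir }));
  } catch (err) {
    // Validators often exit non-zero when they find violations; keep their output.
    output = (err && err.stdout) || '';
  }
  return output
    .split('\n')
    .map((line) => line.trim())
    .filter(Boolean)
    .map((line) => ({ source: 'rivet', claim: line }));
}
```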

🤖 Generated with Claude Code

…-escape verdict

## Why
The deployed AI review on PR #23 was textbook small-model slop: duplicated
content, generic platitudes, hallucinated risks (claimed "tokens" in a
config-flip PR), zero file:line anchors. Root cause is the freeform 5-section
prompt (`src/ai-review.js:336`) — it gave the 3B model permission to fill
each section with whatever sounded reviewer-ish.

Per the user's request: every escape hatch the model uses to avoid producing
concrete output must be made mechanically invalid.

## What
New module `src/ai-review-prompt.js` containing four locked-down primitives:

| Primitive | Job |
|---|---|
| `STRICT_SYSTEM_PROMPT` | Forces JSON output, banned hedging words, banned slop phrases, NEVER refuse. Includes a one-shot example to anchor format. |
| `tryParseReview()` | Strict shape enforcement: enum verdict, array findings, all fields typed. On parse failure, return `{ok: false}` — no retry. |
| `filterFindings()` | Quote-or-die: drop any finding whose `quoted_line` isn't verbatim in the diff. Slop filter: drop any claim with hedging words or known filler phrases. |
| `computeVerdict()` | Deterministic from filtered findings (`approve` if empty, `comment` if any). Model's verdict is advisory only — postprocessor decides. |

`renderReviewMarkdown()` returns `null` when verdict is `approve` and
findings are empty — the bot then does NOT post. Silence > slop.

## Behaviour change
- Default `ai_review.system_prompt` flipped from the freeform 5-section
  string to `STRICT_SYSTEM_PROMPT`. Users with `system_prompt:` set
  explicitly in `config.local.yml` continue using the legacy freeform path
  (back-compat for existing deployments).
- New review records carry `assessment` and `findings` count → dashboard's
  `getReviewStats()` finally has real data instead of `unknown`.

## Test plan
- [x] All 737 tests pass (was 698 — added 39 covering parser, slop filter,
      verdict computation, render-skip behavior)
- [x] eslint clean
- [ ] After merge + self-update: AI review on the next PR should produce
      one of: (a) a tight comment with verdict + summary + file:line-anchored
      findings, or (b) no comment at all (when nothing concrete to flag).
- [ ] Verify on netcup logs that the JSON parse rate is high — if Ollama
      keeps producing prose around the JSON, may need to tighten the
      `temperature` config (currently 0.3) or add an assistant-prefix
      pre-fill.

## Risk & rollout
- Risk: low. Worst case (model produces unparseable output every time) is
  silent — no comments posted. Better than the current state where every
  comment is slop.
- Rollout: self-update on merge. Confirmation = the next non-bot PR opened
  in any pulseengine repo gets either a strict-format review or no review.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@temper-pulseengine
Contributor

AI Code Review for PR #24

pulseengine/temper:feat/hardened-ai-review-prompt → pulseengine/temper:main

Summary of Changes

The PR introduces a new module ai-review-prompt.js that contains four locked-down primitives to ensure strict JSON output, ban hedging words and slop phrases, and enforce quote-or-die behavior. The default system prompt is now set to the strict-JSON contract, which requires the AI model to produce JSON output with specific fields and constraints.

Potential Bugs or Issues

  1. Parsing Failure Handling: The tryParseReview() function does not handle parsing errors gracefully. If the AI response cannot be parsed as valid JSON, it will return {ok: false} without any further processing. This could lead to unexpected behavior if the bot attempts to render or use the review data.

  2. Slop Filter Logic: The slop filter is based on known filler phrases and hedging words. However, there may be edge cases these filters do not catch. For example, a model might produce a comment containing a phrase like "ensure proper error handling" without it being flagged as slop.

  3. Quote-or-die Logic: The quote-or-die logic is based on the quoted_line field in each finding. However, there may be cases where a model produces a comment with a line number that does not match any actual changes in the diff. This could lead to false positives or negatives.

  4. Performance Overhead: The introduction of new functions and constraints might introduce performance overhead, especially if the AI model is already under heavy load.

  5. User Feedback: If users rely on the legacy freeform prompt for specific features or formatting, they may experience issues with the new strict-JSON contract.

Security Concerns

  1. Data Exposure: The new system prompt and parsing logic introduce potential security risks if the AI model is trained on sensitive data. For example, if the model produces a comment that includes personal information or proprietary details, it could be exposed to unauthorized users.

  2. Model Bias: The introduction of strict constraints might inadvertently bias the AI model towards certain types of comments or findings. This could lead to an unfair review process if the model is not trained on diverse data.

Suggestions for Improvement

  1. Enhance Parsing Error Handling: Implement more robust error handling in tryParseReview() to provide better feedback to users and developers when parsing fails.

  2. Expand Slop Filter Logic: Consider adding additional known filler phrases and hedging words to the slop filter logic to improve its effectiveness.

  3. Implement Quote-or-die Validation: Ensure that the quote-or-die validation is robust enough to handle cases where a model produces comments with line numbers that do not match any actual changes in the diff.

  4. Optimize Performance: Profile and optimize the new functions and constraints to minimize performance overhead.

  5. User Feedback Mechanism: Provide a way for users to report issues or suggest improvements related to the new strict-JSON contract and parsing logic.

Overall Assessment

The PR introduces significant changes to the AI review system, which is a positive step towards improving its reliability and consistency. However, it also introduces potential bugs, security concerns, and performance overhead that need to be addressed before deployment. The team should carefully evaluate these issues and make necessary adjustments to ensure a robust and secure AI review process.


This review was generated by a local AI model. It is advisory only and may contain inaccuracies.

Reviewed at d76c1aa

@avrabe avrabe merged commit ada4e06 into main Apr 26, 2026
5 checks passed
@avrabe avrabe deleted the feat/hardened-ai-review-prompt branch April 26, 2026 07:54