
feat: hardened AI review — strict JSON, slop filter, quote-or-die #24

Merged
avrabe merged 1 commit into main from feat/hardened-ai-review-prompt on Apr 26, 2026

Conversation

@avrabe
Contributor

@avrabe avrabe commented Apr 26, 2026

Why

The deployed AI review on PR #23 was textbook small-model slop: duplicated content, generic platitudes, hallucinated risks ("tokens" in a config-flip PR), zero file:line anchors. Root cause is the freeform 5-section prompt — it gave the 3B model permission to fill each section with whatever sounded reviewer-ish.

Per the user's brief: every escape hatch the model uses to avoid producing concrete output must be made mechanically invalid, not just discouraged.

What

New module src/ai-review-prompt.js containing four locked-down primitives:

| Primitive | Job |
|---|---|
| `STRICT_SYSTEM_PROMPT` | Forces JSON output, banned hedging words, banned slop phrases, NEVER refuse. One-shot example included. |
| `tryParseReview()` | Strict shape enforcement: enum verdict, array findings, all fields typed. On parse failure, return `{ok: false}` — no retry. |
| `filterFindings()` | Quote-or-die: drop any finding whose `quoted_line` isn't verbatim in the diff. Slop filter: drop any claim with hedging words or known filler phrases. |
| `computeVerdict()` | Deterministic from filtered findings (`approve` if empty, `comment` if any). Model's verdict is advisory only. |

renderReviewMarkdown() returns null when verdict is approve and findings are empty — the bot then does NOT post. Silence > slop.
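A minimal sketch of how three of these primitives compose (the hedging-word list, field names beyond `verdict`, `findings`, and `quoted_line`, and the exact return shapes are assumptions for illustration, not the module's actual code):

```js
// Sketch only: strict parse -> quote-or-die + slop filter -> deterministic verdict.
const HEDGING = ['might', 'may', 'could', 'consider', 'ensure proper']; // assumed list

function tryParseReview(raw) {
  try {
    const parsed = JSON.parse(raw);
    const verdictOk = ['approve', 'comment'].includes(parsed.verdict);
    if (!verdictOk || !Array.isArray(parsed.findings)) return { ok: false };
    return { ok: true, review: parsed };
  } catch {
    return { ok: false }; // unparseable output means silence, no retry
  }
}

function filterFindings(findings, diffText) {
  return findings.filter((f) => {
    // quote-or-die: the quoted line must appear verbatim in the diff
    if (!f.quoted_line || !diffText.includes(f.quoted_line)) return false;
    // slop filter: drop hedged or filler claims
    const claim = String(f.claim || '').toLowerCase();
    return !HEDGING.some((word) => new RegExp(`\\b${word}\\b`).test(claim));
  });
}

function computeVerdict(filteredFindings) {
  return filteredFindings.length === 0 ? 'approve' : 'comment';
}
```

When everything gets filtered out, `computeVerdict()` yields `approve` and `renderReviewMarkdown()` returns `null`, so nothing is posted.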

Behaviour change

  • Default ai_review.system_prompt flipped from the freeform 5-section string to STRICT_SYSTEM_PROMPT. Deployments that set system_prompt: explicitly in config.local.yml keep the legacy freeform path (back-compat); see the sketch after this list.
  • New review records carry assessment and findings count → dashboard's getReviewStats() finally has real data instead of unknown.
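To make the first bullet concrete, a minimal sketch of the prompt selection, assuming a CommonJS module and that the parsed config exposes `ai_review.system_prompt` directly (the helper name is hypothetical):

```js
const { STRICT_SYSTEM_PROMPT } = require('./ai-review-prompt');

// Sketch: an explicitly configured prompt keeps the legacy freeform path;
// otherwise the new strict-JSON contract is the default.
function resolveSystemPrompt(config) {
  const explicit = config && config.ai_review && config.ai_review.system_prompt;
  return explicit || STRICT_SYSTEM_PROMPT;
}
```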

Test plan

  • All 737 tests pass (was 698 — added 39 covering parser, slop filter, verdict computation, render-skip behavior)
  • eslint clean
  • After merge + self-update: AI review on this PR (or the next one) should produce one of: (a) a tight comment with verdict + summary + file:line-anchored findings, or (b) no comment at all (when nothing concrete to flag).
  • If the model keeps producing prose around the JSON, follow-up: tighten `temperature` (currently 0.3) or add an assistant-prefix pre-fill (`"verdict": "`).
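If that follow-up is needed, one way to wire the pre-fill looks roughly like this, using Ollama's `/api/chat` endpoint and Node 18+ global `fetch`. Whether the model actually continues a trailing assistant turn depends on its chat template, and the model name here is an assumption:

```js
// Sketch: pre-fill the assistant turn so generation starts inside the JSON object.
async function requestReview(systemPrompt, diff) {
  const prefill = '{"verdict": "';
  const res = await fetch('http://localhost:11434/api/chat', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({
      model: 'llama3.2:3b',           // assumed model
      stream: false,
      options: { temperature: 0.2 },  // tightened from the current 0.3
      messages: [
        { role: 'system', content: systemPrompt },
        { role: 'user', content: diff },
        { role: 'assistant', content: prefill },
      ],
    }),
  });
  const data = await res.json();
  // If the model only returns the continuation, re-attach the prefill before parsing.
  return prefill + data.message.content;
}
```

The reassembled string then goes through `tryParseReview()` like any other response.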

Risk & rollout

  • Risk: low. Worst case (Ollama produces unparseable output every time) is silent — no comments posted. Better than the current state where every comment is slop.
  • Rollout: self-update on merge. Confirmation = the next non-bot PR opened in any pulseengine repo gets either a strict-format review or no review.

Follow-up (PR-B)

Wire `rivet validate` / `rivet impact` as a prepended mechanical-oracle finding when the target repo has `rivet.yaml`. Findings from the oracle bypass the model entirely. Already prototyped — `rivet v0.4.3` against the rivet repo emits exactly the kind of name-anchored finding we want (e.g. `spar:SPAR-REQ-001 — no downstream artifacts`).
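A rough sketch of that wiring (the finding shape and the assumption that a validator may exit non-zero when it reports violations are illustrative; only `rivet validate` against a repo carrying `rivet.yaml` is confirmed above):

```js
const { execFile } = require('node:child_process');
const { promisify } = require('node:util');
const run = promisify(execFile);

// Sketch: turn rivet's name-anchored output lines into findings that are
// prepended ahead of anything the model produced and never slop-filtered.
async function mechanicalOracleFindings(repoDir) {
  let output = '';
  try {
    ({ stdout: output } = await run('rivet', ['validate'], { cwd: repoDir }));
  } catch (err) {
    // Validators often exit non-zero when they find violations; keep their output.
    output = (err && err.stdout) || '';
  }
  return output
    .split('\n')
    .map((line) => line.trim())
    .filter(Boolean)
    .map((line) => ({ source: 'rivet', claim: line }));
}
```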

🤖 Generated with Claude Code

…-escape verdict

## Why
The deployed AI review on PR #23 was textbook small-model slop: duplicated
content, generic platitudes, hallucinated risks (claimed "tokens" in a
config-flip PR), zero file:line anchors. Root cause is the freeform 5-section
prompt (`src/ai-review.js:336`) — it gave the 3B model permission to fill
each section with whatever sounded reviewer-ish.

Per the user's request: every escape hatch the model uses to avoid producing
concrete output must be made mechanically invalid.

## What
New module `src/ai-review-prompt.js` containing four locked-down primitives:

| Primitive | Job |
|---|---|
| `STRICT_SYSTEM_PROMPT` | Forces JSON output, banned hedging words, banned slop phrases, NEVER refuse. Includes a one-shot example to anchor format. |
| `tryParseReview()` | Strict shape enforcement: enum verdict, array findings, all fields typed. On parse failure, return `{ok: false}` — no retry. |
| `filterFindings()` | Quote-or-die: drop any finding whose `quoted_line` isn't verbatim in the diff. Slop filter: drop any claim with hedging words or known filler phrases. |
| `computeVerdict()` | Deterministic from filtered findings (`approve` if empty, `comment` if any). Model's verdict is advisory only — postprocessor decides. |

`renderReviewMarkdown()` returns `null` when verdict is `approve` and
findings are empty — the bot then does NOT post. Silence > slop.

## Behaviour change
- Default `ai_review.system_prompt` flipped from the freeform 5-section
  string to `STRICT_SYSTEM_PROMPT`. Users with `system_prompt:` set
  explicitly in `config.local.yml` continue using the legacy freeform path
  (back-compat for existing deployments).
- New review records carry `assessment` and `findings` count → dashboard's
  `getReviewStats()` finally has real data instead of `unknown`.

## Test plan
- [x] All 737 tests pass (was 698 — added 39 covering parser, slop filter,
      verdict computation, render-skip behavior)
- [x] eslint clean
- [ ] After merge + self-update: AI review on the next PR should produce
      one of: (a) a tight comment with verdict + summary + file:line-anchored
      findings, or (b) no comment at all (when nothing concrete to flag).
- [ ] Verify on netcup logs that the JSON parse rate is high — if Ollama
      keeps producing prose around the JSON, may need to tighten the
      `temperature` config (currently 0.3) or add an assistant-prefix
      pre-fill.

## Risk & rollout
- Risk: low. Worst case (model produces unparseable output every time) is
  silent — no comments posted. Better than the current state where every
  comment is slop.
- Rollout: self-update on merge. Confirmation = the next non-bot PR opened
  in any pulseengine repo gets either a strict-format review or no review.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@temper-pulseengine
Contributor

AI Code Review for PR #24

pulseengine/temper:feat/hardened-ai-review-prompt → pulseengine/temper:main

Summary of Changes

The PR introduces a new module ai-review-prompt.js that contains four locked-down primitives to ensure strict JSON output, ban hedging words and slop phrases, and enforce quote-or-die behavior. The default system prompt is now set to the strict-JSON contract, which requires the AI model to produce JSON output with specific fields and constraints.

Potential Bugs or Issues

  1. Parsing Failure Handling: The tryParseReview() function does not handle parsing errors gracefully. If the AI response cannot be parsed as valid JSON, it will return {ok: false} without any further processing. This could lead to unexpected behavior if the bot attempts to render or use the review data.

  2. Slop Filter Logic: The slop filter is based on known filler phrases and hedging words. However, there may be edge cases these filters do not catch. For example, a model might produce a comment containing a phrase like "ensure proper error handling" without it being flagged as slop.

  3. Quote-or-die Logic: The quote-or-die logic is based on the quoted_line field in each finding. However, there may be cases where a model produces a comment with a line number that does not match any actual changes in the diff. This could lead to false positives or negatives.

  4. Performance Overhead: The introduction of new functions and constraints might introduce performance overhead, especially if the AI model is already under heavy load.

  5. User Feedback: If users rely on the legacy freeform prompt for specific features or formatting, they may experience issues with the new strict-JSON contract.

Security Concerns

  1. Data Exposure: The new system prompt and parsing logic introduce potential security risks if the AI model is trained on sensitive data. For example, if the model produces a comment that includes personal information or proprietary details, it could be exposed to unauthorized users.

  2. Model Bias: The introduction of strict constraints might inadvertently bias the AI model towards certain types of comments or findings. This could lead to an unfair review process if the model is not trained on diverse data.

Suggestions for Improvement

  1. Enhance Parsing Error Handling: Implement more robust error handling in tryParseReview() to provide better feedback to users and developers when parsing fails.

  2. Expand Slop Filter Logic: Consider adding additional known filler phrases and hedging words to the slop filter logic to improve its effectiveness.

  3. Implement Quote-or-die Validation: Ensure that the quote-or-die validation is robust enough to handle cases where a model produces comments with line numbers that do not match any actual changes in the diff.

  4. Optimize Performance: Profile and optimize the new functions and constraints to minimize performance overhead.

  5. User Feedback Mechanism: Provide a way for users to report issues or suggest improvements related to the new strict-JSON contract and parsing logic.

Overall Assessment

The PR introduces significant changes to the AI review system, which is a positive step towards improving its reliability and consistency. However, it also introduces potential bugs, security concerns, and performance overhead that need to be addressed before deployment. The team should carefully evaluate these issues and make necessary adjustments to ensure a robust and secure AI review process.


This review was generated by a local AI model. It is advisory only and may contain inaccuracies.

Reviewed at d76c1aa

@avrabe avrabe merged commit ada4e06 into main Apr 26, 2026
5 checks passed
@avrabe avrabe deleted the feat/hardened-ai-review-prompt branch April 26, 2026 07:54