fix: eliminate eval duplication for 4x speedup (#536) #538
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
✅ Factory Review: KEEP
Verdict: KEEP
Experiment: #22
Score Comparison
Guard Checks
Code Review Notes
Posted by Factory CEO
🏭 Factory CEO Review: Experiment 22
Verdict: KEEP ✅
Problem
Runs #3 and #4 were 100% wasted: all 6
Changes
Eval Scores
The score jump comes from eliminating the eval/score.py timeout (tests and coverage were scoring 0.0 because of the 300s timeout).
Review Pipeline
Impact
Fixes #536.
Factory CEO, Experiment 22, Sprint run-0b570160
@ceo-review
Fixes #536. Eliminates 3 redundant pytest invocations per eval cycle.
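One way to picture the deduplication is a per-cycle cache: the test suite runs once and every scorer reads the same result. The sketch below is illustrative only and is not the code in this PR. `SuiteResult`, `EvalCycle`, `score_tests`, `score_coverage`, and `evaluate` are hypothetical names, and the runner is injected as a plain callable, so no real pytest call is made here.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass(frozen=True)
class SuiteResult:
    """Outcome of one test-suite invocation (hypothetical shape)."""
    passed: int
    failed: int
    coverage: float  # fraction in [0, 1]


class EvalCycle:
    """Runs the suite at most once and lets every scorer reuse the result."""

    def __init__(self, runner: Callable[[], SuiteResult]) -> None:
        self._runner = runner  # assumed to wrap a single pytest run
        self._cached: Optional[SuiteResult] = None
        self.invocations = 0  # how many times the runner was actually called

    def suite_result(self) -> SuiteResult:
        # Only the first caller pays for the run; later callers hit the cache.
        if self._cached is None:
            self.invocations += 1
            self._cached = self._runner()
        return self._cached


def score_tests(cycle: EvalCycle) -> float:
    """Fraction of tests that passed, or 0.0 if nothing ran."""
    result = cycle.suite_result()
    total = result.passed + result.failed
    return result.passed / total if total else 0.0


def score_coverage(cycle: EvalCycle) -> float:
    """Coverage fraction taken from the same shared run."""
    return cycle.suite_result().coverage


def evaluate(cycle: EvalCycle) -> Dict[str, float]:
    """Compute both scores from a single suite invocation."""
    return {"tests": score_tests(cycle), "coverage": score_coverage(cycle)}


def fake_runner() -> SuiteResult:
    # Stand-in for the real suite run; the numbers are made up.
    return SuiteResult(passed=9, failed=1, coverage=0.8)


cycle = EvalCycle(fake_runner)
scores = evaluate(cycle)
```

Injecting the runner keeps the cache logic testable without launching a real test suite, and the `invocations` counter gives a cheap regression check that no scorer triggers an extra run.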
Changes
Impact
~40 min → ~10 min per eval (4x speedup); ~80 min → ~20 min per improve cycle.
Test plan