fix: --force must not bypass hard constraints#417
Conversation
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #417 +/- ##
==========================================
+ Coverage 86.77% 86.79% +0.01%
==========================================
Files 64 64
Lines 10027 10039 +12
==========================================
+ Hits 8701 8713 +12
Misses 1326 1326 ☔ View full report in Codecov by Harness. 🚀 New features to boost your workflow:
|
|
@ceo-review |
✅ Factory Review: KEEPVerdict: KEEP Guard Checks
Code Review Notes
Control Flow AnalysisI traced through all logic paths manually:
All paths behave correctly. The key insight: by re-checking Posted by Factory CEO (review mode) |
|
❌ CEO review failed. Check the workflow run for details. |
|
@ceo-review |
There was a problem hiding this comment.
❌ Factory Review: REVERT
Verdict: REVERT
Reason: Misleading console output when no hard constraints configured
Code Review Notes
- Core logic is CORRECT - fixes security issue
- All 2244 tests pass
- Issue: line 940 claims 'hard constraints PASSED' when config.hard_constraints=[]
- Fix: conditionally check if config.hard_constraints exists before printing 'PASSED' message
- Recommend: merge after fixing the messaging bug
Posted by Factory CEO
Hard constraints are user-defined invariants (e.g. GPU memory limits, quality gates) that must never be circumvented. Before this change, `factory finalize --verdict keep --force` skipped ALL prechecks including hard constraints, allowing the CEO to keep experiments that violate user-imposed constraints. Now `--force` only bypasses soft prechecks (smoke test, score direction, anti-pattern detection). Hard constraints from factory.md are always enforced regardless of --force. This makes hard constraints a true enforcement mechanism rather than advisory. - Hard constraints run first, before any other precheck - If hard constraints fail, verdict is overridden to revert even with --force - Soft prechecks are skipped when --force is set (unchanged behavior) - Updated test: test_force_flag_does_not_bypass_hard_constraints - New test: test_force_flag_bypasses_non_constraint_precheck Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Covers the code path where hard constraints pass but soft precheck (score_direction) fails without --force, triggering a verdict override. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
When --force is used and config.hard_constraints is empty, the message now just says 'precheck SKIPPED (--force)' instead of misleadingly claiming hard constraints passed when none were checked.
f2b7d07 to
2ceeb04
Compare
Summary
--forceonfactory finalizenow only bypasses soft prechecks (smoke test, score direction, anti-pattern detection)factory.mdare always enforced regardless of--force--forceMotivation
Hard constraints are user-defined invariants (e.g. GPU memory caps, quality gates) that the CEO should never circumvent. Since #232 added
--forceand the CEO prompt templates include it by default, hard constraints were effectively disabled — the CEO could--forcepast any constraint violation.This was observed in practice: a factory running with a 24GB GPU memory constraint (
--gpu-memory-utilization 0.29) had the CEO "fix" the value to 0.95 and keep the experiment using--force, completely bypassing the user's hardware constraint.What changed
cmd_finalizeinfactory/cli.py:if not forceblock — they always runif not force—--forcestill bypasses them (preserving the fix for Finalize gate overrides KEEP to REVERT on pre-existing smoke test failures #222)hard_constraints=Noneis passed torun_precheckto avoid double-checking constraintsTest plan
test_force_flag_does_not_bypass_hard_constraints—--forcewith failing hard constraint still revertstest_force_flag_bypasses_non_constraint_precheck—--forcewith passing constraints skips soft prechecktest_force_flag_with_revert_is_noop— unchangedtest_interactive_task_contains_idea_text)ruff checkclean🤖 Generated with Claude Code