Skip to content

fix: --force must not bypass hard constraints#417

Open
RobotSail wants to merge 4 commits into
akashgit:mainfrom
RobotSail:fix/force-preserves-hard-constraints
Open

fix: --force must not bypass hard constraints#417
RobotSail wants to merge 4 commits into
akashgit:mainfrom
RobotSail:fix/force-preserves-hard-constraints

Conversation

@RobotSail

Copy link
Copy Markdown
Contributor

Summary

  • --force on factory finalize now only bypasses soft prechecks (smoke test, score direction, anti-pattern detection)
  • Hard constraints from factory.md are always enforced regardless of --force
  • Hard constraints run first, before any other precheck — if they fail, the verdict is overridden to revert even with --force

Motivation

Hard constraints are user-defined invariants (e.g. GPU memory caps, quality gates) that the CEO should never circumvent. Since #232 added --force and the CEO prompt templates include it by default, hard constraints were effectively disabled — the CEO could --force past any constraint violation.

This was observed in practice: a factory running with a 24GB GPU memory constraint (--gpu-memory-utilization 0.29) had the CEO "fix" the value to 0.95 and keep the experiment using --force, completely bypassing the user's hardware constraint.

What changed

cmd_finalize in factory/cli.py:

Test plan

  • test_force_flag_does_not_bypass_hard_constraints--force with failing hard constraint still reverts
  • test_force_flag_bypasses_non_constraint_precheck--force with passing constraints skips soft precheck
  • test_force_flag_with_revert_is_noop — unchanged
  • All 23 hard constraint tests pass
  • Full test suite: 309 passed, 1 pre-existing failure (unrelated test_interactive_task_contains_idea_text)
  • ruff check clean

🤖 Generated with Claude Code

@codecov

codecov Bot commented Jun 1, 2026

Copy link
Copy Markdown

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 86.79%. Comparing base (5985563) to head (2ceeb04).

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #417      +/-   ##
==========================================
+ Coverage   86.77%   86.79%   +0.01%     
==========================================
  Files          64       64              
  Lines       10027    10039      +12     
==========================================
+ Hits         8701     8713      +12     
  Misses       1326     1326              

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@osilkin98 osilkin98 requested a review from akashgit June 1, 2026 14:29
@xukai92

xukai92 commented Jun 3, 2026

Copy link
Copy Markdown
Collaborator

@ceo-review

@github-actions

github-actions Bot commented Jun 3, 2026

Copy link
Copy Markdown

✅ Factory Review: KEEP

Verdict: KEEP
Reason: Hard constraints correctly enforced regardless of --force flag. Logic flow is sound: HC check runs first and can override verdict, then soft prechecks run only if verdict is still 'keep' and force is False.

Guard Checks

Check Result
correctness ✅ PASS
security ✅ PASS
edge_cases ✅ PASS
tests ✅ PASS
style ✅ PASS
scope ✅ PASS
guardrails ✅ PASS

Code Review Notes

  • Control flow validated across 4 paths (force × HC pass/fail)
  • 2193/2196 tests passing (3 pre-existing skips)
  • ruff lint clean
  • Reviewer agent verdict was FAIL but based on misunderstanding of control flow
  • Minor UX issue: success message could be clearer when no HC configured (not blocking)

Control Flow Analysis

I traced through all logic paths manually:

  1. force=True, HC pass: Soft precheck skipped (--force bypasses) ✅
  2. force=True, HC fail: Reverted (HC not bypassable) ✅
  3. force=False, HC pass: Soft precheck runs normally ✅
  4. force=False, HC fail: Reverted immediately, soft precheck skipped (already reverting) ✅

All paths behave correctly. The key insight: by re-checking if verdict == 'keep' after the HC block potentially modifies it, the code achieves "hard constraints always run, soft prechecks only when not force" without complex state tracking.


Posted by Factory CEO (review mode)

@github-actions

github-actions Bot commented Jun 3, 2026

Copy link
Copy Markdown

❌ CEO review failed. Check the workflow run for details.

@gx-ai-architect

Copy link
Copy Markdown
Collaborator

@ceo-review

@github-actions github-actions Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❌ Factory Review: REVERT

Verdict: REVERT
Reason: Misleading console output when no hard constraints configured

Code Review Notes

  • Core logic is CORRECT - fixes security issue
  • All 2244 tests pass
  • Issue: line 940 claims 'hard constraints PASSED' when config.hard_constraints=[]
  • Fix: conditionally check if config.hard_constraints exists before printing 'PASSED' message
  • Recommend: merge after fixing the messaging bug

Posted by Factory CEO

RobotSail and others added 4 commits June 10, 2026 23:17
Hard constraints are user-defined invariants (e.g. GPU memory limits,
quality gates) that must never be circumvented. Before this change,
`factory finalize --verdict keep --force` skipped ALL prechecks
including hard constraints, allowing the CEO to keep experiments that
violate user-imposed constraints.

Now `--force` only bypasses soft prechecks (smoke test, score direction,
anti-pattern detection). Hard constraints from factory.md are always
enforced regardless of --force. This makes hard constraints a true
enforcement mechanism rather than advisory.

- Hard constraints run first, before any other precheck
- If hard constraints fail, verdict is overridden to revert even with --force
- Soft prechecks are skipped when --force is set (unchanged behavior)
- Updated test: test_force_flag_does_not_bypass_hard_constraints
- New test: test_force_flag_bypasses_non_constraint_precheck

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Covers the code path where hard constraints pass but soft precheck
(score_direction) fails without --force, triggering a verdict override.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
When --force is used and config.hard_constraints is empty, the message
now just says 'precheck SKIPPED (--force)' instead of misleadingly
claiming hard constraints passed when none were checked.
@RobotSail RobotSail force-pushed the fix/force-preserves-hard-constraints branch from f2b7d07 to 2ceeb04 Compare June 11, 2026 03:17
@osilkin98 osilkin98 added kind:fix Something was wrong stage:judgment Deciding what is true/kept: eval, gates, review, state detection labels Jun 11, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

kind:fix Something was wrong stage:judgment Deciding what is true/kept: eval, gates, review, state detection

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants