Z3 bench reboot by hdson07 · Pull Request #465 · algorithmicsuperintelligence/openevolve

hdson07 · 2026-06-09T10:20:28Z

No description provided.

- build_samples.py replaces build_stage1_sample.py: scans raw-data, regenerates problems.jsonl, writes stage1_sample.json (5 SAT) and stage2_sample.json (50 SAT+UNSAT) using quintile-spread by (num_hard_constraints, num_variables). Runtime no longer used for selection — raw-data can accumulate beyond 50 without reshuffling. - evaluator.py: per-problem timeout = max(5s, baseline_ms * 1.3), replacing fixed OPENEVOLVE_STAGE{1,2}_TIMEOUT envs. Adaptive cap absorbs both small-problem startup overhead and huge-problem proportional headroom. _filter_stage2 added; _load_problems no longer filters by local_baseline (stage2 returns full set). - run_phase.sh / rebaseline_local.py / baseline_params.py: refs updated to build_samples.py. - README: reflect new sample layout. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

- build_samples.py: SAT runtime quintile split → stage1 (Q1+2, 5), stage2 (Q3+4, 5), stage3 (Q5, 5); stage4 (broad 20, SAT+UNSAT, dedup vs stage1-3). Tukey IQR k=3 outlier filter drops far-tail monsters (~180s/~130s with current data) so they don't distort quintile boundaries or inflate stage wall-clock. - evaluator.py: evaluate_stage3 chains stage4 internally because openevolve cascade hardcodes 3 stage slots. Gate threshold for the internal stage3→stage4 transition read from config.evaluator.cascade_thresholds[2] via runtime.cascade_threshold(). - config.yaml: cascade_thresholds=[1.03, 1.03, 1.03] — 3% gain required at each transition (stage1→2, stage2→3, stage3→4). - runtime.py: cascade_threshold(index, default) helper. - rebaseline_local.py: extend union to all 4 stage samples. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Add three evolution surfaces per phase (GLOBAL_OVERRIDES, SIZE_BUCKETS, STAGE3_OVERRIDES) with constraint-count buckets and outlier-only stage3 overrides. Stage3 sample now picks from Statistics/outliers_top.csv (top-5 residual, capped at MAX_BASELINE_MS). Evaluator resolves params per problem with backward-compat shim for legacy get_params(). extract_best / prepare_phase chain bucket + stage3 dicts across phases. config.yaml system_message embeds full CpSolverParameters reference (275 fields) so the LLM has authoritative knob names in every call.

- process_parallel: pass log_file to workers and configure root logger - gitignore + untrack cpsat-bench shared run artifacts (samples, baselines, phase outputs) - add phase5_custom_subsolvers initial program - tune cpsat evolve config, samples, evaluator, worker Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Lift per-bench evaluator / sampler / rebaseline / scoring / phase extraction / unified-phase materialization into reusable _lib modules. Per-bench surface shrinks to four hand-edited files (config.yaml, params.json, adapter.py, _solve_worker.py) plus phase initial_program.py modules. All orchestration commands route via `python -m _lib.<name> <bench>`; no per-bench wrapper scripts. New _lib modules: bench_paths, params_catalog, averaging, scorer, sampler, rebaseline, self_test, evaluator_core, evaluator_entry, final_verify, finalize. Extended extract_best + prepare_phase + load_bench_config to support CLI invocation by bench name. Per-bench changes: - shared/ subdirs removed entirely (worker moved to evolve/ root). - baseline_params.py / score.py / evaluator.py / runtime.py / <solver>_runner.py / build_samples.py / rebaseline_local.py / extract_best.py / prepare_phase*.py / final_verify.py wrappers deleted (logic now in _lib). - New adapter.py exposes solver-specific constants (DECISIVE_RESULTS, KEY_STATS, STATUS_FIELD, WORKERS_KEY, SCORE_MODE, STATS_WEIGHTS, get_problem_size). - New params.json: rich catalog (defaults + locked + grouped params with type/range/desc) replaces BASELINE/LOCKED dicts. - config.yaml gains bench.clustering, bench.evaluation, bench.adapter / params_catalog / worker_path / unified_dict_name. - z3 extras (validate_keys.py, probe_cli_keys.py, _probe_parallel.py, verify_stage1_baseline.py) and cpsat extras (benchmark_final.py, materialize_final.py) removed. - Phase initial_program.py modules: - imports BASELINE from params_catalog - EVOLVE-BLOCK reset to empty (clean slate) - robust bench-root resolution via OPENEVOLVE_BENCH_ROOT env + walk-up fallback (works after openevolve copies program to /tmp) run_phase.sh: invokes _lib modules directly, exports OPENEVOLVE_BENCH_ROOT, passes _lib/evaluator_entry.py as evaluator, auto-runs _lib.finalize after the last phase to emit <bench>/evolve/final_program.py (self-contained — all _load_prev*() calls inlined as literal dicts so no cache/ dependency at runtime). Behavior unified across solvers: - 10-run averaging is the standard (configurable via bench.evaluation.repeats). - Clustering driven by bench.clustering (kmeans / quintile / thresholds); z3 clusters by features.num_hard_constraints, cpsat by features.num_constraints. - Score mode pluggable: `cost` (cpsat: dtime + cost_ratio) or `speedup` (z3: wall-clock). - SIZE_BUCKETS / STAGE3_OVERRIDES are opt-in via bench.evaluation.enable_size_buckets / enable_outlier_stage. - LLM param validation runs against the catalog before subprocess spawn — unknown / range-violating keys surface as invalid_param without a solver run. Generated stage samples + local_baseline now live in <bench>/evolve/cache/ (safe to delete; regenerated on first run). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

- _lib/sampler.py: print per-stage report (picks vs pool quartiles for baseline_ms + feature, result distribution, pick list with sha / result / ms / feature). Surfaces clustering quality + sample spread at a glance. - cpsat-bench/build_problems.py + z3-bench/build_problems.py: scan raw-data/*.meta.jsonl and emit problems.jsonl (one row per meta entry, sorted by sha + applied_params_hash). Replaces the raw-scan portion of the deleted build_samples.py wrappers. Flags: --filter-decisive, --applied-params-hash, --dry-run, --out. _lib stays solver-agnostic; these scripts handle solver-specific meta.jsonl format. - ADD_NEW_SOLVER.md: quick-start guide for integrating a new solver (cvc5, minizinc, picosat, ...). Covers raw-data layout, problems.jsonl schema, the 4 hand-edited per-bench files (config.yaml + params.json + adapter.py + _solve_worker.py), phase modules, verification steps, option decisions (score_mode / worker axis / buckets / clustering), common pitfalls, time estimates. - .gitignore: stop tracking locally-regenerated artifacts. Drop the dead `shared/*` patterns (refactor removed shared/), add: input/*/evolve/cache/ # _lib regenerates on demand input/*/evolve/final_program.py # _lib.finalize regenerates input/*/evolve/phase*/openevolve_output*/ input/*/evolve/phase*/__pycache__/ Untrack 8 previously-committed cache/stage{1..4}_sample.json files (still on disk, rebuilt by _lib.sampler). - cpsat/z3 config.yaml: minor knob tweaks (user-side feature-path tuning during smoke runs). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Per-bench pipeline generator skill at .claude/skills/openevolve-pipeline/. Drives the ADD_NEW_SOLVER.md contract: interview user, scaffold 4 per-bench files + N phase modules under input/<bench>/evolve/, verify with _lib CLIs (sampler / self_test / rebaseline / run_phase.sh). Templates cover both speedup (z3-style flat overrides) and cost+worker-axis (cpsat-style SIZE_BUCKETS) phase shapes, plus Python-binding and CLI-binary worker variants. References split into interview / decision-guide / verify / gotchas to keep SKILL.md short. Unignore .claude/skills/ so the skill ships with the repo. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

- z3-bench: build_problems_reboot.py, sat/optimize config examples, phase4_unified update, regenerated shared + cache-sat baselines, and a 20260519 backup snapshot of the prior phase layout - _lib: sampler/scorer/evaluator_core/finalize/bench_paths refinements - openevolve-pipeline skill + claude_code llm backend updates - gitignore: ignore input/z3-bench/reboot-raw-data Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

CLAassistant · 2026-06-09T10:20:36Z

Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
_{You have signed the CLA already but the status is still pending? Let us recheck it.}

hdson-Axion and others added 30 commits May 18, 2026 09:59

[feat] initial implement for z3-solver opt pipeline

b1c43c1

[feat] add claude code sdk backend

8b15494

Merge branch 'hdson/z3-initial-tune'

a94cadd

[feat] upgrade optimize loop

626e54c

[feat] enable multi-pricessing

82c3648

[fix] update utils

8eb20fd

update samples

ceb1150

[fix] remove rebaseline timeout

d005d86

change core idx

32b19d3

use core isolation on docker

88f0f68

Merge branch 'main' of github.com:hdson07/openevolve

78a1008

Merge branch 'main' of github.com:hdson07/openevolve

9eaa911

add center pick option

e33438d

update stats ratio

ea29ccc

[fix] update scoring logic

0161fef

[feat] enable cpsat bench

e21d456

fix core allocation error

9f789df

add verify on cpsat

4939185

update regression score logic

da66606

add final bench

89950c8

sample clustering

22d0a99

cpsat update phase

045a561

fix rebase logic

b550561

[feat] add statistics for cp-sat

d484866

update default params

1c8b7cf

tune stage3 sampling

e51f4ab

tune score logic

739fcc8

hdson-Axion and others added 15 commits May 26, 2026 18:25

add optimization target prompts

3a4ebdd

avoid zero sample crash:

8d77730

chage claude max turn

6642f78

add isolate large profile

44aba61

stage 3 tune

c51b489

[docs] add openevolve study

a0bb884

[docs] update slides

4dc2be8

[feat] add small profile mode

35e87cf

[docs] update docs

0555709

[fix] extract final solutions

b9e60ef

hdson07 closed this Jun 9, 2026

hdson07 deleted the z3-bench-reboot branch June 9, 2026 10:21

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Z3 bench reboot#465

Z3 bench reboot#465
hdson07 wants to merge 45 commits into
algorithmicsuperintelligence:mainfrom
hdson07:z3-bench-reboot

hdson07 commented Jun 9, 2026

Uh oh!

CLAassistant commented Jun 9, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

hdson07 commented Jun 9, 2026

Uh oh!

CLAassistant commented Jun 9, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants