This is an experiment to have the LLM do its own quantitative research across multiple parallel strategies, not just iterate on one.
The core bet of v0.2.0: since backtests are cheap and single-strategy iteration
tends to anchor on a single paradigm (see versions/0.1.0/retrospective.md),
maintaining up to 3 strategies simultaneously should resist anchoring and
produce a richer picture of what genuinely works on this asset pair.
To set up a new experiment, work with the user to:
1. Agree on a run tag: propose a tag based on today's date (e.g. `may1`). The branch `autoresearch/<tag>` must not already exist.
2. Create the branch: `git checkout -b autoresearch/<tag>` from current `master`.
3. Read the in-scope files. The repo is small. Read these files for full context:
   - `README.md` — repository context
   - `config.json` — fixed FreqTrade config (pairs, timeframe, fees). Do not modify.
   - `prepare.py` — data download. Do not modify.
   - `run.py` — the batch backtest oracle. Do not modify.
   - `user_data/strategies/_template.py.example` — skeleton for new strategies. Note: the folder may also contain `__pycache__`; ignore it.
   - `versions/0.1.0/retrospective.md` — optional but valuable context on what the previous run discovered and what went wrong with single-file iteration.
4. Verify data exists: check that `user_data/data/BTC_USDT-1h.feather` and `user_data/data/ETH_USDT-1h.feather` exist. If not, tell the user to run `uv run prepare.py`.
5. Initialize `results.tsv`: create `results.tsv` with just the header row, tab-separated. Do not commit this file — it's gitignored on purpose.

   ```
   commit	event	strategy_name	sharpe	max_dd	note
   ```

6. Create 1-3 starting strategies. This is the most important setup step.
   - Each strategy goes in its own file: `user_data/strategies/<YourName>.py`
   - Class name MUST match filename stem (FreqTrade requirement)
   - Each strategy's docstring MUST fill all 5 metadata fields (Paradigm, Hypothesis, Parent, Created, Status)
   - Each strategy MUST target a different paradigm. Don't create 3 mean-reversion variants as a "safe start" — that defeats the whole point of v0.2.0. Pick from: mean-reversion, trend-following, volatility, breakout, other. At least 2 different categories.
   - Keep each strategy minimal initially. You'll iterate in the loop.
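The 5-field docstring requirement is easy to check mechanically before committing. A minimal sketch — the helper name and regex approach are mine, only the field names come from the requirement above:

```python
import re

REQUIRED_FIELDS = ("Paradigm", "Hypothesis", "Parent", "Created", "Status")

def missing_metadata(docstring: str) -> list[str]:
    """Return the metadata fields absent from a strategy docstring."""
    return [f for f in REQUIRED_FIELDS
            if not re.search(rf"^\s*{f}:", docstring, re.MULTILINE)]

doc = """
Paradigm: mean-reversion
Hypothesis: RSI oversold bounces on 1h BTC/ETH
Parent: none
Created: 2025-05-01
Status: active
"""
assert missing_metadata(doc) == []
```

Running this against each new strategy file's module docstring catches a forgotten field before the first backtest round.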
7. Confirm and go: confirm setup looks good with the user. Once you get confirmation, kick off the experimentation.
Each round runs a backtest on ALL active strategies on a fixed timerange (20230101-20251231) over BTC/USDT and ETH/USDT at 1h. `run.py` emits one `---` summary block per strategy.
You may:

- Modify any file under `user_data/strategies/` (that isn't prefixed `_`)
- Create a new strategy file
- Delete a strategy file (via `git rm`)
- Copy an existing strategy to create a variant (fork)

You may not:

- Modify `prepare.py`, `run.py`, or `config.json`. These are the evaluation contract.
- `uv add` new dependencies. Use what's already in `pyproject.toml`.
- Call the `freqtrade` CLI directly. The only way to run backtests is via `uv run run.py`.
- Modify the timerange, pair list, or `_template.py.example`.
- Have more than 3 active strategies at any time (see hard cap below).
Rule 1: Hard cap — 3 active strategies.
At any moment, `user_data/strategies/` must contain at most 3 non-underscore `.py` files. To add a 4th, you must first `git rm` one of the existing three.
Rule 2: Stagnation gate — 3 stable rounds.
Each round, every strategy gets one of these events logged in `results.tsv`:

- `create` — you added it this round
- `evolve` — you modified it this round
- `stable` — it existed, got measured, but you didn't touch it
- `fork` — you copied it to create a derivative (logged on the child, with `parent→child` in the strategy_name field)
- `kill` — you removed it this round
If a strategy has accumulated 3 consecutive `stable` events with no `evolve` or `fork`, the next round it MUST receive one of: `evolve`, `fork`, or `kill`. It cannot sit idle for a 4th stable round. You decide which treatment. This rule exists because the cap is 3 — we can't afford a slot sitting still.
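The stagnation counter can be recomputed from `results.tsv` alone by counting trailing consecutive `stable` rows per strategy. A sketch under the 6-column format used in this repo (function name is mine; fork rows use `parent→child` names, which this sketch treats as a fresh name rather than splitting on the arrow):

```python
import csv
import io

def stable_streaks(tsv_text: str) -> dict[str, int]:
    """Trailing consecutive `stable` events per strategy, from results.tsv text."""
    streaks: dict[str, int] = {}
    rows = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    next(rows)  # skip the header row
    for _commit, event, name, *_ in rows:
        if event == "kill":
            streaks.pop(name, None)  # slot freed; nothing left to track
        elif event == "stable":
            streaks[name] = streaks.get(name, 0) + 1
        else:  # create / evolve / fork reset the counter
            streaks[name] = 0
    return streaks
```

Any strategy whose streak reaches 3 is the one Rule 2 forces you to evolve, fork, or kill next round.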
Rule 3: Every round must touch at least one strategy.
A round where all events are stable is not an experiment — it's wasted time.
At minimum, evolve one strategy per round. (Exception: the very first backtest
round right after setup, where you log create events for what you built.)
Rule 4: Paradigm diversity at setup. See setup step 6 above. First 1-3 strategies must target different paradigms. After that, you're free to create same-paradigm variants (e.g. two mean-reversion approaches with different signals) — but sparingly. Diversity is more valuable than depth in this run.
Once `run.py` finishes, stdout has one `---` block per strategy, like:

```
---
strategy: MeanRevRSI
commit: abc1234
timerange: 20230101-20251231
sharpe: 0.8234
sortino: 1.0412
calmar: 0.5821
total_profit_pct: 45.3210
max_drawdown_pct: -8.9123
trade_count: 142
win_rate_pct: 54.2300
profit_factor: 1.6700
pairs: BTC/USDT,ETH/USDT
---
strategy: TrendDonchian
commit: abc1234
...
```
If a strategy crashes, its block looks like:
```
---
strategy: SomeBrokenStrategy
commit: abc1234
status: ERROR
error_type: NameError
error_msg: name 'foo' is not defined
traceback:
...
```
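If grep/awk gets unwieldy, the blocks can also be parsed in Python. A sketch (the function name is mine; the field set comes from the examples above — note the multi-line `traceback` of a crashed block would need extra handling, since this only reads `key: value` lines):

```python
def parse_blocks(log_text: str) -> list[dict]:
    """Split run.log into one dict per strategy, on `---` separators."""
    blocks: list[dict] = []
    current: dict = {}
    for line in log_text.splitlines():
        if line.strip() == "---":
            if current:
                blocks.append(current)
            current = {}
        elif ":" in line:
            key, _, value = line.partition(":")
            current[key.strip()] = value.strip()
    if current:  # last block has no trailing separator
        blocks.append(current)
    return blocks
```

This makes the crash check trivial: any block with `"status" == "ERROR"` must be fixed or killed this round.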
Extract all strategies' metrics at once:

```
grep "^---\|^strategy:\|^sharpe:\|^trade_count:\|^max_drawdown_pct:" run.log
```

Full per-strategy block:

```
awk '/^---$/,/^$/' run.log
```

After each round, append one row to `results.tsv` per strategy touched. Tab-separated, 6 columns:

```
commit	event	strategy_name	sharpe	max_dd	note
```
Rules:
- `commit` is the short git hash of the round's commit
- `event` is one of `create | evolve | stable | fork | kill`
- For `fork`, `strategy_name` uses `parent→child` format (e.g. `MeanRevRSI→MRVolGate`)
- For `kill`, leave `sharpe` and `max_dd` as `-` (dash). The strategy is gone.
- `note` is your reasoning in free text. This is load-bearing — when you later decide keep vs kill, you re-read these notes. Be specific:
  - Bad: "tried MACD, didn't work"
  - Good: "replaced RSI entry with MACD cross-up. wr 68→51, sharpe 0.82→0.31. MACD crossovers on 1h crypto trigger inside ongoing drops, catching knives. Discarding paradigm."
- Every strategy that exists this round gets a row, even if `stable`. This is how the stagnation counter stays visible.
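Appending a row is a one-liner; a sketch (the helper name is mine, the column order is the one defined above):

```python
def log_round(path: str, commit: str, event: str, name: str,
              sharpe, max_dd, note: str) -> None:
    """Append one tab-separated row to results.tsv for a strategy touched this round."""
    with open(path, "a") as f:
        f.write(f"{commit}\t{event}\t{name}\t{sharpe}\t{max_dd}\t{note}\n")

# Example kill row: metrics become dashes, the note carries the reasoning.
log_round("results.tsv", "abc1234", "kill", "MeanRevRSI", "-", "-",
          "paradigm overlaps TrendDonchian and underperforms it; freeing the slot")
```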
Do NOT commit `results.tsv`. It is gitignored on purpose — the log survives `git reset --hard`, which is essential so you don't forget what you've already tried.
The experiment runs on the dedicated branch (e.g. autoresearch/may1).
LOOP FOREVER:
1. Look at state: read `results.tsv` (tail ~30 rows), note the current active strategies and their stagnation counters (how many consecutive `stable` events each has).
2. Decide this round's action. Your toolkit per round:
   - `evolve <strategy>`: modify an existing strategy file
   - `create <name>`: add a new strategy (if cap has room)
   - `fork <parent>→<child>`: cp a strategy file to a new name, then modify the child
   - `kill <strategy>`: `git rm` the file
   - You can combine: e.g. "kill A and create B in the same commit" (make room for something new), or "fork A→A' and evolve B" (two strategies touched)
3. Respect the rules. In particular:
   - Cap: max 3 active strategies after this round's changes
   - Stagnation: any strategy with 3 prior consecutive `stable` events must be evolved, forked, or killed THIS round
   - Every round touches ≥ 1 strategy
4. Make the code changes. Write/modify files under `user_data/strategies/`.
5. `git commit -am "<short summary of this round>"`
6. Run the backtest: `uv run run.py > run.log 2>&1`
7. Read the summary: `awk '/^---$/,/^$/' run.log` (shows all blocks) or `grep "^---\|^strategy:\|^sharpe:\|^trade_count:" run.log` (compact).
8. Check for crashes: a strategy with `status: ERROR` needs to be fixed (if the error is trivial — syntax, typo) OR killed (if the hypothesis is broken). Don't leave ERROR strategies around.
9. Log to `results.tsv`: one row per strategy that existed this round. Fill in the event, metrics (or `-` for kills), and your reasoning note.
10. Decide keep vs rollback.
    - Common case: per-strategy decisions happen inline (you either evolved to something better, or the change was bad and you git-reset only that strategy's commit). The whole round doesn't have one "keep/discard" decision — individual strategies do.
    - If the whole round was a mistake (broke everything, wrong direction), `git reset --hard HEAD~1` to undo all changes.
    - If some strategies improved and others didn't: keep the commit, log `stable` for the unchanged ones, log `evolve` or `kill` etc. for the changed ones.
11. Loop.
A strategy deserves to stay if:
- It has positive edge (Sharpe > 0, profit factor > 1) with meaningful trade count (>20 trades over 3 years)
- Its paradigm is distinct from what the other active strategies cover
- Recent evolutions have moved it in the right direction
A strategy deserves to die if:
- It's been stable for 3 rounds with no improvement and you don't have a new idea for it
- Its paradigm overlaps strongly with another active strategy that's doing better
- You need its slot for a fresh paradigm you want to try
- It's consistently negative and further tweaks won't help (e.g. wrong timeframe for this paradigm)
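The quantitative half of the keep test can be encoded directly; the paradigm-diversity and "do I have a new idea" halves stay judgment calls. A sketch using the thresholds from the list above (the function name is mine):

```python
def has_edge(sharpe: float, profit_factor: float, trade_count: int) -> bool:
    """Minimum quantitative bar for keeping a strategy's slot."""
    return sharpe > 0 and profit_factor > 1 and trade_count > 20
```

A strategy failing this bar isn't automatically killed, but it needs one of the qualitative reasons above to survive the round.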
Always log your reasoning. These notes become the v0.2.0 retrospective — future you (and the meta-analysis layer) will read them to extract what this run actually learned about BTC/ETH 1h.
From v0.1.0 we learned the agent can inadvertently game the metric:
- `exit_profit_only=True` → 100% win rate by never realizing losses (regime-dependent, breaks in bear markets)
- Tight `minimal_roi` clipping → tiny uniform returns → low stddev → huge Sharpe (profit goes DOWN even as Sharpe goes UP)
If you find a Sharpe jump that comes with a profit drop or a DD collapse to ~0, that's a gaming signal, not real edge. Log it, document the mechanism, then either kill the strategy or explicitly note "this is an oracle artifact, not edge" in the description.
Multi-strategy helps here: if strategy A's Sharpe jumps while B and C stay flat on the same data, A's jump is more likely a real discovery. If ALL three strategies' Sharpe jumped on the same commit — you probably modified something shared, or the oracle itself has a hole.
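The gaming signal can be checked mechanically between consecutive rounds of the same strategy. A sketch (the function name and the 1.5×/0.5 thresholds are my assumptions, not calibrated values; the metric keys match the summary-block fields):

```python
def looks_like_gaming(prev: dict, curr: dict) -> bool:
    """Flag a Sharpe jump that coincides with shrinking profit or vanishing drawdown."""
    sharpe_jump = curr["sharpe"] > prev["sharpe"] * 1.5
    profit_drop = curr["total_profit_pct"] < prev["total_profit_pct"]
    dd_collapse = abs(curr["max_drawdown_pct"]) < 0.5  # drawdown ~0 is suspicious
    return sharpe_jump and (profit_drop or dd_collapse)
```

A flagged round isn't proof of gaming, but it is the exact pattern worth a note in `results.tsv` before trusting the improvement.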
Timeout: each full round (3 strategies × backtest) should take under 3 minutes. If a single run exceeds 10 minutes, kill it and treat it as a failure (revert the commit, skip the round).
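The 10-minute hard kill maps directly onto `subprocess.run`'s `timeout` parameter. A sketch (the `uv run run.py` command is the repo's real entry point; the wrapper itself is mine):

```python
import subprocess

def run_backtest(cmd=("uv", "run", "run.py"), log_path="run.log",
                 timeout_s=600) -> bool:
    """Run the oracle; treat a run past the timeout, or a crash, as a failed round."""
    try:
        with open(log_path, "w") as log:
            subprocess.run(list(cmd), stdout=log, stderr=subprocess.STDOUT,
                           timeout=timeout_s, check=True)
        return True
    except (subprocess.TimeoutExpired, subprocess.CalledProcessError):
        return False  # caller reverts the commit and skips the round
```

`subprocess.run` kills the child process itself on timeout; on `False` you still follow the rule above: revert the commit and move on.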
Once the experiment loop has begun (after initial setup), do NOT pause to ask the human if you should continue. Do NOT ask "should I keep going?" or "is this a good stopping point?". The human may be asleep, or away from the computer, and expects you to continue working indefinitely until manually stopped.
If you run out of ideas:
- Re-read `versions/0.1.0/retrospective.md` — it lists directions v0.1.0 never tried (multi-timeframe, ATR dynamic exits, volatility regime, per-pair customization, time-of-day effects)
- Look at your stagnant strategies — can you fork them with a bolder change?
- Try combining winners from different paradigms (e.g. a volatility-gated version of a winning mean-reversion strategy)
- Try completely new indicator families you haven't touched
The loop runs until the human interrupts you, period.
As an example use case, a user might leave you running while they sleep. Each round of 3-strategy backtests takes ~2-3 minutes, so you can run several dozen per hour. The user then wakes up to a rich multi-strategy research trace ready for meta-analysis.