An Empirical Study of Cost, Redundancy, and Phase Dynamics
Multi-step interaction with large language models is widely used in:
- iterative refinement
- agent loops
- retry chains
- structured prompting workflows
A common assumption is that additional steps improve output quality.
This work examines that assumption at the execution level, focusing on how marginal contribution evolves across steps.
Across models, tasks, and prompt variations, we observe a consistent pattern:
- early steps account for the majority of measured information gain
- marginal contribution declines rapidly with continued execution
- redundancy increases across steps
- cost grows monotonically while measured information gain declines
Each step can remain locally valid while the marginal contribution of the trajectory as a whole diminishes, and execution itself provides no intrinsic signal for when continuation stops being productive.
This is an empirical study of execution behavior.
Instead of evaluating only final outputs, we analyze execution step-by-step using observable signals:
- marginal output gain
- incremental cost (tokens)
- redundancy between steps
We define an efficiency signal representing marginal information gain per unit cost.
In this work, information gain (also called useful output) is approximated by redundancy-adjusted output. It should be read as a proxy for marginal textual contribution, not as task-level utility.
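The signals above can be made concrete with a small sketch. This is not the paper's implementation. Whitespace tokenization, the token-overlap definition of redundancy, and the names `StepSignal` and `step_signals` are assumptions made for illustration, and the per-step cost is supplied by the caller.

```python
from dataclasses import dataclass


@dataclass
class StepSignal:
    redundancy: float  # share of this step's tokens already seen in earlier steps
    gain: float        # redundancy-adjusted output, in tokens (proxy for information gain)
    cost: int          # incremental cost of this step, in tokens
    efficiency: float  # gain per unit cost


def step_signals(outputs: list[str], costs: list[int]) -> list[StepSignal]:
    """Compute per-step proxy signals for a linear execution trace.

    Assumptions (hypothetical, not taken from the paper):
    - tokens are lowercased whitespace-separated words
    - redundancy is measured against all earlier steps, not within a step
    - costs[i] is the incremental token cost of step i, chosen by the caller
    """
    if len(outputs) != len(costs):
        raise ValueError("outputs and costs must have the same length")

    seen: set[str] = set()
    signals: list[StepSignal] = []
    for text, cost in zip(outputs, costs):
        tokens = text.lower().split()
        new_tokens = sum(1 for t in tokens if t not in seen)
        redundancy = 1.0 - new_tokens / len(tokens) if tokens else 0.0
        gain = float(new_tokens)
        efficiency = gain / cost if cost > 0 else 0.0
        signals.append(StepSignal(redundancy, gain, cost, efficiency))
        seen.update(tokens)  # later steps are compared against everything so far
    return signals
```

Because the gain is a token-overlap proxy, a step that rephrases earlier content in new words would still register as new output. A stricter redundancy measure (for example n-gram or embedding similarity) could be swapped in without changing the structure.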
- Initial steps account for the majority of measured information gain
- Later steps contribute progressively less new information
- Cost increases monotonically, while measured information gain grows sub-linearly
- Later steps increasingly reuse prior content
- Execution is better described as a distribution over phases rather than fixed transitions
In the evaluated settings, stopping at an intermediate step (step 2 or 3):
- retains ~50–84% of measured information gain
- reduces cost by ~50–70%
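The trade-off behind these two figures can be computed from per-step gains and costs. The sketch below assumes both are available as lists. The function name `early_stop_tradeoff` and the numbers in the usage note are hypothetical and are not results from the paper.

```python
def early_stop_tradeoff(
    gains: list[float], costs: list[float], stop_step: int
) -> tuple[float, float]:
    """Return (retained_gain_fraction, cost_reduction_fraction).

    Both fractions are relative to running every step in the trace.
    `stop_step` is 1-indexed: stop_step=3 keeps steps 1, 2 and 3.
    """
    if len(gains) != len(costs):
        raise ValueError("gains and costs must have the same length")
    if not 1 <= stop_step <= len(gains):
        raise ValueError("stop_step must be between 1 and the number of steps")

    total_gain = sum(gains)
    total_cost = sum(costs)
    retained = sum(gains[:stop_step]) / total_gain if total_gain else 0.0
    reduction = 1.0 - sum(costs[:stop_step]) / total_cost if total_cost else 0.0
    return retained, reduction
```

For an invented six-step trace with gains `[40, 25, 15, 10, 6, 4]` and a flat cost of 100 tokens per step, `early_stop_tradeoff(gains, costs, 3)` retains 80% of the measured gain while cutting cost by 50%.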
Multi-step execution is widely assumed to improve outputs.
This work shows that beyond early convergence, continued execution often produces diminishing marginal contribution without any intrinsic signal indicating when to stop.
As a result:
- systems keep executing without any observable signal of marginal contribution
- cost accumulates without proportional gain
- redundancy becomes dominant in later steps
This exposes a structural gap: continuation decisions are not conditioned on observable execution state.
This study focuses on:
- linear, iterative execution
- single-task continuation across steps
- execution behavior beyond early convergence under continued iteration
It does not evaluate:
- task correctness or factual accuracy
- tool use or multi-stage pipelines
- agent planning or branching workflows
This work focuses on execution behavior, not model capability.
It identifies a structural limitation in current systems:
- continuation decisions are not conditioned on execution state
- cost constraints alone do not ensure productive execution
The results suggest that effective execution requires:
- step-level evaluation of marginal contribution
- trajectory-aware monitoring of execution behavior
- state-aware continuation decisions based on observed signals
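As a rough illustration of a state-aware continuation decision, the sketch below stops once the efficiency signal has stayed below a threshold for several consecutive steps. The rule, the function name `should_continue`, and the default `threshold` and `patience` values are assumptions for illustration. The paper does not prescribe a specific stopping policy here.

```python
def should_continue(
    efficiencies: list[float], threshold: float = 0.05, patience: int = 2
) -> bool:
    """Decide whether to run another step, given the efficiency observed so far.

    Returns False only when the most recent `patience` steps all fell below
    `threshold`. Both parameters are hypothetical tuning knobs.
    """
    if patience < 1:
        raise ValueError("patience must be at least 1")
    if len(efficiencies) < patience:
        return True  # not enough history to justify stopping
    recent = efficiencies[-patience:]
    return any(e >= threshold for e in recent)
```

Requiring several low-efficiency steps in a row, rather than a single one, avoids stopping on one unproductive step that is followed by a useful one. A step budget would still be needed as a hard upper bound, since cost constraints and state-aware stopping address different failure modes.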
/paper → full paper (PDF)
/figures → plots used in the paper
P., V. (2026). Efficiency Collapse in Multi-Step LLM Execution: An Empirical Study of Cost, Redundancy, and Phase Dynamics. https://doi.org/10.5281/zenodo.19928793
- Metrics are model-agnostic and proxy-based
- Results reflect execution behavior, not outcome quality
- Absolute values vary across models, but patterns are consistent
This work isolates the information-generation component of execution.
It motivates further work on:
- detecting non-progressing execution during runtime
- identifying trajectory-level degradation patterns
- conditioning continuation on observable execution signals
CC-BY-4.0