## Summary
Add per-agent `timeout_seconds` configuration so individual slow agents don't block entire workflows.
## Motivation
Conductor only supports workflow-level `timeout_seconds` today. With agents becoming more autonomous and models varying widely in response time, one slow agent can consume the entire workflow's time budget. Research on "context rot" shows agent performance degrades after ~60% context fill; timeouts prevent wasted tokens on degraded agents.

This is already identified as a gap in the Conductor docs.
## Proposed Design

```yaml
agents:
  - name: slow_researcher
    model: claude-opus-4.5
    timeout_seconds: 120   # agent-specific hard limit
  - name: fast_classifier
    model: gpt-5.2-mini
    timeout_seconds: 15    # tight limit for simple tasks
```
### Behavior
- Agent execution is cancelled after `timeout_seconds` of wall-clock time
- Raises `TimeoutError`, handled by existing error semantics (`fail_fast`, `continue_on_error`)
- Workflow-level timeout still applies as an overall cap
- Emits an `agent_timeout` event with the agent name, elapsed time, and configured limit
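The behavior above can be sketched roughly as follows. This is a minimal illustration, not Conductor's actual implementation: the function name, the `emit_event` callback, and the event payload shape are all assumptions.

```python
import asyncio


async def run_agent_with_timeout(agent_coro, timeout_seconds, emit_event, agent_name):
    """Hypothetical sketch: enforce a per-agent wall-clock limit.

    On timeout, emit an agent_timeout event and re-raise so the
    existing error semantics (fail_fast / continue_on_error) apply.
    """
    loop = asyncio.get_running_loop()
    start = loop.time()
    try:
        return await asyncio.wait_for(agent_coro, timeout=timeout_seconds)
    except asyncio.TimeoutError:
        # Assumed event shape: name plus a payload dict
        emit_event("agent_timeout", {
            "agent": agent_name,
            "elapsed_seconds": round(loop.time() - start, 3),
            "timeout_seconds": timeout_seconds,
        })
        raise TimeoutError(f"agent {agent_name!r} exceeded {timeout_seconds}s")
```

Note that `asyncio.wait_for` cancels the wrapped coroutine on timeout, which matches the "execution cancelled" behavior described above.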
### Interaction with Other Features

| Feature | Interaction |
|---|---|
| Retry policies (#80) | `retry_on: [timeout]` enables automatic retry on timeout |
| Fallback models (#84) | A timed-out agent retries with `fallback_model` for a faster response |
| Parallel execution | A timed-out agent in a parallel group follows the group's failure mode |
| Workflow timeout | An agent timeout cannot exceed the remaining workflow timeout |
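The last row implies a clamping rule: the effective per-agent limit is the smaller of the configured agent timeout and whatever remains of the workflow budget. A hypothetical helper (not part of Conductor's API) makes this concrete:

```python
def effective_timeout(agent_timeout, workflow_remaining_seconds):
    """Hypothetical helper: clamp a per-agent timeout to the time
    remaining in the workflow-level budget.

    agent_timeout may be None, meaning the agent has no limit of
    its own and inherits the workflow's remaining budget.
    """
    if agent_timeout is None:
        return workflow_remaining_seconds
    return min(agent_timeout, workflow_remaining_seconds)
```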
### Example: Timeout + Retry + Fallback

```yaml
agents:
  - name: researcher
    model: claude-opus-4.5
    timeout_seconds: 120
    fallback_model: claude-haiku-4.5   # faster model on retry
    retry:
      max_attempts: 2
      retry_on:
        - timeout
```

If `claude-opus-4.5` times out at 120s → retry with `claude-haiku-4.5` (much faster) → complete within budget.
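The timeout → retry → fallback flow can be sketched as below. This is an illustration under stated assumptions: `run_agent(model)` stands in for whatever coroutine Conductor uses to invoke an agent, and the function itself is not Conductor's API.

```python
import asyncio


async def run_with_retry_and_fallback(run_agent, model, fallback_model,
                                      timeout_seconds, max_attempts=2):
    """Hypothetical sketch: on timeout, retry once on a faster fallback model.

    run_agent(model) is an assumed coroutine function that executes
    the agent against the given model.
    """
    current = model
    for attempt in range(1, max_attempts + 1):
        try:
            return await asyncio.wait_for(run_agent(current), timeout=timeout_seconds)
        except asyncio.TimeoutError:
            if attempt == max_attempts:
                raise  # retries exhausted: surface the timeout
            current = fallback_model  # switch to the faster model for the retry
```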
## Why It Fits Conductor
- Trivial YAML addition: a single field per agent
- Builds on the existing `LimitEnforcer` infrastructure
- Implementation: wraps agent execution in `asyncio.wait_for()` with the configured timeout
- Combined with retry + `fallback_model`, creates a complete resilience story
## Effort Estimate
Low: wraps agent execution in `asyncio.wait_for()`, adds one schema field, emits one new event.