Feature: Agent-Level Timeouts #82

@jrob5756

Summary

Add per-agent timeout_seconds configuration so individual slow agents don't block entire workflows.

Motivation

Conductor currently supports only a workflow-level timeout_seconds. As agents become more autonomous and model response times vary widely, a single slow agent can consume the entire workflow's time budget. Research on "context rot" also suggests that agent performance degrades after roughly 60% context fill, so a timeout limits the tokens wasted on an agent that is already degraded.

The Conductor docs already identify this as a gap.

Proposed Design

agents:
  - name: slow_researcher
    model: claude-opus-4.5
    timeout_seconds: 120          # agent-specific hard limit

  - name: fast_classifier
    model: gpt-5.2-mini
    timeout_seconds: 15           # tight limit for simple tasks
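
A minimal sketch of how the new field might be validated when the config is loaded. The `AgentConfig` class and its validation rules are hypothetical; the issue does not say how Conductor defines its schema, so this only illustrates the intended shape (an optional, positive number of seconds per agent).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AgentConfig:
    """Hypothetical per-agent config; not Conductor's actual schema class."""

    name: str
    model: str
    # None means "no agent-level limit" (assumption: the field is optional).
    timeout_seconds: Optional[float] = None

    def __post_init__(self) -> None:
        t = self.timeout_seconds
        if t is None:
            return
        # bool is a subclass of int, so reject it explicitly.
        if isinstance(t, bool) or not isinstance(t, (int, float)) or t <= 0:
            raise ValueError(
                f"agent {self.name!r}: timeout_seconds must be a positive number, got {t!r}"
            )


slow = AgentConfig(name="slow_researcher", model="claude-opus-4.5", timeout_seconds=120)
fast = AgentConfig(name="fast_classifier", model="gpt-5.2-mini", timeout_seconds=15)
```

Under this assumption, omitting timeout_seconds leaves the agent governed only by the workflow-level limit.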

Behavior

  • Agent execution is cancelled after timeout_seconds of wall-clock time.
  • A TimeoutError is raised and handled by the existing error semantics (fail_fast, continue_on_error).
  • The workflow-level timeout still applies as an overall cap.
  • An agent_timeout event is emitted with the agent name, elapsed time, and configured limit.
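
The behavior listed above could be sketched roughly as follows. This is an illustration under stated assumptions, not Conductor's code: `run_agent_with_timeout`, the `emit` callback, and the event payload keys are hypothetical names. Only `asyncio.wait_for` is a real standard-library call.

```python
import asyncio
import time
from typing import Any, Awaitable, Callable, Optional

# Hypothetical event sink; Conductor's real event API may look different.
EventSink = Callable[[str, dict], None]


async def run_agent_with_timeout(
    name: str,
    execute: Callable[[], Awaitable[Any]],
    timeout_seconds: Optional[float],
    emit: EventSink,
) -> Any:
    """Run one agent, cancelling it after timeout_seconds of wall-clock time."""
    started = time.monotonic()
    try:
        # wait_for cancels the awaited coroutine when the timeout expires.
        # timeout=None means no agent-level limit.
        return await asyncio.wait_for(execute(), timeout=timeout_seconds)
    except asyncio.TimeoutError:
        emit(
            "agent_timeout",
            {
                "agent": name,
                "elapsed_seconds": time.monotonic() - started,
                "timeout_seconds": timeout_seconds,
            },
        )
        # Surface a plain TimeoutError so the caller's existing error
        # handling (fail_fast, continue_on_error) can deal with it.
        raise TimeoutError(f"agent {name!r} exceeded {timeout_seconds}s") from None
```

Using time.monotonic() for the elapsed time avoids jumps from system clock adjustments.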

Interaction with Other Features

| Feature | Interaction |
| --- | --- |
| Retry policies (#80) | retry_on: [timeout] enables automatic retry on timeout |
| Fallback models (#84) | Timed-out agent retries with fallback_model for faster response |
| Parallel execution | Timed-out agent in parallel group follows the group's failure mode |
| Workflow timeout | Agent timeout cannot exceed remaining workflow timeout |
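
One way to read the workflow-timeout rule is that the agent's limit is clamped to whatever time the workflow has left. The sketch below assumes that reading; the issue does not say whether an oversized agent timeout is clamped at run time or rejected at validation time. `effective_timeout` and the monotonic `workflow_deadline` representation are hypothetical.

```python
import time
from typing import Optional


def effective_timeout(
    agent_timeout: Optional[float],
    workflow_deadline: Optional[float],
    now: Optional[float] = None,
) -> Optional[float]:
    """Return the timeout to apply to one agent run, or None for no limit.

    workflow_deadline is a time.monotonic() timestamp (assumed representation).
    """
    if workflow_deadline is None:
        return agent_timeout
    current = time.monotonic() if now is None else now
    # Never return a negative timeout once the workflow deadline has passed.
    remaining = max(0.0, workflow_deadline - current)
    if agent_timeout is None:
        return remaining
    return min(agent_timeout, remaining)


# 60s of workflow budget left: a 120s agent limit is clamped to 60s,
# while a 15s limit is unaffected.
print(effective_timeout(120, workflow_deadline=100.0, now=40.0))  # 60.0
print(effective_timeout(15, workflow_deadline=100.0, now=40.0))   # 15
```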

Example: Timeout + Retry + Fallback

agents:
  - name: researcher
    model: claude-opus-4.5
    timeout_seconds: 120
    fallback_model: claude-haiku-4.5    # faster model on retry
    retry:
      max_attempts: 2
      retry_on:
        - timeout

If claude-opus-4.5 times out at 120s, the agent retries with claude-haiku-4.5 (a much faster model) and can still complete within the workflow budget.
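
The timeout, retry, and fallback flow in this example might look like the sketch below. It assumes that max_attempts counts the first attempt (so 2 means one retry) and that the fallback model is used for every retry after a timeout. Neither detail is specified in this issue, and `run_with_retry` and `call_model` are hypothetical names.

```python
import asyncio
from typing import Any, Awaitable, Callable, Optional


async def run_with_retry(
    call_model: Callable[[str], Awaitable[Any]],
    model: str,
    timeout_seconds: float,
    max_attempts: int = 1,
    fallback_model: Optional[str] = None,
    retry_on_timeout: bool = False,
) -> Any:
    """Call the model with a per-attempt timeout, retrying on timeout if allowed."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    current_model = model
    for attempt in range(1, max_attempts + 1):
        try:
            return await asyncio.wait_for(
                call_model(current_model), timeout=timeout_seconds
            )
        except asyncio.TimeoutError:
            # Give up if timeouts are not retryable or attempts are exhausted.
            if not retry_on_timeout or attempt == max_attempts:
                raise
            # Assumption: retries switch to the faster fallback model if set.
            if fallback_model is not None:
                current_model = fallback_model
```

Each attempt here gets the full timeout_seconds. A real implementation would also need to respect the remaining workflow budget across attempts.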

Why It Fits Conductor

  • Trivial YAML addition — single field per agent
  • Builds on existing LimitEnforcer infrastructure
  • Implementation: wraps agent execution in asyncio.wait_for() with the configured timeout
  • Combined with retry + fallback_model, creates a complete resilience story

Effort Estimate

Low — wraps agent execution in asyncio.wait_for(), adds one schema field, emits one new event.

Metadata

Labels: enhancement