Skip to content

feat(providers): provider-aware retry with exponential backoff and rate-limit handling #23

@EngineerProjects

Description

@EngineerProjects

Summary

Providers like Anthropic and OpenAI return 429 Too Many Requests with
a retry-after header. The current retry logic does not honour this header
and uses fixed delays. This leads to wasted retries and unnecessary failures
on long multi-agent runs.

Scope

  • Read retry-after / x-ratelimit-reset-requests headers on 429 responses
  • Exponential backoff with jitter for 5xx errors
  • Per-provider retry config: max retries, base delay, max delay
  • Expose config.RetryPolicy in ClientConfig

Proposed config

type RetryPolicy struct {
    MaxRetries    int           // default 3
    BaseDelay     time.Duration // default 1s
    MaxDelay      time.Duration // default 60s
    HonorRetryAfter bool        // default true
}

Acceptance criteria

  • RetryPolicy struct in pkg/sdk
  • retry-after header parsed and respected (sleep until reset time)
  • Exponential backoff with ±10% jitter for 5xx
  • Unit tests: 429 with header → correct sleep duration; 5xx → backoff sequence
  • No retry on 4xx errors other than 429

Metadata

Metadata

Assignees

No one assigned

    Labels

    coreCore runtime / engine layerenhancementNew feature or request

    Projects

    No projects

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions