
feat: add resilient background job retry and monitoring system #1024

Open
zxzok wants to merge 1 commit into rohitdash08:main from zxzok:feat/job-retry-monitoring


zxzok commented May 12, 2026

Summary

Implements a resilient background job execution system with automatic retry, exponential backoff, and a monitoring dashboard.

Changes

Backend

  • New JobExecution model — tracks job lifecycle (pending → running → completed/failed/retrying)
  • New services/jobs.py — job enqueue, execute with retry (exponential backoff: 30s → 120s → 480s), process pending jobs, stats aggregation
  • New routes/jobs.py — 4 endpoints: GET /jobs/stats, GET /jobs/recent, POST /jobs/retry/<id>, POST /jobs/process
  • Updated services/reminders.py — wraps send_email/send_whatsapp with job system, backward compatible
  • 17 tests covering retry logic, backoff timing, max attempts, stats, and all endpoints
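The quoted backoff schedule (30s → 120s → 480s) is consistent with a base delay of 30 seconds multiplied by 4 on each successive failure. The sketch below reproduces that schedule. The names `BASE_DELAY_SECONDS`, `BACKOFF_FACTOR`, `backoff_delay`, and `compute_next_retry_at` are hypothetical and not taken from `services/jobs.py`, and how the schedule maps onto attempt numbers is an assumption.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Assumed constants: 30 -> 120 -> 480 implies a base of 30s and a factor of 4.
BASE_DELAY_SECONDS = 30
BACKOFF_FACTOR = 4


def backoff_delay(attempt: int) -> int:
    """Seconds to wait after failed attempt number `attempt` (1-based)."""
    if attempt < 1:
        raise ValueError("attempt must be >= 1")
    return BASE_DELAY_SECONDS * BACKOFF_FACTOR ** (attempt - 1)


def compute_next_retry_at(attempt: int, now: Optional[datetime] = None) -> datetime:
    """Timestamp at which a job that just failed `attempt` becomes eligible again."""
    now = now or datetime.now(timezone.utc)
    return now + timedelta(seconds=backoff_delay(attempt))
```

Under these assumptions, `[backoff_delay(a) for a in (1, 2, 3)]` gives `[30, 120, 480]`, matching the schedule in the description.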

Frontend

  • New api/jobs.ts — API client module
  • New JobMonitor.tsx — monitoring dashboard with stats cards, status filters, recent jobs table, retry button
  • Route added at /jobs

How It Works

  1. enqueue_job("reminder_email", payload) creates a pending job record
  2. execute_job(job_id) runs it via registered handlers
  3. On failure: status → "retrying", next_retry_at set with exponential backoff
  4. After max_attempts (default 3): status → "failed"
  5. process_pending_jobs() picks up retryable jobs (call it from an external cron job or via POST /jobs/process)
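The steps above can be sketched with an in-memory stand-in for the `JobExecution` table. This is an illustration, not the code in this PR: the function names and statuses come from the description, while the `Job` dataclass, the `HANDLERS` and `JOBS` dictionaries, the field names other than `next_retry_at` and `max_attempts`, and the `now` parameter are assumptions added for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Any, Callable, Dict, List, Optional

HANDLERS: Dict[str, Callable[[dict], Any]] = {}  # hypothetical handler registry
JOBS: Dict[int, "Job"] = {}                      # in-memory stand-in for the DB table


@dataclass
class Job:
    id: int
    job_type: str
    payload: dict
    status: str = "pending"
    attempts: int = 0
    max_attempts: int = 3
    next_retry_at: Optional[datetime] = None
    last_error: Optional[str] = None


def enqueue_job(job_type: str, payload: dict, max_attempts: int = 3) -> Job:
    """Step 1: create a pending job record."""
    job = Job(id=len(JOBS) + 1, job_type=job_type, payload=payload,
              max_attempts=max_attempts)
    JOBS[job.id] = job
    return job


def execute_job(job_id: int, now: Optional[datetime] = None) -> Job:
    """Steps 2-4: run the handler, then mark completed, retrying, or failed."""
    now = now or datetime.now(timezone.utc)
    job = JOBS[job_id]
    job.status = "running"
    job.attempts += 1
    try:
        # A missing handler raises KeyError and is treated like any other failure.
        HANDLERS[job.job_type](job.payload)
    except Exception as exc:
        job.last_error = str(exc)
        if job.attempts >= job.max_attempts:
            job.status, job.next_retry_at = "failed", None
        else:
            delay = 30 * 4 ** (job.attempts - 1)  # 30s, 120s, 480s, ...
            job.status = "retrying"
            job.next_retry_at = now + timedelta(seconds=delay)
    else:
        job.status, job.next_retry_at = "completed", None
    return job


def process_pending_jobs(now: Optional[datetime] = None) -> List[Job]:
    """Step 5: execute pending jobs and retrying jobs whose retry time has passed."""
    now = now or datetime.now(timezone.utc)
    due = [
        job for job in list(JOBS.values())
        if job.status == "pending"
        or (job.status == "retrying"
            and job.next_retry_at is not None
            and job.next_retry_at <= now)
    ]
    return [execute_job(job.id, now) for job in due]
```

Passing `now` explicitly keeps the retry timing deterministic in tests. A real implementation would persist each state change and would likely need to guard against two workers picking up the same job.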

Test plan

  • Job creation and execution flow
  • Retry logic with exponential backoff
  • Max attempts exceeded → failed
  • process_pending_jobs finds eligible jobs
  • Stats endpoint returns correct counts
  • Manual retry of failed jobs
  • Reminder integration with mocked send functions
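As an illustration of the last item, a reminder test with a mocked send function might look like the sketch below. `run_with_retry` is a hypothetical stand-in for the job wrapper and does not appear in this PR. It retries immediately rather than scheduling with backoff, so it only shows the mocking pattern.

```python
import unittest
from unittest import mock


def run_with_retry(send, payload, max_attempts=3):
    """Hypothetical stand-in for the job wrapper: call `send` until it succeeds."""
    for attempt in range(1, max_attempts + 1):
        try:
            send(payload)
            return {"status": "completed", "attempts": attempt}
        except Exception:
            continue  # a real job system would schedule the retry with backoff
    return {"status": "failed", "attempts": max_attempts}


class ReminderRetryTest(unittest.TestCase):
    def test_succeeds_after_transient_failure(self):
        # First call raises, second call returns None (success).
        send_email = mock.Mock(side_effect=[RuntimeError("smtp down"), None])
        result = run_with_retry(send_email, {"to": "a@example.com"})
        self.assertEqual(result, {"status": "completed", "attempts": 2})
        self.assertEqual(send_email.call_count, 2)

    def test_fails_after_max_attempts(self):
        # Every call raises, so the job ends up failed after three attempts.
        send_email = mock.Mock(side_effect=RuntimeError("smtp down"))
        result = run_with_retry(send_email, {"to": "a@example.com"})
        self.assertEqual(result, {"status": "failed", "attempts": 3})
        self.assertEqual(send_email.call_count, 3)
```

The test case can be run with `python -m unittest` from the directory containing the file.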

zxzok requested a review from rohitdash08 as a code owner on May 12, 2026 22:47
