
Improve indexer resilience for Redis outages, branch renames, and deleted repos #17

Merged
aanogueira merged 1 commit into main from aanogueira/indexer-resilience-redis-branch-cleanup
Apr 17, 2026

@aanogueira

The indexer would hang indefinitely when Redis became unreachable, silently mark repos as empty when their default branch was renamed, and waste retries on repos deleted from the code host.

Redis outage recovery:

  • Worker exits after ~5 min of consecutive Redis failures so that K8s restarts it
  • CleanupOldJobs now removes ghost entries from the running/pending status indexes, the type indexes, and the processing set
  • DequeueForShard cleans up ghost job index entries on pop
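
The exit-on-sustained-failure behaviour can be sketched roughly as below. This is a minimal illustration, not the code from this PR: `failureTracker`, `observe`, `redisFailureLimit`, and `errRedisDown` are hypothetical names, and the sketch assumes the ~5 minutes is measured from the first failure in an unbroken streak, which may differ from how the worker actually counts.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// redisFailureLimit mirrors the "~5 min" from the description (assumed value).
const redisFailureLimit = 5 * time.Minute

// errRedisDown stands in for whatever error the Redis client returns.
var errRedisDown = errors.New("redis: connection refused")

// failureTracker remembers when the current streak of failures began.
// All names in this sketch are hypothetical.
type failureTracker struct {
	limit        time.Duration // how long consecutive failures are tolerated
	firstFailure time.Time     // zero value while the last call succeeded
}

// observe records the result of one Redis call made at time now and reports
// whether the worker should give up and exit.
func (f *failureTracker) observe(err error, now time.Time) bool {
	if err == nil {
		f.firstFailure = time.Time{} // any success resets the streak
		return false
	}
	if f.firstFailure.IsZero() {
		f.firstFailure = now
	}
	return now.Sub(f.firstFailure) >= f.limit
}

func main() {
	tracker := &failureTracker{limit: redisFailureLimit}
	start := time.Now()

	// Simulate one failed dequeue per minute.
	for i := 0; i <= 5; i++ {
		now := start.Add(time.Duration(i) * time.Minute)
		if tracker.observe(errRedisDown, now) {
			// A real worker would exit non-zero here so the pod is restarted.
			fmt.Printf("giving up after %d minutes of failures\n", i)
			return
		}
	}
}
```

Passing the clock in as an argument keeps the sketch deterministic; a single successful call clears the streak, so brief blips never add up to an exit.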

Branch rename detection:

  • processIndexJob auto-detects the actual default branch when the configured branch doesn't exist (prefers main > master > first available)
  • Updates the database so future jobs use the correct branch
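
The fallback order described above (main, then master, then the first available branch) might look something like the following. `resolveDefaultBranch` is a hypothetical helper written for illustration; how processIndexJob actually lists branches and writes the result back to the database is not shown in this description.

```go
package main

import "fmt"

// resolveDefaultBranch returns the branch to index. It keeps the configured
// branch when it still exists; otherwise it falls back to "main", then
// "master", then the first branch in the list. The bool is false when the
// repository has no branches at all.
func resolveDefaultBranch(configured string, available []string) (string, bool) {
	if len(available) == 0 {
		return "", false
	}
	exists := make(map[string]bool, len(available))
	for _, b := range available {
		exists[b] = true
	}
	for _, candidate := range []string{configured, "main", "master"} {
		if candidate != "" && exists[candidate] {
			return candidate, true
		}
	}
	return available[0], true
}

func main() {
	// The repo was configured with "master" but has since been renamed.
	branch, ok := resolveDefaultBranch("master", []string{"develop", "main"})
	fmt.Println(branch, ok) // prints "main true"
}
```

When the returned branch differs from the configured one, the caller would persist it so that later jobs skip the detection step.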

Deleted repo detection:

  • Add PermanentError type to skip retries on unrecoverable failures
  • Detect 404/not-found git errors during clone/fetch
  • Mark deleted repos as excluded immediately instead of retrying 3 times
  • New repo_not_found category in failure metrics
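
A permanent-error type that short-circuits retries could be sketched as follows. Only the name PermanentError comes from this PR; its fields, the helpers `isPermanent`, `classifyGitError`, and `shouldRetry`, and the matched git output strings are assumptions made for illustration. The real detection logic may match different messages.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// errCloneFailed stands in for whatever error the git wrapper returns.
var errCloneFailed = errors.New("git clone: exit status 128")

// PermanentError wraps a failure that retrying cannot fix.
// The shape of this type is assumed, not taken from the PR.
type PermanentError struct{ Err error }

func (e *PermanentError) Error() string { return "permanent: " + e.Err.Error() }
func (e *PermanentError) Unwrap() error { return e.Err }

// isPermanent reports whether err, or anything it wraps, is a PermanentError.
func isPermanent(err error) bool {
	var pe *PermanentError
	return errors.As(err, &pe)
}

// looksLikeRepoNotFound guesses from git's output whether the repository is
// gone. The patterns are illustrative and deliberately narrow.
func looksLikeRepoNotFound(gitOutput string) bool {
	out := strings.ToLower(gitOutput)
	return strings.Contains(out, "repository not found") ||
		strings.Contains(out, "returned error: 404")
}

// classifyGitError upgrades a clone/fetch error to a PermanentError when the
// output indicates the repository no longer exists.
func classifyGitError(err error, gitOutput string) error {
	if err == nil {
		return nil
	}
	if looksLikeRepoNotFound(gitOutput) {
		return &PermanentError{Err: err}
	}
	return err
}

// shouldRetry is false for permanent errors and once attempts are exhausted.
func shouldRetry(err error, attempt, maxAttempts int) bool {
	return err != nil && !isPermanent(err) && attempt < maxAttempts
}

func main() {
	err := classifyGitError(errCloneFailed, "remote: Repository not found.")
	wrapped := fmt.Errorf("index job failed: %w", err)
	fmt.Println(isPermanent(wrapped), shouldRetry(wrapped, 1, 3)) // prints "true false"
}
```

Because the check uses errors.As, the permanent marker survives wrapping with `%w`, so callers further up the stack can still skip the retry and mark the repo as excluded.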

Other:

  • Set fsGroupChangePolicy: OnRootMismatch to speed up pod restarts
  • golines formatting fixes

@aanogueira aanogueira force-pushed the aanogueira/indexer-resilience-redis-branch-cleanup branch 2 times, most recently from 3663d2b to 598c799 Compare April 17, 2026 15:44
…nd deleted repos

Signed-off-by: Andre Nogueira <aanogueira@protonmail.com>
Signed-off-by: Andre Nogueira <andre.nogueira@mollie.com>
@aanogueira aanogueira force-pushed the aanogueira/indexer-resilience-redis-branch-cleanup branch from 598c799 to 767c0fb Compare April 17, 2026 15:52
@aanogueira aanogueira merged commit be18600 into main Apr 17, 2026
11 checks passed
@aanogueira aanogueira deleted the aanogueira/indexer-resilience-redis-branch-cleanup branch April 17, 2026 17:44