
fix: auto-retry database connection after server restart (#340) #342

Open
Razano26 wants to merge 2 commits into usesend:main from Razano26:main


Razano26 commented Jan 17, 2026


Summary by cubic

Automatically retries Prisma read-only database operations after server restarts to recover from connection drops and reduce timeouts. Uses capped exponential backoff with jitter and structured logging, with a 1-minute total retry window.

  • Bug Fixes
    • Retries on common Prisma connection errors (P1001, P1002, P1008, P1017, P2024) and network errors.
    • Time-based retries for up to 1 minute; exponential backoff (start 1s, max 10s) with jitter.
    • Applied via PrismaClient $extends to read-only operations only.
    • Warn logs for each retry and an error log on final failure.

Written for commit 6afe05a. Summary will update on new commits.
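The backoff described in the summary can be sketched as follows. The summary only states the bounds (start 1 s, max 10 s, with jitter), so the jitter formula here ("equal jitter") is an assumption; under it the first delay falls between 0.5 s and 1 s. The constant names and the injectable `random` parameter are hypothetical, not taken from the PR.

```typescript
// Hypothetical constants matching the bounds in the summary (start 1 s, cap 10 s).
const BASE_DELAY_MS = 1_000;
const MAX_DELAY_MS = 10_000;

/**
 * Capped exponential backoff with "equal jitter" (an assumed formula):
 * the delay doubles per attempt up to MAX_DELAY_MS, then half of it is kept
 * fixed and the other half is randomized, so the result lies in
 * [capped / 2, capped]. `attempt` is 0-based; `random` is injectable for tests.
 */
function calculateDelay(attempt: number, random: () => number = Math.random): number {
  const exponential = BASE_DELAY_MS * 2 ** attempt;
  const capped = Math.min(exponential, MAX_DELAY_MS);
  const half = capped / 2;
  return Math.round(half + random() * half);
}

/** Resolve after `ms` milliseconds without blocking the event loop. */
function sleep(ms: number): Promise<void> {
  return new Promise<void>((resolve) => setTimeout(resolve, ms));
}
```

Keeping half of the delay fixed guarantees some minimum spacing between attempts, while the random half spreads out retries from many requests that failed at the same moment (for example, right after the database restarts).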

Summary by CodeRabbit

  • Bug Fixes
    • Database operations now include automatic retry logic with exponential backoff and jitter for transient failures. Retries focus on safe idempotent/read operations and are gated by error classification to avoid retrying non-retryable failures. Retries are limited by a maximum duration, include delay/backoff between attempts, and log each retry attempt and final outcome to improve reliability and observability.


vercel bot commented Jan 17, 2026

@Razano26 is attempting to deploy a commit to the kmkoushik's projects Team on Vercel.

A member of the Team first needs to authorize it.


coderabbitai bot commented Jan 17, 2026

Walkthrough

Adds automatic retry logic to the Prisma client in apps/web/src/server/db.ts. Imports Prisma alongside PrismaClient. Introduces retry configuration constants, a RETRYABLE_ERROR_CODES list, and an isRetryableError classifier for Prisma/init/network errors. Implements calculateDelay (exponential backoff with jitter) and sleep. Extends the Prisma client via $extends to wrap $allOperations, retrying read-only/idempotent operations on retryable errors until MAX_RETRY_DURATION_MS is exceeded, logging attempts and rethrowing after exhaustion.
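The retry loop described in the walkthrough can be sketched as a generic helper. This is an illustration under assumptions, not the PR's code: `retryWithinWindow` and `RetryOptions` are hypothetical names, and the clock and sleep are injectable only so the behavior can be exercised without waiting a real minute. The stopping rule (`elapsed + delay > maxDurationMs`) matches the check quoted later in this thread.

```typescript
interface RetryOptions {
  maxDurationMs: number; // total retry window, e.g. 60_000 for one minute
  isRetryable: (error: unknown) => boolean; // error classifier
  delayFor: (attempt: number) => number; // backoff delay for a 0-based attempt
  log?: (message: string, details: Record<string, unknown>) => void;
  now?: () => number; // injectable clock (defaults to Date.now)
  sleep?: (ms: number) => Promise<void>; // injectable sleep (defaults to setTimeout)
}

/**
 * Hypothetical helper: call `run` until it succeeds, the error is not
 * retryable, or the next backoff would push past `maxDurationMs`.
 */
async function retryWithinWindow<T>(run: () => Promise<T>, opts: RetryOptions): Promise<T> {
  const now = opts.now ?? Date.now;
  const sleep =
    opts.sleep ?? ((ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms)));
  const startTime = now();
  let attempt = 0;
  while (true) {
    try {
      return await run();
    } catch (error) {
      // Permanent failures are rethrown immediately (fail fast).
      if (!opts.isRetryable(error)) throw error;
      const elapsed = now() - startTime;
      const delay = opts.delayFor(attempt);
      // Same stopping rule as the PR: give up if sleeping would cross the window.
      if (elapsed + delay > opts.maxDurationMs) {
        opts.log?.("retries exhausted", { attempts: attempt + 1, elapsed });
        throw error;
      }
      opts.log?.("retrying", { attempt: attempt + 1, delay, elapsed });
      await sleep(delay);
      attempt++;
    }
  }
}
```

Inside a Prisma query extension, the wrapped call would be the `query(args)` invocation that the hook receives; the classifier and backoff are passed in so they can be tested separately.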

🚥 Pre-merge checks | ✅ 2 | ❌ 1
❌ Failed checks (1 warning)
Check name | Status | Explanation | Resolution
Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00%, which is below the required threshold of 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name | Status | Explanation
Description Check | ✅ Passed | Check skipped: CodeRabbit’s high-level summary is enabled.
Title check | ✅ Passed | The title accurately describes the main change: adding automatic retry logic for database connections following server restarts, which is the core purpose of the changeset.



cubic-dev-ai bot left a comment


1 issue found across 1 file

Prompt for AI agents (all issues)

Check if these issues are valid — if so, understand the root cause of each and fix them.


<file name="apps/web/src/server/db.ts">

<violation number="1" location="apps/web/src/server/db.ts:22">
P2: All PrismaClientInitializationError instances are retried, including permanent misconfigurations (invalid credentials/URL), causing pointless retry delays instead of failing fast.</violation>
</file>



coderabbitai bot left a comment


Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@apps/web/src/server/db.ts`:
- Around lines 52-105: The retry wrapper currently retries every Prisma operation
in PrismaClient.$extends -> query.$allOperations (function $allOperations),
which can re-run non-idempotent mutations; update $allOperations to only perform
the retry loop for read-only operations (e.g., findUnique, findUniqueOrThrow,
findFirst, findFirstOrThrow, findMany, count, aggregate, groupBy) by checking
the incoming operation string against that allowlist before entering the
for-loop; for all other operations (creates/updates/deletes) skip retries and
immediately call and return await query(args) (or let errors bubble) so
non-idempotent mutations are not retried. Ensure isRetryableError, MAX_RETRIES,
calculateDelay, sleep and logging remain unchanged for the read-only retry path.
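A minimal sketch of the allowlist gate described above, assuming the hook receives the operation name as a string (as Prisma's `$allOperations` query-extension hook does via its `operation` parameter). `READ_ONLY_OPERATIONS`, `OperationContext` and `runWithReadOnlyRetry` are hypothetical names; the retry strategy is passed in so the gate stays independent of the backoff details.

```typescript
// Allowlist taken from the review comment above; the PR's final list may differ.
// Anything not listed (create, update, upsert, delete, raw queries, ...) runs once.
const READ_ONLY_OPERATIONS: ReadonlySet<string> = new Set([
  "findUnique",
  "findUniqueOrThrow",
  "findFirst",
  "findFirstOrThrow",
  "findMany",
  "count",
  "aggregate",
  "groupBy",
]);

// Shape loosely modeled on the parameters Prisma passes to $allOperations.
interface OperationContext<T> {
  operation: string;
  args: unknown;
  query: (args: unknown) => Promise<T>;
}

async function runWithReadOnlyRetry<T>(
  ctx: OperationContext<T>,
  retry: (run: () => Promise<T>) => Promise<T>,
): Promise<T> {
  if (!READ_ONLY_OPERATIONS.has(ctx.operation)) {
    // Mutations are not idempotent: if the connection drops after the server
    // applied a write, retrying could apply it twice, so errors bubble up.
    return ctx.query(ctx.args);
  }
  return retry(() => ctx.query(ctx.args));
}
```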
🧹 Nitpick comments (1)
apps/web/src/server/db.ts (1)

17-39: Consider tightening the message heuristics.

Broad substrings like "connection" can match non-transient configuration errors (e.g., invalid connection string), leading to unnecessary retries. Narrowing the match or adding a small denylist would reduce false positives.

Signed-off-by: Louis Labeyrie <labeyrielouis@gmail.com>

coderabbitai bot left a comment


Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@apps/web/src/server/db.ts`:
- Around line 29-55: The fallback string matching in isRetryableError is too
broad (matching "connection"/"socket"/"timeout") and can cause false positives;
narrow it to explicit network error signatures instead. Update the Error branch
in isRetryableError to check for specific tokens/phrases such as "ECONNREFUSED"/"econnrefused"/"connection refused",
"econnreset"/"connection reset", "socket hang up", "getaddrinfo
ENOTFOUND"/"enotfound", "ehostunreach"/"host unreachable",
"econnaborted"/"connection aborted" (or an equivalent regex that matches these
exact network error patterns) instead of generic
"connection"/"socket"/"timeout"; keep the existing Prisma checks
(PrismaClientKnownRequestError, PrismaClientInitializationError and
RETRYABLE_ERROR_CODES) unchanged. Ensure the string comparisons are
case-insensitive and avoid matching generic words like "connection" used in
validation messages.
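One way to express the narrowed match is a single case-insensitive regular expression built from the signatures listed above. This is a sketch, not the PR's code: `NETWORK_ERROR_PATTERN` and `isTransientNetworkError` are hypothetical names, and checking the `code` property first is an addition based on Node.js system errors exposing the errno name (for example `ECONNREFUSED`) there.

```typescript
// Explicit network signatures from the review comment; not an exhaustive list.
// Generic words such as "connection", "socket" or "timeout" alone do not match.
const NETWORK_ERROR_PATTERN =
  /\b(?:econnrefused|etimedout|econnreset|enotfound|ehostunreach|econnaborted)\b|connection (?:refused|reset|aborted)|timed out|socket hang up|host unreachable/i;

function isTransientNetworkError(error: unknown): boolean {
  if (!(error instanceof Error)) return false;
  // Node.js system errors carry the errno name in `code`, which is more
  // reliable than parsing the human-readable message when it is present.
  const code = (error as { code?: unknown }).code;
  if (typeof code === "string" && NETWORK_ERROR_PATTERN.test(code)) return true;
  return NETWORK_ERROR_PATTERN.test(error.message);
}
```

With this pattern, messages such as "Invalid connection string provided" or "canceling statement due to statement timeout" are not treated as transient, while "connect ECONNREFUSED 127.0.0.1:5432" and "socket hang up" still are.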

Comment on lines +29 to +55
function isRetryableError(error: unknown): boolean {
  if (error instanceof Prisma.PrismaClientKnownRequestError) {
    return RETRYABLE_ERROR_CODES.has(error.code);
  }

  if (error instanceof Prisma.PrismaClientInitializationError) {
    // Only retry transient connection errors, not permanent misconfigurations
    // (e.g., invalid credentials, wrong database URL, schema errors)
    return (
      error.errorCode !== undefined && RETRYABLE_ERROR_CODES.has(error.errorCode)
    );
  }

  if (error instanceof Error) {
    const message = error.message.toLowerCase();
    return (
      message.includes("econnrefused") ||
      message.includes("etimedout") ||
      message.includes("econnreset") ||
      message.includes("connection") ||
      message.includes("socket") ||
      message.includes("timeout")
    );
  }

  return false;
}

⚠️ Potential issue | 🟡 Minor

Tighten the fallback string match to avoid retrying unrelated errors.

The broad "connection"/"socket" checks can match non-network errors (e.g., validation messages mentioning a connection field), causing unnecessary retries and noisy logs. Consider narrowing to explicit network signatures.

🔧 Suggested refinement
-    return (
-      message.includes("econnrefused") ||
-      message.includes("etimedout") ||
-      message.includes("econnreset") ||
-      message.includes("connection") ||
-      message.includes("socket") ||
-      message.includes("timeout")
-    );
+    return (
+      message.includes("econnrefused") ||
+      message.includes("etimedout") ||
+      message.includes("econnreset") ||
+      message.includes("socket hang up") ||
+      message.includes("timeout")
+    );

rahulkumarsingh73690

Max retries up to 1 minute is the best option.

Change from fixed retry count (5) to time-based retry limit of 1 minute,
giving the database server more time to recover from transient issues
like restarts.

Signed-off-by: Louis Labeyrie <labeyrielouis@gmail.com>
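As a worked example of what the time-based limit means in practice, the schedule can be computed directly. This sketch assumes the delay doubles from 1 s up to the 10 s cap, with zero query latency and no jitter (all simplifications); `retrySchedule` is a hypothetical helper that applies the same `elapsed + delay > window` stopping rule as the PR's loop.

```typescript
// Deterministic retry schedule: no jitter, zero query latency (both assumptions).
function retrySchedule(baseMs: number, capMs: number, windowMs: number): number[] {
  if (baseMs <= 0 || capMs <= 0) {
    throw new RangeError("baseMs and capMs must be positive");
  }
  const delays: number[] = [];
  let elapsed = 0;
  let attempt = 0;
  while (true) {
    const delay = Math.min(baseMs * 2 ** attempt, capMs);
    // Stop if sleeping for this delay would cross the retry window.
    if (elapsed + delay > windowMs) break;
    delays.push(delay);
    elapsed += delay;
    attempt++;
  }
  return delays;
}
```

Under these assumptions, `retrySchedule(1_000, 10_000, 60_000)` yields eight delays (1, 2, 4 and 8 s, then four 10 s waits) totalling 55 s, i.e. nine attempts in all; the next 10 s wait would end at 65 s, past the window. Real runs fit somewhat fewer attempts, because each failed query also takes time.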
Copilot AI review requested due to automatic review settings February 3, 2026 10:31

Copilot AI left a comment


Pull request overview

This PR introduces centralized retry logic for Prisma read-only database operations to improve resilience to transient connectivity issues and server restarts.

Changes:

  • Adds capped exponential backoff with jitter and a 1-minute total retry window for selected retryable Prisma error codes and connection-related errors.
  • Restricts retries to read-only operations via a PrismaClient $extends hook to avoid re-running non-idempotent mutations.
  • Enhances logging with structured warn entries for each retry and an error entry when retry attempts are exhausted.


KMKoushik (Member)

@greptileai


greptile-apps bot commented Feb 17, 2026

Greptile Summary

This PR replaces the original fixed-count retry strategy (5 attempts) from a8e9b5b with a time-based retry strategy (1-minute window) for read-only Prisma operations, using exponential backoff starting at 1s capped at 10s with jitter. The overall approach is well-structured — it correctly limits retries to idempotent read operations, uses a typed Prisma error allowlist for the primary check, and includes structured logging for observability.

Two issues were identified:

  • Overly broad error matching in isRetryableError (logic issue): The instanceof Error fallback on lines 42-51 retries any error whose message contains "connection", "socket", or "timeout". This goes well beyond the intended transient network errors and will silently retry permanent misconfigurations (bad DATABASE_URL, wrong credentials, DBA-enforced statement timeouts) for up to a minute. The econnrefused, etimedout, and econnreset checks on lines 45-47 already cover the legitimate transient cases — the three broad substring checks should be removed.

  • 1-minute window is a soft cap (style): The elapsed-time check at line 100 runs after a failed query completes. If a single query invocation itself exceeds MAX_RETRY_DURATION_MS (e.g., a TCP-level timeout of 75s), the loop will still break correctly, but the actual wall-clock time will have already exceeded 1 minute. An additional guard checking elapsedTime >= MAX_RETRY_DURATION_MS immediately after catching the error would make the cap stricter.

Confidence Score: 3/5

  • Safe to merge with minor risk — the broad error matching may cause unnecessary retries on permanent config errors, adding latency before failure, but will not cause data corruption or silent data loss.
  • The core retry mechanism is logically sound and correctly scoped to read-only operations. The main concern is the overly broad isRetryableError fallback which can retry misconfiguration errors for 60 seconds before surfacing them, degrading user-facing response times and producing misleading retry log noise. This is a meaningful behavioral regression for error scenarios but not a data-safety or correctness issue. The soft time-cap is a minor concern.
  • apps/web/src/server/db.ts — specifically the isRetryableError function's instanceof Error fallback branch (lines 42-51)

Last reviewed commit: 6afe05a


greptile-apps bot left a comment


1 file reviewed, 2 comments


Comment on lines +42 to +51
  if (error instanceof Error) {
    const message = error.message.toLowerCase();
    return (
      message.includes("econnrefused") ||
      message.includes("etimedout") ||
      message.includes("econnreset") ||
      message.includes("connection") ||
      message.includes("socket") ||
      message.includes("timeout")
    );

Overly broad string matching retries permanent errors

The instanceof Error fallback branch matches any error whose message contains "connection", "socket", or "timeout" as a substring. This catches many permanent, non-transient failures:

  • "Invalid connection string provided": a bad DATABASE_URL config will be retried for a full minute before failing
  • "Authentication failed for connection": wrong credentials, same result
  • "Statement timeout exceeded": a server-side PostgreSQL statement timeout (a permanent DBA limit) would be retried for up to a minute instead of failing immediately
  • "WebSocket connection failed": any library throwing a generic Error with "connection" in the message would trigger database retries

The typed Prisma error branches above this (PrismaClientKnownRequestError, PrismaClientInitializationError) use a tight allowlist of specific Prisma error codes, which is correct. This fallback path should follow the same principle and be restricted to the well-known transient POSIX errno strings already checked on lines 45-47 (econnrefused, etimedout, econnreset), dropping the broad "connection", "socket", and "timeout" checks:

Suggested change
-  if (error instanceof Error) {
-    const message = error.message.toLowerCase();
-    return (
-      message.includes("econnrefused") ||
-      message.includes("etimedout") ||
-      message.includes("econnreset") ||
-      message.includes("connection") ||
-      message.includes("socket") ||
-      message.includes("timeout")
-    );
+  if (error instanceof Error) {
+    const message = error.message.toLowerCase();
+    return (
+      message.includes("econnrefused") ||
+      message.includes("etimedout") ||
+      message.includes("econnreset")
+    );

Comment on lines +96 to +102
const elapsedTime = Date.now() - startTime;
const delay = calculateDelay(attempt);

// Stop retrying if we've exceeded the 1 minute window
if (elapsedTime + delay > MAX_RETRY_DURATION_MS) {
break;
}

1-minute window is a soft cap, not a hard bound

The time-window check runs after a query has already failed, using the elapsed time at that moment. If a query itself takes a long time to fail (e.g., a TCP connection timeout of 30s after earlier attempts), elapsedTime can already exceed MAX_RETRY_DURATION_MS before delay is even added. For example, if the first attempt hangs for 62 seconds, elapsedTime = 62,000 ms and the loop will break, but 62 seconds have already elapsed, 2 s beyond the documented 1-minute window. The check should therefore also guard explicitly against the case where elapsedTime alone already exceeds the cap, before delay is factored in:

Suggested change
-const elapsedTime = Date.now() - startTime;
-const delay = calculateDelay(attempt);
-
-// Stop retrying if we've exceeded the 1 minute window
-if (elapsedTime + delay > MAX_RETRY_DURATION_MS) {
-  break;
-}
+const elapsedTime = Date.now() - startTime;
+// Stop retrying if we've already exceeded the 1 minute window
+if (elapsedTime >= MAX_RETRY_DURATION_MS) {
+  break;
+}
+const delay = calculateDelay(attempt);
+// Stop retrying if the next attempt would exceed the 1 minute window
+if (elapsedTime + delay > MAX_RETRY_DURATION_MS) {
+  break;
+}
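Neither version of the check can stop an attempt that is already in flight, because both run only after `query(args)` has settled; the 62-second example above would still take 62 seconds. A hard bound would also need a deadline on each attempt. A minimal sketch, assuming the loop passes in the budget left in the window; `withDeadline` is a hypothetical helper, not part of the PR or of Prisma.

```typescript
/**
 * Hypothetical helper: run one attempt, but stop waiting for it once the time
 * left in the retry window is used up. Losing the race does NOT cancel the
 * underlying query; it keeps running (and holding its pooled connection)
 * until the database or driver gives up on it.
 */
function withDeadline<T>(run: () => Promise<T>, remainingMs: number): Promise<T> {
  if (remainingMs <= 0) {
    // Window already spent: do not start another attempt at all.
    return Promise.reject(new Error("retry window exhausted"));
  }
  let timer: ReturnType<typeof setTimeout> | undefined;
  const deadline = new Promise<never>((_, reject) => {
    timer = setTimeout(
      () => reject(new Error(`attempt exceeded ${remainingMs} ms deadline`)),
      remainingMs,
    );
  });
  // Promise.race subscribes to both promises, so a late rejection from an
  // abandoned attempt is handled and does not surface as unhandled.
  return Promise.race([run(), deadline]).finally(() => clearTimeout(timer));
}
```

In the PR's loop, the remaining budget would be `MAX_RETRY_DURATION_MS - (Date.now() - startTime)`, computed just before each call to `query(args)`. Because the abandoned query still runs to completion, this bounds how long the caller waits, not how long the database works.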
