fix: auto-retry database connection after server restart (#340) #342
Razano26 wants to merge 2 commits into usesend:main from
@Razano26 is attempting to deploy a commit to the kmkoushik's projects Team on Vercel. A member of the Team first needs to authorize it.
Walkthrough
Adds automatic retry logic to the Prisma client in apps/web/src/server/db.ts. Imports
🚥 Pre-merge checks | ✅ 2 | ❌ 1
❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
1 issue found across 1 file
Prompt for AI agents (all issues)
Check if these issues are valid — if so, understand the root cause of each and fix them.
<file name="apps/web/src/server/db.ts">
<violation number="1" location="apps/web/src/server/db.ts:22">
P2: All PrismaClientInitializationError instances are retried, including permanent misconfigurations (invalid credentials/URL), causing pointless retry delays instead of failing fast.</violation>
</file>
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@apps/web/src/server/db.ts`:
- Around line 52-105: The retry wrapper currently retries every Prisma operation
in PrismaClient.$extends -> query.$allOperations (function $allOperations),
which can re-run non-idempotent mutations; update $allOperations to only perform
the retry loop for read-only operations (e.g., findUnique, findUniqueOrThrow,
findFirst, findFirstOrThrow, findMany, count, aggregate, groupBy) by checking
the incoming operation string against that allowlist before entering the
for-loop; for all other operations (creates/updates/deletes) skip retries and
immediately call and return await query(args) (or let errors bubble) so
non-idempotent mutations are not retried. Ensure isRetryableError, MAX_RETRIES,
calculateDelay, sleep and logging remain unchanged for the read-only retry path.
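The allowlist check described in this prompt could look roughly like the sketch below. It is an illustration under stated assumptions, not the PR's code: `READ_ONLY_OPERATIONS`, `isReadOnlyOperation`, and `runWithOptionalRetry` are hypothetical names, the attempt count is a placeholder, and the backoff delay between attempts is left out to keep the example short.

```typescript
// Hypothetical allowlist of operation names that are safe to re-run.
const READ_ONLY_OPERATIONS: ReadonlySet<string> = new Set([
  "findUnique",
  "findUniqueOrThrow",
  "findFirst",
  "findFirstOrThrow",
  "findMany",
  "count",
  "aggregate",
  "groupBy",
]);

function isReadOnlyOperation(operation: string): boolean {
  return READ_ONLY_OPERATIONS.has(operation);
}

// Runs `run` exactly once for mutations, and up to `maxAttempts` times for reads.
// `isRetryable` stands in for the PR's isRetryableError.
async function runWithOptionalRetry<T>(
  operation: string,
  run: () => Promise<T>,
  isRetryable: (error: unknown) => boolean,
  maxAttempts: number = 3 // placeholder value, not taken from the PR
): Promise<T> {
  if (!isReadOnlyOperation(operation)) {
    // Creates, updates, and deletes are never retried; errors bubble up.
    return run();
  }
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await run();
    } catch (error) {
      lastError = error;
      if (!isRetryable(error)) throw error;
    }
  }
  throw lastError;
}
```

Inside a `$allOperations` hook, the same guard would presumably wrap the call to `query(args)`, with the operation name taken from the hook's arguments.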
🧹 Nitpick comments (1)
apps/web/src/server/db.ts (1)
17-39: Consider tightening the message heuristics.
Broad substrings like `"connection"` can match non-transient configuration errors (e.g., invalid connection string), leading to unnecessary retries. Narrowing the match or adding a small denylist would reduce false positives.
Signed-off-by: Louis Labeyrie <labeyrielouis@gmail.com>
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@apps/web/src/server/db.ts`:
- Around line 29-55: The fallback string matching in isRetryableError is too
broad (matching "connection"/"socket"/"timeout") and can cause false positives;
narrow it to explicit network error signatures instead. Update the Error branch
in isRetryableError to check for specific tokens/phrases such as
"ECONNREFUSED"/"econnrefused"/"connection refused", "etimedout"/"timed out",
"econnreset"/"connection reset", "socket hang up", "getaddrinfo
ENOTFOUND"/"enotfound", "ehostunreach"/"host unreachable",
"econnaborted"/"connection aborted" (or an equivalent regex that matches these
exact network error patterns) instead of generic
"connection"/"socket"/"timeout"; keep the existing Prisma checks
(PrismaClientKnownRequestError, PrismaClientInitializationError and
RETRYABLE_ERROR_CODES) unchanged. Ensure the string comparisons are
case-insensitive and avoid matching generic words like "connection" used in
validation messages.
function isRetryableError(error: unknown): boolean {
  if (error instanceof Prisma.PrismaClientKnownRequestError) {
    return RETRYABLE_ERROR_CODES.has(error.code);
  }

  if (error instanceof Prisma.PrismaClientInitializationError) {
    // Only retry transient connection errors, not permanent misconfigurations
    // (e.g., invalid credentials, wrong database URL, schema errors)
    return (
      error.errorCode !== undefined && RETRYABLE_ERROR_CODES.has(error.errorCode)
    );
  }

  if (error instanceof Error) {
    const message = error.message.toLowerCase();
    return (
      message.includes("econnrefused") ||
      message.includes("etimedout") ||
      message.includes("econnreset") ||
      message.includes("connection") ||
      message.includes("socket") ||
      message.includes("timeout")
    );
  }

  return false;
}
Tighten the fallback string match to avoid retrying unrelated errors.
The broad "connection"/"socket" checks can match non-network errors (e.g., validation messages mentioning a connection field), causing unnecessary retries and noisy logs. Consider narrowing to explicit network signatures.
🔧 Suggested refinement
- return (
- message.includes("econnrefused") ||
- message.includes("etimedout") ||
- message.includes("econnreset") ||
- message.includes("connection") ||
- message.includes("socket") ||
- message.includes("timeout")
- );
+ return (
+ message.includes("econnrefused") ||
+ message.includes("etimedout") ||
+ message.includes("econnreset") ||
+ message.includes("socket hang up") ||
+ message.includes("timeout")
+ );
🤖 Prompt for AI Agents
In `@apps/web/src/server/db.ts` around lines 29 - 55, The fallback string matching
in isRetryableError is too broad (matching "connection"/"socket"/"timeout") and
can cause false positives; narrow it to explicit network error signatures
instead. Update the Error branch in isRetryableError to check for specific
tokens/phrases such as "ECONNREFUSED"/"econnrefused"/"connection refused",
"etimedout"/"timed out", "econnreset"/"connection reset", "socket hang up",
"getaddrinfo ENOTFOUND"/"enotfound", "ehostunreach"/"host unreachable",
"econnaborted"/"connection aborted" (or an equivalent regex that matches these
exact network error patterns) instead of generic
"connection"/"socket"/"timeout"; keep the existing Prisma checks
(PrismaClientKnownRequestError, PrismaClientInitializationError and
RETRYABLE_ERROR_CODES) unchanged. Ensure the string comparisons are
case-insensitive and avoid matching generic words like "connection" used in
validation messages.
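One way to express the "equivalent regex" mentioned in this prompt is sketched below. `TRANSIENT_NETWORK_PATTERN` and `isTransientNetworkMessage` are hypothetical names, and the phrase list simply mirrors the one in the prompt; it has not been checked against the messages Prisma or the database driver actually emit. Note that a phrase like "timed out" may still match some server-side timeouts, so the list would need review before use.

```typescript
// Explicit network failure signatures only. Generic words such as
// "connection", "socket", or "timeout" are deliberately not matched alone.
// The `i` flag makes the comparison case-insensitive.
const TRANSIENT_NETWORK_PATTERN =
  /econnrefused|connection refused|etimedout|timed out|econnreset|connection reset|socket hang up|enotfound|ehostunreach|host unreachable|econnaborted|connection aborted/i;

function isTransientNetworkMessage(message: string): boolean {
  return TRANSIENT_NETWORK_PATTERN.test(message);
}
```

With this pattern, a message like "connect ECONNREFUSED 127.0.0.1:5432" matches, while "Invalid connection string provided" does not.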
Max retries up to 1 minute is the best option
Change from fixed retry count (5) to time-based retry limit of 1 minute, giving the database server more time to recover from transient issues like restarts.
Signed-off-by: Louis Labeyrie <labeyrielouis@gmail.com>
Pull request overview
This PR introduces centralized retry logic for Prisma read-only database operations to improve resilience to transient connectivity issues and server restarts.
Changes:
- Adds capped exponential backoff with jitter and a 1-minute total retry window for selected retryable Prisma error codes and connection-related errors.
- Restricts retries to read-only operations via a PrismaClient `$extends` hook to avoid re-running non-idempotent mutations.
- Enhances logging with structured `warn` entries for each retry and an `error` entry when retry attempts are exhausted.
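For readers unfamiliar with capped exponential backoff with jitter, a minimal sketch follows. The body of the PR's `calculateDelay` is not shown on this page, so `backoffDelayMs`, `BASE_DELAY_MS`, and `MAX_DELAY_MS` are hypothetical names with assumed values, and "full jitter" is only one of several common jitter strategies.

```typescript
const BASE_DELAY_MS = 100; // assumed starting delay, not taken from the PR
const MAX_DELAY_MS = 5000; // assumed per-attempt cap, not taken from the PR

// `attempt` is zero-based. `random` is injectable so the function can be
// tested deterministically; it defaults to Math.random.
function backoffDelayMs(
  attempt: number,
  random: () => number = Math.random
): number {
  // Double the delay on every attempt, but never exceed the cap.
  const exponential = Math.min(MAX_DELAY_MS, BASE_DELAY_MS * Math.pow(2, attempt));
  // Full jitter: choose a delay uniformly in [0, exponential).
  return Math.floor(random() * exponential);
}
```

The jitter spreads retries from many clients over time, so they do not all reconnect at the same instant after a server restart.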
Greptile Summary
This PR replaces the original fixed-count retry strategy (5 attempts) from
Two issues were identified:
Confidence Score: 3/5
Last reviewed commit: 6afe05a
if (error instanceof Error) {
  const message = error.message.toLowerCase();
  return (
    message.includes("econnrefused") ||
    message.includes("etimedout") ||
    message.includes("econnreset") ||
    message.includes("connection") ||
    message.includes("socket") ||
    message.includes("timeout")
  );
Overly broad string matching retries permanent errors
The `instanceof Error` fallback branch matches any error whose message contains "connection", "socket", or "timeout" as a substring. This catches many permanent, non-transient failures:
- `"Invalid connection string provided"`: a bad `DATABASE_URL` config will be retried for a full minute before failing
- `"Authentication failed for connection"`: wrong credentials, same result
- `"Statement timeout exceeded"`: a server-side PostgreSQL statement timeout (a permanent DBA limit) would be retried endlessly
- `"WebSocket connection failed"`: any library throwing a generic `Error` with "connection" in the message would trigger database retries
The typed Prisma error branches above this (`PrismaClientKnownRequestError`, `PrismaClientInitializationError`) use a tight allowlist of specific Prisma error codes, which is correct. This fallback path should follow the same principle. It should be restricted to well-known transient POSIX errno strings (`econnrefused`, `etimedout`, `econnreset`), which are already covered on lines 45-47, removing the broad `"connection"`, `"socket"`, and `"timeout"` checks:
- if (error instanceof Error) {
-   const message = error.message.toLowerCase();
-   return (
-     message.includes("econnrefused") ||
-     message.includes("etimedout") ||
-     message.includes("econnreset") ||
-     message.includes("connection") ||
-     message.includes("socket") ||
-     message.includes("timeout")
-   );
+ if (error instanceof Error) {
+   const message = error.message.toLowerCase();
+   return (
+     message.includes("econnrefused") ||
+     message.includes("etimedout") ||
+     message.includes("econnreset")
+   );
+ }
const elapsedTime = Date.now() - startTime;
const delay = calculateDelay(attempt);

// Stop retrying if we've exceeded the 1 minute window
if (elapsedTime + delay > MAX_RETRY_DURATION_MS) {
  break;
}
1-minute window is a soft cap, not a hard bound
The time-window check runs after a query has already failed, using the elapsed time at that moment. If a query itself takes a long time to fail (e.g., a TCP connection timeout of 30s), `elapsedTime` can already exceed `MAX_RETRY_DURATION_MS` before `delay` is even added. For example, if the first attempt hangs for 62 seconds, `elapsedTime` = 62,000 ms and the loop will break, but 62 seconds have already elapsed, 2 s beyond the documented 1-minute window. More importantly, the check should also guard against the case where `elapsedTime` alone already exceeds the cap before factoring in `delay`:
- const elapsedTime = Date.now() - startTime;
- const delay = calculateDelay(attempt);
- // Stop retrying if we've exceeded the 1 minute window
- if (elapsedTime + delay > MAX_RETRY_DURATION_MS) {
-   break;
- }
+ const elapsedTime = Date.now() - startTime;
+ // Stop retrying if we've already exceeded the 1 minute window
+ if (elapsedTime >= MAX_RETRY_DURATION_MS) {
+   break;
+ }
+ const delay = calculateDelay(attempt);
+ // Stop retrying if the next attempt would exceed the 1 minute window
+ if (elapsedTime + delay > MAX_RETRY_DURATION_MS) {
+   break;
+ }
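The two-step check in the suggestion above can be isolated into a small pure function, which makes the window logic easy to test without real timers. This is a sketch only: `nextRetryDelay` is a hypothetical helper, and the clock values are passed in rather than read from `Date.now()`.

```typescript
const MAX_RETRY_DURATION_MS = 60000; // the 1-minute window discussed in this thread

// Returns the delay to sleep before the next attempt, or null when the
// retry loop should stop. All times are in milliseconds.
function nextRetryDelay(
  startTime: number,
  now: number,
  proposedDelayMs: number,
  maxDurationMs: number = MAX_RETRY_DURATION_MS
): number | null {
  const elapsed = now - startTime;
  // The window is already used up (e.g. the failed attempt itself hung).
  if (elapsed >= maxDurationMs) return null;
  // Sleeping for the proposed delay would overshoot the window.
  if (elapsed + proposedDelayMs > maxDurationMs) return null;
  return proposedDelayMs;
}
```

Even with this check, the window stays a soft cap: the function only decides whether to start another attempt and cannot interrupt one that is already in flight. A hard bound would additionally need a per-attempt timeout.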
Summary by cubic
Automatically retries Prisma read-only database operations after server restarts to recover from connection drops and reduce timeouts. Uses capped exponential backoff with jitter and structured logging, with a 1-minute total retry window.
Written for commit 6afe05a. Summary will update on new commits.
Summary by CodeRabbit