Conversation
|
Reviewed this together with roboflow/async-serverless#260. The core change is fine in isolation but I'm flagging a regression that only shows up when you combine it with the Main concern: restart loop on inference-container-only crashes
Configured thresholds vs. typical GPU preload:
Because preload frequently exceeds those windows, we'd be promoting every inference crash into a full pod restart, and — if preload consistently exceeds the threshold — potentially into a restart loop (pod starts, sidecar grace period covers first preload, inference crashes again after N minutes, pod restarts, …). Suggested mitigations (pick one)
Second-order observation (not blocking)
Also worth noting: |
PawelPeczek-Roboflow
left a comment
There was a problem hiding this comment.
Rejecting until clarified how that may go wrong in terms of k8s termination due to pre-loading and marking unhealthy at that moment
What does this PR do?
Extends
/healthzto also check theis_readyflag (same one used by/readiness). Returns 503 withreason: "not_ready"if model initialization hasn't completed, otherwise proceeds with the existing CUDA health check.Type of Change
Testing
Checklist