Fix MetricBuffer#retrieve_updates segfault#397
Open
dacuna-ic wants to merge 2 commits intotemporalio:mainfrom
Open
Fix MetricBuffer#retrieve_updates segfault#397dacuna-ic wants to merge 2 commits intotemporalio:mainfrom
dacuna-ic wants to merge 2 commits intotemporalio:mainfrom
Conversation
Two issues caused segfaults when draining the metric buffer under load: 1. Ruby VALUES stored in Rust Vec invisible to GC: convert_metric_events collected Update objects into a Vec<Value> on the Rust heap. Ruby's conservative GC scans the native stack but not the Rust heap, so if a subsequent funcall triggered GC, previously created Update objects could be collected, leaving dangling VALUE pointers. Fixed by pushing directly into an RArray which acts as a GC-visible root. 2. BoxValue dropped on non-Ruby Tokio threads: The Core SDK's Tokio threads call update_attributes on metric instruments, which can drop Arc references to BufferedMetricAttributes/BufferedMetricRef. When the last Arc is dropped on a Tokio thread, BoxValue's destructor calls rb_gc_unregister_address from a thread unknown to Ruby's VM, corrupting GC internals. Fixed by introducing ThreadSafeBoxValue which uses ManuallyDrop and leaks the BoxValue when dropped on non-Ruby threads (acceptable since metric instruments/attributes are bounded cardinality). Fixes temporalio#396 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
|
|
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Fixes two root causes of a segfault when calling
MetricBuffer#retrieve_updatesunder load (#396):Ruby VALUEs on the Rust heap invisible to GC:
convert_metric_eventscollectedUpdateobjects into aVec<Value>whose backing buffer lives on the Rust heap. Ruby's conservative GC scans the native stack but not the Rust heap, so if a subsequentfuncalltriggered GC, previously createdUpdateobjects could be collected — leaving dangling VALUE pointers. Fixed by pushing directly into anRArraywhich acts as a GC-visible root.BoxValuedropped on non-Ruby Tokio threads: The Core SDK's Tokio threads callupdate_attributeson metric instruments, which can dropArcreferences toBufferedMetricAttributes/BufferedMetricRef. When the lastArcis dropped on a Tokio thread,BoxValue's destructor callsrb_gc_unregister_addressfrom a thread unknown to Ruby's VM, corrupting GC internals. Fixed by introducingThreadSafeBoxValuewhich usesManuallyDropand leaks theBoxValuewhen dropped on non-Ruby threads (acceptable since metric instruments/attributes are bounded cardinality).Test plan
GC.starton a separate thread — zero segfaults (previously crashed within seconds)Fixes #396
🤖 Generated with Claude Code