Skip to content

Fix MetricBuffer#retrieve_updates segfault#397

Open
dacuna-ic wants to merge 2 commits intotemporalio:mainfrom
dacuna-ic:fix-metric-buffer-segfault
Open

Fix MetricBuffer#retrieve_updates segfault#397
dacuna-ic wants to merge 2 commits intotemporalio:mainfrom
dacuna-ic:fix-metric-buffer-segfault

Conversation

@dacuna-ic
Copy link

Summary

Fixes two root causes of a segfault when calling MetricBuffer#retrieve_updates under load (#396):

  • Ruby VALUEs on the Rust heap invisible to GC: convert_metric_events collected Update objects into a Vec<Value> whose backing buffer lives on the Rust heap. Ruby's conservative GC scans the native stack but not the Rust heap, so if a subsequent funcall triggered GC, previously created Update objects could be collected — leaving dangling VALUE pointers. Fixed by pushing directly into an RArray which acts as a GC-visible root.

  • BoxValue dropped on non-Ruby Tokio threads: The Core SDK's Tokio threads call update_attributes on metric instruments, which can drop Arc references to BufferedMetricAttributes/BufferedMetricRef. When the last Arc is dropped on a Tokio thread, BoxValue's destructor calls rb_gc_unregister_address from a thread unknown to Ruby's VM, corrupting GC internals. Fixed by introducing ThreadSafeBoxValue which uses ManuallyDrop and leaks the BoxValue when dropped on non-Ruby threads (acceptable since metric instruments/attributes are bounded cardinality).

Test plan

  • Stress test: 60+ seconds, 123k metric updates with aggressive GC.start on a separate thread — zero segfaults (previously crashed within seconds)
  • Full test suite: 367 tests, 2452 assertions, 0 failures, 0 errors

Fixes #396

🤖 Generated with Claude Code

Two issues caused segfaults when draining the metric buffer under load:

1. Ruby VALUES stored in Rust Vec invisible to GC: convert_metric_events
   collected Update objects into a Vec<Value> on the Rust heap. Ruby's
   conservative GC scans the native stack but not the Rust heap, so if a
   subsequent funcall triggered GC, previously created Update objects
   could be collected, leaving dangling VALUE pointers. Fixed by pushing
   directly into an RArray which acts as a GC-visible root.

2. BoxValue dropped on non-Ruby Tokio threads: The Core SDK's Tokio
   threads call update_attributes on metric instruments, which can drop
   Arc references to BufferedMetricAttributes/BufferedMetricRef. When
   the last Arc is dropped on a Tokio thread, BoxValue's destructor
   calls rb_gc_unregister_address from a thread unknown to Ruby's VM,
   corrupting GC internals. Fixed by introducing ThreadSafeBoxValue
   which uses ManuallyDrop and leaks the BoxValue when dropped on
   non-Ruby threads (acceptable since metric instruments/attributes
   are bounded cardinality).

Fixes temporalio#396

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@dacuna-ic dacuna-ic requested a review from a team as a code owner March 9, 2026 00:13
@CLAassistant
Copy link

CLAassistant commented Mar 9, 2026

CLA assistant check
All committers have signed the CLA.

@CLAassistant
Copy link

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
You have signed the CLA already but the status is still pending? Let us recheck it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

MetricBuffer#retrieve_updates segfault: returned objects reference freed Rust memory

3 participants