Bug Report
Problem
When using tikv-client in a context where the Tokio runtime may be dropped (e.g., a short-lived runtime created inside a worker thread), the program may panic with:
A Tokio 1.x context was found, but it is being shutdown.
Backtrace
The panic originates from TimerEntry::poll_elapsed in tokio-1.26.0/src/runtime/time/entry.rs:550-551:
TimerEntry::poll_elapsed ← PANIC: runtime timer driver is shut down
← tokio::time::sleep::Sleep::poll ← sleep() called inside task
← tonic::transport::grpc_timeout::ResponseFuture::poll ← gRPC timeout
← tonic AddOrigin service call
← tower buffer future
← tikv_client::ScanRequest::dispatch
← tikv_client::KvRpcClient::dispatch
← Dispatch<Req>::execute
← ResolveLock<P,PdC>::execute
← RetryableMultiRegion::single_shard_handler ← spawned via tokio::spawn
← tokio multi_thread worker (async task)
Scenario
The user intends to:
- Create a short-lived Tokio runtime.
- Create a TransactionClient on that runtime.
- Perform scan operations via snapshot.scan() within runtime.block_on().
- Eventually drop the runtime when the worker thread exits.
During scan operations, tikv-client internally spawns tasks on the current runtime to shard the request across multiple regions. If one of these tasks errors and the user's runtime is short-lived enough to be dropped before the spawned tasks naturally terminate, the process panics.
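A rough sketch of this pattern, assuming the runtime is built and dropped inside the worker thread (the PD endpoint, constructor signature, and snapshot/scan arguments are illustrative and differ between tikv-client versions):

```rust
use tikv_client::{TransactionClient, TransactionOptions};

fn worker_thread_body() -> tikv_client::Result<()> {
    // Short-lived runtime owned by this worker thread.
    let rt = tokio::runtime::Runtime::new().expect("failed to build runtime");

    let result = rt.block_on(async {
        let client = TransactionClient::new(vec!["127.0.0.1:2379"]).await?;
        let ts = client.current_timestamp().await?;
        let mut snapshot = client.snapshot(ts, TransactionOptions::new_optimistic());
        // A multi-region scan makes tikv-client spawn per-shard tasks internally.
        let _pairs = snapshot.scan("a".to_owned().."z".to_owned(), 1024).await?;
        Ok(())
    });

    // Dropping the runtime here shuts down its timer driver. Any per-shard task
    // that was detached (see Root Cause below) is still alive at this point.
    drop(rt);
    result
}
```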
Root Cause
In request/plan.rs, RetryableMultiRegion::single_plan_handler spawns concurrent per-shard handler tasks via tokio::spawn and waits for them using futures::future::try_join_all:
```rust
// plan.rs:122-136
for shard in shards {
    let handle = tokio::spawn(Self::single_shard_handler(...));
    handles.push(handle);
}
let results = try_join_all(handles).await?;
```
try_join_all (from futures-util 0.3.28) stops polling and drops the remaining futures as soon as any one of them returns an error (source: try_join_all.rs lines 165-168 break on the first error, line 182 drops the remaining futures). This means:
- When one single_shard_handler task fails (e.g., gRPC timeout, region error, leader-not-found), try_join_all drops the remaining JoinHandles (first sketch below).
- Tokio's JoinHandle::Drop does not cancel the spawned task; it only detaches it, and the task continues running on the runtime (second sketch below).
- If the runtime is dropped while detached tasks are still alive, the tasks' subsequent tokio::time::sleep calls (via tonic's gRPC timeout wrapper or tikv-client's retry backoff) panic because the timer driver is shut down.
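The drop-on-first-error behavior is easy to observe in isolation; a minimal standalone sketch (the DropProbe helper exists only to make the drop visible):

```rust
use std::pin::Pin;

use futures::future::{pending, try_join_all};
use futures::Future;

// Prints a message when the future that owns it is dropped.
struct DropProbe(&'static str);
impl Drop for DropProbe {
    fn drop(&mut self) {
        println!("{} was dropped", self.0);
    }
}

#[tokio::main]
async fn main() {
    let probe = DropProbe("slow future");

    type ShardFut = Pin<Box<dyn Future<Output = Result<u32, &'static str>>>>;
    let shard_futures: Vec<ShardFut> = vec![
        // Fails immediately, like an erroring shard handler.
        Box::pin(async { Err::<u32, &'static str>("shard error") }),
        // Would never finish; owns the probe, so dropping it is observable.
        Box::pin(async move {
            let _probe = probe;
            pending::<()>().await;
            Ok::<u32, &'static str>(0)
        }),
    ];

    // Resolves to Err("shard error") as soon as the first future fails and
    // drops the second future without completing it ("slow future was dropped").
    let result = try_join_all(shard_futures).await;
    println!("try_join_all returned {result:?}");
}
```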
The same pattern exists in RetryableAllStores::single_store_handler (plan.rs:463).
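Tokio's detach-on-drop behavior for JoinHandle can likewise be seen with nothing but a plain sleep; a minimal standalone sketch:

```rust
use std::time::Duration;

#[tokio::main]
async fn main() {
    let handle = tokio::spawn(async {
        tokio::time::sleep(Duration::from_millis(50)).await;
        println!("detached task still ran to completion");
    });

    // Dropping the JoinHandle does not cancel the task; it only detaches it,
    // and the task keeps running on the runtime.
    drop(handle);

    // Keep the runtime alive long enough to observe the detached task finish.
    tokio::time::sleep(Duration::from_millis(100)).await;
}
```

If the runtime were instead dropped during that window, the detached task's next timer poll would hit the shut-down timer driver, which is the panic shown in the backtrace above.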
Proposed Fix
Instead of try_join_all, use futures::future::join_all to await all spawned tasks to completion before propagating errors:
```rust
let results = join_all(handles).await;
let mut err = None;
let mut outputs = Vec::with_capacity(results.len());
for r in results {
    match r.unwrap() { // a JoinError here means the task panicked
        Ok(ok) => outputs.push(Ok(ok)),
        Err(e) if err.is_none() => err = Some(e),
        _ => {}
    }
}
if let Some(e) = err {
    return Err(e);
}
Ok(outputs)
```
Alternatively, use tokio::task::JoinSet: draining it with join_next() gives the same wait-for-everything semantics, it tracks every spawned task, and it aborts any tasks still running when it is dropped, so no task can outlive the awaiting code.
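A self-contained sketch of that variant, with a hypothetical inline closure standing in for single_shard_handler (the names and error types here are illustrative, not tikv-client's):

```rust
use tokio::task::JoinSet;

async fn run_shards(shard_ids: Vec<u32>) -> Result<Vec<String>, String> {
    let mut set = JoinSet::new();
    for id in shard_ids {
        // Hypothetical per-shard work standing in for single_shard_handler.
        set.spawn(async move {
            if id == 3 {
                Err(format!("shard {id} failed"))
            } else {
                Ok(format!("shard {id} ok"))
            }
        });
    }

    // join_next() yields tasks in completion order; every spawned task is
    // awaited here, and dropping the set would abort any task still running,
    // so nothing is left detached on the runtime.
    let mut first_err = None;
    let mut outputs = Vec::new();
    while let Some(joined) = set.join_next().await {
        match joined.expect("shard task panicked") {
            Ok(ok) => outputs.push(ok),
            Err(e) if first_err.is_none() => first_err = Some(e),
            Err(_) => {}
        }
    }
    match first_err {
        Some(e) => Err(e),
        None => Ok(outputs),
    }
}
```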