RetryableMultiRegion: try_join_all drops JoinHandles on error, detaching spawned tasks that panic on runtime shutdown #534

@pingyu

Description

Bug Report

Problem

When using tikv-client in a context where the Tokio runtime may be dropped (e.g., a short-lived runtime created inside a worker thread), the program may panic with:

A Tokio 1.x context was found, but it is being shutdown.

Backtrace

The panic originates from TimerEntry::poll_elapsed in tokio-1.26.0/src/runtime/time/entry.rs:550-551:

TimerEntry::poll_elapsed                          ← PANIC: runtime timer driver is shut down
  ← tokio::time::sleep::Sleep::poll               ← sleep() called inside task
    ← tonic::transport::grpc_timeout::ResponseFuture::poll  ← gRPC timeout
      ← tonic AddOrigin service call
        ← tower buffer future
          ← tikv_client::ScanRequest::dispatch
            ← tikv_client::KvRpcClient::dispatch
              ← Dispatch<Req>::execute
                ← ResolveLock<P,PdC>::execute
                  ← RetryableMultiRegion::single_shard_handler  ← spawned via tokio::spawn
                    ← tokio multi_thread worker (async task)

Scenario

The user intends to:

  1. Create a short-lived Tokio runtime.
  2. Create a TransactionClient on that runtime.
  3. Perform scan operations via snapshot.scan() within runtime.block_on().
  4. Eventually drop the runtime when the worker thread exits.

During scan operations, tikv-client internally spawns tasks on the current runtime to shard the request across multiple regions. If one of these tasks fails and the user's runtime is dropped before the remaining spawned tasks terminate, the process panics. A sketch of the scenario follows.
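
For illustration, a minimal sketch of this scenario (worker_scan is a hypothetical function; the tikv-client calls are abbreviated and their exact signatures vary across client versions; error handling is elided). The panic itself is timing-dependent: it fires only when a detached shard task is still waiting on a timer as the runtime shuts down.

use tikv_client::{TransactionClient, TransactionOptions};

// Runs on a worker thread; the runtime dies when this function returns.
fn worker_scan() {
    // 1. Short-lived runtime owned by the worker thread.
    let rt = tokio::runtime::Runtime::new().unwrap();
    rt.block_on(async {
        // 2. Client created on this runtime.
        let client = TransactionClient::new(vec!["127.0.0.1:2379"]).await.unwrap();
        let ts = client.current_timestamp().await.unwrap();
        let mut snapshot = client.snapshot(ts, TransactionOptions::default());
        // 3. Internally shards the scan and spawns one task per region.
        let _ = snapshot.scan("a".to_owned().."z".to_owned(), 1024).await;
    });
    // 4. Runtime dropped here. Any shard task detached by try_join_all that
    //    is still inside a retry backoff or gRPC timeout timer hits the
    //    shut-down timer driver and panics.
}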

Root Cause

In request/plan.rs, RetryableMultiRegion::single_plan_handler spawns concurrent per-shard handler tasks via tokio::spawn and waits for them using futures::future::try_join_all:

// request/plan.rs:122-136 (abridged)
let mut handles = Vec::new();
for shard in shards {
    let handle = tokio::spawn(Self::single_shard_handler(...));
    handles.push(handle);
}
// Short-circuits on the first error and drops the remaining handles.
let results = try_join_all(handles).await?;

try_join_all (from futures-util 0.3.28) cancels remaining futures immediately when any one future returns an error (source: try_join_all.rs lines 165-168 break on first error, line 182 drops remaining futures). This means:

  1. When one single_shard_handler task fails (e.g., gRPC timeout, region error, leader-not-found), try_join_all drops the remaining JoinHandles.
  2. Tokio's JoinHandle::Drop does not cancel the spawned task; it detaches the task, which continues running on the runtime (see the sketch after this list).
  3. If the runtime is dropped while detached tasks are still alive, the tasks' subsequent tokio::time::sleep calls (via tonic's gRPC timeout wrapper or tikv-client's retry backoff) panic because the timer driver is shut down.
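
Both behaviors are easy to observe outside tikv-client. A minimal sketch with hypothetical tasks standing in for the real shard handlers:

use std::time::Duration;
use futures::future::try_join_all;

#[tokio::main]
async fn main() {
    let mut handles = Vec::new();
    // A long-running task: keeps running even after its handle is dropped.
    handles.push(tokio::spawn(async {
        tokio::time::sleep(Duration::from_millis(50)).await;
        println!("detached task still ran to completion");
        Ok::<(), String>(())
    }));
    // A task that fails immediately, triggering the short-circuit.
    handles.push(tokio::spawn(async { Err::<(), String>("shard error".into()) }));

    let result = try_join_all(handles.into_iter().map(|h| async move {
        // Surface both JoinError (panic) and the task's own error.
        h.await.map_err(|e| e.to_string())?
    }))
    .await;
    assert!(result.is_err()); // Short-circuited on the failing task.

    // try_join_all dropped the other JoinHandle, but the task it owned was
    // only detached, not cancelled; wait long enough to see it finish.
    tokio::time::sleep(Duration::from_millis(100)).await;
}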

The same pattern exists in RetryableAllStores::single_store_handler (plan.rs:463).

Proposed Fix

Instead of try_join_all, use futures::future::join_all to await all spawned tasks to completion before propagating errors:

let results = join_all(handles).await;
let mut first_err = None;
let mut outputs = Vec::with_capacity(results.len());
for r in results {
    // A JoinHandle yields Err(JoinError) only if the task panicked.
    match r.expect("shard handler task panicked") {
        Ok(ok) => outputs.push(Ok(ok)),
        Err(e) if first_err.is_none() => first_err = Some(e),
        Err(_) => {}
    }
}
if let Some(e) = first_err {
    return Err(e);
}
Ok(outputs)

Alternatively, use tokio::task::JoinSet, which tracks every spawned task, can be drained to completion like join_all, and, unlike bare JoinHandles, aborts any still-running tasks when it is dropped, so no shard task can outlive the caller.
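
A sketch of the JoinSet variant. The Shard type and single_shard_handler below are hypothetical stand-ins for the real definitions in request/plan.rs:

use tokio::task::JoinSet;

// Hypothetical stand-ins for the real shard types in plan.rs.
struct Shard(u32);

async fn single_shard_handler(shard: Shard) -> Result<u32, String> {
    Ok(shard.0)
}

async fn handle_shards(shards: Vec<Shard>) -> Result<Vec<u32>, String> {
    let mut set = JoinSet::new();
    for shard in shards {
        set.spawn(single_shard_handler(shard));
    }
    let mut outputs = Vec::new();
    let mut first_err = None;
    // Drain every task to completion before propagating an error; if this
    // future is dropped instead, the JoinSet aborts the remaining tasks.
    while let Some(res) = set.join_next().await {
        match res.expect("shard task panicked") {
            Ok(ok) => outputs.push(ok),
            Err(e) if first_err.is_none() => first_err = Some(e),
            Err(_) => {}
        }
    }
    match first_err {
        Some(e) => Err(e),
        None => Ok(outputs),
    }
}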
