[GLUTEN-11895][VL] Fix SIGSEGV on IOThreadPool threads during HDFS scan#11896
guowangy wants to merge 1 commit into apache:main
Conversation
Hold off on merging until we fix #11452 (comment)
zhztheplayer
left a comment
Thanks @guowangy. I wonder whether simply removing these detach statements will cause a Java Thread object leak.
You can give it a try to see whether the heap usage keeps going up with this change. Or have a look at #11895 (comment).
In my opinion, no leak.

- For HDFS threads: libhdfs has its own cleanup; it calls DetachCurrentThread automatically when each thread exits, via hdfsThreadDestructor. The old code was calling DetachCurrentThread prematurely, at object destruction time rather than thread exit, which corrupted libhdfs's cached JNIEnv* and caused the crash.
- For non-HDFS native threads: attachCurrentThreadAsDaemonOrThrow attaches them as daemon on first use. They are joined by ioExecutor_.reset() inside the JVM shutdown hook before the JVM exits. Daemon threads do not block JVM shutdown. The attached-thread count is bounded by pool size, not by query or operation count.
- For Spark task threads and shutdown threads: these are JVM-managed and already attached, so attachCurrentThreadAsDaemonOrThrow is a no-op (GetEnv returns JNI_OK). No new JavaThread is created.
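The hdfsThreadDestructor behavior described above can be sketched as a small standalone program: a pthread_key destructor fires exactly once, when the thread actually exits, which is the only safe point for DetachCurrentThread. Everything below is an illustrative mock, assuming nothing from the real libhdfs sources; `hdfsStyleThreadDestructor` and the counter are hypothetical stand-ins.

```cpp
#include <atomic>
#include <pthread.h>

// Illustrative model of libhdfs's hdfsThreadDestructor: a pthread_key
// destructor that runs when the owning thread exits. In libhdfs this is
// where vm->DetachCurrentThread() is called; here we just count calls.
std::atomic<int> detachAtExitCalls{0};

void hdfsStyleThreadDestructor(void* /*threadLocalState*/) {
  detachAtExitCalls.fetch_add(1);  // stands in for DetachCurrentThread()
}

// Runs one worker thread that "attaches" (sets a non-null TLS value) and
// exits. The destructor fires at thread exit, before pthread_join returns.
void runWorkerOnce() {
  static pthread_key_t key;
  static pthread_once_t once = PTHREAD_ONCE_INIT;
  pthread_once(&once, [] { pthread_key_create(&key, hdfsStyleThreadDestructor); });
  pthread_t t;
  pthread_create(
      &t, nullptr,
      [](void* k) -> void* {
        // A non-null TLS value means "this thread is attached"; setting it
        // also arms the key destructor for this thread.
        pthread_setspecific(*static_cast<pthread_key_t*>(k), (void*)1);
        return nullptr;
      },
      &key);
  pthread_join(t, nullptr);
}
```

The point of the sketch: the cleanup hook is tied to thread lifetime, not object lifetime, so destroying an iterator object mid-thread must not detach.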
I don't see heap usage going up in an 8-hour test.
Thanks for the experiment @guowangy. I revisited the code and
Perhaps, using
Should we clean up the stale pointer at step 2 and re-attach in step 3?
It's a libhdfs-only issue, right? Does the PR work on S3, ABFS, GCS?
It requires clearing quickTlsEnv after DetachCurrentThread. That's impossible from outside libhdfs because quickTlsEnv is a static __thread variable declared inside the body of threadLocalStorageGet() — it has no external symbol, is not exported from libhdfs.so, and no public API exists to zero it. The only code that can write it is libhdfs itself. Without being able to clear quickTlsEnv, any HDFS call after DetachCurrentThread hits the fast path, sees the non-null but stale pointer, and crashes — the re-attach in step 3 never gets a chance to run.
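The failure mode described above can be modeled in a few lines. This is a simplified mock, not libhdfs code: `quickTlsEnv` mirrors the real static `__thread` cache, while `Env`, `getCachedEnv`, and `simulateDetach` are hypothetical stand-ins (the real code returns a `JNIEnv*`, and the real detach frees the `JavaThread`, leaving the cached pointer dangling).

```cpp
// Simplified model of libhdfs's two-level TLS lookup and its fast path.
struct Env {
  bool valid = true;  // models whether the JavaThread behind the env is alive
};

// Mirrors the static __thread cache: private to "libhdfs", so no outside
// caller can reset it after a detach.
thread_local Env* quickTlsEnv = nullptr;

Env* getCachedEnv() {
  if (quickTlsEnv != nullptr) {
    return quickTlsEnv;  // fast path: no re-validation of the cached pointer
  }
  static thread_local Env env;  // slow path: models AttachCurrentThread
  quickTlsEnv = &env;
  return quickTlsEnv;
}

// Models DetachCurrentThread being called from outside libhdfs: the env
// behind the pointer is now dead, but quickTlsEnv cannot be cleared, so
// every later call takes the fast path and gets the stale env.
void simulateDetach() {
  getCachedEnv()->valid = false;
}
```

In the real crash, "invalid" means freed memory, so the subsequent `jni_NewStringUTF(stale_env)` is a SIGSEGV rather than a failed validity check.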
Yes, HDFS only. Others don't have this problem.
@zhztheplayer Good catch. spillExecutor_ is the real issue: it's created per task, and its threads get attached to the JVM via SparkAllocationListener but never detached, so JavaThread objects accumulate over time. Proposed fix: add a thin folly::ThreadFactory wrapper for spillExecutor_ that attaches threads at pool creation and calls DetachCurrentThread inside the thread body after all work completes. Spill threads never call libhdfs, so this is safe. Does this approach sound reasonable to you?
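A minimal sketch of that proposal, using `std::thread` in place of a `folly::ThreadFactory` and mock hooks in place of real JNI calls. The names `mockAttachAsDaemon`, `mockDetach`, and `makeAttachedThread` are hypothetical, not Gluten APIs; the sketch only demonstrates the lifecycle the proposal describes.

```cpp
#include <functional>
#include <thread>
#include <utility>

// Tracks this thread's mock JVM attachment state.
thread_local bool attachedToJvm = false;

void mockAttachAsDaemon() { attachedToJvm = true; }  // stands in for AttachCurrentThreadAsDaemon
void mockDetach() { attachedToJvm = false; }         // stands in for DetachCurrentThread

// The key property: attach happens inside the worker thread before any work,
// and detach happens inside the same thread after all work completes, never
// from a destructor running on some other thread.
std::thread makeAttachedThread(std::function<void()> body) {
  return std::thread([body = std::move(body)] {
    mockAttachAsDaemon();
    body();
    mockDetach();  // safe: the thread is done with all JVM interaction
  });
}
```

Because attach and detach are paired on the same thread around its whole working life, the attached-thread count stays bounded by pool size and nothing detaches a thread some other component still relies on.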
This sounds reasonable to me.
What changes are proposed in this pull request?
Fix SIGSEGV on `CPUThreadPool` threads during HDFS scan caused by `DetachCurrentThread` poisoning libhdfs.so's TLS-cached `JNIEnv*`.

Fixes #11895
Root cause
libhdfs.so caches `JNIEnv*` per thread in a two-level TLS structure:

- `static __thread ThreadLocalState *quickTlsEnv` (ELF linker-initialized, zero-cost read, no re-validation)
- `pthread_getspecific(gTlsKey)` (mutex-protected, only on first call per thread)

After the first `AttachCurrentThread` on a `CPUThreadPool` thread, libhdfs caches the `JNIEnv*` in `quickTlsEnv`. The fast path returns this pointer on all subsequent calls without checking validity.

Two Gluten destructors called `vm_->DetachCurrentThread()` unconditionally after JNI cleanup:

- `JniColumnarBatchIterator::~JniColumnarBatchIterator()` (cpp/core/jni/JniCommon.cc)
- `JavaInputStreamAdaptor::Close()` (cpp/core/jni/JniWrapper.cc)

Both used `attachCurrentThreadAsDaemonOrThrow()` followed by an unconditional `DetachCurrentThread()`. This was not a proper attach/detach pair: `attachCurrentThreadAsDaemonOrThrow` only attaches if the thread is not already attached, but `DetachCurrentThread` ran regardless, detaching threads that libhdfs had attached. This freed the JVM's `JavaThread` object while `quickTlsEnv` still held the stale pointer.

The crash sequence:

1. `CPUThreadPool21` runs a preload task → libhdfs calls `AttachCurrentThread`, caches `JNIEnv*` in `quickTlsEnv`
2. A Gluten destructor calls `DetachCurrentThread` → `JavaThread` freed, but `quickTlsEnv` still holds the stale pointer
3. `CPUThreadPool21` runs the next preload task → `hdfsGetPathInfo()` → libhdfs fast path returns the stale env → `jni_NewStringUTF(stale_env)` → SIGSEGV

`CPUThreadPool` threads live for the entire executor JVM lifetime (created once in `VeloxBackend::initConnector`, destroyed only in `VeloxBackend::tearDown`), so the stale pointer persists across all subsequent queries.

Confirmed by core dump analysis (`core.CPUThreadPool21.1770392` from TPC-DS on YARN):

- `RDI = 0x0` (NULL `JavaThread*`, set by `block_if_vm_exited`)
- `R12 = 0x7f3003a52200` (stale `JNIEnv*` from libhdfs TLS cache)

Fix
Remove `DetachCurrentThread()` from both destructors, with the following considerations:

- `attachCurrentThreadAsDaemonOrThrow` is conditional (no-op if already attached), but `DetachCurrentThread` was unconditional
- Daemon-attached threads (`AttachCurrentThreadAsDaemon`) do not block JVM shutdown
- `folly::ThreadLocal<HdfsFile>` destructors call `hdfsCloseFile()` (JNI) during thread exit; detaching mid-lifetime would crash these too
- libhdfs's `pthread_key` destructor (`hdfsThreadDestructor`) calls `DetachCurrentThread` when the thread actually exits, so proper cleanup still happens at thread exit when libhdfs is in use
- When no `pthread_key` destructor is registered, daemon-attached threads are safely reclaimed when the executor JVM exits

How was this patch tested?
- New unit test `JniThreadDetachTest.testIteratorDestructorDoesNotDetachThread`: on a native `std::thread` (simulating `CPUThreadPool`), it saves the `JNIEnv*` (simulating libhdfs's TLS cache), creates and destroys a real `JniColumnarBatchIterator`, then reuses the saved env for `FindClass` (simulating libhdfs's next `hdfsGetPathInfo`). With the bug: SIGSEGV crashes the JVM. With the fix: succeeds normally.
- Run via `mvn surefire:test -pl backends-velox`.
- The `CPUThreadPool` SIGSEGV no longer reproduces.

Was this patch authored or co-authored using generative AI tooling?
Co-authored-by: Claude Opus 4.6