RDKB-64644: Fix SE051 ENGINE memory leak in telemetry HTTP pool#358
RDKB-64644: Fix SE051 ENGINE memory leak in telemetry HTTP pool#358tabbas651 wants to merge 4 commits intosupport/1.8from
Conversation
On HROT platforms using the SE051 secure element (XB10/XER10/SXB10), the e4sss OpenSSL ENGINE accumulates per-session hardware state (APDU session objects, secure channel buffers) across mTLS operations. Unlike SE050 (XB8), the SE051 ENGINE allocates larger per-session state that is not released by curl's connection cache management, OPENSSL_thread_stop, or ERR_clear_error() alone. This causes a progressive memory leak (~5MB baseline increase + ~5MB growth over 10+ days) in the telemetry process. Fix: - Add release_hrot_engine_state() helper that calls ENGINE_finish() on the e4sss ENGINE to release its functional reference and close the hardware session. A fresh ENGINE_init() occurs automatically on the next TLS handshake via the openssl.cnf auto-init mechanism. - Call release_hrot_engine_state() on the GET success path and POST completion path to ensure hardware session state is freed after each HTTP operation. - Add ERR_clear_error() in both GET and POST xPKI retry loops to drain the OpenSSL error queue between retries, preventing ENGINE-internal error state accumulation. The fix is a no-op on non-HROT platforms (ENGINE_by_id returns NULL).
There was a problem hiding this comment.
Pull request overview
This PR addresses a long-running memory leak on HROT platforms (SE051/e4sss OpenSSL ENGINE) by explicitly releasing per-session engine state after telemetry HTTP mTLS operations and draining OpenSSL error state between xPKI retries.
Changes:
- Adds an
release_hrot_engine_state()helper that callsENGINE_finish()/ENGINE_free()for thee4sssengine. - Invokes engine-state release after GET and POST requests.
- Adds
ERR_clear_error()inside GET/POST xPKI retry loops to drain the OpenSSL error queue between retries.
| if (e != NULL) | ||
| { | ||
| // ENGINE_finish() releases the functional reference obtained by | ||
| // ENGINE_init(), which triggers the engine's finish callback to | ||
| // close the hardware session (SE051 APDU channel). | ||
| ENGINE_finish(e); | ||
| // ENGINE_free() releases the structural reference from ENGINE_by_id() | ||
| ENGINE_free(e); | ||
| } |
There was a problem hiding this comment.
release_hrot_engine_state() calls ENGINE_by_id("e4sss"); when the engine is not present (non-HROT platforms), OpenSSL APIs typically push a failure onto the thread-local error queue. Since this helper is called after ERR_clear_error()/OPENSSL_thread_stop in some paths, it can re-populate the error queue and defeat the intent of draining it. Consider clearing the OpenSSL error queue inside this helper when ENGINE_by_id returns NULL (and/or after a failed ENGINE_finish) so it is a true no-op on platforms without the engine.
| if (e != NULL) | |
| { | |
| // ENGINE_finish() releases the functional reference obtained by | |
| // ENGINE_init(), which triggers the engine's finish callback to | |
| // close the hardware session (SE051 APDU channel). | |
| ENGINE_finish(e); | |
| // ENGINE_free() releases the structural reference from ENGINE_by_id() | |
| ENGINE_free(e); | |
| } | |
| if (e == NULL) | |
| { | |
| // On non-HROT platforms, ENGINE_by_id() may push a lookup failure | |
| // onto the thread-local OpenSSL error queue. Clear it so this helper | |
| // remains a true no-op for callers that have already drained errors. | |
| ERR_clear_error(); | |
| return; | |
| } | |
| // ENGINE_finish() releases the functional reference obtained by | |
| // ENGINE_init(), which triggers the engine's finish callback to | |
| // close the hardware session (SE051 APDU channel). | |
| if (ENGINE_finish(e) != 1) | |
| { | |
| // Cleanup should not leave stale OpenSSL errors behind. | |
| ERR_clear_error(); | |
| } | |
| // ENGINE_free() releases the structural reference from ENGINE_by_id() | |
| ENGINE_free(e); |
There was a problem hiding this comment.
@copilot please apply the recommended changes
There was a problem hiding this comment.
Applied in commit a04be56. release_hrot_engine_state() now calls ERR_clear_error() when ENGINE_by_id returns NULL (ensuring it's a true no-op on non-HROT platforms) and also when ENGINE_finish() returns failure (so stale OpenSSL errors aren't left behind).
| // Release HROT engine hardware session state. On SE051 platforms the | ||
| // e4sss ENGINE accumulates APDU session objects across TLS operations; | ||
| // this explicitly closes the hardware session after each request. | ||
| release_hrot_engine_state(); | ||
|
|
||
| release_pool_handle(idx); | ||
|
|
There was a problem hiding this comment.
release_hrot_engine_state() is called while this request is still counted as active (release_pool_handle() happens after), and while other HTTP operations may be in-flight (pool_size > 1). Since ENGINE_finish() affects process-global engine state, this can race with other concurrent curl/OpenSSL operations and potentially break ongoing mTLS handshakes. Suggest deferring engine release until after release_pool_handle(), and only performing it when active_requests drops to 0 under pool_mutex (or protect engine release with a dedicated global lock/refcount).
| // Release HROT engine hardware session state. On SE051 platforms the | |
| // e4sss ENGINE accumulates APDU session objects across TLS operations; | |
| // this explicitly closes the hardware session after each request. | |
| release_hrot_engine_state(); | |
| release_pool_handle(idx); | |
| release_pool_handle(idx); | |
| // Release HROT engine hardware session state only after this request is no | |
| // longer counted as active, and only when no other pooled HTTP operations | |
| // are still in flight. The ENGINE state is process-global, so this check | |
| // must be serialized with the active request count. | |
| pthread_mutex_lock(&pool_mutex); | |
| if (active_requests == 0) | |
| { | |
| // On SE051 platforms the e4sss ENGINE accumulates APDU session objects | |
| // across TLS operations; explicitly close the hardware session only | |
| // when the pool has no active users. | |
| release_hrot_engine_state(); | |
| } | |
| pthread_mutex_unlock(&pool_mutex); |
| // Release HROT engine hardware session state after POST operation. | ||
| release_hrot_engine_state(); |
There was a problem hiding this comment.
release_hrot_engine_state() is invoked before release_pool_handle(), so the request is still considered active and other requests may still be running (pool_size allows concurrency). Because ENGINE_finish() updates global engine state, this should be synchronized with other HTTP operations to avoid closing the engine/session while another thread is using it. Consider releasing the pool handle first, then (under pool_mutex) only calling release_hrot_engine_state() when active_requests reaches 0, or add a dedicated engine mutex.
| if(curl_code != CURLE_OK || http_code != 200) | ||
| { | ||
| T2Error("%s: Failed to establish connection using xPKI certificate: %s, Curl failed : %d\n", __func__, pCertFile, curl_code); | ||
| // Drain OpenSSL error queue between retries to prevent |
There was a problem hiding this comment.
Indentation uses a tab before the comment here (misaligned with surrounding spaces). Please align to the file’s indentation style to keep diffs clean and readable.
| // Drain OpenSSL error queue between retries to prevent | |
| // Drain OpenSSL error queue between retries to prevent |
| if(curl_code != CURLE_OK || http_code != 200) | ||
| { | ||
| T2Error("%s: Failed to establish connection using xPKI certificate: %s, curl failed: %d (entry %d)\n", __func__, pCertFile, curl_code, idx); | ||
| // Drain OpenSSL error queue between retries to prevent |
There was a problem hiding this comment.
Indentation uses a tab before the comment here (misaligned with surrounding spaces). Please align to the file’s indentation style to keep diffs clean and readable.
| // Drain OpenSSL error queue between retries to prevent | |
| // Drain OpenSSL error queue between retries to prevent |
| static void release_hrot_engine_state(void) | ||
| { | ||
| ENGINE *e = ENGINE_by_id("e4sss"); |
There was a problem hiding this comment.
New behavior introduced here (ENGINE_by_id/ENGINE_finish/ENGINE_free) isn’t covered by the existing MultiCurlInterface unit tests. Consider adding tests that mock the OpenSSL ENGINE APIs (similar to the existing curl_* mocks) so we can verify the engine-state release path (including the no-engine case and any synchronization/gating that’s needed).
|
@copilot apply changes based on the comments in this thread |
…ailed ENGINE_finish Agent-Logs-Url: https://github.com/rdkcentral/telemetry/sessions/5211fc56-fa4f-495d-8d93-a99f79d82477 Co-authored-by: tabbas651 <74683978+tabbas651@users.noreply.github.com>
…icurlinterface.c Agent-Logs-Url: https://github.com/rdkcentral/telemetry/sessions/e62a6030-a368-4f32-b8e2-762f1f6f32f3 Co-authored-by: tabbas651 <74683978+tabbas651@users.noreply.github.com>
Applied all outstanding review feedback in commit
|
|
@copilot we have L2 Integration Tests got failed |
…ngine_state Agent-Logs-Url: https://github.com/rdkcentral/telemetry/sessions/b70848a4-0231-4578-82d1-8bdf80fa2ae2 Co-authored-by: tabbas651 <74683978+tabbas651@users.noreply.github.com>
Fixed in commit |
On HROT platforms using the SE051 secure element (XB10/XER10/SXB10), the e4sss OpenSSL ENGINE accumulates per-session hardware state (APDU session objects, secure channel buffers) across mTLS operations. Unlike SE050 (XB8), the SE051 ENGINE allocates larger per-session state that is not released by curl's connection cache management, OPENSSL_thread_stop, or ERR_clear_error() alone.
This causes a progressive memory leak (~5MB baseline increase + ~5MB growth over 10+ days) in the telemetry process.
Fix:
The fix is a no-op on non-HROT platforms (ENGINE_by_id returns NULL).