TURN refresh hardening + ChannelBind/CreatePermission timer fix#16
Merged
Conversation
Copilot created this pull request from a session on behalf of
havardgraff
April 29, 2026 11:26
View session
Agent-Logs-Url: https://github.com/pexip/libnice/sessions/42c2bc16-14cd-4459-b532-abac75868dc7 Co-authored-by: havardgraff <1926313+havardgraff@users.noreply.github.com>
… NONCE Agent-Logs-Url: https://github.com/pexip/libnice/sessions/55561bb3-7274-4462-a993-de47f9f6af9d Co-authored-by: havardgraff <1926313+havardgraff@users.noreply.github.com>
Agent-Logs-Url: https://github.com/pexip/libnice/sessions/3986912d-aee1-4c33-a276-e5d1ad6e857c Co-authored-by: havardgraff <1926313+havardgraff@users.noreply.github.com>
…of tearing down Agent-Logs-Url: https://github.com/pexip/libnice/sessions/7c12d630-5e66-4254-9815-9c13b83eb81a Co-authored-by: havardgraff <1926313+havardgraff@users.noreply.github.com>
…elay cand is dropped Agent-Logs-Url: https://github.com/pexip/libnice/sessions/a396f8ca-365c-48ad-b673-1a8c88d66038 Co-authored-by: havardgraff <1926313+havardgraff@users.noreply.github.com>
Agent-Logs-Url: https://github.com/pexip/libnice/sessions/a396f8ca-365c-48ad-b673-1a8c88d66038 Co-authored-by: havardgraff <1926313+havardgraff@users.noreply.github.com>
Contributor
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 14 out of 14 changed files in this pull request and generated 2 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
TL;DR
This PR fixes a long-standing class of TURN allocation bugs in libnice that caused relay candidates to silently stop refreshing — eventually killing media flow on long calls — and a related shutdown crash. The root causes were:
GMainContext— they ran but could not be cancelled, leaking sources and crashing on socket close withSource ID … not found.The PR is split into logically-grouped commits, ships three new regression tests, and adds descriptive
GST_INFO/GST_WARNINGlogs around every suspicious branch so future failures are diagnosable from a single log.Why managers should care
agent/conncheck.candsocket/turn.c. No public API changes, no behaviour change on the happy path.Why developers should care — root causes & fixes
1.
GMainContextmix-up on TURN socket timers (socket/turn.c)agent->main_context, but the cleanup path tried to remove them frompriv->ctx(the socket's own context).g_source_remove_by_user_data()therefore returned NULL →GLib-CRITICAL **: Source ID … not foundand on shutdown a crash.priv->ctxconsistently. Lowered channel-bind refresh to 240 s (was 540) to stay safely below the 600 s server lifetime.priv_source_remove_with_contextfinds no source — flushes out any remaining mismatches in CI.b5bb7f2,dcc6af9,1b829d7,a099a24.2. Stale-NONCE handling on allocation Refresh (
agent/conncheck.c,socket/turn.c)The server rotates
NONCEperiodically (RFC 5389 §10.2). Our handling was broken in four ways; each commit fixes one:84b2e48)995c318)cdisco->stun_resp_msgat allocation time. (e242069)14f589d→5313ce6for the cap,3c82a14for tear-down, then07f0549superseding it with the safer re-arm.)3. Refresh-interval scheduling & test ergonomics (
agent/conncheck.c)NICE_TURN_EXPIRE_TIMEOUTtest-only env knob, sibling to the existingNICE_TURN_BINDING_TIMEOUT/NICE_TURN_PERMISSION_TIMEOUTknobs insocket/turn.c. Lets unit tests drive the Refresh path in seconds instead of minutes. NOT public API. (6415153)g_once_init_enter— the cache caused the knob to be silently ignored if any earlier code path in the same process evaluated the helper beforeg_setenv()ran. (efc1453)GST_WARNINGwhen allocation succeeds butdiscovery_add_relay_candidate()returns NULL — that's the silent branch wherepriv_add_new_turn_refresh()is skipped and no Refresh is ever scheduled. (efc1453)4. Idempotent allocation release (
agent/conncheck.c)RefreshwithLIFETIME=0) was being sent on every refresh tick after teardown. Now sent exactly once. (228dd34)5. Misc correctness drive-bys
53c7298,f46d0ac,f01167b)socket/tcp-bsd.c: parenthesisation fix.agent/agent.c: format-string argument fix.agent/conncheck.c: ms overflow fix in scheduling math. (b12e38b)Diagnostics added throughout
Every fix above is paired with a
GST_INFOorGST_WARNINGat the decision point (allocation succeeds, refresh tick fires, stale-nonce counter increments, source removal finds nothing, relay candidate is dropped, etc.). The goal is that the next time a similar issue appears in the field, the log alone is enough to identify which of the A/B/C variants is occurring without needing a packet capture.Tests added (
tests/)test-mainctx-cross-thread.c— regression test for the GMainContext mix-up bug. Runs the agent on a non-default main context and asserts that socket close completes withoutSource ID not foundwarnings.test-turn-refresh-interval.c— invariant tests forpriv_turn_lifetime_to_refresh_interval(): never returns 0, always strictly less than the granted lifetime, monotonic in lifetime, honours the env override.test-turn-timeout-env.c— exercisesNICE_TURN_BINDING_TIMEOUT,NICE_TURN_PERMISSION_TIMEOUT, andNICE_TURN_EXPIRE_TIMEOUTknobs end-to-end, including the new "re-readable" semantics.tests/meson.build. (Autotools wiring intentionally not added — the tree's autotools build requires Python 2 which is end-of-life; meson is the supported build path.)Files changed
agent/conncheck.c(+450/-90)agent/discovery.{c,h}agent/agent.c,agent/agent-priv.h,agent/conncheck.hsocket/turn.{c,h}(+250/-25)socket/tcp-bsd.ctests/test-mainctx-cross-thread.c(new)tests/test-turn-refresh-interval.c(new)tests/test-turn-timeout-env.c(new)tests/meson.build,meson.buildChecklist
LIFETIME=0release exactly onceNICE_TURN_EXPIRE_TIMEOUTfor ALLOCATE refresh schedulingNICE_TURN_EXPIRE_TIMEOUTre-readable per refresh-interval computationtests/meson.build