feat(sdk): add config driven sink selection #176

namrataghadi-galileo wants to merge 6 commits into feature/62793-add-external-sink from
Conversation
Codecov Report: ❌ Patch coverage is
lan17
left a comment
I like the direction of moving sink selection into a shared model, but I don't think this middle layer is ready to merge yet. The biggest issue is that the config surface is now shared between SDK and server, but the semantics still diverge. I also don't think the SDK should take ownership of caller-registered sinks on shutdown, and returning the best fanout result makes partial delivery failures too easy to miss.
```python
if default_backend is None:
    raise RuntimeError("Default server observability backend was not provided")
return default_backend
if selection.name == REGISTERED_CONTROL_EVENT_SINK_NAME:
```
This makes registered a legal SDK value but a server startup error even though both now read the same AGENT_CONTROL_OBSERVABILITY_* surface. In any shared-env deployment, flipping the SDK to registered can take the server down. I think we need either aligned semantics here or separate config names.
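If we keep the shared config surface, one option is to make the divergence explicit at config-parse time rather than at server startup. A rough sketch of what I mean — only `REGISTERED_CONTROL_EVENT_SINK_NAME` comes from this diff; the separate env prefix and the `resolve_sink_name` helper are illustrative, not this PR's API:

```python
# Sketch: give the server its own env prefix so an SDK-side "registered"
# selection cannot crash server startup. The SERVER_CONTROL prefix and
# resolve_sink_name() are hypothetical names for illustration.
import os

REGISTERED_CONTROL_EVENT_SINK_NAME = "registered"

def resolve_sink_name(role: str) -> str:
    # SDK keeps reading AGENT_CONTROL_OBSERVABILITY_*; the server would
    # read its own SERVER_CONTROL_OBSERVABILITY_* variables instead.
    prefix = "AGENT_CONTROL" if role == "sdk" else "SERVER_CONTROL"
    name = os.environ.get(f"{prefix}_OBSERVABILITY_SINK_NAME", "default")
    if role == "server" and name == REGISTERED_CONTROL_EVENT_SINK_NAME:
        # Fail fast with an explicit message instead of an opaque error.
        raise ValueError(
            "'registered' is an SDK-only sink selection; "
            "the server supports 'default' or a named custom sink"
        )
    return name
```

Either split names like this, or align the semantics so `registered` means something sensible on both sides.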
```python
_batcher = None
_event_sink = None
global _configured_named_event_sink, _configured_named_event_sink_selection
custom_sinks = _get_custom_control_event_sinks_to_shutdown()
```
I'm uneasy with the SDK taking ownership of caller-registered sinks here. register_control_event_sink() accepts user-owned instances, but shutdown now flushes, closes, and clears them, which means shutdown() can tear down resources the SDK didn't create and a later re-init silently stops sending unless the caller re-registers everything.
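One way to fix this is to record ownership at registration time and have shutdown only tear down sinks the SDK itself constructed. A minimal sketch of the idea — the `SinkRegistry` class, the `sdk_owned` flag, and the `close()` method are my illustrative names, not this PR's actual API:

```python
# Sketch: track which sinks the SDK created so shutdown() leaves
# caller-registered instances untouched. All names are illustrative.
class SinkRegistry:
    def __init__(self):
        self._sinks = []  # list of (sink, sdk_owned) pairs

    def register(self, sink, *, sdk_owned=False):
        # Caller-registered sinks default to caller-owned.
        self._sinks.append((sink, sdk_owned))

    def shutdown(self):
        remaining = []
        for sink, sdk_owned in self._sinks:
            if sdk_owned:
                sink.close()  # SDK created it, so the SDK closes it
            else:
                # Keep the registration so a later re-init still sends here.
                remaining.append((sink, False))
        self._sinks = remaining
```

That also fixes the re-init problem: registrations survive shutdown, so callers don't have to re-register everything.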
```python
primary_result = result

if primary_result is None:
    if best_result is None or (
```
Picking the "best" sink result here makes partial fanout failures invisible. If one configured destination drops the batch and another accepts it, callers still see success. For a multi-sink path I'd rather not hide that kind of split-brain delivery.
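I'd rather return an aggregate that preserves per-sink outcomes instead of collapsing the fanout to a single "best" result. A sketch of the shape I have in mind — `FanoutResult` and `fan_out` are illustrative names, and I'm assuming each sink exposes something like a `send()` that raises on failure:

```python
# Sketch: keep every sink's outcome so partial delivery failures surface.
from dataclasses import dataclass, field

@dataclass
class FanoutResult:
    successes: list = field(default_factory=list)
    failures: list = field(default_factory=list)  # (sink_name, error) pairs

    @property
    def ok(self) -> bool:
        # Only fully successful delivery counts as success.
        return not self.failures

def fan_out(batch, sinks) -> FanoutResult:
    result = FanoutResult()
    for name, sink in sinks.items():
        try:
            sink.send(batch)
            result.successes.append(name)
        except Exception as exc:  # collect per-sink errors, don't swallow them
            result.failures.append((name, exc))
    return result
```

Callers that only care about the primary sink can still check it by name, but the split-brain case stays visible.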
Summary

- Added config-driven observability sink selection across the SDK and server.
- Introduced shared telemetry sink-selection primitives so observability can use the default backend, registered SDK sinks, or named custom sink factories.
- Expanded tests and README examples to cover the new sink-selection behavior and lifecycle handling.
- Fixed the SDK mypy issue in shutdown handling by wrapping sink lifecycle awaitables in a concrete coroutine before passing them to asyncio.run().

Scope
User-facing/API changes:

- agent_control.init() now accepts observability_sink_name and observability_sink_config

Internal changes:

- telemetry/

Out of scope:
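To illustrate the new parameters, here is a self-contained sketch of how init() might resolve them; agent_control's real signature and factory registry are not shown in this PR excerpt, so the `_SINK_FACTORIES` dict and the validation behavior below are illustrative assumptions, not the actual implementation:

```python
# Sketch: resolve the new observability sink parameters against a factory
# registry. Names and behavior are illustrative, not the PR's actual code.
_SINK_FACTORIES = {
    "default": lambda config: ("default-sink", config),
}

def init(observability_sink_name="default", observability_sink_config=None):
    factory = _SINK_FACTORIES.get(observability_sink_name)
    if factory is None:
        # Unknown names fail loudly instead of silently falling back.
        raise ValueError(f"Unknown observability sink: {observability_sink_name!r}")
    return factory(observability_sink_config or {})
```

Callers staying on the legacy path simply omit both parameters and get the default backend.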
Risk and Rollout

Risk level: medium

Rollback plan: Revert this branch to restore the previous default-only observability path. As a partial mitigation, deployments can keep observability_sink_name=default to stay on the legacy behavior even with this code merged.

Testing

- make check — could not run in this environment because uv was not available in the shell

Checklist
Follow-up tasks:

- make check in a full local/CI environment with uv available