
feat(sdk): add config driven sink selection #176

Open
namrataghadi-galileo wants to merge 6 commits into feature/62793-add-external-sink from feature/61576-add-config-driven-sink-selection

Conversation

@namrataghadi-galileo
Contributor

Summary

Added config-driven observability sink selection across the SDK and server.

Introduced shared telemetry sink-selection primitives so observability can use the default backend, registered SDK sinks, or named custom sink factories.

Expanded tests and README examples to cover the new sink-selection behavior and lifecycle handling.

Fixed the SDK mypy issue in shutdown handling by wrapping sink lifecycle awaitables in a concrete coroutine before passing them to asyncio.run().
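The wrapping described above might look roughly like the following sketch. The helper names are illustrative, not the SDK's actual internals; the point is that `asyncio.run()` is typed (and, at runtime, checked) to accept a coroutine, while sink lifecycle hooks may return any awaitable.

```python
import asyncio
from collections.abc import Awaitable


async def _run_awaitable(aw: Awaitable[None]) -> None:
    # Awaiting inside a real coroutine satisfies both the runtime check
    # in asyncio.run() and mypy's Coroutine parameter type.
    await aw


def shutdown_sink(aw: Awaitable[None]) -> None:
    # Wrap the lifecycle awaitable in a concrete coroutine before
    # handing it to asyncio.run(), which rejects bare Awaitables.
    asyncio.run(_run_awaitable(aw))
```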

Scope

User-facing/API changes:

  • SDK agent_control.init() now accepts observability_sink_name and observability_sink_config
  • SDK exports new sink registration helpers for external/custom observability sinks
  • Server observability config now supports sink selection and sink config via environment/settings
  • README includes a new example for registering and selecting an external sink
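The selection modes named in the summary (default backend, registered SDK sinks, named custom factories) can be illustrated with a small registry sketch. All names below are hypothetical stand-ins, not the models this PR adds under telemetry/:

```python
# Conceptual sketch of config-driven sink selection: resolve a sink by
# name, falling back to the built-in default or a registered factory.
# Every name here is illustrative; the real API may differ.
from typing import Any, Callable


class DefaultBatcher:
    """Stand-in for the built-in batching sink."""


_sink_factories: dict[str, Callable[[dict[str, Any]], Any]] = {}


def register_sink_factory(name: str, factory: Callable[[dict[str, Any]], Any]) -> None:
    # Named custom factories are one of the three selection modes.
    _sink_factories[name] = factory


def resolve_sink(name: str, config: dict[str, Any]) -> Any:
    if name == "default":
        return DefaultBatcher()
    if name in _sink_factories:
        return _sink_factories[name](config)
    raise ValueError(f"unknown observability sink: {name!r}")
```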

Internal changes:

  • Added shared sink-selection models/registry in telemetry/
  • Refactored SDK observability to resolve active sinks from config rather than always using the built-in batcher
  • Refactored server startup to resolve an observability backend from sink selection
  • Added server/SDK/telemetry tests for sink selection, lifecycle, and event delivery paths
  • Adjusted SDK shutdown typing to satisfy mypy

Out of scope:

  • No changes to core control evaluation logic
  • No new built-in third-party sink implementation is included in this branch
  • No UI work or rollout automation is included

Risk and Rollout

Risk level: medium

Rollback plan: Revert this branch to restore the previous default-only observability path. As a partial mitigation, deployments can keep observability_sink_name=default to stay on the legacy behavior even with this code merged.

Testing

  • Added or updated automated tests
  • make check: not run in this environment because uv was unavailable in the shell
  • Manually verified behavior

Checklist

  • Linked issue/spec (if applicable)
  • Updated docs/examples for user-facing changes
  • Included any required follow-up tasks

Follow-up tasks:

  • Run make check in a full local/CI environment with uv available
  • Confirm any downstream custom sink integrations against the new SDK/server registration APIs

@codecov

codecov bot commented Apr 16, 2026

@namrataghadi-galileo namrataghadi-galileo changed the base branch from main to feature/62793-add-external-sink April 16, 2026 22:19
@lan17 (Contributor) left a comment


I like the direction of moving sink selection into a shared model, but I don't think this middle layer is ready to merge yet. The biggest issue is that the config surface is now shared between SDK and server, but the semantics still diverge. I also don't think the SDK should take ownership of caller-registered sinks on shutdown, and returning the best fanout result makes partial delivery failures too easy to miss.

if default_backend is None:
raise RuntimeError("Default server observability backend was not provided")
return default_backend
if selection.name == REGISTERED_CONTROL_EVENT_SINK_NAME:

This makes registered a legal SDK value but a server startup error even though both now read the same AGENT_CONTROL_OBSERVABILITY_* surface. In any shared-env deployment, flipping the SDK to registered can take the server down. I think we need either aligned semantics here or separate config names.

_batcher = None
_event_sink = None
global _configured_named_event_sink, _configured_named_event_sink_selection
custom_sinks = _get_custom_control_event_sinks_to_shutdown()

I'm uneasy with the SDK taking ownership of caller-registered sinks here. register_control_event_sink() accepts user-owned instances, but shutdown now flushes, closes, and clears them, which means shutdown() can tear down resources the SDK didn't create and a later re-init silently stops sending unless the caller re-registers everything.

primary_result = result

if primary_result is None:
if best_result is None or (

Picking the "best" sink result here makes partial fanout failures invisible. If one configured destination drops the batch and another accepts it, callers still see success. For a multi-sink path I'd rather not hide that kind of split-brain delivery.
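One way to avoid hiding the split, sketched with hypothetical names: report per-sink outcomes instead of collapsing fanout to a single best result, so callers can see that one destination dropped the batch even when another accepted it.

```python
# Illustrative fanout that surfaces partial delivery failures rather
# than returning only the best result. All names are hypothetical.
from dataclasses import dataclass, field
from typing import Any


@dataclass
class FanoutResult:
    succeeded: list[str] = field(default_factory=list)
    failed: dict[str, str] = field(default_factory=dict)  # sink -> error

    @property
    def fully_delivered(self) -> bool:
        return not self.failed


def fan_out(batch: Any, sinks: dict[str, Any]) -> FanoutResult:
    result = FanoutResult()
    for name, sink in sinks.items():
        try:
            sink.send(batch)
            result.succeeded.append(name)
        except Exception as exc:  # report the failure, don't mask it
            result.failed[name] = str(exc)
    return result
```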
