feat(sdk): add otel support#177
feat(sdk): add otel support#177namrataghadi-galileo wants to merge 2 commits intofeature/61576-add-config-driven-sink-selectionfrom
Conversation
Codecov Report❌ Patch coverage is
📢 Thoughts on this report? Let us know! |
lan17
left a comment
There was a problem hiding this comment.
I like having a built-in OTEL path, but I don't think this version is safe to merge yet. Right now the SDK can tell callers observability is working while the OTEL sink is effectively inert, and the caching logic doesn't compose cleanly with the new otel_* settings. I'd also mark failed control executions as OTEL errors rather than only attaching an exception object.
| """Sink that accepts events but intentionally emits nothing.""" | ||
|
|
||
| def write_events(self, events: Sequence[ControlExecutionEvent]) -> SinkResult: | ||
| return SinkResult(accepted=len(events), dropped=0) |
There was a problem hiding this comment.
This is acknowledging delivery even though nothing is emitted. That means add_event() returns True and is_observability_enabled() stays truthy while observability is effectively off. I think this should fail closed like the missing named-sink path in the previous PR, not silently ack drops.
|
|
||
| def _resolve_otel_sink_config(config: JSONObject) -> OTELSinkConfig: | ||
| """Resolve OTEL sink config from settings with per-sink overrides.""" | ||
| settings = get_settings() |
There was a problem hiding this comment.
This factory is reading hidden global otel_* settings, but the named-sink cache still keys only on (name, config). After the first resolution, changing otel_endpoint, headers, service name, or the enabled flag with the same sink selection won't rebuild the provider. Either the fully resolved OTEL config needs to live in observability_sink_config, or the cache key needs to include these settings.
| start_time=span_data.start_time_unix_nano, | ||
| ) | ||
| span.set_attributes(span_data.attributes) | ||
| if span_data.error_message: |
There was a problem hiding this comment.
Once we're emitting spans for failed control executions, I think we should set OTEL span status to ERROR here rather than only recording an exception. A lot of backends key alerting and red/error views off status, not exception payloads alone.
Summary
Added configurable observability sink selection for the Python SDK and server so control events can be routed to the default backend, registered custom sinks, or named sink factories.
Added a built-in OpenTelemetry sink for the SDK, including OTLP configuration via settings/environment variables.
Updated docs and exports so custom sink registration and OTEL usage are available as public integration points.
Scope
User-facing/API changes:
observability_sink_nameandobservability_sink_configto SDK initialization/configurationotelextra and OTEL-related environment variablesInternal changes:
telemetry/Out of scope:
Risk and Rollout
Risk level: medium
Rollback plan: Revert the sink-selection/OTEL changeset to restore the previous default SDK-to-server observability path only. If needed, disable custom routing by keeping
observability_sink_name=defaultand not installing/configuring theotelextra.Testing
make check— typecheck was attempted but full localmake checkwas not run becauseuvwas unavailable in this environmentChecklist