Enable A365 tracing and fix W3C baggage propagation in agentserver#46754

Open
singankit wants to merge 42 commits into main from feature/enable-a365-tracing

@singankit
Contributor

Summary

Enable Agent365 (A365) tracing in the agentserver packages and fix W3C baggage propagation so that incoming baggage entries (e.g. user.id) are visible to span processors on all spans.

Changes

A365 Tracing Enablement (agentserver-core)

  • Gate A365 export behind FOUNDRY_AGENT365_TRACING_ENABLED env var
  • Add agent identity resolvers (agent_id, blueprint_id, tenant_id) from env vars
  • Enable a365_enable_observability_exporter and a365_observability_scope_override
  • Wire through _FoundryEnrichmentSpanProcessor with span attributes

Streaming Context Fix (responses)

  • Capture full OTel context (span + baggage) at wrap time in _wrap_streaming_response
  • Re-attach during async iteration so baggage is available after the handler's finally block

W3C Baggage Propagation Fix (responses + invocations)

  • Use W3CBaggagePropagator().extract() to extract only baggage from incoming headers
  • Merge extracted baggage onto get_current() before adding server entries
  • This preserves span parent-child relationships while capturing incoming baggage like user.id

Tests

  • 3 baggage propagation tests for responses package
  • 3 baggage propagation tests for invocations package
  • Coverage: baggage merging, span parenting preserved, empty header safety

Root Cause

start_as_current_span(context=extracted_ctx) inside request_span makes the span current, but baggage from the extracted context does not survive through the contextmanager yield boundary to get_current() as seen by the endpoint handler. Only entries explicitly added after get_current() survive. The fix extracts baggage separately at the endpoint handler level.

singankit and others added 7 commits May 4, 2026 13:27
Conditionally enable A365 observability export via microsoft-opentelemetry
distro when both FOUNDRY_HOSTING_ENVIRONMENT and
FOUNDRY_AGENT365_TRACING_ENABLED env vars are set. Uses S2S endpoint
for token resolution in hosted environments.
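
A minimal sketch of the gating condition, for illustration only. The helper name `a365_export_enabled` and the set of strings treated as "enabled" are assumptions; the commit message only states that both env vars must be set.

```python
import os
from typing import Mapping, Optional

# Assumption: which strings count as "enabled" is not stated in the PR.
_TRUTHY = {"1", "true", "yes", "on"}


def a365_export_enabled(env: Optional[Mapping[str, str]] = None) -> bool:
    """Hypothetical check: True only when both gating variables are present."""
    env = os.environ if env is None else env
    hosted = bool(env.get("FOUNDRY_HOSTING_ENVIRONMENT", "").strip())
    flag = env.get("FOUNDRY_AGENT365_TRACING_ENABLED", "").strip().lower()
    return hosted and flag in _TRUTHY
```

Accepting an explicit mapping keeps the check easy to exercise in tests without mutating `os.environ`.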

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…hment

- Add resolve_agent_id() with FOUNDRY_AGENT_INSTANCE_CLIENT_ID env var
  (falls back to name:version or name)
- Add resolve_agent_blueprint_id() with FOUNDRY_AGENT_BLUEPRINT_CLIENT_ID
- Add resolve_agent_tenant_id() with FOUNDRY_AGENT_TENANT_ID
- Wire all three through _FoundryEnrichmentSpanProcessor
- Make processor __init__ keyword-only
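
The fallback order for `resolve_agent_id()` might look like the sketch below. The `env` parameter and the `None` return when nothing is configured are assumptions made for the example; the env var names and the `name:version` / `name` fallback come from this PR.

```python
import os
from typing import Mapping, Optional


def resolve_agent_id(env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Sketch: prefer the instance client ID, then name:version, then name."""
    env = os.environ if env is None else env
    client_id = env.get("FOUNDRY_AGENT_INSTANCE_CLIENT_ID")
    if client_id:
        return client_id
    name = env.get("FOUNDRY_AGENT_NAME")
    if not name:
        return None  # assumption: no identity available
    version = env.get("FOUNDRY_AGENT_VERSION")
    return f"{name}:{version}" if version else name
```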

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…ator

The streaming async generator runs after the request handler's finally
block detaches baggage. Fix by capturing the full OTel context (including
baggage) at wrap time and re-attaching it during iteration, so child spans
created during streaming can see baggage entries like conversation_id.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Extract incoming baggage (e.g. user.id) using W3CBaggagePropagator
without re-extracting traceparent, preserving parent-child span
relationships while making caller's baggage entries visible to
downstream span processors.

Also removes stale flask/sqlalchemy imports from prior attempts.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…kages

- Apply same baggage extraction fix to invocations/_invocation.py
- Add 3 baggage propagation tests for invocations package
- Add 3 baggage propagation tests for responses package
- Tests verify: baggage merging, span parenting preserved, empty header safety

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@singankit singankit marked this pull request as ready for review May 6, 2026 14:51
@singankit singankit requested a review from ankitbko as a code owner May 6, 2026 14:51
Copilot AI review requested due to automatic review settings May 6, 2026 14:51
@github-actions github-actions Bot added the Hosted Agents sdk/agentserver/* label May 6, 2026
singankit and others added 16 commits May 6, 2026 09:50
Server-added entries (response_id) are set after span starts, so
on_start processor won't see them. Test should only verify incoming
baggage merging.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…ponse'

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…an start

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Thread enable_sensitive_data kwarg from AgentServerHost through
configure_observability -> _configure_tracing -> _setup_distro_export
-> use_microsoft_opentelemetry so Agent Framework SDK records prompts,
tool arguments, and results.

Defaults to True; set FOUNDRY_ENABLE_SENSITIVE_DATA=false to opt out.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Add _ATTR_FOUNDRY_AGENT_TYPE constant
- Set agent_type='hosted' when FOUNDRY_HOSTING_ENVIRONMENT is set
- Only write attribute on spans with gen_ai.operation.name == invoke_agent
- Add 3 tests for agent_type scoping behavior

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Replace request_span() with request_context() that extracts and attaches
incoming W3C trace context (traceparent/tracestate/baggage) without creating
a span. Framework spans created inside handlers are now parented directly
under the caller's span.

Changes:
- core/_tracing.py: Add request_context(), remove request_span()
- core/_base.py: Simplify AgentServerHost.request_context() wrapper
- invocations/_invocation.py: Remove span creation/attrs/end logic
- responses/_endpoint_handler.py: Same simplification
- Remove agent_type from enrichment processor (no invoke_agent span)
- Update all tests to validate context propagation without server span

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Replaces the weak status-code-only assertion with a test that creates a
span inside the handler and verifies trace ID and parent span ID match
the incoming traceparent header.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The request_context method was added in 2.0.0b4 (as part of the
invoke_agent span removal). Update invocations and responses packages
to require the correct minimum version.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Revert min dependency back to >=2.0.0b3 and add hasattr guards
so that invocations/responses gracefully degrade when running
against core 2.0.0b3 (which lacks request_context). This fixes
the mindependency CI check.
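
The `hasattr` guard could take a shape like the sketch below. The helper name `request_context_for` and the assumption that `request_context` accepts the request headers are hypothetical; only the idea of degrading gracefully against core 2.0.0b3 comes from the commit message.

```python
from contextlib import nullcontext
from typing import Any, ContextManager, Mapping


def request_context_for(host: Any, headers: Mapping[str, str]) -> ContextManager:
    """Use host.request_context when the installed core has it, else do nothing."""
    if hasattr(host, "request_context"):
        return host.request_context(headers)
    # Older core: no context propagation, but the request still succeeds.
    return nullcontext()
```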

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Creates a real OTel caller span, injects its trace context into
the request headers, creates a child span in the invocation handler,
and validates the handler span is correctly parented under the caller.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…dd baggage tests

- Add invocation_id baggage-to-span-attribute mapping in _FoundryEnrichmentSpanProcessor.on_start
- Add core tests for invocation_id enrichment (from baggage, no baggage, child propagation)
- Add invocations test verifying SDK-set baggage (invocation_id, session_id) available in handler
- Add responses test verifying SDK-set baggage (response_id, conversation_id, streaming) available in handler
- Add invocations integration test verifying baggage entries stamped as span attributes via enricher

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…tate

In CI environments where microsoft-opentelemetry distro is installed and
APPLICATIONINSIGHTS_CONNECTION_STRING is set, non-tracing tests would
trigger use_microsoft_opentelemetry() on the first server construction,
installing a global TracerProvider that breaks traceparent-propagation
tests.

Fix:
- Add session-scoped _prevent_distro_setup fixture in both invocations
  and responses conftest.py that mocks _setup_distro_export for all tests
- Pass configure_observability=None in conftest factory functions
- Pass configure_observability=None in test_tracing_disabled_by_default
  and test_no_tracing_when_no_endpoints

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Replace synthetic traceparent string with real OTel span + inject()
pattern. This ensures correct trace context propagation regardless of
which TracerProvider or auto-instrumentation (e.g. microsoft-opentelemetry)
is active in the CI environment.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@@ -24,6 +24,9 @@

_ENV_FOUNDRY_AGENT_NAME = "FOUNDRY_AGENT_NAME"
_ENV_FOUNDRY_AGENT_VERSION = "FOUNDRY_AGENT_VERSION"
_ENV_FOUNDRY_AGENT_INSTANCE_CLIENT_ID = "FOUNDRY_AGENT_INSTANCE_CLIENT_ID"
@RaviPidaparthi RaviPidaparthi left a comment

Add change logs to all packages

singankit and others added 19 commits May 14, 2026 15:48
…sensitive data

Replace FOUNDRY_ENABLE_SENSITIVE_DATA with the standard OpenTelemetry
GenAI semantic convention env var for controlling sensitive data capture.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…ability

Replace synthetic traceparent strings with real OTel span + inject()
pattern in both streaming and non-streaming span parenting tests.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Rewrite the enrichment processor test to run in isolation without
TestClient/ASGI, avoiding CI-specific context propagation differences.
The full baggage flow through the invocations server is already covered
by test_sdk_set_baggage_available_in_handler.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Resolve conflicts in _invocation.py: keep error classification from
main but remove span-attribute calls (otel_span, _safe_set_attrs,
end_span, record_error) since this branch does not create invoke spans.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- core/_base.py: break long line for env var read
- invocations/_invocation.py: remove unused StreamingResponse import
- responses/_endpoint_handler.py: remove unused RequestValidationError and
  build_create_otel_attrs imports, break long context manager line

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Pass headers directly to propagator instead of using _extract_w3c_carrier
- Remove _extract_w3c_carrier helper and _W3C_HEADERS constant
- Add debug log for attached span context (type, trace_id, trace_flags)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…n_from_context()

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…tors

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Replaces the manual span with proper Starlette instrumentation
that creates a SERVER span per request with trace context propagation.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…instrumentor

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Patch OpenTelemetryMiddleware.__init__ to set exclude_receive_span and
exclude_send_span to True, suppressing the per-event INTERNAL spans
(http receive / http send) that the Starlette OTel instrumentor creates.

The upstream ASGI middleware already supports these attributes via its
exclude_spans constructor parameter, but the Starlette instrumentor does
not expose them yet (tracked in opentelemetry-python-contrib#3725).
The monkeypatch can be removed once upstream adds constructor support.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Temporarily remove BaggageMiddleware from the middleware stack to test
whether its context.attach() call is causing the NonRecordingSpan crash
in azure-ai-projects _responses_instrumentor.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Remove the Starlette OTel instrumentor (which created noisy SERVER and
ASGI internal spans) and replace with a lightweight TraceContextMiddleware
that only extracts W3C traceparent/tracestate/baggage from incoming
requests. This ensures downstream spans (from MAF/agent-framework) are
children of the caller's trace without creating extra spans.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
TraceContextTextMapPropagator was not importable from
opentelemetry.trace.propagation. Use the global propagate.extract()
instead which handles both TraceContext and Baggage propagation.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
No longer needed since we replaced StarletteInstrumentor with our own
lightweight TraceContextMiddleware. Fixes CI 'Analyze dependencies'
failure (dependency not in shared_requirements.txt).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Labels

Hosted Agents sdk/agentserver/*

3 participants