feat: respond to OpAMP RequestInstanceUid flag with agent_identification for newly enrolled agents #6834

Open
michel-laterman wants to merge 11 commits into elastic:main from michel-laterman:opamp-request-instance-uid

Conversation

@michel-laterman michel-laterman commented Apr 14, 2026

What is the problem this PR solves?

Fleet-server ignores the AgentToServer.flags.RequestInstanceUid flag and does not respond with ServerToAgent.agent_identification, which is required by the OpAMP spec.

How does this PR solve the problem?

When an agent sets the RequestInstanceUid flag, fleet-server treats the message as a new enrollment regardless of whether an agent with the incoming instance_uid already exists:

  1. handleMessage checks the flag early and, when set, skips the findEnrolledAgent lookup entirely.
  2. enrollAgent generates a fresh UUID v7 and uses it as the agent's ID; the new agent document is stored in .fleet-agents under that UID.
  3. The response includes AgentIdentification.NewInstanceUid with the generated UID. The agent is expected to adopt the new UID for subsequent messages per the spec.
  4. The response's InstanceUid still echoes the incoming instance_uid per the spec.
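
The steps above can be sketched roughly as follows. This is not the code from this PR: `agentToServer`, `serverToAgent`, `agentIdentification`, `respond`, and `newUUIDv7` are hypothetical stand-ins for the OpAMP protobuf messages and the fleet-server handler, and the enrollment write to .fleet-agents is left out. It only shows the flag check, the UUID v7 generation, and the response echoing the incoming instance_uid while carrying the new one.

```go
package main

import (
	"crypto/rand"
	"encoding/binary"
	"fmt"
	"time"
)

// flagRequestInstanceUid mirrors the RequestInstanceUid bit (value 1) of AgentToServer.flags.
const flagRequestInstanceUid uint64 = 1

// Hypothetical stand-ins for the OpAMP messages; only the fields used here.
type agentToServer struct {
	InstanceUid []byte
	Flags       uint64
}

type agentIdentification struct {
	NewInstanceUid []byte
}

type serverToAgent struct {
	InstanceUid         []byte
	AgentIdentification *agentIdentification
}

// newUUIDv7 builds a version 7 UUID: a 48-bit Unix millisecond
// timestamp followed by random bits, with version and variant set.
func newUUIDv7(now time.Time) ([16]byte, error) {
	var u [16]byte
	if _, err := rand.Read(u[6:]); err != nil {
		return u, err
	}
	var ts [8]byte
	binary.BigEndian.PutUint64(ts[:], uint64(now.UnixMilli()))
	copy(u[:6], ts[2:])         // low 48 bits of the timestamp
	u[6] = (u[6] & 0x0f) | 0x70 // version 7
	u[8] = (u[8] & 0x3f) | 0x80 // RFC variant (binary 10)
	return u, nil
}

// respond echoes the incoming instance_uid and, when the flag is set,
// attaches a freshly generated UID. Enrollment under that UID is omitted.
func respond(msg agentToServer) (serverToAgent, error) {
	resp := serverToAgent{InstanceUid: msg.InstanceUid}
	if msg.Flags&flagRequestInstanceUid == 0 {
		return resp, nil
	}
	uid, err := newUUIDv7(time.Now())
	if err != nil {
		return resp, err
	}
	resp.AgentIdentification = &agentIdentification{NewInstanceUid: uid[:]}
	return resp, nil
}

func main() {
	in := agentToServer{InstanceUid: []byte("0123456789abcdef"), Flags: flagRequestInstanceUid}
	resp, err := respond(in)
	if err != nil {
		panic(err)
	}
	fmt.Printf("echoed instance_uid: %x\n", resp.InstanceUid)
	fmt.Printf("new_instance_uid:    %x\n", resp.AgentIdentification.NewInstanceUid)
}
```

Keeping the echoed `InstanceUid` separate from `NewInstanceUid` is what lets the response satisfy point 4 while still handing the agent its new identity.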

If agent_disconnect is set together with RequestInstanceUid, or is sent by an unenrolled agent, fleet-server returns a BadRequest error response.

Reassigning an already-enrolled agent's instance UID without re-enrollment is intentionally out of scope for this change. Doing so would require decoupling the .fleet-agents document _id from instance_uid, which today are the same value (consistent with the rest of fleet-server).

How to test this PR locally

  1. Configure an OpenTelemetry Collector to connect to fleet-server over OpAMP (see docs/opamp.md).
  2. Send an AgentToServer message with flags set to 1 (RequestInstanceUid). The response should include agent_identification.new_instance_uid with a valid UUID v7 distinct from the incoming instance_uid, and a new document in .fleet-agents should exist under the generated UID.
  3. Confirm a message without the flag continues to behave as before (existing agents check in normally; unknown agents auto-enroll under the incoming UID).
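
For step 2, a small helper along these lines could be used to check the returned value by hand or in a test. It is only a sketch: `isUUIDv7`, `checkNewInstanceUid`, and `mustHex` are hypothetical helpers, and the two UIDs in `main` are made-up sample values, not output from fleet-server.

```go
package main

import (
	"bytes"
	"encoding/hex"
	"errors"
	"fmt"
)

// isUUIDv7 reports whether b is 16 bytes with the version nibble set
// to 7 and the RFC variant bits (binary 10) set.
func isUUIDv7(b []byte) bool {
	return len(b) == 16 && b[6]>>4 == 7 && b[8]>>6 == 2
}

// checkNewInstanceUid verifies the two properties step 2 asks for:
// the assigned UID is a UUID v7 and differs from the incoming one.
func checkNewInstanceUid(incoming, assigned []byte) error {
	if !isUUIDv7(assigned) {
		return errors.New("new_instance_uid is not a UUID v7")
	}
	if bytes.Equal(incoming, assigned) {
		return errors.New("new_instance_uid equals the incoming instance_uid")
	}
	return nil
}

// mustHex decodes a hex string and panics on malformed input.
func mustHex(s string) []byte {
	b, err := hex.DecodeString(s)
	if err != nil {
		panic(err)
	}
	return b
}

func main() {
	incoming := mustHex("3f2b8c1e5d6a4f7b9c0d1e2f3a4b5c6d") // sample UID sent by the agent
	assigned := mustHex("018f4e7a9c3b7d2e8a1f0123456789ab") // sample new_instance_uid
	fmt.Println(checkNewInstanceUid(incoming, assigned))
}
```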

Design Checklist

  • I have ensured my design is stateless and will work when multiple fleet-server instances are behind a load balancer.
  • I have or intend to scale test my changes, ensuring it will work reliably with 100K+ agents connected.
  • I have included fail safe mechanisms to limit the load on fleet-server: rate limiting, circuit breakers, caching, load shedding, etc.

Checklist

  • I have added tests that prove my fix is effective or that my feature works
  • I have made corresponding changes to the documentation
  • I have added an entry in ./changelog/fragments using the changelog tool

Related issues

@michel-laterman michel-laterman requested a review from a team as a code owner April 14, 2026 18:29
@michel-laterman michel-laterman added enhancement New feature or request Team:Elastic-Agent-Control-Plane Label for the Agent Control Plane team labels Apr 14, 2026
@michel-laterman michel-laterman self-assigned this Apr 14, 2026
@michel-laterman michel-laterman marked this pull request as draft April 14, 2026 18:29

mergify Bot commented Apr 14, 2026

This pull request does not have a backport label. Could you fix it @michel-laterman? 🙏
To fixup this pull request, you need to add the backport labels for the needed
branches, such as:

  • backport-./d./d is the label to automatically backport to the 8./d branch. /d is the digit
  • backport-active-all is the label that automatically backports to all active branches.
  • backport-active-8 is the label that automatically backports to all active minor branches for the 8 major.
  • backport-active-9 is the label that automatically backports to all active minor branches for the 9 major.

github-actions Bot commented Apr 14, 2026

🔍 Preview links for changed docs

github-actions Bot commented Apr 14, 2026

✅ Vale Linting Results

No issues found on modified lines!


The Vale linter checks documentation changes against the Elastic Docs style guide.

To use Vale locally or report issues, refer to Elastic style guide for Vale.

Comment thread internal/pkg/api/handleOpAMP.go Outdated
michel-laterman and others added 6 commits April 21, 2026 09:49
When an agent sets the RequestInstanceUid flag in AgentToServer.flags,
fleet-server now generates a new UUID v7 and returns it in
ServerToAgent.agent_identification.new_instance_uid as required by the
OpAMP spec.

Closes elastic#6789

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
When generating a new instance UID, create a new .fleet-agents document
under the new ID (preserving agent metadata) and delete the old document
so the agent record carries over rather than forcing a re-enrollment.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…directly

Adds TestHandleMessageNewEnrollmentWithRequestInstanceUid which verifies
that when an unenrolled agent sets RequestInstanceUid, enrollment uses
the new UID as the agent ID and reassignAgentID is not called.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
When an enrolled agent requests a new instance UID, reassignAgentID now
applies the checkin fields (status, health, capabilities, effective
config, timestamps) to the agent document before creating it. This
eliminates the separate updateAgent/CheckIn call, reducing the write
operations for the reassignment path.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Reassigning an enrolled agent's instance UID is not yet supported;
reassignAgentID now returns errors.ErrUnsupported so enrolled agents
requesting a new UID get an error response. New enrollments continue
to work — the generated UID is used as the agent ID during enrollment.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@michel-laterman michel-laterman changed the title feat: respond to OpAMP RequestInstanceUid flag with agent_identification feat: respond to OpAMP RequestInstanceUid flag with agent_identification for newly enrolled agents Apr 21, 2026
@michel-laterman michel-laterman marked this pull request as ready for review April 21, 2026 21:42
Comment thread internal/pkg/api/handleOpAMP.go Outdated
},
}
}
newInstanceUID = &uid
Contributor

Could we set instanceUID here to &uid? I think it might simplify some of the code later.

Contributor Author

Per the spec, the response's instance_uid must match the request's, even when the request has the RequestInstanceUid flag set.

Comment thread internal/pkg/api/handleOpAMP.go Outdated
Co-authored-by: Shaunak Kashyap <ycombinator@gmail.com>

Development

Successfully merging this pull request may close these issues.

[OpAMP] fleet-server should respond to AgentToServer.flags.RequestInstanceUid

3 participants