Revert "fix: always set ReportFullState flag in OpAMP responses (#6831)" #6878
ycombinator wants to merge 2 commits into elastic:main
This pull request does not have a backport label. Could you fix it @ycombinator? 🙏
✅ Vale Linting Results: No issues found on modified lines! The Vale linter checks documentation changes against the Elastic Docs style guide. To use Vale locally or report issues, refer to Elastic style guide for Vale.
michel-laterman left a comment:
As mentioned in our standup, even if we reverted this change and implemented the spec requirement to collect on seq_no drift, our scale testing should still verify that a server can gather the full status reports from all agents at once. We need to be reliable if an event occurs that causes all agents to drift (restore from snapshot, network connectivity, etc.).
I would find out why the test is failing and address the root cause.
Yes, this was my motivation for doing it this way: converting the edge cases into always cases so that we didn't have to discover them through incidents or support cases. I think reverting this to confirm it is the problem is fine, but if it is, you still have to fix it.
What is the problem this PR solves?
Since April 17, the daily 10k OpAMP-on-serverless scale test has failed every run (details). The failures correlate with the deployment of PR #6831, which set ServerToAgentFlags_ReportFullState on every ServerToAgent response to every agent.
How does this PR solve the problem?
Reverts #6831 to remove the unconditional ReportFullState flag from every OpAMP response.
This is initially a temporary revert to confirm that #6831 is the root cause of the scale test failures. Once confirmed, the ReportFullState logic can be re-introduced in a more targeted way (e.g. only on enrollment or drift detection, not on every message).
How to test this PR locally
No special local testing is needed; the fix will be validated by the daily 10k OpAMP-on-serverless scale test returning to a passing state after deployment.