Enhancement: Add drift detection and automatic reconciliation #668
eshulman2 wants to merge 1 commit into k-orc:main
mandre
left a comment
Which part of the code needs to change? I expect the proposal to detail how `shouldReconcile` changes.
3134799 to a9f9abf
1. On the next reconciliation, ORC attempts to fetch the resource by the ID stored in `status.id`
2. If not found and the resource was originally created by ORC (not imported), ORC recreates it
3. The new resource ID is stored in `status.id`
My question wasn't really about the obvious case where a resource you have a weak dependency on is deleted out of band, but about the case where we have a hard dependency. An example with Subnet -> Network would be more telling. What happens if a network is re-created? Will the subnet be re-created as well?
Inconsistent states can and will happen: someone force-deletes a resource, OpenStack has a bug, an operator changes the database directly... We should explain what we do whenever we hit such a case.
This probably deserves a separate section.
**Behavior when drift detection is disabled** (`resyncPeriod: 0`): External deletion remains a terminal error (current behavior preserved). Resource recreation only occurs when drift detection is enabled and a periodic resync discovers the missing resource. This maintains backwards compatibility.

For **imported resources** that are deleted externally, this is always a terminal error regardless of drift detection settings, because the resource was not created by ORC and recreating it would not restore the original resource.
What happens to a resource that was already in a terminal error (due to a missing OpenStack resource) before drift detection was enabled? I believe we should clarify that we won't reconcile resources that are in a terminal error.
I believe we should reconcile them even when they are in a terminal error, because that would solve #667. That issue is currently unsolvable for users: they have to delete and re-create the resource to recover from semi-transient problems that have already resolved themselves but are treated as terminal, irrecoverable errors (for example, quota freed up when another machine is deleted). The periodic resync provides a recovery path without requiring users to manually touch the spec to trigger reconciliation.
Proposal for drift detection feature.
a9f9abf to eda8a6f