Skip to content

fix(api): persist nested column changes on optimistic-locking PATCH#28837

Open
harshach wants to merge 5 commits into
mainfrom
harshach/es-read-consistency-tag-patching
Open

fix(api): persist nested column changes on optimistic-locking PATCH#28837
harshach wants to merge 5 commits into
mainfrom
harshach/es-read-consistency-tag-patching

Conversation

@harshach

@harshach harshach commented Jun 9, 2026

Copy link
Copy Markdown
Collaborator

Describe your changes:

Fixes #28876

I made the If-Match (optimistic-locking) PATCH path run the entity-specific updater, because it was resolving the base EntityUpdater and silently dropping nested column tag/description changes — the server returned 200 with the change echoed in the body but never persisted it (top-level fields were unaffected, which hid the bug). I also closed two adjacent search write-path gaps and wired the Python SDK to actually exercise ETags so concurrent column-tag patches are safe instead of last-write-wins.

Type of change:

  • Bug fix

High-level design:

  • Root cause / fix: EntityRepository has three getUpdater overloads. Normal PATCH uses the 4-arg overload, which each repository overrides to return its entity-specific updater (e.g. TableUpdater, whose entitySpecificUpdate runs updateColumns). The optimistic PATCH used the 5-arg overload, which no repository overrides, so it fell back to the base EntityUpdater whose entitySpecificUpdate is a no-op — dropping every nested column change under If-Match, for all entities with a custom updater. Fix: the base 5-arg getUpdater now delegates to the overridden 4-arg one and sets useOptimisticLocking via a setter.
  • Write-path hardening: lineage ES writes route through the staged rebuild index (getWriteIndexName) so edges aren't dropped on the alias swap mid-reindex; the ElasticSearch updateByFqnPrefix gains conflicts(Proceed) for parity with the OpenSearch path.
  • Client-side concurrency (opt-in): ETagResponseFilter now emits the ETag on PUT/POST/PATCH responses; the Python SDK's patch_column_tags / patch_column_descriptions capture the ETag, send If-Match, and retry on 412. It stays opt-in so bulk ingestion keeps its whole-array last-write-wins behavior.

Tests:

Use cases covered

  • A column tag/description PATCH sent with If-Match persists (previously dropped silently).
  • A stale If-Match is rejected with 412 instead of last-write-wins.
  • A concurrent modification between the SDK's read and its write is retried and converges.
  • ETag is emitted on mutation responses; GET still short-circuits to 304.

Unit tests

  • openmetadata-service/.../resources/filters/ETagResponseFilterTest.java6/6 passing locally.

Backend integration tests

  • openmetadata-integration-tests/.../it/tests/OptimisticLockingColumnPatchIT.java (column tag + description persist under If-Match; stale If-Match rejected).

Ingestion integration tests

  • ingestion/tests/integration/ometa/test_ometa_patch.py — concurrent-modification retry case; the existing test_patch_column_tags / test_patch_column_description now run through the new If-Match path.

Playwright (UI) tests

  • Not applicable (no UI changes).

Manual testing performed

Reproduced the bug on a live stack (MySQL + Elasticsearch 9.3): a column tag/description PATCH with If-Match returned 200 but did not persist, while the same PATCH without it did. Verified server-side If-Match enforcement (stale → 412) and the SDK's read → If-Match412 → refetch → retry algorithm live. The running container predates the fix, so the end-to-end ❌→✅ flip is covered by the new integration tests in CI.

UI screen recording / screenshots:

Not applicable.

Checklist:

  • I have read the CONTRIBUTING document.
  • My PR title is Fixes <issue-number>: <short explanation>
  • My PR is linked to a GitHub issue via Fixes #<issue-number> above.
  • I have commented on my code, particularly in hard-to-understand areas.
  • I have added tests (unit / integration as applicable) and listed them above.
  • I have added a test that covers the exact scenario we are fixing.

🤖 Generated with Claude Code

The If-Match (optimistic-locking) PATCH path resolved the base EntityUpdater
instead of the entity-specific updater, so column tag/description changes were
silently dropped: the server returned 200 with the change echoed in the body
but never persisted it (top-level fields were unaffected). Delegate the
optimistic getUpdater to the overridden 4-arg getUpdater so nested
entitySpecificUpdate logic runs.

Also closes related write-path gaps and wires up the client:
- route lineage ES writes through the staged index during reindex (LineageUtil)
- add conflicts(Proceed) to the ElasticSearch updateByFqnPrefix path (OS parity)
- emit ETag on PUT/POST/PATCH responses for optimistic-concurrency edit chains
- Python SDK: send If-Match + retry on 412 in patch_column_tags/descriptions

Tests: OptimisticLockingColumnPatchIT (column tag/desc persist under If-Match,
stale If-Match rejected), ETagResponseFilterTest (mutation ETag + GET 304),
and a concurrent-modification retry case in test_ometa_patch.py.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@harshach harshach requested a review from a team as a code owner June 9, 2026 03:30
@github-actions

github-actions Bot commented Jun 9, 2026

Copy link
Copy Markdown
Contributor

✅ PR checks passed

The linked issue has a description and all required Shipping project fields set. Thanks!

@github-actions github-actions Bot added backend safe to test Add this label to run secure Github workflows on PRs labels Jun 9, 2026
Comment thread ingestion/src/metadata/ingestion/ometa/mixins/patch_mixin.py Outdated
Comment thread ingestion/src/metadata/ingestion/ometa/mixins/patch_mixin.py Outdated
- LineageUtilTest: stub SearchRepository.getWriteIndexName (delegating to the
  real IndexMapping) so the two domain-lineage tests pass now that
  doAddLineageToSearch routes lineage writes through getWriteIndexName
- TrackedREST.patch: match the new REST.patch(..., headers=None) signature
  (basedpyright incompatible-override error)
- patch_mixin: satisfy basedpyright — cast Optional[T] on returns, guard the
  Optional fullyQualifiedName in log messages, and localize the mixin-provided
  get_suffix attribute access

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- Fix dead-code/contract bug in patch_column_tags/patch_column_descriptions:
  on the final retry a 412 now breaks to the warn + return None path instead
  of re-raising, so the post-loop warning is reachable and the methods keep
  their "return None on persistent contention" contract (no new exception
  surfaced to callers)
- Derive the If-Match ETag client-side from the already-fetched instance
  (mirrors the server's EntityETag.generateETag) instead of issuing an extra
  GET per column patch; removes the extra round-trip and the read-vs-instance
  TOCTOU. Verified the computed ETag matches the server's byte-for-byte
- Drop the now-unused REST.get_etag and _request(return_response=...) helpers

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Comment thread ingestion/src/metadata/ingestion/ometa/mixins/patch_mixin.py
@github-actions

github-actions Bot commented Jun 9, 2026

Copy link
Copy Markdown
Contributor

🟡 Playwright Results — all passed (18 flaky)

✅ 4266 passed · ❌ 0 failed · 🟡 18 flaky · ⏭️ 88 skipped

Shard Passed Failed Flaky Skipped
✅ Shard 1 301 0 0 4
🟡 Shard 2 803 0 3 9
🟡 Shard 3 803 0 5 8
🟡 Shard 4 839 0 4 12
✅ Shard 5 721 0 0 47
🟡 Shard 6 799 0 6 8
🟡 18 flaky test(s) (passed on retry)
  • Features/DataQuality/TestCaseImportExportE2eFlow.spec.ts › EditAll User: Complete export-import-validate flow (shard 2, 1 retry)
  • Features/DataQuality/TestCaseResultPermissions.spec.ts › User with only VIEW cannot PATCH results (shard 2, 1 retry)
  • Features/Glossary/GlossaryWorkflow.spec.ts › should start term as Draft when glossary has reviewers (shard 2, 1 retry)
  • Features/IncidentManager.spec.ts › Next, Previous and page indicator (shard 3, 1 retry)
  • Features/KnowledgeCenter.spec.ts › Activity feed functionality in Knowledge Center page (shard 3, 1 retry)
  • Features/Table.spec.ts › Table pagination with sorting should works (shard 3, 1 retry)
  • Features/Tasks/TaskNavigation.spec.ts › navigating to /table/TASK-XXXXX should show 404 (invalid URL pattern) (shard 3, 1 retry)
  • Flow/NestedChildrenUpdates.spec.ts › should update nested column description immediately without page refresh (shard 3, 1 retry)
  • Pages/CustomProperties.spec.ts › Create custom property and configure search for Dashboard (shard 4, 1 retry)
  • Pages/DataContractsSemanticRules.spec.ts › Validate Description Rule Is_Set (shard 4, 1 retry)
  • Pages/Domains.spec.ts › Verify domain custom property value persistence (shard 4, 1 retry)
  • Pages/Domains.spec.ts › Domain Rbac (shard 4, 1 retry)
  • Pages/GlossaryImportExport.spec.ts › Glossary Bulk Import Export (shard 6, 1 retry)
  • Pages/InputOutputPorts.spec.ts › Lineage section collapse/expand (shard 6, 1 retry)
  • Pages/InputOutputPorts.spec.ts › Input ports section collapse/expand (shard 6, 2 retries)
  • Pages/InputOutputPorts.spec.ts › Toggle fullscreen mode (shard 6, 1 retry)
  • Pages/Lineage/LineageFilters.spec.ts › Verify lineage schema filter selection (shard 6, 1 retry)
  • Pages/Lineage/PlatformLineage.spec.ts › Verify domain platform view (shard 6, 1 retry)

📦 Download artifacts

How to debug locally
# Download playwright-test-results-<shard> artifact and unzip
npx playwright show-trace path/to/trace.zip    # view trace

Per review: the client-derived If-Match couples to the server's number
formatting, so a mismatch would 412 on every attempt and silently drop the
write after exhausting retries — the persistence-loss class this PR fixes.

- On a 412 against an UNCHANGED instance (the same derived ETag was already
  rejected), treat the conditional header as unusable and fall back to a
  non-conditional write (last-write-wins) so the change is not dropped. Real
  contention (the instance changed) still retries with If-Match.
- Make the retry-exhaustion warning state the change was NOT persisted so it
  is greppable/alertable.
- Add an integration test that stubs the ETag helper to an always-rejected
  value and asserts the column change still persists via the fallback.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@gitar-bot

gitar-bot Bot commented Jun 9, 2026

Copy link
Copy Markdown
Code Review ✅ Approved 3 resolved / 3 findings

Routes optimistic-locking PATCH requests through entity-specific updaters to prevent silent data loss, and hardens column-tagging concurrency with SDK-level ETag retries.

✅ 3 resolved
Bug: Final-attempt 412 raises instead of graceful None; warning is dead code

📄 ingestion/src/metadata/ingestion/ometa/mixins/patch_mixin.py:545-559 📄 ingestion/src/metadata/ingestion/ometa/mixins/patch_mixin.py:652-666
In both patch_column_tags and patch_column_descriptions the retry loop only treats a 412 as retryable while attempt < MAX_OPTIMISTIC_LOCK_RETRIES - 1. On the last attempt a 412 falls through to the bare raise. Because every iteration either returns, raises, or continues (and continue is impossible on the final iteration), the loop can never complete normally — so the post-loop logger.warning("... abandoned after %d optimistic-lock retries ...") + return None is unreachable dead code.

This produces two problems: (1) the documented/intended behavior (log a warning and return None after exhausting retries) never happens; instead the method propagates an APIError(412) to the caller. These methods previously never raised on contention — they returned the patched entity or None — so persistent concurrent modification now changes the contract and can surface as an unhandled exception in callers. (2) last_error is assigned but, given the unreachable warning, is effectively never read.

Suggested fix: on the final attempt, fall through to the warning/return None path rather than re-raising a 412, so behavior matches the comment and the loop's exit branch is actually reachable.

Performance: Column patch now issues an extra GET per call for the ETag

📄 ingestion/src/metadata/ingestion/ometa/mixins/patch_mixin.py:531-534 📄 ingestion/src/metadata/ingestion/ometa/mixins/patch_mixin.py:638-639 📄 ingestion/src/metadata/ingestion/ometa/client.py:483-491
patch_column_tags and patch_column_descriptions now call self.client.get_etag(...) (a full entity GET) before the existing _fetch_entity_if_exists GET on every attempt, adding an extra network round-trip per column patch. The deprecated patch_column_tag / patch_column_description single-column helpers delegate to these methods, so all column-tag/description patching paths incur the additional GET unconditionally (and again on each retry). For workloads that patch many columns one-by-one this roughly adds one extra GET per column.

Consider deriving the ETag from the entity already fetched by _fetch_entity_if_exists (e.g. compute it client-side, or have that fetch return the response headers) instead of an independent GET, to avoid doubling the request count on the hot path.

Edge Case: Client-derived ETag couples to server float formatting; mismatch silently drops writes

📄 ingestion/src/metadata/ingestion/ometa/mixins/patch_mixin.py:80-94 📄 ingestion/src/metadata/ingestion/ometa/mixins/patch_mixin.py:560-574 📄 ingestion/src/metadata/ingestion/ometa/mixins/patch_mixin.py:656-670
_entity_etag reconstructs the server's ETag locally as f"{model_str(version)}-{model_str(updated_at)}" hashed with SHA-256, mirroring Java's EntityETag.generateETag which uses entity.getVersion() + "-" + entity.getUpdatedAt() (Double/Long string concatenation).

For normal versions (OpenMetadata increments via Math.round(... )/10, producing one-decimal values like 0.1, 1.2), Python str(float) and Java Double.toString agree, so this works. However, the correctness of every conditional column patch now depends on Python's str() formatting of a float exactly matching Java's Double.toString for all reachable version values. If they ever diverge (e.g. a version value that round-trips differently, or a future change to the server algorithm), the If-Match header will never match the server's stored ETag, so the server returns 412 on every attempt. After MAX_OPTIMISTIC_LOCK_RETRIES the helper logs a warning and returns None (sources L591-597, L701) — the caller's column tag/description change is silently dropped with only a log line, which is exactly the persistence-loss class of bug this PR set out to fix.

Consider hardening this: either (a) detect repeated 412s that survive a fresh refetch (where the local ETag equals the just-fetched instance's derived ETag yet the server still rejects it) and fall back to a non-conditional write, or (b) prefer the server-issued ETag from the GET response (the PR adds ETagResponseFilter emitting ETags) over recomputing it client-side, so the client never has to replicate the server's number-formatting. At minimum, the retry-exhaustion warning should make clear that the patch was NOT persisted so it is greppable/alertable.

Options

Display: compact → Showing less information.

Comment with these commands to change:

Compact
gitar display:verbose         

Was this helpful? React with 👍 / 👎 | Gitar

@sonarqubecloud

sonarqubecloud Bot commented Jun 9, 2026

Copy link
Copy Markdown

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

backend safe to test Add this label to run secure Github workflows on PRs

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Optimistic-locking (If-Match) PATCH silently drops nested column tag/description changes

1 participant