fix(mssql): harden metadata fault-tolerance and warn on query-log truncation #28832
Khairajani wants to merge 1 commit into
❌ PR checklist incomplete
This PR cannot be merged until the following are addressed on its linked issue:
The fields live on the linked issue in the Shipping project (open the issue → right sidebar → Projects). After you set them, re-run this check (or push a commit); issue/project changes do not re-trigger it automatically. Maintainers can bypass this check by adding the
Hi there 👋 Thanks for your contribution! The OpenMetadata team will review the PR shortly! Once it has been labeled as Let us know if you need any help!
    if result_limit and row_count >= result_limit:
        logger.warning(
            f"Reached the configured resultLimit of {result_limit} query log entries; "
            f"older queries may be truncated and lineage incomplete. "
            f"Consider increasing resultLimit."
        )
💡 Edge Case: resultLimit warning can be a false positive at exactly the limit
The truncation warning triggers on row_count >= result_limit. When the query log contains exactly resultLimit rows with no actual truncation, the warning still fires, telling users data may be incomplete when it is not. This is inherent to using row count as a truncation proxy and is acceptable for a log-only message, but consider clarifying the wording (e.g. "reached the resultLimit; if more queries exist they may be truncated") to avoid alarming users when the count merely equals the limit. Identical wording is duplicated in both lineage_source.py and usage_source.py.
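One way to act on this suggestion can be sketched as follows. The helper name `truncation_warning` is hypothetical and not part of the PR; only `row_count`, `result_limit`, and the `resultLimit` setting come from the diff above. The sketch keeps the `>=` check but words the message conditionally, so a log that holds exactly `resultLimit` rows does not read as confirmed truncation.

```python
from typing import Optional


def truncation_warning(row_count: int, result_limit: Optional[int]) -> Optional[str]:
    """Return a warning message when the row count reached the limit, else None.

    Hypothetical helper for illustration; the PR inlines this check instead.
    """
    # No limit configured (None or 0), or fewer rows than the limit: nothing to report.
    if not result_limit or row_count < result_limit:
        return None
    # Row count alone cannot prove truncation, so the wording stays conditional.
    return (
        f"Reached the configured resultLimit of {result_limit} query log entries; "
        "if more queries exist, older ones were not read and lineage may be incomplete. "
        "Consider increasing resultLimit."
    )
```

A caller would log the message only when it is not `None`. Sharing one helper like this between lineage_source.py and usage_source.py would also remove the duplicated wording, though whether that fits the existing module layout is an assumption.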
…ncation

Metadata (mssql/metadata.py):
- Guard the single-database description load so a failed (optional) description query no longer aborts the whole run. The load and per-database cache reset are moved into _load_description_maps and degrade gracefully.
- Record databases that fail to connect in the workflow status (status.failed) instead of only logging, so they show up in the run summary.
- Raise encrypted-stored-procedure detection failures from DEBUG to WARNING; they were previously silent and the procedures treated as non-encrypted.
- Reset the per-database encrypted-procedure cache on each database.

Lineage/usage (shared lineage_source.py, usage_source.py):
- Warn when the query log returns resultLimit rows, since older queries are then truncated and lineage/usage may be incomplete.
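The "record instead of only logging" pattern from the commit message can be sketched roughly as below. Everything here is a stand-in for illustration: `Failure`, `Status`, and `process_databases` are hypothetical names, not the connector's actual classes, and the real workflow status object may take different arguments.

```python
import traceback
from dataclasses import dataclass, field
from typing import Callable, Iterable, List


@dataclass
class Failure:
    """Hypothetical record of one failed entity."""
    name: str
    error: str
    stack_trace: str


@dataclass
class Status:
    """Hypothetical stand-in for a workflow status that feeds the run summary."""
    failures: List[Failure] = field(default_factory=list)

    def failed(self, failure: Failure) -> None:
        self.failures.append(failure)


def process_databases(
    names: Iterable[str],
    connect: Callable[[str], None],
    status: Status,
) -> List[str]:
    """Connect to each database; record failures in status and keep going."""
    processed: List[str] = []
    for name in names:
        try:
            connect(name)
        except Exception as exc:
            # Recording the failure makes it visible in the summary, not just the log.
            status.failed(
                Failure(
                    name=name,
                    error=f"Failed to connect to database {name}: {exc}",
                    stack_trace=traceback.format_exc(),
                )
            )
            continue
        processed.append(name)
    return processed
```

With this shape, a run that skips one unreachable database still finishes, but the skipped database is listed among the failures rather than vanishing from an otherwise clean summary.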
9ec5a76 to 86ba917
Code Review 👍 Approved with suggestions
1 resolved / 2 findings
Hardens MSSQL metadata extraction and adds query-log truncation warnings, resolving missing test coverage for status tracking. Consider adjusting the result limit warning to account for scenarios where the row count exactly matches the limit to prevent potential false positives.
💡 Edge Case: resultLimit warning can be a false positive at exactly the limit
📄 ingestion/src/metadata/ingestion/source/database/lineage_source.py:339-344
📄 ingestion/src/metadata/ingestion/source/database/usage_source.py:154-159
The truncation warning triggers on
✅ 1 resolved
✅ Quality: New status.failed and resultLimit-warning paths lack tests
🟡 Playwright Results: all passed (13 flaky)
✅ 4270 passed · ❌ 0 failed · 🟡 13 flaky · ⏭️ 88 skipped
🟡 13 flaky test(s) (passed on retry)
How to debug locally
    # Download playwright-test-results-<shard> artifact and unzip
    npx playwright show-trace path/to/trace.zip # view trace



What
Reliability hardening for MSSQL metadata extraction, plus a query-log truncation warning for the shared lineage/usage base.
Metadata fault-tolerance (mssql/metadata.py)
- The description load and per-database cache reset are moved into _load_description_maps, which degrades gracefully (logs and continues) on failure. Previously the single-database branch was unguarded, so a failed MS_Description query could abort the whole workflow (the all-databases branch was already guarded).
- Databases that fail to connect are recorded in the workflow status (status.failed) instead of only being logged, so they appear in the run summary rather than silently disappearing from an otherwise "successful" run.

Query-log truncation warning (shared lineage_source.py, usage_source.py)
- Warn when the query log returns resultLimit rows: at that point older queries are truncated and lineage/usage may be incomplete. This is a small, log-only addition in the shared base, so it benefits every SQL connector that uses the query-log lineage/usage path.

Tests
- _load_description_maps degrades gracefully on a failing description query, and resets the encrypted-procedure cache on each database.

Notes
- metadata.py changes are MSSQL-specific; the truncation warning is in the shared lineage/usage base.
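For readers unfamiliar with the pattern, the graceful degradation described for _load_description_maps might look roughly like the sketch below. The class `DescriptionCache`, its attributes, and the `fetch_descriptions` callable are all hypothetical; the actual method in mssql/metadata.py will differ in names and details.

```python
import logging
from typing import Callable, Dict, Set

logger = logging.getLogger(__name__)


class DescriptionCache:
    """Hypothetical holder of per-database description and procedure caches."""

    def __init__(self, fetch_descriptions: Callable[[str], Dict[str, str]]) -> None:
        self._fetch = fetch_descriptions
        self.descriptions: Dict[str, str] = {}
        self.encrypted_procedures: Set[str] = set()

    def load(self, database: str) -> None:
        # Reset per-database state first so nothing leaks from the previous database,
        # even if the description query below fails.
        self.descriptions = {}
        self.encrypted_procedures = set()
        try:
            self.descriptions = self._fetch(database)
        except Exception as exc:
            # Descriptions are optional: log and continue with empty maps.
            logger.warning(
                "Could not load descriptions for database %s: %s", database, exc
            )
```

Resetting before the guarded call is the design choice worth noting: a failed query then leaves empty caches rather than stale entries from the previous database.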