Skip to content

fix(mssql): harden metadata fault-tolerance and warn on query-log truncation#28832

Open
Khairajani wants to merge 1 commit into
open-metadata:mainfrom
Khairajani:mssql-p0-reliability
Open

fix(mssql): harden metadata fault-tolerance and warn on query-log truncation#28832
Khairajani wants to merge 1 commit into
open-metadata:mainfrom
Khairajani:mssql-p0-reliability

Conversation

@Khairajani

Copy link
Copy Markdown
Contributor

What

Reliability hardening for MSSQL metadata extraction, plus a query-log truncation warning for the shared lineage/usage base.

Metadata fault-tolerance (mssql/metadata.py)

  • A single-database run no longer aborts on an optional description query. The description-map load is moved into _load_description_maps, which degrades gracefully (logs and continues) on failure. Previously the single-database branch was unguarded, so a failed MS_Description query could abort the whole workflow (the all-databases branch was already guarded).
  • Databases that fail to connect are recorded in the workflow status (status.failed) instead of only being logged, so they appear in the run summary rather than silently disappearing from an otherwise "successful" run.
  • Encrypted-stored-procedure detection failures are raised from DEBUG to WARNING. They were silent, and the procedures were then treated as non-encrypted.
  • The per-database encrypted-procedure cache is reset on each database (it previously accumulated across the whole run).

Query-log truncation warning (shared lineage_source.py, usage_source.py)

  • Warn when the query log returns resultLimit rows: at that point older queries are truncated and lineage/usage may be incomplete. This is a small, log-only addition in the shared base, so it benefits every SQL connector that uses the query-log lineage/usage path.

Tests

  • Unit: _load_description_maps degrades gracefully on a failing description query, and resets the encrypted-procedure cache on each database.
  • Existing encrypted-procedure / stored-procedure tests continue to cover the related paths.

Notes

  • The metadata.py changes are MSSQL-specific; the truncation warning is in the shared lineage/usage base.

@Khairajani Khairajani requested a review from a team as a code owner June 8, 2026 18:57
@github-actions

github-actions Bot commented Jun 8, 2026

Copy link
Copy Markdown
Contributor

❌ PR checklist incomplete

This PR cannot be merged until the following are addressed on its linked issue:

  • No GitHub issue is linked. Link an issue in the Development section of the PR (or add Fixes #12345 to the description). For a same-org cross-repo issue, add Fixes open-metadata/<repo>#123 to the description.

The fields live on the linked issue in the Shipping project (open the issue → right sidebar → Projects). After you set them, re-run this check (or push a commit) — issue/project changes do not re-trigger it automatically.

Maintainers can bypass this check by adding the skip-pr-checks label.

@github-actions

github-actions Bot commented Jun 8, 2026

Copy link
Copy Markdown
Contributor

Hi there 👋 Thanks for your contribution!

The OpenMetadata team will review the PR shortly! Once it has been labeled as safe to test, the CI workflows
will start executing and we'll be able to make sure everything is working as expected.

Let us know if you need any help!

Comment thread ingestion/src/metadata/ingestion/source/database/mssql/metadata.py
Comment on lines +339 to +344
if result_limit and row_count >= result_limit:
logger.warning(
f"Reached the configured resultLimit of {result_limit} query log entries; "
f"older queries may be truncated and lineage incomplete. "
f"Consider increasing resultLimit."
)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Edge Case: resultLimit warning can be a false positive at exactly the limit

The truncation warning triggers on row_count >= result_limit. When the query log contains exactly resultLimit rows with no actual truncation, the warning still fires, telling users data may be incomplete when it is not. This is inherent to using row count as a truncation proxy and is acceptable for a log-only message, but consider clarifying the wording (e.g. "reached the resultLimit; if more queries exist they may be truncated") to avoid alarming users when the count merely equals the limit. Identical wording is duplicated in both lineage_source.py and usage_source.py.

Was this helpful? React with 👍 / 👎

…ncation

Metadata (mssql/metadata.py):
- Guard the single-database description load so a failed (optional) description
  query no longer aborts the whole run. The load and per-database cache reset
  are moved into _load_description_maps and degrade gracefully.
- Record databases that fail to connect in the workflow status (status.failed)
  instead of only logging, so they show up in the run summary.
- Raise encrypted-stored-procedure detection failures from DEBUG to WARNING;
  they were previously silent and the procedures treated as non-encrypted.
- Reset the per-database encrypted-procedure cache on each database.

Lineage/usage (shared lineage_source.py, usage_source.py):
- Warn when the query log returns resultLimit rows, since older queries are
  then truncated and lineage/usage may be incomplete.
@Khairajani Khairajani force-pushed the mssql-p0-reliability branch from 9ec5a76 to 86ba917 Compare June 9, 2026 05:33
@github-actions

github-actions Bot commented Jun 9, 2026

Copy link
Copy Markdown
Contributor

Hi there 👋 Thanks for your contribution!

The OpenMetadata team will review the PR shortly! Once it has been labeled as safe to test, the CI workflows
will start executing and we'll be able to make sure everything is working as expected.

Let us know if you need any help!

@Khairajani Khairajani added the safe to test Add this label to run secure Github workflows on PRs label Jun 9, 2026
@gitar-bot

gitar-bot Bot commented Jun 9, 2026

Copy link
Copy Markdown
Code Review 👍 Approved with suggestions 1 resolved / 2 findings

Hardens MSSQL metadata extraction and adds query-log truncation warnings, resolving missing test coverage for status tracking. Consider adjusting the result limit warning to account for scenarios where the row count exactly matches the limit to prevent potential false positives.

💡 Edge Case: resultLimit warning can be a false positive at exactly the limit

📄 ingestion/src/metadata/ingestion/source/database/lineage_source.py:339-344 📄 ingestion/src/metadata/ingestion/source/database/usage_source.py:154-159

The truncation warning triggers on row_count >= result_limit. When the query log contains exactly resultLimit rows with no actual truncation, the warning still fires, telling users data may be incomplete when it is not. This is inherent to using row count as a truncation proxy and is acceptable for a log-only message, but consider clarifying the wording (e.g. "reached the resultLimit; if more queries exist they may be truncated") to avoid alarming users when the count merely equals the limit. Identical wording is duplicated in both lineage_source.py and usage_source.py.

✅ 1 resolved
Quality: New status.failed and resultLimit-warning paths lack tests

📄 ingestion/src/metadata/ingestion/source/database/mssql/metadata.py:240-246 📄 ingestion/src/metadata/ingestion/source/database/lineage_source.py:338-344 📄 ingestion/src/metadata/ingestion/source/database/usage_source.py:153-159
The PR adds two behavior changes that are not covered by tests, while testing guidance targets ~90% coverage for changes:

  1. get_database_names now records failed database connections via self.status.failed(...) in the all-databases branch (metadata.py:240-246). The added tests only cover _load_description_maps; there is no test asserting that a database which fails set_inspector is added to status.failed and is not yielded.
  2. The resultLimit truncation warning in lineage_source.py:338-344 and usage_source.py:153-159 is untested. A small test asserting the warning fires when row_count >= resultLimit (and not otherwise) would lock in the behavior.

These are log/status-only paths so impact is low, but adding coverage would protect against regressions.

🤖 Prompt for agents
Code Review: Hardens MSSQL metadata extraction and adds query-log truncation warnings, resolving missing test coverage for status tracking. Consider adjusting the result limit warning to account for scenarios where the row count exactly matches the limit to prevent potential false positives.

1. 💡 Edge Case: resultLimit warning can be a false positive at exactly the limit
   Files: ingestion/src/metadata/ingestion/source/database/lineage_source.py:339-344, ingestion/src/metadata/ingestion/source/database/usage_source.py:154-159

   The truncation warning triggers on `row_count >= result_limit`. When the query log contains exactly `resultLimit` rows with no actual truncation, the warning still fires, telling users data may be incomplete when it is not. This is inherent to using row count as a truncation proxy and is acceptable for a log-only message, but consider clarifying the wording (e.g. "reached the resultLimit; if more queries exist they may be truncated") to avoid alarming users when the count merely equals the limit. Identical wording is duplicated in both lineage_source.py and usage_source.py.

Options

Display: compact → Showing less information.

Comment with these commands to change:

Compact
gitar display:verbose         

Was this helpful? React with 👍 / 👎 | Gitar

@sonarqubecloud

sonarqubecloud Bot commented Jun 9, 2026

Copy link
Copy Markdown

@github-actions

github-actions Bot commented Jun 9, 2026

Copy link
Copy Markdown
Contributor

🟡 Playwright Results — all passed (13 flaky)

✅ 4270 passed · ❌ 0 failed · 🟡 13 flaky · ⏭️ 88 skipped

Shard Passed Failed Flaky Skipped
🟡 Shard 1 300 0 1 4
🟡 Shard 2 805 0 1 9
🟡 Shard 3 803 0 1 8
🟡 Shard 4 845 0 2 12
🟡 Shard 5 720 0 1 47
🟡 Shard 6 797 0 7 8
🟡 13 flaky test(s) (passed on retry)
  • Flow/SearchRBAC.spec.ts › User without permission (shard 1, 1 retry)
  • Features/Glossary/GlossaryHierarchy.spec.ts › should cancel move operation (shard 2, 1 retry)
  • Features/RTL.spec.ts › Verify Following widget functionality (shard 3, 2 retries)
  • Flow/PersonaDeletionUserProfile.spec.ts › User profile loads correctly before and after persona deletion (shard 4, 1 retry)
  • Pages/CustomProperties.spec.ts › Markdown (shard 4, 1 retry)
  • Pages/ExplorePageRightPanel_KnowledgeCenter.spec.ts › Should remove user owner for knowledgeCenter (shard 5, 1 retry)
  • Pages/Lineage/LineageFilters.spec.ts › Verify lineage service type filter selection (shard 6, 1 retry)
  • Pages/Lineage/LineageFilters.spec.ts › Verify lineage schema filter selection (shard 6, 1 retry)
  • Pages/Lineage/LineageRightPanel.spec.ts › Verify custom properties tab IS visible for supported type: searchIndex (shard 6, 1 retry)
  • Pages/Lineage/PlatformLineage.spec.ts › Verify domain platform view (shard 6, 1 retry)
  • Pages/TasksUIFlow.spec.ts › Create and reject tag task for Dashboard via UI (shard 6, 1 retry)
  • Pages/TestSuite.spec.ts › Logical TestSuite (shard 6, 1 retry)
  • Pages/UserDetails.spec.ts › Admin user can edit teams from the user profile (shard 6, 1 retry)

📦 Download artifacts

How to debug locally
# Download playwright-test-results-<shard> artifact and unzip
npx playwright show-trace path/to/trace.zip    # view trace

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

safe to test Add this label to run secure Github workflows on PRs

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant