fix(mssql): harden metadata fault-tolerance and warn on query-log truncation by Khairajani · Pull Request #28832 · open-metadata/OpenMetadata

Khairajani · 2026-06-08T18:57:31Z

What

Reliability hardening for MSSQL metadata extraction, plus a query-log truncation warning for the shared lineage/usage base.

Metadata fault-tolerance (`mssql/metadata.py`)

A single-database run no longer aborts on an optional description query. The description-map load is moved into _load_description_maps, which degrades gracefully (logs and continues) on failure. Previously the single-database branch was unguarded, so a failed MS_Description query could abort the whole workflow (the all-databases branch was already guarded).
Databases that fail to connect are recorded in the workflow status (status.failed) instead of only being logged, so they appear in the run summary rather than silently disappearing from an otherwise "successful" run.
Encrypted-stored-procedure detection failures are raised from DEBUG to WARNING. They were silent, and the procedures were then treated as non-encrypted.
The per-database encrypted-procedure cache is reset on each database (it previously accumulated across the whole run).
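
The behaviour described above can be sketched roughly as follows. This is a minimal illustration, not the code in `mssql/metadata.py`: `Status`, `DescriptionLoader`, `run_query`, and `connect` are hypothetical stand-ins, and only the names `_load_description_maps` and `status.failed` come from the PR description.

```python
import logging
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, Iterator, List, Set

logger = logging.getLogger("mssql_sketch")


@dataclass
class Status:
    """Hypothetical stand-in for the workflow status object."""

    failures: List[str] = field(default_factory=list)

    def failed(self, name: str, error: str) -> None:
        self.failures.append(f"{name}: {error}")


class DescriptionLoader:
    """Hypothetical stand-in for the MSSQL source; not the real class."""

    def __init__(self, run_query: Callable[[str], Dict[str, str]]):
        self.run_query = run_query
        self.descriptions: Dict[str, str] = {}
        self.encrypted_procedures: Set[str] = set()
        self.status = Status()

    def load_description_maps(self, database: str) -> None:
        # Reset per-database state so nothing leaks from the previous database.
        self.encrypted_procedures = set()
        self.descriptions = {}
        try:
            self.descriptions = self.run_query(database)
        except Exception as exc:
            # Descriptions are optional: log and continue instead of aborting.
            logger.warning("Could not load descriptions for %s: %s", database, exc)

    def database_names(
        self, candidates: Iterable[str], connect: Callable[[str], None]
    ) -> Iterator[str]:
        for name in candidates:
            try:
                connect(name)
            except Exception as exc:
                # Record the failure so it shows up in the run summary.
                self.status.failed(name, str(exc))
                continue
            yield name
```

In this sketch a failing description query leaves an empty map and the run continues, while a database that cannot be connected is skipped and listed under `status.failures`.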

Query-log truncation warning (shared `lineage_source.py`, `usage_source.py`)

Warn when the query log returns resultLimit rows: at that point older queries are truncated and lineage/usage may be incomplete. This is a small, log-only addition in the shared base, so it benefits every SQL connector that uses the query-log lineage/usage path.
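
As a rough sketch of where such a check could sit, assuming the query log is consumed as an iterable of rows: `yield_query_log` is a hypothetical helper, not the function in `lineage_source.py` or `usage_source.py`, and the message text is abbreviated.

```python
import logging
from typing import Iterable, Iterator, Optional

logger = logging.getLogger("query_log_sketch")


def yield_query_log(rows: Iterable[dict], result_limit: Optional[int]) -> Iterator[dict]:
    """Yield rows unchanged, then warn if the row count reached result_limit."""
    row_count = 0
    for row in rows:
        row_count += 1
        yield row
    # A falsy limit (None or 0) is treated as "no limit configured": never warn.
    if result_limit and row_count >= result_limit:
        logger.warning(
            "Reached the configured resultLimit of %s query log entries; "
            "older queries may be truncated.",
            result_limit,
        )
```

Because the check runs only after the rows are exhausted, it adds no per-row work beyond a counter and does not change what is yielded.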

Tests

Unit: _load_description_maps degrades gracefully on a failing description query, and resets the encrypted-procedure cache on each database.
Existing encrypted-procedure / stored-procedure tests continue to cover the related paths.

Notes

The metadata.py changes are MSSQL-specific; the truncation warning is in the shared lineage/usage base.

github-actions · 2026-06-08T18:57:42Z

❌ PR checklist incomplete

This PR cannot be merged until the following are addressed on its linked issue:

No GitHub issue is linked. Link an issue in the Development section of the PR (or add Fixes #12345 to the description). For a same-org cross-repo issue, add Fixes open-metadata/<repo>#123 to the description.

The fields live on the linked issue in the Shipping project (open the issue → right sidebar → Projects). After you set them, re-run this check (or push a commit) — issue/project changes do not re-trigger it automatically.

Maintainers can bypass this check by adding the skip-pr-checks label.

github-actions · 2026-06-08T18:57:58Z

Hi there 👋 Thanks for your contribution!

The OpenMetadata team will review the PR shortly! Once it has been labeled as safe to test, the CI workflows
will start executing and we'll be able to make sure everything is working as expected.

Let us know if you need any help!

gitar-bot · 2026-06-08T19:00:11Z

+                if result_limit and row_count >= result_limit:
+                    logger.warning(
+                        f"Reached the configured resultLimit of {result_limit} query log entries; "
+                        f"older queries may be truncated and lineage incomplete. "
+                        f"Consider increasing resultLimit."
+                    )


💡 Edge Case: resultLimit warning can be a false positive at exactly the limit

The truncation warning triggers on row_count >= result_limit. When the query log contains exactly resultLimit rows with no actual truncation, the warning still fires, telling users data may be incomplete when it is not. This is inherent to using row count as a truncation proxy and is acceptable for a log-only message, but consider clarifying the wording (e.g. "reached the resultLimit; if more queries exist they may be truncated") to avoid alarming users when the count merely equals the limit. Identical wording is duplicated in both lineage_source.py and usage_source.py.
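
One way to avoid the false positive, which is not what this PR does, would be to request one row more than the limit and treat the extra row as proof of truncation. The sketch below assumes the rows arrive as a plain iterable; `take_with_truncation_flag` is a hypothetical helper, and whether a connector's query can be issued with `resultLimit + 1` is an open assumption.

```python
from itertools import islice
from typing import Iterable, List, Tuple


def take_with_truncation_flag(
    rows: Iterable[dict], result_limit: int
) -> Tuple[List[dict], bool]:
    """Return at most result_limit rows and whether more rows existed."""
    # Read one row beyond the limit; its presence means rows were cut off.
    fetched = list(islice(rows, result_limit + 1))
    truncated = len(fetched) > result_limit
    return fetched[:result_limit], truncated
```

With exactly `result_limit` rows available the flag stays `False`, so a warning keyed on it would fire only when rows were actually dropped.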


…ncation

Metadata (mssql/metadata.py):
- Guard the single-database description load so a failed (optional) description query no longer aborts the whole run. The load and per-database cache reset are moved into _load_description_maps and degrade gracefully.
- Record databases that fail to connect in the workflow status (status.failed) instead of only logging, so they show up in the run summary.
- Raise encrypted-stored-procedure detection failures from DEBUG to WARNING; they were previously silent and the procedures treated as non-encrypted.
- Reset the per-database encrypted-procedure cache on each database.

Lineage/usage (shared lineage_source.py, usage_source.py):
- Warn when the query log returns resultLimit rows, since older queries are then truncated and lineage/usage may be incomplete.

gitar-bot · 2026-06-09T05:35:58Z

Code Review 👍 Approved with suggestions 1 resolved / 2 findings

Hardens MSSQL metadata extraction and adds query-log truncation warnings, resolving missing test coverage for status tracking. Consider adjusting the result limit warning to account for scenarios where the row count exactly matches the limit to prevent potential false positives.

💡 Edge Case: resultLimit warning can be a false positive at exactly the limit

📄 ingestion/src/metadata/ingestion/source/database/lineage_source.py:339-344 📄 ingestion/src/metadata/ingestion/source/database/usage_source.py:154-159

The truncation warning triggers on row_count >= result_limit. When the query log contains exactly resultLimit rows with no actual truncation, the warning still fires, telling users data may be incomplete when it is not. This is inherent to using row count as a truncation proxy and is acceptable for a log-only message, but consider clarifying the wording (e.g. "reached the resultLimit; if more queries exist they may be truncated") to avoid alarming users when the count merely equals the limit. Identical wording is duplicated in both lineage_source.py and usage_source.py.

✅ 1 resolved

✅ Quality: New status.failed and resultLimit-warning paths lack tests

📄 ingestion/src/metadata/ingestion/source/database/mssql/metadata.py:240-246 📄 ingestion/src/metadata/ingestion/source/database/lineage_source.py:338-344 📄 ingestion/src/metadata/ingestion/source/database/usage_source.py:153-159
The PR adds two behavior changes that are not covered by tests, while testing guidance targets ~90% coverage for changes:

get_database_names now records failed database connections via self.status.failed(...) in the all-databases branch (metadata.py:240-246). The added tests only cover _load_description_maps; there is no test asserting that a database which fails set_inspector is added to status.failed and is not yielded.

The resultLimit truncation warning in lineage_source.py:338-344 and usage_source.py:153-159 is untested. A small test asserting the warning fires when row_count >= resultLimit (and not otherwise) would lock in the behavior.

These are log/status-only paths so impact is low, but adding coverage would protect against regressions.
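
A test of the kind suggested for the warning path might look like the sketch below. It exercises a hypothetical helper, `warn_if_truncated`, that mirrors the reviewed condition; the real tests would target the code in `lineage_source.py` and `usage_source.py` instead.

```python
import logging
import unittest

logger = logging.getLogger("truncation_test_sketch")


def warn_if_truncated(row_count, result_limit):
    """Hypothetical helper mirroring the condition under review."""
    if result_limit and row_count >= result_limit:
        logger.warning("Reached the configured resultLimit of %s", result_limit)
        return True
    return False


class TruncationWarningTest(unittest.TestCase):
    def test_warns_at_or_above_limit(self):
        for rows in (100, 101):
            # assertLogs fails the test if no WARNING is emitted in the block.
            with self.assertLogs(logger, level="WARNING") as captured:
                self.assertTrue(warn_if_truncated(rows, 100))
            self.assertIn("resultLimit", captured.output[0])

    def test_no_warning_below_limit_or_without_limit(self):
        # The helper's return value shows the warning branch was not taken.
        self.assertFalse(warn_if_truncated(99, 100))
        self.assertFalse(warn_if_truncated(500, None))
```

The first case pins the `>=` boundary, so a later change to `>` would fail the test at exactly `resultLimit` rows.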

sonarqubecloud · 2026-06-09T06:45:53Z

Quality Gate passed for 'open-metadata-ingestion'

Issues
5 New issues
0 Accepted issues

Measures
0 Security Hotspots
73.1% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarQube Cloud

github-actions · 2026-06-09T07:52:22Z

🟡 Playwright Results — all passed (13 flaky)

✅ 4270 passed · ❌ 0 failed · 🟡 13 flaky · ⏭️ 88 skipped

Shard	Passed	Flaky	Skipped
🟡 Shard 1	300	1	4
🟡 Shard 2	805	1	9
🟡 Shard 3	803	1	8
🟡 Shard 4	845	2	12
🟡 Shard 5	720	1	47
🟡 Shard 6	797	7	8

🟡 13 flaky test(s) (passed on retry)

Flow/SearchRBAC.spec.ts › User without permission (shard 1, 1 retry)
Features/Glossary/GlossaryHierarchy.spec.ts › should cancel move operation (shard 2, 1 retry)
Features/RTL.spec.ts › Verify Following widget functionality (shard 3, 2 retries)
Flow/PersonaDeletionUserProfile.spec.ts › User profile loads correctly before and after persona deletion (shard 4, 1 retry)
Pages/CustomProperties.spec.ts › Markdown (shard 4, 1 retry)
Pages/ExplorePageRightPanel_KnowledgeCenter.spec.ts › Should remove user owner for knowledgeCenter (shard 5, 1 retry)
Pages/Lineage/LineageFilters.spec.ts › Verify lineage service type filter selection (shard 6, 1 retry)
Pages/Lineage/LineageFilters.spec.ts › Verify lineage schema filter selection (shard 6, 1 retry)
Pages/Lineage/LineageRightPanel.spec.ts › Verify custom properties tab IS visible for supported type: searchIndex (shard 6, 1 retry)
Pages/Lineage/PlatformLineage.spec.ts › Verify domain platform view (shard 6, 1 retry)
Pages/TasksUIFlow.spec.ts › Create and reject tag task for Dashboard via UI (shard 6, 1 retry)
Pages/TestSuite.spec.ts › Logical TestSuite (shard 6, 1 retry)
Pages/UserDetails.spec.ts › Admin user can edit teams from the user profile (shard 6, 1 retry)

How to debug locally

# Download playwright-test-results-<shard> artifact and unzip
npx playwright show-trace path/to/trace.zip    # view trace

Khairajani requested a review from a team as a code owner June 8, 2026 18:57

gitar-bot Bot reviewed Jun 8, 2026

Comment thread ingestion/src/metadata/ingestion/source/database/mssql/metadata.py

gitar-bot Bot reviewed Jun 8, 2026

Khairajani force-pushed the mssql-p0-reliability branch from 9ec5a76 to 86ba917 Compare June 9, 2026 05:33

Khairajani added the `safe to test` label (description: "Add this label to run secure Github workflows on PRs") Jun 9, 2026

Khairajani added this to Shipping Jun 9, 2026

Khairajani self-assigned this Jun 9, 2026

Khairajani temporarily deployed to test June 9, 2026 05:45 — with GitHub Actions Inactive

Khairajani removed this from Shipping Jun 9, 2026

Khairajani wants to merge 1 commit into open-metadata:main from Khairajani:mssql-p0-reliability
