Skip to content

Fall back to single-DB mode when Azure master is inaccessible (#857)#858

Merged
erikdarlingdata merged 1 commit intodevfrom
fix/857-azure-master-fallback
Apr 17, 2026
Merged

Fall back to single-DB mode when Azure master is inaccessible (#857)#858
erikdarlingdata merged 1 commit intodevfrom
fix/857-azure-master-fallback

Conversation

@erikdarlingdata
Copy link
Copy Markdown
Owner

Summary

  • Fixes [FEATURE] Review/lower required permission for the Azure server to user database level(no master) #857 — Lite failed and showed an empty screen when connecting to an Azure SQL DB login that lacks master access (common with Microsoft Dynamics 365 FO and other single-tenant logins).
  • GetAzureDatabaseListAsync now catches known access-denied/login-failed SQL errors from the master connection, caches the per-server decision, and returns the connection's InitialCatalog as a single-element list.
  • The three callers (query_stats, database_size_stats, file_io_stats) already iterate per-database — single-DB mode works without further changes.
  • Users with master access retain today's full-server enumeration behavior.

Behavior matrix

InitialCatalog Master accessible Result
empty yes enumerate all DBs (unchanged)
empty no nothing collected, warning logged
set to a DB yes enumerate all DBs (unchanged)
set to a DB no collect from that one DB (new, fixes #857)

Test plan

  • Connect Lite to an Azure SQL DB login with master access — enumeration still works, all DBs appear.
  • Connect Lite to an Azure SQL DB login without master access and a specified target DB — no error popup, data collected for target DB only, info log line appears.
  • Connect Lite to an Azure SQL DB login without master access and no target DB — warning log line appears, no crash.
  • Confirm the fallback decision is cached: subsequent cycles don't retry master (check logs don't show repeated fallback messages).
  • Boxed SQL Server and Azure MI unaffected (they don't call GetAzureDatabaseListAsync).

🤖 Generated with Claude Code

…857)

On Azure SQL DB, some logins (e.g. Microsoft Dynamics 365 FO) are granted
access only to a specific user database and not to master. The three
collectors that enumerate databases via master — query_stats,
database_size_stats, file_io_stats — would fail the first time and
produce an empty screen.

GetAzureDatabaseListAsync now catches known access-denied/login-failed
errors from the master connection, caches the per-server decision, and
returns the connection's InitialCatalog as a single-element list. The
three callers already loop per-database, so single-DB mode works without
further changes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown
Owner Author

@erikdarlingdata erikdarlingdata left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review: #858 — Azure master fallback

What this does

On Azure SQL DB, GetAzureDatabaseListAsync now catches specific SQL errors indicating master is inaccessible, caches that decision per server under a lock, and returns the connection's InitialCatalog as a single-element list. The three per-database callers (query_stats, database_size_stats, file_io_stats) already iterate per DB, so they work unchanged. Server-level collectors (which don't call this method) are unaffected.

What's good

  • Scope is right: Lite-only change, single file, narrowly targeted at the issue.
  • Base branch is dev. No schema, install, upgrade, or workflow changes.
  • No PlanAnalyzer or Dashboard/Lite-duplicated code touched — sync rule N/A.
  • Exception filter is specific (when (IsMasterAccessDeniedError(ex))), so unrelated SqlExceptions still bubble up rather than being swallowed.
  • using disposal for SqlConnection/SqlCommand/SqlDataReader is correct.
  • Cache is behind a lock; no async void; no cross-thread UI work.
  • Log message distinguishes the info-level "collecting from X only" case from the warn-level "nothing to collect" case.

What needs attention

  1. Error list is too broad (IsMasterAccessDeniedError, RemoteCollectorService.cs#L649-L656): 18456 and 40615 are generic auth/firewall failures, not "master-only" lockouts. When the connection is outright broken, the fallback cascades into failed per-DB opens but logs a misleading "master inaccessible" message first. Consider narrowing to 229/230/916/4060/40613. Inline comment has detail.
  2. Cache never clears (RemoteCollectorService.cs#L105-L108): Regranting master requires a service restart to pick up. Intentional per the PR description; worth a one-line XML-doc note so it isn't a surprise later.
  3. Dictionary<int, bool> reads as tri-state but isn't — value is always true when present. HashSet<int> would be clearer.
  4. Double-log on concurrent first failure — two racing cycles will each emit the Info/Warn line once. Cosmetic, but the fix is a single check-and-set under the lock.
  5. No tests added. Lite.Tests has no RemoteCollectorService coverage at all, so this is consistent with existing patterns rather than a regression, but the fallback logic (error-code dispatch, SingleDbOrEmpty, cache behavior) is pure and testable without a live Azure endpoint. Worth adding at least unit coverage for IsMasterAccessDeniedError and SingleDbOrEmpty if you want a safety net for future tuning of the error list.

Test plan check

The manual test plan in the PR description covers the right matrix (master accessible / not × target DB set / not) and explicitly checks the cache, which is the non-obvious part. No automated coverage to run.

Nothing blocking from my side — maintainer's call.


Generated by Claude Code

/// <summary>
/// Per-server flag indicating that master DB enumeration has failed with an access-denied
/// error and should not be retried. Used on Azure SQL DB where per-database logins may not
/// have master access (e.g. Microsoft Dynamics 365 FO). See issue #857.
Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: the value is always true when the key is present — a HashSet<int> would express intent more clearly and avoid the misleading TryGetValue(..., out knownInaccessible) pattern where the false branch actually means "not yet determined" rather than "accessible". Not blocking.


Generated by Claude Code

Comment on lines +649 to +656
return ex.Number switch
{
229 => true, // Permission denied on object
230 => true, // Permission denied on column
916 => true, // Server principal is not able to access the database under the current security context
4060 => true, // Cannot open database requested by the login
18456 => true, // Login failed for user
40613 => true, // Database 'master' on server is not currently available
Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two of these aren't really "master inaccessible" signals:

  • 18456 (Login failed for user) — the login is wrong/disabled/locked. The per-DB connection opened later by OpenAzureDatabaseConnectionAsync will fail for the exact same reason.
  • 40615 (Cannot open server — login denied, firewall/auth) — usually a firewall IP rule or tenant-auth issue. Again, per-DB connections will also fail.

Treating these as "fallback to single DB" masks a broken connection as "master inaccessible (SQL error 18456) — collecting from '…' only", and then every per-DB collector fails on the next line. Consider letting 18456/40615 bubble up as real connection errors, and reserving the fallback for errors that genuinely indicate master-only lockout (229/230/916/4060/40613).

Also worth noting: the error list only ever grows the inaccessibility cache; if a login is later granted master access, this process won't pick it up until restart. Intentional per the PR description, but worth a one-line doc note on _azureMasterInaccessible.


Generated by Claude Code

Comment on lines +636 to +640
private static List<string> SingleDbOrEmpty(string? targetDb)
{
if (string.IsNullOrEmpty(targetDb) || string.Equals(targetDb, "master", StringComparison.OrdinalIgnoreCase))
return new List<string>();
return new List<string> { targetDb };
Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor: comparing targetDb against "master" case-insensitively is good, but tempdb/msdb/model would also be meaningless fallback targets on Azure SQL DB (not that they'd normally appear in InitialCatalog, but defensive). More practically: what if targetDb is quoted/bracketed like [mydb]? SqlConnectionStringBuilder.InitialCatalog strips the brackets so this should be fine — just noting the assumption.


Generated by Claude Code

Comment on lines +616 to +621
catch (SqlException ex) when (IsMasterAccessDeniedError(ex))
{
lock (_azureMasterInaccessibleLock)
{
_azureMasterInaccessible[serverId] = true;
}
Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two concurrent cycles racing here both open master, both fail, both grab the lock in turn, both set [serverId] = true, and both emit the AppLogger.Info/Warn line — so the first cycle after permission loss can double-log. Not a correctness problem, just cosmetic noise in the log on the transition. If you want to suppress the second log, check-and-set under a single lock and only log when the flag flips from unset to set.


Generated by Claude Code

@erikdarlingdata erikdarlingdata merged commit d5f586c into dev Apr 17, 2026
3 checks passed
@erikdarlingdata erikdarlingdata deleted the fix/857-azure-master-fallback branch April 17, 2026 13:14
erikdarlingdata added a commit that referenced this pull request Apr 18, 2026
On Azure SQL Database, logins without access to master can't resolve
cross-database rows returned by sys.dm_exec_requests, which caused the
Live Snapshot button and the query snapshots collector to error in
D365FO-style environments (reported by @TrudAX in #857 after PR #858).

BuildQuerySnapshotsQuery now takes an isAzureSqlDatabase flag and emits
AND der.database_id = DB_ID() only when true. Boxed SQL Server, MI, and
elastic pool behavior is unchanged. The Live Snapshot button path gets
the flag through a new ServerTab constructor parameter wired from the
cached ServerConnectionStatus.SqlEngineEdition.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant