auth: keep keyring backend on probe timeout during login by simonfaltum · Pull Request #5210 · databricks/cli

simonfaltum · 2026-05-07T16:39:27Z

Why

When databricks auth login runs with secure storage and the OS keyring is reachable but locked, the probe triggers the GUI unlock prompt and immediately races a 3 second timeout. If the user takes longer than 3 seconds to type their password, the probe returns a TimeoutError and the entire login commits to plaintext, even though the user goes on to successfully unlock the keyring while OAuth is running in the browser.

End result: user types keyring password, finishes OAuth, token still lands in plaintext. Confusing and easy to miss.

Changes

Before: any probe error (timeout or otherwise) caused secure-default logins to silently fall back to plaintext, and explicit-secure logins to error.

Now: probe errors split into two cases.

*TimeoutError: keyring is reachable but locked. Login stays on keyring regardless of whether secure was explicit. The unlock prompt continues in parallel with OAuth, and by the time the SDK calls Store at the end the keyring has been unlocked.
Any other error: keyring is genuinely unavailable. Existing behavior unchanged (silent plaintext fallback when not explicit, error when explicit).

The probe still triggers the unlock prompt up front, so the prompt is visible while the user is also doing OAuth. That overlap is the whole point: it gives the user time to unlock without the CLI giving up.

Trade-off

If the user does not unlock the keyring within the probe's 3 second window, login proceeds with the keyring backend on the assumption that the user will type their password during the OAuth ceremony. If instead they cancel the dialog, ignore it, or otherwise never enter the password, the SDK's final Store at the end of login triggers a second unlock attempt and times out after 3 seconds, failing the login.

In other words: cooperative users who actually want to use the keyring see one prompt and succeed. Users who chose not to use the keyring after seeing it see a second prompt and a 3 second wait at the end. We accept this compromise on the assumption that users who hit the dialog at all are lawful-good users working with the CLI, not against it.

Test plan

New unit tests in cache_test.go cover the TimeoutError path: bare and wrapped, default and explicit secure (4 sub cases)
Existing probe-fail tests still pass (plain errors still fall back / error as before)
task checks clean
task lint-q clean
Manual on Ubuntu: lock the default secret service collection, run databricks auth login with secure storage, deliberately wait >10s before entering the keyring password, confirm the token lands in keyring (not in the plaintext file cache)
Manual on Ubuntu: same setup but cancel the dialog after the probe times out, confirm login fails after a second 3 second wait at the end of OAuth (the documented trade-off)

A locked-but-reachable OS keyring surfaces as a probe TimeoutError. Today that's treated the same as a genuinely unavailable keyring: login commits to plaintext, even though the user is mid-typing in the unlock dialog and the keyring is about to be unlocked. Distinguish *TimeoutError from other probe errors. On timeout, stay on keyring regardless of whether secure was explicit; the unlock continues in parallel with the OAuth flow and the final Store completes against an unlocked keyring. Other probe errors keep current behavior.

The wrapper produces TimeoutError for any operation that misses the deadline, not only locked-but-reachable keyrings. The login path's optimistic stay-on-keyring is still right (cost of guessing wrong is one wasted OAuth ceremony, not a silently-plaintext token), but the docstring and inline comment overclaimed by labeling timeout as 'reachable but locked'.

The doc and inline comments narrated the trade-off in too much detail. Comments should explain WHY when non-obvious, not walk through every branch. The decision and its trade-off are clear in the code; the comment just needs to flag that the timeout case is intentional and not a silent downgrade.

github-actions · 2026-05-08T08:01:14Z

Approval status: pending

`/cmd/auth/` - needs approval

Files: cmd/auth/login.go
Suggested: @tanmay-db
Also eligible: @tejaskochar-db, @hectorcast-db, @mihaimitrea-db, @renaudhartert-db, @parthban-db, @Divyansh-db, @chrisst, @rauchy

`/libs/auth/` - needs approval

Files: libs/auth/storage/cache.go, libs/auth/storage/cache_test.go, libs/auth/storage/keyring.go
Suggested: @tanmay-db
Also eligible: @tejaskochar-db, @hectorcast-db, @mihaimitrea-db, @renaudhartert-db, @parthban-db, @Divyansh-db, @chrisst, @rauchy

General files (require maintainer)

Files: NEXT_CHANGELOG.md
Based on git history:

@pietern -- recent work in cmd/auth/, ./

_{Any maintainer (@andrewnester, @anton-107, @denik, @pietern, @shreyas-goenka, @renaudhartert-db) can approve all areas.

See OWNERS for ownership rules.}

pietern · 2026-05-08T12:41:37Z

+//     the final Store rather than silently downgrading to plaintext.
 //
-// Both rules are dormant today: the resolver default is plaintext, so
+// Rules 1 and 2 are dormant today: the resolver default is plaintext, so


Meta point, why is this called cache.go and ResolveCache if it is a storage backend?

Or is this a cache layer on top of it?

Yeah, this is a remnant from earlier iterations. It started as cmd/auth/cache.go with a newAuthCache() helper that literally just returned a cache.TokenCache (the SDK type at databricks-sdk-go/credentials/u2m/cache). Back then the name matched what it did.

To answer the second part directly: no, it's not a cache layer on top of a storage backend. The SDK's cache.TokenCache interface is the storage backend abstraction, that naming is just bleeding through. Since then this file has grown probe logic, login fallback, and the dual-write wrapper, so it's really a storage backend resolver now.

If we want to do a rename for clarity, I would suggest a follow-up PR (cache.go -> resolve.go, ResolveCache -> ResolveStorage). ~5 call sites, small.

pietern · 2026-05-08T12:43:00Z

+// Rules 1 and 2 are dormant today: the resolver default is plaintext, so
 // (mode=Secure, explicit=false) is unreachable. They activate when the
 // default flips to secure (MS4 / cli-ga). Wiring lands now so MS4 is a
 // single-line default flip plus pin-on-success additions.


I don't understand the above.

Tried to clarify it a bit. Reads clearer now?

pietern · 2026-05-08T12:47:27Z

+			if errors.As(probeErr, &timeoutErr) {
+				log.Debugf(ctx, "keyring probe timed out (%v); staying on keyring", probeErr)
+				return f.newKeyring(), mode, nil
+			}


Under which conditions does a timeout happen?

One example is if the keyring is locked, we can't know if the user is typing the password or not. If we wait for the keyring to become available, we could be blocking forever with potentially no way for the user to take action. So then the timeout would trigger

To add some precision: the common case is a locked OS keyring on Linux. The Secret Service blocks the Set call while waiting for the user to type their password into a GUI unlock dialog. The keyring API gives us no progress signal and no cancel, so we wrap the call in a 3s timeout (matches GitHub CLI's pattern, see defaultKeyringTimeout in keyring.go). If the user takes longer than 3s to enter the password (common), the wrapper returns a *TimeoutError. The dialog itself stays up, which is what this PR exploits: keep the keyring backend, let OAuth proceed in parallel, and the user finishes unlocking before the SDK's final Store call.

Drops internal milestone jargon (MS4 / cli-ga) from doc comments and aligns wording with the package doc in mode.go (which already says "at GA"). No behavior change. Co-authored-by: Isaac

simonfaltum temporarily deployed to test-trigger-is May 7, 2026 16:40 — with GitHub Actions Inactive

simonfaltum temporarily deployed to test-trigger-is May 7, 2026 16:56 — with GitHub Actions Inactive

simonfaltum temporarily deployed to test-trigger-is May 7, 2026 17:00 — with GitHub Actions Inactive

Merge branch 'main' into simonfaltum/keyring-probe-non-fatal

cebf6fc

simonfaltum temporarily deployed to test-trigger-is May 8, 2026 07:35 — with GitHub Actions Inactive

simonfaltum marked this pull request as ready for review May 8, 2026 08:00

simonfaltum requested a review from pietern May 8, 2026 08:34

pietern reviewed May 8, 2026

View reviewed changes

simonfaltum and others added 2 commits May 8, 2026 15:23

auth: clarify dormant rule comments in storage resolver

89bc483

Drops internal milestone jargon (MS4 / cli-ga) from doc comments and aligns wording with the package doc in mode.go (which already says "at GA"). No behavior change. Co-authored-by: Isaac

Merge branch 'main' into simonfaltum/keyring-probe-non-fatal

97288bf

simonfaltum temporarily deployed to test-trigger-is May 8, 2026 13:55 — with GitHub Actions Inactive

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

auth: keep keyring backend on probe timeout during login#5210

auth: keep keyring backend on probe timeout during login#5210
simonfaltum wants to merge 6 commits intomainfrom
simonfaltum/keyring-probe-non-fatal

simonfaltum commented May 7, 2026 •

edited

Loading

Uh oh!

github-actions Bot commented May 8, 2026 •

edited

Loading

Uh oh!

pietern May 8, 2026

Uh oh!

simonfaltum May 8, 2026

Uh oh!

pietern May 8, 2026

Uh oh!

simonfaltum May 8, 2026

Uh oh!

pietern May 8, 2026

Uh oh!

simonfaltum May 8, 2026

Uh oh!

simonfaltum May 8, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

simonfaltum commented May 7, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Why

Changes

Trade-off

Test plan

Uh oh!

github-actions Bot commented May 8, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Approval status: pending

/cmd/auth/ - needs approval

/libs/auth/ - needs approval

General files (require maintainer)

Uh oh!

pietern May 8, 2026

Choose a reason for hiding this comment

Uh oh!

simonfaltum May 8, 2026

Choose a reason for hiding this comment

Uh oh!

pietern May 8, 2026

Choose a reason for hiding this comment

Uh oh!

simonfaltum May 8, 2026

Choose a reason for hiding this comment

Uh oh!

pietern May 8, 2026

Choose a reason for hiding this comment

Uh oh!

simonfaltum May 8, 2026

Choose a reason for hiding this comment

Uh oh!

simonfaltum May 8, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

simonfaltum commented May 7, 2026 •

edited

Loading

github-actions Bot commented May 8, 2026 •

edited

Loading

`/cmd/auth/` - needs approval

`/libs/auth/` - needs approval