auth: keep keyring backend on probe timeout during login #5210
```diff
@@ -2,6 +2,7 @@ package storage

 import (
 	"context"
+	"errors"
 	"fmt"

 	"github.com/databricks/cli/libs/databrickscfg"
```
|
```diff
@@ -54,11 +55,13 @@ func ResolveCache(ctx context.Context, override StorageMode) (cache.TokenCache,
 // 2. When the user explicitly asked for secure (override, env var, or
 //    config) but the keyring is unreachable, return an error. An explicit
 //    "I want secure" is honored strictly: never silently downgrade.
+// 3. When the probe times out, stay on keyring regardless of explicit.
+//    The timeout is ambiguous (locked vs hung); a misdiagnosis fails
+//    the final Store rather than silently downgrading to plaintext.
 //
-// Both rules are dormant today: the resolver default is plaintext, so
+// Rules 1 and 2 are dormant today: the resolver default is plaintext, so
 // (mode=Secure, explicit=false) is unreachable. They activate when the
-// default flips to secure (MS4 / cli-ga). Wiring lands now so MS4 is a
-// single-line default flip plus pin-on-success additions.
+// default flips to secure at GA.
 //
 // Login-specific. Read paths (auth token, bundle commands) keep the original
 // keyring error so they don't silently mint plaintext copies of tokens that
```
|
|
```diff
@@ -120,7 +123,7 @@ func resolveCacheForLoginWith(ctx context.Context, override StorageMode, f cache
 //
 // Pin-on-success across modes (locking in the first working behavior to
 // insulate users from keyring flakiness) is intentionally not implemented
-// here. It lands with MS4 alongside the default flip; pinning before the
+// here. It lands at GA alongside the default flip; pinning before the
 // flip would freeze every default user into plaintext and make the flip a
 // no-op for them.
 func applyLoginFallback(ctx context.Context, mode StorageMode, explicit bool, f cacheFactories) (cache.TokenCache, StorageMode, error) {
```
|
|
```diff
@@ -133,6 +136,15 @@ func applyLoginFallback(ctx context.Context, mode StorageMode, explicit bool, f
 		return c, mode, nil
 	case StorageModeSecure:
 		if probeErr := f.probeKeyring(); probeErr != nil {
+			// Stay on keyring on timeout: a locked keyring being unlocked
+			// during OAuth is the common case, and a misdiagnosed hang
+			// fails the final Store anyway, which is better than a
+			// silent plaintext downgrade.
+			var timeoutErr *TimeoutError
+			if errors.As(probeErr, &timeoutErr) {
+				log.Debugf(ctx, "keyring probe timed out (%v); staying on keyring", probeErr)
+				return f.newKeyring(), mode, nil
+			}
```
|
**Contributor**

Under which conditions does a timeout happen?

**Member (Author)**

One example is if the keyring is locked: we can't know whether the user is typing the password or not. If we waited for the keyring to become available, we could block forever with potentially no way for the user to take action. So the timeout triggers instead.

**Member (Author)**

To add some precision: the common case is a locked OS keyring on Linux. The Secret Service blocks the Set call while waiting for the user to type their password into a GUI unlock dialog. The keyring API gives us no progress signal and no way to cancel, so we wrap the call in a 3s timeout (matches GitHub CLI's pattern, see …)
```diff
 			if explicit {
 				return nil, "", fmt.Errorf("secure storage was requested but the OS keyring is not reachable: %w", probeErr)
 			}
```
|
```diff
@@ -159,7 +171,7 @@ func applyLoginFallback(ctx context.Context, mode StorageMode, explicit bool, f
 // Only called on the (mode=Secure, explicit=false) probe-failure branch. That
 // branch is unreachable today (resolver default is plaintext), so this is
 // dormant infrastructure: it activates when the default flips to secure
-// (MS4) and lets default-on-broken-keyring users avoid a 3s probe on every
+// at GA and lets default-on-broken-keyring users avoid a 3s probe on every
 // command.
 func persistPlaintextFallback(ctx context.Context) error {
 	configPath := env.Get(ctx, "DATABRICKS_CONFIG_FILE")
```
**Contributor**

Meta point: why is this called `cache.go` and `ResolveCache` if it is a storage backend? Or is this a cache layer on top of it?

**Member (Author)**

Yeah, this is a remnant from earlier iterations. It started as `cmd/auth/cache.go` with a `newAuthCache()` helper that literally just returned a `cache.TokenCache` (the SDK type at `databricks-sdk-go/credentials/u2m/cache`). Back then the name matched what it did.

To answer the second part directly: no, it's not a cache layer on top of a storage backend. The SDK's `cache.TokenCache` interface is the storage backend abstraction; that naming is just bleeding through. Since then this file has grown probe logic, login fallback, and the dual-write wrapper, so it's really a storage backend resolver now.

If we want to do a rename for clarity, I would suggest a follow-up PR (`cache.go` -> `resolve.go`, `ResolveCache` -> `ResolveStorage`). ~5 call sites, small.