Skip to content

Fix race condition between SSH client and ProxyCommand process#5001

Open
anton-107 wants to merge 1 commit intomainfrom
fix-ssh-proxy-key-race
Open

Fix race condition between SSH client and ProxyCommand process#5001
anton-107 wants to merge 1 commit intomainfrom
fix-ssh-proxy-key-race

Conversation

@anton-107
Copy link
Copy Markdown
Contributor

@anton-107 anton-107 commented Apr 16, 2026

Summary

Additional take on #4988

  • Skip key operations in proxy mode: The ProxyCommand (proxy mode) process was unconditionally running key generation and SaveSSHKeyPair, which wiped the SSH keys directory while the parent SSH client was concurrently reading the identity file — causing "Permission denied (publickey)" errors. Proxy mode only proxies stdin/stdout over a WebSocket and never needs SSH keys.
  • Fix SaveSSHKeyPair to not wipe sibling sessions: os.RemoveAll(filepath.Dir(keyPath)) removed the entire ~/.databricks/ssh-tunnel-keys/ directory instead of just the current session's files, breaking concurrent connections to different clusters.
  • Add unit tests for SaveSSHKeyPair: Including a regression test that verifies saving one session's keys does not delete another session's keys.

Test plan

  • Existing unit tests pass (go test ./experimental/ssh/internal/client/ and go test ./experimental/ssh/internal/keys/)
  • New TestSaveSSHKeyPairDoesNotDeleteOtherSessions validates the concurrent session fix
  • Linter passes (make lint)
  • Manual test: databricks ssh connect --cluster=<id> should no longer intermittently fail with "Permission denied (publickey)"

This pull request was AI-assisted by Isaac.

The proxy mode process was unconditionally running key generation and
SaveSSHKeyPair, which wiped the SSH keys directory while the parent SSH
client was trying to read the identity file — causing "Permission denied
(publickey)" errors.

Two fixes:
1. Skip key operations (secret scope creation, key generation,
   SaveSSHKeyPair) in proxy mode — it only proxies stdin/stdout over a
   WebSocket and never needs SSH keys.
2. Fix SaveSSHKeyPair to remove only the specific session's key files
   instead of os.RemoveAll on the entire parent directory, which also
   broke concurrent sessions to different clusters.

Co-authored-by: Isaac
@github-actions
Copy link
Copy Markdown

Waiting for approval

Based on git history, these people are best suited to review:

  • @ilia-db -- recent work in experimental/ssh/internal/client/, experimental/ssh/internal/keys/

Eligible reviewers: @andrewnester, @denik, @pietern, @renaudhartert-db, @shreyas-goenka, @simonfaltum

Suggestions based on git history. See OWNERS for ownership rules.

@anton-107 anton-107 requested a review from ilia-db April 16, 2026 16:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant