Skip to content

fix(cli): cp-style sandbox download and workspace-boundary check#1353

Open
laitingsheng wants to merge 3 commits into
mainfrom
fix/sandbox-download-traversal-and-file-shape
Open

fix(cli): cp-style sandbox download and workspace-boundary check#1353
laitingsheng wants to merge 3 commits into
mainfrom
fix/sandbox-download-traversal-and-file-shape

Conversation

@laitingsheng
Copy link
Copy Markdown
Contributor

Summary

openshell sandbox download produced a directory at the host destination instead of writing the source bytes as a regular file, and did not refuse sandbox-side paths that resolved outside /sandbox. This brings the download path to parity with the upload-side cp-style fix shipped in #694 (#667) and #952 (#885), and adds the explicit workspace-boundary check that the upload side never needed.

Related Issue

Fixes NVIDIA/NemoClaw#3345

Changes

  • sandbox_sync_down now validates the sandbox-side source path with a lexical workspace-boundary check before issuing any SSH command. Paths that resolve outside /sandbox (e.g. /etc/passwd, /sandbox/../etc/passwd, /sandboxed/secrets) are refused with a clear error.
  • Pre-probes whether the source is a file or directory over SSH, then branches.
  • Single-file downloads follow cp-style destination semantics:
    • Trailing slash on dest (or dest already exists as a directory) → file lands at <dest>/<source-basename>.
    • Otherwise dest is taken as the exact file path.
    • Extraction is staged in a sibling directory and the staged entry is atomically renamed into place.
    • Refuses to overwrite an existing directory with the downloaded file.
  • Directory downloads preserve prior flat-extract behaviour into dest (with an early refusal when dest exists as a non-directory).
  • The SSH tar stream is now factored through a shared stream_sandbox_tar helper, mirroring the upload-side ssh_tar_upload refactor from fix(cli): sandbox upload overwrites files instead of creating directories #694.
  • Caller signature: sandbox_sync_down takes dest: &str so trailing slashes survive (previously the value was converted to &Path at the dispatch site, which dropped them).

Testing

  • mise run pre-commit passes
  • Unit tests added/updated
  • E2E tests added/updated (if applicable)

Checklist

  • Follows Conventional Commits
  • Commits are signed off (DCO)
  • Architecture docs updated (if applicable)

`openshell sandbox download` produced a directory at the host destination
instead of writing the source bytes as a regular file, and did not refuse
sandbox-side paths that resolved outside `/sandbox` (only `creds.json` was
rejected, by filesystem policy rather than by an explicit check).

Mirrors the cp-style design from the upload-side fixes (#667 / #694 and
#885 / #952): destination resolution now respects trailing slashes and
existing-directory destinations, single-file sources land as regular files
via a staged extraction and atomic rename, and the function delegates the
SSH tar stream to a shared helper. Adds an explicit lexical
workspace-boundary check on the sandbox-side source — required on the
download side because filesystem policy alone is not a substitute for an
explicit check when crossing the trust boundary in the leak direction.

Signed-off-by: Tinson Lai <tinsonl@nvidia.com>
@copy-pr-bot
Copy link
Copy Markdown

copy-pr-bot Bot commented May 13, 2026

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

…ember

The lexical check in `validate_sandbox_source_path` clears any path whose
components stay under `/sandbox` after `.` and `..` are folded out, but it
cannot see symlinks. A workspace-relative symlink such as
`/sandbox/etc-link -> /etc` lets `openshell sandbox download <name>
/sandbox/etc-link/passwd /tmp/passwd` slip through, then `tar -C
/sandbox/etc-link passwd` follows the link on the sandbox side and
streams `/etc/passwd` back to the host.

Add `canonicalize_sandbox_source_path` which runs `realpath -e --` over
SSH and rejects the result unless it still equals `/sandbox` or starts
with `/sandbox/`. The lexical guard stays as a cheap fail-fast for
obviously-bad inputs, and the rest of the download flow now uses the
canonical path everywhere a tar invocation or probe might dereference a
symlink.

While editing the single-file branch, also add the `--` separator before
the basename in `tar cf - -C <parent> -- <name>`. Without it, a sandbox
file whose basename starts with `-` (e.g. `--checkpoint-action=...`) is
parsed by GNU tar as an option rather than a member, which at best
fails the wrap and at worst triggers tar option behaviour. The directory
branch passes literal `.` and is unaffected.

Cover both fixes with pure-function tests: `is_under_sandbox_workspace`
gets accept/reject pairs (the same predicate the post-realpath
validator uses) and the new `build_single_file_tar_cmd` helper asserts
the `--` separator and shell escaping survive any future refactor.

Signed-off-by: Tinson Lai <tinsonl@nvidia.com>
`tempfile` was promoted from `[dev-dependencies]` to `[dependencies]`
once the cp-style single-file download path started using
`TempDir::new_in` from production code. The original `[dev-dependencies]`
line stayed behind by accident — entries under `[dependencies]` are
already available to test compilation, so the second declaration is
dead weight.

Signed-off-by: Tinson Lai <tinsonl@nvidia.com>
@laitingsheng laitingsheng marked this pull request as ready for review May 13, 2026 10:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Ubuntu 24.04][Sandbox] openshell sandbox download produces a directory instead of file content + does not reject path-traversal targets

1 participant