Skip to content

fix(security): sanitize CheckpointStore seed_id to prevent path traversal#413

Open
shaun0927 wants to merge 3 commits intoQ00:mainfrom
shaun0927:fix/checkpoint-path-traversal
Open

fix(security): sanitize CheckpointStore seed_id to prevent path traversal#413
shaun0927 wants to merge 3 commits intoQ00:mainfrom
shaun0927:fix/checkpoint-path-traversal

Conversation

@shaun0927
Copy link
Copy Markdown
Contributor

@shaun0927 shaun0927 commented Apr 15, 2026

Summary

  • Fixes security(persistence): CheckpointStore seed_id path traversal allows arbitrary file write/delete #397CheckpointStore._get_checkpoint_path() directly interpolated seed_id into filenames without sanitization, allowing crafted IDs like x/../../PWNED to read, write, or delete files outside the checkpoint directory.
  • Adds _sanitize_seed_id() static method that strips null bytes, removes .. traversal sequences, replaces / and \ with underscores, limits length to 255 chars, and raises ValueError for empty results.
  • Adds _validate_path_containment() that verifies the resolved checkpoint path stays within self._base_path.
  • Both guards are applied in _get_checkpoint_path(), which is used by save(), load(), and all rollback operations.

Test plan

  • Path traversal with forward slashes is sanitized (file stays in base dir)
  • Path traversal with backslashes is sanitized
  • No file is created outside the checkpoint directory
  • Normal seed_ids continue to work (save + load roundtrip)
  • Empty seed_id raises ValueError
  • Null-byte-only seed_id raises ValueError
  • ..-only seed_id raises ValueError
  • Null bytes are stripped but remaining content preserved
  • Seed IDs longer than 255 chars are truncated
  • load() with traversal seed_id resolves consistently with save()
  • All 34 existing + new tests pass
  • ruff check passes clean

…aversal (Q00#397)

Add _sanitize_seed_id() that strips null bytes, removes '..' sequences,
replaces path separators with underscores, and limits length to 255 chars.
Add _validate_path_containment() to verify resolved paths stay within
the checkpoint base directory. Both checks are applied in
_get_checkpoint_path(), which is used by save(), load(), and all
rollback operations.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown
Contributor

@ouroboros-agent ouroboros-agent bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review — ouroboros-agent[bot]

Verdict: APPROVE

Reviewing commit 72485a7 for PR #413

Blocking Findings

No in-scope blocking findings remained after policy filtering.

Non-blocking Suggestions

# File:Line Category Finding

Design Notes

The path-traversal goal is valid, but folding raw identifiers directly into sanitized filenames changes the storage key contract. This needs a collision-resistant mapping with an explicit filename-length budget, rather than lossy replacement plus truncation.


Reviewed by ouroboros-agent[bot] via Codex deep analysis

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown
Contributor

@ouroboros-agent ouroboros-agent bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review — ouroboros-agent[bot]

Verdict: APPROVE

Reviewing commit 38b3aa4 for PR #413

Blocking Findings

No in-scope blocking findings remained after policy filtering.

Non-blocking Suggestions

# File:Line Category Finding

Design Notes

The traversal hardening is directionally correct, but it was added at the filename layer without preserving a one-to-one mapping between logical seed IDs and on-disk keys. For persistence code, the key derivation has to remain collision-free or be explicitly rejected.


Reviewed by ouroboros-agent[bot] via Codex deep analysis

@shaun0927
Copy link
Copy Markdown
Contributor Author

Labels: security persistence — Fixes #397

Copy link
Copy Markdown
Owner

@Q00 Q00 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Requesting changes because the filename-length hardening is still incomplete.

_sanitize_seed_id() truncates the raw seed id to 255 characters, but _get_checkpoint_path() then wraps it in checkpoint_<seed>.json / .json.<level>. That still produces basenames longer than 255 bytes on common filesystems (len("checkpoint_") + 255 + len(".json") = 271), so long seed ids can still fail with ENAMETOOLONG on save/load.

Please cap the sanitized id to the remaining filename budget after the prefix/suffix, and add a regression test that actually writes a long-id checkpoint instead of only asserting on the generated path string.

_sanitize_seed_id() previously truncated to 255 characters, but
_get_checkpoint_path() wraps the seed in "checkpoint_<seed>.json[.N]"
and file_lock appends ".lock", producing basenames up to 273 bytes
which triggers ENAMETOOLONG on common filesystems.

Now the seed is capped to 232 characters (255 minus worst-case
prefix+suffix+lock overhead). When truncation is needed, a SHA-256
hash fragment is appended to keep the mapping collision-resistant.

Replaces the path-string-only long-id test with regression tests that
actually write and read back checkpoints with long seed ids.

Addresses review feedback from Q00 on PR Q00#413.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown
Contributor

@ouroboros-agent ouroboros-agent bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review — ouroboros-agent[bot]

Verdict: APPROVE

Reviewing commit 17b5c70 for PR #413

Blocking Findings

No in-scope blocking findings remained after policy filtering.

Non-blocking Suggestions

# File:Line Category Finding

Design Notes

The hardening is heading in the right direction, but it currently treats filename safety as a lossy string rewrite. For persistence keys, the safer shape is a reversible or collision-resistant mapping from raw seed_id to filename, with byte-based length enforcement rather than character-count enforcement.


Reviewed by ouroboros-agent[bot] via Codex deep analysis

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

security(persistence): CheckpointStore seed_id path traversal allows arbitrary file write/delete

2 participants