Security: Prevent zip slip during archive extraction#5112
Security: Prevent zip slip during archive extraction#5112RishabhJain2018 wants to merge 2 commits into
Conversation
Validate zip member paths before extractall in the API, config utilities, and submission workers so archives cannot write outside the target directory. Co-authored-by: Rishabh Jain <rishabhjain2018@gmail.com>
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Pro Plus Run ID: 📒 Files selected for processing (5)
🚧 Files skipped from review as they are similar to previous changes (4)
WalkthroughThis PR hardens ZIP extraction security by introducing a centralized path-traversal validation utility and routing all ZIP extractions through it. A new ChangesZIP Extraction Path-Traversal Security
Estimated code review effort🎯 2 (Simple) | ⏱️ ~10 minutes Possibly related issues
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@apps/base/utils.py`:
- Around line 503-517: The ZIP extraction is protected but YAML-sourced paths
can still traverse out of the extracted tree; add a helper (e.g.,
safe_join_under_root(root, *parts)) that resolves and verifies the final path is
inside root (similar check to safe_extract_zip_file), then replace every place
that builds filesystem paths from YAML values (references to description,
evaluation_details, terms_and_conditions, submission_guidelines,
evaluation_script, test_annotation_file) to call
safe_join_under_root(extracted_root, yaml_value) before any
os.path.isfile/open/stat calls; ensure the helper raises on unsafe targets and
is used everywhere YAML-derived values are passed to isfile() or open().
In `@apps/challenges/challenge_config_utils.py`:
- Around line 96-103: The extract_zip_file function currently opens
zipfile.ZipFile(file_path, mode) into zip_ref and calls
safe_extract_zip_file(zip_ref, output_path) but only calls zip_ref.close() after
that, so if safe_extract_zip_file raises the handle remains open; wrap the
extraction call in a try/finally (or use a context manager) so zip_ref.close()
is always executed, i.e., ensure zip_ref is closed in the finally block even on
exceptions and return only after closing; reference symbols: extract_zip_file,
safe_extract_zip_file, zipfile.ZipFile, zip_ref.close().
In `@scripts/workers/remote_submission_worker.py`:
- Around line 137-140: The import of safe_extract_zip_file at module top-level
causes apps/base/utils.py (with Django/DRF top-level imports) to execute and
crash this worker; to fix, remove the top-level line "from base.utils import
safe_extract_zip_file" and instead perform a lazy/local import inside the
function that uses it (the code around zip_ref =
zipfile.ZipFile(download_location, "r") and safe_extract_zip_file(zip_ref,
extract_location)), or refactor safe_extract_zip_file into a small standalone
helper module with no Django dependencies and import that helper here; update
references to use the newly imported/local function name so extraction occurs
without importing Django during module import.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro Plus
Run ID: dd256f7a-b836-45ba-9cb2-0e94968634f5
📒 Files selected for processing (5)
apps/base/utils.pyapps/challenges/challenge_config_utils.pyapps/challenges/views.pyscripts/workers/remote_submission_worker.pyscripts/workers/submission_worker.py
| def safe_extract_zip_file(zip_ref, destination): | ||
| """ | ||
| Extract zip archive members while preventing path traversal (zip slip). | ||
| """ | ||
| destination_path = os.path.abspath(destination) | ||
| for member in zip_ref.namelist(): | ||
| member_path = os.path.abspath(os.path.join(destination_path, member)) | ||
| if not ( | ||
| member_path == destination_path | ||
| or member_path.startswith(destination_path + os.sep) | ||
| ): | ||
| raise zipfile.BadZipFile( | ||
| "Zip archive contains unsafe file paths." | ||
| ) | ||
| zip_ref.extractall(destination_path) |
There was a problem hiding this comment.
This still leaves a traversal path via YAML-referenced files.
safe_extract_zip_file() only validates archive member paths. In the challenge import flows, the extracted YAML is still used to build paths for files like description, evaluation_details, terms_and_conditions, submission_guidelines, evaluation_script, and test_annotation_file with plain join(..., yaml_value) calls. A config value containing ../ can still escape the extracted tree and turn this into an arbitrary file read even though extraction itself is now bounded. Please add a shared safe_join_under_root()-style helper and use it everywhere a YAML value becomes a filesystem path.
🔐 Suggested direction
+def safe_join_under_root(root, relative_path):
+ candidate = os.path.abspath(os.path.join(root, relative_path))
+ if not (
+ candidate == root
+ or candidate.startswith(root + os.sep)
+ ):
+ raise ValueError("Path escapes extraction root.")
+ return candidateUse that helper before every isfile() / open() on YAML-derived paths.
As per coding guidelines, apps/**: Django backend. Pay attention to: Safe handling of user-submitted files and Docker image references.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| def safe_extract_zip_file(zip_ref, destination): | |
| """ | |
| Extract zip archive members while preventing path traversal (zip slip). | |
| """ | |
| destination_path = os.path.abspath(destination) | |
| for member in zip_ref.namelist(): | |
| member_path = os.path.abspath(os.path.join(destination_path, member)) | |
| if not ( | |
| member_path == destination_path | |
| or member_path.startswith(destination_path + os.sep) | |
| ): | |
| raise zipfile.BadZipFile( | |
| "Zip archive contains unsafe file paths." | |
| ) | |
| zip_ref.extractall(destination_path) | |
| def safe_extract_zip_file(zip_ref, destination): | |
| """ | |
| Extract zip archive members while preventing path traversal (zip slip). | |
| """ | |
| destination_path = os.path.abspath(destination) | |
| for member in zip_ref.namelist(): | |
| member_path = os.path.abspath(os.path.join(destination_path, member)) | |
| if not ( | |
| member_path == destination_path | |
| or member_path.startswith(destination_path + os.sep) | |
| ): | |
| raise zipfile.BadZipFile( | |
| "Zip archive contains unsafe file paths." | |
| ) | |
| zip_ref.extractall(destination_path) | |
| def safe_join_under_root(root, relative_path): | |
| candidate = os.path.abspath(os.path.join(root, relative_path)) | |
| if not ( | |
| candidate == root | |
| or candidate.startswith(root + os.sep) | |
| ): | |
| raise ValueError("Path escapes extraction root.") | |
| return candidate |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@apps/base/utils.py` around lines 503 - 517, The ZIP extraction is protected
but YAML-sourced paths can still traverse out of the extracted tree; add a
helper (e.g., safe_join_under_root(root, *parts)) that resolves and verifies the
final path is inside root (similar check to safe_extract_zip_file), then replace
every place that builds filesystem paths from YAML values (references to
description, evaluation_details, terms_and_conditions, submission_guidelines,
evaluation_script, test_annotation_file) to call
safe_join_under_root(extracted_root, yaml_value) before any
os.path.isfile/open/stat calls; ensure the helper raises on unsafe targets and
is used everywhere YAML-derived values are passed to isfile() or open().
Move safe_extract_zip_file into base.zip_utils for workers without Django setup and close zip handles on extraction failure.
Summary
Mitigates zip slip (path traversal) when extracting challenge and submission archives.
Changes
safe_extract_zip_file()inapps/base/utils.pyto validate member paths before extraction.Security impact
Prevents malicious zip archives from writing files outside the intended extraction directory.
Slack Thread
Summary by CodeRabbit