Improve the official-path-based online asset collector: add manifest/sha256/defer/report #51
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 578717723c
if not added:
    return [
        ManifestItem(
            category="wheelhouse",
            name=pkg_name,
Mark existing wheel artifacts as collected
`pip download` can exit successfully even when the destination already contains the required files (the CLI explicitly supports handling existing paths via `--exists-action`). This branch, however, treats `not added` as `defer`, so reruns and overlapping cases (e.g., a package already fetched as another package's dependency) are reported as blocked/failed even though installable wheels are already present. That corrupts the manifest/report status and the summary counts.
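One way to act on this suggestion, sketched under assumptions: before deferring, look in the wheelhouse for a wheel that already matches the pinned `name==version`. The helper `find_existing_wheel` is hypothetical (it is not part of the PR). It reads the name and version from the wheel filename layout `{distribution}-{version}(-{build})?-{python}-{abi}-{platform}.whl`, normalizes names per PEP 503, and compares versions as plain strings, so equivalent spellings such as `1.0` and `1.0.0` would not match.

```python
import re
from pathlib import Path
from typing import Optional


def _normalize(name: str) -> str:
    # PEP 503: runs of "-", "_" and "." become a single "-", lowercased.
    return re.sub(r"[-_.]+", "-", name).lower()


def find_existing_wheel(wheelhouse: Path, pkg_name: str, version: str) -> Optional[Path]:
    """Return a wheel in `wheelhouse` matching pkg_name==version, or None.

    Hypothetical helper. Wheel filenames follow
    {distribution}-{version}(-{build})?-{python}-{abi}-{platform}.whl,
    so the first two dash-separated fields are the name and the version.
    Versions are compared as plain strings (no PEP 440 normalization).
    """
    target = _normalize(pkg_name)
    for whl in sorted(wheelhouse.glob("*.whl")):
        fields = whl.name[: -len(".whl")].split("-")
        if len(fields) not in (5, 6):
            continue  # not a well-formed wheel filename
        if _normalize(fields[0]) == target and fields[1] == version:
            return whl
    return None
```

In the `if not added:` branch, a non-`None` result could then be recorded as `collected` (with its sha256) rather than `defer`, leaving `defer` for packages that are genuinely missing.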
        status="defer",
        detail=f"blocked: redirected to non-official URL ({final_url})",
    )
out_path.write_bytes(resp.read())
Stream asset bodies instead of reading them all at once
This call buffers the entire HTTP response in memory before writing it to disk, so downloading a large model/runtime asset can push memory use up to the full file size and make the collector fail. The large-asset workflow introduced in this commit needs the body streamed to disk in chunks instead.
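A minimal sketch of chunked streaming, assuming `resp` is a readable binary stream such as the object `urllib.request.urlopen()` returns. The name `stream_to_file`, the 1 MiB chunk size and the `.part` temporary file are illustrative choices, not part of the PR. Since the PR already records a sha256 per file, the sketch updates the hash as chunks are written, which avoids re-reading large files afterwards.

```python
import hashlib
from pathlib import Path
from typing import BinaryIO

CHUNK_SIZE = 1024 * 1024  # 1 MiB per read; an assumed default, tune as needed


def stream_to_file(resp: BinaryIO, out_path: Path, chunk_size: int = CHUNK_SIZE) -> str:
    """Copy `resp` to `out_path` chunk by chunk and return the sha256 hex digest.

    Peak memory stays around `chunk_size` instead of the full file size.
    """
    digest = hashlib.sha256()
    tmp_path = out_path.with_name(out_path.name + ".part")
    with open(tmp_path, "wb") as fh:
        while True:
            chunk = resp.read(chunk_size)
            if not chunk:
                break  # end of body
            fh.write(chunk)
            digest.update(chunk)
    # Rename only after the whole body arrived, so an interrupted download
    # never leaves a truncated file under the final name.
    tmp_path.replace(out_path)
    return digest.hexdigest()
```

If hashing stays a separate step, `shutil.copyfileobj(resp, fh, chunk_size)` performs the same bounded-memory copy in a single call.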
Motivation
defer) them and separate them into a list for follow-up processing.
Description
Overhauled `scripts/collect_online_assets.py` and added the following:
- PyPI wheels are pinned as `name==version` and downloaded only from `https://pypi.org/simple`; model/runtime assets are downloaded only if they pass `official_base` validation (HTTPS, same host, path prefix).
- Standardized as `ManifestItem`; a `sha256` hash is computed for each file and recorded in `.online_assets/meta/collection_manifest.json`.
- Classified as `collected`/`defer`, and a report, `.online_assets/meta/collection_report.json`, is generated that aggregates them into `installable`, `blocked_or_failed`, and `defer`.
- Changed `resources/online_sources.json` to the `wheelhouse`, `model_assets`, `runtime_assets` form, and updated the related document `ONLINE_RESOURCE_COLLECTION.md` to reflect the policy (official-path restriction, manifest/sha256, defer/report).
- Added unit tests in `tests/test_collect_online_assets.py` for `_is_official_asset_url` and for version extraction from filenames (`_extract_version_from_filename`).
Testing
- Ran `pytest -q tests/test_collect_online_assets.py`; it passed (2 passed).
- Ran `pytest -q tests/test_cli.py tests/test_collect_online_assets.py`; it passed (14 passed total).
- Ran `python scripts/collect_online_assets.py` and confirmed that the manifest and report files (`.online_assets/meta/collection_manifest.json`, `.online_assets/meta/collection_report.json`) are generated.
- Checked the report summary (`python -c "import json;print(json.load(open('.online_assets/meta/collection_report.json'))['summary'])"`) and confirmed that some items were deferred due to network policy (e.g., 403).
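The `official_base` check described above (HTTPS, same host, path prefix) could look roughly like the following. `is_official_asset_url` is a hypothetical stand-in for the PR's `_is_official_asset_url`, whose code is not shown here, and the test URLs use the placeholder host `downloads.example.org`. The sketch does not decode percent-escapes or collapse `..` segments, which a stricter check may also need.

```python
from urllib.parse import urlsplit


def is_official_asset_url(url: str, official_base: str) -> bool:
    """Return True only if `url` is HTTPS, on the same host and port as
    `official_base`, and at or below its path prefix.

    Hypothetical stand-in for the PR's _is_official_asset_url.
    """
    cand = urlsplit(url)
    base = urlsplit(official_base)
    # urlsplit lowercases the scheme, so "HTTPS://" is accepted too.
    if cand.scheme != "https" or base.scheme != "https":
        return False
    # .hostname is lowercased and excludes any "user@" part, so
    # "https://downloads.example.org@evil.example/" resolves to evil.example.
    if cand.hostname is None or cand.hostname != base.hostname:
        return False
    try:
        if cand.port != base.port:
            return False
    except ValueError:  # malformed or out-of-range port
        return False
    # Match whole path segments: "/models/v1" must not accept "/models/v1-evil".
    base_path = base.path.rstrip("/")
    return cand.path == base_path or cand.path.startswith(base_path + "/")
```

Because a download can be redirected off the official host, the same check would apply to the final URL after redirects as well, which is the case the `blocked: redirected to non-official URL` detail in the diff covers.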