From c0234d478bf757623eece72bdbf5ba97d7112c85 Mon Sep 17 00:00:00 2001
From: Dan LaManna
Date: Mon, 13 Apr 2026 16:41:35 -0400
Subject: [PATCH] Add diagram/doc for zip streaming

---
 docs/zip-download-flow.md | 56 +++++++++++++++++++++++++++++++++++++++
 isic/settings/base.py     |  4 +++
 2 files changed, 60 insertions(+)
 create mode 100644 docs/zip-download-flow.md

diff --git a/docs/zip-download-flow.md b/docs/zip-download-flow.md
new file mode 100644
index 000000000..824b84aa5
--- /dev/null
+++ b/docs/zip-download-flow.md
@@ -0,0 +1,56 @@
+# Zip Download Flow
+
+This diagram illustrates the flow for downloading images as a zip file.
+
+```mermaid
+sequenceDiagram
+    autonumber
+    Client->>+API: Generate a signed URL for zip download
+    API-->>-Client: Signed Zip Server URL
+    Client->>+Zip Server: Request Zip with signed URL
+    Zip Server->>+API: Ask for Zip File Descriptor
+    API-->>-Zip Server: Zip file descriptor
+    par
+        Zip Server->>+S3: Get image files
+    and
+        Zip Server->>+API: Get metadata/attribution/license files
+    and
+        Zip Server-->>-Client: Stream zip file
+    end
+```
+
+## Security
+
+### Step 1: Client → API (Generate signed URL)
+- User's identity and search parameters are captured
+
+### Step 2: API → Client (Return signed URL)
+- URL contains a cryptographically signed token (Django `TimestampSigner`)
+- Token expires after 1 day
+- Token encodes: user ID, search query, collection filters
+
+### Step 3: Client → Zip Server (Request zip)
+- Client opaquely passes the signed token to the external zip server
+- Token serves as proof of authorization
+
+### Step 4: Zip Server → API (Request file descriptor)
+- Zip server is hardcoded to retrieve the descriptor from the one true API server with the given zsid
+- Token validation via `ZipDownloadTokenAuth`
+- Verifies signature and checks 24-hour expiration
+
+### Step 5: API → Zip Server (Return file descriptor)
+- API returns file listing only for images the user is authorized to access
+- Descriptor contains bare, unsigned S3 URLs for images and API URLs for metadata files
+- Query is executed with user's permissions applied
+
+### Step 6: Zip Server → S3 (Fetch images)
+- **Public images** (sponsored bucket):
+  - Bucket has public read policy
+- **Private images** (default bucket):
+  - Bucket and all objects are private
+  - Zip server has an instance profile granting explicit read access
+
+### Step 7: Zip Server → API (Fetch metadata/attribution/license files)
+- Same token authentication as Step 4
+- Files are generated on-the-fly
+- Each request validates the token
diff --git a/isic/settings/base.py b/isic/settings/base.py
index 6251dd7eb..43311db8f 100644
--- a/isic/settings/base.py
+++ b/isic/settings/base.py
@@ -159,6 +159,10 @@
 
 # Disallowing CORS credentials is all the security necessary.
 # The API is safe to call by anyone, as it has no side effects.
+# Note: CORS_ALLOW_ALL_ORIGINS (which sets Access-Control-Allow-Origin: *)
+# cannot be used with credentials per the CORS spec. Browsers will block any
+# request that attempts credentials: 'include' when the response has a wildcard
+# origin, preventing CSRF attacks even on @csrf_exempt endpoints.
 CORS_ALLOW_ALL_ORIGINS = True
 
 PASSWORD_HASHERS += [