simpleapi_download: root /simple/ fetch crashes on registries that don't serve PEP 503 root index (e.g. Google Artifact Registry) #3709

@krish-khimasia-glean

🐞 bug report

Affected Rule

simpleapi_download.bzl, specifically _read_simpleapi and _get_dist_urls

Is this a regression?

Yes, this works in 1.7.0. The root /simple/ fetch was introduced in #3657 (commit 900d557).

Description

We're upgrading from rules_python 1.7.0 to 2.0.0-rc2 and hit an issue with a private PyPI registry on Google Artifact Registry (GAR).
In 2.0.0-rc2, simpleapi_download.bzl fetches the root /simple/ page of each index URL to discover which packages are on which index (introduced in #3657). GAR returns 404 for the root page and only serves per-package pages such as /simple/six/. The ctx.download call fails on the 404 with a java.io.IOException, which aborts the entire pip resolution.
We confirmed this is a GAR limitation, not an auth issue. Authenticated requests also get 404 on the root while per-package pages return 200.
We worked around this by patching _read_simpleapi to pass allow_fail=parse_index to ctx.download and adding an if not result.success: continue guard in _get_dist_urls, but I'm unsure how this behaves when experimental_extra_index_urls is set.
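
To make the open question about multiple indexes concrete, here is a minimal sketch in plain Python (not Starlark, and not the rules_python implementation). It models one possible policy: an index whose root page cannot be fetched is treated as "contents unknown" and stays a candidate for the per-package lookup, instead of aborting. The names FetchResult, candidate_indexes and fetch are hypothetical, and the substring match on the root page is a deliberate simplification of real PEP 503 parsing.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class FetchResult:
    """Hypothetical stand-in for a download result that carries a success flag."""
    success: bool
    body: str = ""


def candidate_indexes(
    index_urls: List[str],
    package: str,
    fetch: Callable[[str], FetchResult],
) -> List[str]:
    """Return the index URLs that should be asked for `package`.

    Assumption: a failed root fetch means "contents unknown", so the index
    is kept and its per-package page is tried later.
    """
    candidates = []
    for index_url in index_urls:
        root = fetch(index_url.rstrip("/") + "/")
        if not root.success:
            # Root index unavailable (e.g. 404): do not abort, keep the index.
            candidates.append(index_url)
            continue
        # Root index available: keep the index only if it links to the package.
        # Simplified check; a real parser would inspect the anchor elements.
        if f"/{package}/" in root.body:
            candidates.append(index_url)
    return candidates
```

Under this policy a registry without a root index is never filtered out, so with extra index URLs configured it would still be queried for every package. Whether that is the desired behavior for experimental_extra_index_urls is exactly the question raised above.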

🔬 Minimal Reproduction

  1. Set up a Google Artifact Registry Python repository
  2. Configure pip.parse with experimental_index_url pointing to the GAR Simple API URL, or pass --index-url via extra_pip_args
  3. Run any bazel build that triggers pip resolution
    The root cause can be verified independently:
# Per-package page works (200)
curl -s -o /dev/null -w "%{http_code}" \
  "https://<region>-python.pkg.dev/<project>/<repo>/simple/six/" \
  -H "Authorization: Bearer $(gcloud auth print-access-token)"
# Root page does not (404)
curl -s -o /dev/null -w "%{http_code}" \
  "https://<region>-python.pkg.dev/<project>/<repo>/simple/" \
  -H "Authorization: Bearer $(gcloud auth print-access-token)"

🔥 Exception or Error

ERROR: .../simpleapi_download.bzl:258:39: Traceback (most recent call last):
  ...
  File ".../simpleapi_download.bzl", line 164, column 43, in _get_dist_urls
    results[index_url] = download.wait()
  File ".../simpleapi_download.bzl", line 258, column 39, in lambda
    result = download.wait(),
Error in wait: java.io.IOException: Error downloading
  [https://<region>-python.pkg.dev/<project>/<repo>/simple/]
  GET returned 404 Not Found

🌍 Your Environment

Operating System:

  
  macOS 24.5.0 (darwin arm64) / Linux (CI)
  

Output of bazel version:

  
8.5.1
  

Rules_python version:

  
2.0.0-rc2
  

Anything else relevant?

Labels: need: upstream support (an issue that needs changes in upstream code)