discussion/refactor-internal #219

@bwalsh

ADR 0003: Align git-drs client API usage to GA4GH DRS-compatible operations

Re. feature/local-drs-server

Status

Proposed

Context

The git-drs client currently depends on multiple data-plane and metadata operations that are not part of the GA4GH DRS API surface. This ADR inventories those calls, proposes the closest GA4GH DRS equivalent for each where one exists, and defines a phased refactor and test plan.

Scope for this ADR:

  • Includes calls made in client/drs and client/local.
  • Excludes Git LFS Batch API behavior by request.

What counts as "GA4GH DRS API" here

For this ADR, GA4GH DRS-compatible operations are treated as:

  • GET /ga4gh/drs/v1/objects/{object_id}
  • GET /ga4gh/drs/v1/objects/{object_id}/access/{access_id}
  • GET /ga4gh/drs/v1/service-info

All client calls outside those read-oriented operations are considered non-DRS extensions.
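
The classification above can be sketched as a small predicate. This is a minimal illustration only: `IsDRSReadRequest` and `drsReadRoutes` are hypothetical names, not existing git-drs symbols, and the sketch assumes object and access IDs occupy a single URL-escaped path segment.

```go
package main

import "regexp"

// drsReadRoutes lists the three read-oriented routes this ADR treats as
// GA4GH DRS-compatible. Each path parameter is assumed to be one segment.
var drsReadRoutes = []*regexp.Regexp{
	regexp.MustCompile(`^/ga4gh/drs/v1/objects/[^/]+$`),
	regexp.MustCompile(`^/ga4gh/drs/v1/objects/[^/]+/access/[^/]+$`),
	regexp.MustCompile(`^/ga4gh/drs/v1/service-info$`),
}

// IsDRSReadRequest reports whether the method and path fall inside the
// DRS-compatible surface defined in this ADR. Hypothetical helper.
func IsDRSReadRequest(method, path string) bool {
	if method != "GET" {
		return false
	}
	for _, re := range drsReadRoutes {
		if re.MatchString(path) {
			return true
		}
	}
	return false
}
```

Under this definition, `GET /index` and `POST /index/bulk/sha256/validity` are both classified as extensions, as is the feature-branch route `GET /ga4gh/drs/v1/objects/checksum/{checksum}`, because it has two segments after `objects` and no `access` segment.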

Inventory: non-GA4GH calls used by the client

| Current client call | Where used | Why it exists today | Best GA4GH DRS equivalent | Gap |
| --- | --- | --- | --- | --- |
| POST /index/bulk/sha256/validity | getSHA256ValidityMapRuntime | Fast "does content already exist" check for many SHA-256 values before upload. | No direct equivalent. Best approximation is deterministic object IDs + repeated GET /objects/{id}. | No standard hash-query or bulk existence endpoint in DRS. |
| Health/discovery GET /index (local server behavior) | Local client request path validated in tests | Local metadata/service probe prior to object operations. | GET /ga4gh/drs/v1/service-info. | Local stack currently expects /index; needs service-info adoption path. |
| GetObjectByHash | ResolveGitScopedURL, DeleteRecordsByOID, local delete/sync flows | Resolve records by SHA-256 OID. | No direct equivalent. Best fit is to derive object_id from the OID and call GET /objects/{id}. | DRS has no native checksum search endpoint. |
| BatchGetObjectsByHash | Local metadata interface | Batch lookup by checksum to reduce round-trips. | None; fan-out GET /objects/{id} if deterministic IDs are adopted. | No batch object lookup in DRS. |
| ListObjects | Local metadata interface | Administrative enumeration of all records. | None in base DRS. | DRS has no list endpoint. |
| ListObjectsByProject | Local metadata interface | Project-scoped listing for maintenance and reporting. | None in base DRS. | DRS has no project listing endpoint; project scoping is implementation-specific metadata. |
| GetProjectSample | Local metadata interface | Project sample for diagnostics/UX. | None in base DRS. | No sampling/list primitive in DRS. |
| RegisterRecord / RegisterRecords | Register flows (RegisterFile, push batch sync) | Create metadata records before upload. | None in base DRS. | DRS is read/access-oriented; write/ingest is out-of-scope. |
| UpdateRecord | Local metadata interface | Mutate existing DRS metadata. | None in base DRS. | No metadata mutation operation in DRS. |
| DeleteRecord / DeleteRecordsByProject | Delete-by-OID and cleanup flows | Remove stale/duplicate records and support upsert semantics. | None in base DRS. | No deletion operation in DRS. |
| Upload orchestration via transfer.Uploader and ResolveUploadURLs | Upload flows in register.go, batch_sync.go, local_client.go | Obtain presigned upload URLs and transfer file bytes. | No direct equivalent. Closest pattern: implementation-specific ingest API + DRS readback after ingest. | DRS access methods are for resolving access to existing objects; upload control-plane is non-standard. |

Inventory (revised against feature/issue-416-drs-upload): non-GA4GH calls used by the client

| Current client call | Where used | Why it exists today | Best GA4GH DRS equivalent | Gap |
| --- | --- | --- | --- | --- |
| POST /index/bulk/sha256/validity | getSHA256ValidityMapRuntime | Fast "does content already exist" check for many SHA-256 values before upload. | GET /objects/checksum/{checksum} (feature branch) per checksum. | No bulk checksum validity endpoint in current draft; requires fan-out checks. |
| Health/discovery GET /index (local server behavior) | Local client request path validated in tests | Local metadata/service probe prior to object operations. | GET /ga4gh/drs/v1/service-info. | Local stack currently expects /index; needs service-info adoption path. |
| GetObjectByHash | ResolveGitScopedURL, DeleteRecordsByOID, local delete/sync flows | Resolve records by SHA-256 OID. | GET /objects/checksum/{checksum} (feature branch). | Checksum type handling and hash canonicalization need alignment. |
| BatchGetObjectsByHash | Local metadata interface | Batch lookup by checksum to reduce round-trips. | No direct batch checksum query; fan-out GET /objects/checksum/{checksum}. | Still no one-call batch checksum lookup. |
| ListObjects | Local metadata interface | Administrative enumeration of all records. | None in base DRS. | DRS has no list endpoint. |
| ListObjectsByProject | Local metadata interface | Project-scoped listing for maintenance and reporting. | None in base DRS. | DRS has no project listing endpoint; project scoping is implementation-specific metadata. |
| GetProjectSample | Local metadata interface | Project sample for diagnostics/UX. | None in base DRS. | No sampling/list primitive in DRS. |
| RegisterRecord / RegisterRecords | Register flows (RegisterFile, push batch sync) | Create metadata records before upload. | POST /objects/register (feature branch, optional). | Capability discovery + fallback required for servers that remain read-only. |
| UpdateRecord | Local metadata interface | Mutate existing DRS metadata. | PUT /objects/{object_id}/access-methods, PUT /objects/access-methods, and checksum update endpoints where applicable (feature branch, optional). | No full object replacement endpoint; updates are scoped to specific mutable fields. |
| DeleteRecord / DeleteRecordsByProject | Delete-by-OID and cleanup flows | Remove stale/duplicate records and support upsert semantics. | PUT /objects/{object_id}/delete and PUT /objects/delete (feature branch, optional). | Project-scope delete still requires client-side selection/filtering. |
| Upload orchestration via transfer.Uploader and ResolveUploadURLs | Upload flows in register.go, batch_sync.go, local_client.go | Obtain presigned upload URLs and transfer file bytes. | POST /upload-request followed by POST /objects/register (feature branch, optional). | Client/server negotiation must handle unsupported upload methods and optional endpoint absence. |

Detailed traceability table

This table uses the GA4GH DRS feature/issue-416-drs-upload branch as the basis for best-match write/upload/delete equivalents, instead of marking most write operations as having no equivalent.

| Non-standard API | GA4GH DRS API | Source (file:lines) | Test (file:lines) |
| --- | --- | --- | --- |
| POST /index/bulk/sha256/validity | GET /ga4gh/drs/v1/objects/checksum/{checksum} (feature/issue-416-drs-upload) | client/drs/register.go:365-390 | client/drs/register_helpers_test.go:128-183 |
| GET /index (local register/list/hash lookup behavior) | GET /ga4gh/drs/v1/service-info for health/capability probing; object reads via GET /ga4gh/drs/v1/objects/{object_id} | client/local/local_client.go:237-258 | client/local/local_client_error_test.go:328-381,416-470 |
| api.GetObjectByHash(...) / GetObjectByHashForGit(...) | GET /ga4gh/drs/v1/objects/checksum/{checksum} (feature/issue-416-drs-upload) | client/drs/orchestrator.go:68-105 | client/drs/client_methods_test.go:78-100 |
| BatchGetObjectsByHash(...) | No direct batch checksum endpoint; repeat GET /ga4gh/drs/v1/objects/checksum/{checksum} | client/local/local_client.go:256-257 | N/A (no direct unit test coverage) |
| ListObjects(...) | No equivalent in base DRS (listing is out-of-scope) | client/local/local_client.go:240-242 | N/A (no direct unit test coverage) |
| ListObjectsByProject(...) | No equivalent in base DRS | client/local/local_client.go:243-245 | N/A (no direct unit test coverage) |
| GetProjectSample(...) | No equivalent in base DRS | client/local/local_client.go:292-294 | N/A (no direct unit test coverage) |
| RegisterRecord(...) and RegisterRecords(...) | POST /ga4gh/drs/v1/objects/register (feature/issue-416-drs-upload, optional) | client/local/local_client.go:295-300,328-333,412-415 | client/local/local_client_error_test.go:328-408 |
| UpdateRecord(...) | PUT /ga4gh/drs/v1/objects/{object_id}/access-methods or PUT /ga4gh/drs/v1/objects/access-methods (feature/issue-416-drs-upload, optional) | client/local/local_client.go:301-303 | N/A (no direct unit test coverage) |
| DeleteRecord(...) / DeleteRecordsByProject(...) | PUT /ga4gh/drs/v1/objects/{object_id}/delete and PUT /ga4gh/drs/v1/objects/delete (feature/issue-416-drs-upload, optional) | client/drs/orchestrator.go:107-135; client/local/local_client.go:259-261,289-290 | N/A (no direct unit test coverage) |
| Upload extensions: transfer.DoUpload(...), transfer.Uploader, ResolveUploadURLs(...) | POST /ga4gh/drs/v1/upload-request + POST /ga4gh/drs/v1/objects/register (feature/issue-416-drs-upload, optional) | client/drs/register.go:282-333; client/local/local_client.go:467-550 | N/A (no direct unit test coverage for extension method contract) |

Decision

Adopt a DRS-first client contract for read paths immediately, and isolate write/management extensions behind explicit non-DRS capability interfaces.

1) Read-path target contract (DRS-native)

  • Canonical object resolution should be based on deterministic object IDs where possible.
  • Download URL resolution should use GET /objects/{id} + GET /objects/{id}/access/{access_id}.
  • Service capability probing should prefer GET /service-info.
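
As a rough sketch of the two-step download resolution above: fetch the object, use an inline `access_url` if one is present, otherwise follow the `access_id` to the access endpoint. The `Fetch` seam, the struct names, and `ResolveDownloadURL` are hypothetical and exist only so the example runs without a live server; the JSON field names follow the DRS object and access URL shapes.

```go
package main

import (
	"encoding/json"
	"errors"
	"net/url"
)

// Fetch performs a GET against a server-relative path and returns the body.
// Hypothetical seam; a real client would wrap an authenticated HTTP client.
type Fetch func(path string) ([]byte, error)

type accessURL struct {
	URL string `json:"url"`
}

type accessMethod struct {
	Type      string     `json:"type"`
	AccessID  string     `json:"access_id"`
	AccessURL *accessURL `json:"access_url"`
}

type drsObject struct {
	ID            string         `json:"id"`
	AccessMethods []accessMethod `json:"access_methods"`
}

// ResolveDownloadURL uses only GET /objects/{id} and
// GET /objects/{id}/access/{access_id}.
func ResolveDownloadURL(fetch Fetch, objectID string) (string, error) {
	base := "/ga4gh/drs/v1/objects/" + url.PathEscape(objectID)
	body, err := fetch(base)
	if err != nil {
		return "", err
	}
	var obj drsObject
	if err := json.Unmarshal(body, &obj); err != nil {
		return "", err
	}
	for _, m := range obj.AccessMethods {
		// An inline access_url needs no second round-trip.
		if m.AccessURL != nil && m.AccessURL.URL != "" {
			return m.AccessURL.URL, nil
		}
		if m.AccessID == "" {
			continue
		}
		body, err := fetch(base + "/access/" + url.PathEscape(m.AccessID))
		if err != nil {
			return "", err
		}
		var au accessURL
		if err := json.Unmarshal(body, &au); err != nil {
			return "", err
		}
		if au.URL != "" {
			return au.URL, nil
		}
	}
	return "", errors.New("no usable access method for object " + objectID)
}
```

The sketch stops at the first access method that yields a URL; choosing among several access methods (for example by type) is left out.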

2) Non-DRS extension boundary

  • Keep non-DRS operations behind clearly named interfaces (e.g., MetadataAdminAPI, IngestAPI).
  • Avoid invoking extension methods through the generic DRS client interface used by read/download code paths.

3) Keep Git LFS Batch API unchanged

  • Existing Git LFS batch behavior is explicitly out-of-scope and remains as-is.

Gap analysis and refactor plan

Phase 0: Baseline and feature flags

  1. Add feature flags for:
    • drs.read_mode = legacy|drs_strict
    • drs.write_mode = extension_required|extension_optional
  2. Add structured logs/metrics around every extension call site to quantify runtime dependency.
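
One possible shape for the two flags is sketched below. The flag keys and allowed values come from the list above; the type names, `ParseFlags`, and the defaults (`legacy` and `extension_required`, chosen here to preserve current behavior when a key is unset) are assumptions.

```go
package main

import "fmt"

type ReadMode string
type WriteMode string

const (
	ReadLegacy    ReadMode = "legacy"
	ReadDRSStrict ReadMode = "drs_strict"

	WriteExtensionRequired WriteMode = "extension_required"
	WriteExtensionOptional WriteMode = "extension_optional"
)

// Flags holds the two Phase 0 switches. Hypothetical type.
type Flags struct {
	Read  ReadMode
	Write WriteMode
}

// ParseFlags reads drs.read_mode and drs.write_mode from a key/value map
// and rejects unknown values. Defaults are an assumption of this sketch.
func ParseFlags(values map[string]string) (Flags, error) {
	f := Flags{Read: ReadLegacy, Write: WriteExtensionRequired}
	if v, ok := values["drs.read_mode"]; ok {
		switch ReadMode(v) {
		case ReadLegacy, ReadDRSStrict:
			f.Read = ReadMode(v)
		default:
			return Flags{}, fmt.Errorf("drs.read_mode: unknown value %q", v)
		}
	}
	if v, ok := values["drs.write_mode"]; ok {
		switch WriteMode(v) {
		case WriteExtensionRequired, WriteExtensionOptional:
			f.Write = WriteMode(v)
		default:
			return Flags{}, fmt.Errorf("drs.write_mode: unknown value %q", v)
		}
	}
	return f, nil
}
```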

Phase 1: DRS-native read-path migration

  1. Introduce deterministic DID strategy:
    • Preferred: did = drs://<authority>/<project>/<sha256> or equivalent stable mapping used consistently by client + server.
  2. Replace hash lookup reads:
    • ResolveGitScopedURL: resolve object ID directly from OID and call GetObject/GetDownloadURL.
  3. Replace local /index health checks with service-info probing where supported.
  4. Keep temporary fallback to current hash-based extensions while servers are upgraded.
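
The deterministic DID mapping and the temporary fallback could look roughly like the following. `DeterministicDID`, `ResolveObjectID`, `Lookup`, and `ErrNotFound` are hypothetical names. The sketch assumes the `drs://<authority>/<project>/<sha256>` form preferred above, accepts an optional `sha256:` prefix as it appears in Git LFS pointer files, and normalizes the digest to lowercase hex.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// ErrNotFound signals that a lookup found no record. Hypothetical sentinel.
var ErrNotFound = errors.New("object not found")

// DeterministicDID maps (authority, project, SHA-256 OID) to a stable DID.
func DeterministicDID(authority, project, sha256Hex string) (string, error) {
	if authority == "" || project == "" {
		return "", errors.New("authority and project are required")
	}
	oid := strings.ToLower(strings.TrimPrefix(sha256Hex, "sha256:"))
	if len(oid) != 64 {
		return "", fmt.Errorf("oid %q is not a 64-character SHA-256 digest", sha256Hex)
	}
	for _, c := range oid {
		if !strings.ContainsRune("0123456789abcdef", c) {
			return "", fmt.Errorf("oid %q is not hexadecimal", sha256Hex)
		}
	}
	return fmt.Sprintf("drs://%s/%s/%s", authority, project, oid), nil
}

// Lookup resolves a key (a DID or an OID) to an object ID.
type Lookup func(key string) (string, error)

// ResolveObjectID tries the DRS-native lookup by DID first and uses the
// legacy hash lookup only when the object is not found and a fallback exists.
func ResolveObjectID(did, oid string, byID, legacyByHash Lookup) (string, error) {
	id, err := byID(did)
	if err == nil {
		return id, nil
	}
	if !errors.Is(err, ErrNotFound) || legacyByHash == nil {
		return "", err
	}
	return legacyByHash(oid)
}
```

Passing a nil `legacyByHash` models strict mode, where a miss on the deterministic ID is returned as an error rather than triggering a hash lookup.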

Phase 2: Write-path boundary hardening

  1. Split interfaces:
    • DRSReadClient (strict DRS read ops)
    • DRSWriteExtensionClient (register/update/delete, hash validity, upload URL resolution)
  2. Move all write-time extension invocations into a dedicated adapter package.
  3. Ensure upsert/delete logic depends only on the extension adapter, never on the read client.
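
A minimal sketch of the interface split follows. The interface names and the method names `GetObject`, `GetDownloadURL`, `RegisterRecord`, `UpdateRecord`, and `DeleteRecord` come from this ADR; the signatures, the `Object` type, `DownloadURLFor`, and `staticReader` are illustrative assumptions, and `context.Context` parameters are omitted for brevity.

```go
package main

import "errors"

// Object is a stand-in for the client's DRS object type (hypothetical).
type Object struct {
	ID        string
	AccessIDs []string
}

// DRSReadClient exposes only strict DRS read operations.
type DRSReadClient interface {
	GetObject(objectID string) (Object, error)
	GetDownloadURL(objectID, accessID string) (string, error)
}

// DRSWriteExtensionClient groups the non-DRS write operations.
type DRSWriteExtensionClient interface {
	RegisterRecord(rec Object) error
	UpdateRecord(rec Object) error
	DeleteRecord(objectID string) error
}

// DownloadURLFor accepts only the read interface, so a read flow cannot
// reach register/update/delete without a compile error.
func DownloadURLFor(c DRSReadClient, objectID string) (string, error) {
	obj, err := c.GetObject(objectID)
	if err != nil {
		return "", err
	}
	if len(obj.AccessIDs) == 0 {
		return "", errors.New("object has no access methods")
	}
	return c.GetDownloadURL(obj.ID, obj.AccessIDs[0])
}

// staticReader is an in-memory reader used for illustration.
type staticReader struct{ objects map[string]Object }

func (r staticReader) GetObject(objectID string) (Object, error) {
	obj, ok := r.objects[objectID]
	if !ok {
		return Object{}, errors.New("object not found")
	}
	return obj, nil
}

func (r staticReader) GetDownloadURL(objectID, accessID string) (string, error) {
	return "https://example.org/" + objectID + "/" + accessID, nil
}

// Compile-time check that staticReader satisfies the read contract.
var _ DRSReadClient = staticReader{}
```

Because `DownloadURLFor` takes `DRSReadClient`, passing it a value that only implements the write extension fails at compile time, which is the property the interface-segregation unit tests are meant to pin down.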

Phase 3: Optional compatibility adapters

  1. Implement compatibility adapter strategies:
    • Strategy A (full extension server): keep existing behavior.
    • Strategy B (DRS-only server): metadata writes unavailable; push returns actionable error with remediation.
  2. Add graceful degradation messages for operations impossible under strict DRS-only servers.
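
The two strategies can be sketched as interchangeable adapters behind one interface. All names here (`WriteAdapter`, `UnsupportedError`, `SelectWriteAdapter`) and the remediation text are hypothetical; how the client learns whether extensions are available is out of scope for this sketch and is passed in as a boolean.

```go
package main

import "fmt"

// UnsupportedError carries an actionable message for operations that a
// DRS-only server cannot perform.
type UnsupportedError struct {
	Operation   string
	Remediation string
}

func (e *UnsupportedError) Error() string {
	return fmt.Sprintf("%s is not available on a DRS-only server: %s",
		e.Operation, e.Remediation)
}

// WriteAdapter is the seam that push/register flows would depend on.
type WriteAdapter interface {
	Register(oid string) error
}

// fullExtensionAdapter (Strategy A) delegates to the existing behavior.
type fullExtensionAdapter struct {
	register func(oid string) error
}

func (a fullExtensionAdapter) Register(oid string) error { return a.register(oid) }

// drsOnlyAdapter (Strategy B) refuses writes with a remediation hint.
type drsOnlyAdapter struct{}

func (drsOnlyAdapter) Register(oid string) error {
	return &UnsupportedError{
		Operation:   "register",
		Remediation: "configure a remote that supports metadata writes, or use this remote for pull only",
	}
}

// SelectWriteAdapter picks a strategy from detected server capability.
func SelectWriteAdapter(hasExtensions bool, register func(string) error) WriteAdapter {
	if hasExtensions {
		return fullExtensionAdapter{register: register}
	}
	return drsOnlyAdapter{}
}
```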

Phase 4: Remove hidden coupling

  1. Remove direct construction of non-DRS HTTP endpoints in core flows.
  2. Keep an explicit allowlist of non-DRS endpoints in one place (for the extension adapter only).

Test plan

Unit tests

  1. Read-path strict mode
    • Resolve/download flows call only GetObject and GetDownloadURL.
    • No hash lookup extension calls in strict mode.
  2. Fallback mode
    • Legacy hash lookup path still works when strict mode is disabled.
  3. Interface segregation
    • Compile-time checks ensure read flows only accept DRSReadClient.

Integration tests

  1. DRS-strict fixture server
    • Expose only /ga4gh/drs/v1/* read endpoints.
    • Verify pull/download succeeds.
  2. Extension-enabled fixture server
    • Verify push/register/delete continue to function via extension adapter.
  3. Mixed capability server
    • Verify capability detection via service-info and explicit fallback behavior.

Regression and safety checks

  1. Maintain existing upload/download correctness tests for multipart/single-part modes.
  2. Add contract tests that fail if the read path introduces non-DRS endpoint calls.
  3. Add e2e matrix:
    • legacy + extension
    • drs_strict + extension
    • drs_strict + no extension (read-only expected)
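
The read-path contract tests could be built on a small recorder that collects every request a flow makes and reports those outside the DRS surface. `CallRecorder` is a hypothetical helper; the classifier is injected, so the recorder does not itself decide what counts as DRS-native. In a real test it would be fed from an `http.RoundTripper` or a fixture server handler.

```go
package main

import "sync"

// CallRecorder collects requests that the injected classifier rejects.
type CallRecorder struct {
	mu         sync.Mutex
	isDRS      func(method, path string) bool
	violations []string
}

// NewCallRecorder builds a recorder around a DRS-native classifier.
func NewCallRecorder(isDRS func(method, path string) bool) *CallRecorder {
	return &CallRecorder{isDRS: isDRS}
}

// Record notes one outgoing request made by the flow under test.
func (r *CallRecorder) Record(method, path string) {
	if r.isDRS(method, path) {
		return
	}
	r.mu.Lock()
	defer r.mu.Unlock()
	r.violations = append(r.violations, method+" "+path)
}

// Violations returns a copy of the non-DRS calls seen so far. A contract
// test for the read path would fail if this is non-empty.
func (r *CallRecorder) Violations() []string {
	r.mu.Lock()
	defer r.mu.Unlock()
	return append([]string(nil), r.violations...)
}
```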

Consequences

  • Clarifies which behaviors are DRS-standard versus platform-specific extensions.
  • Reduces accidental coupling between read/download flows and Gen3-specific endpoints.
  • Preserves current write capabilities through explicit adapters while enabling DRS-strict interoperability for read paths.
