ADR: LFS-Only Local Cache for OID↔Path→S3 URL Hints with Authoritative Resolution at Pre-Push
Status
Proposed
Context and Problem Statement
We need to associate Git LFS–tracked files with external storage locations (S3 URLs) while preserving correct behavior under:
- content changes
- file renames / moves
- undo / restaging workflows
- offline or low-latency local development
Key constraints and preferences:
- Git LFS defines the boundary of responsibility for external objects.
- Non-LFS files are fully managed by Git and must not participate in this mechanism.
- A server-side index (Indexd / DRS) is the authoritative source of truth for mapping content identity to storage locations.
- We do not want to consult the authoritative index on every commit.
- pre-commit must be fast, deterministic, offline-friendly, and index-based.
- Correctness and enforcement should occur at pre-push, where refs, remotes, and commit ranges are known.
"How git works" 🚧
| Hook | stdin contents | Structured? | Stable format? | Purpose |
|---|---|---|---|---|
| pre-commit | Empty | ❌ No | N/A | Validate what is staged |
| pre-push | Ref update list | ✅ Yes | ✅ Yes | Validate what is about to be pushed |
The difference is intentional:
- pre-commit is index-centric
- pre-push is ref-centric
pre-commit reconciliation (practical)
In a pre-commit hook (or your custom git add wrapper), detect renames and move metadata accordingly.
How to detect renames reliably:
Use Git’s rename detection between HEAD and index:
- git diff --cached -M --name-status
This emits lines like:
- R100 old/path.txt new/path.txt
For each R* old new:
- move /old/... → /new/...
- or update the metadata file’s internal path field
- add the moved metadata file to the index
Pros:
- deterministic
- works before commit
- doesn’t require history scanning
Cons:
- you must enforce the hook/wrapper usage
Decision
1. Scope: Git LFS–Only
This design applies exclusively to Git LFS–tracked files.
- A file is in scope if and only if its staged content is a valid Git LFS pointer:
  version https://git-lfs.github.com/spec/v1
  oid sha256:<hex>
- All non-LFS files are explicitly out of scope:
  - no cache entries
  - no validation
  - no warnings or errors
- File size, extension, or .gitattributes patterns alone MUST NOT be used to infer scope.
2. Identity Model
- The canonical content identity is the Git LFS OID (sha256:<hex>), extracted from the staged pointer file.
- The system MUST NOT:
  - hash file contents
  - compute Git blob IDs
  - infer identities for non-LFS files
3. Metadata Model
The local system models three non-authoritative relationships, all maintained purely for developer workflow:
- Path → OID
  - Which LFS object is currently staged at a working-tree path.
- OID → Path(s)
  - Which paths have recently referenced a given OID.
  - Supports rename, undo, and multi-path reuse.
  - Paths are advisory and may be stale.
- OID → S3 URL (hint)
  - A locally cached hint for where the object may live.
  - Must be validated against the authoritative server index at pre-push.
No locally stored relationship is authoritative.
4. Crisp Rule (Normative)
Path is never authoritative; OID (sha256) is.
Paths are client-side, repo-local workflow context.
The server indexes content identity and provides access methods.
5. Local Cache Location (Non-Versioned)
All local metadata is stored under .git/drs/pre-commit/.
This directory:
- is never committed
- is local to the working copy
- may be freely deleted and reconstructed
Recommended layout
.git/drs/pre-commit/
  v1/
    paths/
      <encoded-path>.json
    oids/
      <oid>.json
    tombstones/
      <encoded-path>.json
    state.json
Cache File Schemas
What is the encoded path
Algorithm (step-by-step):
- Start with the repo-relative path of the file (as a UTF‑8 string).
- Encode that path using Base64 URL‑safe encoding without padding (RawURLEncoding), producing the token.
- Append .json to the encoded token.
- Place the file under .git/drs/pre-commit/v1/paths/.
- This is implemented in pathEntryFile → encodePath, which uses base64.RawURLEncoding.EncodeToString([]byte(path)), then appends .json to form the final filename.
Resulting pattern:
.git/drs/pre-commit/v1/paths/<base64url_no_padding(repo_relative_path)>.json
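The encoding steps above can be sketched as follows. The function names `encodePath` and `pathEntryFile` come from the text; their exact signatures and the `pathsDir` constant are assumptions made for this example.

```go
package main

import (
	"encoding/base64"
	"fmt"
	"path/filepath"
)

// pathsDir is the cache directory for Path → OID entries (assumed constant name).
const pathsDir = ".git/drs/pre-commit/v1/paths"

// encodePath turns a repo-relative path into a filesystem-safe token using
// URL-safe Base64 without padding.
func encodePath(path string) string {
	return base64.RawURLEncoding.EncodeToString([]byte(path))
}

// pathEntryFile returns the cache file location for a repo-relative path.
func pathEntryFile(path string) string {
	return filepath.Join(pathsDir, encodePath(path)+".json")
}

func main() {
	fmt.Println(pathEntryFile("data/foo.bam"))
}
```

Because the URL-safe alphabet contains no `/`, nested repository paths collapse into a single flat filename, and the original path can be recovered by decoding the token.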
paths/<encoded-path>.json (Path → OID)
{
  "path": "data/foo.bam",
  "lfs_oid": "sha256:<hex>",
  "updated_at": "2026-02-01T12:34:56Z"
}
oids/<oid>.json (OID → Path(s), S3 URL hint)
{
  "lfs_oid": "sha256:<hex>",
  "paths": [
    "data/foo.bam",
    "data/archive/foo-copy.bam"
  ],
  "s3_url": "s3://bucket/key",
  "updated_at": "2026-02-01T12:34:56Z",
  "content_changed": false
}
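For reference, the two schemas above could map onto Go types along these lines. This is a sketch only: the type names `PathEntry` and `OIDEntry` and the helper `decodeOIDEntry` are hypothetical, and only the JSON field names are taken from the schemas.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// PathEntry mirrors paths/<encoded-path>.json (Path → OID).
type PathEntry struct {
	Path      string    `json:"path"`
	LFSOID    string    `json:"lfs_oid"`
	UpdatedAt time.Time `json:"updated_at"`
}

// OIDEntry mirrors oids/<oid>.json (OID → Path(s), S3 URL hint).
type OIDEntry struct {
	LFSOID         string    `json:"lfs_oid"`
	Paths          []string  `json:"paths"`
	S3URL          string    `json:"s3_url"`
	UpdatedAt      time.Time `json:"updated_at"`
	ContentChanged bool      `json:"content_changed"`
}

// decodeOIDEntry parses the contents of an oids/<oid>.json file.
func decodeOIDEntry(data []byte) (OIDEntry, error) {
	var e OIDEntry
	err := json.Unmarshal(data, &e)
	return e, err
}

func main() {
	raw := []byte(`{
	  "lfs_oid": "sha256:<hex>",
	  "paths": ["data/foo.bam", "data/archive/foo-copy.bam"],
	  "s3_url": "s3://bucket/key",
	  "updated_at": "2026-02-01T12:34:56Z",
	  "content_changed": false
	}`)
	entry, err := decodeOIDEntry(raw)
	if err != nil {
		panic(err)
	}
	fmt.Println(entry.S3URL, len(entry.Paths), entry.UpdatedAt.Year())
}
```

Since nothing in the cache is authoritative, a reader of these files can reasonably treat a missing or unparsable entry as "no hint" rather than as an error.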
Pre-Commit Responsibilities (LFS-Only, Local-Only)
The pre-commit hook operates only on the staged index and only on LFS-tracked files.
Pre-Push Responsibilities (Authoritative, Networked)
The pre-push hook is the sole enforcement point.
Mapping .git/drs/pre-commit to Server Semantics
| Local cache concept | Server-side analogue | Notes |
|---|---|---|
| path → lfs_oid | none | purely client-side workflow context |
| lfs_oid → paths[] | none | advisory, repo-local |
| lfs_oid (sha256) | Indexd hashes.sha256, DRS checksums | canonical content identity |
| lfs_oid → s3_url (hint) | Indexd urls[], DRS access_methods[] | server is authoritative |
| logical ID | Indexd object_id, DRS object_id | resolved at pre-push |
Summary
This ADR establishes a strict LFS-only contract:
pre-commit maintains a local, non-authoritative cache of path↔OID↔URL hints, while pre-push resolves and enforces truth using Indexd / DRS.