Feature Request: Deprecate Pure-Go .gitattributes Matching in Favor of git check-attr
Summary
Deprecate the existing “best-effort pure-Go matcher for .gitattributes” and standardize on authoritative attribute resolution via git check-attr for all path-based routing, filtering, and policy decisions.
The pure-Go matcher is inherently incomplete and will produce incorrect results in common, real-world Git repositories due to Git’s attribute precedence rules. These failures are subtle, hard to debug, and can lead to incorrect routing (RO vs RW), incorrect enforcement, or data integrity issues.
Motivation
Git attribute resolution is not simple pattern matching against a single file. Git resolves attributes using:
- hierarchical precedence
- multiple attribute sources
- overrides and negation
- path-relative scope
- repo configuration and info files
Re-implementing this logic outside of Git is brittle and error-prone.
Git already exposes the correct resolution mechanism via git check-attr.
Using Git as the source of truth eliminates ambiguity and guarantees correctness.
Problem Statement
The current pure-Go matcher:
- Parses a single .gitattributes file
- Applies “last match wins” semantics locally
- Ignores Git’s full attribute resolution rules
This approach cannot faithfully replicate Git behavior and will return incorrect results in many common scenarios.
Typical Failure Scenarios
Below are non-edge-case, real-world situations where a best-effort matcher will give the wrong answer.
1. Nested .gitattributes Files
Scenario
.gitattributes
data/** drs.route=ro
data/projectA/.gitattributes
*.dat drs.route=rw
Path
Correct Git behavior
Pure-Go failure
- Only reads the root .gitattributes
- Returns ro
- Routes uploads incorrectly
Git consults the .gitattributes file in the path's own directory and in every parent directory up to the top of the work tree. The closer a file is to the path, the higher its precedence.
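To make the lookup chain concrete, here is a minimal Go sketch that lists the in-tree .gitattributes files that can affect a path, closest first. The function name attributeFileChain and the sample path used with it are hypothetical and chosen for illustration. The sketch only names candidate files. It is not a substitute for git check-attr, which also handles .git/info/attributes, global attributes, macros, and pattern matching.

```go
package main

import "path"

// attributeFileChain lists the in-tree .gitattributes files that can affect a
// repo-relative, slash-separated path, closest directory first. It only names
// candidate files; it does not parse or match anything.
func attributeFileChain(repoPath string) []string {
	var chain []string
	dir := path.Dir(path.Clean(repoPath))
	// Walk upward until the top of the work tree is reached.
	for dir != "." && dir != "/" {
		chain = append(chain, path.Join(dir, ".gitattributes"))
		dir = path.Dir(dir)
	}
	// The top-level file has the lowest precedence of the in-tree files.
	return append(chain, ".gitattributes")
}
```

For a hypothetical path such as data/projectA/sample.dat, the chain has three entries, and a matcher that reads only the last one misses any rule set closer to the file.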
2. Attribute Overrides and Unsets
Scenario
*.dat drs.route=ro
scratch/** -drs.route
Path
Correct Git behavior
Pure-Go failure
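- Treats -drs.route as unknown or ignores it
- Incorrectly keeps ro
The -drs.route form explicitly unsets the attribute, so git check-attr reports unset rather than a value. This is distinct from unspecified, which means no rule matched. Unset semantics are core to Git attributes and are difficult to model correctly.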
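A caller has to tell the attribute states apart before making a routing decision. The sketch below maps the info column printed by git check-attr (set, unset, unspecified, or a value) to a small Go type. The names AttrState, ClassifyInfo, and RouteFor are hypothetical, and treating only a valued drs.route as a usable route is an assumption about the desired policy.

```go
package main

// AttrState models the four states Git documents for an attribute on a path.
type AttrState int

const (
	Unspecified AttrState = iota // no rule matched, or reset with "!attr"
	Set                          // "attr"
	Unset                        // "-attr"
	Valued                       // "attr=value"
)

// ClassifyInfo maps the info column printed by `git check-attr` to a state.
// Caveat: a literal value such as "attr=set" cannot be told apart from Set in
// this output, so reserved words should not be used as attribute values.
func ClassifyInfo(info string) (AttrState, string) {
	switch info {
	case "unspecified":
		return Unspecified, ""
	case "set":
		return Set, ""
	case "unset":
		return Unset, ""
	default:
		return Valued, info
	}
}

// RouteFor returns a route only when drs.route carries an explicit value.
// Unset and unspecified both mean "no route" here (assumed policy).
func RouteFor(info string) (string, bool) {
	state, value := ClassifyInfo(info)
	if state != Valued {
		return "", false
	}
	return value, true
}
```

Keeping Unset and Unspecified as separate states lets a policy layer distinguish "explicitly excluded" from "never configured" if it needs to.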
3. info/attributes and Global Attributes
Git reads attributes from multiple sources:
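Order of precedence, highest first (simplified):
- .git/info/attributes
- .gitattributes in the same directory as the path
- .gitattributes in parent directories, up to the top of the work tree
- Global attributes (core.attributesFile) and system-wide attributes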
Scenario
.git/info/attributes
TARGET-ALL-P2/** drs.route=ro
No .gitattributes in the repo.
Correct Git behavior
Pure-Go failure
- Never looks at .git/info/attributes
- Returns unspecified
This is extremely common in controlled or managed repos.
4. Attribute Macros and Composition
Scenario
[attr]readonly drs.route=ro
data/** readonly
Correct Git behavior
Pure-Go failure
- Does not expand attribute macros
- Misses the route entirely
Macros are first-class Git features and are used heavily in larger repos.
5. Path Normalization and Platform Semantics
Git attribute matching uses:
- forward-slash normalization
- repo-relative paths
- special handling for directories vs files
Scenario
- Windows paths (\)
- symlinked worktrees
- submodules
A custom matcher will almost always diverge from Git’s behavior across platforms.
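Even when Git does the matching, the caller still has to hand it a repo-relative, forward-slash path. A minimal Go sketch of that normalization step follows. RepoRelative and its parameters are hypothetical names. The sketch does not resolve symlinks or detect submodule boundaries, both of which a real caller would need to handle separately.

```go
package main

import (
	"errors"
	"path/filepath"
	"strings"
)

// RepoRelative converts target into a path relative to repoRoot, using forward
// slashes as Git expects. It rejects paths that fall outside the repository.
// Symlinks and submodules are not handled here.
func RepoRelative(repoRoot, target string) (string, error) {
	rel, err := filepath.Rel(repoRoot, target)
	if err != nil {
		return "", err
	}
	// On Windows this turns "\" separators into "/"; elsewhere it is a no-op.
	rel = filepath.ToSlash(rel)
	if rel == ".." || strings.HasPrefix(rel, "../") {
		return "", errors.New("path is outside the repository root")
	}
	return rel, nil
}
```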
6. Renames and History-Sensitive Evaluation
Git evaluates attributes based on the current tree context, not historical assumptions.
Scenario
- File moved from scratch/ → TARGET-ALL-P2/
- Different routing rules apply
Correct Git behavior
- Attributes reflect current path
Pure-Go failure
- Cached or inferred rules from old paths
- Incorrect routing after renames
Impact
Incorrect attribute resolution can cause:
- Files routed to the wrong backend (RO vs RW)
- Uploads denied or allowed incorrectly
- Silent policy violations
- Extremely difficult debugging (“works locally but not in CI”)
Because attribute resolution happens inside Git, any divergence introduces correctness risk.
Proposed Change
Deprecate
- The “best-effort pure-Go .gitattributes matcher”
Standardize On
- Calling git check-attr for all attribute lookups
Example:
git check-attr drs.route -- path/to/file
This provides:
- Exact Git semantics
- Correct precedence handling
- Consistent behavior across platforms and environments
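As a rough illustration of what a caller could look like, the Go sketch below shells out to git check-attr with -z, which prints each record as path, attribute, and info separated by NUL bytes, and decodes the result. The names AttrResult, parseCheckAttrZ, CheckAttr, and the repoDir parameter are hypothetical. The sketch assumes a git executable is on PATH and is not the project's actual implementation.

```go
package main

import (
	"bytes"
	"errors"
	"os/exec"
)

// AttrResult is one record reported by `git check-attr -z`.
type AttrResult struct {
	Path  string
	Attr  string
	Value string // "set", "unset", "unspecified", or the assigned value
}

// parseCheckAttrZ decodes the NUL-delimited output of `git check-attr -z`,
// a flat sequence of <path> NUL <attribute> NUL <info> NUL.
func parseCheckAttrZ(out []byte) ([]AttrResult, error) {
	if len(out) == 0 {
		return nil, nil
	}
	fields := bytes.Split(bytes.TrimSuffix(out, []byte{0}), []byte{0})
	if len(fields)%3 != 0 {
		return nil, errors.New("check-attr: output is not a multiple of three fields")
	}
	results := make([]AttrResult, 0, len(fields)/3)
	for i := 0; i < len(fields); i += 3 {
		results = append(results, AttrResult{
			Path:  string(fields[i]),
			Attr:  string(fields[i+1]),
			Value: string(fields[i+2]),
		})
	}
	return results, nil
}

// CheckAttr asks Git to resolve one attribute for one repo-relative path.
// repoDir is the working tree to run in (hypothetical parameter).
func CheckAttr(repoDir, attr, path string) (string, error) {
	cmd := exec.Command("git", "-C", repoDir, "check-attr", "-z", attr, "--", path)
	out, err := cmd.Output()
	if err != nil {
		return "", err
	}
	results, err := parseCheckAttrZ(out)
	if err != nil {
		return "", err
	}
	if len(results) != 1 {
		return "", errors.New("check-attr: expected exactly one record")
	}
	return results[0].Value, nil
}
```

Parsing the -z form rather than the default "path: attr: info" text avoids ambiguity when a path itself contains a colon or other unusual characters.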
Migration Plan
- Mark the pure-Go matcher as deprecated
- Update internal callers to use git check-attr
- Retain the pure-Go matcher only as:
  - a test helper, or
  - a last-ditch fallback with explicit warnings
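One possible shape for the "last-ditch fallback with explicit warnings" option is sketched below in Go. All names here (Resolver, ResolverFunc, WithFallback, ErrGitUnavailable) are hypothetical and are not taken from the existing codebase. The warn callback stands in for whatever logging facility the project uses.

```go
package main

import "errors"

// ErrGitUnavailable is an illustrative error a git-backed resolver might return.
var ErrGitUnavailable = errors.New("git executable not available")

// Resolver looks up one attribute for one repo-relative path.
type Resolver interface {
	Lookup(attr, path string) (string, error)
}

// ResolverFunc adapts a plain function to the Resolver interface.
type ResolverFunc func(attr, path string) (string, error)

// Lookup calls the wrapped function.
func (f ResolverFunc) Lookup(attr, path string) (string, error) {
	return f(attr, path)
}

// WithFallback prefers primary (the git check-attr backed resolver) and only
// consults the deprecated matcher when primary fails, warning every time.
func WithFallback(primary, deprecated Resolver, warn func(msg string)) Resolver {
	return ResolverFunc(func(attr, path string) (string, error) {
		value, err := primary.Lookup(attr, path)
		if err == nil {
			return value, nil
		}
		warn("git check-attr failed (" + err.Error() +
			"); using deprecated pure-Go matcher for " + path)
		return deprecated.Lookup(attr, path)
	})
}
```

Routing every call through one interface keeps the deprecated matcher easy to delete later, since callers never reference it directly.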
Alternatives Considered
- Re-implement full Git attribute resolution in Go
  ❌ High complexity, high maintenance, guaranteed drift over time
- Maintain both implementations
  ❌ Ambiguous correctness, inconsistent behavior
Using Git itself is the simplest, most robust solution.
Recommendation
Deprecate the pure-Go attribute matcher and remove it from production code paths, in favor of authoritative resolution via git check-attr.
Git already solved this problem. We should not re-implement it.
Additional Rationale: Typical Git LFS Filter Scenarios Where Best-Effort Matching Fails
Git LFS usage amplifies the risk of incorrect attribute resolution because filter decisions affect both content storage and transfer semantics. A wrong answer doesn’t just misroute metadata; it can lead to missing objects, failed pushes, or corrupted workflows.
Below are common, real-world LFS patterns where a best-effort .gitattributes matcher will fail.
1. Mixed LFS / Non-LFS Paths with Overrides
Scenario
*.dat filter=lfs diff=lfs merge=lfs -text
# Explicitly exclude scratch outputs
scratch/** -filter -diff -merge
Path
scratch/results/output.dat
Correct Git behavior
- filter = unset (NOT LFS)
- File is stored directly in Git
Best-effort failure
- Sees *.dat filter=lfs
- Ignores or mishandles -filter
- Treats file as LFS-managed
Impact
- Pointer file written where raw content was expected
- Downstream tools fail on unexpected pointer files
- Users see “why is my scratch output in LFS?”
2. Nested LFS Rules with Directory-Scoped Overrides
Scenario
.gitattributes
*.bin filter=lfs
data/raw/.gitattributes
*.bin -filter
Path
Correct Git behavior
Best-effort failure
- Only evaluates root .gitattributes
- Treats file as LFS-managed
Impact
- Large raw files unintentionally pushed through LFS
- Uploads fail or are routed incorrectly
- Hard to diagnose because the rule looks correct to the user
3. LFS Enablement via Attribute Macros
Scenario
[attr]lfsdata filter=lfs diff=lfs merge=lfs -text
*.bam lfsdata
*.cram lfsdata
Correct Git behavior
.bam and .cram files are LFS-tracked
Best-effort failure
- Does not expand attribute macros
- Returns filter=unspecified
Impact
- Large genomics files committed directly into Git
- Repository bloat
- Silent failure until repo size explodes
This pattern is very common in scientific and media repositories.
4. info/attributes Used to Enforce LFS Globally
Scenario
.git/info/attributes
*.mp4 filter=lfs diff=lfs merge=lfs -text
No .gitattributes committed to the repo.
Correct Git behavior
.mp4 files are LFS-managed
Best-effort failure
- Never reads .git/info/attributes
- Treats files as non-LFS
Impact
- CI and developer machines behave differently
- LFS rules appear to “randomly not apply”
- Violates operator expectations in managed environments
5. Conditional LFS Usage by Directory
Scenario
data/** filter=lfs
data/tmp/** -filter
Path
data/tmp/intermediate.bin
Correct Git behavior
Best-effort failure
- Applies first match only
- Or applies both incorrectly
- Returns filter=lfs
Impact
- Temporary/intermediate files end up as LFS pointers
- Users delete temp dirs and break LFS history
- Garbage collection and pruning become unsafe
6. Rename-Sensitive LFS Semantics
Scenario
- File initially in scratch/ (not LFS)
- Later renamed to data/ (LFS-tracked)
scratch/** -filter
data/** filter=lfs
Correct Git behavior
- LFS applies based on current path, not history
Best-effort failure
- Cached or inferred rules based on old location
- Incorrectly treats renamed file as non-LFS
Impact
- Pointer not created when expected
- Push fails with (missing) because bytes aren’t in LFS store
- Extremely confusing user experience
7. Cross-Platform Path Matching Issues
Git attribute matching:
- normalizes to /
- applies repo-relative paths
- handles directories specially
Best-effort failure modes
- Windows \ paths
- Case sensitivity mismatches
- Incorrect matching for ** patterns
Impact
- LFS works on macOS/Linux, fails on Windows
- Routing differs between developer machines and CI
Why This Matters More for LFS Than Other Attributes
For attributes like text or eol, a wrong answer is annoying.
For LFS, a wrong answer can cause:
- pointer files where raw data is expected
- raw data where pointers are required
- missing objects at push time
- irreversible repo pollution
Because LFS affects storage, transport, and history, correctness is non-negotiable.
Conclusion (Reinforced)
Any best-effort .gitattributes matcher will inevitably diverge from Git’s behavior in common LFS use cases.
For LFS-related decisions (filter=lfs, routing, policy enforcement):
git check-attr is not just preferable; it is required for correctness.
This strengthens the case to deprecate the pure-Go matcher entirely and rely on Git as the single source of truth.
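For LFS decisions over many files, one approach is to query the filter attribute in a single batch with --stdin and -z, then treat a path as LFS-managed only when the resolved value is exactly lfs. The Go sketch below illustrates this under stated assumptions: lfsFromCheckAttr, CheckLFS, and repoDir are hypothetical names, and a git executable is assumed to be on PATH.

```go
package main

import (
	"errors"
	"os/exec"
	"strings"
)

// lfsFromCheckAttr decodes `git check-attr -z filter` output and reports, per
// path, whether the resolved filter is exactly "lfs". Values such as "unset"
// or "unspecified" therefore map to false.
func lfsFromCheckAttr(out string) (map[string]bool, error) {
	tracked := make(map[string]bool)
	if out == "" {
		return tracked, nil
	}
	fields := strings.Split(strings.TrimSuffix(out, "\x00"), "\x00")
	if len(fields)%3 != 0 {
		return nil, errors.New("check-attr: output is not a multiple of three fields")
	}
	for i := 0; i < len(fields); i += 3 {
		path, attr, info := fields[i], fields[i+1], fields[i+2]
		if attr == "filter" {
			tracked[path] = info == "lfs"
		}
	}
	return tracked, nil
}

// CheckLFS resolves the filter attribute for many repo-relative paths in one
// git invocation. With -z and --stdin, input paths are NUL-separated.
func CheckLFS(repoDir string, paths []string) (map[string]bool, error) {
	if len(paths) == 0 {
		return map[string]bool{}, nil
	}
	cmd := exec.Command("git", "-C", repoDir, "check-attr", "-z", "--stdin", "filter")
	cmd.Stdin = strings.NewReader(strings.Join(paths, "\x00") + "\x00")
	out, err := cmd.Output()
	if err != nil {
		return nil, err
	}
	return lfsFromCheckAttr(string(out))
}
```

Batching avoids spawning one git process per file, which matters when a push or scan touches thousands of paths.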