ANE-1036: Glob file matching for exclusion filters in .fossa.yml#1703
ANE-1036: Glob file matching for exclusion filters in .fossa.yml#1703
Conversation
`paths.only` and `paths.exclude` entries that contain `*`, `?`, or `[` are now parsed as System.FilePattern globs via the existing Data.Glob wrapper. Entries without glob metacharacters keep their prior "directory and all children" semantics, so this change is backward-compatible. Adds a PathFilter sum type at the config layer, threads a parallel list of glob patterns through FilterCombination, and extends pathAllowed / applyComb to include-or-exclude directories whose relative path matches a glob. Matching normalizes the trailing slash that Path.toString appends to Dir paths so patterns like `node_modules/*` match as users expect. Docs and fossa-yml.v3.schema.json updated. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Normalize backslashes to forward slashes before glob matching so user-supplied patterns like `node_modules/*` match the backslash- separated paths produced by `Path Rel Dir` on Windows. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Cover '?' wildcards, '[...]' character classes, root-anchored single-segment globs, an explicit trailing-slash normalization regression guard, and a four-way mix of include/exclude globs and concrete paths. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…ests System.FilePattern only implements `*` and `**`; `?` and `[...]` are matched literally rather than as wildcards/character classes. The two new tests asserted wildcard semantics and were red on CI. Flip the expectations so they document the actual behavior and serve as a regression guard if the engine ever gains those features.
Extend the Windows portability fix from db3921b (which normalized the path side) to also normalize the user-supplied pattern side. A Windows user typing `node_modules\*` in `.fossa.yml` now gets the same glob as `node_modules/*`. The shared normalization is lifted into a top-level `normalizeSlashes` helper used by both `FromJSON PathFilter` and `globMatchesDir`. Also correct the glob-pattern documentation: it previously claimed `?` matches a single character, but `System.FilePattern` only implements `*` and `**` (test/Discovery/FiltersSpec.hs already asserts this). The doc now says `?` and `[...]` are matched as literals and notes that backslashes in patterns are normalized. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
System.FilePattern only implements `*` and `**` — `?` and `[...]` are matched as literal characters, not single-character wildcards or character classes (per commit 51f4f66). Routing strings containing `?` or `[` to the glob branch was therefore a no-op for matching, and on Windows it silently swallowed `parseRelDir` errors for `?` (a reserved NTFS character) by producing a glob that could never match a real path. Shrink the trigger to a simple `*`-in-string check inline. Strings with `?` or `[` now go through `parseRelDir` like any other concrete path. The two `FiltersSpec` tests that documented FilePattern's literal handling of `?`/`[]` go away — they tested an internal-engine quirk that is no longer reachable through the user-facing config. Doc + JSON schema updated to match (and the `*` description is fixed: "any sequence of characters within a single path segment", not the inaccurate "a single path segment"). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Add a short bulleted list under the YAML example illustrating what each pattern actually matches in a real tree (deep Go vendoring, a scoped npm package's transitive plugin tree, and a generated-proto subtree). The list also calls out that `build/generated/*` is anchored at the root and that the walker prunes the matched directory's whole subtree. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Walker-level prunes from `paths.only`/`paths.exclude` short-circuit discovery before any strategy sees the excluded directory, so the user gets no log line and no "Skipping ..." trail telling them why a project they expected didn't appear in the analyze summary. The existing post-summary note even points at `fossa list-targets` as a workaround, but that command deliberately ignores all filters. Add a small `logActivePathFilters` helper invoked once at the start of `analyze` that prints the configured include/exclude paths and globs (skipping empty kinds). Output for a `.fossa.yml` containing `paths.exclude: ["**/zip/**"]` is now: [INFO] Active exclude glob filters: **/zip/** Per-prune logging would be more direct but requires propagating `Has Logger sig m` through `walkWithFilters'`, `simpleDiscover`, and every strategy's `findProjects`/`discover` (~35 files). Saving that for a follow-up; this gives the user the single piece of information needed to map a missing project back to a configured filter. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Walker-level path-filter prunes were previously silent: pathFilterIntercept short-circuits before any strategy reaches the directory, and there was no log trail explaining why a project the user expected didn't appear in the analyze summary. The post-summary note even pointed at `fossa list-targets` as a workaround, which deliberately ignores all filters. Wire `Has Logger sig m` through `walkWithFilters'` and `pathFilterIntercept` so the walker can speak. Per-prune log lines fire at debug level (one per strategy, ~28 strategies = noisy at info). Add `enumeratePrunedSubtrees`, a one-shot pre-discovery walk that returns the list of subtrees the filter will reject; analyze invokes it once before strategies run and logs each pruned path at info level. Result for a `.fossa.yml` with `paths.exclude: ["**/zip/**"]`: Active exclude glob filters: **/zip/** Skipping path "zip/" (excluded by paths filter) The Has Logger ripple touches every strategy that uses walkWithFilters' (~32 single-line constraint sites, ~7 multi-line). Each carrier already provides Logger via DiscoverTaskEffs, so the change is purely a constraint propagation — no new effects, no runtime cost. Add three Walker spec tests (test/Discovery/WalkSpec.hs): - include-path filter: mirror of the existing exclude test, asserts the walker accepts ancestors + included subtree and prunes siblings. - WalkSkipSome merge: strategy returns WalkSkipSome ["a"], filter excludes "b", both should be pruned. Catches the `pathFilterIntercept`/`skipDisallowed` merge logic. - YAML-to-walker end-to-end: parses a YAML config string with a glob exclude, runs it through `collectConfigFileFilters`, executes the walker, asserts pruning. Catches "globs parse but never reach the walker" wiring regressions — the exact class of bug we hit earlier. Cannot exercise the new tests locally because the test binary's startup reads test/Container/testdata/emptypath.tar (a git-LFS pointer not materialized in this environment); CI's Linux/macOS/Windows jobs will validate. Library and test-binary builds pass with no warnings. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Per-prune `logDebug` events from `pathFilterIntercept` were the only thing the 39-file `Has Logger sig m` ripple bought us. The user-facing value — "each pruned subtree shows up once at info" — comes from `enumeratePrunedSubtrees` + `logPrunedSubtrees` in `App.Fossa.Analyze`, and that path doesn't need the constraint propagated: `walk'` works without `Logger`, and the logging happens in `analyze` where Logger was always in scope. Revert the propagation in `walkWithFilters'` and `pathFilterIntercept` back to the prior `Applicative m, Monoid o` shape, and revert all 39 strategy files (and their `Effect.Logger` import additions) to master. Keep `enumeratePrunedSubtrees`, `logPrunedSubtrees`, and the "Active filters" startup line — they're the user-visible win. Add a no-filters short-circuit to `logPrunedSubtrees` so we don't pay for an extra walk when the user has no `paths.only`/`paths.exclude` entries configured. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…sion-filters # Conflicts: # Changelog.md
CI's hlint job flagged two restricted patterns I introduced: - Text.pack should be toText (project-wide convention). - bare error is a restricted function in this codebase. Replace Text.pack with toText, and change the failure branch from error to Nothing so parse returns Maybe PathFilter. The shouldBe assertion still works: it now compares two Maybe PathFilter values, both of which are Just _ for valid input. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
CI's -Werror=unused-imports flagged the existing `Prettyprinter.viaShow` as unused because the same name was also imported (and used) from Effect.Logger when I added it for logPrunedSubtrees. Drop my Effect.Logger.viaShow addition; the Prettyprinter one was already in scope and is what's actually used. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Drop the long-form description of glob semantics and the analyze visibility entry; "now accept glob patterns" plus the PR link gives readers everything they need, and full docs live in docs/references/files/fossa-yml.md. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
WalkthroughAdds glob pattern support for 🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Tip 💬 Introducing Slack Agent: The best way for teams to turn conversations into code.Slack Agent is built on CodeRabbit's deep understanding of your code, so your team can collaborate across the entire SDLC without losing context.
Built for teams:
One agent for your entire SDLC. Right inside Slack. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/App/Fossa/Analyze.hs`:
- Around line 382-383: The startup logs call logActivePathFilters and
logPrunedSubtrees with the original filters value which is misleading because
discovery runs with discoveryFilters; update the two calls to use the effective
discovery filter set (discoveryFilters) instead of filters so startup "Skipping
path ..." output matches the filters actually used by discovery (use
discoveryFilters in place of filters for logActivePathFilters and
logPrunedSubtrees, taking into account that discoveryFilters may be mempty when
discovery exclusion is disabled).
In `@src/Discovery/Filters.hs`:
- Around line 101-103: The ToJSON instance for FilterCombination currently uses
toEncoding = genericToEncoding defaultOptions which emits the _combinedPathGlobs
field even when empty; change the instance to a custom ToJSON (implement either
toJSON or toEncoding) that serializes FilterCombination but omits the
_combinedPathGlobs key when it is an empty list. Locate the instance declaration
for ToJSON (FilterCombination) and replace the genericToEncoding usage with a
custom encoder that copies the generic object but conditionally excludes
_combinedPathGlobs, ensuring all other fields remain serialized unchanged.
- Around line 234-249: The include logic only tests if the current directory
itself matches an include glob (isIncludedByGlob) and therefore loses
reachability; fix by treating includedGlobs like includedPaths: compute the set
of directories matched by includedGlobs (e.g., matchedDirsFromIncludeGlobs using
globMatchesDir and combinedPathGlobs/includeFilters) and add ancestor/descendant
checks for those matched dirs (implement isParentOfIncludeGlob and
isChildOfIncludeGlob analogous to isParentOfIncludeMember and
isChildOfIncludeMember), then update isIncluded to OR in those new checks so
glob matches preserve parents and children the same way concrete includedPaths
do.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Organization UI (inherited)
Review profile: ASSERTIVE
Plan: Pro
Run ID: dd9ebaa0-88ac-435e-8dd3-8c2553b49162
📒 Files selected for processing (10)
Changelog.mddocs/references/files/fossa-yml.mddocs/references/files/fossa-yml.v3.schema.jsonsrc/App/Fossa/Analyze.hssrc/App/Fossa/Config/Common.hssrc/App/Fossa/Config/ConfigFile.hssrc/Discovery/Filters.hssrc/Discovery/Walk.hstest/Discovery/FiltersSpec.hstest/Discovery/WalkSpec.hs
…ilter scope Three issues from CodeRabbit: 1. Startup logs used `filters`, but discovery uses `discoveryFilters` (which is `mempty` under `--no-discovery-exclusion`). Compute `discoveryFilters` once near the top of `analyze` and use it for `logActivePathFilters` / `logPrunedSubtrees` so the log line agrees with what discovery actually applies. 2. `FilterCombination`'s ToJSON used `genericToEncoding defaultOptions`, which emits the new `_combinedPathGlobs` field as `[]` even when no globs are configured — so every serialized payload changes shape vs pre-glob-support runs. Replace with a hand-written `toEncoding` that omits the field when the list is empty. 3. Include globs only matched the directory itself; ancestors-on-the-way to a match and descendants-after-a-match weren't allowed. So `paths.only: ["apps/*"]` rejected `apps/`, the walker never descended, and every project under `apps/` was silently dropped. Add `isParentOfIncludedGlob` (path's segments are a prefix of the glob's literal directory prefix — segments before the first `*`) and `isChildOfIncludedGlob` (any proper ancestor of `path` matches a glob) to `pathAllowed`. Cover both with new tests in `FiltersSpec.hs`. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The new "treats a leading '**' include glob as accepting any ancestor" test was failing on CI. With include `**/service/**`, the walker has no idea which subtree contains a `service/` directory and must descend everywhere on the way down. My initial pathSegmentsPrefixOf check returned False whenever the path was non-empty and the glob's literal directory prefix was empty, so the walker refused to descend at all. Fix: when the literal prefix is empty (because the glob's first segment is wildcarded — `**`, `*`, `*foo`, etc.), accept any path as a candidate ancestor. The actual match still has to fire via isIncludedByGlob when the walker reaches a real match. Costs extra walking when the user writes a leading-wildcard include, but that's the only correct behavior — there's no way to know up front where a match lives. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
@coderabbitai review |
✅ Actions performedReview triggered.
|
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/Discovery/Filters.hs`:
- Around line 426-445: Replace the local isPrefixOfList implementation with the
standard Data.List.isPrefixOf: in pathSegmentsPrefixOf remove the where-bound
isPrefixOfList and change the call segs `isPrefixOfList` prefix to segs
`isPrefixOf` prefix (Data.List is already imported), leaving the rest of
pathSegmentsPrefixOf and splitSlash unchanged.
- Around line 450-460: The list comprehension in properAncestors should be
replaced with an explicit map to comply with the no-list-comprehension rule:
change the definition of properPrefixes in properAncestors to use map (e.g.
properPrefixes = map (\n -> take n segs) [1 .. length segs - 1] or
properPrefixes = map (flip take segs) [1 .. length segs - 1]) so behavior
remains identical; leave joinSegments and the subsequent parseRelDir mapping
unchanged.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Organization UI (inherited)
Review profile: ASSERTIVE
Plan: Pro
Run ID: fac27e52-664c-4c8a-b0c6-5f29a4287a43
📒 Files selected for processing (3)
src/App/Fossa/Analyze.hssrc/Discovery/Filters.hstest/Discovery/FiltersSpec.hs
Two style fixes from CodeRabbit's second review:
1. `pathSegmentsPrefixOf`'s local `isPrefixOfList` was a hand-rolled
reimplementation of `Data.List.isPrefixOf`. Drop the where-clause
and import the standard one (Data.List was already imported).
2. `properAncestors` used a list comprehension to build prefix lists,
which the project's coding guidelines disallow ("avoid partial
functions, list comprehensions, and match guards"). Replace with
`map (`take` segs) [1 .. length segs - 1]`.
Both behavior-equivalent.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
| parseJSON = withText "PathFilter" $ \txt -> | ||
| let s = toString txt | ||
| in if '*' `elem` s | ||
| then pure . PathFilterGlob . Glob.unsafeGlobRel $ normalizeSlashes s |
There was a problem hiding this comment.
I think we should try to use Glob.toGlob instead. It looks like the unsafe version is currently only used in tests.
| -- the literal directory part. | ||
| null prefix || segs `isPrefixOf` prefix | ||
| where | ||
| splitSlash :: String -> [String] |
There was a problem hiding this comment.
Sorta a vague comment, but is it possible to make these methods not convert paths to strings? I feel like it could be more robust to keep the paths as Paths, but I'm not sure. Feel free to ignore if this suggestion doesn't work.
| pathAllowed filters $(mkRelDir "anywhere") `shouldBe` True | ||
| pathAllowed filters $(mkRelDir "anywhere/else") `shouldBe` True | ||
| pathAllowed filters $(mkRelDir "deep/nested/service/foo") `shouldBe` True | ||
|
|
There was a problem hiding this comment.
This behaviour seems wrong to me. Shouldn't anywhere and anywhere/else be not allowed?
There was a problem hiding this comment.
I think you might be reading this wrong? Shouldn't **/service/** only catch things with a service path in them? Or are you insinuating that the ** prefix should omit all directories
There was a problem hiding this comment.
Shouldn't
**/service/**only catch things with a service path in them?
I think that's correct, which means I think the assertions should be:
pathAllowed filters $(mkRelDir "anywhere") `shouldBe` False
pathAllowed filters $(mkRelDir "anywhere/else") `shouldBe` False
pathAllowed filters $(mkRelDir "deep/nested/service/foo") `shouldBe` True
because the first two don't contain service in the path.
Reviewer flagged the startup "Active <kind> filters: ..." log lines as redundant with the user's own .fossa.yml — and at info level, more noise than signal. The actually-useful info is "what got pruned in this run," which logPrunedSubtrees already provides. Remove logActivePathFilters and its call site, plus the imports it was the only consumer of (Data.Glob.unGlob, Data.Text, Text.intercalate). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Per review feedback. The original `case reverse s of '/' : rest -> reverse rest` walks the list twice and allocates an intermediate reversed copy. Replacing with an unsnoc-based version traverses once. Project supports `base >= 4.15` and `Data.List.unsnoc` is base-4.19+ (GHC 9.8), so I can't import it directly. Inline a tiny helper that matches the stdlib implementation. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Overview
Allow
paths.only/paths.excludein.fossa.ymlto accept glob patterns. An entry containing*is a glob; otherwise it keeps the existing "directory and all children" semantics, so the change is backward-compatible.Glob matching reuses
Data.GloboverSystem.FilePattern(already a dependency).Acceptance criteria
New user config
These directories get pruned during analyze.
At startup,
fossa analyzenow logs all of the active filters and any pruned subtrees once, so users can correlate a missing project with a configured rule.Testing plan
test/Discovery/FiltersSpec.hscover glob/concrete include + exclude composition, root-anchoring, trailing-slash normalization, and backslash-pattern parsing for Windows.test/Discovery/WalkSpec.hscover include-path pruning, the strategyWalkSkipSome↔ filter merge, and end-to-end YAML →collectConfigFileFilters→ walker pruning..fossa.ymlcontainingpaths.exclude: ["**/zip/**"]against the fossa-cli checkout: pruned subtree shows up once, no per-strategy spam.test/Container/testdata/emptypath.tar) that isn't materialized here; CI exercises the suite.Risks
Path.toStringonPath Rel Dirappends/, whichSystem.FilePatternrejects for patterns likenode_modules/*.globMatchesDirstrips it locally rather than touchingData.Glob.matches(shared with Node and Walk).FilterCombinationgets a third field for globs; telemetry JSON is only different when globs are configured.logPrunedSubtreesdoes one extra walk, gated on at least one path filter being set; it skips into pruned subtrees, so work is bounded by the kept tree.Metrics
Not currently tracked.
References
Checklist
docs/references/files/fossa-yml.md).Changelog.md.docs/references/files/fossa-yml.v3.schema.json.