fix: handle private GitHub/GitLab repos via OAuth + reshape repo-string syntax #46

Merged
Minitour merged 5 commits into main from fix/private-repo-access-and-repo-string-syntax on May 9, 2026

Minitour commented May 9, 2026

Summary

Two intertwined fixes:

  1. Private GitHub/GitLab repos now work for rules and agent snippets.
    Capa was issuing direct HTTP GETs to raw GitHub/GitLab URLs when fetching rule files and agent-instruction snippets. For private repos behind SSO this silently returned a text/html login page (HTTP 200), and capa then wrote that HTML straight into the user's rule files and AGENTS.md — corrupting them with what looked like real markdown to the file watcher but turned out to be a login page on inspection. The clone-and-cache code path that already worked correctly for skills was simply never reused for these sibling features. Now it is.

  2. Repo-string grammar is no longer ambiguous.
    Previously @ had two opposite meanings depending on context:

    • For skills, the right-hand side was a basename (capa searched the cloned repo recursively for a directory matching it).
    • For rules / agent snippets, the right-hand side was an exact file path from the repo root.

    Same syntax, opposite resolution semantics, no error when users guessed wrong. Going forward there are two explicit forms, both valid for skills, rules, and snippets:

    | Form | Right-hand side | Resolution |
    | --- | --- | --- |
    | owner/repo@<name> | basename, no slashes allowed | recursive search through the cloned snapshot |
    | owner/repo::<path> | exact path from repo root | direct lookup, fails if the file/dir isn't there |

    Both still accept :version (tag/branch) and #sha (commit SHA) suffixes for pinning.
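To make the two forms concrete, here is a minimal sketch of a parser for the grammar described above. It is not the code in src/shared/repo-string.ts: the name parseRepoStringSketch, the ParsedRepoSketch shape, and the error wording are illustrative assumptions, and platform prefixes such as gitlab: are deliberately left out.

```typescript
type RepoMode = "search" | "exact";

// Hypothetical shape for illustration; the real ParsedRepo may differ.
interface ParsedRepoSketch {
  repo: string; // "owner/repo" or "group/subgroup/repo"
  mode: RepoMode;
  target: string;
  version?: string; // tag or branch, from a ":version" suffix
  sha?: string; // commit SHA, from a "#sha" suffix
}

function parseRepoStringSketch(input: string): ParsedRepoSketch {
  // Look for "::" first so it is never mistaken for a ":version" suffix.
  let sepIndex = input.indexOf("::");
  let sepLength = 2;
  let mode: RepoMode = "exact";
  if (sepIndex === -1) {
    sepIndex = input.indexOf("@");
    sepLength = 1;
    mode = "search";
  }
  if (sepIndex === -1) {
    throw new Error(`Expected "@<name>" or "::<path>" in "${input}"`);
  }

  const repo = input.slice(0, sepIndex);
  if (!/^[^/\s]+(\/[^/\s]+)+$/.test(repo)) {
    throw new Error(`Missing owner/repo in "${input}"`);
  }

  let target = input.slice(sepIndex + sepLength);
  const parsed: ParsedRepoSketch = { repo, mode, target };

  // Peel off an optional "#sha" or ":version" pinning suffix.
  const hashIndex = target.indexOf("#");
  const colonIndex = target.indexOf(":");
  if (hashIndex !== -1) {
    parsed.sha = target.slice(hashIndex + 1);
    target = target.slice(0, hashIndex);
    if (parsed.sha === "") throw new Error(`Empty #sha suffix in "${input}"`);
  } else if (colonIndex !== -1) {
    parsed.version = target.slice(colonIndex + 1);
    target = target.slice(0, colonIndex);
    if (parsed.version === "") throw new Error(`Empty :version suffix in "${input}"`);
  }

  if (target === "") throw new Error(`Empty target in "${input}"`);
  if (mode === "search" && target.indexOf("/") !== -1) {
    // "@" is basename-only; point the user at the exact-path form.
    throw new Error(`"@" targets cannot contain "/". Use "${repo}::${target}" instead.`);
  }
  parsed.target = target;
  return parsed;
}
```

Under these assumptions, owner/repo@typescript:v2.0.0 parses as a search for the basename typescript pinned to v2.0.0, while owner/repo@rules/typescript.md throws and suggests the :: form.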

What changed

New module: src/shared/repo-file.ts

  • fetchRepoFile(platform, repoString, getRepoSnapshot, authFetch, opts) — clone the containing repo (with OAuth) and read the file off the snapshot. Branches on mode:
    • exact → direct path lookup, friendly error if missing
    • search → recursive walk for files whose basename matches; helpful errors with candidate filenames on miss and full match list on ambiguity (Tip: Use "owner/repo::<exact-path>" to disambiguate.)
  • fetchTextFile(url, opts): a fetch wrapper that adds OAuth headers when an AuthenticatedFetch is provided, and rejects HTML responses (looksLikeHtmlPage(body, contentType)) so the SSO-login-page bug can never happen silently again.
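As an illustration of the HTML-rejection idea, the check could look roughly like the sketch below. Only the name looksLikeHtmlPage and its (body, contentType) parameters come from this PR; the heuristics shown here (a text/html content type, or a body that opens with a doctype or html tag) are assumptions, not the actual implementation.

```typescript
// Hypothetical heuristic: treat a response as an HTML page when either the
// Content-Type says so or the body visibly starts like an HTML document.
function looksLikeHtmlPageSketch(body: string, contentType: string | null): boolean {
  if (contentType !== null && /^\s*text\/html\b/i.test(contentType)) {
    return true;
  }
  // Only sniff the start of the body, so markdown that merely mentions
  // "<html>" further down is not rejected.
  const head = body.slice(0, 512).replace(/^\s+/, "").toLowerCase();
  return head.indexOf("<!doctype html") === 0 || head.indexOf("<html") === 0;
}

// Fail loudly instead of writing a login page into a rule file.
function assertNotHtmlSketch(label: string, body: string, contentType: string | null): string {
  if (looksLikeHtmlPageSketch(body, contentType)) {
    throw new Error(`${label}: received an HTML page instead of file content (is the repo private?)`);
  }
  return body;
}
```

Checking both signals matters because an SSO redirect can return HTTP 200, so the status code alone says nothing about whether the body is the requested file.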

Reshaped: src/shared/repo-string.ts

  • ParsedRepo gained a mode: 'search' | 'exact' discriminator and a target field (with a non-enumerable filepath accessor for back-compat).
  • parseRepoString recognizes :: first, then @. Rejects @ references whose target contains /, suggesting the :: form in the error message. Empty targets and missing owner/repo produce specific errors.
  • buildRawUrl requires mode === 'exact' (raw URLs need a known path) and ships a JSDoc explicitly steering callers to fetchRepoFile for the private-repo case.

Updated callers

  • installCommand — wires repoFetchAuth (AuthenticatedFetch) and repoFetchCtx through to installAgentsFile and uses fetchRepoFile / fetchTextFile for the rules loop.
  • installAgentsFile — base + additional snippets resolved via fetchRepoFile. A new detectRepoCoordsFromRawUrl helper auto-detects https://raw.githubusercontent.com/... and https://gitlab.com/.../-/raw/... URLs and re-routes them through the clone path so users who paste a raw URL still get OAuth-authenticated fetches. Detected URLs always emit the :: form.
  • Skill installation in install.ts — replaced the ad-hoc split(/[:#]/) + split('@') parser (which silently broke on :: because of the double colon) with parseRepoString. Added an exact-path branch that looks up <snapshotDir>/<target>/SKILL.md directly. The recursive-search branch now produces an error message that points users at the :: form when the basename collides or is ambiguous.
  • capa add — accepts both owner/repo::path/to/skill and gitlab:group/.../repo::path/to/skill, with optional :version / #sha. Help text rewritten with a two-grammar table and a "when to use which" section.
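The recursive-search branch and its disambiguation errors can be pictured with the sketch below. It works on a plain list of repo-relative paths rather than walking a cloned snapshot, and the function name, result type, and message wording are assumptions for illustration. The only text taken from the PR is the tip about the :: form.

```typescript
type SearchResultSketch =
  | { kind: "found"; path: string }
  | { kind: "missing"; message: string }
  | { kind: "ambiguous"; message: string };

function basenameOf(path: string): string {
  const parts = path.split("/");
  return parts[parts.length - 1] ?? "";
}

// files: repo-relative POSIX paths from the snapshot (hypothetical input).
function resolveByBasenameSketch(files: string[], name: string): SearchResultSketch {
  const matches = files.filter((f) => basenameOf(f) === name).sort();
  const [only] = matches;

  if (matches.length === 1 && only !== undefined) {
    return { kind: "found", path: only };
  }
  if (matches.length === 0) {
    // Offer a few candidate filenames so a typo is easy to spot.
    const candidates = files.map(basenameOf).slice(0, 5).join(", ");
    return { kind: "missing", message: `No file named "${name}". Candidates: ${candidates}` };
  }
  return {
    kind: "ambiguous",
    message:
      `Multiple files named "${name}": ${matches.join(", ")}. ` +
      `Tip: Use "owner/repo::<exact-path>" to disambiguate.`,
  };
}
```

With a fixture containing a/dup.md and b/dup.md, searching for dup.md yields the ambiguous result listing both paths, which is the case the :: form exists to resolve.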

Docs

  • capabilities-schema.md — new "Repo string format (@ vs ::)" section with a side-by-side table and explicit usage guidance. Rules section gains worked examples for both grammars.
  • commands.md: the capa add doc lists both grammars with examples.
  • workflows-and-examples.md — fixed an existing rule example that used owner/repo@rules/typescript.md:v2.0.0 (now invalid because @ rejects slashes) and converted it to the :: form.
  • src/types/capabilities.ts — JSDoc updated to describe both grammars.

Migration

Existing capabilities files that referenced rules / agent snippets via owner/repo@some/path/file.md will now fail loudly at install time with an error message that points the user at the equivalent owner/repo::some/path/file.md form. No silent behavior change — the error is the migration prompt.

Test plan

  • bun test — 539 pass / 0 fail (29 new tests added)
  • bunx tsc --noEmit — clean
  • New src/shared/__tests__/repo-string.test.ts (18 tests) — both grammars, all pinning suffixes, subgroup repos, slash-rejection, missing-target / missing-owner errors, buildRawUrl integration, back-compat filepath accessor
  • Extended src/shared/__tests__/repo-file.test.ts — fixture grew an a/dup.md + b/dup.md collision pair; new tests cover unique-match search, no-match (with candidate hints), ambiguous-match (with both paths listed), and exact-path missing-file errors
  • Extended src/cli/utils/__tests__/agents-file.test.ts — every detector test expects ::, plus a round-trip sanity test that confirms the emitted strings parse back as mode === 'exact'
  • Extended src/cli/commands/__tests__/add.test.ts — new GitHub :: and GitLab :: describe blocks (5 tests) including pinning; updated help-text assertion
  • Manual: install rules from a private GitLab subgroup repo with OAuth configured, confirm the file content matches the source markdown (not an HTML login page).

Minitour added 4 commits May 9, 2026 20:13
…ng syntax

Capa was issuing direct HTTP GETs to raw GitHub/GitLab URLs when fetching
rule files and agent-instruction snippets. For private repos behind SSO
this silently returned a `text/html` login page (HTTP 200), and capa
wrote that HTML straight into rule and AGENTS.md files. The code path
that already handled `git clone` for skills was never reused for these
sibling features.

Two fixes, one PR:

1. Route every github/gitlab typed reference through the existing
   snapshot resolver (git clone with stored OAuth tokens), so private
   repos work the same way for skills, rules, and agent snippets.
   - New `src/shared/repo-file.ts`: `fetchRepoFile` (clone-and-read)
     and `fetchTextFile` (auth-aware raw URL fetch with HTML-response
     detection so we fail loudly instead of silently embedding a
     login page).
   - `installCommand` now passes a shared auth + snapshot context to
     `installAgentsFile` and uses `fetchRepoFile` / `fetchTextFile`
     for the rules loop.
   - `installAgentsFile` resolves base + additional snippets through
     the same helpers; raw URLs that look like github/gitlab raw paths
     are auto-detected and re-routed through the clone path so users
     who paste a raw URL still get OAuth-authenticated fetches.

2. Make the repo-string grammar unambiguous. Previously `@` was
   overloaded: skills used the right-hand side as a basename to search
   for, while rules/snippets used it as an exact file path. Same
   syntax, opposite resolution semantics, no error when users guessed
   wrong.

   Now there are two explicit forms:
     - `owner/repo@<name>`   recursive search by basename
     - `owner/repo::<path>`  exact path inside the repo

   Both accept `:version` (tag/branch) and `#sha` (commit) suffixes.
   `parseRepoString` rejects `@` references with slashes and points
   the user at the `::` form in the error message.

   The `@` form is now valid for rules / agent snippets too
   (recursive file lookup with disambiguation errors), and the `::`
   form is now valid for skills (exact directory lookup). The two
   features are symmetric.

Other surface changes:
- `capa add` accepts `owner/repo::path/to/skill` and the gitlab
  variant, with optional `:version` / `#sha`.
- `detectRepoCoordsFromRawUrl` emits the `::` form (raw URLs always
  know an exact path, so search would be the wrong default).
- Repo-string docs in `capabilities-schema.md` get a side-by-side
  table of the two grammars and explicit guidance on when to use
  each. Rules section gains worked examples for both.
- `buildRawUrl` only accepts `mode === 'exact'` and ships a JSDoc
  pointing at `fetchRepoFile` for the private-repo case, since the
  raw URL path is the original source of the bug.

Tests: 539 pass, 0 fail (29 new). New `repo-string.test.ts` covers
both grammars and every error path; `repo-file.test.ts` exercises
`@` search with unique-match, no-match (with candidate hints), and
ambiguous-match (with disambiguation hint) cases against a local git
fixture; `agents-file.test.ts` round-trip-tests that translated raw
URLs always parse back as exact paths.
GitHub's "Raw" button now generates URLs in the form
  https://raw.githubusercontent.com/<owner>/<repo>/refs/heads/<branch>/<path>
(and the equivalent /refs/tags/<tag>/ form). Both forms work when fetched
directly, but `detectRepoCoordsFromRawUrl` greedily took the first segment
after the repo as the ref, so it produced bogus repo strings like:

  owner/repo::heads/main/examples/foo.md:refs

which then failed the downstream `git clone` step with
"Repository not accessible" because there's no such ref as `refs`.

Fix: extract a `splitGithubRefAndPath` helper that recognizes
`refs/heads/<branch>/...` and `refs/tags/<tag>/...` prefixes and pulls
the actual ref out of the third segment. Apply it to both the
`raw.githubusercontent.com` and `github.com /raw/` branches. GitLab
raw URLs are unaffected (they use `/-/raw/<ref>/<path>` directly).
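The ref-splitting fix described above can be sketched as follows. This is an illustrative reconstruction, not the code from agents-file.ts: the function names carry a Sketch suffix, the regex-based URL handling is an assumption, and query strings and branch names containing "/" are not handled.

```typescript
interface RefAndPathSketch {
  ref: string;
  path: string;
}

// segments: the URL path segments that follow <owner>/<repo>.
function splitGithubRefAndPathSketch(segments: string[]): RefAndPathSketch | null {
  const [first, second, third] = segments;
  if (first === "refs" && (second === "heads" || second === "tags")) {
    // refs/heads/<branch>/<path...> or refs/tags/<tag>/<path...>
    if (third === undefined || segments.length < 4) return null;
    return { ref: third, path: segments.slice(3).join("/") };
  }
  // Older shape: <ref>/<path...>
  if (first === undefined || segments.length < 2) return null;
  return { ref: first, path: segments.slice(1).join("/") };
}

// Turn a raw.githubusercontent.com URL into the exact-path (::) repo string.
function rawUrlToRepoStringSketch(url: string): string | null {
  const m = /^https:\/\/raw\.githubusercontent\.com\/([^/]+)\/([^/]+)\/(.+)$/.exec(url);
  if (m === null) return null;
  const owner = m[1];
  const repo = m[2];
  const rest = m[3];
  if (owner === undefined || repo === undefined || rest === undefined) return null;
  const split = splitGithubRefAndPathSketch(rest.split("/"));
  if (split === null) return null;
  return `${owner}/${repo}::${split.path}:${split.ref}`;
}
```

In this sketch the URL shape from the bug report maps to owner/repo::examples/foo.md:main rather than the bogus owner/repo::heads/main/examples/foo.md:refs.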

Tests: added 5 new cases: bare branch (HEAD-equivalent), non-default
branch, tag, github.com /raw/refs/heads/ form, and a regression test
matching the exact URL shape from the bug report. Round-trip test was
extended to include all three new shapes.
…nfig

Two related rule-cleanup bugs:

1. `capa clean` did not remove installed rule files. The cleanup call
   was gated on `capabilities.rules.length > 0`, so once the user
   commented out (or removed) all their rules and ran clean, the
   previously-installed `.cursor/rules/*.mdc` (or equivalent) files
   were left orphaned.

2. `capa install` did not remove a rule that was commented out since
   the previous install. The install loop only writes the *current*
   rules; there was no diff-and-prune step against what existed
   before, so removing one rule from the config left its file behind
   on disk indefinitely.

Both share the same root cause: rule cleanup was driven by the *current*
state of `capabilities.rules` rather than by what capa had previously
installed. Worse, rule files for directory-based providers (Cursor,
Copilot, Windsurf) were written without being registered in the
managed-files DB at all, so even when cleanup logic wanted to find
them, there was no persistent record of which files belonged to capa.

Fix:

- `installRules` now accepts an `onFileWritten` callback that fires
  once per rule file written by a directory-based provider. `install.ts`
  uses it to register every rule file in the managed-files DB, mirroring
  how skills are tracked.

- New `pruneRules(projectPath, providers, currentRules, previouslyManagedFiles)`
  in `rules-installer.ts` brings on-disk rules state in sync with the
  capabilities file:
  * For directory-based providers, iterates the previously-managed file
    list and deletes any file inside the provider's rules dir whose
    basename doesn't correspond to a current rule for that provider.
    User-authored files in the same dir are never touched because we
    only consider files capa explicitly registered.
  * For instruction-folded providers, scans the instruction file for
    `<!-- capa:start:rule:<id> -->` blocks and removes ones not in the
    current rules set. Marker blocks are self-tracking, so no DB lookup
    is needed.
  * Per-rule `providers:` filtering is honored: a rule restricted to
    one provider counts as "absent" from the others' perspective.
  * Returns the list of removed file paths so the caller can drop them
    from the managed-files DB.

- `install.ts` always runs `pruneRules` at step 3.6, even when the
  current rules array is empty. Commenting out a rule and re-running
  install now removes its file/marker block.

- `clean.ts` no longer gates the `cleanRules` call on
  `capabilities.rules.length > 0`. With rule files now registered in
  the managed-files DB, the existing managed-files loop already takes
  care of deleting orphan rule files; the unconditional `cleanRules`
  call additionally strips `<!-- capa:start:rule:* -->` blocks from
  instruction-folded providers' AGENTS.md / CLAUDE.md.
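For directory-based providers, the selection step of the prune can be sketched as a pure function over the managed-files list. The function name, parameter shapes, and the example paths are assumptions for illustration; the real pruneRules also deletes files and handles provider filtering.

```typescript
// rulesDir: the provider's rules directory, e.g. ".cursor/rules".
// previouslyManaged: paths capa registered in the managed-files DB.
// currentRuleFileNames: basenames the current rules would produce
//   (hypothetical mapping from rule id to filename).
function selectStaleRuleFilesSketch(
  rulesDir: string,
  previouslyManaged: string[],
  currentRuleFileNames: string[],
): string[] {
  const prefix = rulesDir.charAt(rulesDir.length - 1) === "/" ? rulesDir : rulesDir + "/";
  return previouslyManaged.filter((managedPath) => {
    // Managed paths outside the rules dir (skills, for example) are never candidates.
    if (managedPath.indexOf(prefix) !== 0) return false;
    const parts = managedPath.split("/");
    const base = parts[parts.length - 1] ?? "";
    // Stale when no current rule corresponds to this file.
    return currentRuleFileNames.indexOf(base) === -1;
  });
}
```

Because the input is only what capa registered earlier, a user-authored file in the same directory can never appear in the result, and an empty current-rules list marks every managed rule file as stale.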

Tests: 553 pass (9 new):
- onFileWritten fires exactly once per directory-based rule, and
  never for instruction-folded providers
- pruneRules removes a single dropped rule
- pruneRules removes ALL rule files when rules section is emptied
- pruneRules never deletes user-authored files in the rules dir
- pruneRules removes the marker block for a removed rule but keeps
  current ones
- pruneRules removes ALL marker blocks when rules section is emptied
- pruneRules respects per-rule provider filtering
- pruneRules ignores managed paths outside the provider rules dir
  (regression guard against false deletions of skill directories)
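For instruction-folded providers, marker-block pruning might look like the sketch below. The start marker format comes from this commit message; the matching end marker `<!-- capa:end:rule:<id> -->` is an assumption made for the example, as are the function name and the regex approach.

```typescript
// ASSUMPTION: each block is closed by "<!-- capa:end:rule:<id> -->".
// Only the start marker format is stated in the commit message.
function stripStaleRuleBlocksSketch(content: string, currentRuleIds: string[]): string {
  const block = /<!-- capa:start:rule:(\S+) -->[\s\S]*?<!-- capa:end:rule:\1 -->\n?/g;
  return content.replace(block, (whole: string, id: string) =>
    // Keep blocks for rules still in the config; drop everything else.
    currentRuleIds.indexOf(id) !== -1 ? whole : "",
  );
}
```

Since the ids live in the markers themselves, this step needs no lookup in the managed-files DB, and passing an empty id list removes every block while leaving the surrounding text intact.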
…ules

`RulesList` already painted source-type badges (gitlab=orange,
github=green, etc.), but `SkillsList` rendered them with the neutral
`bg-bg-secondary` style, so the same `gitlab` source looked orange in
one section and gray in another. Confusing at a glance and inconsistent
with how the rest of the UI treats source provenance.

Extracted the color map into a shared `sourceTypeColors.ts` helper and
wired both lists through `sourceTypeBadgeClasses(type)`. Adding a new
component that surfaces a skill/rule/snippet source type from now on
means importing one helper instead of duplicating the literal classes.

Color map (kept the existing rule colors, added two for skill-only types):

  inline    -> blue
  remote    -> purple
  github    -> green
  gitlab    -> orange
  local     -> slate    (skills only: content lives on disk)
  installed -> amber    (skills only: user installed it elsewhere)

Unknown / future types fall back to the neutral classes so they still
render legibly.
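A shared helper along these lines would implement the map above. The color assignments and the neutral `bg-bg-secondary` fallback come from this commit message; the specific Tailwind shade classes and the Sketch-suffixed name are placeholders, not the contents of sourceTypeColors.ts.

```typescript
// Shade classes below are illustrative guesses; only the hue per type
// and the neutral fallback are taken from the commit message.
const SOURCE_TYPE_COLORS_SKETCH = new Map<string, string>([
  ["inline", "bg-blue-100 text-blue-800"],
  ["remote", "bg-purple-100 text-purple-800"],
  ["github", "bg-green-100 text-green-800"],
  ["gitlab", "bg-orange-100 text-orange-800"],
  ["local", "bg-slate-100 text-slate-800"], // skills only
  ["installed", "bg-amber-100 text-amber-800"], // skills only
]);

const NEUTRAL_BADGE_CLASSES = "bg-bg-secondary";

function sourceTypeBadgeClassesSketch(type: string): string {
  // Unknown or future types fall back to the neutral style.
  return SOURCE_TYPE_COLORS_SKETCH.get(type) ?? NEUTRAL_BADGE_CLASSES;
}
```

Using a Map rather than an object literal keeps lookups for arbitrary strings (such as "constructor") from hitting inherited properties.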

Verified the new slate / amber utilities make it through Tailwind v4's
source-scan into the bundled CSS.

Copilot AI left a comment


Pull request overview

This PR hardens and unifies how capa resolves GitHub/GitLab-backed rules and agent snippets by routing them through the existing clone-and-cache (OAuth-capable) path, and introduces an unambiguous repo-string grammar (@ basename search vs :: exact path) across skills/rules/snippets. It also adds rule-artifact pruning support and consolidates source-type badge styling in the web UI.

Changes:

  • Add fetchRepoFile() (clone + snapshot read) and fetchTextFile() (auth-aware raw fetch that rejects HTML login pages) to prevent silent SSO/HTML corruption.
  • Reshape repo-string parsing to support owner/repo@<basename> (search) and owner/repo::<path> (exact) consistently, updating CLI/docs/tests accordingly.
  • Track and prune rule artifacts (managed rule files + instruction marker blocks) when rules are removed from capabilities.

Reviewed changes

Copilot reviewed 19 out of 19 changed files in this pull request and generated 7 comments.

Show a summary per file
| File | Description |
| --- | --- |
| web-ui/src/features/projects/components/sourceTypeColors.ts | Centralizes badge color mapping for skill/rule source types. |
| web-ui/src/features/projects/components/SkillsList.tsx | Uses shared badge color helper for skill type labels. |
| web-ui/src/features/projects/components/RulesList.tsx | Uses shared badge color helper; removes inline color map. |
| src/types/capabilities.ts | Updates JSDoc to document the two repo-string grammars. |
| src/shared/repo-string.ts | Adds mode + target, supports :: exact paths and validates @ slashes. |
| src/shared/repo-file.ts | New shared helpers for snapshot-based file reads + safe raw fetch behavior. |
| src/shared/providers/__tests__/registry.test.ts | Adds coverage for rule managed-file registration and pruning behavior. |
| src/shared/__tests__/repo-string.test.ts | Adds tests for parsing both grammars and buildRawUrl constraints. |
| src/shared/__tests__/repo-file.test.ts | Adds tests for repo snapshot file lookup (exact/search) + HTML rejection. |
| src/cli/utils/rules-installer.ts | Adds onFileWritten hook and new pruneRules() implementation. |
| src/cli/utils/agents-file.ts | Routes GitHub/GitLab snippet/base fetching through snapshots; adds raw-URL detector. |
| src/cli/utils/__tests__/agents-file.test.ts | Tests raw-URL detection and :: emission/round-trip parsing. |
| src/cli/commands/install.ts | Uses new fetch helpers for rules/snippets; installs + prunes rule artifacts; updates skill repo parsing. |
| src/cli/commands/clean.ts | Always cleans rule markers (even when rules list is empty). |
| src/cli/commands/add.ts | Extends capa add parsing + help text to support :: exact-path grammar. |
| src/cli/commands/__tests__/add.test.ts | Adds tests for :: parsing and updated help output expectations. |
| skills/capabilities-manager/references/workflows-and-examples.md | Migrates a rule example to :: form and explains separator semantics. |
| skills/capabilities-manager/references/commands.md | Updates capa add docs to show both grammars and pinning. |
| skills/capabilities-manager/references/capabilities-schema.md | Adds repo-string format section and updates rules/snippets guidance. |

- Reject path-traversal in :: exact-path resolution
  Both fetchRepoFile and the skill installer now route the user-supplied
  target through a shared assertSafeRepoPath() guard that rejects
  ../ segments, leading slashes/backslashes, and drive-letter prefixes
  before joining with snapshotDir, closing the read-arbitrary-file
  vector Copilot flagged on `owner/repo::../../etc/passwd`-style
  strings.

- Reject empty :version / #sha suffixes in parseRepoString
  Inputs like `owner/repo::path.md:` or `...path.md#` now throw with a
  targeted message instead of silently producing version=''/sha='' which
  later corrupted snapshot resolution and raw URLs (`//path`).

- Document and percent-decode multi-segment refs in splitGithubRefAndPath
  GitHub raw URLs are genuinely ambiguous when a branch contains `/`
  (the ref-vs-path boundary is unknowable without an API call). Added a
  thorough JSDoc note on the limitation, percent-decode the ref segment
  so `feature%2Ffoo` round-trips correctly, and pinned the
  literal-multi-segment behavior with a regression test so any future
  change is intentional.

- Cosmetic fixes from the same review:
  * HTML-rejection error: "agents.basefrom" -> "agents.base from"
    (missing space between label and "from")
  * findFilesByBasename docstring no longer claims to skip dotfiles
    (it intentionally traverses .cursor / .github / .agents)
  * resolveRepoSnippet log now prints the original repo string so ::
    form references aren't rendered as the @ form during install

Tests: +8 cases covering the path-traversal guard (POSIX absolute,
parent segments, backslash-prefix, drive-letter via the helper),
empty version/sha rejection, and the documented multi-segment-ref
behavior.
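The path-traversal guard described in the first bullet can be sketched as below. The checks mirror the ones listed in the commit message (parent segments, leading slashes or backslashes, drive-letter prefixes); the function name with its Sketch suffix and the error wording are assumptions, not the real assertSafeRepoPath().

```typescript
// Throws when a user-supplied "::" target could escape the snapshot dir.
function assertSafeRepoPathSketch(target: string): void {
  if (/^[\\/]/.test(target)) {
    throw new Error(`Unsafe repo path "${target}": must not start with a slash or backslash`);
  }
  if (/^[A-Za-z]:/.test(target)) {
    throw new Error(`Unsafe repo path "${target}": drive-letter prefixes are not allowed`);
  }
  // Split on both separators so "..\\.." is caught as well as "../..".
  const segments = target.split(/[\\/]/);
  if (segments.indexOf("..") !== -1) {
    throw new Error(`Unsafe repo path "${target}": ".." segments are not allowed`);
  }
}
```

Checking whole segments rather than substrings means a legitimate filename such as docs/a..b.md still passes, while owner/repo::../../etc/passwd is rejected before the target is joined with the snapshot directory.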
@Minitour Minitour merged commit f2cc3fe into main May 9, 2026
4 checks passed
@Minitour Minitour deleted the fix/private-repo-access-and-repo-string-syntax branch May 9, 2026 22:33
2 participants