Allow overriding MCP server instructions (e.g. .codegraph/instructions.md) — built-in text overstates explore/edge reliability on multi-language repos #765

@monochrome3694

Context

Since #529, the MCP initialize instructions in mcp/server-instructions.ts are the single source of truth for how agents use the toolset. That's great, but they're a hardcoded string with no override mechanism (no config field, no env var, no per-project file). Whatever the text claims, every agent in every workspace receives it verbatim.

Problem: the built-in text overclaims on multi-language workspaces

We ran head-to-head comparisons (codegraph vs grep + file reads, checked against ground truth) on a large mixed workspace (~1,100 Swift files + TypeScript/React + Python; ~17k nodes after excluding vendored deps). Results:

What matched the instructions' claims:

  • codegraph_callers / codegraph_callees: excellent — accurate, skip comment mentions, name the enclosing function. Genuinely better than grep.
  • codegraph_explore with a bag of known symbol names: reliable, returned the correct call chain + verbatim source in one call.

What didn't:

  • codegraph_explore with a natural-language question: the right symbol was usually present but buried under keyword-matched noise from unrelated files/languages.
  • Cross-file edges conflate same-named symbols. Examples we hit: a Swift method named "matches" was reported as called from an unrelated package's phonemizer code (a different "matches"); React "render"/"Layout" edges surfaced in the relationships section of a Swift-only query. This is expected from tree-sitter name-matching (the Limitations section even hints at it), but…
  • …the instructions actively tell agents the opposite: "Trust codegraph's results — don't re-verify them with grep… They come from a full AST parse", "PRIMARY — call FIRST… most often the ONLY call you need". An agent following that takes name-collision edges at face value and builds conclusions on them. (Similar in spirit to codegraph_context task parameter description misleads models into passing natural language instead of keywords #571 — instruction text shaping agent behavior in the wrong direction.)
  • Minor: the "no covering tests found" annotation produced false negatives for symbols that do have covering tests.
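
To make the name-collision point concrete, here is a toy sketch of name-only edge resolution. It is not codegraph's actual implementation: the types, file paths, and the needsConfirmation heuristic are all hypothetical, made up purely to show why same-named symbols in different languages end up linked.

```typescript
// Toy model only: these types are hypothetical, not codegraph's real data structures.
interface SymbolDef {
  name: string;
  file: string;
  language: string;
}

interface CallSite {
  callee: string;
  file: string;
  language: string;
}

// Name-only resolution: every definition that shares the callee's name
// becomes an edge target, regardless of language or package.
function resolveByName(call: CallSite, defs: SymbolDef[]): SymbolDef[] {
  return defs.filter((d) => d.name === call.callee);
}

// Hypothetical triage heuristic: an edge deserves a look at the source when
// the name is ambiguous or a target lives in a different language.
function needsConfirmation(call: CallSite, targets: SymbolDef[]): boolean {
  return targets.length > 1 || targets.some((t) => t.language !== call.language);
}

// Illustrative data; the paths are invented for this example.
const defs: SymbolDef[] = [
  { name: "matches", file: "App/Filter.swift", language: "swift" },
  { name: "matches", file: "vendor/phonemizer/rules.py", language: "python" },
];

const call: CallSite = {
  callee: "matches",
  file: "vendor/phonemizer/engine.py",
  language: "python",
};

const targets = resolveByName(call, defs);
console.log(targets.map((t) => t.file), needsConfirmation(call, targets));
```

Without type resolution, the Python call links to both definitions, which is the shape of the false edge described above. An agent told to trust every edge has no cue to open the source and check.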

On a single-language repo the built-in text is probably fine — the issue is that one fixed string can't be right for every workspace shape.

Ask

A supported override/extension point, e.g.:

  1. .codegraph/instructions.md — if present, replace (or append to) the built-in instructions for that project. Fits the existing zero-config philosophy (like the .gitignore-based exclusion lever) and .codegraph/ is already per-project.
  2. Or a global equivalent (~/.codegraph/instructions.md), or both with project taking precedence.
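
A minimal sketch of how the lookup could work, assuming project-over-global precedence and a replace/append switch. Everything here is hypothetical: resolveInstructions, OverrideMode, and the BUILT_IN_INSTRUCTIONS placeholder are illustration names, not existing codegraph APIs.

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

// Placeholder for the built-in string exported from mcp/server-instructions.ts.
const BUILT_IN_INSTRUCTIONS = "Built-in codegraph instructions.";

// Hypothetical switch: whether an override replaces or extends the built-in text.
type OverrideMode = "replace" | "append";

// Returns the trimmed file contents, or undefined if the file is missing or empty.
function readIfPresent(file: string): string | undefined {
  try {
    const text = fs.readFileSync(file, "utf8").trim();
    return text.length > 0 ? text : undefined;
  } catch {
    return undefined;
  }
}

// Project file wins over the global file; with neither present, the built-in text is used.
function resolveInstructions(
  projectRoot: string,
  homeDir: string = os.homedir(),
  mode: OverrideMode = "replace",
  builtIn: string = BUILT_IN_INSTRUCTIONS,
): string {
  const override =
    readIfPresent(path.join(projectRoot, ".codegraph", "instructions.md")) ??
    readIfPresent(path.join(homeDir, ".codegraph", "instructions.md"));

  if (override === undefined) return builtIn;
  return mode === "append" ? `${builtIn}\n\n${override}` : override;
}
```

With no file present this returns the built-in text unchanged, so existing setups would behave exactly as they do today.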

Smaller alternative if an override is unwanted: soften the built-in wording — drop "don't re-verify with grep" / "full AST parse" (it's tree-sitter name-matching across files, not type-resolved), and add a caution that cross-file edges between same-named symbols need confirmation in source, especially in multi-language repos.

Workaround we use today

Patching dist/mcp/server-instructions.js in the installed package with calibrated text. Works (verified via the initialize handshake) but silently reverts on every update — exactly the kind of thing a sanctioned override file would fix. Happy to share our calibrated instruction text as a starting point if useful.

🤖 Generated with Claude Code
