"Tell me, O Muse, of that ingenious hero who travelled far and wide..."
Version: 0.2.0-spec
Status: Implementation Specification
Language: Rust
License: MIT
Homer is a CLI tool and library that mines git repositories — commits, diffs, PRs, issues, call graphs, import graphs, documentation, and AI agent interactions — and produces artifacts that accelerate agentic development: CLAUDE.md/AGENTS.md files, module context maps, change pattern skills, reverse-engineered specifications, and risk maps.
Homer combines four disciplines that have never been unified:
- Behavioral code analysis (Adam Tornhill / CodeScene): mining git history for hotspots, change coupling, team patterns
- Structural graph analysis: call graphs, import graphs, centrality metrics (PageRank, betweenness, HITS), community detection
- LLM-powered semantic extraction: intent summarization, design rationale extraction, convention detection
- Documentation and agent interaction mining: extracting knowledge from docs, doc comments, AI agent sessions, and curated agent context files
The unique insight: salience ≠ change frequency. A utility module half the codebase depends on but untouched for a year is invisible to behavioral analysis but arguably the most important thing for an agent to understand deeply (blast radius if changed is enormous). Homer surfaces these "quiescent high-centrality" nodes through graph analysis, then enriches them with semantic understanding.
Homer tells the story of a project. Like the poet who preserved civilization's knowledge through structured oral tradition, Homer preserves the accumulated wisdom of a codebase — its architecture, its patterns, its decisions, its evolution — in forms that both humans and AI agents can consume.
The output artifacts are epics: compressed, structured narratives of how a codebase came to be, what it is, and what it intends.
| Document | Description |
|---|---|
| ARCHITECTURE.md | System architecture, pipeline, data flow, crate structure |
| STORE.md | Hypergraph data model, SQLite schema, incrementality protocol |
| EXTRACTORS.md | Git history, GitHub API, structure, graph, document, and prompt extraction |
| GRAPH_ENGINE.md | Stack graphs evolution, tree-sitter, language support tiers |
| ANALYZERS.md | Behavioral, centrality, temporal, semantic, convention, and task pattern analysis |
| RENDERERS.md | All output artifact formats and generation logic |
| CLI.md | Command interface, configuration, MCP server surface |
| PERFORMANCE.md | Parallelism, memory, caching, benchmarking strategy |
| EVOLUTION.md | Plugin system, schema versioning, extensibility design, technology evaluations |
| INTEGRATIONS.md | Topos, Ananke, Loop — parallel project connections |
| REFERENCES.md | All citations, prior art, research sources |
| Decision | Choice | Rationale |
|---|---|---|
| Language | Rust throughout | Performance, tree-sitter native, single binary deployment |
| Storage | SQLite via rusqlite, trait-abstracted |
Zero deployment complexity, portable, upgradeable to libSQL |
| Graph extraction | Forked stack-graphs + tree-sitter fallback | Precise name resolution where available, breadth elsewhere |
| Graph algorithms | petgraph in-memory |
Mature Rust library, load from SQLite → compute → write back |
| LLM integration | HTTP API (reqwest + serde) | Provider-agnostic, no Python dependency |
| Incremental computation | Salsa-inspired memoization | Only recompute what changed |
| Parallelism | Rayon (CPU) + Tokio (I/O) | Parse/analyze in parallel, async for network |
| Serialization | serde ecosystem (defer facet.rs to post-v1) | Ecosystem maturity; facet.rs promising but too volatile |
| Doc comments | Metadata on code nodes, not separate nodes | Avoids graph inflation; doc comments are attributes of code |
| Prompt mining | Opt-in with privacy controls | Sensitive data requires explicit consent and redaction |
What Homer produces that no existing tool does:
-
Graph-theoretic salience + temporal evolution: PageRank on call graphs tracked over time, correlated with behavioral signals. Surfaces insights like "module X became 3× more central over 6 months but has zero tests and one contributor."
-
Hyperedge-native co-change analysis: Not "files A and B correlate" but "files {A, B, C, D} form a change set that moves together 87% of the time."
-
Agent-consumable artifacts from deep analysis: CLAUDE.md that tells agents which functions are load-bearing, which areas are fragile, what naming conventions actually are (not what someone wished), and what change patterns look like for common tasks.
-
Reverse-engineered specifications: Derives Topos-formatted specs from accumulated history — closing the loop for brownfield codebases that have code but no specification.
-
Documentation intelligence: Correlates doc comments with graph centrality — a high-PageRank function with a good doc comment can skip LLM summarization, while a high-PageRank function with no doc comment is a risk flag. Detects documentation staleness and coverage gaps.
-
Agent interaction mining: Extracts knowledge from AI coding agent sessions (Claude Code, Cursor, Windsurf, Cline) — task patterns, correction hotspots, domain vocabulary mappings, and areas that repeatedly confuse agents. This is a record of the human-codebase interface that exists nowhere else.
- Homer is not an IDE plugin (though it exposes MCP tools that IDEs can consume)
- Homer is not a code generation tool (it produces context for generators)
- Homer is not a formal verification system (see Ananke for that direction)
- Homer is not a real-time system (it runs on-demand or in CI, not in-editor)
- Homer does not require LLM access for basic functionality (LLM is the enrichment layer, not the core)
- Homer does not store raw source code in its database (source is read on-demand from the git repo)
- Homer does not store raw prompt text by default (privacy-first; only structured metadata is retained unless explicitly opted in)
# First-time analysis of a repository
cd my-project
homer init
# Incremental update after new commits
homer update
# Generate artifacts
homer render --format agents-md # → AGENTS.md
homer render --format module-ctx # → per-directory context files
homer render --format skills # → Claude Code skills
homer render --format spec # → reverse-engineered .tps spec
homer render --format report # → HTML report with visualizations
homer render --all # → everything
# Query the knowledge base
homer query src/auth/validate.rs # What Homer knows about this file
homer graph --metric pagerank # Top functions by PageRank
homer diff v1.0..v2.0 # Architectural changes between releases
# Start MCP server for agent integration
homer serve