The Brain of HackingTheRepo — A pure ML/AI engine that understands, plans, and rewrites code repositories on demand.
RepoMind is the standalone ML core that powers HackingTheRepo — a web-based platform where users describe code changes in plain English and an AI agent clones the target repository, implements the changes, and opens a Pull Request automatically.
RepoMind lives as a separate, independently deployable ML service. It is intentionally decoupled from the web frontend so it can be iterated on, tested, and scaled without touching the UI layer. Once mature, RepoMind is consumed by the HackingTheRepo backend over a simple REST API.
RepoMind is responsible for:
- Receiving a natural-language change instruction and a repository URL
- Cloning the repository and deeply understanding its structure
- Planning a step-by-step set of code edits using an LLM-powered agent
- Executing those edits via code generation tools
- Producing a diff and pushing a Pull Request back to GitHub
┌─────────────────────────────────────────────────┐
│ HackingTheRepo (Web Platform) │
│ User → describes change → selects repo → done │
│ │
│ Frontend (React/Next.js) │
│ │ │
│ Backend (Node / Django / etc.) │
│ │ │
│ │ POST /run (REST API call) │
│ ▼ │
│ ┌─────────────────────────────────┐ │
│ │ RepoMind (this repo) │ │
│ │ FastAPI · LangChain · GitHub │ │
│ └─────────────────────────────────┘ │
│ │ │
│ ▼ │
│ GitHub PR created automatically │
└─────────────────────────────────────────────────┘
- Agentic Planning — A LangChain-powered planner breaks any free-text instruction into an ordered list of atomic code edits before touching a single file.
- Context-Aware Code Generation — The agent reads and reasons over the entire repo before generating new code, respecting existing patterns and style.
- GitHub Integration — Clones repos, creates branches, commits changes, and opens PRs fully programmatically via the GitHub API.
- Diff Generation — Every proposed change is surfaced as a human-readable diff before being committed.
- Memory Layer — Conversation/session memory lets users refine and iterate on the same repo without losing context.
- FastAPI Service — Ships as a lightweight HTTP service so HackingTheRepo (or any other consumer) can call it with a simple POST request.
- Prompt Library — All LLM prompts are first-class citizens living in version-controlled Python files, not buried in application logic.
- Fully Tested & Dockerised — Comes with agent, tool, and API-level test suites plus a production-ready Dockerfile.
| Layer | Technology |
|---|---|
| Agent Framework | LangChain / LlamaIndex |
| LLM Backend | OpenAI GPT-4o (configurable) |
| API Server | FastAPI + Uvicorn |
| GitHub Integration | PyGitHub + GitPython |
| Code Parsing | Tree-sitter / AST |
| Config Management | Pydantic Settings |
| Testing | Pytest |
| Containerisation | Docker |
| CI | GitHub Actions |
RepoMind/ ← Project root
│
├── agent/ ← 🧠 Core ML logic (LangChain agent)
│ ├── chain.py ← Main LangChain chain definition; wires LLM + memory + tools
│ ├── planner.py ← Decomposes the user's instruction into ordered edit steps
│ ├── executor.py ← Runs each planned step, calling the right tools in sequence
│ └── memory.py ← Manages conversation / session memory across turns
│
├── tools/ ← 🔧 Agent-callable tool implementations
│ ├── github_tool.py ← Clone repo, create branch, push commits, open PR via GitHub API
│ ├── code_parser.py ← Parse repository files into AST/tree-sitter structures the LLM can reason about
│ ├── diff_generator.py ← Produce human-readable diffs from before/after file states
│ └── pr_tool.py ← Compose PR title, body, and labels; submit the pull request
│
├── prompts/ ← 💬 All LLM prompt templates (version-controlled)
│ ├── system_prompt.py ← Global system persona: "You are RepoMind, an expert software engineer…"
│ ├── code_gen_prompt.py ← Few-shot template for generating code edits given a file + instruction
│ └── pr_description.py ← Template that turns a list of diffs into a clear PR description
│
├── api/ ← 🌐 FastAPI HTTP service (consumed by HackingTheRepo backend)
│ ├── main.py ← FastAPI app entry-point; mounts router, configures middleware
│ ├── routes.py ← POST /run, GET /status/{job_id}, POST /refine endpoints
│ └── schemas.py ← Pydantic request/response models (RunRequest, JobStatus, etc.)
│
├── tests/ ← ✅ Test suite
│ ├── test_agent.py ← Unit tests for chain, planner, and executor logic
│ ├── test_tools.py ← Tests for github_tool, code_parser, diff_generator, pr_tool
│ └── test_api.py ← Integration tests against the FastAPI routes
│
├── config/ ← ⚙️ Configuration & environment
│ ├── settings.py ← Pydantic BaseSettings: reads env vars, sets defaults
│ └── .env.example ← Template listing all required environment variables
│
├── README.md ← You are here
├── Dockerfile ← Production Docker image for the RepoMind service
├── .github/workflows/ ← GitHub Actions CI pipeline (lint → test → build → push)
└── pyproject.toml ← Project metadata, dependencies, and tool config (black, ruff, pytest)
- Python 3.11+
- A GitHub Personal Access Token (PAT) with
reposcope - An OpenAI API key (or whichever LLM backend you configure)
- Docker (optional, for containerised deployment)
git clone https://github.com/your-org/repomind.git
cd repomindpip install -e ".[dev]"
# or
pip install -r requirements.txtcp config/.env.example .envOpen .env and fill in the required values:
OPENAI_API_KEY=sk-...
GITHUB_TOKEN=ghp_...
GITHUB_USERNAME=your-github-username
LLM_MODEL=gpt-4o
MAX_PLAN_STEPS=15
LOG_LEVEL=INFOuvicorn api.main:app --reload --port 8000The API is now live at http://localhost:8000. Visit http://localhost:8000/docs for the interactive Swagger UI.
docker build -t repomind .
docker run -p 8000:8000 --env-file .env repomindKick off a new agent job.
Request body:
{
"repo_url": "https://github.com/owner/repo",
"instruction": "Refactor all database calls in src/db/ to use async/await",
"branch_name": "repomind/async-db-refactor",
"pr_title": "refactor: async database calls"
}Response:
{
"job_id": "a1b2c3d4",
"status": "running"
}Poll the status of a running job.
Response:
{
"job_id": "a1b2c3d4",
"status": "completed",
"pr_url": "https://github.com/owner/repo/pull/42",
"diff_summary": "Modified 6 files, added 120 lines, removed 95 lines"
}Send a follow-up instruction on an existing job to iterate on the same PR.
Request body:
{
"job_id": "a1b2c3d4",
"instruction": "Also add type hints to all the new async functions"
}User Instruction
│
▼
[ Planner ] ← Breaks instruction into 1..N ordered steps
│
▼
[ Executor ] ← Iterates over steps; decides which tool to call
│
┌───┴────────────────────────────────┐
│ │
▼ ▼
[ code_parser ] [ github_tool ]
Parse existing files Clone, branch, commit
│ │
▼ ▼
[ code_gen_prompt ] [ diff_generator ]
Generate new code Produce human-readable diff
│ │
└───────────────┬────────────────┘
▼
[ pr_tool ]
Compose & open PR
Memory is maintained throughout the session via agent/memory.py, so follow-up refinement instructions have full context of what was already done.
pytest tests/ -vTo run only a specific layer:
pytest tests/test_agent.py # agent logic
pytest tests/test_tools.py # tool integrations
pytest tests/test_api.py # HTTP APIThe .github/workflows/ directory contains a GitHub Actions pipeline that runs on every push and pull request:
- Lint —
ruffandblack --check - Type check —
mypy - Test —
pytestwith coverage reporting - Build — Docker image build
- Push — Push image to container registry (on
mainonly)
RepoMind is the internal ML engine for HackingTheRepo. Contributions, bug reports, and feature suggestions are welcome.
- Fork the repository
- Create a feature branch:
git checkout -b feat/your-feature - Commit your changes:
git commit -m "feat: add your feature" - Push the branch:
git push origin feat/your-feature - Open a Pull Request
Please follow Conventional Commits for commit messages.
- Support for GitLab and Bitbucket alongside GitHub
- Streaming job status via WebSockets
- LlamaIndex-based codebase indexing for large monorepos
- Multi-file, multi-PR orchestration
- Fine-tuned code generation model as a drop-in LLM backend
- Plugin system for custom tools (linters, formatters, test runners)
MIT © HackingTheRepo Team