Fine-tune a small language model on your own git history to generate commit messages locally. Zero latency, zero API cost.
```
git add .
python suggest.py
# -> "Add deck composition features to observation for card-counting"
```
- Extract `(diff, commit_message)` pairs from your GitHub repos
- Fine-tune Qwen3-0.6B with LoRA (~10 min on an M-series Mac)
- Suggest commit messages from staged changes in <1 second
The model learns YOUR commit style — vocabulary, conventions, and what parts of a diff matter.
- Apple Silicon Mac (M1/M2/M3/M4) with 8+ GB RAM
- Python 3.10+
- GitHub CLI (`gh`) for fetching repos
```
# 1. Setup
git clone https://github.com/YOUR_USER/commit-msg-finetune.git
cd commit-msg-finetune
python -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt

# 2. Build dataset from your GitHub repos
./fetch_and_extract.sh   # clones repos, extracts diffs
python build_dataset.py  # curates into train/valid/test

# 3. Train (~10 min)
make train

# 4. Use it
git add .
python suggest.py
```

Or step-by-step with make:

```
make fetch     # clone repos + extract pairs
make dataset   # curate + split
make train     # LoRA fine-tune
make suggest   # generate message for staged changes
make evaluate  # ROUGE-L scores on test set
```

```
commit-msg-finetune/
├── fetch_and_extract.sh          # Clone GitHub repos + extract pairs
├── extract_diff_commit_pairs.py  # Git history -> (diff, message) JSONL
├── build_dataset.py              # Curate + split into train/valid/test
├── suggest.py                    # Inference: staged diff -> commit message
├── evaluate.py                   # Test set evaluation (ROUGE-L)
├── Makefile                      # Convenience commands
├── DEVELOPER_GUIDE.md            # Deep implementation guide
├── data/
│   ├── train.jsonl               # Training examples (ChatML format)
│   ├── valid.jsonl               # Validation set
│   └── test.jsonl                # Test set
└── adapters/                     # LoRA weights (after training)
```
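Each line of `data/train.jsonl` is one chat-formatted example. A hypothetical record to illustrate the shape — the actual system prompt wording and diff content here are assumptions, not the repo's real schema:

```python
import json

# Hypothetical ChatML-style training record; the repo's actual prompt
# wording may differ. One JSON object per line in train.jsonl.
record = {
    "messages": [
        {"role": "system", "content": "Write a concise git commit message for this diff."},
        {"role": "user", "content": "diff --git a/app.py b/app.py\n+def ping():\n+    return 'pong'"},
        {"role": "assistant", "content": "Add ping endpoint"},
    ]
}

line = json.dumps(record)  # serialize exactly as it would appear on disk
print(json.loads(line)["messages"][-1]["content"])
```

The assistant turn holds the target commit message, so fine-tuning teaches the model to complete the conversation with a message in your style.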
Training defaults can be overridden via make variables:
```
make train MODEL=Qwen/Qwen3-4B ITERS=500 BATCH=2
```

| Variable | Default | Description |
|---|---|---|
| `MODEL` | `Qwen/Qwen3-0.6B` | Base model (HuggingFace ID) |
| `ITERS` | `300` | Training iterations |
| `BATCH` | `4` | Batch size |
| `LR` | `1e-4` | Learning rate |
| `MAX_SEQ` | `2048` | Max sequence length |
| `NUM_LAYERS` | `16` | LoRA layers |
If 0.6B quality isn't enough, try a larger model — same pipeline:
| Model | Params | RAM needed | Quality |
|---|---|---|---|
| Qwen3-0.6B | 0.6B | ~2 GB | Good for common patterns |
| Qwen3-1.7B | 1.7B | ~4 GB | Better phrasing |
| Qwen3-4B | 4B | ~8 GB | Handles complex diffs |
| Qwen3-8B | 8B | ~14 GB | Near-human quality |
```
make train MODEL=Qwen/Qwen3-4B BATCH=2
python suggest.py --model Qwen/Qwen3-4B
```

Large diffs are compressed to fit the model's context window (the same rules apply during training and inference):
| Diff size | Strategy |
|---|---|
| < 300 lines | Full patch |
| 300–1000 lines | git diff --stat + first 300 lines |
| > 1000 lines | git diff --stat only |
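The tiers above are easy to express as a pure function. A minimal sketch — the function name and the exact boundary handling are assumptions, not necessarily what `suggest.py` does:

```python
MAX_FULL_LINES = 300    # below this, send the full patch
MAX_STAT_LINES = 1000   # above this, send the stat summary only

def compress_diff(diff: str, stat: str) -> str:
    """Compress a diff per the size tiers in the table above.

    `diff` is the full patch (e.g. `git diff --cached`) and `stat` is the
    summary (e.g. `git diff --cached --stat`). Hypothetical sketch.
    """
    lines = diff.splitlines()
    if len(lines) < MAX_FULL_LINES:
        return diff                                        # full patch
    if len(lines) <= MAX_STAT_LINES:
        head = "\n".join(lines[:MAX_FULL_LINES])
        return stat + "\n" + head                          # stat + first 300 lines
    return stat                                            # stat only
```

Keeping the same function on both the training and inference paths matters: the model only sees inputs shaped the way it was trained on.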
MIT