Fine-tune a small language model on your own git history to generate commit messages locally. Zero latency, zero API cost.
```
git add .
python suggest.py
# -> "Add deck composition features to observation for card-counting"
```
- Extract `(diff, commit_message)` pairs from your GitHub repos
- Fine-tune Qwen3-0.6B with LoRA (~10 min on an M-series Mac)
- Suggest commit messages from staged changes in <1 second
The model learns YOUR commit style — vocabulary, conventions, and what parts of a diff matter.
- Apple Silicon Mac (M1/M2/M3/M4) with 8+ GB RAM
- Python 3.10+
- GitHub CLI (`gh`) for fetching repos
```
# 1. Setup
git clone https://github.com/YOUR_USER/commit-msg-finetune.git
cd commit-msg-finetune
python -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt

# 2. Build dataset from your GitHub repos
./fetch_and_extract.sh   # clones repos, extracts diffs
python build_dataset.py  # curates into train/valid/test

# 3. Train (~10 min)
make train

# 4. Use it
git add .
python suggest.py
```

Or step-by-step with make:

```
make fetch     # clone repos + extract pairs
make dataset   # curate + split
make train     # LoRA fine-tune
make suggest   # generate message for staged changes
make evaluate  # ROUGE-L scores on test set
```

```
commit-msg-finetune/
├── fetch_and_extract.sh          # Clone GitHub repos + extract pairs
├── extract_diff_commit_pairs.py  # Git history -> (diff, message) JSONL
├── build_dataset.py              # Curate + split into train/valid/test
├── suggest.py                    # Inference: staged diff -> commit message
├── evaluate.py                   # Test set evaluation (ROUGE-L)
├── Makefile                      # Convenience commands
├── DEVELOPER_GUIDE.md            # Deep implementation guide
├── data/
│   ├── train.jsonl               # Training examples (ChatML format)
│   ├── valid.jsonl               # Validation set
│   └── test.jsonl                # Test set
└── adapters/                     # LoRA weights (after training)
```
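Each line of `data/train.jsonl` is one chat-formatted example. A hypothetical record to illustrate the shape — the actual system prompt wording and diff content here are assumptions, not the repo's real schema:

```python
import json

# Hypothetical ChatML-style training record; the repo's actual prompt
# wording may differ. One JSON object per line in train.jsonl.
record = {
    "messages": [
        {"role": "system", "content": "Write a concise git commit message for this diff."},
        {"role": "user", "content": "diff --git a/app.py b/app.py\n+def ping():\n+    return 'pong'"},
        {"role": "assistant", "content": "Add ping endpoint"},
    ]
}

line = json.dumps(record)  # serialize exactly as it would appear on disk
print(json.loads(line)["messages"][-1]["content"])
```

The assistant turn holds the target commit message, so fine-tuning teaches the model to complete the conversation with a message in your style.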
Training defaults can be overridden via make variables:
```
make train MODEL=Qwen/Qwen3-4B ITERS=500 BATCH=2
```

| Variable | Default | Description |
|---|---|---|
| `MODEL` | `Qwen/Qwen3-0.6B` | Base model (HuggingFace ID) |
| `ITERS` | `300` | Training iterations |
| `BATCH` | `4` | Batch size |
| `LR` | `1e-4` | Learning rate |
| `MAX_SEQ` | `2048` | Max sequence length |
| `NUM_LAYERS` | `16` | LoRA layers |
If 0.6B quality isn't enough, try a larger model — same pipeline:
| Model | Params | RAM needed | Quality |
|---|---|---|---|
| Qwen3-0.6B | 0.6B | ~2 GB | Good for common patterns |
| Qwen3-1.7B | 1.7B | ~4 GB | Better phrasing |
| Qwen3-4B | 4B | ~8 GB | Handles complex diffs |
| Qwen3-8B | 8B | ~14 GB | Near-human quality |
```
make train MODEL=Qwen/Qwen3-4B BATCH=2
python suggest.py --model Qwen/Qwen3-4B
```

Large diffs are compressed to fit the model's context window (the same rules apply during training and inference):
| Diff size | Strategy |
|---|---|
| < 300 lines | Full patch |
| 300–1000 lines | git diff --stat + first 300 lines |
| > 1000 lines | git diff --stat only |
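The tiers above are easy to express as a pure function. A minimal sketch — the function name and the exact boundary handling are assumptions, not necessarily what `suggest.py` does:

```python
MAX_FULL_LINES = 300    # below this, send the full patch
MAX_STAT_LINES = 1000   # above this, send the stat summary only

def compress_diff(diff: str, stat: str) -> str:
    """Compress a diff per the size tiers in the table above.

    `diff` is the full patch (e.g. `git diff --cached`) and `stat` is the
    summary (e.g. `git diff --cached --stat`). Hypothetical sketch.
    """
    lines = diff.splitlines()
    if len(lines) < MAX_FULL_LINES:
        return diff                                        # full patch
    if len(lines) <= MAX_STAT_LINES:
        head = "\n".join(lines[:MAX_FULL_LINES])
        return stat + "\n" + head                          # stat + first 300 lines
    return stat                                            # stat only
```

Keeping the same function on both the training and inference paths matters: the model only sees inputs shaped the way it was trained on.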
MIT