Rule-based transaction categorizer for bank-style CSV exports: infer date / merchant / amount columns, match merchants against keyword rules (exact token or phrase match, then fuzzy similarity), summarize spending by category, and flag low-confidence rows for review.
Python: 3.10+
From this directory:

```shell
pip install -e .
```

An editable install puts the modules on your environment's import path, so `import categorizer` (and the other modules) works from any working directory.
With test and dev tools (recommended for contributors):

```shell
pip install -e ".[dev]"
```

Alternatively, install dependencies from the flat list (e.g. in CI):

```shell
pip install -r requirements.txt
pip install -e .
```

Runtime dependencies are also listed under `[project]` / `[project.optional-dependencies]` in `pyproject.toml`.
After install:

```shell
ll-categorizer
```

This starts the interactive menu (classify CSV, mock data, view rules, add rules). Custom rules are merged with the built-in defaults and persisted under your data directory as `rules_overrides.json` (only entries that differ from the defaults are stored).
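The merge-and-persist semantics (overrides layered over defaults, with only the differing entries written back) can be illustrated with plain dicts. This is a sketch of the behavior, not `storage.py`'s actual code; `merge_rules` and `diff_for_persistence` are hypothetical names:

```python
# Sketch of the override-merge semantics; merge_rules and
# diff_for_persistence are hypothetical names, not storage.py's API.
def merge_rules(defaults: dict[str, str], overrides: dict[str, str]) -> dict[str, str]:
    # Overrides win on key collisions; defaults fill in the rest.
    return {**defaults, **overrides}

def diff_for_persistence(defaults: dict[str, str], merged: dict[str, str]) -> dict[str, str]:
    # Only entries that differ from the defaults need to be written
    # out (mirroring how rules_overrides.json stores just the deltas).
    return {k: v for k, v in merged.items() if defaults.get(k) != v}
```

Storing only the deltas keeps the overrides file small and lets upgraded built-in defaults take effect for any rule the user never customized.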
| Variable | Purpose |
|---|---|
| `LL_CATEGORIZER_DATA_DIR` | Override the data directory (default: `./ll_categorizer_data` under the current working directory). |
Typical files under that directory:

- `rules_overrides.json` — custom merchant rules (merged over built-ins)
- `categorized_transactions.csv` — optional output from `storage.save_categorized_transactions`
- `categorizer_report.txt` — optional text report from `storage.write_text_report`
If you used an older build that wrote to `./ledgerlogic_data` or `LEDGERLOGIC_DATA_DIR`, move those files into the new location or set `LL_CATEGORIZER_DATA_DIR` to the old folder.
```shell
pytest
```

`pyproject.toml` sets `pythonpath = ["."]` for pytest, so tests run correctly from the repo root even before install (as long as dependencies are present).
With `[dev]` installed:

```shell
black --check .
flake8 .
mypy categorizer.py storage.py schemas.py csv_columns.py parsing.py textutil.py
```

Basic (built-in rules only):
```python
from categorizer import run_classification

result = run_classification(file_path="statement.csv")
records = result["records"]
flagged = result["flagged"]
```

With saved CLI-style overrides (`rules_overrides.json` in your data directory):
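Flagged rows are intended for human review. A minimal triage sketch, assuming each record is a dict with `merchant`, `category`, and `confidence` fields — an illustrative assumption, not the documented record schema:

```python
# Hypothetical triage of flagged rows. The field names ("merchant",
# "category", "confidence") and the sample data are illustrative
# assumptions, not the library's documented schema.
flagged = [
    {"merchant": "ACME COFFE 0042", "category": "Dining", "confidence": 0.55},
    {"merchant": "TRANSFER 991", "category": "Uncategorized", "confidence": 0.30},
]

def triage(rows: list[dict], cutoff: float = 0.5) -> tuple[list[dict], list[dict]]:
    # Split rows into "probably fine" and "needs a human look".
    ok = [r for r in rows if r["confidence"] >= cutoff]
    review = [r for r in rows if r["confidence"] < cutoff]
    return ok, review

ok, review = triage(flagged)
```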
```python
from categorizer import DEFAULT_RULES, run_classification
from storage import load_merged_category_rules

rules = load_merged_category_rules(DEFAULT_RULES)
result = run_classification(file_path="statement.csv", rules=rules)
```

Data directory helpers and CSV persistence live in `storage.py`.
- Column detection scores headers against keyword lists, picks a one-to-one mapping that maximizes the total score, and clears a role if that column's score stays below an internal threshold, so odd exports may yield `None` for some roles (with skipped rows and warnings).
- Exact rules require whole-token or multi-token phrase matches, or bounded substring matches for rules with at least four non-space characters, so very short keys are not matched inside unrelated words.
- Fuzzy matching still uses edit distance / similarity; tune `threshold` in `find_best_rule_match` / `categorize_transactions` if you see borderline cases.
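The matching tiers above (whole-token or phrase match, bounded substring match, then fuzzy similarity) can be sketched roughly like this. It is a simplified illustration using `difflib`, not the library's implementation, and `sketch_match` is a hypothetical name:

```python
import difflib

def sketch_match(description: str, rules: dict[str, str], threshold: float = 0.8):
    """Simplified illustration of the matching tiers; not the library's code."""
    text = description.lower()
    tokens = text.split()
    for key, category in rules.items():
        k = key.lower()
        # Tier 1: whole-token match, or multi-token phrase match.
        if k in tokens or (" " in k and k in text):
            return category, 1.0
        # Tier 2: bounded substring match, only for keys with >= 4
        # non-space characters (so short keys can't hit inside words).
        if len(k.replace(" ", "")) >= 4 and k in text:
            return category, 0.9
    # Tier 3: fuzzy fallback on overall similarity.
    score, best = max(
        ((difflib.SequenceMatcher(None, k.lower(), text).ratio(), cat)
         for k, cat in rules.items()),
        default=(0.0, None),
    )
    return (best, score) if score >= threshold else (None, score)
```

The early exact tiers keep obvious matches cheap and deterministic; only descriptions that miss every exact rule pay for the similarity pass, and anything below the threshold comes back uncategorized for review.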
Sources are top-level modules (`categorizer.py`, `csv_columns.py`, `parsing.py`, etc.) packaged via `py-modules` in `pyproject.toml`, not a nested `src/` package.