A high-performance, lightweight Python library for fuzzy string matching and ranking, implemented in C++ with Pybind11.
- Blazing Fast: C++ core that runs 2-5x faster than pure Python alternatives.
- Multiple Scorers: Support for Levenshtein, Jaccard, and Token Sort ratios.
- Partial Matching: Find the best substring matches.
- Hybrid Scoring: Combine multiple scorers with custom weights.
- Pandas & NumPy Integration: Native support for Series and Arrays.
- Batch Processing: Parallelized matching for large datasets using OpenMP.
- Unicode Support: Handles international characters and normalization.
- Benchmarking Tools: Built-in utilities to measure performance.
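
To make the scorer names above concrete, here is a minimal pure-Python sketch of what Levenshtein, Jaccard, Token Sort, and a weighted hybrid score typically compute. It is not fuzzybunny's C++ implementation. The function names, the `1 - distance / max_length` normalization, and the choice of whitespace tokens for Jaccard are all assumptions made for illustration, and the library's actual scores may differ.

```python
from typing import Callable, Dict


def levenshtein_distance(a: str, b: str) -> int:
    """Edit distance via the classic two-row dynamic programme."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute (or match)
            ))
        previous = current
    return previous[-1]


def levenshtein_ratio(a: str, b: str) -> float:
    """Similarity in [0, 1]; normalizing by the longer string is an assumption."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein_distance(a, b) / longest


def jaccard(a: str, b: str) -> float:
    """Jaccard index over whitespace tokens (token granularity is an assumption)."""
    tokens_a, tokens_b = set(a.split()), set(b.split())
    union = tokens_a | tokens_b
    if not union:
        return 1.0
    return len(tokens_a & tokens_b) / len(union)


def token_sort_ratio(a: str, b: str) -> float:
    """Sort tokens alphabetically first, so word order stops mattering."""
    return levenshtein_ratio(" ".join(sorted(a.split())),
                             " ".join(sorted(b.split())))


# Hypothetical registry for this sketch only; keys mirror the scorer names above.
SCORERS: Dict[str, Callable[[str, str], float]] = {
    "levenshtein": levenshtein_ratio,
    "jaccard": jaccard,
    "token_sort": token_sort_ratio,
}


def hybrid(a: str, b: str, weights: Dict[str, float]) -> float:
    """Weighted average of the named scorers."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(w * SCORERS[name](a, b) for name, w in weights.items()) / total


if __name__ == "__main__":
    print(levenshtein_distance("kitten", "sitting"))
    print(token_sort_ratio("apple banana", "banana apple"))
    print(hybrid("apple banana", "banana apple",
                 {"levenshtein": 0.3, "token_sort": 0.7}))
```

Token sorting is why a hybrid weighted toward `token_sort` scores reordered phrases such as "apple banana" and "banana apple" highly, while plain Levenshtein penalizes the swapped word order.
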
pip install fuzzybunny

import fuzzybunny
# Basic matching
score = fuzzybunny.levenshtein("kitten", "sitting")
print(f"Similarity: {score:.2f}")
# Ranking candidates
candidates = ["apple", "apricot", "banana", "cherry"]
results = fuzzybunny.rank("app", candidates, top_n=2)
# [('apple', 0.6), ('apricot', 0.42)]

Combine different algorithms to get better results:
results = fuzzybunny.rank(
"apple banana",
["banana apple"],
scorer="hybrid",
weights={"levenshtein": 0.3, "token_sort": 0.7}
)

Use the specialized accessor for clean code:
import pandas as pd
import fuzzybunny
df = pd.DataFrame({"names": ["apple pie", "banana bread", "cherry tart"]})
results = df["names"].fuzzy.match("apple", mode="partial")

Compare performance on your specific data:
perf = fuzzybunny.benchmark("query", candidates)
print(f"Levenshtein mean time: {perf['levenshtein']['mean']:.6f}s")

License: MIT
