
cachevector/fuzzybunny



FuzzyBunny

A high-performance, lightweight Python library for fuzzy string matching and ranking, implemented in C++ with pybind11.

Features

  • Blazing Fast: a C++ core that delivers a 2-5x speedup over pure-Python alternatives.
  • Multiple Scorers: Support for Levenshtein, Jaccard, and Token Sort ratios.
  • Partial Matching: Find the best substring matches.
  • Hybrid Scoring: Combine multiple scorers with custom weights.
  • Pandas & NumPy Integration: Native support for pandas Series and NumPy arrays.
  • Batch Processing: Parallelized matching for large datasets using OpenMP.
  • Unicode Support: Handles international characters and normalization.
  • Benchmarking Tools: Built-in utilities to measure performance.
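The Jaccard and Token Sort scorers, along with the Unicode normalization step, can be illustrated in pure Python. This is a minimal sketch under stated assumptions, not fuzzybunny's C++ implementation: it assumes Jaccard is computed over whitespace-separated token sets, uses NFC normalization plus case folding as the preprocessing step, and uses `difflib.SequenceMatcher` as a stand-in for the base ratio inside Token Sort. The names `normalize`, `jaccard`, and `token_sort_ratio` are hypothetical.

```python
import unicodedata
from difflib import SequenceMatcher


def normalize(text: str) -> str:
    """Compose to NFC and case-fold so accented and upper-case variants compare equal."""
    return unicodedata.normalize("NFC", text).casefold()


def jaccard(a: str, b: str) -> float:
    """Jaccard similarity |A & B| / |A | B| over whitespace token sets (assumed)."""
    ta, tb = set(normalize(a).split()), set(normalize(b).split())
    if not ta and not tb:
        return 1.0  # treat two empty inputs as identical
    return len(ta & tb) / len(ta | tb)


def token_sort_ratio(a: str, b: str) -> float:
    """Sort tokens alphabetically, rejoin, then compare the rejoined strings.

    difflib's SequenceMatcher.ratio() stands in for the base ratio here.
    """
    sa = " ".join(sorted(normalize(a).split()))
    sb = " ".join(sorted(normalize(b).split()))
    return SequenceMatcher(None, sa, sb).ratio()


print(jaccard("New York", "new york city"))              # 2 shared tokens / 3 total
print(token_sort_ratio("apple banana", "Banana Apple"))  # 1.0: word order ignored
print(normalize("Cafe\u0301") == normalize("CAFÉ"))       # True after NFC + casefold
```

Sorting tokens before comparing is what makes word order irrelevant, so "apple banana" and "Banana Apple" score 1.0 in this sketch.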

Installation

pip install fuzzybunny

Quick Start

import fuzzybunny

# Basic matching
score = fuzzybunny.levenshtein("kitten", "sitting")
print(f"Similarity: {score:.2f}")

# Ranking candidates
candidates = ["apple", "apricot", "banana", "cherry"]
results = fuzzybunny.rank("app", candidates, top_n=2)
# [('apple', 0.6), ('apricot', 0.42)]
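To make the Quick Start calls concrete, here is a hypothetical pure-Python sketch of a normalized Levenshtein similarity and of ranking. It assumes similarity is `1 - distance / max(len(a), len(b))` and that ranking means scoring every candidate and keeping the `top_n` highest; fuzzybunny's actual normalization and tie-breaking may differ, so its scores need not match these. `levenshtein_distance`, `levenshtein_similarity`, and `rank_candidates` are illustrative names, not library functions.

```python
def levenshtein_distance(a: str, b: str) -> int:
    """Edit distance via dynamic programming, keeping only the previous row."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free when equal)
            ))
        prev = curr
    return prev[-1]


def levenshtein_similarity(a: str, b: str) -> float:
    """Assumed normalization: 1 - distance / length of the longer string."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein_distance(a, b) / longest


def rank_candidates(query, candidates, top_n=None):
    """Score every candidate, sort best-first, and keep the top_n (hypothetical)."""
    scored = [(c, levenshtein_similarity(query, c)) for c in candidates]
    scored.sort(key=lambda pair: pair[1], reverse=True)  # stable: ties keep input order
    return scored if top_n is None else scored[:top_n]


print(f"{levenshtein_similarity('kitten', 'sitting'):.2f}")  # 0.57
candidates = ["apple", "apricot", "banana", "cherry"]
print(rank_candidates("app", candidates, top_n=2))
```

For "kitten" and "sitting" the edit distance is 3 (two substitutions and one insertion) and the longer string has 7 characters, so this sketch gives 1 - 3/7 ≈ 0.57.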

Advanced Usage

Hybrid Scorer

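Combine scorers with custom weights. Here, weighting Token Sort heavily lets a candidate containing the same words in a different order still score well: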

results = fuzzybunny.rank(
    "apple banana", 
    ["banana apple"], 
    scorer="hybrid", 
    weights={"levenshtein": 0.3, "token_sort": 0.7}
)
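A hybrid scorer can be thought of as a weighted average of individual scorer outputs. The sketch below assumes that model (dividing by the weight total, so weights need not sum to 1) and builds a Token Sort scorer on top of a normalized Levenshtein similarity. `hybrid_score`, `levenshtein_similarity`, and `token_sort` are hypothetical helpers, not fuzzybunny's internals.

```python
from typing import Callable, Dict

Scorer = Callable[[str, str], float]


def levenshtein_similarity(a: str, b: str) -> float:
    """1 - edit distance / longer length (assumed normalization)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = curr
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - prev[-1] / longest


def token_sort(a: str, b: str) -> float:
    """Levenshtein similarity after sorting each string's whitespace tokens."""
    return levenshtein_similarity(" ".join(sorted(a.split())), " ".join(sorted(b.split())))


def hybrid_score(a: str, b: str, scorers: Dict[str, Scorer], weights: Dict[str, float]) -> float:
    """Weighted average of the named scorers; weights are divided by their total."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(w * scorers[name](a, b) for name, w in weights.items()) / total


scorers = {"levenshtein": levenshtein_similarity, "token_sort": token_sort}
weights = {"levenshtein": 0.3, "token_sort": 0.7}
print(hybrid_score("apple banana", "banana apple", scorers, weights))
print(levenshtein_similarity("apple banana", "banana apple"))  # lower on its own
```

Because the tokens of "apple banana" and "banana apple" sort to the same string, the `token_sort` term is 1.0, and the 0.7 weight on it lifts the combined score above what plain Levenshtein gives.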

Pandas Integration

Call matching methods directly on a pandas Series through the fuzzy accessor:

import pandas as pd
import fuzzybunny

df = pd.DataFrame({"names": ["apple pie", "banana bread", "cherry tart"]})
results = df["names"].fuzzy.match("apple", mode="partial")
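Partial matching scores a query against the best-matching substring of a longer candidate. A minimal sketch of one common approach, a sliding window the length of the shorter string, is shown below. It uses `difflib.SequenceMatcher` as a stand-in scorer, and `partial_ratio` is a hypothetical name, not what `mode="partial"` calls internally.

```python
from difflib import SequenceMatcher


def partial_ratio(query: str, candidate: str) -> float:
    """Best similarity between the shorter string and any equal-length window of the longer."""
    short, long_ = sorted((query, candidate), key=len)
    if not short:
        return 1.0 if not long_ else 0.0
    width = len(short)
    best = 0.0
    for start in range(len(long_) - width + 1):
        window = long_[start:start + width]
        best = max(best, SequenceMatcher(None, short, window).ratio())
        if best == 1.0:  # exact substring found; no window can do better
            break
    return best


names = ["apple pie", "banana bread", "cherry tart"]
scores = {name: partial_ratio("apple", name) for name in names}
print(scores)  # "apple pie" scores 1.0 because it contains "apple" verbatim
```

With pandas installed, the same function can be applied element-wise with `df["names"].map(lambda s: partial_ratio("apple", s))`.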

Benchmarking

Compare performance on your specific data:

perf = fuzzybunny.benchmark("query", candidates)
print(f"Levenshtein mean time: {perf['levenshtein']['mean']:.6f}s")
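If you want comparable timings for your own scorers outside the library, a small harness built on `time.perf_counter` is enough. This sketch returns a nested dict shaped like the `perf['levenshtein']['mean']` access above, which is an assumption about the output format of `fuzzybunny.benchmark`; `benchmark_scorers` and the scorer names are hypothetical.

```python
import statistics
import time
from difflib import SequenceMatcher
from typing import Callable, Dict, List


def benchmark_scorers(
    query: str,
    candidates: List[str],
    scorers: Dict[str, Callable[[str, str], float]],
    repeats: int = 5,
) -> Dict[str, Dict[str, float]]:
    """Time each scorer over all candidates `repeats` times; report seconds per pass."""
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    results = {}
    for name, scorer in scorers.items():
        timings = []
        for _ in range(repeats):
            start = time.perf_counter()
            for candidate in candidates:
                scorer(query, candidate)
            timings.append(time.perf_counter() - start)
        results[name] = {"mean": statistics.mean(timings), "min": min(timings)}
    return results


scorers = {
    "difflib_ratio": lambda a, b: SequenceMatcher(None, a, b).ratio(),
    "exact_match": lambda a, b: float(a == b),
}
perf = benchmark_scorers("query", ["apple", "apricot", "banana", "cherry"], scorers)
print(f"difflib_ratio mean time: {perf['difflib_ratio']['mean']:.6f}s")
```

Reporting the minimum alongside the mean helps separate the scorer's cost from occasional scheduling noise.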

License

MIT

About

A fuzzy search tool written in Python
