Real-valued, fixed-size molecular fingerprints – no training, just NumPy (with optional Rust acceleration).
Hyper Fingerprints encodes molecules into continuous vector representations using Holographic Reduced Representations (HRR) with graph message passing. The result is a deterministic, real-valued fingerprint that works as a drop-in feature vector for similarity search, clustering, or any downstream ML task.
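At the core of HRR is binding by circular convolution: two hypervectors are combined into a third that resembles neither, yet either operand can be approximately recovered. A minimal NumPy sketch of this idea (an illustration of the HRR operator itself, not the library's internals):

```python
import numpy as np

d = 2048
rng = np.random.default_rng(0)
# Random hypervectors with elements ~ N(0, 1/d), as in Plate's HRR
a, b = rng.normal(0.0, 1.0 / np.sqrt(d), size=(2, d))

def bind(x, y):
    # Circular convolution via FFT: the HRR binding operator
    return np.fft.ifft(np.fft.fft(x) * np.fft.fft(y)).real

def inverse(x):
    # Approximate inverse (involution): x_inv[j] = x[-j mod d]
    return np.roll(x[::-1], 1)

def cos(x, y):
    return x @ y / (np.linalg.norm(x) * np.linalg.norm(y))

c = bind(a, b)               # bound pair; looks like noise on its own
a_hat = bind(c, inverse(b))  # noisy but recognizable recovery of a

print(round(cos(a, a_hat), 2))  # well above chance for large d
```

The recovered vector is noisy, but its similarity to the original is far above that of an unrelated random vector, which is what makes cleanup and similarity search workable.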
```python
from hyper_fingerprints import Encoder, cosine_similarity

enc = Encoder(dimension=512, seed=42)

# Encode molecules (SMILES strings or RDKit Mol objects)
fps = enc.encode(["CCO", "CO", "c1ccccc1"])  # shape: (3, 512), dtype: float64

# Cosine similarity – similar molecules get similar vectors
sim = cosine_similarity(fps, fps)
print(f"ethanol vs methanol: {sim[0, 1]:.3f}")  # high similarity
print(f"ethanol vs benzene: {sim[0, 2]:.3f}")   # low similarity
```

To use a custom atom vocabulary, pass `atom_types` at init:

```python
enc = Encoder(dimension=512, atom_types=["C", "N", "O", "H", "Si"])
```

See `examples/00_quickstart.ipynb` for a full walkthrough covering similarity search, joint fingerprints, custom atom types, save/load, and scikit-learn integration.
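For the scikit-learn integration mentioned above, a deterministic fingerprint encoder can be wrapped as a duck-typed transformer. A hypothetical sketch, with a stand-in `toy_encode` function in place of `Encoder.encode` so it runs without the package installed (with hyper_fingerprints available you would pass `Encoder(dimension=512, seed=42).encode` instead):

```python
import numpy as np

class FingerprintTransformer:
    """Duck-typed scikit-learn transformer: fit is a no-op, transform encodes."""

    def __init__(self, encode_fn):
        self.encode_fn = encode_fn

    def fit(self, X, y=None):
        return self  # deterministic encoder: nothing to learn

    def transform(self, X):
        return np.asarray(self.encode_fn(X))

# Stand-in encoder: hash each SMILES string into a fixed random vector
def toy_encode(smiles_list, dim=64):
    out = []
    for s in smiles_list:
        rng = np.random.default_rng(abs(hash(s)) % (2**32))
        out.append(rng.normal(size=dim))
    return np.stack(out)

ft = FingerprintTransformer(toy_encode)
X = ft.fit(["CCO", "CO"]).transform(["CCO", "CO"])
print(X.shape)  # (2, 64)
```

Because the interface is just `fit`/`transform`, the wrapper drops into an sklearn `Pipeline` ahead of any estimator.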
```python
Encoder(
    dimension=256,     # hypervector size
    depth=3,           # message-passing layers (structural context radius)
    atom_types=None,   # atom vocabulary (default: Br, C, Cl, F, I, N, O, P, S)
    seed=None,         # random seed for reproducible codebook generation
    normalize=False,   # L2-normalize after each message-passing layer
    backend="auto",    # "auto" | "rust" | "numpy"
)
```

Molecules can be passed as SMILES strings, RDKit Mol objects, or lists of either.
- `encode(molecules) -> np.ndarray` – Encode molecules into order-N hypervector fingerprints. Returns shape `(batch_size, dimension)`.
- `encode_joint(molecules) -> np.ndarray` – Concatenation of order-0 (atom identity only, no structural context) and order-N (full message-passing) embeddings. Returns shape `(batch_size, 2 * dimension)`. Useful when you want both local atom-level and structural information in one vector.
- `save(path)` / `Encoder.load(path)` – Persist and restore an encoder (config + codebook) as a single `.npz` file. Useful for sharing a fixed fingerprint scheme or deploying without needing to track the seed.
```python
enc.save("encoder.npz")
loaded = Encoder.load("encoder.npz")
```

| Parameter | Guidance |
|---|---|
| `dimension` | 32-256 for Bayesian optimization. 1024-2048 as a starting point for property prediction. |
| `depth` | Controls structural context radius, analogous to Morgan radius. `depth=3` captures up to 3-bond neighborhoods. Higher values capture more global structure but increase computation. |
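The dimension guidance reflects a basic property of high-dimensional random vectors: unrelated hypervectors become closer to orthogonal as dimension grows, so larger `dimension` means less crosstalk between unrelated substructures. A quick NumPy check of this effect (illustrative only, not library code):

```python
import numpy as np

rng = np.random.default_rng(0)

def mean_abs_cos(d, trials=200):
    # Average |cosine| between pairs of independent random d-dim vectors
    a = rng.normal(size=(trials, d))
    b = rng.normal(size=(trials, d))
    cos = np.sum(a * b, axis=1) / (
        np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1)
    )
    return np.abs(cos).mean()

for d in (32, 256, 2048):
    print(d, round(mean_abs_cos(d), 3))  # shrinks roughly like 1/sqrt(d)
```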
Each atom is described by 5 discrete features:
| Feature | Bins | Values |
|---|---|---|
| Atom type | `len(atom_types)` (varies with vocabulary) | Index into the atom vocabulary |
| Degree | 6 | 0-5 |
| Formal charge | 3 | neutral, positive, negative |
| Total Hs | 4 | 0-3 |
| Is aromatic | 2 | 0, 1 |
- No bond type features – bonds are treated as unweighted edges. Single, double, and aromatic bonds are not distinguished in the current feature scheme.
- No stereochemistry – chirality and cis/trans isomerism are not encoded.
- No GPU acceleration – encoding is CPU-only (NumPy or optional Rust extension).
- Codebook scales with vocabulary – the codebook has `product(feature_bins)` entries (1296 for the default 9 atom types). Large custom atom type lists will increase memory usage.
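The codebook-size arithmetic from the last point follows directly from the feature bins listed above (a quick check, not library code):

```python
import math

atom_types = ["Br", "C", "Cl", "F", "I", "N", "O", "P", "S"]  # default vocabulary
# Bins: atom type, degree, formal charge, total Hs, is aromatic
feature_bins = [len(atom_types), 6, 3, 4, 2]
print(math.prod(feature_bins))  # 1296 codebook entries for the defaults
```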
Requires Python 3.9+ and a Rust toolchain (1.83+).
```bash
git clone https://github.com/the16thpythonist/hyper-fingerprints.git
cd hyper-fingerprints

# Install maturin (builds the Rust extension)
pip install maturin

# Build and install in development mode (editable, release-optimized)
RUSTFLAGS="-C target-cpu=native" maturin develop --release

# Verify the Rust backend is available
python -c "from hyper_fingerprints._core import encode_batch_rs; print('Rust OK')"
```

```bash
# Build the wheel first
./build.sh

# Install the wheel
pip install target/wheels/hyper_fingerprints-*.whl
```

- numpy >= 1.24
- rdkit >= 2024.0.0
- Rust toolchain >= 1.83 (build-time only)
The Rust extension accelerates both SMILES parsing/feature extraction (~23x) and the message-passing pipeline (~22x), for a combined ~22x end-to-end speedup. When installed, it is used automatically:
```python
enc = Encoder(dimension=512, seed=42, backend="rust")   # require Rust
enc = Encoder(dimension=512, seed=42, backend="numpy")  # force pure Python
enc = Encoder(dimension=512, seed=42, backend="auto")   # default: Rust if available
```

Install in dev mode with the Rust extension:

```bash
pip install maturin
RUSTFLAGS="-C target-cpu=native" maturin develop --release
pip install -e ".[dev]"
```

Run tests:

```bash
pytest
```

Build a wheel and test it in a clean environment:

```bash
nox -s build_test
```

Run tests across Python 3.9-3.13 with nox:

```bash
nox -s tests
```

Fingerprint outputs are regression-tested against recorded fixtures to ensure numerical stability across releases.
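The fixture-based regression idea can be sketched generically: record fingerprints once, reload them later, and compare within a tight tolerance. The sketch below uses an in-memory `.npz` and random arrays as stand-ins for real fingerprints and an on-disk fixture file:

```python
import io
import numpy as np

rng = np.random.default_rng(42)
fps = rng.normal(size=(3, 512))   # stand-in for encoded fingerprints

buf = io.BytesIO()                # a real fixture would be a file on disk
np.savez(buf, fps=fps)            # record the fixture
buf.seek(0)
recorded = np.load(buf)["fps"]    # reload, as a later release's test would

assert np.allclose(fps, recorded, atol=1e-12)
print("fixture matches")
```

Storing fixtures as `.npz` keeps the exact float64 values, so any numerical drift between releases shows up as a tolerance failure rather than passing silently.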
This project builds on the theory of Holographic Reduced Representations and Vector Symbolic Architectures:
- Plate, T. A. (1995). Holographic Reduced Representations. IEEE Transactions on Neural Networks, 6(3), 623-641. doi:10.1109/72.377968
- Kanerva, P. (2009). Hyperdimensional Computing: An Introduction to Computing in Distributed Representation with High-Dimensional Random Vectors. Cognitive Computation, 1(2), 139-159. doi:10.1007/s12559-009-9009-8
This project is licensed under the MIT License.
