Massivora

A superfast pipeline for proteome-wide evolutionary coupling screens — a scalable framework for massive sequence co-evolution analysis.

Massivora ships a native (C++/optional CUDA) core built on NLopt and Eigen, exposed to Python via pybind11. GPU acceleration (via CuPy) is optional.

Installation

There are several ways to install this package; the conda installation is recommended.

1. Conda (recommended)

Conda is the primary, most convenient path:

CPU only:

# use -n to customize the environment name
conda env create -f environment.yml
conda activate massivora

pip install . --no-build-isolation

With CUDA compatible GPU available:

# use -n to customize the environment name
conda env create -f environment.yml
conda activate massivora

conda install -c conda-forge cupy
pip install . --no-build-isolation

The CUDA backend is enabled automatically when a CUDA compiler (nvcc) is on PATH, so no extra flag is needed. To force it on or off, pass -C cmake.define.ENABLE_CUDA=ON or -C cmake.define.ENABLE_CUDA=OFF.
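Before building, it can help to check whether nvcc is actually visible on PATH. The following is a minimal sketch of such a check, not the detection logic Massivora's build uses (the source does not describe it); the helper name cuda_compiler_on_path is hypothetical.

```python
import shutil


def cuda_compiler_on_path(compiler: str = "nvcc") -> bool:
    """Return True if `compiler` can be found on PATH.

    Hypothetical helper for illustration only; the real build may
    detect the CUDA compiler differently.
    """
    return shutil.which(compiler) is not None


if __name__ == "__main__":
    if cuda_compiler_on_path():
        print("nvcc found on PATH: expect the CUDA backend to be auto-enabled")
    else:
        print("nvcc not found on PATH: expect a CPU-only build")
```

If the check reports that nvcc is missing on a machine with a GPU, the CUDA toolkit's bin directory is probably not on PATH in the current shell.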

2. pip

Dependencies: Eigen and NLopt. If they are not installed, pip will try to install them. To use the analysis module of Massivora, you also need to install OpenStructure manually.

CPU only:

pip install .

With CUDA GPU available:

Choose the extra that matches your installed CUDA toolkit version. For example, if you have CUDA 12 installed, use [cuda12] in the install command:

pip install ".[cuda12]"

The [cuda12] extra installs the matching CuPy wheel, and the CUDA backend is built automatically because nvcc is detected, so passing -C cmake.define.ENABLE_CUDA=ON is not required. (Set that flag to ON or OFF if you want to override the auto-detection.)
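To find the toolkit's major version, you can read the output of nvcc --version. The sketch below parses that text and builds an extra name in the same pattern as [cuda12]. Only [cuda12] is confirmed by this README; whether extras for other major versions exist is an assumption, and the function name is hypothetical.

```python
import re
from typing import Optional


def cuda_extra_from_nvcc_output(version_text: str) -> Optional[str]:
    """Map the text printed by `nvcc --version` to an extra name like 'cuda12'.

    Returns None when no 'release X.Y' marker is found. The naming pattern
    for majors other than 12 is an assumption, not something the README states.
    """
    match = re.search(r"release\s+(\d+)\.(\d+)", version_text)
    if match is None:
        return None
    return f"cuda{match.group(1)}"


# A typical line from `nvcc --version`
sample = "Cuda compilation tools, release 12.4, V12.4.131"
print(cuda_extra_from_nvcc_output(sample))  # prints: cuda12
```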

3. Docker (production)

Two images are provided.

GPU image (CUDA-enabled):

docker build -t massivora:gpu .
docker run --rm --gpus all massivora:gpu massivora --help

CPU-only image (smaller, no CUDA/CuPy):

docker build -f Dockerfile.cpu -t massivora:cpu .
docker run --rm massivora:cpu massivora --help

System configuration

Machine/cluster-specific settings live in a system config at ~/.massivora.yml. It is created automatically (from a bundled default) the first time you run massivora after installation, and is read from there afterwards. Edit it with:

massivora sysconf

This opens ~/.massivora.yml in your editor ($VISUAL/$EDITOR, falling back to vi), creating it from the bundled default if it doesn't exist yet.
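The editor fallback described above can be sketched as follows. This is an illustration of the lookup order, not Massivora's actual code. Checking $VISUAL before $EDITOR is the usual convention and is assumed here; resolve_editor is a hypothetical name.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def resolve_editor(env: Optional[Mapping[str, str]] = None) -> str:
    """Pick an editor: $VISUAL, then $EDITOR, then 'vi'.

    The VISUAL-before-EDITOR order is an assumption based on convention.
    Empty values are treated as unset.
    """
    env = os.environ if env is None else env
    return env.get("VISUAL") or env.get("EDITOR") or "vi"


if __name__ == "__main__":
    config_path = Path.home() / ".massivora.yml"
    print(f"{resolve_editor()} {config_path}")
```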

When submitting to SLURM, Massivora automatically uses the Python/conda environment that is active at submission time (read from CONDA_PREFIX). Run conda activate on your environment before running massivora batch ....
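Because the environment is taken from CONDA_PREFIX, a quick pre-submission check can catch a forgotten conda activate. This is a minimal sketch under that assumption; active_conda_env is a hypothetical helper and not part of Massivora.

```python
import os
from typing import Mapping, Optional


def active_conda_env(env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Return the active conda environment prefix, or None if none is active."""
    env = os.environ if env is None else env
    # An empty CONDA_PREFIX is treated the same as an unset one.
    return env.get("CONDA_PREFIX") or None


if __name__ == "__main__":
    prefix = active_conda_env()
    if prefix is None:
        print("No conda environment is active; run `conda activate` first.")
    else:
        print(f"Jobs would inherit the environment at: {prefix}")
```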

For a detailed description of the fields in the system configuration file, see the documentation.

Usage

Pairwise coupling calculation

Collect the UniProt IDs of your proteins in a text file, one protein per line. Then create a new project with this command:

massivora new <project_name>

This creates a config file project_name.yml under the directory project_name. Put your list of proteins at project_name/data/proteins.txt. You may change this path in the yml file.
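If your IDs come from another script or a spreadsheet export, the protein list can be written programmatically. The sketch below writes one ID per line, dropping blanks and duplicates. The helper name and the example accessions are illustrative only, and the demo writes to a temporary directory rather than a real project.

```python
import tempfile
from pathlib import Path
from typing import Iterable, List


def write_protein_list(ids: Iterable[str], path) -> List[str]:
    """Write unique, non-empty IDs to `path`, one per line, keeping input order."""
    cleaned: List[str] = []
    seen = set()
    for raw in ids:
        accession = raw.strip()
        if accession and accession not in seen:
            seen.add(accession)
            cleaned.append(accession)
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text("\n".join(cleaned) + "\n")
    return cleaned


if __name__ == "__main__":
    # Demo in a temporary directory; in a real project the target would be
    # project_name/data/proteins.txt (or the path set in the yml file).
    with tempfile.TemporaryDirectory() as tmp:
        target = Path(tmp) / "data" / "proteins.txt"
        write_protein_list(["P69905", "P68871", "P69905"], target)
        print(target.read_text(), end="")
```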

Massivora relies on JackHMMER for the homologue search. You need to manually specify the jackhmmer_binary and the search database in the align section of the project configuration yml file.

Once the configuration is done, run the homologue search for your proteins on the local machine:

massivora run align project_name.yml

If you are running Massivora on a SLURM cluster, use the batch command:

massivora batch align project_name.yml

Massivora then runs the alignment for each protein in your text file and creates a cron job to monitor the status. Once the job has finished, or has failed too many times, the cron job is removed.

When the homologue search has finished, run this command to perform the pairwise coupling calculation:

massivora run couple project_name.yml

As with the homologue search, you can use batch to run the coupling calculation on a SLURM cluster:

massivora batch couple project_name.yml
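The two stages and their local or SLURM variants follow one pattern, which can be summarized in a small helper that builds the command lines shown above. This is only a sketch: pipeline_commands is a hypothetical name, and the helper builds commands without running them. With batch, the align jobs run asynchronously, so the couple command should be issued only after they have finished.

```python
from typing import List


def pipeline_commands(config: str, use_slurm: bool = False) -> List[List[str]]:
    """Build the align and couple commands for a project config.

    Uses `massivora batch` when use_slurm is True, else `massivora run`.
    The commands are returned as argument lists and are not executed.
    """
    mode = "batch" if use_slurm else "run"
    return [["massivora", mode, stage, config] for stage in ("align", "couple")]


if __name__ == "__main__":
    for command in pipeline_commands("project_name.yml"):
        print(" ".join(command))
```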
