PDB2Net automatically extracts Protein Interaction Networks (PINs) from PDB/mmCIF files and visualizes them as Cytoscape networks.
It uses Gemmi for structure parsing, SciPy cKDTree for distance-based interaction detection, and BLAST+ for UniProt annotation of unidentified chains.
- Automatic parsing of .pdb, .cif, and .mmCIF structures
- Distance-based chain interaction detection
- Protein-level and chain-level networks
- Full UniProt annotation via SIFTS and BLAST+
- Export of chain, protein, and combined networks (CX2 format)
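As a rough illustration of distance-based interaction detection, the sketch below counts atom pairs from two chains that lie within a cutoff. It is a brute-force, standard-library stand-in for the cKDTree query that PDB2Net uses, not the project's implementation. The function name `count_contacts`, the toy coordinates, and the 5.0 Å cutoff (borrowed from the `all_atoms_radius` default) are assumptions made for this example.

```python
import math


def count_contacts(coords_a, coords_b, radius):
    """Count atom pairs (one atom from each chain) no farther apart than radius."""
    return sum(
        1
        for a in coords_a
        for b in coords_b
        if math.dist(a, b) <= radius
    )


# Toy coordinates in angstroms (hypothetical, not taken from a real structure).
chain_a = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]
chain_b = [(4.0, 0.0, 0.0), (30.0, 0.0, 0.0)]

contacts = count_contacts(chain_a, chain_b, radius=5.0)
print(contacts)
```

A spatial index such as cKDTree answers the same question without comparing every pair, which matters once chains contain thousands of atoms.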
- Recommended Version: Python 3.11
- Download Python
- Ensure that pip is installed:
  python3 -m ensurepip --default-pip
- Install the dependencies:
  python3 -m pip install -r requirements.txt
For local development notes and an environment pre-flight check, see docs/development.md.
- Download Cytoscape 3.10.4 or newer: Cytoscape Download
- Start Cytoscape once manually so that PDB2Net can launch it automatically later.
- On headless servers, Cytoscape is automatically disabled (open_in_cytoscape = false).
| File | Source | Purpose |
|---|---|---|
| pdb_seqres.txt | https://www.rcsb.org/downloads/fasta | PDB single-FASTA (chains) |
| pdb_chain_uniprot.tsv | https://www.ebi.ac.uk/pdbe/docs/sifts/quick.html | PDB to UniProt mapping (SIFTS) |
| uniprot_sprot.fasta | https://www.uniprot.org/uniprotkb?query=reviewed:true | Swiss-Prot for building BLAST DB |
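To show how a chain-level mapping file of this kind can be consumed, here is a minimal sketch that reads a tab-separated table into a lookup from (PDB ID, chain) to UniProt accession. The column names `PDB`, `CHAIN`, and `SP_PRIMARY`, the leading `#` comment line, and the sample rows are assumptions about the file layout; the function `load_chain_to_uniprot` is hypothetical and not part of PDB2Net.

```python
import csv
import io

# Inline sample mimicking the assumed layout: a comment line, a header row,
# then one row per PDB chain. The second data row is made up for illustration.
SAMPLE = (
    "# sample header comment\n"
    "PDB\tCHAIN\tSP_PRIMARY\n"
    "101m\tA\tP02185\n"
    "1abc\tB\tQ00000\n"
)


def load_chain_to_uniprot(handle):
    """Map (pdb_id, chain_id) to a UniProt accession, skipping '#' comment lines."""
    rows = (line for line in handle if not line.startswith("#"))
    reader = csv.DictReader(rows, delimiter="\t")
    mapping = {}
    for row in reader:
        key = (row["PDB"].lower(), row["CHAIN"])
        mapping.setdefault(key, row["SP_PRIMARY"])  # keep the first accession seen
    return mapping


mapping = load_chain_to_uniprot(io.StringIO(SAMPLE))
print(mapping[("101m", "A")])
```

For the real file, replace the `io.StringIO` object with an opened file handle.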
- Go to the NCBI BLAST+ Download page: NCBI BLAST+ Download
- Download the correct version for your OS:
  - Windows: Download ncbi-blast-*-win64.exe
  - Linux: Download ncbi-blast-*-x64-linux.tar.gz
  - macOS: Download ncbi-blast-*-universal-macosx.tar.gz
- Install BLAST+:
  - Windows: Run the .exe file and follow the installation wizard.
  - Linux/macOS: Extract the archive and copy the executables from its bin folder to /usr/local/bin (on macOS, use the macosx archive name):
        tar -xvzf ncbi-blast-*-x64-linux.tar.gz
        sudo cp ncbi-blast-*/bin/* /usr/local/bin/
Now, generate the BLAST database from the downloaded UniProt FASTA file.
- Open a terminal (Linux/Mac) or PowerShell/Git Bash (Windows).
- Run the following command:
  makeblastdb -in C:/blast_db/uniprot_sprot.fasta -dbtype prot -out C:/blast_db/uniprot_db
  Explanation:
  - -in: Input FASTA file.
  - -dbtype prot: Specifies a protein database.
  - -out: Output database name (uniprot_db).
- Expected output:
  Building a new DB, current time: 03/16/2025 12:45:32
  New DB name: C:/blast_db/uniprot_db
  Number of sequences: 570,000
  This confirms that BLAST has successfully created the database.
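If you prefer to script this step, the sketch below assembles the same makeblastdb call as an argument list in Python. Only the flags shown above (`-in`, `-dbtype`, `-out`) are used; the helper name `build_makeblastdb_command` is hypothetical, and the paths are the example paths from this section.

```python
import shlex


def build_makeblastdb_command(fasta_path, db_prefix):
    """Return the makeblastdb argument list for a protein database."""
    return [
        "makeblastdb",
        "-in", str(fasta_path),
        "-dbtype", "prot",
        "-out", str(db_prefix),
    ]


cmd = build_makeblastdb_command(
    "C:/blast_db/uniprot_sprot.fasta",
    "C:/blast_db/uniprot_db",
)
# Print the command for inspection; nothing is executed here.
print(shlex.join(cmd))
```

To actually build the database, pass the list to `subprocess.run(cmd, check=True)` on a machine where BLAST+ is installed.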
PDB2Net loads configuration in layers; later layers override earlier ones:
- configs/config.base.json: shared defaults
- configs/config.{windows|linux|darwin}.json: OS-specific overrides
- configs/config.local.json: user machine settings (git-ignored)
- Environment variables: highest priority

Paths support ~ and $VARS expansion.
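The layering can be pictured with a small sketch: each layer is merged over the previous one, and path values are expanded afterwards. This is an illustration under assumptions, not PDB2Net's loader. In particular, the recursive merge of nested sections, the helper names `deep_merge` and `expand_path`, and the demo variable `PDB2NET_DEMO_ROOT` are all hypothetical.

```python
import os


def deep_merge(base, override):
    """Return a new dict where override wins; nested dicts merge key by key."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


def expand_path(value):
    """Expand ~ and $VARS in a path string."""
    return os.path.expandvars(os.path.expanduser(value))


# Layers in increasing priority (contents are illustrative).
base = {"open_in_cytoscape": True, "workers": {"parsing": "auto", "blast_threads": "auto"}}
os_layer = {"open_in_cytoscape": False}
local = {"workers": {"parsing": 4}, "output_path": "$PDB2NET_DEMO_ROOT/outputs"}

config = {}
for layer in (base, os_layer, local):
    config = deep_merge(config, layer)

os.environ["PDB2NET_DEMO_ROOT"] = "/tmp/demo"  # hypothetical variable for the demo
config["output_path"] = expand_path(config["output_path"])
print(config)
```

Environment-variable overrides would be applied as a final layer on top of the merged result.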
config.base.json (defaults):
{
"networks": {
"chain_per_pdb": true,
"combined_chain_network": true,
"protein_per_pdb": true,
"combined_protein_network": true
},
"distance_thresholds": { "ca_radius": 15.0, "all_atoms_radius": 5.0 },
"workers": { "parsing": "auto", "blast_threads": "auto" },
"keep_last_n_networks": 46,
"export_detailed_interactions": true
}
config.windows.json:
{
"input_folder_path": "E:/PDB_Files/Test500",
"pdb_fasta_path": "C:/Users/habit/Documents/Projekte/MPI_PDB2Net/Data/pdb_seqres.txt",
"uniprot_fasta_path": "C:/Users/habit/Documents/Projekte/MPI_PDB2Net/Data/uniprot_sprot.fasta",
"sifts_tsv_path": "C:/Users/habit/Documents/Projekte/MPI_PDB2Net/Data/pdb_chain_uniprot.tsv",
"output_path": "D:/Networks",
"cytoscape_path": "C:/Program Files/Cytoscape_v3.10.4/Cytoscape.exe",
"blast_db_path": "C:/Users/habit/Documents/Projekte/MPI_PDB2Net/Data/blast_db",
"blastp_executable": "C:/Program Files/NCBI/blast-2.17.0+/bin/blastp.exe",
"open_in_cytoscape": true
}
config.linux.json:
{
"input_folder_path": "/data/pdb_inputs",
"pdb_fasta_path": "/data/reference/pdb_seqres.txt",
"uniprot_fasta_path": "/data/reference/uniprot_sprot.fasta",
"sifts_tsv_path": "/data/reference/pdb_chain_uniprot.tsv",
"output_path": "/srv/pdb2net_outputs",
"blast_db_path": "/data/reference/blast_db",
"blastp_executable": "blastp",
"open_in_cytoscape": false
}
config.darwin.json (macOS):
{
"input_folder_path": "$HOME/pdb2net/pdb_inputs",
"pdb_fasta_path": "$HOME/pdb2net/reference/pdb_seqres.txt",
"uniprot_fasta_path": "$HOME/pdb2net/reference/uniprot_sprot.fasta",
"sifts_tsv_path": "$HOME/pdb2net/reference/pdb_chain_uniprot.tsv",
"output_path": "$HOME/pdb2net/outputs",
"blast_db_path": "$HOME/pdb2net/reference/blast_db",
"blastp_executable": "blastp",
"open_in_cytoscape": true,
"cytoscape_path": "/Applications/Cytoscape.app/Contents/MacOS/Cytoscape"
}
You can override individual settings via environment variables:
| ENV var | Maps to config key |
|---|---|
| PDB2NET_INPUT | input_folder_path |
| PDB2NET_OUTPUT | output_path |
| PDB2NET_PDB_FASTA | pdb_fasta_path |
| PDB2NET_UNIPROT_FASTA | uniprot_fasta_path |
| PDB2NET_SIFTS_TSV | sifts_tsv_path |
| PDB2NET_CYTO_PATH | cytoscape_path |
| PDB2NET_BLAST_DB | blast_db_path |
| PDB2NET_BLAST_CACHE_PATH | blast_cache_path (optional SQLite cache) |
| PDB2NET_BLASTP | blastp_executable |
| PDB2NET_OPEN_IN_CYTOSCAPE | open_in_cytoscape (true/false/1/0/yes/no) |
| PDB2NET_EXPORT_DETAILED_INTERACTIONS | export_detailed_interactions (true/false/1/0/yes/no) |
| PDB2NET_WORKERS_PARSING | workers.parsing (auto or int) |
| PDB2NET_WORKERS_BLAST | workers.blast_threads (auto or int) |
| PDB2NET_CA_RADIUS | distance_thresholds.ca_radius |
| PDB2NET_ALL_ATOMS_RADIUS | distance_thresholds.all_atoms_radius |
| PDB2NET_PP_MIN_CA_NEIGHBORS | interaction_filters.protein_protein_min_ca_neighbors |
| PDB2NET_PP_MIN_ALL_ATOM_CONTACTS | interaction_filters.protein_protein_min_all_atom_contacts |
| PDB2NET_PNA_MIN_ALL_ATOM_CONTACTS | interaction_filters.protein_nucleic_acid_min_all_atom_contacts |
| PDB2NET_NA_MIN_ALL_ATOM_CONTACTS | interaction_filters.nucleic_acid_min_all_atom_contacts |
| PDB2NET_STRUCTURE_MODEL_POLICY | structure_model_policy (first or all) |
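Environment variables are always strings, so values such as true/false/1/0/yes/no or "auto or int" need to be converted before they reach the config. The sketch below shows one way this could work. It is not PDB2Net's code: the helper names, the case-insensitive matching, and the requirement that worker counts be at least 1 are assumptions, and a plain dict stands in for `os.environ`.

```python
def parse_bool(text):
    """Interpret true/false/1/0/yes/no (case-insensitive) as a boolean."""
    normalized = text.strip().lower()
    if normalized in {"true", "1", "yes"}:
        return True
    if normalized in {"false", "0", "no"}:
        return False
    raise ValueError(f"not a boolean value: {text!r}")


def parse_workers(text):
    """Return 'auto' unchanged, or a positive integer worker count."""
    normalized = text.strip().lower()
    if normalized == "auto":
        return "auto"
    count = int(normalized)
    if count < 1:
        raise ValueError("worker count must be at least 1")
    return count


def apply_override(config, dotted_key, value):
    """Set a possibly nested key such as 'workers.parsing' in place."""
    *parents, leaf = dotted_key.split(".")
    node = config
    for name in parents:
        node = node.setdefault(name, {})
    node[leaf] = value


# Stand-in for os.environ so the example has no side effects.
env = {"PDB2NET_OPEN_IN_CYTOSCAPE": "No", "PDB2NET_WORKERS_PARSING": "8"}
config = {"open_in_cytoscape": True, "workers": {"parsing": "auto"}}

apply_override(config, "open_in_cytoscape", parse_bool(env["PDB2NET_OPEN_IN_CYTOSCAPE"]))
apply_override(config, "workers.parsing", parse_workers(env["PDB2NET_WORKERS_PARSING"]))
print(config)
```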
Windows PowerShell:
setx PDB2NET_INPUT "E:\PDB_Files\Dataset"
setx PDB2NET_OUTPUT "E:\Networks"
setx PDB2NET_OPEN_IN_CYTOSCAPE "true"
Linux/macOS:
export PDB2NET_INPUT=~/pdb2net/pdb_inputs
export PDB2NET_OUTPUT=~/pdb2net/outputs
export PDB2NET_OPEN_IN_CYTOSCAPE=false
For server or read-only reference-data deployments, set blast_cache_path (or
PDB2NET_BLAST_CACHE_PATH) to a writable SQLite file outside the BLAST database
directory. If unset, PDB2Net keeps the previous default next to blast_db_path.
For automated webserver jobs, keep open_in_cytoscape: false, point
blast_cache_path at a SQLite file in a writable cache directory, and leave
structure_model_policy: "first" unless you intentionally want all models from
multi-model structures represented as separate chain nodes.
Once all dependencies and reference files are configured, run PDB2Net headlessly with explicit input and output folders:
python3 -m pdb2net run \
--input-dir /path/to/input_structures \
--output-dir /path/to/pdb2net_outputs \
  --headless
If the package is installed, the console command is also available:
pdb2net run \
--input-dir /path/to/input_structures \
--output-dir /path/to/pdb2net_outputs \
  --headless
The legacy config-driven entry point remains available:
python3 -m pdb2net.main
- Output goes to a timestamped subfolder in output_path, e.g. "/…/Networks/2025-10-20_18-32-45/".
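The folder name in the example follows a year-month-day_hour-minute-second pattern. A minimal sketch of building such a path is shown below; the helper `run_folder` and the format string are inferred from the example name and are not confirmed details of PDB2Net.

```python
from datetime import datetime
from pathlib import Path


def run_folder(output_path, now):
    """Build the per-run folder path, e.g. <output_path>/2025-10-20_18-32-45."""
    return Path(output_path) / now.strftime("%Y-%m-%d_%H-%M-%S")


folder = run_folder("Networks", datetime(2025, 10, 20, 18, 32, 45))
print(folder.name)
```

Names in this format sort chronologically as plain strings, which makes it easy to find the most recent run.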
For backend-style jobs, add --web-output-dir to collect stable
user-facing outputs:
python3 -m pdb2net run \
--input-dir /path/to/job/inputs \
--output-dir /path/to/job/work \
--web-output-dir /path/to/job/outputs \
  --headless
See docs/server_backend_usage.md for the worker-facing output contract.
Input: valid PDB/mmCIF files found in input_folder_path.
| File/Folder | Description |
|---|---|
| runtime_analysis.txt | Timing summary (parsing, classification, BLAST, interaction, exports) |
| manifest.json / run_summary.json | Machine-readable run status, inputs, generated files, counts, config snapshot, warnings, and errors |
| *.cx2 | Cytoscape networks (Chain/Protein/Combined), portable CX2 |
| detailed_interactions.csv | Per-atom residue/atom distance pairs (if export_detailed_interactions: true) |
| error_in_batch_log/ | Batch/runtime logs |
PDB2Net generates several network representations:
- Chain Interaction Network (per PDB): nodes are chains, edges are interactions
- Combined Chain Network: all chains across all PDBs
- Protein Network (per PDB): nodes are UniProt IDs, edges are aggregated over chains
- Combined Protein Network: UniProt nodes across all PDBs
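To make "edges aggregated over chains" concrete, the sketch below collapses chain-level edges into protein-level edges using a chain-to-UniProt lookup. It is one plausible reading of the aggregation, not PDB2Net's implementation. The function name, the choice to count supporting chain pairs, the skipping of unannotated chains, and the placeholder accessions are all assumptions.

```python
from collections import Counter


def aggregate_to_proteins(chain_edges, chain_to_uniprot):
    """Collapse chain-chain edges into undirected UniProt-UniProt edges with counts."""
    protein_edges = Counter()
    for chain_a, chain_b in chain_edges:
        prot_a = chain_to_uniprot.get(chain_a)
        prot_b = chain_to_uniprot.get(chain_b)
        if prot_a is None or prot_b is None:
            continue  # skip chains without a UniProt annotation
        # Sort the pair so that (X, Y) and (Y, X) count as the same edge.
        protein_edges[tuple(sorted((prot_a, prot_b)))] += 1
    return dict(protein_edges)


# Placeholder chain IDs and accessions (hypothetical).
chain_edges = [("A", "B"), ("A", "C"), ("B", "D")]
chain_to_uniprot = {"A": "P11111", "B": "P22222", "C": "P22222", "D": "P11111"}

edges = aggregate_to_proteins(chain_edges, chain_to_uniprot)
print(edges)
```

Here three chain-level contacts between copies of the same two proteins become a single protein-level edge with a count of 3.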
Headless / Server (open_in_cytoscape: false)
- Only CX2 files are written (no .cyjs).
- Deterministic positions and visual mappings are embedded.

Desktop (open_in_cytoscape: true)
- Networks are created in Cytoscape via py4cytoscape and also exported as CX2.
The BLAST database will be built from a UniProt FASTA file.
- Download the latest UniProt Swiss-Prot database
  - Manual Download: UniProt Swiss-Prot
- Create the BLAST database folder and move the file into it (adjust the path if necessary):
  mkdir -p C:/blast_db # Windows (Git Bash)
  mkdir -p ~/blast_db # Linux/macOS
Habitzreither, G., Gautam, Lupas, A., Elhabashy, H. PDB2Net: Automated extraction of biomolecular Interaction Networks from Three-Dimensional Structures. Manuscript in preparation.
- Gregor Habitzreither
- Hadeer Elhabashy
If you have any questions, please contact Hadeer Elhabashy (Elhabashylab [@] gmail.com).
- The PDB2NET code in this repository is licensed under the MIT License.