This repository contains code for the traceCB paper, featuring the main algorithm and a complete pipeline for trans-ancestry cell-type-specific eQTL mapping.
- src/traceCB: The main source code for the Python package.
- src/coloc: Scripts for colocalization analysis.
- src/visual: Visualization scripts for GMM results.
- shell: Shell scripts for running the pipeline steps (preprocessing, LDSC, GMM, etc.).
- docs: Documentation and tutorials.
- data: Folder for storing input/output data (see docs/pipeline.md for structure).
- Python >= 3.8
- numba, pyarrow, scipy
Clone the repository and install the package using pip:
git clone https://github.com/lucajiang/traceCB.git
cd traceCB
Activate your preferred Python environment (recommended; Python 3.8 or above is required):
conda activate <your_env_name>
Or, create a new environment:
conda create -n traceCB_env python=3.8
conda activate traceCB_env
Then install the dependencies and this package:
pip install -e .
Installation in editable mode (-e) allows you to import the traceCB module in your scripts while keeping the ability to modify the source code if needed.
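After installation, a quick check can confirm that the interpreter and the modules named in this README are visible in the active environment. The following is a minimal sketch, not part of the package: the helper names `find_missing` and `check_environment` are hypothetical, and the check only tests whether each module can be located for import, not whether its version is compatible.

```python
import importlib.util
import sys

# Modules named in this README: the three dependencies plus the package itself.
REQUIRED_MODULES = ("numba", "pyarrow", "scipy", "traceCB")
MIN_PYTHON = (3, 8)


def find_missing(modules):
    """Return the top-level module names that cannot be located for import."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


def check_environment(modules=REQUIRED_MODULES, min_python=MIN_PYTHON):
    """Return a list of problems; an empty list means every check passed."""
    problems = []
    if sys.version_info[:2] < min_python:
        problems.append("Python %d.%d or newer is required" % min_python)
    for name in find_missing(modules):
        problems.append("module not found: " + name)
    return problems


if __name__ == "__main__":
    # Print one line per problem; print nothing when the environment looks fine.
    for problem in check_environment():
        print(problem)
```

Run the script inside the environment you activated above. If it reports `module not found: traceCB`, the editable install most likely went into a different environment.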
A step-by-step tutorial notebook is provided at docs/tutorial/run_traceCB.ipynb and is also available on Colab. This tutorial guides you through running the traceCB algorithm on a single-gene example.
It is highly recommended to run this tutorial first to understand the input data format and model outputs.
For full-scale analysis, we provide a structured shell-script pipeline. Detailed preprocessing steps are described in the Pipeline Documentation.
- Install s-ldxr and plink1.9.
- Prepare a Python environment for s-ldxr, which additionally requires pysnptools and statsmodels:
    pip install pysnptools
    pip install statsmodels
  Prepare an R environment if you need to run COLOC; otherwise, omit the r_env option in the next step.
- Modify shell/setting.sh to specify your paths and parameters according to your environment.
The analysis is divided into sequential modules:
# 1. Merge and align GWAS summary statistics
source shell/run_merge.sh
# 2. Calculate LD scores (s-ldxr)
source shell/run_ld.sh
# 3. Run Generalized Method of Moments (GMM)
source shell/run_gmm.sh
# 4. Colocalization Analysis (Optional)
source shell/run_coloc.sh
Scripts for visualization are provided in src/visual/.
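The four pipeline modules can also be launched from Python, which is convenient when the run is part of a larger workflow. The sketch below is an illustration rather than a script shipped with this repository: the function names are hypothetical, and it assumes that you start it from the repository root and that the step scripts behave the same in a non-interactive bash session as they do when sourced by hand. All steps are sourced in one bash process so that any variables set by an earlier step stay visible to later ones, and the run stops at the first step whose last command fails.

```python
import subprocess

# Step scripts in the order given in this README; colocalization is optional.
REQUIRED_STEPS = ["shell/run_merge.sh", "shell/run_ld.sh", "shell/run_gmm.sh"]
OPTIONAL_COLOC_STEP = "shell/run_coloc.sh"


def select_steps(run_coloc=True):
    """Return the step scripts to run, in order."""
    steps = list(REQUIRED_STEPS)
    if run_coloc:
        steps.append(OPTIONAL_COLOC_STEP)
    return steps


def build_command(steps):
    """Build one bash invocation that sources every step in a single session."""
    # "$@" expands to the step paths passed after the placeholder "$0" ("bash").
    loop = 'for step in "$@"; do echo "== $step"; source "$step" || exit 1; done'
    return ["bash", "-c", loop, "bash", *steps]


def run_pipeline(run_coloc=True, dry_run=False):
    """Run the selected steps; with dry_run=True only return the command."""
    command = build_command(select_steps(run_coloc))
    if not dry_run:
        # check=True raises CalledProcessError if bash exits with a nonzero status.
        subprocess.run(command, check=True)
    return command
```

Calling `run_pipeline(dry_run=True)` returns the command without executing anything, which lets you inspect it before committing to a long run. Pass `run_coloc=False` to skip the optional colocalization step.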
If you use traceCB in your research, please cite our paper:
Citation pending...
This project is licensed under the GPL-3 License - see the LICENSE file for details.
For any questions or issues, please contact wx.jiang@my.cityu.edu.hk or open an issue on GitHub.
