
lizhebio/DeNovoCNN


DenovoCNN_WDL

DeNovoCNN

A deep learning approach to calling de novo mutations (DNMs) in whole-exome (WES) and whole-genome sequencing (WGS) data. DeNovoCNN takes the trio's BAM/CRAM files plus VCF files (or a tab-separated list of variants) as input, generates image-like representations of the genomic sequence, and detects DNMs with high accuracy.

DeNovoCNN combines three models, one each for calling substitution, deletion and insertion DNMs. Each model is a 9-layer CNN with squeeze-and-excitation blocks. DeNovoCNN was trained on ~50k manually curated DNMs and IVs (inherited, non-DNM variants) from sequencing data generated on Illumina sequencers with the SureSelect Human All Exon V5 and SureSelect Human All Exon V4 capture kits.
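The exact layers of the DeNovoCNN models are not spelled out here. As a generic illustration of what a squeeze-and-excitation block does, the NumPy sketch below pools each channel of a feature map to a single number, passes those numbers through two small fully connected layers with a sigmoid, and uses the results to reweight the channels. The function name `squeeze_excite`, the random weights and the reduction ratio `r` are hypothetical and not taken from DeNovoCNN.

```python
import numpy as np


def squeeze_excite(x, w1, b1, w2, b2):
    """Generic squeeze-and-excitation on a feature map x of shape (H, W, C).

    Hypothetical sketch, not DeNovoCNN's code. w1 has shape (C, C // r) and
    w2 has shape (C // r, C), where r is the reduction ratio.
    """
    # Squeeze: global average pooling over the two spatial axes -> shape (C,)
    z = x.mean(axis=(0, 1))
    # Excitation: FC -> ReLU -> FC -> sigmoid gives one weight in (0, 1) per channel
    s = np.maximum(z @ w1 + b1, 0.0)
    s = 1.0 / (1.0 + np.exp(-(s @ w2 + b2)))
    # Scale: multiply every spatial position of channel c by s[c]
    return x * s


# Usage with random (hypothetical) weights
rng = np.random.default_rng(0)
H, W, C, r = 8, 8, 16, 4
x = rng.standard_normal((H, W, C))
w1, b1 = rng.standard_normal((C, C // r)), np.zeros(C // r)
w2, b2 = rng.standard_normal((C // r, C)), np.zeros(C)
y = squeeze_excite(x, w1, b1, w2, b2)
print(y.shape)  # (8, 8, 16)
```

The block leaves the feature map's shape unchanged and only rescales whole channels, which is why it can be dropped between ordinary convolutional layers.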

DeNovoCNN returns a tab-separated file of format:

Chromosome | Start position | End position | Reference | Variant | DNM posterior probability | Mean coverage

We used a DNM posterior probability threshold of >= 0.5 to create a filtered tab-separated file listing the variants that are likely to be de novo.
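The same >= 0.5 filter can be reproduced on the raw output with a few lines of Python. This is a minimal sketch assuming the seven tab-separated columns in the order listed above and an optional header line; the field names in `COLUMNS`, the helper `filter_dnms` and the example rows are hypothetical and made up for illustration.

```python
import csv
import io

# Hypothetical field names for the seven output columns, in the listed order
COLUMNS = ["chrom", "start", "end", "ref", "alt", "dnm_prob", "mean_cov"]


def filter_dnms(tsv_text, threshold=0.5, has_header=True):
    """Return rows whose DNM posterior probability is >= threshold."""
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    if has_header:
        next(reader, None)  # skip the header line, if the file has one
    kept = []
    for fields in reader:
        if not fields:
            continue  # ignore blank lines
        row = dict(zip(COLUMNS, fields))
        if float(row["dnm_prob"]) >= threshold:
            kept.append(row)
    return kept


# Made-up example rows, for illustration only
example = (
    "chrom\tstart\tend\tref\talt\tprob\tcov\n"
    "1\t1000\t1001\tA\tG\t0.97\t45.3\n"
    "2\t5000\t5001\tC\tT\t0.12\t38.0\n"
)
for row in filter_dnms(example):
    print(row["chrom"], row["start"], row["dnm_prob"])  # 1 1000 0.97
```

Set `has_header=False` if the file starts directly with variant rows.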

How does it work?

DeNovoCNN reads the BAM files and iterates through the potential DNM locations listed in the input VCF files to generate snapshots of the surrounding genomic regions. It stacks the snapshots from the trio's BAM files into an RGB image representation, which is passed to a CNN with squeeze-and-excitation blocks that classifies each image as either DNM or IV (inherited variant, non-DNM).
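To make the "trio as RGB image" idea concrete, here is a simplified sketch. It assumes each sample's reads are already cut to the same window (one string per read, no gaps or base qualities), maps bases to made-up intensities, and places child, father and mother in the R, G and B channels. The base encoding, the channel order and the helpers `encode_pileup` and `stack_trio` are all hypothetical; DeNovoCNN's real encoding may differ.

```python
import numpy as np

# Hypothetical per-base intensities; uncovered positions stay 0
BASE_INTENSITY = {"A": 0.25, "C": 0.5, "G": 0.75, "T": 1.0}


def encode_pileup(reads, width):
    """Encode read strings (already cut to the window) as a (n_reads, width) array."""
    img = np.zeros((len(reads), width), dtype=np.float32)
    for i, read in enumerate(reads):
        for j, base in enumerate(read[:width]):
            img[i, j] = BASE_INTENSITY.get(base, 0.0)
    return img


def stack_trio(child, father, mother, n_rows, width):
    """Pad or truncate each pileup to n_rows reads; stack child/father/mother as R/G/B."""
    channels = []
    for reads in (child, father, mother):
        layer = np.zeros((n_rows, width), dtype=np.float32)
        enc = encode_pileup(reads[:n_rows], width)
        layer[: enc.shape[0]] = enc
        channels.append(layer)
    return np.stack(channels, axis=-1)  # shape (n_rows, width, 3)


# Toy trio: the child's second read carries a T that neither parent shows
child = ["ACGTA", "ACTTA"]
father = ["ACGTA"]
mother = ["ACGTA", "ACGTA", "ACGTA"]
img = stack_trio(child, father, mother, n_rows=4, width=5)
print(img.shape)  # (4, 5, 3)
```

Because all three samples share the same pixel grid, a base present only in the child's channel shows up as a colour difference that a CNN can learn to pick out.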

Usage

Docker

DeNovoCNN is available as a Docker container.

Example of running DeNovoCNN for prediction (to use the pretrained models, leave the model arguments unchanged):

docker run \
  -v "YOUR_INPUT_DIRECTORY":"/input" \
  -v "YOUR_OUTPUT_DIRECTORY":"/output" \
  registry.miracle.ac.cn/broad/denovocnn:latest \
  /app/apply_denovocnn.sh \
    --workdir=/output \
    --child-vcf=/input/<CHILD_VCF> \
    --father-vcf=/input/<FATHER_VCF> \
    --mother-vcf=/input/<MOTHER_VCF> \
    --child-bam=/input/<CHILD_BAM> \
    --father-bam=/input/<FATHER_BAM> \
    --mother-bam=/input/<MOTHER_BAM> \
    --snp-model=/app/models/snp \
    --in-model=/app/models/ins \
    --del-model=/app/models/del \
    --genome=/input/<REFERENCE_GENOME> \
    --output=predictions.csv
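The command passes one model per variant class (`--snp-model`, `--in-model`, `--del-model`). A minimal sketch of how a candidate variant could be routed to one of them by comparing REF and ALT allele lengths is shown below; the helper `variant_model` and the `MODEL_DIRS` mapping are hypothetical, and treating equal-length multi-base changes as substitutions is an assumption.

```python
# Hypothetical mapping from variant class to the model paths used in the command
MODEL_DIRS = {
    "snp": "/app/models/snp",
    "ins": "/app/models/ins",
    "del": "/app/models/del",
}


def variant_model(ref, alt):
    """Classify a variant as snp, ins or del from its REF/ALT allele lengths."""
    if len(ref) == len(alt):
        return "snp"  # substitution (equal-length alleles, an assumption for MNVs)
    if len(alt) > len(ref):
        return "ins"  # insertion: ALT allele is longer than REF
    return "del"  # deletion: REF allele is longer than ALT


for ref, alt in [("A", "G"), ("A", "ATT"), ("ACG", "A")]:
    kind = variant_model(ref, alt)
    print(ref, alt, kind, MODEL_DIRS[kind])
# A G snp /app/models/snp
# A ATT ins /app/models/ins
# ACG A del /app/models/del
```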

Parameter descriptions and usage are given in the previous section.
