Workflow used to quantify reference alignment bias across functional genomic assays in "Quantifying Reference Alignment Bias in Functional Genomics Analyses" [Manuscript under review at Cell Reports Methods].
This code was written for the parameters of the study, and end users may need to modify it to fit their own use case.
To assume that each read comes from one of the individual's haplotypes, the bulk reads must first be phased according to the likelihood that they originated from the maternal or paternal genome. Phasing was accomplished using the MAPQ scores returned from alignments to the maternal, paternal, or maternal+paternal combined genomes. The exact commands used to split reads between the maternal and paternal genomes can be accessed here, and the detailed methods are described in Tekkey et al.
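As a rough illustration of MAPQ-based phasing, the sketch below assigns each read to whichever haplotype gave it the higher MAPQ. This is a minimal sketch under stated assumptions, not the study's actual procedure: the function names, the `min_diff` threshold, and the tie-handling rule are all hypothetical, and the alignment to the maternal+paternal combined genome is not modeled here. Refer to the linked commands and Tekkey et al. for the real criteria.

```python
from typing import Dict, List, Tuple


def assign_haplotype(maternal_mapq: int, paternal_mapq: int, min_diff: int = 1) -> str:
    """Assign a read to a haplotype from its two MAPQ scores.

    Hypothetical rule (an assumption, not the published method): the
    haplotype whose alignment has the higher MAPQ by at least `min_diff`
    wins; otherwise the read is labeled ambiguous.
    """
    diff = maternal_mapq - paternal_mapq
    if diff >= min_diff:
        return "maternal"
    if -diff >= min_diff:
        return "paternal"
    return "ambiguous"


def split_reads(
    mapq_by_read: Dict[str, Tuple[int, int]], min_diff: int = 1
) -> Dict[str, List[str]]:
    """Group read names by assigned haplotype.

    `mapq_by_read` maps a read name to (maternal MAPQ, paternal MAPQ).
    """
    groups: Dict[str, List[str]] = {"maternal": [], "paternal": [], "ambiguous": []}
    for read_name, (mat_mapq, pat_mapq) in mapq_by_read.items():
        groups[assign_haplotype(mat_mapq, pat_mapq, min_diff)].append(read_name)
    return groups
```

In practice the two MAPQ values would be read from the alignment files for each genome; here they are passed in as plain integers so the logic stays self-contained.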
Reference alignment bias was measured through the following process:
- Split the query genome into 500 bp bins (query being the individual haplotype from which the sequencing reads were generated).
- Lift the bins over to the reference genome to get syntenic bin pairs.
- Intersect the reference and query bins with the reads aligned to the corresponding genome to get a read count for each bin.
- Calculate the reference fraction by dividing the number of reads intersecting the reference bin by the sum of the reads intersecting the reference and query bins.
The exact commands used for this process can be accessed here, and the detailed methods are described in Tekkey et al.
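The binning, counting, and reference-fraction steps listed above can be sketched in a few lines of Python. This is an illustrative sketch only: the function names are hypothetical, intervals are assumed to be BED-style (0-based, half-open), the liftover step is omitted, and returning `None` for bin pairs with no reads is an assumption rather than something the study specifies.

```python
from typing import List, Optional, Tuple

Interval = Tuple[str, int, int]  # (chrom, start, end), 0-based half-open


def make_bins(chrom: str, length: int, size: int = 500) -> List[Interval]:
    """Split one sequence into fixed-size bins; the last bin may be shorter."""
    return [(chrom, start, min(start + size, length)) for start in range(0, length, size)]


def count_overlaps(bins: List[Interval], reads: List[Interval]) -> List[int]:
    """Count, for each bin, the reads that overlap it by at least 1 bp.

    Simple O(bins x reads) loop for clarity, not for genome-scale data.
    """
    counts = []
    for b_chrom, b_start, b_end in bins:
        n = sum(
            1
            for r_chrom, r_start, r_end in reads
            if r_chrom == b_chrom and r_start < b_end and r_end > b_start
        )
        counts.append(n)
    return counts


def reference_fraction(ref_count: int, query_count: int) -> Optional[float]:
    """Reference reads divided by (reference + query reads).

    Returns None when neither bin has reads (assumed handling).
    """
    total = ref_count + query_count
    if total == 0:
        return None
    return ref_count / total
```

Under this definition, a value near 0.5 means the syntenic bin pair received similar numbers of reads on both genomes, while values toward 0 or 1 mean the reads favored the query or the reference, respectively.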
