We extend the single-cell RNA-sequencing simulation framework, scsim, with two additional features: (1) enhanced RNA dropout, and (2) ambient RNA added to drops.
scsim simulates single-cell RNA-seq data using the Splatter statistical framework, which is described here but implemented in Python. In addition, it simulates doublets and cells with shared gene-expression programs (i.e., activity programs). It was used to benchmark methods for gene expression program inference in single-cell RNA-seq data, as described here.
We implement the dropout step described in Splatter by fitting a sigmoid curve to the relationship between genes’ log mean counts and the fraction of cells in which they have zero reads; the sigmoid is characterized by shape and midpoint parameters. To enhance dropout, the sigmoid is shifted by decreasing its midpoint, and the adjusted dropout probability is computed. This probability is then used to binomially sample the observed counts. See example notebook.
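The dropout steps above can be sketched roughly as follows. This is a minimal illustration and not the code in this repository: the function names (`sigmoid`, `fit_dropout_sigmoid`, `apply_dropout`) are hypothetical, the fit uses a simple least-squares line through the logit of the zero fraction instead of a nonlinear curve fit, and "binomially sample" is read here as binomial thinning of each count. In the parameterization used by this sketch the fitted shape is negative, so dropout increases when the midpoint moves toward higher log means; the repository may parameterize the curve differently, so the sign of the shift here should not be read as its convention.

```python
import numpy as np

def sigmoid(x, shape, midpoint):
    """Logistic curve giving a dropout probability for a log mean count x."""
    return 1.0 / (1.0 + np.exp(-shape * (x - midpoint)))

def fit_dropout_sigmoid(counts, eps=1e-3):
    """Estimate (shape, midpoint) from a cells x genes count matrix.

    Assumption: a straight-line fit of logit(zero fraction) against
    log mean count stands in for a proper nonlinear sigmoid fit.
    """
    log_mean = np.log(counts.mean(axis=0))
    zero_frac = np.clip((counts == 0).mean(axis=0), eps, 1.0 - eps)
    logit = np.log(zero_frac / (1.0 - zero_frac))
    slope, intercept = np.polyfit(log_mean, logit, 1)
    # logit = shape * (x - midpoint)  =>  midpoint = -intercept / shape
    return slope, -intercept / slope

def apply_dropout(counts, shape, midpoint, rng):
    """Binomially thin counts: each read is kept with probability 1 - p_drop."""
    log_mean = np.log(counts.mean(axis=0))
    p_drop = sigmoid(log_mean, shape, midpoint)
    thinned = rng.binomial(counts, 1.0 - p_drop[np.newaxis, :])
    return thinned, p_drop

# Toy data (hypothetical): Poisson counts for 500 cells and 200 genes.
rng = np.random.default_rng(0)
gene_means = rng.gamma(shape=0.6, scale=3.0, size=200)
counts = rng.poisson(gene_means, size=(500, 200))
counts = counts[:, counts.sum(axis=0) > 0]  # drop genes with no reads at all

shape, midpoint = fit_dropout_sigmoid(counts)
base, p_base = apply_dropout(counts, shape, midpoint, rng)
# Shifted curve: with a negative shape, a larger midpoint means more dropout.
enhanced, p_enh = apply_dropout(counts, shape, midpoint + 2.0, rng)
```

Because the thinning is binomial, the sampled counts can never exceed the input counts, and genes with low mean expression lose a larger share of their reads than highly expressed genes.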
We borrow CellBender's model of sample RNA contamination (ambient RNA). That is, we add a portion of contaminating RNA to the gene expression of each drop and then sample its counts accordingly. See example notebook.
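As a rough illustration of the ambient RNA idea, the sketch below adds a shared contamination profile to each drop's expected expression and then samples counts. It is not the implementation in this repository or in CellBender: the function name `add_ambient_rna`, the `ambient_fraction` parameter, the choice of the pooled expression of all cells as the ambient profile, and the Poisson sampling step are all assumptions made for the example.

```python
import numpy as np

def add_ambient_rna(cell_means, ambient_fraction, rng):
    """Sample counts from cell expression plus a shared ambient profile.

    cell_means: cells x genes matrix of expected counts per drop.
    ambient_fraction: hypothetical knob; expected ambient reads per drop,
        expressed as a fraction of that drop's expected library size.
    """
    # Assumed ambient profile: expression pooled over all cells, summing to 1.
    profile = cell_means.sum(axis=0) / cell_means.sum()
    # Expected number of ambient reads in each drop.
    ambient_size = ambient_fraction * cell_means.sum(axis=1, keepdims=True)
    contaminated_means = cell_means + ambient_size * profile[np.newaxis, :]
    # Assumption: Poisson sampling around the contaminated mean.
    return rng.poisson(contaminated_means), contaminated_means

# Toy data (hypothetical): two cell types with disjoint marker genes.
rng = np.random.default_rng(1)
cell_means = np.zeros((100, 20))
cell_means[:50, :10] = 5.0   # type A expresses genes 0-9
cell_means[50:, 10:] = 5.0   # type B expresses genes 10-19

ambient_counts, ambient_means = add_ambient_rna(cell_means, ambient_fraction=0.1, rng=rng)
```

In this toy setup, type A drops end up with a small nonzero expected count for type B marker genes, which is the kind of cross-contamination that ambient RNA introduces.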
Specifically, the mean expression of gene
The two models were combined as follows:
Thus, we sample: