rlsim: a package for simulating RNA-seq library preparation with parameter estimation
About
The rlsim package is a collection of tools for simulating RNA-seq
library construction. It aims to reproduce the most important factors
that are known to introduce significant biases in currently used
protocols: hexamer priming, PCR amplification and size selection.
It allows for a systematic exploration of the effects of the individual biasing
factors and their interactions on downstream applications by simulating
data under a variety of parameter sets.
The implicit simulation model implemented in the main tool (rlsim) is
inspired by the actual library preparation protocols. It is more
general than the models used by bias correction methods, and hence
allows for a fair assessment of their performance.
Although the simulation model was kept as simple as possible in order to
aid usability, it still has too many parameters to be inferred from data
produced by standard RNA-seq experiments. However, simulating datasets
with properties similar to specific datasets is often useful. To address
this, the package provides a tool (effest) implementing simple
approaches for estimating the parameters which can be recovered from
standard RNA-seq data (GC-dependent amplification efficiencies, fragment
size distribution, relative expression levels).
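As an illustration of what estimating relative expression levels from standard RNA-seq data can look like, here is a minimal sketch of a common length-normalized estimator. It is not taken from effest, whose actual method is described in the package documentation; the function name `relative_expression`, its arguments and the effective-length formula are assumptions made for this example.

```python
def relative_expression(counts, lengths, mean_frag_len):
    """Length-normalized relative expression (illustrative, not effest's method).

    counts:        dict mapping transcript name -> number of mapped fragments
    lengths:       dict mapping transcript name -> transcript length in bases
    mean_frag_len: mean fragment length, used for an effective-length correction
    """
    rates = {}
    for name, n in counts.items():
        # Number of positions a fragment of mean length can start from;
        # clamped to 1 so very short transcripts do not divide by zero.
        eff_len = max(lengths[name] - mean_frag_len + 1, 1)
        rates[name] = n / eff_len

    total = sum(rates.values())
    if total == 0:
        return {name: 0.0 for name in rates}
    # Normalize so that the relative expression levels sum to one.
    return {name: rate / total for name, rate in rates.items()}


if __name__ == "__main__":
    counts = {"tr_a": 100, "tr_b": 100}
    lengths = {"tr_a": 1199, "tr_b": 2199}
    print(relative_expression(counts, lengths, mean_frag_len=200))
```

With equal fragment counts, the shorter transcript receives the higher relative expression level, because the same number of fragments comes from fewer possible start positions.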
The latest release and the sources are available from the rlsim GitHub repository:
https://github.com/sbotond/rlsim
Key features
- Simulation of priming biases loosely based on a nearest-neighbor
  thermodynamic model.
- Exact simulation of PCR amplification on the level of individual
  fragments (consistent across expression levels, no approximations).
- Fragment-specific amplification efficiencies determined by
  GC-content and length.
- Possibility to simulate PCR and sampling pseudo-replicates.
- Simulation of size selection and polyadenylation with flexible
  target distributions.
- Estimation of GC-dependent amplification efficiencies from real
  data, relying on assumptions about locality of biases and the mean
  efficiency of the fragment pool.
- Estimation of relative expression levels.
- Estimation of empirical fragment size distribution, model selection
  between normal vs. skew normal distributions.
- Able to simulate experiments on the human transcriptome over a wide
  range of expression levels on a desktop machine.
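The idea of fragment-level PCR simulation with GC-dependent efficiencies can be sketched as a simple branching process: in every cycle, each existing copy of a fragment is duplicated independently with a probability equal to the fragment's amplification efficiency. The code below is a minimal sketch under that assumption and does not reproduce rlsim's implementation; the efficiency curve and all names (`gc_content`, `efficiency`, `amplify`, `gc_opt`, `width`, `max_eff`) are hypothetical, and the length dependence mentioned above is left out for brevity.

```python
import math
import random


def gc_content(seq):
    """Fraction of G/C bases in a sequence (0.0 for an empty sequence)."""
    if not seq:
        return 0.0
    return sum(base in "GCgc" for base in seq) / len(seq)


def efficiency(seq, gc_opt=0.5, width=0.25, max_eff=0.9):
    """Hypothetical per-cycle duplication probability.

    A bell-shaped curve peaking at gc_opt; the shape and the default
    parameter values are illustrative assumptions, not rlsim's model.
    """
    d = (gc_content(seq) - gc_opt) / width
    return max_eff * math.exp(-0.5 * d * d)


def amplify(copies, eff, cycles, rng):
    """Simulate PCR for one fragment species, copy by copy.

    In each cycle, every existing copy is duplicated with probability
    eff, so the copy number follows a branching process without any
    continuous approximation.
    """
    for _ in range(cycles):
        duplicated = sum(1 for _ in range(copies) if rng.random() < eff)
        copies += duplicated
    return copies


if __name__ == "__main__":
    rng = random.Random(42)
    for seq in ("ATATATATAT", "ATGCATGCAT", "GCGCGCGCGC"):
        eff = efficiency(seq)
        print(seq, round(eff, 3), amplify(10, eff, cycles=8, rng=rng))
```

Running the simulation several times with different seeds and the same starting pool gives PCR pseudo-replicates in the sense used above. The per-copy loop keeps the example transparent, but its cost grows with the copy number, so it is only practical for small pools and few cycles.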
Citing the rlsim package
An associated manuscript is in preparation; in the meantime, the package
should be cited as:
- Botond Sipos, Greg Slodkowicz, Tim Massingham, Nick Goldman (2013) Realistic simulations reveal extensive sample-specificity of RNA-seq biases. arXiv:1308.3172
The analysis pipeline used to generate the results is available at https://github.com/sbotond/paper-rlsim.
Getting more help
Please consult the package documentation for more help on the tools and the technical background.
The BioStar Q&A forum (http://www.biostars.org) is an excellent place to get additional help.
The package author monitors posts tagged rlsim.