FREGENE is a C++ program that simulates sequence-like data over
large genomic regions in large diploid populations. Unlike coalescent-based
simulation tools such as MS (Hudson, 2002), FREGENE works
forwards in time, which allows a wide range of demographic and
selection scenarios to be implemented. Many such models are already
incorporated into FREGENE, and since it is open source users can modify or
extend these. Coalescent methods have difficulty incorporating large
amounts of gene conversion or crossover (Hoggart et al. 2007), whereas
these pose no particular problem for FREGENE. FREGENE offers a flexible model
for recombination hotspots, and can readily simulate regions up to
tens of Mb on a standard desktop computer.
The principal limitation of forward-in-time algorithms is
computational, since the entire population must be tracked through
time, not only the chromosomes that are ancestral to the observed
sample. FREGENE implements many features to enhance computational
efficiency, and includes a rescaling option that greatly reduces
computation time at the cost of some approximation.
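As a rough illustration of the rescaling idea, the sketch below assumes the usual convention: the population size and the number of generations are divided by a factor lambda, and the per-generation rates are multiplied by lambda, so that population-scaled quantities such as 4 x N x (mutation rate) are unchanged. The struct, field and function names are hypothetical and are not FREGENE's own options; see Hoggart et al. (2007) for the rule FREGENE actually applies.

```cpp
#include <cassert>

// Hypothetical parameter bundle; these names are illustrative only and are
// not FREGENE's actual option names.
struct SimParams {
    double population_size;  // number of diploid individuals, N
    double generations;      // number of generations to simulate
    double mutation_rate;    // per site, per generation
    double crossover_rate;   // per site, per generation
};

// Assumed rescaling rule: shrink N and the generation count by lambda and
// inflate the per-generation rates by lambda, so that products such as
// N * mutation_rate stay the same.
SimParams rescale(const SimParams& p, double lambda) {
    assert(lambda >= 1.0);
    return SimParams{p.population_size / lambda,
                     p.generations / lambda,
                     p.mutation_rate * lambda,
                     p.crossover_rate * lambda};
}

// Population-scaled mutation rate, 4 * N * mu, for a diploid population.
double theta(const SimParams& p) {
    return 4.0 * p.population_size * p.mutation_rate;
}
```

Under this assumed rule, calling rescale with lambda = 10 on a run of 10K individuals over 100K generations gives 1K individuals over 10K generations, with theta unchanged up to rounding.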
The program SAMPLE, which comes with the FREGENE package, generates samples of
individuals from a FREGENE output population, together with a phenotype that depends on the genotype at one or more SNPs and may be binary (case/control) or Gaussian. SAMPLE can
also summarize the SNP minor allele frequency (MAF) spectrum and calculate
values for SNP pairs.
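To make the idea of a genotype-dependent phenotype concrete, here is a minimal sketch. It assumes an additive per-allele effect at each causal SNP, a logistic model for the binary (case/control) phenotype, and a fixed residual standard deviation for the Gaussian phenotype. These modelling choices and all function names are assumptions made for illustration; they are not taken from SAMPLE itself.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <random>
#include <vector>

// Linear predictor: baseline plus a per-allele effect at each causal SNP.
// genotype[i] is the minor-allele count (0, 1 or 2) at causal SNP i.
double linear_predictor(const std::vector<int>& genotype,
                        const std::vector<double>& effect,
                        double baseline) {
    assert(genotype.size() == effect.size());
    double eta = baseline;
    for (std::size_t i = 0; i < genotype.size(); ++i) {
        eta += effect[i] * genotype[i];
    }
    return eta;
}

// Binary phenotype under an assumed logistic model: true means "case".
bool draw_binary(double eta, std::mt19937& rng) {
    const double p_case = 1.0 / (1.0 + std::exp(-eta));
    return std::bernoulli_distribution(p_case)(rng);
}

// Gaussian phenotype: mean given by the linear predictor, with a fixed
// residual standard deviation (must be positive).
double draw_gaussian(double eta, double residual_sd, std::mt19937& rng) {
    assert(residual_sd > 0.0);
    return std::normal_distribution<double>(eta, residual_sd)(rng);
}
```

A caller would compute the linear predictor once per sampled individual and then draw either a case/control status or a quantitative trait value from it.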
For further details about FREGENE, including the rescaling, see:
Balding DJ (2008). FREGENE: Simulation of realistic sequence-level data in
populations and ascertained samples. BMC Bioinformatics,
in press.
Hoggart CJ, Chadeau-Hyam M, Clark TG, Lampariello R, Whittaker JC, De
Iorio M, Balding DJ (2007) Sequence-level population simulations over
large genomic regions. Genetics 177: 1725-1731. doi:
10.1534/genetics.106.069088
Please cite these articles in any publication that uses FREGENE.
FREGENE is free to use, distribute and modify, under the terms of the GNU
General Public License as published by the Free Software Foundation;
either version 3 of the License, or any later version. In particular
FREGENE comes WITHOUT ANY WARRANTY.
Please report any problems or bugs to d.balding@ic.ac.uk
FREGENE simulates a (possibly subdivided) population of monoecious, diploid individuals whose genomes consist of a single linear chromosome. The population evolves over non-overlapping generations according to a Wright-Fisher model, with or without selection.
- Mutation: a two-allele, symmetric mutation model;
multiple mutations at a site are allowed, and the mutation rate is the
same for each allele at each site.
- Recombination: both crossovers and gene conversions, with
rates that may be uniform or vary along the chromosome at both broad
and fine scales (the latter corresponding to hotspots).
- Population size: constant, or exponential growth.
- Migration: within a subdivided population; the islands can
have different population sizes.
- Selection: a pre-specified proportion of sites is
non-neutral. At sites under selection, fitnesses are allocated to
each genotype stochastically, according to distributions that can
allow directional (positive or negative) and balancing selection
scenarios. Selection operates additively over sites. In a subdivided population, it can be local to the subpopulation in which the selected mutant arose, or global.
Many of these assumptions are easy to relax by small changes to the source code, but typically at a cost in computational efficiency.
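As a minimal sketch of the Wright-Fisher step with symmetric two-allele mutation, the code below advances a single site by one generation in a neutral, undivided population. It is a deliberately reduced illustration: it ignores recombination, selection, migration and population growth, and the type and function names are hypothetical rather than taken from the FREGENE source.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Allele (0 or 1) carried at one site by each of the 2N chromosomes.
using Alleles = std::vector<int>;

// One non-overlapping Wright-Fisher generation at a single neutral site:
// every offspring chromosome copies a parental chromosome chosen uniformly
// at random, then mutates with the same probability whichever allele it
// carries (symmetric model, so repeated and back mutations can occur).
Alleles next_generation(const Alleles& parents,
                        double mutation_rate,
                        std::mt19937& rng) {
    assert(!parents.empty());
    assert(mutation_rate >= 0.0 && mutation_rate <= 1.0);
    std::uniform_int_distribution<std::size_t> pick(0, parents.size() - 1);
    std::bernoulli_distribution mutate(mutation_rate);
    Alleles offspring(parents.size());
    for (int& allele : offspring) {
        allele = parents[pick(rng)];
        if (mutate(rng)) {
            allele = 1 - allele;  // flip 0 <-> 1
        }
    }
    return offspring;
}
```

Calling next_generation repeatedly on its own output traces the allele count at the site through time. Because the whole population is regenerated at every step, the cost grows with the population size, the number of sites and the number of generations.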
- For computational efficiency, each chromosome is represented as a
list of sites at which the minor allele is present.
- If a derived allele becomes the major allele, then the allele is
"swapped" so that the ancestral allele is now recorded.
- FREGENE outputs the "swap" status of each site, i.e.
whether the minor allele is derived or ancestral.
- The main FREGENE output file, in XML format, summarizes the
parameters of the simulation and includes the chromosomes present in
the final generation. It is formatted so that it can be re-processed
as an input file, which allows complex demographic scenarios to be
built up from simple components, via repeated calls to FREGENE.
- FREGENE requires about 2 days on a standard desktop computer to
simulate the evolution of a 10 Mb chromosome in 10K individuals over
100K generations. However, by rescaling the number of
generations, population size, and the rate parameters, the computing
time can be greatly reduced at the cost of some approximation. For
example, with ten-fold rescaling the above simulation time is
reduced by a factor of 64 to less than 1 hour. Disadvantages of
rescaling include:
- Reduced population size; however, by setting a switch the user
can ask FREGENE to run extra generations in which the rescaling is
relaxed, bringing the population size up to its desired level.
- More mutations arising at sites that are already polymorphic ("double hit", or "back" mutations); FREGENE tracks these so this effect can be monitored.
See Hoggart et al. (2007) for further details.
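The minor-allele list representation and the "swap" step described in the implementation notes above can be sketched as follows. This is an illustration only: it assumes that each chromosome is held as a sorted vector of site indices and that a per-site flag records the swap status. The container choice and all names are hypothetical and do not reflect FREGENE's internal data structures.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// A chromosome stores only the sorted indices of the sites at which it
// carries the currently recorded (minor) allele.
using Chromosome = std::vector<std::size_t>;

// True if the chromosome carries the recorded allele at the given site.
bool carries_minor(const Chromosome& c, std::size_t site) {
    return std::binary_search(c.begin(), c.end(), site);
}

// Add the site if it is absent, remove it if present; order is preserved.
void toggle(Chromosome& c, std::size_t site) {
    auto it = std::lower_bound(c.begin(), c.end(), site);
    if (it != c.end() && *it == site) {
        c.erase(it);
    } else {
        c.insert(it, site);
    }
}

// If the recorded allele at a site is now carried by more than half of the
// chromosomes, record the other allele instead: flip the site in every
// chromosome and flip the site's swap flag. Returns true if a swap occurred.
bool swap_if_major(std::vector<Chromosome>& population,
                   std::vector<bool>& swapped,
                   std::size_t site) {
    assert(site < swapped.size());
    std::size_t carriers = 0;
    for (const Chromosome& c : population) {
        if (carries_minor(c, site)) {
            ++carriers;
        }
    }
    if (2 * carriers <= population.size()) {
        return false;  // still the minor allele, nothing to do
    }
    for (Chromosome& c : population) {
        toggle(c, site);
    }
    swapped[site] = !swapped[site];
    return true;
}
```

Storing only minor-allele positions keeps each chromosome short when most sites are monomorphic or carry a rare variant, and the swap keeps the lists short once an allele has risen above 50% frequency.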
Imperial College -- August 2008