SAMPLE generates (multiple) haplotype or genotype samples from a
FREGENE output population. Either a case-control sample, a random sample
with a continuous phenotype or a random sample with no phenotype can
be generated. The number of cases must be set to zero if a continuous
phenotype or random sample is required. Otherwise, case/control
status is assigned according to a disease model that is multiplicative
over genotypes at each causal SNP, and also multiplicative over SNPs
if the number of causal SNPs selected is greater than 1. To generate individuals with a
continuous phenotype the -sigma option must be selected; this parameter
specifies the phenotypic standard deviation. The heritability of each
SNP must be specified, from which the regression coefficients are
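calculated:

$$\beta = \sigma \sqrt{\frac{h}{2p(1-p)}}$$

where $p$ is the MAF, $h$ is the user specified heritability,
$\sigma$ is the phenotypic standard deviation and the
genotypes are coded as (-1, 0, 1). Thus, each individual's phenotype is
sampled from the following Gaussian distribution

$$N\left(\sum_i \beta_i g_i,\ \sigma^2 \Bigl(1 - \sum_i h_i\Bigr)\right)$$

where the $g_i$'s are the genotypes at the selected causal SNPs.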
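
The continuous-phenotype model can be sketched as follows. This is an illustration, not SAMPLE's own code. It assumes the regression coefficient is sigma * sqrt(h / (2p(1-p))), which follows from a genotype variance of 2p(1-p) under Hardy-Weinberg equilibrium. It also assumes the residual variance is the part of sigma^2 not explained by the causal SNPs. The function names are hypothetical.

```python
import math
import random


def regression_coefficient(h, p, sigma):
    """Effect size of a SNP with heritability h and MAF p.

    Assumption: genotype variance is 2p(1-p) (Hardy-Weinberg equilibrium).
    """
    return sigma * math.sqrt(h / (2.0 * p * (1.0 - p)))


def sample_phenotype(genotypes, betas, heritabilities, sigma, rng):
    """Draw one phenotype for genotypes coded (-1, 0, 1) at the causal SNPs."""
    mean = sum(b * g for b, g in zip(betas, genotypes))
    # Assumption: the residual variance is sigma^2 * (1 - sum of heritabilities).
    residual_sd = sigma * math.sqrt(1.0 - sum(heritabilities))
    return rng.gauss(mean, residual_sd)


rng = random.Random(1)  # fixed seed so the draw is repeatable
beta = regression_coefficient(h=0.02, p=0.5, sigma=1.0)
phenotype = sample_phenotype([1], [beta], [0.02], 1.0, rng)
```

With h = 0.02, p = 0.5 and sigma = 1 the coefficient is sqrt(0.02 / 0.5) = 0.2.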
Causal SNPs are selected at random within allele frequency bands
specified by the -f option.
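
The multiplicative disease model used for a binary phenotype can be sketched as below. This is a hedged illustration rather than SAMPLE's implementation. Genotypes are assumed to be coded as minor allele counts (0, 1, 2) and the function names are hypothetical. The calibration of the baseline risk to the prevalence is an assumption not stated in this document: it treats the causal SNPs as independent and in Hardy-Weinberg equilibrium.

```python
def relative_risk(genotypes, risk_ratios):
    """Risk relative to an individual carrying no minor alleles.

    Multiplicative over genotypes (rr ** allele count) and over SNPs.
    """
    rr = 1.0
    for g, r in zip(genotypes, risk_ratios):
        rr *= r ** g
    return rr


def baseline_risk(prevalence, mafs, risk_ratios):
    """Baseline risk chosen so that the population average equals the prevalence.

    Assumption: independent causal SNPs in Hardy-Weinberg equilibrium, so the
    expected relative risk at one SNP is (1 - p + p * rr) ** 2.
    """
    expected_rr = 1.0
    for p, r in zip(mafs, risk_ratios):
        expected_rr *= (1.0 - p + p * r) ** 2
    return prevalence / expected_rr


def disease_probability(genotypes, prevalence, mafs, risk_ratios):
    """Probability that an individual with these genotypes is a case."""
    return baseline_risk(prevalence, mafs, risk_ratios) * relative_risk(
        genotypes, risk_ratios
    )
```

For example, with risk ratios 1.5 and 2.0, an individual who is heterozygous at the first SNP and a rare homozygote at the second has relative risk 1.5 * 2.0 * 2.0 = 6.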
All options are set on the command line.
- -i file_name (required):
Input file, usually the output file of a FREGENE run
(data/rin_example.xml).
- -og file_name option:
Records genotypes of sampled
individuals in file_name. Case/control status is the first entry on each
line, followed by genotypes coded as 0, 1, 2.
- -oh file_name option:
Records haplotypes of sampled
individuals in file_name, each haplotype on a separate line.
For both haplotypes of an individual, the first column is the
individual's case/control status; alleles are then recorded as 0 (major)
or 1 (minor). Either -og or -oh must be specified; if both are
specified then both output files are generated.
- -scan file_name option:
File containing the list of locations of the SNPs to be output to the genotype and/or haplotype files.
- -min int option:
Specifies the minimum MAF of loci that
are output. If neither this option nor -scan is selected, all
polymorphisms are written to the genotype and/or haplotype files.
- -chromolength float option:
This option takes as an
argument a chromosome length in Mb; the returned chromosomes are
then restricted to be of this length.
- -LD file_name option:
Argument specifying the output file that lists all pairs of SNPs within 200kb
of each other and their pairwise LD value
(see ./data_sample/LD_example.txt).
- -sd int option:
Seed for random number generator.
- -controls int option:
Number of controls to sample.
- -samples int option:
Specifies the number of case-control, or just control, samples, default = 1.
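
The relationship between the -oh and -og outputs can be sketched as follows: the two haplotypes of an individual, with alleles coded 0/1, sum to a genotype coded 0, 1, 2. The sketch assumes that columns are whitespace-separated and that the two haplotypes of an individual are on consecutive lines. Neither detail is stated here, and the function name is hypothetical.

```python
def haplotypes_to_genotypes(lines):
    """Combine pairs of haplotype lines into (status, genotype list) records.

    Assumptions: whitespace-separated columns, status in the first column,
    and the two haplotypes of each individual on consecutive lines.
    """
    rows = [line.split() for line in lines if line.strip()]
    if len(rows) % 2:
        raise ValueError("expected two haplotype lines per individual")

    records = []
    for hap_a, hap_b in zip(rows[0::2], rows[1::2]):
        if hap_a[0] != hap_b[0]:
            raise ValueError("status differs between paired haplotypes")
        # Minor allele count per SNP: 0, 1 or 2
        counts = [int(a) + int(b) for a, b in zip(hap_a[1:], hap_b[1:])]
        records.append((hap_a[0], counts))
    return records


example = ["1 0 1 1", "1 0 0 1", "0 0 0 0", "0 1 0 0"]
records = haplotypes_to_genotypes(example)
```

The status column is kept as text because its coding is not specified in this document.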
To generate a phenotype, either binary or continuous, the following
options must be specified:
- -cases int option, only required for a binary phenotype:
This option specifies that the phenotype is binary and
sets the number of cases to sample, default = 0.
- -sigma float option, only required for a continuous phenotype:
This option specifies that the phenotype is continuous
and sets the phenotypic standard deviation.
- -prev float option, only required for a binary phenotype:
Disease prevalence, value between 0 and 1.
- -snps int option:
Number of causal SNPs (i.e. SNPs that
influence the phenotype), default = 0.
- -length int option:
Size (in Mb) of the chromosome the
user wants to generate haplotype data for. If this option is not
chosen, the chromosome length used is the one in the
FREGENE output file.
- -f float ... float option:
Two arguments per
causal SNP, specifying its minimum and maximum allele frequencies.
- -rr float ... float option:
One argument per
causal SNP. If a binary phenotype is specified, the option specifies
the risk ratio for the heterozygote relative to the common
homozygote, and also for the rare homozygote relative to the
heterozygote. If a continuous phenotype is specified, the option
specifies the heritability attributed to each SNP. SNP order should
be the same as for -f.
- -log file_name (required):
Records disease model
details of each sampled file
(see data_sample/log_sample_example*.dat).
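
The constraints listed above (required -i and -log, at least one of -og and -oh, zero cases for a continuous phenotype, one -rr value and two -f values per causal SNP) can be checked before a run with a small helper such as the one below. This is a sketch only. The function and the file names are hypothetical, and because the executable name is not given here the helper builds just the option list.

```python
def build_sample_args(input_file, log_file, genotype_out=None, haplotype_out=None,
                      controls=None, cases=0, sigma=None, prevalence=None,
                      bands=(), effects=()):
    """Return the option list for one SAMPLE run after basic consistency checks.

    bands:   one (min, max) allele frequency pair per causal SNP (-f)
    effects: one risk ratio or heritability per causal SNP (-rr), same order
    """
    if genotype_out is None and haplotype_out is None:
        raise ValueError("either -og or -oh must be specified")
    if sigma is not None and cases != 0:
        raise ValueError("-cases must be zero for a continuous phenotype")
    if len(bands) != len(effects):
        raise ValueError("-f and -rr need one entry per causal SNP")

    args = ["-i", input_file, "-log", log_file]
    if genotype_out is not None:
        args += ["-og", genotype_out]
    if haplotype_out is not None:
        args += ["-oh", haplotype_out]
    if controls is not None:
        args += ["-controls", str(controls)]
    if cases:
        args += ["-cases", str(cases)]
    if sigma is not None:
        args += ["-sigma", str(sigma)]
    if prevalence is not None:
        args += ["-prev", str(prevalence)]
    if bands:
        args += ["-snps", str(len(bands)), "-f"]
        for low, high in bands:
            args += [str(low), str(high)]
        args += ["-rr"] + [str(e) for e in effects]
    return args


# Hypothetical case-control run with one causal SNP
args = build_sample_args("pop.xml", "run.log", genotype_out="geno.txt",
                         controls=500, cases=500, prevalence=0.01,
                         bands=[(0.05, 0.1)], effects=[1.5])
```

The resulting list can be appended to the path of the SAMPLE executable and passed to a process launcher.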
Imperial College -- August 2008