!!!Configuration options

You can override the default options of ArrayExpressHTS by setting the corresponding function parameters. By default these are set to:
options = list(
  stranded=FALSE,             # set to TRUE if a strand specific protocol was used
  insize=NULL,                # an integer, which will be automatically determined if set to NULL
  insizedev=NULL,             # an integer, which will be automatically determined if set to NULL
  reference="genome",         # genome or transcriptome
  aligner="tophat",           # tophat, bowtie, bwa or custom
  aligner_options=NULL,       # string of options to be directly passed to the aligners according to their own manual pages
  count_feature="transcript", # count over genes or transcripts
  count_method="cufflinks",   # cufflinks, mmseq or count
  normalisation="none"        # none or tmm
),
usercloud = TRUE,             # set to FALSE to run locally
nnodes = 10,                  # maximum number of nodes to try to allocate
atempts = 3,                  # number of attempts to create a computing cluster in case of failure
dir = getwd(),                # path to the directory where the projects are or will be created
refdir = getDefaultReferenceDir(),     # path to the directory where the genome/transcriptome references are
want.reports = TRUE           # set to FALSE to disable the reporting steps
!! Details
* __count_method__ \\ The ''count_method'' can be set to ''count'', ''cufflinks'' or ''mmseq''. \\ \\ ''count'' can only be used with the ''reference'' set to ''transcriptome'', though it will estimate gene-level counts if ''count_feature'' is set to ''transcript'' and the sequence names in the reference include both transcript and gene names (e.g. see FASTA files from Ensembl). It counts reads overlapping known transcripts. Reads are discarded if they overlap more than one isoform of the same gene or if it is ambiguous which gene they originated from. Count values are thus not very useful by themselves but can be used to compare expression between conditions. \\ \\ Discarding multi-mapping reads leads to information loss and systematic underestimation of expression. The ''[mmseq|http://www.bgx.org.uk/software/mmseq.html]'' and ''[cufflinks|http://cufflinks.cbcb.umd.edu/]'' statistical methods can be used to estimate gene- and transcript-level expression taking all reads into account. \\ \\ ''mmseq'' can only be used with SAM/BAM files generated by the TopHat or Bowtie aligners. \\ \\ See also __standardise__ for a discussion of the types of values returned by these methods. \\ \\
* __normalisation__ \\ Normalisation is generally required to remove systematic effects that occur in the data. ''normalisation'' can be set to either ''none'' or ''tmm'', where ''tmm'' uses the trimmed mean of M-values for normalisation as implemented in the [edgeR|http://www.bioconductor.org/packages/release/bioc/html/edgeR.html] package and described [here|http://genomebiology.com/2010/11/3/R25]. \\ \\ __Note:__ when using ''cufflinks'' or ''mmseq'' with ''none'' or ''tmm'', the expression estimates do not correspond one-to-one to read counts. This is because, unlike the ''count'' method, which only uses uniquely mapping reads, both of these methods estimate transcript abundance from all reads, including multi-mapping ones (reads that map to more than one transcript or location). \\ \\
* __reference__ \\ The ''reference'' should be set to either ''genome'' or ''transcriptome''. \\ \\
* __standardise__ \\ The three available count methods (''cufflinks'', ''mmseq'' and ''count'') produce different types of values by default: for the first two, the expression estimates are in FPKM (Fragments Per Kilobase of exon per Million fragments mapped), and for ''count'' the values are numbers of aligned reads. \\ \\ The type of values returned by the pipeline can be controlled by setting the ''standardise'' parameter to TRUE or FALSE, regardless of the counting method. Depending on this setting, the pipeline returns either per-feature (gene or transcript) counts/estimates, or counts/estimates standardised by feature length and scaled to the number of aligned reads in the sample (FPKM).
* ... \\
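To make the FPKM definition used under __standardise__ concrete, here is a minimal sketch in base R. The function ''fpkm'' and the example numbers are hypothetical and are not part of ArrayExpressHTS. It only illustrates the usual scaling by feature length (in kilobases) and by the number of mapped fragments (in millions), and does not claim to reproduce how the pipeline itself computes its values.

```r
# Hypothetical helper, NOT part of ArrayExpressHTS: illustrates the FPKM scaling only.
# counts:  fragments assigned to each feature (gene or transcript)
# lengths: feature lengths in base pairs, in the same order as counts
# total:   total number of mapped fragments in the sample
fpkm <- function(counts, lengths, total = sum(counts)) {
  stopifnot(length(counts) == length(lengths), all(lengths > 0), total > 0)
  # fragments per kilobase (lengths / 1e3) per million mapped fragments (total / 1e6)
  counts * 1e9 / (lengths * total)
}

# Made-up example: two transcripts in a sample with 2 million mapped fragments
counts  <- c(tx1 = 500, tx2 = 1500)
lengths <- c(tx1 = 1000, tx2 = 3000)
fpkm(counts, lengths, total = 2e6)
```

In this made-up example the second transcript has three times as many fragments but is also three times as long, so both transcripts end up with the same FPKM value (250).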
!! Examples 
! Change the aligner and keep the rest of the default options:
ArrayExpressHTS( "E-MTAB-197", options = list( aligner="bowtie" ) )
! Override the options chosen by the pipeline to pass to Bowtie:
ArrayExpressHTS( "E-MTAB-197", 
                    options = list( aligner_options="-p 4 --phred33-quals --best --strata -a -S -m 40 -v 3 -I 0 -X 400" ) )
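
! Check an options list against the constraints described under Details:
The following is a sketch only. ''check_options'' is a hypothetical helper that does not exist in ArrayExpressHTS. It encodes two restrictions stated above: ''count'' needs the ''transcriptome'' reference, and ''mmseq'' needs alignments from TopHat or Bowtie. It assumes the defaults listed at the top of this page and returns a character vector of problems, which is empty when none are found.

```r
# Hypothetical helper, NOT part of ArrayExpressHTS.
check_options <- function(options) {
  # Defaults as listed at the top of this page (only those needed for the checks)
  defaults <- list(reference = "genome", aligner = "tophat", count_method = "cufflinks")
  opts <- modifyList(defaults, options)
  problems <- character(0)
  # 'count' can only be used with the transcriptome reference
  if (opts$count_method == "count" && opts$reference != "transcriptome") {
    problems <- c(problems, "count_method 'count' requires reference 'transcriptome'")
  }
  # 'mmseq' can only be used with TopHat or Bowtie alignments
  if (opts$count_method == "mmseq" && !(opts$aligner %in% c("tophat", "bowtie"))) {
    problems <- c(problems, "count_method 'mmseq' requires aligner 'tophat' or 'bowtie'")
  }
  problems
}

opts <- list(count_method = "count", reference = "transcriptome", normalisation = "tmm")
check_options(opts)   # empty result: no conflicts found
# ArrayExpressHTS( "E-MTAB-197", options = opts )   # uncomment to start the run
```

Running the check before the pipeline call is a design choice of this sketch: it reports an incompatible combination before any alignment work is started.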

!!! For help

Please use our mailing list: [http://listserver.ebi.ac.uk/mailman/listinfo/arrayexpresshts]

Return to [ArrayExpressHTS Help Topics|http://www.ebi.ac.uk/Tools/rwiki/]