Evolutionary tools for genomic analysis
AYB is a base caller for the Illumina Genome Analyzer, using an explicit statistical model of how errors occur during sequencing to produce more accurate reads from the raw intensity data.

In particular, AYB deals with three sources of error:
In contrast to other base-calling approaches, AYB uses a general model of phasing estimated directly from the data rather than assuming that it occurs at a constant rate for all cycles. Dealing with phasing in this manner means that the base calls made by AYB at the end of each read tend to be more accurate than those of other methods, making greater read lengths feasible and increasing the number of the highest quality reads: AYB returns 2.8 times as many perfect reads as other base callers for 100 cycle data (with smaller gains for shorter reads).
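To make the contrast concrete, the sketch below builds a phasing matrix under the constant-rate assumption that the paragraph above says AYB does not make. It is not AYB's model or code: the function name, the `lag` and `lead` parameters, and the rule that each molecule extends by zero, one or two bases per cycle are all assumptions chosen for illustration.

```python
def constant_rate_phasing(ncycle, lag, lead):
    """Phasing matrix under a constant-rate assumption (illustrative only).

    P[i][j] is the fraction of molecules in a cluster whose most recently
    incorporated base is template position i during sequencing cycle j.
    `lag` is the per-cycle chance a molecule fails to extend; `lead` is the
    per-cycle chance it extends by two bases. Both names are hypothetical.
    """
    if lag < 0 or lead < 0 or lag + lead > 1:
        raise ValueError("lag and lead must be probabilities summing to at most 1")
    step = 1.0 - lag - lead
    # state[k] = fraction of molecules that have incorporated exactly k bases.
    state = [1.0] + [0.0] * ncycle
    P = [[0.0] * ncycle for _ in range(ncycle)]
    for j in range(ncycle):
        new = [0.0] * (ncycle + 1)
        for k, mass in enumerate(state):
            new[k] += lag * mass               # no extension this cycle
            if k + 1 <= ncycle:
                new[k + 1] += step * mass      # normal single-base extension
            if k + 2 <= ncycle:
                new[k + 2] += lead * mass      # double extension
            # Molecules extending past the last position are dropped.
        state = new
        for i in range(ncycle):
            P[i][j] = state[i + 1]
    return P


if __name__ == "__main__":
    # Fraction of molecules still in phase at selected cycles.
    P = constant_rate_phasing(100, lag=0.01, lead=0.0)
    for cycle in (1, 25, 50, 100):
        print(cycle, round(P[cycle - 1][cycle - 1], 3))
```

Under this assumption the in-phase fraction shrinks geometrically with cycle number, which is one way to see why calls near the end of long reads are the hardest to get right. A model with rates that vary by cycle would replace the fixed `lag` and `lead` with per-cycle values estimated from the data.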
By default AYB performs per-tile analysis, estimating phasing and cross-talk separately for every tile. This level of analysis is more processor intensive than the Illumina analysis pipeline but can be efficiently split between machines: an entire 8 lane run of 45 cycle data (95 million clusters) can be analysed within an hour on a modern eight-core server, as could 2 million clusters of much longer 101 cycle data. In addition, AYB offers two options to reduce the total computational burden. Fixing the cross-talk matrix across tiles, at a value previously estimated by either AYB or the Illumina pipeline, allows phasing to be solved analytically in each iteration and so speeds up estimation considerably. Alternatively, a Bustard-like approach can be used, estimating the cross-talk and phasing from a few tiles and then holding them fixed while calling bases for the remaining tiles.
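Because each tile is analysed independently, the work can be divided by handing each machine its own list of tiles. The sketch below shows one way this might be done. The helper names, the round-robin split and the example tile prefixes are assumptions; the command layout copies the single-tile usage example on this page (`AYB -b ... -i ... -o ... prefix`) and uses no other options.

```python
def split_tiles(tiles, nworker):
    """Round-robin split of tile prefixes into one list per worker."""
    if nworker < 1:
        raise ValueError("need at least one worker")
    return [tiles[i::nworker] for i in range(nworker)]


def tile_command(prefix, blockspec, indir, outdir):
    """Argument list for one tile, laid out like the single-tile usage example."""
    return ["AYB", "-b", blockspec, "-i", indir, "-o", outdir, prefix]


if __name__ == "__main__":
    # Hypothetical tile prefixes; real names depend on the run.
    tiles = ["s_3_1301", "s_3_1302", "s_3_1303", "s_3_1304", "s_3_1305"]
    for worker, chunk in enumerate(split_tiles(tiles, 2)):
        for prefix in chunk:
            cmd = tile_command(prefix, "R76R76", "cifdir", "outputdir")
            print(worker, " ".join(cmd))
```

The script only prints the commands; how they are dispatched (a batch queue, ssh, or a local process pool) is left open, since that depends on the site.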
AYB is freely available under the GNU General Public Licence version 3 (see www.gnu.org for further information). A copy of the licence is provided with the software.
Latest version of AYB.
Build instructions for Version II are in the README file.
The Version II AYB Manual contains user information including program options.
This version of AYB is the one on which Massingham and Goldman (2012) is based.
Original pre-release version of AYB with older phasing model. Historic interest only.
The ciftools package for manipulating CIF format intensities may also be useful.
AYB intensities
will process the CIF file intensities in one block using 5 iterations and output a FASTQ file, with both input and output in the current directory and log messages written to stderr.
AYB -b R76R76 -i cifdir -o outputdir s_3_1301
will process a 76 base paired-end run from the file s_3_1301.cif stored in the directory cifdir. Output will be stored in outputdir.
AYB -i runfolder -b R8R108R108 -r L1T1301-2301
will process a 108 base paired-end run, with an additional 8 base index between the pairs, from a run folder. All the tiles between 1301 and 2301 will be processed from lane 1.
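The `-b` argument in the examples above is a string of `R<n>` tokens, one per block of cycles. As a quick sanity check before launching a run, a specification can be split into block lengths and summed to compare against the number of cycles in the data. The sketch below is an assumption-laden helper, not part of AYB: it handles only the `R<n>` form that appears in the examples and rejects anything else rather than guess at its meaning.

```python
import re


def parse_blockspec(spec):
    """Return the block lengths in a specification such as 'R76R76'.

    Only 'R<n>' tokens, as seen in the usage examples, are handled here.
    """
    tokens = re.findall(r"([A-Za-z])(\d+)", spec)
    # Rebuild the string from the tokens to detect stray characters.
    if not tokens or "".join(kind + n for kind, n in tokens) != spec:
        raise ValueError(f"malformed block specification: {spec!r}")
    lengths = []
    for kind, n in tokens:
        if kind != "R":
            raise ValueError(f"only R blocks are handled in this sketch, got {kind!r}")
        lengths.append(int(n))
    return lengths


if __name__ == "__main__":
    for spec in ("R76R76", "R8R108R108"):
        blocks = parse_blockspec(spec)
        print(spec, blocks, "total cycles:", sum(blocks))
```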
PhiX – 76 cycle control lane (27 tiles). Sanger Institute.
B. pertussis – 76 cycle paired-end data from a problematic run (100 tiles). Sanger Institute.
HiSeq – 101 cycle paired-end data from a HiSeq machine with PhiX spike-in. Illumina corp.
Ibis Test – 51 cycle test set of data distributed with the Ibis base-caller
NA19240/BGI (archive) – 45 cycle paired-end data from BGI (part of 1KGP, pilot 2, individual NA19240)
NA19240/Illumina (archive) – 51 cycle paired-end data from Illumina (part of 1KGP, pilot 2, individual NA19240)
AYB paper (Genome Biology open access)
All Your Base: a fast and accurate probabilistic approach to base calling. T. Massingham and N. Goldman (2012) Genome Biology 13:R13
Figures
Fig 1. Comparison of error rates.
Fig 2. Frequency of errors for B. pertussis data.
Fig 3. Quality calibration comparison between AYB and Ibis.
Supplementary – Fitting a block tridiagonal information matrix by ML
Supplementary (old) – Rapid estimation of M, P and N
Basecalls – Basecalls for data sets in manuscript
Please direct any queries to ayb@ebi.ac.uk
20 December 2012
26 August 2012
31 May 2012
25 April 2012
04 April 2012
29 February 2012
21 February 2012
16 December 2011
01 December 2011
26 October 2011
18 October 2011
14 Sept 2011
1 Sept 2011
29 July 2011
22 July 2011
10 May 2011
17 Feb 2011
08 Feb 2011
21 Jan 2011
07 Dec 2010
28 Nov 2010
07 Oct 2010
21 May 2009