Feature extraction
Feature extraction is the process of converting the scanned image of the microarray into quantifiable (computable) values and annotating these values with gene IDs, sample names and other useful information (Figure 5) (4).
This process is often performed using the software provided by the microarray manufacturer. The output is a set of raw (i.e. unprocessed) data files, which can be in binary or text format (Table 1).
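To make the idea of turning an image into computable values more concrete, here is a minimal sketch in Python. It assumes a simplified scheme in which each spot is summarised by the median intensity of the pixels inside the spot (foreground) minus the median of the surrounding pixels (background). The function name `extract_feature`, the 4x4 pixel patch and the spot mask are all hypothetical; manufacturer software uses its own, more elaborate algorithms for finding and quantifying spots.

```python
from statistics import median


def extract_feature(pixels, spot_mask):
    """Summarise one spot as (foreground median, background median, net signal).

    pixels    -- 2D list of pixel intensities for the patch around one spot
    spot_mask -- 2D list of 0/1 flags of the same shape; 1 marks spot pixels
    """
    foreground = []
    background = []
    for pixel_row, mask_row in zip(pixels, spot_mask):
        for value, inside_spot in zip(pixel_row, mask_row):
            if inside_spot:
                foreground.append(value)
            else:
                background.append(value)
    fg = median(foreground)
    bg = median(background)
    # Net signal: foreground corrected for local background (an assumed scheme).
    return fg, bg, fg - bg


# Hypothetical 4x4 patch of scanner intensities around a single spot.
pixels = [
    [100, 110, 105, 95],
    [102, 900, 950, 98],
    [99, 920, 940, 101],
    [97, 103, 100, 96],
]
# Hypothetical mask: the four central pixels belong to the spot.
spot_mask = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]

fg, bg, net = extract_feature(pixels, spot_mask)
print(fg, bg, net)
```

In a real raw data file, values like these would be stored once per feature, alongside annotation such as the probe or gene ID.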
Table 1 Common microarray raw data file types.
Manufacturer | Typical raw data format | How to open / Analysis software examples |
Affymetrix | .CEL (binary) | R packages (affy, limma, oligo…) |
Agilent | feature extraction file (tab-delimited text file per hybridisation) | R packages (e.g. limma); Spreadsheet software (Excel, OpenOffice, etc.) |
GenePix (scanner) | .gpr (tab-delimited text file per hybridisation) | Spreadsheet software (Excel, OpenOffice, etc.) |
Illumina | .idat (binary) | R packages (e.g. illuminaio) |
Illumina | .txt (tab-delimited text matrix for all samples) | R packages (e.g. lumi); Spreadsheet software (Excel, OpenOffice, etc.) |
Nimblegen | NimbleScan, .pair (tab-delimited text matrix for all samples) | Spreadsheet software (Excel, OpenOffice, etc.) |
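The text-based formats in Table 1 can be read with general-purpose tools as well as with the listed packages. The following is a minimal sketch using only the Python standard library, assuming a simple tab-delimited matrix with one header row, probe IDs in the first column and one numeric column per sample. The function name `read_expression_matrix`, the column names and the values are hypothetical; real exports often carry extra header lines and several columns per sample, so the parsing would need adapting to the actual file.

```python
import csv
import io


def read_expression_matrix(handle):
    """Read a tab-delimited matrix: first column probe ID, one column per sample."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    sample_names = header[1:]  # skip the probe ID column
    values = {}
    for row in reader:
        if not row:
            continue  # ignore blank lines
        probe_id, *cells = row
        values[probe_id] = [float(cell) for cell in cells]
    return sample_names, values


# Hypothetical file content, held in memory so the sketch is self-contained.
example = io.StringIO(
    "PROBE_ID\tsample_A\tsample_B\n"
    "probe_0001\t152.3\t148.9\n"
    "probe_0002\t2310.0\t1987.5\n"
)

samples, matrix = read_expression_matrix(example)
print(samples)
print(matrix["probe_0002"])
```

To read a file on disk instead, pass an open file handle, for example `open(path, newline="")`, in place of the `io.StringIO` object. Binary formats such as .CEL and .idat cannot be read this way and need dedicated packages like those listed in Table 1.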
After feature extraction, the data can be analysed. Array manufacturers often provide software to open and analyse their raw data files, but these programs may not always be available, may become obsolete after a few years, or may not be flexible enough for your needs. Several other software tools, most of them free, are suitable for the downstream processing of microarray files. Examples are the Galaxy platform, GenePattern, GeneSpring (licence required) and the statistics software R.
The functional genomics team at EMBL-EBI uses the R packages ‘oligo’, ‘limma’ and ‘lumi’ (5) to analyse Affymetrix, Agilent and Illumina microarray data for the Expression Atlas.