How to get data from ArrayExpress
For each experiment in ArrayExpress, MAGE-TAB files describing the experiment and the associated results are available for download. MAGE-TAB is a simple tab-delimited format for sharing functional genomics data. In our example we want to download all data files for the experiment E-MEXP-3431 (Figure 9) as described in the steps below.
File types in experiment directories
- Investigation Description Format (IDF) file
- Sample and Data Relationship Format (SDRF) file
- Array Design Format (ADF) file (microarray only)
- Raw and processed data files
Each file describes a specific aspect of the selected experiment and is needed to understand what the experiment studied, how it was carried out, the results obtained and how they can be interpreted. Let's consider the file types associated with the experiment E-MEXP-3431 (Figure 10).
Investigation Description Format file
To start, let's examine the Investigation Description Format (IDF) file, which gives an overview of the experiment, including the experimental variables (factors), quality control strategy, contact details, publication information and protocols. You'll notice that much of the information displayed in the interface for E-MEXP-3431, such as the description and contact details, is also stored in this file (Figure 11).
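As a rough illustration of how the IDF can be read programmatically, the sketch below assumes the usual MAGE-TAB layout in which each row of the IDF starts with a field name followed by one or more tab-separated values. The function name parse_idf and the sample content are hypothetical and are not taken from E-MEXP-3431.

```python
import csv
import io


def parse_idf(text):
    """Parse IDF text into {field name: [values]}, assuming one field per row."""
    fields = {}
    for row in csv.reader(io.StringIO(text), delimiter="\t"):
        # Skip blank lines and rows that carry no field name.
        if not row or not row[0].strip():
            continue
        name = row[0].strip()
        values = [cell.strip() for cell in row[1:] if cell.strip()]
        fields.setdefault(name, []).extend(values)
    return fields


# Hypothetical IDF fragment; all values are invented for illustration.
SAMPLE_IDF = (
    "Investigation Title\tExample time course\n"
    "Experimental Factor Name\tgenotype\ttime\n"
    "Person Last Name\tDoe\n"
    "Protocol Name\tP-EXAMPLE-1\tP-EXAMPLE-2\n"
)

idf = parse_idf(SAMPLE_IDF)
print(idf["Experimental Factor Name"])
```

Keeping every field as a list makes it straightforward to handle fields such as experimental factors or protocols, which often have several values on one row.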
Sample and Data Relationship Format file
The Sample and Data Relationship Format (SDRF) file describes the sample characteristics and the relationships between samples, arrays, data files and so on. The information in the SDRF is organised so that it follows the natural flow of a functional genomics experiment: it begins by describing the experiment samples and finishes with the names of the data files generated from the analysis of the experiment results. For single-channel data, such as Affymetrix experiments, one row in the SDRF corresponds to one hybridisation. For two-channel data, one row corresponds to one channel. More complex situations can also occur, such as pooling samples to create a common reference, technical replicates in which an extract is hybridised more than once, or an extract that is split and labelled with more than one dye.
Let's take a closer look at the various parts of the SDRF for E-MEXP-3431, which is a single-channel Affymetrix experiment (Figure 12).
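The relationship between SDRF rows and hybridisations can be checked with a few lines of code. The sketch below counts how many rows refer to each hybridisation: one row each for single-channel data, two for two-channel data. The column names ("Source Name", "Label", "Hybridization Name", "Array Data File") and the sample rows are assumptions made for illustration, so check the header of the SDRF you actually download.

```python
import csv
import io
from collections import defaultdict


def read_sdrf(text):
    """Return (header, rows); rows stay as lists because SDRF headers can repeat."""
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    header = next(reader)
    rows = [row for row in reader if any(cell.strip() for cell in row)]
    return header, rows


def rows_per_hybridisation(header, rows, column="Hybridization Name"):
    """Count how many SDRF rows refer to each hybridisation."""
    index = header.index(column)
    counts = defaultdict(int)
    for row in rows:
        counts[row[index]] += 1
    return dict(counts)


# Hypothetical single-channel SDRF: one row per hybridisation.
SINGLE_CHANNEL = (
    "Source Name\tHybridization Name\tArray Data File\n"
    "sample 1\thyb 1\tsample1.CEL\n"
    "sample 2\thyb 2\tsample2.CEL\n"
)

# Hypothetical two-channel SDRF: one row per channel, so two rows per hybridisation.
TWO_CHANNEL = (
    "Source Name\tLabel\tHybridization Name\tArray Data File\n"
    "sample 1\tCy3\thyb 1\tslide1.txt\n"
    "reference\tCy5\thyb 1\tslide1.txt\n"
)

single_counts = rows_per_hybridisation(*read_sdrf(SINGLE_CHANNEL))
two_counts = rows_per_hybridisation(*read_sdrf(TWO_CHANNEL))
print(single_counts)
print(two_counts)
```

Rows are kept as plain lists rather than dictionaries because an SDRF may contain the same column header more than once, and a dictionary keyed by header would silently drop the repeats.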
Array Design Format file
For microarray experiments, a link to the array is provided, from which you can download the complete Array Design Format (ADF) file (Figure 13). This file describes how an array was manufactured and what was printed or synthesised at each position on the array.
Raw and processed data files
Two types of data file are associated with experiments, raw and processed, and these are found in the E-XXXX-n.raw.1.zip and E-XXXX-n.processed.1.zip archives respectively. Large datasets are split into several raw and processed archives, which are numbered sequentially, e.g. E-MEXP-3431.processed.1.zip, E-MEXP-3431.processed.2.zip, E-MEXP-3431.processed.3.zip, E-MEXP-3431.processed.4.zip. The names of the files in these archives correspond to the names of the files listed in the SDRF.
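The archive naming scheme and the link between archive contents and the SDRF can be sketched as follows. The regular expression is an assumption based only on the names shown above, and the helper names, file names and in-memory archive are hypothetical stand-ins for files you would have downloaded.

```python
import io
import re
import zipfile

# Assumed pattern for names such as E-MEXP-3431.processed.2.zip.
ARCHIVE_PATTERN = re.compile(r"^(E-[A-Z]+-\d+)\.(raw|processed)\.(\d+)\.zip$")


def parse_archive_name(name):
    """Split an archive name into (accession, kind, part number), or return None."""
    match = ARCHIVE_PATTERN.match(name)
    if match is None:
        return None
    accession, kind, part = match.groups()
    return accession, kind, int(part)


def files_missing_from_archives(sdrf_file_names, archives):
    """Return SDRF-listed file names that are not found in any of the zip archives."""
    found = set()
    for archive in archives:
        with zipfile.ZipFile(archive) as zf:
            found.update(zf.namelist())
    return sorted(set(sdrf_file_names) - found)


# Build a small in-memory archive so the example runs without any download.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr("sample1.CEL", b"placeholder")
buffer.seek(0)

parsed = parse_archive_name("E-MEXP-3431.processed.2.zip")
missing = files_missing_from_archives(["sample1.CEL", "sample2.CEL"], [buffer])
print(parsed)
print(missing)
```

In practice you would pass the paths of the downloaded archives instead of the in-memory buffer, and take the list of expected names from the data file columns of the SDRF.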
For some experiments, the processed data file will be in the format of a MAGE-TAB data matrix. A data matrix combines data from more than one hybridisation, scan or normalisation in a single file. This format allows data columns to be mapped to rows in the SDRF file (Figure 14).
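To show how data columns can be mapped back to SDRF rows, the sketch below reads a small data matrix. It assumes a layout with two header rows: the first names the hybridisation (or scan or normalisation) each column belongs to, and the second names the quantity measured in that column. The header labels, probe names and values are hypothetical, so compare them with the file you are working with.

```python
import csv
import io


def parse_data_matrix(text):
    """Return (columns, values) from a data matrix with two header rows.

    columns is a list of (reference name, quantity) pairs, one per data column.
    values maps (element, reference name, quantity) to a float.
    """
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    refs = next(reader)        # first header row: which SDRF entry each column belongs to
    quantities = next(reader)  # second header row: what each column measures
    columns = list(zip(refs[1:], quantities[1:]))
    values = {}
    for row in reader:
        if not row:
            continue
        element = row[0]
        for (ref, quantity), cell in zip(columns, row[1:]):
            values[(element, ref, quantity)] = float(cell)
    return columns, values


# Hypothetical data matrix covering two hybridisations.
SAMPLE_MATRIX = (
    "Hybridization REF\thyb 1\thyb 1\thyb 2\thyb 2\n"
    "Reporter REF\tsignal\tp-value\tsignal\tp-value\n"
    "probe_a\t10.5\t0.01\t11.0\t0.02\n"
    "probe_b\t7.25\t0.20\t6.75\t0.30\n"
)

columns, values = parse_data_matrix(SAMPLE_MATRIX)
print(columns)
print(values[("probe_b", "hyb 2", "signal")])
```

The reference names in the first header row are what tie each column to a row in the SDRF, so looking up "hyb 2" in the SDRF would identify the sample behind those two columns.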