MAGE-TAB format files for describing array experiments and designs
MAGE-TAB is a simple tab-delimited, spreadsheet-based format for sharing microarray data in a MIAME-compliant fashion. All experiments and array designs in ArrayExpress now have associated MAGE-TAB files available for download, and experiments can be submitted to ArrayExpress in MAGE-TAB format. For information about all files available for download, see the FTP archives help page. This web page gives a brief description of the MAGE-TAB files used to describe experiments and array designs, and of how they link together.
Note for users of the ArrayExpress Bioconductor package: the naming convention for the raw and processed data file archives changed in October 2009. You must use the latest version of the ArrayExpress package to access them.
Experiment MAGE-TAB files
Three types of file are used to capture information about a microarray experiment and its results in MAGE-TAB format: IDF files, SDRF files and data files.
IDF files
The IDF file is used to give an overview of the experiment, including the experimental variables (factors), quality control strategy, contact details, publication information and protocols. Also included in the IDF file is an (optional) list of sources from which controlled vocabulary terms may have been used elsewhere in the MAGE-TAB document. The information is entered in specific fields, e.g. the IDF for experiment E-MEXP-1718.
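As a rough illustration of the field-based layout described above, the sketch below reads a small IDF-like fragment in which each row starts with a field name followed by one or more tab-separated values. The fragment is invented for illustration and is not the content of E-MEXP-1718; the function name `parse_idf` and all values are assumptions, and only a handful of field names are shown.

```python
import csv
import io

# Hypothetical IDF fragment: field name in the first column, values after it.
# All values here are invented for illustration.
SAMPLE_IDF = (
    "Investigation Title\tExample two-colour experiment\n"
    "Experimental Factor Name\tgenotype\tgrowth condition\n"
    "Person Last Name\tSmith\n"
    "Protocol Name\tP-EXTRACT\tP-LABEL\tP-HYB\n"
    "SDRF File\texample.sdrf.txt\n"
)


def parse_idf(handle):
    """Map each IDF field name to the list of values found on its row."""
    fields = {}
    for row in csv.reader(handle, delimiter="\t"):
        if not row or not row[0].strip():
            continue  # skip blank lines
        values = [cell.strip() for cell in row[1:]]
        # Drop only trailing empty cells so that values in related rows
        # keep the same column positions.
        while values and not values[-1]:
            values.pop()
        fields.setdefault(row[0].strip(), []).extend(values)
    return fields


idf = parse_idf(io.StringIO(SAMPLE_IDF))
print(idf["Experimental Factor Name"])
print(idf["SDRF File"])
```

Keeping the values as lists reflects that a single field, such as the list of experimental factors or protocols, can hold several entries on one row.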
SDRF files
The SDRF file describes the relationship between every step in the chain of biological materials used in the experiment, from source material through to the hybridization, and the acquisition and normalization of data. Experimental factors, protocols, protocol parameters and term sources defined in the IDF are referenced in the SDRF. Generally, each row in the SDRF corresponds to a sample. In two-colour experiments, as in the example below, there are two rows (two samples) per hybridization. In single-channel experiments such as Affymetrix experiments there is usually just one row per sample. Other situations can also occur, such as pooling of samples to create a common reference, technical replicates in which an extract is hybridized more than once, or an extract that is split and labeled with more than one dye. The first part of the SDRF describes the characteristics of the samples.
The next section of the SDRF shows the processing steps applied to create the extracts and labeled extracts, and which labeled extracts were used in each hybridization. In the example below, each hybridization name appears twice because there are two labeled extracts linked with each hybridization.
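The row-per-sample layout can be sketched as follows for a two-colour experiment with a common reference. The SDRF fragment, sample names and the helper `labeled_extracts_by_hybridization` are hypothetical; a real SDRF has many more columns and can repeat some column headings (for example protocol references), which this simple header-keyed reader would not handle.

```python
import csv
import io
from collections import defaultdict

# Hypothetical two-colour SDRF fragment: two rows (two labeled extracts)
# per hybridization. All names are invented for illustration.
SAMPLE_SDRF = (
    "Source Name\tExtract Name\tLabeled Extract Name\tLabel\tHybridization Name\n"
    "sample 1\textract 1\tLE 1\tCy3\thyb 1\n"
    "reference\textract ref\tLE ref a\tCy5\thyb 1\n"
    "sample 2\textract 2\tLE 2\tCy3\thyb 2\n"
    "reference\textract ref\tLE ref b\tCy5\thyb 2\n"
)


def labeled_extracts_by_hybridization(handle):
    """Group (labeled extract, label) pairs under their hybridization name."""
    groups = defaultdict(list)
    for row in csv.DictReader(handle, delimiter="\t"):
        groups[row["Hybridization Name"]].append(
            (row["Labeled Extract Name"], row["Label"])
        )
    return dict(groups)


groups = labeled_extracts_by_hybridization(io.StringIO(SAMPLE_SDRF))
for hyb, extracts in groups.items():
    print(hyb, extracts)
```

Each hybridization name ends up with two entries, one per dye, which mirrors the repeated hybridization names in the SDRF rows.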
A diagram showing the relationship between the samples, extracts, labeled extracts and hybridizations is also provided for each experiment in ArrayExpress as .png and .svg files. These can be opened in your browser window.
The final part of the SDRF lists which array design was used for each hybridization, which data files go with which hybridizations, and the experimental factors (variables) associated with each hybridization. The factor values may be sample characteristics (such as genotype) or external treatments (such as growth in low oxygen conditions). If there are multiple scans of the same hybridization (resulting in multiple files per hybridization), the Scan Name column lists a unique name for each scan. The raw data files are listed under the 'Array Data Matrix File' column and the processed data files under either the 'Derived Array Data File' column (for processed data files per hybridization or scan) or the 'Derived Array Data Matrix File' column (for a combined file containing data across all hybridizations or scans).
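A minimal sketch of reading this final part of the SDRF is given below, assuming a single-channel experiment in which one hybridization was scanned twice. The fragment, the accession-like value `A-EXAMPLE-1`, the file names and the helper `summarise_hybridizations` are all hypothetical, and the exact spelling of the factor value column heading is an assumption, so the code matches any heading that starts with "Factor Value".

```python
import csv
import io

# Hypothetical single-channel SDRF fragment; all names and values are invented.
SAMPLE_SDRF = (
    "Hybridization Name\tArray Design REF\tScan Name\t"
    "Derived Array Data File\tFactor Value[genotype]\n"
    "hyb 1\tA-EXAMPLE-1\tscan 1a\thyb1_scan_a.txt\twild type\n"
    "hyb 1\tA-EXAMPLE-1\tscan 1b\thyb1_scan_b.txt\twild type\n"
    "hyb 2\tA-EXAMPLE-1\tscan 2a\thyb2_scan_a.txt\tmutant\n"
)


def summarise_hybridizations(handle):
    """Collect array design, per-scan files and factor values per hybridization."""
    reader = csv.DictReader(handle, delimiter="\t")
    factor_columns = [c for c in reader.fieldnames if c.startswith("Factor Value")]
    summary = {}
    for row in reader:
        entry = summary.setdefault(
            row["Hybridization Name"],
            {"array_design": row["Array Design REF"], "files": [], "factors": {}},
        )
        entry["files"].append((row["Scan Name"], row["Derived Array Data File"]))
        # Assumes every row of a hybridization carries the same factor values,
        # which holds for this single-channel fragment only.
        for column in factor_columns:
            entry["factors"][column] = row[column]
    return summary


summary = summarise_hybridizations(io.StringIO(SAMPLE_SDRF))
print(summary["hyb 1"]["files"])
print(summary["hyb 2"]["factors"])
```

For two-colour data the factor values usually differ between the two rows of a hybridization, so a real reader would need to keep them per labeled extract rather than overwrite them as this sketch does.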
Data files
Two types of data file are associated with experiments, raw and processed. They are found in the E-XXXX-n.raw.1.zip and E-XXXX-n.processed.1.zip archives in the FTP directory for each experiment. Large datasets are split into several raw and processed file archives, which are numbered sequentially, e.g. raw.1.zip, raw.2.zip etc. The names of the files in the archives correspond to the names of the files listed in the SDRF. The raw data files are re-formatted into a generic format with feature coordinates given as MetaColumn, MetaRow, Column and Row, so that no specific software is needed to access them. The exceptions are Affymetrix CEL files, Nimblegen raw data files and some unusual file formats that can't be processed in the usual way. In the following example, GenePix raw data files have been converted to the ArrayExpress generic format. The data values can be linked to the array design annotation using either the feature coordinates or the reporter (probe) identifiers.
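Linking data values to the array design through the feature coordinates might look like the sketch below. Only the four coordinate names (MetaColumn, MetaRow, Column, Row) come from the description above; the `Intensity` and `Reporter Name` headings, the sample values and the helper functions are assumptions made for illustration, and real files will use different headings.

```python
import csv
import io

COORDINATE_COLUMNS = ("MetaColumn", "MetaRow", "Column", "Row")

# Hypothetical generic-format raw data and array design fragments.
SAMPLE_DATA = (
    "MetaColumn\tMetaRow\tColumn\tRow\tIntensity\n"
    "1\t1\t1\t1\t1520.0\n"
    "1\t1\t2\t1\t87.5\n"
)
SAMPLE_DESIGN = (
    "MetaColumn\tMetaRow\tColumn\tRow\tReporter Name\n"
    "1\t1\t1\t1\treporter_A\n"
    "1\t1\t2\t1\treporter_B\n"
)


def coordinate_key(row):
    """Build a hashable key from the four feature coordinates of a row."""
    return tuple(int(row[name]) for name in COORDINATE_COLUMNS)


def annotate(data_handle, design_handle):
    """Attach the reporter from the array design to each data row."""
    design = {
        coordinate_key(row): row
        for row in csv.DictReader(design_handle, delimiter="\t")
    }
    annotated = []
    for row in csv.DictReader(data_handle, delimiter="\t"):
        key = coordinate_key(row)
        feature = design.get(key)  # None if the design lacks this feature
        annotated.append({
            "coordinates": key,
            "intensity": float(row["Intensity"]),
            "reporter": feature["Reporter Name"] if feature else None,
        })
    return annotated


annotated = annotate(io.StringIO(SAMPLE_DATA), io.StringIO(SAMPLE_DESIGN))
for record in annotated:
    print(record)
```

The same lookup could be keyed on the reporter (probe) identifier instead when both files carry it, which is the other linking route mentioned above.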
The processed data files will be in the format of MAGE-TAB data matrices.
Array design MAGE-TAB files
Eventually, all array design files in ArrayExpress will be in MAGE-TAB format. At the moment they are in a format similar to that specified by MAGE-TAB and can be used in conjunction with MAGE-TAB data files.
The MAGE-TAB version of this array design file will be like this:
Linking between experiment and array design information
The following is a summary of how the different MAGE-TAB files link together to give complete information about an experiment.
More information about MAGE-TAB
More detailed information about MAGE-TAB format and submitting data in this format to ArrayExpress can be found here.