FTP Archives

Files (metadata and data) for public experiments and array designs in ArrayExpress can be downloaded from our public FTP site.

The FTP address is ftp.ebi.ac.uk/pub/databases/microarray/data

How are the files organised?

The FTP site consists of two major directories:

array: descriptions of all array designs in ArrayExpress.
experiment: data files and metadata (sample descriptions, protocols, etc.) for all experiments in ArrayExpress.

Under each directory, files are grouped per experiment or per array design: first by the four-letter code in the accession, and then by the accession number itself.

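For example, the files for a hypothetical experiment with accession E-ABCD-1234 would be stored under experiment/ABCD/E-ABCD-1234/, and those for a hypothetical array design A-ABCD-1234 under array/ABCD/A-ABCD-1234/.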
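
The grouping rule above can be turned into a small helper that builds the directory URL for an accession. This is a minimal Python sketch, not an ArrayExpress API: the accession pattern (a prefix letter E or A, a four-letter code and a number) and the helper names `accession_dir` and `download` are assumptions for illustration.

```python
import re
from urllib.request import urlretrieve

# Root of the public FTP site, as given on this page.
FTP_BASE = "ftp://ftp.ebi.ac.uk/pub/databases/microarray/data"

# Assumption: accessions look like E-ABCD-1234 (experiments) or A-ABCD-1234
# (array designs): a prefix letter, a four-letter code and a number.
ACCESSION_RE = re.compile(r"^([EA])-([A-Z]{4})-(\d+)$")


def accession_dir(accession: str) -> str:
    """Return the FTP directory URL holding the files for one accession."""
    match = ACCESSION_RE.match(accession)
    if match is None:
        raise ValueError(f"unrecognised accession: {accession!r}")
    prefix, code, _number = match.groups()
    top_level = "experiment" if prefix == "E" else "array"
    # Files are grouped first by the four-letter code, then by accession.
    return f"{FTP_BASE}/{top_level}/{code}/{accession}/"


def download(accession: str, filename: str, dest: str) -> str:
    """Fetch one file from an accession's directory; return the local path."""
    local_path, _headers = urlretrieve(accession_dir(accession) + filename, dest)
    return local_path
```

For the hypothetical experiment E-ABCD-1234, `accession_dir("E-ABCD-1234")` gives the URL ending in experiment/ABCD/E-ABCD-1234/. `download` simply appends a file name and uses the standard library's `urlretrieve`, which handles ftp:// URLs; it assumes the file exists at that path on the server.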

File types in experiment directories

The experiment information (metadata) and data files are available in a number of different formats.

File extension: description
.idf.txt: Top-level information about the experiment, including title, description, submitter contact details and protocols.
.sdrf.txt: Information about the samples and the relationships between samples, extracts, labeled extracts, hybridizations, factor values and data files.
.raw.1.zip: All raw data files, named as in the .sdrf.txt file. Files are in .CEL format for Affymetrix experiments and in a generic tab-delimited format for non-Affymetrix experiments. Large data sets are split across several archives with extensions such as ".raw.1.zip", ".raw.2.zip", etc.
.processed.1.zip: All processed data files, named as in the .sdrf.txt file. Not all experiments have processed data files. Files are in a generic tab-delimited text format, and each processed file may contain data from one or more hybridisations or assays. As with raw data files, large data sets are split across several archives with extensions such as ".processed.1.zip", ".processed.2.zip", etc.
.bam, .bam.bai, .bam.prop: BAM alignment files are available for a subset of highly curated RNA-seq experiments in ArrayExpress. The ArrayExpress team generates them by processing the experiments' raw FASTQ data files with the ArrayExpressHTS Bioconductor package on the EBI R Cloud workbench.
.eSet.r: For some experiments, the ArrayExpress team generated R ExpressionSet objects using version 2.11 of the ArrayExpress Bioconductor package. These objects can be loaded directly into Bioconductor for downstream analysis. See the help page and the ArrayExpress package vignette for more details on using them.
.additional.1.zip: Additional files provided by the experiment's submitter that supplement the IDF, SDRF, raw data and processed data files. The format and content of these files are not controlled.
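
Because the IDF and SDRF files are tab-delimited MAGE-TAB text, they can be read with Python's standard `csv` module. The sketch below uses hypothetical helpers (`read_idf`, `sdrf_data_files`, `numbered_archives`) and assumes the raw and processed file names sit in the MAGE-TAB "Array Data File" and "Derived Array Data File" columns; sequencing experiments may reference their files through other columns. It also orders split archives numerically, since a plain string sort would put ".raw.10.zip" before ".raw.2.zip".

```python
import csv
import io
import re


def read_idf(text: str) -> dict[str, list[str]]:
    """Parse an IDF: each row holds a field name followed by its values."""
    fields: dict[str, list[str]] = {}
    for row in csv.reader(io.StringIO(text), delimiter="\t"):
        if not row or not row[0].strip() or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        values = row[1:]
        # Drop trailing empty cells (common in spreadsheet exports) but keep
        # inner blanks, so parallel fields such as person names stay aligned.
        while values and values[-1] == "":
            values.pop()
        fields[row[0].strip()] = values
    return fields


def sdrf_data_files(text: str) -> tuple[list[str], list[str]]:
    """Return (raw, processed) data file names referenced by an SDRF."""
    rows = list(csv.reader(io.StringIO(text), delimiter="\t"))
    if not rows:
        return [], []
    header, body = rows[0], rows[1:]
    # SDRF headers repeat (e.g. several "Protocol REF" columns), so select
    # columns by index; csv.DictReader would keep only the last duplicate.

    def column_values(column_name: str) -> list[str]:
        cols = [i for i, h in enumerate(header) if h.strip() == column_name]
        names = [row[i] for row in body for i in cols if i < len(row) and row[i]]
        return list(dict.fromkeys(names))  # de-duplicate, keep first-seen order

    return column_values("Array Data File"), column_values("Derived Array Data File")


# Split archives are numbered: ".raw.1.zip", ".raw.2.zip", ..., ".raw.10.zip".
ARCHIVE_RE = re.compile(r"\.(raw|processed)\.(\d+)\.zip$")


def numbered_archives(filenames: list[str], kind: str) -> list[str]:
    """Return the "raw" or "processed" archives in numeric, not lexical, order."""
    numbered = []
    for name in filenames:
        match = ARCHIVE_RE.search(name)
        if match and match.group(1) == kind:
            numbered.append((int(match.group(2)), name))
    return [name for _number, name in sorted(numbered)]
```

Working with column indices rather than column names is the main design choice here: an SDRF routinely repeats headers such as "Protocol REF", so a name-keyed dictionary would silently lose columns.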

For some experiments, legacy files such as MAGE-ML files are still available on the FTP site.

File types in array directories

File extension: description
.adf.txt: "adf" stands for "array design file". This file captures all the annotation about the probes on the array, e.g. the unique probe identifiers, the coordinates of each probe on the microarray, the genomic mapping location of each probe, whether a probe belongs to the experimental or the control set, and what kind of control probe it is (negative control, empty spot with buffer only, etc.).
.features.txt, .mageml.tar.gz, .reporters.txt: These are legacy files in MAGE-ML format, which ArrayExpress used until mid-2011 and no longer supports since switching to the MAGE-TAB format. All .adf.txt files (see above) are now in MAGE-TAB format; the MAGE-ML files are kept for some arrays only for the sake of completeness.
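
An .adf.txt file can be read with the same standard-library tools. This sketch assumes the MAGE-TAB layout: header rows of field names and values, a line containing only [main], then a tab-delimited probe table whose first row holds the column names; any later bracketed section such as [mapping] is skipped. `read_adf` is a hypothetical helper, and column names such as "Reporter Name" depend on the individual file.

```python
import csv


def read_adf(text: str) -> tuple[dict[str, list[str]], list[dict[str, str]]]:
    """Split an ADF into its header fields and the rows of its [main] table."""
    header: dict[str, list[str]] = {}
    table_lines: list[str] = []
    section = "header"
    for line in text.splitlines():
        marker = line.strip().lower()
        if marker in ("[header]", "[main]"):
            section = marker.strip("[]")  # "header" or "main"
            continue
        if marker.startswith("[") and marker.endswith("]"):
            section = "other"  # e.g. a [mapping] section, ignored in this sketch
            continue
        if section == "header" and marker:
            cells = next(csv.reader([line], delimiter="\t"))
            values = cells[1:]
            while values and values[-1] == "":
                values.pop()  # drop trailing empty cells
            header[cells[0].strip()] = values
        elif section == "main" and marker:
            table_lines.append(line)
    # First [main] line gives the column names; each later line is one probe.
    rows = list(csv.DictReader(table_lines, delimiter="\t"))
    return header, rows
```

Each probe row comes back as a dictionary keyed by column name, so a caller can, for instance, collect `row["Reporter Name"]` for every row when the file has that column.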