
GEO Data in ArrayExpress

Some of the experiments in ArrayExpress have been imported by the ArrayExpress team from the Gene Expression Omnibus (GEO) at the NCBI. 

We import data from GEO on a weekly basis. All GEO experiments that are included in GEO datasets and use catalogue Affymetrix or Agilent platforms are imported and re-curated before being loaded into ArrayExpress. We also import all GEO series (GSE) on these platforms; these are loaded uncurated, provided they pass our quality checks (e.g. no corrupt data files).

For selected experiments, the SOFT files and data files are obtained. We then extract the experiment description, sample annotation, experimental factor information, etc. using custom automated methods and text mining tools. Finally, the experiment information is manually curated before it is loaded into the ArrayExpress database. For more information about the programs used please email
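As a rough illustration of the kind of extraction described above, the following Python sketch groups the attribute lines of a SOFT file under the entity they belong to. It is not the ArrayExpress import code. It relies only on the general SOFT layout, in which entity lines begin with "^" and attribute lines begin with "!". The function name parse_soft and the sample text (identifiers, title, characteristics) are invented for this example.

```python
from collections import defaultdict


def parse_soft(text):
    """Group SOFT attribute lines under the entity they follow.

    Hypothetical helper for illustration only; it ignores data tables
    and any line that is not an entity ("^") or attribute ("!") line.
    """
    entities = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("^"):
            # Entity line, e.g. "^SERIES = GSE0000"
            kind, _, ident = line[1:].partition("=")
            current = {
                "type": kind.strip(),
                "id": ident.strip(),
                "attributes": defaultdict(list),
            }
            entities.append(current)
        elif line.startswith("!") and current is not None:
            # Attribute line, e.g. "!Series_title = ..."; a key may repeat
            key, sep, value = line[1:].partition("=")
            if sep:
                current["attributes"][key.strip()].append(value.strip())
    return entities


# Invented sample text, not taken from any real GEO record.
EXAMPLE = """\
^SERIES = GSE0000
!Series_title = Invented title for illustration
!Series_sample_id = GSM0001
!Series_sample_id = GSM0002
^SAMPLE = GSM0001
!Sample_characteristics_ch1 = tissue: liver
"""

entities = parse_soft(EXAMPLE)
for entity in entities:
    print(entity["type"], entity["id"], dict(entity["attributes"]))
```

Repeated keys such as Series_sample_id are collected into lists, so a series keeps all of its sample identifiers.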

Experiments imported from GEO have ArrayExpress accession numbers in the format of E-GEOD-n, where n is a number. They also have a secondary accession number (shown below the experiment title in the detailed view) which is the original GEO series identifier. You can also query for the GEO series identifier (e.g. GSE3038) of an experiment from the browse interface.
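The relationship between the two identifiers can be sketched in a few lines of Python. The example assumes that the number n in E-GEOD-n is the same as the number in the GEO series identifier (so GSE3038 would correspond to E-GEOD-3038). The function names are invented for illustration and are not part of any ArrayExpress tool.

```python
import re

# Patterns for the two accession styles discussed above.
GEO_SERIES = re.compile(r"^GSE(\d+)$")
AE_GEO_IMPORT = re.compile(r"^E-GEOD-(\d+)$")


def geo_to_arrayexpress(gse):
    """Return the E-GEOD-n accession for a GEO series identifier.

    Assumes the numeric part is carried over unchanged.
    """
    match = GEO_SERIES.match(gse.strip().upper())
    if match is None:
        raise ValueError(f"not a GEO series identifier: {gse!r}")
    return f"E-GEOD-{match.group(1)}"


def arrayexpress_to_geo(accession):
    """Return the GEO series identifier for an E-GEOD-n accession."""
    match = AE_GEO_IMPORT.match(accession.strip().upper())
    if match is None:
        raise ValueError(f"not a GEO-imported accession: {accession!r}")
    return f"GSE{match.group(1)}"


print(geo_to_arrayexpress("GSE3038"))
print(arrayexpress_to_geo("E-GEOD-3038"))
```

Identifiers that do not match either pattern, such as a platform identifier, raise a ValueError rather than being converted silently.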

[Figure: Experiment with secondary accession number]

Clicking on the GEO secondary accession number will take you to the original series entry in GEO. Note that the GEO and ArrayExpress databases are not synchronized: if annotation and/or data files are updated in GEO, this will not necessarily be reflected in the corresponding ArrayExpress entry. We are currently working with GEO to develop a synchronization process.

If you have any further questions, please see our FAQ.
