GEO Data

Some of the experiments and platforms in ArrayExpress have been imported from the Gene Expression Omnibus (GEO) at the NCBI.

GEO experiments

We import data from GEO on a weekly basis. SOFT files and data files are downloaded, and we then extract the experiment description, sample annotation, experimental factor information, etc., using custom automated methods and text-mining tools. Some experiments are also manually curated, particularly if they are selected for inclusion in the Expression Atlas.

GEO platforms

GEO platform designs are imported in a similar way to experiments. Affymetrix, Agilent and Illumina catalogue platform designs are not imported from GEO; instead, experiments that use them are linked to the equivalent ArrayExpress platform design, as in this example: E-GEOD-35642.

Accession numbers

You can query ArrayExpress using either the ArrayExpress accession or the original GEO accession for a given experiment.

Imported experiments have ArrayExpress accession numbers in the format E-GEOD-n, where n is a number. The number part of the ArrayExpress accession is the same as the number in the original GEO series accession, e.g. GEO accession "GSE12345" would become "E-GEOD-12345" in ArrayExpress. Likewise, for array/platform designs, GEO accession "GPL567" would become "A-GEOD-567" upon import.
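The accession mapping described above can be sketched in a few lines of Python. This is only an illustration of the naming rule, not part of any ArrayExpress or GEO tool; the function names geo_to_arrayexpress and arrayexpress_to_geo are hypothetical, and the sketch assumes that only GSE (series) and GPL (platform) accessions are converted.

```python
import re

# Prefix mapping taken from the naming rule described in the text:
# GSE<n> -> E-GEOD-<n> (experiments), GPL<n> -> A-GEOD-<n> (platform designs).
GEO_TO_AE_PREFIX = {"GSE": "E-GEOD-", "GPL": "A-GEOD-"}
AE_TO_GEO_PREFIX = {ae: geo for geo, ae in GEO_TO_AE_PREFIX.items()}

GEO_PATTERN = re.compile(r"(GSE|GPL)(\d+)")
AE_PATTERN = re.compile(r"(E-GEOD-|A-GEOD-)(\d+)")


def geo_to_arrayexpress(accession: str) -> str:
    """Convert a GEO series/platform accession to its ArrayExpress form.

    Hypothetical helper: it only rewrites the string and does not check
    that the experiment or platform was actually imported.
    """
    match = GEO_PATTERN.fullmatch(accession.strip().upper())
    if match is None:
        raise ValueError(f"Not a GEO series or platform accession: {accession!r}")
    prefix, number = match.groups()
    return GEO_TO_AE_PREFIX[prefix] + number


def arrayexpress_to_geo(accession: str) -> str:
    """Convert an E-GEOD-n or A-GEOD-n accession back to the GEO form."""
    match = AE_PATTERN.fullmatch(accession.strip().upper())
    if match is None:
        raise ValueError(f"Not an imported-GEO ArrayExpress accession: {accession!r}")
    prefix, number = match.groups()
    return AE_TO_GEO_PREFIX[prefix] + number


if __name__ == "__main__":
    print(geo_to_arrayexpress("GSE12345"))
    print(geo_to_arrayexpress("GPL567"))
    print(arrayexpress_to_geo("E-GEOD-35642"))
```

The conversion is purely textual, so a successful result does not guarantee that the corresponding entry exists in either database.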

On an experiment's page (e.g. E-GEOD-35642), we provide a link that will take you back to the original entry in GEO:

[Figure: Experiment with secondary accession number]

Updates of imported data

Please note that the GEO and ArrayExpress databases are not synchronized: if annotation and/or data files are updated in GEO, this will not necessarily be reflected in the corresponding ArrayExpress entry immediately. We are currently working with GEO to develop a synchronization process.