We have stopped the regular imports of Gene Expression Omnibus (GEO) data into ArrayExpress. We will continue to use GEO data to build our added-value database, Expression Atlas, and reprocessed, additionally annotated data for selected datasets will be available there.
Some of the experiments and platforms in ArrayExpress have been imported from the Gene Expression Omnibus (GEO) at the NCBI.
Data were imported from GEO on a weekly basis. SOFT files and data files were downloaded, and the experiment description, sample annotation, experimental factor information and other metadata were then extracted using custom automated methods and text-mining tools. Some experiments were also curated manually, particularly those selected for inclusion in Expression Atlas.
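GEO SOFT files are line-oriented: a line starting with `^` opens an entity (for example a series, sample or platform), lines starting with `!` carry that entity's attributes, and data tables are enclosed by `..._table_begin` / `..._table_end` marker lines. As a rough illustration of the first extraction step only, here is a minimal Python sketch that groups attribute lines by entity. It is not the ArrayExpress import pipeline; the function name `parse_soft_metadata` is hypothetical, and the sketch deliberately ignores column-description (`#`) lines and data-table rows.

```python
from collections import defaultdict


def parse_soft_metadata(lines):
    """Group SOFT attribute lines by entity (hypothetical helper, sketch only).

    Returns a dict keyed by (entity_type, accession), e.g. ("SAMPLE", "GSM1"),
    whose values map attribute names to lists of values, because attributes
    such as Sample_characteristics_ch1 can repeat. Data-table rows between
    the *_table_begin and *_table_end markers are skipped.
    """
    entities = {}
    current = None
    in_table = False
    for raw in lines:
        line = raw.rstrip("\n")
        if line.startswith("^"):
            # Entity line, e.g. "^SAMPLE = GSM1"
            kind, _, acc = line[1:].partition("=")
            current = (kind.strip(), acc.strip())
            entities[current] = defaultdict(list)
            in_table = False
        elif line.lower().endswith("_table_begin"):
            in_table = True
        elif line.lower().endswith("_table_end"):
            in_table = False
        elif line.startswith("!") and current is not None and not in_table:
            # Attribute line, e.g. "!Sample_title = liver replicate 1"
            key, _, value = line[1:].partition("=")
            entities[current][key.strip()].append(value.strip())
    # Convert the defaultdicts to plain dicts for a cleaner result
    return {key: dict(attrs) for key, attrs in entities.items()}


if __name__ == "__main__":
    example = [
        "^SAMPLE = GSM1",
        "!Sample_characteristics_ch1 = tissue: liver",
        "!Sample_characteristics_ch1 = age: 10 weeks",
        "!sample_table_begin",
        "ID_REF\tVALUE",
        "!sample_table_end",
    ]
    print(parse_soft_metadata(example)[("SAMPLE", "GSM1")])
```

Keeping every attribute as a list avoids silently dropping repeated fields, which is where much of the sample annotation lives.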
GEO platform designs were imported in a similar way to experiments. Affymetrix, Agilent and Illumina catalogue platform designs were not imported from GEO; instead, experiments that use them are linked to the equivalent ArrayExpress array design (see, for example, E-GEOD-35642).
You can query ArrayExpress using either the ArrayExpress accession or the original GEO accession of a given experiment.
Imported experiments have ArrayExpress accession numbers of the form E-GEOD-n, where n is the number from the original GEO series accession; for example, GEO accession "GSE12345" becomes "E-GEOD-12345" in ArrayExpress. Likewise, array/platform designs keep their GEO number: GEO accession "GPL567" becomes "A-GEOD-567" upon import.
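Because the numeric part is carried over unchanged, converting between the two accession styles is a simple string rewrite, which is also why either form can be used to look up an experiment. A minimal Python sketch, assuming only the two prefix pairs described here (GSE to E-GEOD, GPL to A-GEOD); the function names are hypothetical and not part of any ArrayExpress API:

```python
import re

# Prefix pairs described in the text: GEO series -> E-GEOD, GEO platform -> A-GEOD
_GEO_TO_AE = {"GSE": "E-GEOD-", "GPL": "A-GEOD-"}
_AE_TO_GEO = {ae: geo for geo, ae in _GEO_TO_AE.items()}


def geo_to_arrayexpress(acc: str) -> str:
    """Convert e.g. 'GSE12345' -> 'E-GEOD-12345' or 'GPL567' -> 'A-GEOD-567'."""
    m = re.fullmatch(r"(GSE|GPL)(\d+)", acc.strip().upper())
    if m is None:
        raise ValueError(f"not a GEO series/platform accession: {acc!r}")
    prefix, number = m.groups()
    return _GEO_TO_AE[prefix] + number


def arrayexpress_to_geo(acc: str) -> str:
    """Convert e.g. 'E-GEOD-12345' -> 'GSE12345' or 'A-GEOD-567' -> 'GPL567'."""
    m = re.fullmatch(r"([EA]-GEOD-)(\d+)", acc.strip().upper())
    if m is None:
        raise ValueError(f"not an imported GEO accession: {acc!r}")
    prefix, number = m.groups()
    return _AE_TO_GEO[prefix] + number
```

For example, `geo_to_arrayexpress("GSE12345")` returns `"E-GEOD-12345"`, and `arrayexpress_to_geo("A-GEOD-567")` returns `"GPL567"`. Other GEO accession types (such as GSM sample accessions) are rejected rather than guessed.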
Each imported experiment's page (e.g. E-GEOD-35642) includes a link back to the original entry in GEO.
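If you need the GEO link outside the web interface, for example when processing a list of E-GEOD accessions, it can be built from the accession alone. A minimal Python sketch, assuming GEO's standard accession display page (`acc.cgi?acc=...`) as the link target; the helper name `geo_link` is hypothetical and this is not necessarily how the ArrayExpress page constructs its link:

```python
import re
from urllib.parse import urlencode

# Assumed target: GEO's accession display page
GEO_ACC_URL = "https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi"

# E-GEOD-n maps back to a GEO series (GSE), A-GEOD-n to a GEO platform (GPL)
_GEO_PREFIX = {"E": "GSE", "A": "GPL"}


def geo_link(ae_accession: str) -> str:
    """Return the GEO URL for an imported E-GEOD-n or A-GEOD-n accession."""
    m = re.fullmatch(r"([EA])-GEOD-(\d+)", ae_accession.strip().upper())
    if m is None:
        raise ValueError(f"not an imported GEO accession: {ae_accession!r}")
    kind, number = m.groups()
    return GEO_ACC_URL + "?" + urlencode({"acc": _GEO_PREFIX[kind] + number})
```

For example, `geo_link("E-GEOD-35642")` returns `"https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE35642"`.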
Updates to imported data
Please note that the GEO and ArrayExpress databases are not synchronized: if annotation or data files are updated in GEO, the change will not necessarily be reflected immediately in the corresponding ArrayExpress entry. We are currently working with GEO to develop a synchronization process.