Accession Codes

Experiments and array designs in ArrayExpress are given unique accession numbers in the format of

  • E-XXXX-n for experiments
  • A-XXXX-n for array designs

XXXX represents a four-letter code and n is a number.

For example: E-MEXP-568, A-UHNC-18. Some experiments also have secondary accession numbers.
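
The format described above can be checked mechanically. The following is a minimal sketch in Python, not an official ArrayExpress tool: the names ACCESSION_RE, Accession and parse_accession are hypothetical, and the pattern assumes that the four-letter code is always upper-case A-Z, which holds for the examples given here but is not stated as a rule.

```python
import re
from typing import NamedTuple

# Assumed pattern: "E" or "A", a four-letter upper-case code, and a number,
# separated by hyphens (e.g. E-MEXP-568, A-UHNC-18).
ACCESSION_RE = re.compile(r"([EA])-([A-Z]{4})-(\d+)")


class Accession(NamedTuple):
    kind: str    # "experiment" or "array design"
    source: str  # four-letter code, e.g. "MEXP"
    number: int


def parse_accession(text: str) -> Accession:
    """Split an accession into its type, four-letter code, and number."""
    match = ACCESSION_RE.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"not in E-XXXX-n / A-XXXX-n format: {text!r}")
    prefix, source, number = match.groups()
    kind = "experiment" if prefix == "E" else "array design"
    return Accession(kind, source, int(number))


print(parse_accession("E-MEXP-568"))
print(parse_accession("A-UHNC-18"))
```

Anything that does not match the pattern raises ValueError, so malformed input is rejected rather than partially parsed.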

Accession numbers are generated when sufficient meta-data and data files are provided for a submission. Please refer to our FAQ for more details about obtaining an accession number for your data set.

The four letter code in the accession number indicates the data source. The source can be:

  • ArrayExpress submission tools: MEXP for MIAMExpress (deprecated since July 2014), MTAB for Annotare, and TABM for Tab2MAGE (deprecated since January 2012).
  • Data management tools / submission pipelines from other research organizations (see full list below).

Note that the four-letter code does not necessarily tell you which organization performed the experiment or manufactured the array design.
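
As an illustration of reading the data source from an accession, here is a small Python sketch. The tool names and deprecation dates are copied from the list above; the dictionary layout and the function name describe_source are hypothetical, and any code not in the dictionary is simply reported as coming from an external tool or pipeline.

```python
# ArrayExpress submission tools named in the text, with their status.
# None means the text does not mark the tool as deprecated.
SUBMISSION_TOOLS = {
    "MEXP": ("MIAMExpress", "deprecated since July 2014"),
    "MTAB": ("Annotare", None),
    "TABM": ("Tab2MAGE", "deprecated since January 2012"),
}


def describe_source(accession: str) -> str:
    """Describe the data source encoded in an accession such as E-MEXP-568."""
    parts = accession.strip().split("-")
    if len(parts) != 3:
        raise ValueError(f"unexpected accession format: {accession!r}")
    code = parts[1]
    if code in SUBMISSION_TOOLS:
        tool, status = SUBMISSION_TOOLS[code]
        suffix = f", {status}" if status else ""
        return f"{code}: ArrayExpress submission tool {tool}{suffix}"
    # The code identifies a source, not necessarily the organization that
    # performed the experiment or manufactured the array design.
    return f"{code}: external data management tool or submission pipeline"


print(describe_source("E-MEXP-568"))
print(describe_source("A-UHNC-18"))
```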

Full list of four-letter codes

Please note that some codes are no longer in active use, i.e. they are not used in accession numbers for new submissions but are still valid. Some URLs are dead, or point to webpages that are no longer maintained, but are included in the table below for completeness.

Code Source URL
AFFY Affymetrix
AFMX Affymetrix data sets processed by an EBI in-house script  
AGIL Agilent
ATMX Arabidopsis experiments and array designs submitted through the At-MIAMExpress submission tool (tool deprecated)
BAIR Biological Atlas of Insulin Resistance (BAIR) project
BASE BASE microarray data management tool
BIOD BioDiscovery microarray data management tool
BUGS Bacterial Microarray Group at St George's, University of London (BuG@S)
CAGE Compendium of Arabidopsis Gene Expression plant developmental time series project
CBIL Computational Biology and Informatics Laboratory at University of Pennsylvania
DKFZ German Cancer Research Center
DORD DDBJ Omics Archive (DOR)
EMBL European Molecular Biology Laboratory
ERAD European Read Archive Data (pipeline submission from the Wellcome Trust Sanger Institute). The "European Read Archive" has since been renamed the "Sequence Read Archive" (SRA) as part of INSDC; see the NAR article on the SRA.
FLYC FlyChip Microarray services, Cambridge Systems Biology Centre, UK
FPMI Functional Pathogenomics of Mucosal Immunity project
GEAD Genomic Expression Archive (GEA)
GEHB GE Healthcare array designs
GEOD NCBI Gene Expression Omnibus (GEO). See also "How data is imported from GEO".
GEUV Genetic European Variation in Health and Disease (GEUVADIS), A European Medical Sequencing Consortium
HGMP Human Genome Mapping Project Resource Centre (now closed)
IPKG Leibniz Institute of Plant Genetics and Crop Plant Research (IPK-Gatersleben)
JCVI J. Craig Venter Institute
JJRD Johnson & Johnson Pharmaceutical Research and Development
LGCL LGC Limited
MANP Coding of experiment was manually prepared by EBI staff  
MARS Microarray Analysis and Retrieval System (MARS) from Graz University of Technology, Institute for Genomics and Bioinformatics
MAXD University of Manchester, Microarray Group maxd software
MEXP EBI MIAMExpress webform submission tool (deprecated since July 2014)
MIMR MiMiR data warehouse at the Microarray Centre, Clinical Sciences Centre, Medical Research Council
MNIA Laboratory of Genetics, National Institute on Aging, National Institutes of Health
MTAB Experiments in MAGE-TAB format, submitted via the MTAB spreadsheet submission tool (retired in September 2014), or via Annotare
MUGN Integrated Functional Genomics in Mutant Mouse Models as Tools to Investigate the Complexity of Human Immunological Disease (MUGEN) project
NASC European Arabidopsis Stock Centre
NCMF Netherlands Cancer Institute Central Microarray Facility
NGEN Nimblegen
RUBN Gerry Rubin's lab
RZPD RZPD German Resource Center for Genome Research (no longer accessible) (old URL)
SGRP Saccharomyces Genome Resequencing Project, Wellcome Trust Sanger Institute
SMDB Stanford Microarray Database (moving to Princeton University, as of January 2013)
SNGR Wellcome Trust Sanger Institute
SYBR Sybaris Project
TABM EBI tab2mage spreadsheet submission tool (deprecated since January 2012)
TIGR The Institute for Genomic Research (now part of the J. Craig Venter Institute)
TOXM Toxicogenomics experiments  
UCON The Hutchison/MRC Research Centre
UHNC University Health Network Canada
UMCU University Medical Center Utrecht
WMIT Whitehead Institute for Biomedical Research / Massachusetts Institute of Technology


Secondary accession numbers

Some experiments or array designs have a secondary accession number which links to another ArrayExpress experiment/array design, or to an external data source. The secondary accession number is shown in the "Links" section on an experiment's page. For example, for experiment E-GEOD-42281:

[Image: experiment secondary accession numbers example]

For array designs, the secondary accession number can be found in the header of the actual design file (in tab-delimited text format), e.g. for array design A-GEOD-9349:

[Image: array design file (ADF) secondary accession numbers example]

Common reasons for secondary accession numbers

  1. Data re-analysis: if the data provided is a re-analysis of another dataset, the accession number of the original experiment will be the secondary accession number.
  2. NCBI GEO-imported data: experimental data imported from the NCBI Gene Expression Omnibus (GEO) will have the GEO series (prefix "GSE") or data set (prefix "GDS") identifier as the secondary accession number. Similarly, array designs will have the GEO platform (prefix "GPL") accession number. For GEO-imported data, the ArrayExpress primary accession mirrors the GEO secondary accession, e.g. GEO accession "GSE12345" would be associated with ArrayExpress accession "E-GEOD-12345". See also How data is imported from GEO.
  3. High-throughput sequencing experiments: they will have a link to the European Nucleotide Archive (ENA) or European Genome-phenome Archive (EGA) where the raw data files (e.g. fastq reads) are kept.
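
The correspondence between GEO and ArrayExpress accessions for GEO-imported data can be sketched in Python as follows. This is an illustration only: the function name geo_to_arrayexpress is hypothetical, the series mapping follows the "GSE12345" / "E-GEOD-12345" example above, and the platform mapping (GPL number reused after "A-GEOD-") is an assumption inferred from that example rather than something the text states. GEO data set identifiers are left out because no example mapping is given for them.

```python
import re

# Assumed prefix mapping: GEO series -> experiments, GEO platforms -> array designs.
GEO_PREFIX_TO_AE = {"GSE": "E-GEOD-", "GPL": "A-GEOD-"}


def geo_to_arrayexpress(geo_accession: str) -> str:
    """Derive the ArrayExpress accession expected for a GEO series or platform."""
    match = re.fullmatch(r"(GSE|GPL)(\d+)", geo_accession.strip().upper())
    if match is None:
        raise ValueError(f"unsupported GEO accession: {geo_accession!r}")
    prefix, number = match.groups()
    return GEO_PREFIX_TO_AE[prefix] + number


print(geo_to_arrayexpress("GSE12345"))  # E-GEOD-12345
```

The derived accession is only a candidate: whether a given GEO record has actually been imported still has to be checked in ArrayExpress itself.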