How to submit
- Use Annotare for all new submissions. If for some reason you cannot use Annotare, please contact us at email@example.com and we will advise further.
- Getting started with Annotare
- Suitable for all sequencing and microarray experiments with up to 1000 assays and no complex design (e.g. only one microarray design used, no sample pooling)
- Fill in a series of webforms (auto-saved), featuring auto fill-down and copy/paste functions
- Upload data files directly within the tool or via FTP
- Automatic validation prior to submission, which speeds up downstream processing and loading into ArrayExpress (see common validation errors and fixes)
The MAGE-TAB submission tool was retired on 1 September 2014. Please start your new submission with Annotare. For pipeline submitters, IDF/SDRF upload-and-validate functionality will be available from Annotare soon; please contact us at firstname.lastname@example.org for more details. MAGE-TAB help is still available if you need it.
Details about experiment and array design submissions
ArrayExpress accepts all functional genomics data, including microarray-based and sequencing data. Microarray submissions follow the "Minimum Information About a Microarray Experiment" (MIAME) guidelines, while sequencing submissions follow a similar set of guidelines, "Minimum Information About a Sequencing Experiment" (MINSEQE).
Data from human samples that could potentially identify the donors (e.g. genomic DNA sequences) can be submitted to ArrayExpress if the data has been consented for public release. Such approval is typically given by the relevant ethics committee, and ensuring it has been obtained is the responsibility of the submitter. Identifiable data approved for controlled access should be submitted to the European Genome-phenome Archive (EGA) (more details here).
Following the MIAME/MINSEQE guidelines, we ask experiment submitters to provide:
Meta-data: for example, the background biology of your experiment, its aim, the biological materials/samples used and their characteristics, and the wet-lab and dry-lab procedures (protocols).
Raw data files: These are unprocessed data files obtained from the microarray scanner (e.g. Affymetrix CEL files, Agilent feature extraction files) or from the sequencing machine (e.g. fastq files).
Microarray experiment submitters: Please make sure the format of your raw data files matches one of the accepted formats. For Illumina BeadChip microarray experiments, please see this Illumina BeadChip submission FAQ page.
Sequencing experiment submitters: Please make sure your raw sequence read files are formatted and prepared according to specifications by the European Nucleotide Archive (ENA) and you've gathered all required information describing your samples/libraries/data files. The raw sequence read files must be sent to ArrayExpress via FTP. See this specific help page on sequencing submissions for more information.
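Before uploading, a quick structural check of your FASTQ files can catch truncated or corrupted files early. The Python sketch below is unofficial and assumes the common four-line FASTQ record layout (a header starting with "@", the sequence, a separator starting with "+", and a quality string of the same length as the sequence). It is only a rough sanity check; the ENA specifications remain the authority on what is accepted.

```python
from typing import Iterable, List


def check_fastq_records(lines: Iterable[str]) -> List[str]:
    """Return a list of problems found in FASTQ text (an empty list means none found).

    Assumes four lines per record: '@' header, sequence, '+' separator, quality.
    This is a basic layout check, not a replacement for ENA validation.
    """
    problems: List[str] = []
    record: List[str] = []
    record_no = 0
    for raw in lines:
        record.append(raw.rstrip("\r\n"))
        if len(record) < 4:
            continue  # keep collecting lines until a full record is available
        record_no += 1
        header, seq, sep, qual = record
        if not header.startswith("@"):
            problems.append(f"record {record_no}: header does not start with '@'")
        if not sep.startswith("+"):
            problems.append(f"record {record_no}: separator does not start with '+'")
        if len(seq) != len(qual):
            problems.append(f"record {record_no}: sequence and quality lengths differ")
        record = []
    if record:
        problems.append(f"incomplete final record ({len(record)} line(s))")
    return problems
```

For a gzip-compressed file you could pass `gzip.open(path, "rt")` from the standard library as `lines`; the file name is up to you.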
Processed data files: These are data files generated from the raw data files, typically through steps such as normalisation or background subtraction. In sequencing experiments, processing can also involve aligning reads to a reference genome, calculating RPKM/FPKM values, etc.
Microarray experiment submitters: Please prepare files as in this per-hybridisation normalised file example or the data matrix format. We do not enforce strict formatting requirements for processed data files, but you're encouraged to use the recommended formats. Please also make sure you have provided a detailed normalisation/data transformation protocol in your meta-data, as that will allow your files to be interpreted properly.
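If your normalised values are produced by a script, a data-matrix-style file (one row per probe, one column per hybridisation) can be written directly as tab-delimited text. The Python sketch below is unofficial: the function name and the first-column header `ID` are hypothetical placeholders, and the exact header rows expected by ArrayExpress should be taken from the data matrix example.

```python
import csv
from typing import Dict, List, TextIO


def write_data_matrix(
    out: TextIO,
    values: Dict[str, Dict[str, float]],
    hybridisations: List[str],
    id_column: str = "ID",  # hypothetical header; check the data matrix example
) -> None:
    """Write a tab-delimited matrix: one row per probe, one column per hybridisation.

    `values` maps probe ID -> {hybridisation name -> normalised value}.
    Missing values are written as empty cells.
    """
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow([id_column, *hybridisations])
    for probe_id in sorted(values):
        row = values[probe_id]
        writer.writerow([probe_id, *(row.get(h, "") for h in hybridisations)])
```

Open the output file with `newline=""` so the `csv` module controls line endings.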
Sequencing experiment submitters: We do not enforce strict formatting requirements on processed files, except for BAM (alignment) files, which should follow the BAM file specification from the European Nucleotide Archive. The alignments should also be generated against a reference genome that has been deposited in the International Nucleotide Sequence Database Collaboration (INSDC, comprising DDBJ, ENA and GenBank). See this sequencing submissions help page for more details.
- General submission questions: Submission FAQ
- How to use Annotare for submission: Annotare help
- How to upload data files via FTP for submission: FTP upload instructions
- Getting an ArrayExpress accession number for your experiment: Accession number
- How to change release dates for your unpublished data: Release date changes
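If you have many data files, the FTP upload can be scripted. The Python sketch below uses the standard-library `ftplib` module and uploads files in binary mode so compressed files are not altered. It is unofficial: the host, login details and remote directory are not given here and must come from the FTP upload instructions, and the function name `upload_files` is hypothetical.

```python
import ftplib
import os
from typing import Iterable, List


def upload_files(ftp: ftplib.FTP, paths: Iterable[str], remote_dir: str) -> List[str]:
    """Upload local files into `remote_dir` on an already logged-in FTP connection.

    Returns the remote file names that were sent.
    """
    ftp.cwd(remote_dir)
    sent: List[str] = []
    for path in paths:
        name = os.path.basename(path)
        with open(path, "rb") as fh:
            # storbinary issues a binary-mode STOR, so gzip files arrive byte-for-byte.
            ftp.storbinary(f"STOR {name}", fh)
        sent.append(name)
    return sent
```

In practice you would open the connection with `ftplib.FTP(host)`, call `ftp.login(user, password)` using the details from the instructions, and then pass that object in. Taking the connection as a parameter keeps the upload logic separate from the credentials.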
What is an array design?
For each microarray experiment submission, the submission tool will ask you for the array design (i.e. microarray chip/platform) used for each hybridisation. Many of these designs, especially commercially available ones, are already accessioned in ArrayExpress as array design format (ADF) files (one ADF per microarray chip). An ADF is a tab-delimited, spreadsheet-like text file with a meta-data header followed by a multi-column table of probe information (here is an example).
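To inspect an ADF programmatically, the header and the probe table can be split apart. The Python sketch below is unofficial and rests on assumptions: header lines are taken to be "field name, tab, value(s)", and the probe table is assumed to start on the line after a `[main]` section marker. Check these against the example ADF before relying on them.

```python
import csv
from typing import Dict, List, TextIO, Tuple


def read_adf(
    fh: TextIO, table_marker: str = "[main]"  # assumed marker before the probe table
) -> Tuple[Dict[str, List[str]], List[Dict[str, str]]]:
    """Split a tab-delimited ADF into (header fields, probe table rows)."""
    header: Dict[str, List[str]] = {}
    table_lines: List[str] = []
    lines = fh.read().splitlines()
    for i, line in enumerate(lines):
        stripped = line.strip()
        if stripped == table_marker:
            table_lines = lines[i + 1:]  # everything after the marker is the table
            break
        if not stripped or stripped.startswith(("#", "[")):
            continue  # skip blank lines, comments and other section markers
        field, *values = line.split("\t")
        header[field] = values
    # The first table line holds the column names; each later line is one probe.
    rows = list(csv.DictReader(table_lines, delimiter="\t"))
    return header, rows
```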
Do I need to submit an array design for my experiment?
If your ADF is already accessioned in ArrayExpress, you do not need to submit any ADF to us. All you need to do is specify the ADF by accession when the Annotare submission tool asks "What array did you use?", or insert the ADF's accession number in the Array Design REF column of your spreadsheet (if using the MAGE-TAB submission tool).
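For existing MAGE-TAB spreadsheets, filling the Array Design REF column on every row can be automated. The Python sketch below is unofficial and assumes a tab-delimited SDRF whose header row already contains an `Array Design REF` column. Because column order in an SDRF is meaningful, it raises an error rather than guessing where to add a missing column. The function name is hypothetical.

```python
import csv
from typing import TextIO


def set_array_design_ref(sdrf_in: TextIO, sdrf_out: TextIO, accession: str) -> int:
    """Copy a tab-delimited SDRF, writing `accession` into the Array Design REF column.

    Returns the number of data rows written. Blank lines are dropped.
    """
    reader = csv.reader(sdrf_in, delimiter="\t")
    writer = csv.writer(sdrf_out, delimiter="\t", lineterminator="\n")
    header = next(reader)
    if "Array Design REF" not in header:
        raise ValueError("no 'Array Design REF' column in the SDRF header")
    col = header.index("Array Design REF")
    writer.writerow(header)
    count = 0
    for row in reader:
        if not any(cell.strip() for cell in row):
            continue  # skip blank lines
        row += [""] * (len(header) - len(row))  # pad rows shorter than the header
        row[col] = accession
        writer.writerow(row)
        count += 1
    return count
```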
To find out whether your array design is already in ArrayExpress and get its accession, search by keywords (e.g. manufacturer name, species name) on the ArrayExpress array design search page. If the search does not return any results, broaden your search here to include array designs we imported from NCBI Gene Expression Omnibus (GEO) (accession numbers prefixed by "A-GEOD"). As a rule of thumb, a GEO platform accession can be converted to the ArrayExpress equivalent by replacing "GPL" with "A-GEOD", e.g. "GPL1234" will be "A-GEOD-1234". See this ADF search help page for further details.
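The rule of thumb above is a simple string substitution. The Python sketch below applies it; the function name is hypothetical, and since the conversion is only a rule of thumb, the resulting accession should still be confirmed on the array design search page.

```python
import re

# GEO platform accessions look like "GPL" followed by digits.
_GPL = re.compile(r"^GPL(\d+)$")


def gpl_to_arrayexpress(gpl: str) -> str:
    """Map a GEO platform accession (e.g. "GPL1234") to "A-GEOD-1234"."""
    m = _GPL.match(gpl.strip().upper())
    if m is None:
        raise ValueError(f"not a GEO platform accession: {gpl!r}")
    return f"A-GEOD-{m.group(1)}"
```

For example, `gpl_to_arrayexpress("GPL1234")` returns `"A-GEOD-1234"`.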
How do I submit commercially available array designs?
If you use a commercially available microarray which is not accessioned in ArrayExpress, please do the following:
- Download this meta-data header template file;
- Fill in the template with as much background information about the array design as possible (here is a guide), especially the mandatory fields. Save it as a tab-delimited text (*.txt) file when done;
- Send us the filled template (*.txt file) and any annotation/support files for the array design (e.g. NimbleGen NDF file, GAL file for spotted arrays) by FTP. We need the support files because they contain the probe information;
- Write to us at email@example.com, mentioning the files which have been uploaded.
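A common slip in step 2 is saving the template as comma-separated (CSV) rather than tab-delimited text. The Python sketch below is an unofficial pre-upload check that lists non-blank lines containing no tab at all. Note the assumption: a field you deliberately left empty may legitimately have no tab after it, so treat reported lines as prompts to look, not as definite errors.

```python
from typing import Iterable, List


def lines_without_tabs(lines: Iterable[str]) -> List[int]:
    """Return 1-based numbers of non-blank lines that contain no tab character."""
    bad: List[int] = []
    for number, line in enumerate(lines, start=1):
        if line.strip() and "\t" not in line:
            bad.append(number)
    return bad
```

If every line is reported, the file was most likely saved in the wrong format.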
An ArrayExpress curator will create the full ADF on your behalf by combining the meta-data header with information converted from the annotation/support files. As the format of annotation files differs considerably from one manufacturer to another, creating an ADF often requires manual processing by the curator and can take up to five working days.
How do I submit custom array designs?
If you have a custom microarray which is not accessioned in ArrayExpress at all and is not commercially available, please download an array design format (ADF) template, fill it in, and submit it following these ADF submission guidelines. Find more information in this step by step guide on how to fill in the template.
Once we have finished curating and processing your submission, we will load the experiment or array design into the ArrayExpress database. Occasionally, the loaded record needs to be modified or updated, for example to fix a submission error, or to add/remove samples or files in response to comments from peer reviewers of your unpublished paper.
To make any changes to an experiment or array design that has been loaded into the ArrayExpress database, please email us at firstname.lastname@example.org with the accession number and a description of the corrections needed. A curator will unload your data from the database, and may be able to make minor corrections, such as fixing a few typos, before reloading the corrected version into the database. Data sets that undergo major changes, such as the addition or removal of samples, assays or data files, will most likely need to be re-curated before they can be loaded into ArrayExpress again, and so may take considerably longer to complete.
Special rules apply to sequencing experiments, because storage is split between ArrayExpress (meta-data) and ENA (raw data files). Please refer to our guide to updating a sequencing experiment for further details.