How to submit

Annotare webform tool


  • FOR ALL NEW SUBMISSIONS. If for some reason you cannot use Annotare, please contact us at annotare@ebi.ac.uk and we will advise further
  • All sequencing and microarray experiments with no complex design (e.g. one microarray design used, no sample pooling)
  • For experiments with up to 1000 hybridisations or sequencing assays
  • Automatic validation prior to submission to speed up downstream processing and loading into ArrayExpress
  • Annotare help

MAGE-TAB spreadsheet tool


  • To be retired by 31 August 2014. Please start your new submission with Annotare
  • Pipeline submitters: An IDF/SDRF upload-and-validate functionality will be available from Annotare soon. Please contact us at arrayexpress@ebi.ac.uk for more details.
  • Make sure you create your submission with a new MAGE-TAB spreadsheet template
  • MAGE-TAB submission help

 

 

More information about submissions

1. What do I need to submit?
2. Which experiment submission tool to use?
3. What to expect from each experiment submission tool
4. Common questions
5. Array design submissions
6. Update/modify an experiment or array design which has been loaded into ArrayExpress

1. What do I need to submit?

ArrayExpress accepts all functional genomics data, including microarray-based and sequencing data. Microarray submissions follow the "Minimum Information About a Microarray Experiment" (MIAME) guidelines, while sequencing submissions follow a similar set of guidelines, "Minimum Information About a Sequencing Experiment" (MINSEQE).

Please try to get your submission into the best possible shape before submitting. Fixes made after curation can be tedious and lengthy, especially for sequencing experiments.

Data from human samples and individuals that could potentially lead to the identification of the donors (e.g. genomic DNA sequences) can be submitted to ArrayExpress if consent has been given for public release of the data. Such approval is typically given by the relevant ethics committee, and it is the submitter's responsibility to ensure that it is in place. Identifiable data approved for controlled access should be submitted to the European Genome-phenome Archive (EGA) (more details here).

Following the MIAME/MINSEQE guidelines, we ask experiment submitters to provide:

 

Meta-data: E.g. Background biology of your experiment, aim of the experiment, biological materials/samples used and their characteristics, wet-lab and dry-lab procedures (protocols).

Raw data files: These are unprocessed data files obtained from the microarray scanner (e.g. Affymetrix CEL files, Agilent feature extraction files) or from the sequencing machine (e.g. fastq files).

Microarray experiment submitters: Please make sure the format of your raw data files matches one of the accepted formats. For Illumina BeadChip microarray experiments, please see this Illumina BeadChip submission FAQ page.

Sequencing experiment submitters: Please make sure your raw sequence read files are formatted and prepared according to the specifications of the European Nucleotide Archive (ENA) and that you have gathered all the required information describing your samples, libraries and data files. The raw sequence read files must be sent to ArrayExpress via FTP. See this specific help page on sequencing submissions for more information.

Processed data files: These are data files generated from the raw data files, often involving e.g. normalisation or background-subtraction. In sequencing experiments, the processing can also involve alignment of sequencing reads to a reference genome, calculation of RPKM/FPKM values, etc.

Microarray experiment submitters: Please prepare files as in this per-hybridisation normalised file example or the data matrix format. We do not enforce strict formatting requirements for processed data files, but you're encouraged to use the recommended formats. Please also make sure you have provided a detailed normalisation/data transformation protocol in your meta-data, as that will allow your files to be interpreted properly.

Sequencing experiment submitters: We do not enforce strict formatting requirements on processed files, except for bam (alignment) files, which should follow the bam file specification from the European Nucleotide Archive. The alignments should also be generated against a reference genome which has been deposited in the International Nucleotide Sequence Database Collaboration (INSDC, involving DDBJ, ENA, and GenBank). See this sequencing submissions help page for more details.

2. Which experiment submission tool to use?

Both tools accept microarray and sequencing experiments. They are designed to help you create submissions which adhere to the MIAME and MINSEQE guidelines:

Annotare webform tool (recommended)

Experiment type:
  • All experiments with no complex designs (e.g. one microarray design per experiment, no sample pooling)
  • For experiments with up to 1000 hybridisations or sequencing assays

Tool features:
  • Fill in a series of webforms
  • Upload data files by FTP, or by HTML within the tool
  • Automatic validation prior to submission to speed up downstream processing and loading into ArrayExpress

Help: Annotare help

MAGE-TAB spreadsheet tool (to be retired by 31 August 2014)

Experiment type:
  • Only for existing (incomplete) submissions that need to be finished before the tool's retirement

Tool features:
  • Bespoke meta-data spreadsheet template for your experiment
  • Upload filled spreadsheet (please always use a freshly-created template)
  • Upload .zip or .gz data file bundles (for microarray experiments)

Help: MAGE-TAB submission help

3. What to expect from each experiment submission tool

Annotare

Annotare offers:

  • webforms with text boxes to fill in;
  • drop-down lists to select terms from;
  • (for microarray experiments) an HTML data-file upload function inside the tool, which allows you to upload microarray files and then assign each of them to a specific hybridisation assay;
  • (for sequencing experiments, after uploading raw data files via FTP) a file name and MD5 checksum form to fill in. You can then assign files listed in the form to specific sequencing assays. The MD5 checksums will be used to validate file integrity.
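
The MD5 checksums requested by the form can be computed locally before you fill it in. The following is a minimal sketch using only the Python standard library; the function names `md5_checksum` and `checksum_table` and the default `*.fastq.gz` pattern are illustrative assumptions, not part of Annotare.

```python
import hashlib
from pathlib import Path


def md5_checksum(path, chunk_size=1024 * 1024):
    """Return the hexadecimal MD5 digest of the file at `path`."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read in chunks so that large fastq.gz or bam files need not fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checksum_table(directory, pattern="*.fastq.gz"):
    """Map each matching file name in `directory` to its MD5 checksum.

    The file pattern is a hypothetical default; adjust it to your own file names.
    """
    return {p.name: md5_checksum(p) for p in sorted(Path(directory).glob(pattern))}
```

Calling `checksum_table("my_run_directory")` (a placeholder path) returns file name and checksum pairs that can be copied into the form. Compute the checksums on the same files you upload, since any later change to a file changes its checksum.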

Before you submit the experiment to ArrayExpress, the information you've provided will be automatically validated by Annotare. Any validation errors (e.g. missing information) will be highlighted. Please fix the issues and validate again because only experiments which pass validation can be submitted. If the errors persist, please refer to this Annotare help page or contact the Annotare team at annotare@ebi.ac.uk.

MAGE-TAB

New submitters: please check out this quick start guide.

You will be sending us a meta-data spreadsheet and data files. Microarray data files should be bundled in data archives. Supported archive formats are .zip or .gz only. Sequencing raw data files (e.g. fastq.gz or bam) must be uploaded to ArrayExpress via FTP.
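
Microarray data files can be bundled into a .zip archive with any archiving program. As one possible approach, the sketch below uses Python's standard `zipfile` module. The function name `bundle_data_files` is hypothetical, and storing the files without sub-folders is an assumption made here for simplicity, not a stated ArrayExpress requirement.

```python
import zipfile
from pathlib import Path


def bundle_data_files(file_paths, archive_path):
    """Write the given data files into one .zip archive and return its file list."""
    archive_path = Path(archive_path)
    if archive_path.suffix.lower() != ".zip":
        raise ValueError("This sketch only writes .zip archives")
    with zipfile.ZipFile(archive_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for file_path in map(Path, file_paths):
            # Store each file under its bare name (assumption: a flat archive layout).
            archive.write(file_path, arcname=file_path.name)
    # Re-open the archive to confirm what was actually written.
    with zipfile.ZipFile(archive_path) as archive:
        return archive.namelist()
```

The returned list can be compared against the data file names entered in the spreadsheet before uploading the archive.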

Do not create the spreadsheet from scratch but use the MAGE-TAB submission tool to generate a template tailored for your experiment. Always use a freshly-created template for a new submission, even if you're a regular submitter, because the template-generation system is updated regularly. Since the templates have changed quite a lot over the years, we are unable to accept spreadsheets created from older templates.

Download the template spreadsheet file (which is a tab-delimited text file) to fill it in offline. The file will open in any spreadsheet program, e.g. OpenOffice Calc or Microsoft Excel. Most of the row/column headings in the template are mandatory, so please DO NOT remove or edit them. Refer to this MAGE-TAB help page for help with constructing the MAGE-TAB spreadsheet and data files.

Save the filled template as a tab-delimited text file for submission. We are unable to process any unfilled template spreadsheets or spreadsheets in Excel format (*.xls or *.xlsx files).
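
Before uploading, it can help to check locally that the saved file really is tab-delimited text and not an Excel workbook. The sketch below is a rough pre-flight check under stated assumptions: the function name `preflight_problems` is hypothetical, the checks are heuristics, and passing them does not mean the spreadsheet will pass ArrayExpress validation.

```python
from pathlib import Path

EXCEL_SUFFIXES = {".xls", ".xlsx"}


def preflight_problems(path):
    """Return a list of human-readable problems found in a filled template file."""
    path = Path(path)
    if path.suffix.lower() in EXCEL_SUFFIXES:
        return ["Excel file extension: re-save as tab-delimited text"]
    raw = path.read_bytes()
    # .xlsx files are zip containers ("PK"); legacy .xls files start with the OLE2 signature.
    if raw.startswith(b"PK") or raw.startswith(b"\xd0\xcf\x11\xe0"):
        return ["file content looks like an Excel workbook, not plain text"]
    text = raw.decode("utf-8", errors="replace")
    lines = [line for line in text.splitlines() if line.strip()]
    problems = []
    if not lines:
        problems.append("file is empty")
    elif not any("\t" in line for line in lines):
        problems.append("no tab characters found: file may not be tab-delimited")
    return problems
```

An empty list means none of these basic problems was detected. It does not check whether the template has been filled in or whether the mandatory headings are intact.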

To submit, log in to the submission tool again and upload the spreadsheet along with your microarray data file bundles (if any). Make sure you click "Submit Experiment" at the very end, after which you should receive an automatic confirmation email.

4. Common questions

5. Array design submissions

What is an array design?

For each microarray experiment submission, the submission tool will ask you for the array design (i.e. microarray chip/platform) used for each hybridisation. Many of these designs, especially commercially available ones, are already accessioned in ArrayExpress as array design format (ADF) files (one ADF for one microarray chip). The ADF file is a tabulated (spreadsheet-like) tab-delimited text file with a meta-data header followed by a multi-column table of probe information (here is an example).

Do I need to submit an array design for my experiment?

If your ADF is already accessioned in ArrayExpress, you do not need to submit an ADF to us. Simply specify the ADF by accession when the Annotare submission tool asks "What array did you use?", or enter the ADF's accession number in the Array Design REF column of your spreadsheet (if using the MAGE-TAB submission tool).

To find out whether your array design is already in ArrayExpress and get its accession, search by keywords (e.g. manufacturer name, species name) on the ArrayExpress array design search page. If the search does not return any results, broaden your search here to include array designs we imported from NCBI Gene Expression Omnibus (GEO) (accession numbers prefixed by "A-GEOD"). As a rule of thumb, a GEO platform accession can be converted to the ArrayExpress equivalent by replacing "GPL" with "A-GEOD", e.g. "GPL1234" will be "A-GEOD-1234". See this ADF search help page for further details.
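
The GEO-to-ArrayExpress rule of thumb above is a simple string substitution, sketched below. The function name `geo_platform_to_arrayexpress` is hypothetical, and the result is only a candidate accession: confirm on the array design search page that it actually exists in ArrayExpress.

```python
import re


def geo_platform_to_arrayexpress(geo_accession):
    """Convert a GEO platform accession such as "GPL1234" to "A-GEOD-1234"."""
    match = re.fullmatch(r"GPL(\d+)", geo_accession.strip())
    if match is None:
        raise ValueError(f"Not a GEO platform accession: {geo_accession!r}")
    # Rule of thumb from the text: replace the "GPL" prefix with "A-GEOD-".
    return f"A-GEOD-{match.group(1)}"


print(geo_platform_to_arrayexpress("GPL1234"))  # prints A-GEOD-1234
```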

How do I submit commercially available array designs?

If you use a commercially available microarray which is not accessioned in ArrayExpress, please do the following:

  1. Download this meta-data template file;
  2. Fill in the template with as much background information about the array design as possible (here is a guide), especially the mandatory fields. Save it as a tab-delimited text (*.txt) file when done;
  3. Send us the filled template (*.txt file) and any annotation/support files for the array design by FTP;
  4. Write to us at arrayexpress@ebi.ac.uk, mentioning the files which have been uploaded.

An ArrayExpress curator will then create the ADF table from the annotation files on your behalf, and get the ADF accessioned in ArrayExpress. Because the format of annotation files differs considerably from one manufacturer to another, creating an ADF often requires manual processing by the curator and can take up to five working days.

How do I submit custom array designs?

If you have a custom microarray which is not commercially available and is not accessioned in ArrayExpress, please download an array design format (ADF) template, fill it in, and submit it following these ADF submission guidelines. Find more information in this step-by-step guide on how to fill in the template.

6. Update/modify an experiment or array design which has been loaded into ArrayExpress

Once we have finished curating and processing your submission, we will load the experiment or array design into the ArrayExpress database. Occasionally, the loaded record needs to be modified or updated, perhaps because a submission error needs to be fixed, or because you need to add or remove samples or files in response to comments from the peer reviewers of your paper before publication.

To make any changes to an experiment or array design which has been loaded into the ArrayExpress database, please email us at arrayexpress@ebi.ac.uk with the accession number and a description of the corrections needed. A curator will unload your data from the database, and may be able to make minor corrections, such as fixing a few typing errors, before re-loading the corrected version into the database. Data sets which undergo major changes, such as the addition or removal of samples, assays or data files for an experiment, will most likely require re-curation before they are fit for loading into ArrayExpress again, and could therefore take a considerable amount of time to complete.

Special rules apply to sequencing experiments, because the storage of meta-data and raw data files is split between ArrayExpress and ENA respectively. Please refer to our guide to updating a sequencing experiment for further details.