Array Design Submission Guidelines


Overview

This page is about submitting microarray designs (array layout and annotation) to ArrayExpress. An array design describes how a microarray was manufactured, what was printed/synthesized at each position on the array, and what biological sequences these represent. The same array design can be used in many different hybridizations across many different experiments.

Use the decision tree below to decide whether you need to submit an array design to us.

[Diagram: array design submission decision tree, linking to the sections on how to check if a design is already in ArrayExpress, experiment submission, new commercial array designs, custom Affymetrix and NimbleGen array designs, and submitting an array design]

Checking if an array design is already in ArrayExpress

If the array design you used has already been described in ArrayExpress then you do not need to submit it. Many commercial and academic array designs from organizations such as Affymetrix, Agilent, Illumina, NimbleGen and Sanger are already loaded into ArrayExpress. Use these links to find array designs already in ArrayExpress:

 

 

If your array design is already in ArrayExpress then you can use it in your experiment submission as follows:

  • For MIAMExpress submissions, select the name of the array design from the drop-down list that is provided when describing your hybridizations.
  • For Tab2MAGE/MAGE-TAB submissions, enter the accession number of the array design in the 'Array[accession]' (Tab2MAGE) or 'Array Design REF' (MAGE-TAB) column in the Hybridization/SDRF section of your spreadsheet.

 

New commercial array designs

If you used a commercial catalogue array design and cannot find it in ArrayExpress, please contact us with the exact name of the array design and its manufacturer. We will let you know whether we can get the information directly from the manufacturer or whether we need you to submit the layout information to us.

 

Custom Affymetrix and NimbleGen array designs

If you have used a custom array design you will need to submit it as an Array Design Format (ADF) file, as described in the sections below. There are two exceptions:

  • Affymetrix custom arrays - For most Affymetrix custom arrays we can create an array design from the Affymetrix .CDF file plus the other library files containing the probe sequences and descriptions of the genes targeted by each probe set. Put these files on our FTP site and then email us to say that they are there and what they contain. We will let you know whether we can create a design from the files provided and, if so, send you an accession number for the array design. If you are not sure which files to provide, email us. Instructions for sending files using FTP >>
  • NimbleGen custom arrays - NimbleGen arrays are described by an .NDF file. Because converting a NimbleGen NDF to our ADF format is not simple, we ask you to provide the NDF as it is. Make an array design submission using MIAMExpress and upload the NDF in the part of the form where an ADF is requested. See the instructions for submitting an array design.

 

 

Submitting an array design

If you need to submit an array design you can do this using the MIAMExpress submissions tool. You will need to first create an Array Design Format file to upload. Then go to the MIAMExpress login page:
MIAMExpress submissions login page >>

  • Create a MIAMExpress login account if you have not already done so.
  • Select 'Array design submission'. Enter the array details in the form provided and select or create your array manufacture protocol.
  • Upload your Array Design Format (ADF) file - see instructions on creating an ADF on this page.
  • Submit the design for curation.

After submitting an array design you can continue with your experiment immediately - you do not need to wait for us to process the array submission. To use your new array design:

  • For MIAMExpress experiment submissions, select it from the lower half of the drop-down list that you will see when describing your hybridizations.
  • For Tab2MAGE/MAGE-TAB submissions, enter the array design name in the 'Array[accession]' (Tab2MAGE) or 'Array Design REF' (MAGE-TAB) column. We will replace this with the array accession number after the array design has been processed.

Although you can complete your experiment submission before the array design is processed please be aware that the array design submission needs to be curated and loaded into ArrayExpress before we can process the experiment.

 

Creating an array design format (ADF) file

An array design format (ADF) file is simply a table with standardized column names describing what was printed/synthesized at each position on a microarray. The ADF file can be created in any spreadsheet application but must be saved as a tab delimited text file. An ADF file contains the following information:

  1. Location and name of each reporter (probe) on the array
  2. Annotation such as the nucleotide sequence or database accession associated with the reporter e.g. RefSeq
  3. Type of reporter (cDNA, oligonucleotide, RNA, DNA etc) that is present
  4. Group of reporter - which are the control and which are the experimental reporters
  5. Optional additional information - this can be used to show that several reporters are associated with the same gene for example, or add a comment about a reporter.

The following sections show you how to create an ADF file containing this information. A .gal file or a data file can be a good starting point for your ADF file. The annotation you provide in the ADF file will be referenced by all data files that match the array. You can download a typical template header in tab-delimited format and Excel format; an example is also shown below, where the mandatory fields are in red bold font.

 

MetaColumn  MetaRow  Column  Row  Reporter Identifier  Reporter Name  Reporter BioSequence Database Entry [database_code]  Reporter BioSequence Type  Reporter BioSequence Polymer Type  Reporter BioSequence [Actual Sequence]  Reporter Group [role]  Reporter Control Type
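
As a minimal sketch of what such a file looks like on disk, the Python snippet below writes the template header and one feature line as tab-delimited text with the standard csv module. The function name write_adf and the example values are illustrative assumptions; this is not an ArrayExpress tool, and 'embl' simply stands in for [database_code].

```python
import csv
import io

# Column names follow the template header above; "embl" is an example
# database code standing in for [database_code].
ADF_COLUMNS = [
    "MetaColumn", "MetaRow", "Column", "Row",
    "Reporter Identifier", "Reporter Name",
    "Reporter BioSequence Database Entry [embl]",
    "Reporter BioSequence Type",
    "Reporter BioSequence Polymer Type",
    "Reporter BioSequence [Actual Sequence]",
    "Reporter Group [role]",
    "Reporter Control Type",
]


def write_adf(rows, handle):
    """Write a header line plus one tab-delimited line per feature.

    Cells missing from a row are written as empty strings.
    """
    writer = csv.DictWriter(handle, fieldnames=ADF_COLUMNS, delimiter="\t",
                            restval="", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)


# One illustrative feature; Reporter Control Type is omitted (not a control).
feature = {
    "MetaColumn": 1, "MetaRow": 1, "Column": 1, "Row": 1,
    "Reporter Identifier": "62D3T7", "Reporter Name": "62D3T7",
    "Reporter BioSequence Database Entry [embl]": "AJ000001",
    "Reporter BioSequence Type": "ss_oligo",
    "Reporter BioSequence Polymer Type": "DNA",
    "Reporter BioSequence [Actual Sequence]": "actactactactact",
    "Reporter Group [role]": "Experimental",
}

buffer = io.StringIO()
write_adf([feature], buffer)
adf_text = buffer.getvalue()
```

To write a real file instead of an in-memory buffer, pass a handle opened with open(path, "w", newline="") to write_adf.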

 

1. Location and name

 

Each spot on the array is called a feature. The position of each feature is described by 4 coordinates: MetaColumn, MetaRow, Column, Row.

 

MetaRow  MetaColumn  Column  Row
1        1           1       1

 

These 4 columns are mandatory in the ADF and each line in your ADF will correspond to one feature. Features cannot be duplicated on an array as each spot can occur only once, but reporters can be printed at several different locations. All the features that appear in your raw data files must be included in the ADF even if there is nothing spotted there.

 

  feature coordinate diagram
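
The rule that each feature may appear only once, while a reporter may appear at several features, can be checked mechanically. The following is a rough sketch, assuming each ADF line has already been read into a dictionary keyed by column name; the function name duplicated_features and the sample rows are hypothetical.

```python
from collections import Counter

COORD_COLUMNS = ("MetaColumn", "MetaRow", "Column", "Row")


def duplicated_features(rows):
    """Return the coordinate tuples that occur on more than one ADF line."""
    counts = Counter(tuple(row[c] for c in COORD_COLUMNS) for row in rows)
    return sorted(coords for coords, n in counts.items() if n > 1)


rows = [
    # The same reporter at two different features: allowed.
    {"MetaColumn": "1", "MetaRow": "1", "Column": "1", "Row": "1",
     "Reporter Identifier": "62D3T7"},
    {"MetaColumn": "1", "MetaRow": "1", "Column": "1", "Row": "2",
     "Reporter Identifier": "62D3T7"},
    # A second line for feature 1/1/1/2: not allowed.
    {"MetaColumn": "1", "MetaRow": "1", "Column": "1", "Row": "2",
     "Reporter Identifier": "oligoA"},
]

duplicates = duplicated_features(rows)
```

Here duplicates contains only the coordinates ("1", "1", "1", "2"), because that feature is described twice.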

GAL files with Block, Column, Row coordinates

If you have a gal file containing Block, Column, Row coordinates you can convert these to MetaColumn, MetaRow, Column, Row coordinates using this tool:  GAL to ADF converter tool >>


Please note that this tool only creates the feature coordinate columns of the ADF file - you will need to add the rest of the columns as described in this help page.
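
How the converter tool maps blocks is not described on this page. As a sketch of the arithmetic involved, assume that blocks are numbered from 1, left to right and then top to bottom, and that you know how many blocks sit side by side on the array (blocks_per_row, a hypothetical parameter). Under those assumptions a Block number splits into MetaColumn and MetaRow as follows; check the numbering against your own print layout before relying on it.

```python
def block_to_meta(block, blocks_per_row):
    """Split a 1-based Block number into (MetaColumn, MetaRow).

    Assumes blocks are numbered left to right, then top to bottom.
    """
    if block < 1 or blocks_per_row < 1:
        raise ValueError("block and blocks_per_row must be positive")
    meta_row_index, meta_column_index = divmod(block - 1, blocks_per_row)
    return meta_column_index + 1, meta_row_index + 1


def gal_to_adf_coords(block, column, row, blocks_per_row):
    """Return (MetaColumn, MetaRow, Column, Row) for one GAL line."""
    meta_column, meta_row = block_to_meta(block, blocks_per_row)
    return meta_column, meta_row, column, row


# Example: block 6 on an array with 4 blocks per row is the second block
# of the second row of blocks.
coords = gal_to_adf_coords(block=6, column=3, row=7, blocks_per_row=4)
```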

 

The Reporter Identifier and Reporter Name columns are mandatory. A reporter represents the sequence spotted on the array. The Reporter Identifier is used internally in the ADF file; the Reporter Name will be displayed and should be biologically meaningful, for example a gene or clone name. The identifier should be the same as the one you use for the reporter in final gene expression matrices and other normalized data files, because we use the Reporter Identifier values to link the array annotation to the measurement values in data files.

 

MetaColumn  MetaRow  Column  Row  Reporter Identifier  Reporter Name
1           1        1       1    62D3T7               62D3T7

If the same sequence is spotted at more than 1 Feature then the Reporter Identifier can be repeated.

 

MetaColumn  MetaRow  Column  Row  Reporter Identifier  Reporter Name
1           1        1       1    62D3T7               62D3T7
1           1        1       2    62D3T7               62D3T7

 

If your Reporter doesn't have a name, for example if it is an unknown sequence, then repeat the Reporter Identifier in the Reporter Name column.

 

MetaColumn  MetaRow  Column  Row  Reporter Identifier  Reporter Name
1           1        1       1    oligoA               oligoA
1           1                     oligoB               rad51

 

Don't reuse the same name for reporters that have different identifiers; if you do, we will ask you why. The example below is not acceptable.

 

MetaColumn  MetaRow  Column  Row  Reporter Identifier  Reporter Name
1           1        1       1    oligo1               unknown 1
1           1        1       1    oligo2               unknown 1
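
A name shared by reporters with different identifiers can be spotted before submission. This is a minimal sketch, assuming the ADF lines are available as dictionaries keyed by column name; the function name reused_names is hypothetical.

```python
from collections import defaultdict


def reused_names(rows):
    """Map each Reporter Name used by more than one Reporter Identifier
    to the sorted list of identifiers that share it."""
    ids_by_name = defaultdict(set)
    for row in rows:
        ids_by_name[row["Reporter Name"]].add(row["Reporter Identifier"])
    return {name: sorted(ids)
            for name, ids in ids_by_name.items() if len(ids) > 1}


rows = [
    {"Reporter Identifier": "oligo1", "Reporter Name": "unknown 1"},
    {"Reporter Identifier": "oligo2", "Reporter Name": "unknown 1"},
    # Repeating the same identifier with the same name is fine.
    {"Reporter Identifier": "62D3T7", "Reporter Name": "62D3T7"},
    {"Reporter Identifier": "62D3T7", "Reporter Name": "62D3T7"},
]

clashes = reused_names(rows)
```

For these rows, clashes reports that the name 'unknown 1' is shared by oligo1 and oligo2.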

 

2. Annotation

 

We need database entries or actual sequence to describe the sequences on your array. At least one database entry or sequence is needed for MIAME compliance. If you do not have database entries for all reporters then we may contact you for more information.

We need to know which database these accession numbers are from and we ask you to supply a database code inside the [square brackets] in the header row. You can find a complete list of allowed databases here (use the values in the 'Name' column). A short list of common ones is below. If you supply entries from a database not already on this list then we will look at it to make sure it is suitable and we may ask you about it.

 

catma    flybase  nasc      refseq  trembl
ensembl  image    omim      rgd     unigene
embl     locus    plasmodb  tair    wormbase

 

To provide a single database entry do this:

 

MetaColumn  MetaRow  Column  Row  Reporter Identifier  Reporter Name  Reporter BioSequence Database Entry [embl]
1           1        1       1    62D3T7               62D3T7         AJ000001

 

To show multiple database entries from the same database do this:

 

MetaColumn  MetaRow  Column  Row  Reporter Identifier  Reporter Name  Reporter BioSequence Database Entry [embl]
1           1        1       1    62D3T7               62D3T7         AJ000001;AJ000002

 

To show multiple database entries from different databases do this:

 

MetaColumn  MetaRow  Column  Row  Reporter Identifier  Reporter Name  Reporter BioSequence Database Entry [embl]  Reporter BioSequence Database Entry [refseq]
1           1        1       1    62D3T7               62D3T7         AJ000001                                    NM_000001;NM_000002
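
The three layouts above follow one pattern: the database code sits in square brackets in the column header, and multiple entries from one database are separated by semicolons. The sketch below reads that pattern back out of a single ADF line held as a dictionary. The function name database_entries is hypothetical, and the code does not check that a database code is on the allowed list.

```python
import re

# Matches headers such as "Reporter BioSequence Database Entry [embl]"
# and captures the database code between the square brackets.
DB_COLUMN = re.compile(r"^Reporter BioSequence Database Entry \[([^\]]+)\]$")


def database_entries(row):
    """Return {database code: [accession, ...]} for one ADF row."""
    entries = {}
    for column, value in row.items():
        match = DB_COLUMN.match(column)
        if not match:
            continue
        accessions = [v.strip() for v in value.split(";") if v.strip()]
        if accessions:
            entries[match.group(1)] = accessions
    return entries


row = {
    "Reporter Identifier": "62D3T7",
    "Reporter Name": "62D3T7",
    "Reporter BioSequence Database Entry [embl]": "AJ000001",
    "Reporter BioSequence Database Entry [refseq]": "NM_000001;NM_000002",
}

entries = database_entries(row)
```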

 

To include chromosome coordinates do this (change the part 'ucsc_hg17' to indicate the source of your coordinates):

 

MetaColumn  MetaRow  Column  Row  Reporter Identifier  Reporter Name  Reporter BioSequence Database Entry [chromosome_coordinate:ucsc_hg17]
1           1        1       1    62D3T7               62D3T7         chr1:1234-5678
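
The only coordinate format shown on this page is the single example above, so the following sketch assumes that values always look like chr<name>:<start>-<end> and that the coordinate source follows 'chromosome_coordinate:' in the header. Both function names are hypothetical, and the pattern may be stricter or looser than what is actually accepted.

```python
import re

# Assumed value format, inferred from the example "chr1:1234-5678".
COORDINATE = re.compile(r"^(chr[^:\s]+):(\d+)-(\d+)$")
# Assumed header format: "... [chromosome_coordinate:<source>]".
SOURCE = re.compile(r"\[chromosome_coordinate:([^\]]+)\]$")


def parse_coordinate(value):
    """Return (chromosome, start, end) or raise ValueError."""
    match = COORDINATE.match(value.strip())
    if not match:
        raise ValueError(f"unrecognised coordinate: {value!r}")
    chromosome = match.group(1)
    start, end = int(match.group(2)), int(match.group(3))
    if start > end:
        raise ValueError(f"start is after end: {value!r}")
    return chromosome, start, end


def coordinate_source(header):
    """Return the coordinate source named in the header, or None."""
    match = SOURCE.search(header)
    return match.group(1) if match else None


header = ("Reporter BioSequence Database Entry "
          "[chromosome_coordinate:ucsc_hg17]")
source = coordinate_source(header)
location = parse_coordinate("chr1:1234-5678")
```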

 

If you have sequence-verified information for your array then you can provide it in the Reporter BioSequence [Actual Sequence] column. This column is not mandatory unless you have oligos on your array. Only supply actual sequence if there are oligos or if you have sequence-verified the reporters yourself. If you have an oligo array and the manufacturer will not allow you to provide the sequences, continue with your submission anyway; we may ask you about this later.

 

MetaColumn  MetaRow  Column  Row  Reporter Identifier  Reporter Name  Reporter BioSequence Database Entry [embl]  Reporter BioSequence [Actual Sequence]
1           1        1       1    62D3T7               62D3T7         AJ000001                                    actactactactact

 

 

3. Type of reporter

 

We need to know what Reporter Type you are using and how it was generated. These values go into the Reporter BioSequence Type and Reporter BioSequence Polymer Type columns. The Reporter BioSequence Type describes the material spotted on the array.   A list of common allowed values to put in these columns is shown below.

 

Reporter BioSequence Type   Reporter BioSequence Polymer Type
PCR_amplicon   DNA
ss_oligo   RNA
genomic_DNA   protein

 

MetaColumn  MetaRow  Column  Row  Reporter Identifier  Reporter Name  Reporter BioSequence Database Entry [embl]  Reporter BioSequence Type  Reporter BioSequence Polymer Type
1           1        1       1    62D3T7               62D3T7         AJ000001                                    ss_oligo                   DNA

 

4. Group of reporter

 

We need to know how the Reporters on the array are grouped. There are two groups: Experimental and Control. There is a column to specify which group each Reporter belongs to called Reporter Group [role]. This column is mandatory.

Allowed values for this column are: Experimental and Control

 

MetaColumn  MetaRow  Column  Row  Reporter Identifier  Reporter Name  Reporter BioSequence Database Entry [embl]  Reporter BioSequence [Actual Sequence]  Reporter BioSequence Type  Reporter BioSequence Polymer Type  Reporter Group [role]
1           1        1       1    62D3T7               62D3T7         AJ000001                                    actactactactact                         ss_oligo                   DNA                                Experimental
1           1        1       2    GAPDH                GAPDH          Q23451                                      atgcgcatcatcac                          ss_oligo                   DNA                                Control

Describe what type of controls were used in the next column. If the spot is not a control then do not fill in anything in this column. The allowed values for this column are:

  • control_biosequence - for example a spike
  • control_buffer - buffer spotted on the array
  • control_empty - nothing spotted on the array
  • control_genomic_DNA - e.g. salmon sperm DNA
  • control_label - landing lights
  • control_reporter_size - size standard
  • control_spike_calibration - spike at varying concentrations
  • control_unknown_type

 

MetaColumn  MetaRow  Column  Row  Reporter Identifier  Reporter Name  Reporter BioSequence Type  Reporter BioSequence Polymer Type  Reporter Group [role]  Reporter Control Type
1           1        1       1    62D3T7               62D3T7         ss_oligo                   DNA                                Experimental
1           1        1       2    GAPDH                GAPDH          ss_oligo                   DNA                                Control                control_biosequence
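
The relationship between Reporter Group [role] and Reporter Control Type can be checked line by line. The sketch below assumes that every Control reporter should carry one of the allowed control types (control_unknown_type being available when the type is not known) and that Experimental reporters leave the column empty. The function name control_type_problems is hypothetical, and this is not the ADF format checking tool.

```python
ROLES = {"Experimental", "Control"}
CONTROL_TYPES = {
    "control_biosequence", "control_buffer", "control_empty",
    "control_genomic_DNA", "control_label", "control_reporter_size",
    "control_spike_calibration", "control_unknown_type",
}


def control_type_problems(rows):
    """Return (line number, message) pairs; line 1 is the header line."""
    problems = []
    for line_no, row in enumerate(rows, start=2):
        role = row.get("Reporter Group [role]", "")
        control_type = row.get("Reporter Control Type", "")
        if role not in ROLES:
            problems.append((line_no, f"unknown role {role!r}"))
        elif role == "Experimental" and control_type:
            problems.append((line_no, "control type on a non-control"))
        elif role == "Control" and control_type not in CONTROL_TYPES:
            problems.append(
                (line_no, f"control type {control_type!r} not allowed"))
    return problems


rows = [
    {"Reporter Group [role]": "Experimental", "Reporter Control Type": ""},
    {"Reporter Group [role]": "Control",
     "Reporter Control Type": "control_biosequence"},
    {"Reporter Group [role]": "Experimental",
     "Reporter Control Type": "control_empty"},
    {"Reporter Group [role]": "Control", "Reporter Control Type": "spike"},
]

problems = control_type_problems(rows)
```

With these rows, the first two lines pass and the last two (file lines 4 and 5) are reported.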

 

5. Additional information

Reporter comment

 

If you want to add free text information about a Reporter add this column:

 

MetaColumn  MetaRow  Column  Row  Reporter Identifier  Reporter Name  Reporter Comment   Reporter BioSequence Database Entry [embl]  Reporter BioSequence Type  Reporter BioSequence Polymer Type  Reporter BioSequence [Actual Sequence]
1           1        1       1    62D3T7               62D3T7         your comment here  AJ000001                                    ss_oligo                   DNA                                actactactactact

 

Composite Sequences

 

If you want, you can provide an additional level of information showing which Reporters represent the same gene. For example, if you have different Reporters that correspond to two exons of a gene, you might want to supply information on each exon and also information on the gene. This is done by providing extra columns that describe CompositeSequences. The columns used to add this information are shown below. Note that you need all the information in the example above as well as the CompositeSequence columns shown below.

 

Reporter Identifier  CompositeSequence Identifier  CompositeSequence Name  CompositeSequence Database Entry [locus]  CompositeSequence Comment
oligoA               CS1                           JunA                    16476                                     Oncogene
oligoB               CS1                           JunA                    16476                                     Oncogene

 

CompositeSequence Identifier

A unique identifier for each CompositeSequence. This is just an id, not a database entry that we will link to. Ids can be repeated if the same CompositeSequence is on the array multiple times. If you provide this column then the name and database entry columns are mandatory.

 

CompositeSequence Name

CompositeSequences need a name. If names are missing copy over the identifier into this field.

 

CompositeSequence Database Entry

One or more database entries (accession numbers) that describe the CompositeSequence. The database that the entries are from is provided inside [square brackets]. The example above is for LocusLink. A list of allowed values for databases is here (use values in 'Name' column). We will check that the id format you provide belongs to the database that you specified.

 

CompositeSequence Comment

A free text comment that describes the CompositeSequence.
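
To illustrate how the CompositeSequence columns tie Reporters together, the sketch below groups Reporter Identifiers by CompositeSequence Identifier and lists CompositeSequences that lack the name or database entry required above. The function names, the default database column and the oligoC/CS2 row are hypothetical additions for the example.

```python
from collections import defaultdict

CS_ID = "CompositeSequence Identifier"
CS_NAME = "CompositeSequence Name"


def composite_groups(rows):
    """Map each CompositeSequence Identifier to its Reporter Identifiers."""
    groups = defaultdict(list)
    for row in rows:
        if row.get(CS_ID):
            groups[row[CS_ID]].append(row["Reporter Identifier"])
    return dict(groups)


def incomplete_composites(
        rows, db_column="CompositeSequence Database Entry [locus]"):
    """Return identifiers of CompositeSequences missing a name or entry."""
    missing = set()
    for row in rows:
        if row.get(CS_ID) and not (row.get(CS_NAME) and row.get(db_column)):
            missing.add(row[CS_ID])
    return sorted(missing)


rows = [
    {"Reporter Identifier": "oligoA", CS_ID: "CS1", CS_NAME: "JunA",
     "CompositeSequence Database Entry [locus]": "16476"},
    {"Reporter Identifier": "oligoB", CS_ID: "CS1", CS_NAME: "JunA",
     "CompositeSequence Database Entry [locus]": "16476"},
    # Hypothetical incomplete line: identifier given, name and entry missing.
    {"Reporter Identifier": "oligoC", CS_ID: "CS2", CS_NAME: "",
     "CompositeSequence Database Entry [locus]": ""},
]

groups = composite_groups(rows)
incomplete = incomplete_composites(rows)
```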

 

Example ADF files for download

 

Array Type                          .txt format                         .xls format
Clone based                         ADF_example_clone_based_array.txt   ADF_example_clone_based_array.xls
Oligo based                         ADF_example_oligo_seq_adf.txt       ADF_example_oligo_seq_adf.xls
With composite sequences (complex)  ADF_example_composite_sequence.txt  ADF_example_composite_sequence.xls

 

Checking your ADF before submission

 

A tool is provided which will check your ADF for common formatting errors. The tool will report any problems in the ADF - please try to fix as many as you can before submitting the ADF as this will speed up the processing of your array design submission:
ADF format checking tool >>

ADF checklist

  1. File is in tab-delimited text format (not Excel)
  2. Feature coordinates are in MetaColumn, MetaRow, Column, Row format
  3. The following columns are included:
    • Reporter Identifier
    • Reporter Name
    • Reporter BioSequence Type
    • Reporter BioSequence Polymer Type
    • Reporter Group [role]
    • Reporter Control Type
  4. If it is an oligo array the column Reporter BioSequence [Actual Sequence] is included
  5. One or more Reporter BioSequence Database Entry [] columns are included (if sequence not included)
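
The column-related items of the checklist can be tested against the header line of a tab-delimited ADF. This is a small sketch and not the ADF format checking tool; the function names and the is_oligo_array flag are hypothetical, and only the header is examined, not the contents of each line.

```python
COORDINATE_COLUMNS = ["MetaColumn", "MetaRow", "Column", "Row"]
REQUIRED_COLUMNS = [
    "Reporter Identifier", "Reporter Name", "Reporter BioSequence Type",
    "Reporter BioSequence Polymer Type", "Reporter Group [role]",
    "Reporter Control Type",
]
SEQUENCE_COLUMN = "Reporter BioSequence [Actual Sequence]"
DB_PREFIX = "Reporter BioSequence Database Entry ["


def read_header(first_line):
    """Split the first line of a tab-delimited ADF into column names."""
    return first_line.rstrip("\r\n").split("\t")


def checklist_problems(header, is_oligo_array=False):
    """Return a list of checklist items that the header does not satisfy."""
    problems = [f"missing column: {name}"
                for name in COORDINATE_COLUMNS + REQUIRED_COLUMNS
                if name not in header]
    has_sequence = SEQUENCE_COLUMN in header
    has_database = any(c.startswith(DB_PREFIX) and c.endswith("]")
                       for c in header)
    if is_oligo_array and not has_sequence:
        problems.append(f"oligo array without {SEQUENCE_COLUMN}")
    if not has_sequence and not has_database:
        problems.append("no database entry or actual sequence column")
    return problems


good_header = read_header("\t".join(
    COORDINATE_COLUMNS + REQUIRED_COLUMNS
    + ["Reporter BioSequence Database Entry [embl]"]) + "\n")
bad_header = COORDINATE_COLUMNS + ["Reporter Identifier"]

good_result = checklist_problems(good_header)
bad_result = checklist_problems(bad_header, is_oligo_array=True)
```

An empty list means the header passes these particular checks; it does not replace running the checking tool linked above.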

 

If you have any further questions, please see our FAQ.
