Submitting assembled and annotated sequences

Submitting sequence data to the primary nucleotide sequence archives before publication has become standard practice. The database assigns a unique accession number that permanently identifies the submitted sequence. Include this accession number in the manuscript, preferably on the first page of the journal article, or as required by the individual journal's procedures. This ensures that new sequence data become available in a timely fashion.

Please note that you only need to submit to one of the INSDC databases: public data are exchanged daily between the three INSDC databases, ENA, GenBank and DDBJ.

How to submit

Webin checklist submissions

Webin is ENA's preferred submission tool. It guides the user through a sequence of forms and checklists that allow interactive submission of sequence data and descriptive information.

Webin checklists (e.g. for prokaryote 16S rRNA genes) provide a straightforward way to submit sequences with annotation. Sequences, source and other feature annotation can be uploaded via fasta files or spreadsheets for interactive editing, validation and submission. Webin can be used to submit single or multiple entries. Checklists are improved and new ones introduced from time to time.
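Before uploading a FASTA file to a checklist, a quick local check can catch common problems such as duplicate identifiers or stray characters. The Python sketch below is hypothetical and not part of Webin: it assumes the first word of each header line is the sequence identifier, and it accepts only the IUPAC DNA codes (A, C, G, T and the ambiguity codes R, Y, S, W, K, M, B, D, H, V, N).

```python
# Hypothetical pre-upload FASTA check; not an ENA/Webin tool.
IUPAC_DNA = set("ACGTRYSWKMBDHVN")  # assumption: DNA alphabet with ambiguity codes


def read_fasta(text: str) -> list[tuple[str, str]]:
    """Parse FASTA text into (identifier, sequence) pairs.

    The identifier is taken to be the first word after '>'.
    """
    records, seq_id, chunks = [], None, []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if seq_id is not None:
                records.append((seq_id, "".join(chunks)))
            fields = line[1:].split(maxsplit=1)
            seq_id, chunks = (fields[0] if fields else ""), []
        elif seq_id is None:
            raise ValueError("sequence data before the first '>' header")
        else:
            chunks.append(line.upper())
    if seq_id is not None:
        records.append((seq_id, "".join(chunks)))
    return records


def check_fasta(text: str) -> list[str]:
    """Return human-readable problems found in FASTA text (empty list if none)."""
    problems, seen = [], set()
    for seq_id, seq in read_fasta(text):
        if not seq_id:
            problems.append("record with empty identifier")
        elif seq_id in seen:
            problems.append(f"{seq_id}: duplicate identifier")
        seen.add(seq_id)
        if not seq:
            problems.append(f"{seq_id}: empty sequence")
        bad = sorted(set(seq) - IUPAC_DNA)
        if bad:
            problems.append(f"{seq_id}: non-IUPAC characters {''.join(bad)}")
    return problems
```

Sequence lines are upper-cased before checking, so lower-case FASTA is accepted.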

Submitters can view and modify their data before submission, in the format in which it will finally be published. For most users, Webin is the easiest way to submit sequences to ENA and the quickest way to obtain accession numbers.

Webin contains a number of annotation submission checklists, grouped into categories, for various sequence types. For a list of currently available checklists and descriptions please see here.

Webin submitters are encouraged to contact datasubs@ebi.ac.uk for expert advice on checklist selection and to request the addition of new checklists.

Webin Entry Upload submissions

Submitters can also upload pre-prepared flat files (known as EMBL format files) to ENA.

To upload ENA-supported flat files, use the ‘Entry Upload’ option in Webin. Before considering ‘Entry Upload’, please check the available Webin sequence checklists and make sure that none of them is suitable for your data.

There are two options for creating ENA supported flat files:

Use third party software to create ENA flat files

To upload ENA-supported flat files containing sequence and annotation, you can use third-party tools that export your annotation in this format. We recommend Artemis for this purpose. Submitters who use Sequin as an annotation and submission tool should save and export their sequences in ‘EMBL format’, which Sequin supports, before submitting to ENA. Please note that before the files are uploaded to ENA, the Sequin-created ID lines must be amended to contain only the topology (either linear or circular), and the AC line must contain nothing other than XXX. Example:

ID   XXX; XXX; {topology}; XXX; XXX; XXX; XXX.
XX
AC   XXX;
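If you have many Sequin exports, this amendment can be scripted. The Python sketch below is a minimal illustration, not an ENA or Sequin tool: the function name `amend_sequin_header` is hypothetical, and it assumes that a circular molecule is marked by the word "circular" on the ID line, treating every other entry as linear.

```python
import re


def amend_sequin_header(entry_text: str) -> str:
    """Rewrite the ID and AC lines of EMBL-format text exported by Sequin.

    The ID line keeps only the topology; every other field becomes XXX.
    The AC line becomes 'AC   XXX;'. All other lines pass through unchanged.
    Assumption: 'circular' on the ID line marks a circular molecule;
    anything else is treated as linear.
    """
    amended = []
    for line in entry_text.splitlines():
        line_type = line[:5].rstrip()  # two-letter line code, e.g. "ID", "AC"
        if line_type == "ID":
            circular = re.search(r"\bcircular\b", line, re.IGNORECASE)
            topology = "circular" if circular else "linear"
            line = f"ID   XXX; XXX; {topology}; XXX; XXX; XXX; XXX."
        elif line_type == "AC":
            line = "AC   XXX;"
        amended.append(line)
    return "\n".join(amended) + "\n"
```

For example, the line `ID   seq1; SV 1; circular; genomic DNA; STD; PRO; 120 BP.` becomes `ID   XXX; XXX; circular; XXX; XXX; XXX; XXX.`.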

 

Create your own ENA flat files

Submitters can also create their own flat files to be uploaded to ENA. You can create and upload either a full ENA flat file or only the feature table and sequence (with an ID line containing the topology, and a DE line). Please refer to the section ‘Entry Upload Templates’ for detailed descriptions and example flat files that you can populate with your data before uploading to ENA. These examples cover the most commonly used multi-feature sequence types.
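To illustrate the line structure involved, the hypothetical helper `minimal_entry` below writes one entry with an ID line carrying only the topology, AC and DE lines, a single source feature, and the sequence. It is a sketch that assumes the standard EMBL column layout (feature keys from column 6, locations and qualifiers from column 22, 60 bases per sequence line with the running position right-aligned at column 80); it does not wrap long qualifier values, and the exact set of lines ENA expects should be confirmed against the ‘Entry Upload Templates’.

```python
from collections import Counter


def feature_line(key: str, location: str) -> str:
    # Feature key in columns 6-21, location starting at column 22.
    return f"FT   {key:<16}{location}"


def qualifier_line(name: str, value: str) -> str:
    # Qualifier starts at column 22; long values are NOT wrapped here.
    return f'FT{"":19}/{name}="{value}"'


def sequence_block(seq: str) -> list[str]:
    """SQ header with base counts, then 60 bases per line in groups of 10."""
    seq = seq.lower()
    counts = Counter(seq)
    other = len(seq) - sum(counts[b] for b in "acgt")
    lines = [f"SQ   Sequence {len(seq)} BP; {counts['a']} A; {counts['c']} C; "
             f"{counts['g']} G; {counts['t']} T; {other} other;"]
    for start in range(0, len(seq), 60):
        chunk = seq[start:start + 60]
        groups = " ".join(chunk[i:i + 10] for i in range(0, len(chunk), 10))
        # 5 leading spaces + 65 columns of bases, then position right-aligned to column 80.
        lines.append(f"     {groups:<65}{start + len(chunk):>10}")
    return lines


def minimal_entry(topology: str, description: str, organism: str,
                  mol_type: str, seq: str) -> str:
    """Build one hypothetical minimal entry ending with '//'."""
    lines = [
        f"ID   XXX; XXX; {topology}; XXX; XXX; XXX; XXX.",
        "XX",
        "AC   XXX;",
        "XX",
        f"DE   {description}",
        "XX",
        f"FH   {'Key':<16}Location/Qualifiers",
        "FH",
        feature_line("source", f"1..{len(seq)}"),
        qualifier_line("organism", organism),
        qualifier_line("mol_type", mol_type),
        "XX",
        *sequence_block(seq),
        "//",
    ]
    return "\n".join(lines) + "\n"
```

Splitting the feature, qualifier and sequence formatting into small functions keeps each column rule in one place, which makes it easier to adjust if a template differs.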

Alternatively, you can create your own flat files by searching similarly annotated database records and changing the annotation to match your sequence.

For more details on flat file format and line structures please refer to the ENA user manual.

For description and structure of all INSDC supported Features and Qualifiers please visit here or the full INSDC Feature Table Document.

Third Party Data (TPA) submissions

For submission of Third Party Data (TPA) please contact datasubs@ebi.ac.uk and refer to INSDC TPA policy documentation for more information on this submission type.

Genome assembly and complete genome submissions

Genome assembly components (contigs, scaffolds, chromosomes) must be submitted via the ENA genome assembly submission system, unless stated otherwise in the genome assemblies documentation. For genome assembly related FAQs please see here.

How to validate Flat files

If you have a bulk submission or are using third-party software to create ENA-supported flat files, we recommend checking the flat files with the ENA standalone validator before upload. This lets you correct any errors reported by the validator before uploading the sequences to ENA.
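The standalone validator remains the authoritative check. As a complement, a few structural problems can be caught with a short script; the hypothetical `precheck_flat_file` below only verifies that each entry starts with an ID line naming a topology, has an SQ line whose declared length matches the bases that follow, and ends with `//`. It is a sketch under those assumptions and does not check features or qualifiers.

```python
import re


def precheck_flat_file(text: str) -> list[str]:
    """Return structural problems found in a (multi-entry) EMBL flat file."""
    problems = []
    entry_no = 0
    open_entry = False      # inside an entry that has not yet reached '//'
    in_sequence = False     # between the SQ line and '//'
    declared_len = None     # length stated on the SQ line
    observed_len = 0        # bases actually counted on sequence lines

    for line_no, line in enumerate(text.splitlines(), start=1):
        if line.startswith("ID   "):
            if open_entry:
                problems.append(f"line {line_no}: previous entry not terminated by '//'")
            entry_no += 1
            open_entry, in_sequence = True, False
            declared_len, observed_len = None, 0
            if not re.search(r"\b(linear|circular)\b", line):
                problems.append(f"entry {entry_no}: ID line has no topology")
        elif line.startswith("SQ   "):
            match = re.search(r"Sequence (\d+) BP", line)
            declared_len = int(match.group(1)) if match else None
            in_sequence = True
        elif line.strip() == "//":
            if not open_entry:
                problems.append(f"line {line_no}: '//' without a preceding ID line")
            elif declared_len is None:
                problems.append(f"entry {entry_no}: no usable SQ line")
            elif declared_len != observed_len:
                problems.append(f"entry {entry_no}: SQ declares {declared_len} bp "
                                f"but {observed_len} bases found")
            open_entry, in_sequence = False, False
        elif in_sequence:
            # Count letters only; spaces and trailing position numbers are ignored.
            observed_len += sum(ch.isalpha() for ch in line)

    if open_entry:
        problems.append(f"entry {entry_no}: missing final '//'")
    return problems
```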

How to screen for vector contamination

All submissions should be checked for vector sequence contamination. To assist submitters, EBI provides a vector screening service that uses the BLAST algorithm and a special sequence databank known as EMVEC, which can be accessed by clicking here. EMVEC is a set of sequences extracted from the SYNthetic division and contains more than 2,000 sequences commonly used in cloning and sequencing experiments. EMVEC is by no means a complete vector databank, but it is representative of the kind of material used in modern sequencing.
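The EBI service runs the BLAST search for you. If you instead run BLAST locally against a vector collection of your own and save the results in the standard 12-column tabular format (for example with `blastn -outfmt 6`), a sketch like the one below can summarise which query regions look like vector. The function names and the thresholds (90% identity, 20 bp alignment length, E-value 1e-5) are illustrative assumptions, not EBI's screening criteria.

```python
import csv
import io


def vector_hits(blast_tsv: str, min_identity: float = 90.0,
                min_length: int = 20, max_evalue: float = 1e-5):
    """Collect (qstart, qend, subject) hits per query from BLAST tabular output.

    Columns (outfmt 6): qseqid sseqid pident length mismatch gapopen
    qstart qend sstart send evalue bitscore.
    """
    hits = {}
    for row in csv.reader(io.StringIO(blast_tsv), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank and comment lines
        qseqid, sseqid = row[0], row[1]
        pident, length = float(row[2]), int(row[3])
        qstart, qend, evalue = int(row[6]), int(row[7]), float(row[10])
        if pident >= min_identity and length >= min_length and evalue <= max_evalue:
            lo, hi = sorted((qstart, qend))
            hits.setdefault(qseqid, []).append((lo, hi, sseqid))
    return {query: sorted(found) for query, found in hits.items()}


def merge_regions(intervals):
    """Merge overlapping or adjacent 1-based inclusive intervals."""
    merged = []
    for lo, hi in sorted(intervals):
        if merged and lo <= merged[-1][1] + 1:
            merged[-1][1] = max(merged[-1][1], hi)
        else:
            merged.append([lo, hi])
    return [tuple(region) for region in merged]
```

Merging the hit intervals gives one candidate region per stretch of likely vector, which is usually what you want to inspect or trim before submission.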

How to update

Updates can be reported by completing a form available in Webin. This form can be used to:

  1. Update your previously submitted entry
  2. Report errors in ENA data

Alternatively, updates can be reported by e-mail to update@ebi.ac.uk, citing relevant accession numbers.

Editorial policy

Keeping sequences and annotation up to date is the sole responsibility of the owners of the database entries. Entry owners are the original submitter and all the co-authors of the associated citation(s). If you own an existing database record and wish to correct, update or add new data to it, please use the update form.

If you spot errors or inconsistencies in database entries that you do not own, first try contacting the authors so that they can update their sequences directly. If this is unsuccessful, please use the update form and indicate that it is a third-party update.
