Submitting assembled and annotated sequences

Submitting sequence information to the primary nucleotide sequence archives prior to publication has become standard practice. The database assigns a unique accession number that permanently identifies the submitted sequence. The accession number should be included in the manuscript, preferably on the first page of the journal article, or as required by individual journal procedures. This procedure ensures that new sequence data become available in a timely fashion.

Please note that it is only necessary to submit to one of the INSDC databases. Public data are exchanged daily between the three INSDC databases: ENA, GenBank and DDBJ.

How to submit

Webin checklist submissions

Webin is ENA's submission tool. Webin guides the user through a sequence of forms and checklists allowing interactive submission of sequence data and descriptive information.

Webin checklists (e.g. for prokaryote 16S rRNA genes) provide a straightforward way to submit sequences with annotation. Sequences, source information and other feature annotation can either be entered in the interactive interface, where they can be edited, validated and submitted, or uploaded via spreadsheets. Webin can be used to submit single or multiple entries. Checklists are improved and new ones introduced from time to time. For submission of Transcriptome Shotgun Assemblies please go here.

For most users, Webin is the easiest way to submit sequences into ENA and the quickest way to get accession numbers assigned.

Webin contains a number of annotation submission checklists, grouped into categories, for various sequence types. For a list of currently available checklists and descriptions please see here.

Webin submitters are encouraged to contact us for expert advice on checklist selection and to request the addition of new checklists.

Webin Entry Upload submissions

Submitters can also upload pre-prepared flat files (formerly known as EMBL format files) to ENA using our Webin-CLI tool. Before you consider using ‘Entry Upload’, please check the available Webin sequence checklists and make sure that none of them is suitable for your data.

There are two options for creating ENA supported flat files:

Use third party software to create ENA flat files

To upload ENA-supported flat files containing sequence and annotation, you can use third-party tools that convert or export your annotation in this format. Examples are:



If your flat file is in GenBank format, you can consider converting it using EMBOSS Seqret. Submitters who use Sequin as an annotation and submission tool are asked to save and export their sequences in ‘EMBL format’, which is supported by Sequin, prior to submission to ENA.

Please note that many tools (for example, Sequin and other GenBank converters) output incorrect ID and AC lines. Please edit your ID and AC lines to look like the form below (where {topology} is CIRCULAR or LINEAR depending on the molecule sequenced):

ID   XXX; XXX; {topology}; XXX; XXX; XXX; XXX.
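
The ID line edit described above can be scripted when many converted files need the same correction. The following is a minimal sketch in Python, not an official ENA tool: the function name `normalise_id_line` is hypothetical, and it assumes that every line beginning with `ID ` is an entry header and that all entries in the file share one topology. It deliberately leaves AC lines untouched, because only the ID line form is shown above.

```python
VALID_TOPOLOGIES = {"LINEAR", "CIRCULAR"}


def normalise_id_line(flat_file_text: str, topology: str) -> str:
    """Rewrite every ID line to the placeholder form with the given topology.

    Assumption: any line starting with "ID " is an entry header line.
    AC lines are not modified by this sketch.
    """
    topology = topology.upper()
    if topology not in VALID_TOPOLOGIES:
        raise ValueError(f"topology must be LINEAR or CIRCULAR, got {topology!r}")

    placeholder = f"ID   XXX; XXX; {topology}; XXX; XXX; XXX; XXX."
    lines = flat_file_text.splitlines()
    replaced = 0
    for index, line in enumerate(lines):
        if line.startswith("ID "):
            lines[index] = placeholder
            replaced += 1
    if replaced == 0:
        raise ValueError("no ID line found in the flat file text")
    return "\n".join(lines) + "\n"
```

A typical use would be to read the converted file, pass its text and the topology to the function, and write the result to a new file before validating it.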


Create your own ENA flat files

Submitters can also create their own flat files to be uploaded to ENA. It is possible to create and upload a full ENA flat file or just the feature table and sequence (with an ID line containing the topology and DE line). Please refer to the section ‘Entry Upload Templates’ for detailed description and examples of flat files that you can use to populate your data before uploading to ENA. These examples have been created on the basis of the most commonly used multi-feature sequence types.

Alternatively, you can create your own flat files by searching similarly annotated database records and changing the annotation to match your sequence.

For more details on flat file format and line structures please refer to the ENA user manual.
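
As a rough illustration of the reduced form mentioned above (an ID line with the topology, a DE line, the feature table and the sequence), the sketch below assembles such a file in Python. It is an assumption-laden example rather than an ENA template: the function names are hypothetical, only a single `source` feature is written, and the column layout (feature locations starting at column 22, sequence lines of 60 bases in blocks of 10 with a right-aligned running count) follows the commonly seen flat file layout and should be checked against the ENA user manual and the Webin-CLI validator.

```python
from collections import Counter

FEATURE_PREFIX = "FT   "             # feature key starts at column 6
QUALIFIER_PREFIX = "FT" + " " * 19   # qualifiers start at column 22


def format_sequence_lines(sequence: str) -> list:
    """Lay out the sequence as 60 bases per line in blocks of 10."""
    lines = []
    for start in range(0, len(sequence), 60):
        chunk = sequence[start:start + 60]
        blocks = " ".join(chunk[i:i + 10] for i in range(0, len(chunk), 10))
        end = start + len(chunk)
        # 5 leading spaces + 65 columns of bases + 10 columns for the count
        lines.append("     " + blocks.ljust(65) + str(end).rjust(10))
    return lines


def build_flat_file(description, topology, organism, mol_type, sequence):
    """Return a minimal flat file: ID, DE, one source feature, sequence."""
    topology = topology.upper()
    if topology not in {"LINEAR", "CIRCULAR"}:
        raise ValueError("topology must be LINEAR or CIRCULAR")
    seq = "".join(sequence.split()).lower()
    if not seq:
        raise ValueError("sequence is empty")

    counts = Counter(seq)
    other = len(seq) - sum(counts[base] for base in "acgt")
    lines = [
        f"ID   XXX; XXX; {topology}; XXX; XXX; XXX; XXX.",
        "XX",
        f"DE   {description}",
        "XX",
        "FH   " + "Key".ljust(16) + "Location/Qualifiers",
        "FH",
        FEATURE_PREFIX + "source".ljust(16) + f"1..{len(seq)}",
        QUALIFIER_PREFIX + f'/organism="{organism}"',
        QUALIFIER_PREFIX + f'/mol_type="{mol_type}"',
        "XX",
        f"SQ   Sequence {len(seq)} BP; {counts['a']} A; {counts['c']} C; "
        f"{counts['g']} G; {counts['t']} T; {other} other;",
    ]
    lines.extend(format_sequence_lines(seq))
    lines.append("//")
    return "\n".join(lines) + "\n"
```

Real submissions usually carry more features and qualifiers than this; the sketch only shows how the pieces fit together, and its output should always be validated before upload.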

For the description and structure of all INSDC-supported features and qualifiers, please visit here or see the full INSDC Feature Table Document.

Genome assembly and complete genome submissions

Submissions of genome assembly components (contigs, scaffolds, chromosomes) must be made via the ENA genome assembly submission system, unless stated otherwise in the genome assemblies documentation.

How to validate flat files

You will need to submit the files through Webin-CLI, which validates them automatically before allowing submission. Alternatively, Webin-CLI has a validate function that can be used without submitting.
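
The difference between a validate-only run and a submission can be sketched as a small Python helper that builds the Webin-CLI command line. This is an illustration under assumptions, not official usage: the jar file name, manifest path and account name are placeholders, the helper name `webin_cli_command` is hypothetical, and the options used (`-context`, `-manifest`, `-userName`, `-password`, `-validate`, `-submit`) should be confirmed against the Webin-CLI documentation for the version you run.

```python
def webin_cli_command(jar_path, manifest_path, username, password, submit=False):
    """Build the argument list for a Webin-CLI run in the 'sequence' context.

    With submit=False the command only validates; with submit=True it
    validates and then submits. All paths and credentials are placeholders
    supplied by the caller.
    """
    command = [
        "java", "-jar", str(jar_path),
        "-context", "sequence",          # annotated sequence flat files
        "-manifest", str(manifest_path),
        "-userName", username,
        "-password", password,
    ]
    command.append("-submit" if submit else "-validate")
    return command


# Example (not executed here): run a validate-only pass.
# import subprocess
# subprocess.run(
#     webin_cli_command("webin-cli.jar", "manifest.txt", "Webin-XXXXX", "password"),
#     check=True,
# )
```

Building the command as a list, rather than a single shell string, avoids quoting problems with paths or passwords that contain spaces.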

How to screen for vector contamination

All submissions should be checked for vector sequence contamination. To assist submitters, EBI provides a vector screening service that uses the BLAST algorithm and a special sequence databank known as EMVEC, which can be accessed by clicking here. EMVEC is an extraction of sequences from the SYNthetic division and contains more than 2,000 sequences commonly used in cloning and sequencing experiments. EMVEC is by no means a complete vector databank, but it is representative of the kind of material used in modern sequencing.

Editorial policy

Keeping sequences and annotation up to date is the sole responsibility of the owners of the database entries. Entry owners are the original submitter and all the co-authors of the associated citation(s). If you own an existing database record and wish to correct, update or add new data to it, please use our contact form.

If you spot errors or inconsistencies in database entries not owned by yourself, first try contacting the authors so that they can update their sequences directly. If you are unsuccessful, then please use the form indicating third party update.
