Submitting assembled and annotated sequences

Submission of sequence information to the primary nucleotide sequence archives prior to publication has become standard practice. The database assigns a unique accession number that permanently identifies the submitted sequence. The accession number should be included in the manuscript, preferably on the first page of the journal article, or as required by individual journal procedures. This procedure ensures that new sequence data become available in a timely fashion.

Please note that it is only necessary to submit to one of the INSDC databases. Public data are exchanged daily between the three INSDC databases: ENA, GenBank and DDBJ.

How to submit

Webin template submissions

Webin is ENA's preferred submission tool. Webin guides the user through a sequence of forms and templates allowing interactive submission of sequence data and descriptive information.

Webin templates (e.g. for prokaryote 16S rRNA genes) provide a straightforward way to submit sequences with annotation. Sequences, source and other feature annotation can be uploaded via fasta files or spreadsheets for interactive editing, validation and submission. Webin can be used to submit single or multiple entries.
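As an illustration of preparing sequences for upload, the following minimal Python sketch writes a set of records as fasta text. The record identifiers, descriptions and the 60-character line width are assumptions made for this example and are not Webin requirements; the template chosen in Webin determines which annotation fields are actually needed.

```python
def to_fasta(records, width=60):
    """Format (identifier, description, sequence) tuples as fasta text."""
    lines = []
    for identifier, description, sequence in records:
        # Header line: '>' followed by the identifier and an optional description.
        lines.append(f">{identifier} {description}".rstrip())
        # Wrap the sequence at a fixed width (60 is a common convention only).
        lines.extend(sequence[i:i + width] for i in range(0, len(sequence), width))
    return "\n".join(lines) + "\n"


# Hypothetical records, used only to show the output layout.
records = [
    ("seq1", "16S ribosomal RNA gene, partial sequence", "ACGT" * 20),
    ("seq2", "", "GGCC" * 5),
]
fasta_text = to_fasta(records)
```

The resulting text can be saved to a file and uploaded; Webin then validates the entries interactively.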

Submitters are able to modify and view their data prior to submission in the format in which it will be finally published. For most users, Webin is the easiest way to submit sequences into ENA and the quickest way to get your accession numbers assigned.

Templates are improved and new ones introduced continuously. For example, there are templates for CDS, GSS, Intergenic spacer, unannotated WGS, EST, COI gene, 16S rRNA and ITS region submissions. Webin submitters are encouraged to contact datasubs@ebi.ac.uk for expert advice on template selection and to request the addition of new templates.

For more information about Webin template submissions please contact datasubs@ebi.ac.uk.

Custom template submissions

For researchers wishing to submit 25 or more related sequences (e.g. the same gene sequenced in a large number of different organisms) for which no suitable template can be found on the Webin submissions page, ENA offers a custom template submission procedure. Submitters are required to create one representative sequence entry and describe which features of the entry differ between sequences. An ENA curator will then review this initial entry within five working days and contact the submitter with further instructions on how to submit the rest of the sequence data.

For more information about custom template submission please contact datasubs@ebi.ac.uk.

Genome assembly submissions

For large eukaryotic genomes and gapped assemblies of prokaryotic genomes, the sequence is frequently held in the INSDC as contigs (built from overlapping reads) and scaffolds ('CON' entries). Scaffolds are assembled from contigs on the basis of assembly information, which should be prepared in the AGP format.

Contig sequences are normally expected to be submitted in the fasta format. Where contigs are subject to complete reassembly without tracking of sequences they will be stored as 'WGS' entries; otherwise the 'STD' data class is used.
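To show what assembly information in the AGP format looks like, here is a hedged Python sketch that places contigs in order along one scaffold, separated by gaps. It assumes the nine tab-separated columns of AGP version 2.0; the scaffold and contig names, the fixed gap length of 100 and the 'paired-ends' linkage evidence are placeholders chosen for the example. The detailed submission instructions remain the authority on what ENA accepts.

```python
def scaffold_to_agp(scaffold_id, contigs, gap_length=100):
    """Return AGP lines for one scaffold.

    contigs: ordered list of (contig_id, length, orientation) tuples.
    A fixed-size gap is placed between consecutive contigs (an assumption).
    """
    rows = []
    position = 1  # AGP coordinates are 1-based and inclusive
    part = 1
    for index, (contig_id, length, orientation) in enumerate(contigs):
        if index > 0:
            # Gap line: type 'N' is a gap of specified size.
            end = position + gap_length - 1
            rows.append([scaffold_id, position, end, part, "N",
                         gap_length, "scaffold", "yes", "paired-ends"])
            position = end + 1
            part += 1
        # Component line: type 'W' is a WGS contig, used here in full length.
        end = position + length - 1
        rows.append([scaffold_id, position, end, part, "W",
                     contig_id, 1, length, orientation])
        position = end + 1
        part += 1
    return ["\t".join(str(field) for field in row) for row in rows]


# Hypothetical scaffold built from two contigs.
agp_lines = scaffold_to_agp(
    "scaffold1", [("contig1", 500, "+"), ("contig2", 300, "-")]
)
```

Each contig named in the AGP lines is expected to correspond to a sequence in the submitted fasta file.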

Detailed submission instructions are available in Genome assembly submissions.

Submission of complete genomes and long sequences

In addition to fasta and spreadsheet based submissions, Webin also supports the submission of pre-prepared flat files. This submission route is suitable for long, heavily annotated sequences, such as complete genomic components, where a standalone annotation tool, such as Artemis, was used to create a flat file.

We recommend providing functional annotation when the assembly is complete enough to support it and neither closely related reference sequences nor specialist community resources already provide sufficient annotation.

Detailed submission instructions are available in Genome assembly submissions.

Artemis Users

Artemis-generated flat files can be submitted using Webin.

For more information please contact datasubs@ebi.ac.uk.

Sequin Users

Sequin is a stand-alone software tool developed by the NCBI for submitting sequence entries. From March 1st, 2012, ENA no longer accepts submissions in '.sqn' format. Submitters who use Sequin as an annotation and submission tool are asked to save and export their sequences in EMBL format, which is supported by Sequin, prior to submission to ENA.

We highly recommend that our submitters use ENA's submission system Webin as this offers simplicity of use, and greater consistency and validation of data.

What to submit

Assembled and annotated sequences typically result from direct sequencing of, for example, cDNAs, ESTs or genomic DNA.

ENA reviews all submissions, but the ultimate responsibility for the accuracy and quality of the information lies with the submitter.

The following information is required for all submissions of assembled and annotated sequences and will be collected during submission:

  1. Submitter information
  2. Release date
  3. Sequence data, description and source information
  4. Reference citation information
  5. Feature information (e.g. coding regions, regulatory signals etc.)

Webin templates prompt for the required information during submission, although curators may contact submitters concerning details.

For a complete list of features and qualifiers that can be used for the functional annotation of sequence records, please refer to the INSDC Feature Table Document or the Feature Table Browser WebFeat.

Submitter information

There should be one person with whom ENA curators correspond while the submission is being processed. ENA terms this person the original submitter and holds a record of their contact details in an internal database. Submitter information should not include personal addresses.

Owners of ENA entries are defined as those researchers who formed the submission team at the time the original submission took place. Entry owners additional to the original submitter are listed in submission references. All the entry owners have equal rights to update the entry at any time. This distinction between original submitter and owner has been drawn simply to facilitate communication.

Release date

Authors will be asked whether their submitted data can be made available to the public immediately or whether they should be withheld until an author-specified date. Data are never withheld after publication of work referring to the submitted data.

How to update

Updates can be reported by completing a form available in Webin. This form can be used to:

  1. Update your previously submitted entry
  2. Report errors in ENA data

Alternatively, updates can be reported by e-mail to update@ebi.ac.uk, citing relevant accession numbers.

Editorial policy

Keeping sequences and annotation up to date is the sole responsibility of the owners of the database entries. Entry owners are the original submitter and all the co-authors of the associated citation(s). If you own an existing database record and wish to correct, update or add new data to it, please use the update form.

If you spot errors or inconsistencies in database entries that you do not own, first try contacting the authors so that they can update their sequences directly. If you are unsuccessful, please use the update form and indicate that it is a third-party update.

Checking for vector contamination

All submissions should be checked for vector sequence contamination. To assist submitters, EBI provides a vector screening service that uses the BLAST algorithm and a special sequence databank known as EMVEC. EMVEC is an extract of the SYNthetic division containing more than 2,000 sequences commonly used in cloning and sequencing experiments. It is by no means a complete vector databank, but it is representative of the kind of material used in modern sequencing.
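
The idea behind vector screening can be illustrated with a toy Python sketch. This is not BLAST and does not use EMVEC: it only reports positions where a query shares an exact k-mer with a known vector sequence, which is a much cruder test than a real alignment search. The sequences and the k-mer length of 20 are hypothetical values chosen for the example.

```python
def shared_kmers(query, vector, k=20):
    """Return start positions in query whose k-mer occurs exactly in vector."""
    query = query.upper()
    vector = vector.upper()
    # Index every k-mer of the vector sequence for fast lookup.
    vector_kmers = {vector[i:i + k] for i in range(len(vector) - k + 1)}
    return [
        i for i in range(len(query) - k + 1)
        if query[i:i + k] in vector_kmers
    ]


# Hypothetical sequences: the query carries 25 bases of vector at its 5' end.
vector = "GATTACAGATTACACCGGTTAACCGGTTAAGGCCTTAAGG"
insert = "ATGCATGCATGCATGCATGC"
query = vector[5:30] + insert

hits = shared_kmers(query, vector, k=20)  # positions flagged as vector-like
clean = shared_kmers(insert, vector, k=20)  # empty list: no shared k-mers
```

Any flagged region would need to be confirmed with a proper search and trimmed before submission.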