
EMBL - Information for Submitters

1  INTRODUCTION

Submission of sequence information to EMBL-Bank prior to publication
has become standard practice. A unique accession number is assigned by the
database which permanently identifies the sequence submitted. The database
accession number should be included in the manuscript, preferably on the first
page of the journal article, or as required by individual journal procedures.
This procedure ensures availability and distribution of new sequence data in a
timely fashion.

Note:  It is only necessary to submit to one database, without regard
to  where  the sequence will be published.  Data are exchanged between
EBI, NCBI and DDBJ on a daily basis.



2  CHECKING SEQUENCES FOR VECTOR CONTAMINATION

To assist submitters the EBI provides a vector screening service using
the latest implementation of the BLAST algorithm and a special sequence
databank known as EMVEC. EMVEC is an extraction of sequences from the 
SYNthetic division of EMBL-Bank containing more than 2000 sequences commonly 
used in cloning and sequencing experiments. EMVEC is by no means a complete
vector databank but it is representative of the kind of material used in 
modern sequencing. The databank will be updated with each release of EMBL-Bank
and made publicly available on the EBI's FTP (ftp.ebi.ac.uk) server. 

The interactive WWW service can be found at: 

http://www.ebi.ac.uk/blastall/vectors.html 
       


3  HOW TO SUBMIT DATA TO EMBL-BANK

3.1  Webin - WWW Nucleotide Sequence Submissions

http://www.ebi.ac.uk/ena/about/page.php?page=submissions

Webin is the EBI's preferred tool for  the  submission  of  nucleotide
sequence  data to EMBL-Bank.  The user is guided through a series of WWW forms
allowing submission of sequence data and  descriptive  information in an
interactive and easy way.  All the information required to create a  database
entry  will  be  collected during the submission process:

     1  Submitter Information
     2  Release Date Information
     3  Sequence Data, Description and Source Information
     4  Reference Citation Information
     5  Feature Information (e.g. coding regions, regulatory signals)

3.2  Sequin

http://www.ebi.ac.uk/Sequin/index.html

Sequin is a multi-platform (Mac/PC/Unix) stand-alone software tool
developed by the NCBI for submitting entries to the EMBL-Bank,
GenBank, or DDBJ sequence databases. The Sequin program, along with
detailed downloading and installation instructions and general
information, is available from the EBI via WWW browser, anonymous FTP,
and the file server.



4  WHAT TO SUBMIT TO EMBL-BANK

WWW - Data submitted via the WWW submission system  will  contain  the
required  components,  although  we  may contact the author concerning
details.

Sequin - Sequin output, generated by selecting the 'Prepare
Submission' menu option, sent in computer-readable form (by electronic
mail).



5  HOW LONG WILL IT TAKE TO GET AN ACCESSION NUMBER?

The time taken to process submissions varies with the complexity of the
submission and with its place within the submission queue. If you
are submitting data to provide a journal with accession numbers, we
strongly suggest allowing at least 7 working days. We
will process data submissions and release accessions as soon as possible,
pending review by an ENA database curator. If further information is
required, the curator will make contact in order to resolve the issue.
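The 7-working-day clearance can be turned into a calendar date with a
small calculation. The following is only a sketch: the function name
add_working_days is hypothetical, working days are assumed to be Monday
to Friday, and public holidays are ignored.

```python
from datetime import date, timedelta


def add_working_days(start: date, working_days: int) -> date:
    """Return the date reached after counting Monday-Friday days from start.

    Assumption: weekends are the only non-working days; public holidays
    are not taken into account.
    """
    current = start
    remaining = working_days
    while remaining > 0:
        current += timedelta(days=1)
        if current.weekday() < 5:  # 0-4 are Monday to Friday
            remaining -= 1
    return current


if __name__ == "__main__":
    submitted = date(2010, 9, 17)  # a Friday
    print(add_working_days(submitted, 7))
```

A submitter could compare the result with the journal's deadline before
deciding when to submit.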

6  DATA CONFIDENTIALITY AND RELEASE DATES

Authors will be  asked  whether  their  submitted  data  can  be  made
available to the public immediately or whether they should be withheld
until  an  author-specified  date.



7  BULK SUBMISSIONS

For researchers wishing to submit 25 or more related sequences (e.g. the
same gene sequenced in a large number of different organisms), WEBIN offers 
a bulk submission procedure. This alternative path through WEBIN allows 
submitters to create one representative sequence entry. By instructing
EMBL-Bank curators which of the entries' features differ between each
sequence, minimal template WEBIN forms are customised to fit the exact
requirements of that particular set of sequences. The bulk procedure is
highly efficient and less time consuming for the submitter, who no longer
has to duplicate information. The procedure also ensures that EMBL-Bank
curators process related data together and consistently. Because there are
fewer forms, just one form per 10 sequences, bulk submissions are also
much faster over slow networks. A recent addition to the tool allows users
to upload FASTA-format data to populate WEBIN bulk submission forms.
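The figures quoted above (25 or more sequences, one form per 10
sequences) can be checked against a FASTA file before starting. The
sketch below is illustrative only: the function names are hypothetical,
and it counts records simply by looking for lines beginning with ">".
It does not reproduce any WEBIN behaviour.

```python
import math

BULK_THRESHOLD = 25       # "25 or more related sequences"
SEQUENCES_PER_FORM = 10   # "just one form per 10 sequences"


def count_fasta_records(fasta_text: str) -> int:
    """Count FASTA records by counting header lines that start with '>'."""
    return sum(1 for line in fasta_text.splitlines() if line.startswith(">"))


def qualifies_for_bulk(n_sequences: int) -> bool:
    """True when the number of sequences reaches the bulk threshold."""
    return n_sequences >= BULK_THRESHOLD


def bulk_forms_needed(n_sequences: int) -> int:
    """Number of forms at one form per 10 sequences, rounded up."""
    return math.ceil(n_sequences / SEQUENCES_PER_FORM)


if __name__ == "__main__":
    example = ">seq1\nACGT\n>seq2\nGGCC\n"
    n = count_fasta_records(example)
    print(n, qualifies_for_bulk(n), bulk_forms_needed(n))
```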

Alternatively, authors planning to submit very large numbers of similar
sequences should contact the database before submitting the data.
Database staff will then assist in making the submission of this data as
convenient as possible, thus saving the author the time and effort required
to complete numerous submission events individually. When contacting
database staff, authors should indicate the number of sequences they plan
to submit. Database staff will create a series of templates and send
these to the author, who completes them with only the information unique
to each sequence. These templates, once resubmitted, will then be
processed en masse by database curators.

Please contact database staff if you require further information. 

                 e-mail: datasubs@ebi.ac.uk 



8  THIRD PARTY ANNOTATION (TPA) SUBMISSIONS

Following the 2002 Collaborative Meeting, DDBJ/EMBL-Bank/GenBank have been 
building a Third Party Annotation (TPA) dataset. The TPA data-collection 
is a complement to the existing DDBJ/EMBL-Bank/GenBank comprehensive database 
of primary nucleotide sequences, which typically result from direct sequencing
of cDNAs, ESTs, genomic DNAs etc. 

Primary data are defined as data for which the submitting group has done 
the sequencing and annotation, and as 'owner' of these data has privileges 
to submit updates/corrections, etc. In contrast, non-primary sequences are 
defined as sequences which 
a) consist exclusively of DNA from one or several already existing entries 
   'owned' by other groups or 
b) consist of a mixture of new & already existing sequences. 


TPA categories and requirements  

Users can submit re-annotations/re-assemblies of sequences already 
present in DDBJ/EMBL-Bank/GenBank and owned by other groups to be included in the 
Third Party Annotation (TPA) data-collection. 

Categories of data submissions accepted for TPA include:

     a. re-annotation/analysis of sequence(s) from DDBJ/EMBL-Bank/GenBank
     b. mixed primary/non-primary TPA sequence including regions of new and
        existing sequence (e.g. filling the gaps with HTG or EST or newly  
        sequenced data)
     c. TPA sequences based on trace archive data
     d. TPA sequences based on Whole Genome Shotgun (WGS) sequences

Consensus sequences from multiple organisms are not accepted.
 
TPA entries are submitted to DDBJ/EMBL-Bank/GenBank as part of the process of
publishing biological studies that include the annotation of existing
nucleotide sequences in the primary sequence database. Thus, a publicly
accessible TPA record will be linked to a publication that documents that the
data are supported by biological evidence. 

The EBI's submission system WEBIN has been customised to allow submissions
of TPA sequences to EMBL-Bank. WEBIN is available at URL
http://www.ebi.ac.uk/ena/about/page.php?page=submissions.

Third Party Annotation records include mandatory information on the 
composition of the TPA sequence to show which spans in a TPA sequence 
originated from which contributing primary sequences.
     
a) TPA-SPAN             base span on TPA sequence  
b) PRIMARY_IDENTIFIER   acc.version of contributing EMBL-Bank sequence(s) or
                        trace identifier for Trace Archive sequence(s)
c) PRIMARY_SPAN         base span on contributing EMBL-Bank primary sequence
                        or not_available for Trace Archive sequence(s)
d) COMP                 'c' is used to indicate that contributing sequence
                        originates from complementary strand in primary
                        entry


Example:

AH   TPA-SPAN       PRIMARY_IDENTIFIER     PRIMARY_SPAN     COMP
AS   1-426          AC004528.1             18665-19090         
AS   427-526        AC001234.2             1-100            c
AS   527-1000       TI55475028             not_available
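
The AS lines in the example can be read programmatically. The following
is a minimal sketch, assuming the fields are separated by whitespace
exactly as shown above; the names AssemblySpan and parse_as_line are
hypothetical and are not part of any EBI tool.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class AssemblySpan:
    tpa_span: Tuple[int, int]                # base span on the TPA sequence
    primary_identifier: str                  # acc.version or trace identifier
    primary_span: Optional[Tuple[int, int]]  # None when given as not_available
    complement: bool                         # True when the COMP column holds 'c'


def _parse_span(text: str) -> Tuple[int, int]:
    """Convert a span such as '1-426' into a (start, end) pair of integers."""
    start, end = text.split("-")
    return int(start), int(end)


def parse_as_line(line: str) -> AssemblySpan:
    """Parse one whitespace-separated AS line (assumed layout, see lead-in)."""
    fields = line.split()
    if len(fields) not in (4, 5) or fields[0] != "AS":
        raise ValueError(f"not a recognised AS line: {line!r}")
    primary_span = None if fields[3] == "not_available" else _parse_span(fields[3])
    complement = len(fields) == 5 and fields[4] == "c"
    return AssemblySpan(_parse_span(fields[1]), fields[2], primary_span, complement)


if __name__ == "__main__":
    for text in (
        "AS   1-426          AC004528.1             18665-19090",
        "AS   427-526        AC001234.2             1-100            c",
        "AS   527-1000       TI55475028             not_available",
    ):
        print(parse_as_line(text))
```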

TPA sequences are exchanged amongst the DDBJ/EMBL-Bank/GenBank database 
collaboration. The TPA data-collection is available via the 
EBI FTP server at ftp://ftp.ebi.ac.uk/pub/databases/embl/tpa and also 
via the ENA browser at http://www.ebi.ac.uk/ena



9  UPDATING YOUR DATA

Once a database entry has been created from a submission,  a  copy  is
sent  to  the  submitter  for  their  reference  and  for  comments or
corrections.  However, it often happens that the entry is correct when
it is created but, with the passage of time, becomes out of date:  the
authors may make corrections to the sequence itself, or  may  discover
new  features  of  the  sequence.   Since  such findings are often not
published, the only way to keep entries correct and up to date is for
the authors to communicate their new findings to the database. This can
be done by completing the online form (preferred) at
URL http://www.ebi.ac.uk/embl/webin/update.html or by e-mail to
update@ebi.ac.uk, citing relevant accession numbers.

Citation Updates: 
One type of update which merits separate mention is that  relating  to
citations.   Most  submissions  represent  data that have not yet been
accepted for publication, and therefore a full  journal  citation  for
the  data  is  not  available  when the entry is created.  Adding this
information at a later date requires that the database staff  identify
which  submissions correspond to which publications.  This task is not
always straightforward, for instance, if the accession number  is  not
included  in  the  article, or if the submitted and the published data
are not identical.  We therefore urge researchers to let us know  when
and where data they have submitted to us are published, and to include
relevant accession numbers in such publications.



APPENDIX I.  EBI WWW, E-mail and FTP Servers


    (a)  WWW Server

         http://www.ebi.ac.uk/ena

    (b)  E-mail Server


         Users with Internet access can obtain copies of database
         entries by sending an e-mail to a file server at the EBI.
         To use  this  facility,  send  file  server  commands  in  an
         electronic  mail message  to  the address  NetServ@EBI.AC.UK.
         (Please do not use the Datasubs@Ebi.Ac.Uk  address  for this).
         Each line of the mail message should consist of a single file
         server command, and nothing else.


         The most important file server command, to get users started,
         is  HELP.   If the file server receives this command, it will
         return a help file to the sender, explaining in  some  detail
         how to use the facility.

         To request help information the mail message  should  contain
         the following command:

                     HELP

         For  those  requiring  software,  an  extra   message,   HELP
         SOFTWARE,  will provide relevant information for installation
         of the programs.

         Users can  also  request  specific  sequences  via  the  File
         Server.   Information  on  how  to do this is provided in the
         HELP file.
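
         A request to the file server is a plain mail message with one
         command per line. The sketch below only builds such a message
         and does not send it; the function name and the sender address
         are hypothetical, and whether the service still answers at the
         address given above is not verified here.

```python
from email.message import EmailMessage
from typing import Sequence

FILE_SERVER_ADDRESS = "NetServ@EBI.AC.UK"  # address quoted in this document


def build_file_server_request(sender: str, commands: Sequence[str]) -> EmailMessage:
    """Build a mail message whose body holds one file server command per line."""
    for command in commands:
        if not command.strip() or "\n" in command:
            raise ValueError(f"each command must be a single non-empty line: {command!r}")
    message = EmailMessage()
    message["From"] = sender
    message["To"] = FILE_SERVER_ADDRESS
    # The body contains the commands and nothing else, as the text requires.
    message.set_content("\n".join(commands) + "\n")
    return message


if __name__ == "__main__":
    request = build_file_server_request("user@example.org", ["HELP"])
    print(request)
```

         Sending the message (for example with smtplib) depends on the
         local mail setup and is left out.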

    (c)  FTP Server

         EBI has an anonymous FTP server operational at  the  Internet
         address FTP.EBI.AC.UK.

         Users should log in with the username  "anonymous",  and  for
         the password give their email address.
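
         The anonymous login described above can be scripted. This is a
         sketch using Python's standard ftplib module; the function
         names are hypothetical, the e-mail address is a placeholder,
         and the example directory is the TPA path quoted in section 8.
         Nothing here confirms that the server is currently reachable.

```python
from ftplib import FTP
from typing import Dict, List

FTP_HOST = "ftp.ebi.ac.uk"  # host name given in this document


def anonymous_login_args(email_address: str) -> Dict[str, str]:
    """Credentials for anonymous FTP: user 'anonymous', e-mail as password."""
    return {"user": "anonymous", "passwd": email_address}


def list_directory(email_address: str, path: str = "/",
                   host: str = FTP_HOST, timeout: float = 30.0) -> List[str]:
    """Log in anonymously and return the names found under path."""
    with FTP(host, timeout=timeout) as ftp:
        ftp.login(**anonymous_login_args(email_address))
        return ftp.nlst(path)


if __name__ == "__main__":
    # Requires network access; the directory is the one quoted in section 8.
    for name in list_directory("user@example.org", "/pub/databases/embl/tpa"):
        print(name)
```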






APPENDIX II.  HOW TO CONTACT THE NUCLEOTIDE SEQUENCE DATABASE


EMBL-Bank:

    (a)  Email:    datasubs@ebi.ac.uk (for data submissions and
         enquiries); update@ebi.ac.uk (for updates and notification
         of publication)

    (b)  Postal  address:   European Nucleotide Archive Submissions,
         European  Bioinformatics  Institute,  Wellcome  Trust  Genome
         Campus, Hinxton, Cambridge CB10 1SD, UK.

    (c)  Telephone:  +44-1223-494444 (general)
                     +44-1223-494499 (submissions)

    (d)  Telefax:    +44-1223-494468 (general)
                     +44-1223-494472 (submissions)



Last modified: 17-SEP-2010
