EMBL - Information for Submitters
1 INTRODUCTION
Submission of sequence information to EMBL-Bank prior to publication
has become standard practice. The database assigns a unique accession number
that permanently identifies the submitted sequence. The database
accession number should be included in the manuscript, preferably on the first
page of the journal article, or as required by individual journal procedures.
This procedure ensures availability and distribution of new sequence data in a
timely fashion.
Note: It is only necessary to submit to one database, without regard
to where the sequence will be published. Data are exchanged between
EBI, NCBI and DDBJ on a daily basis.
2 CHECKING SEQUENCES FOR VECTOR CONTAMINATION
To assist submitters, the EBI provides a vector screening service using
the latest implementation of the BLAST algorithm and a special sequence
databank known as EMVEC. EMVEC is an extraction of sequences from the
SYNthetic division of EMBL-Bank containing more than 2000 sequences commonly
used in cloning and sequencing experiments. EMVEC is by no means a complete
vector databank but it is representative of the kind of material used in
modern sequencing. The databank will be updated with each release of EMBL-Bank
and made publicly available on the EBI's FTP (ftp.ebi.ac.uk) server.
The interactive WWW service can be found at:
http://www.ebi.ac.uk/blastall/vectors.html
3 HOW TO SUBMIT DATA TO EMBL-BANK
3.1 Webin - WWW Nucleotide Sequence Submissions
http://www.ebi.ac.uk/ena/about/page.php?page=submissions
Webin is the EBI's preferred tool for the submission of nucleotide
sequence data to EMBL-Bank. The user is guided through a series of WWW forms
allowing submission of sequence data and descriptive information in an
interactive and easy way. All the information required to create a database
entry will be collected during the submission process:
1 Submitter Information
2 Release Date Information
3 Sequence Data, Description and Source Information
4 Reference Citation Information
5 Feature Information (e.g. coding regions, regulatory signals)
3.2 Sequin
http://www.ebi.ac.uk/Sequin/index.html
Sequin is a multi-platform (Mac/PC/Unix) stand-alone software tool
developed by the NCBI for submitting entries to the EMBL-Bank,
GenBank, or DDBJ sequence databases. The Sequin program, together with
detailed downloading and installation instructions and general
information, is available from the EBI via WWW browser, anonymous FTP and
the file server.
4 WHAT TO SUBMIT TO EMBL-BANK
WWW - Data submitted via the WWW submission system will contain the
required components, although we may contact the author concerning
details.
Sequin - Sequin output in computer-readable form, generated by selecting
the 'Prepare Submission' menu option and sent by electronic mail.
5 HOW LONG WILL IT TAKE TO GET AN ACCESSION NUMBER?
The time taken to process a submission depends on its complexity and on
its place in the submission queue. If you
are submitting data to provide a journal with accession numbers, we
strongly suggest allowing at least 7 working days. We
will process data submissions and release accessions as soon as possible,
pending review by an ENA database curator. If further information is
required, the curator will make contact in order to resolve the issue.
6 DATA CONFIDENTIALITY AND RELEASE DATES
Authors will be asked whether their submitted data can be made
available to the public immediately or whether they should be withheld
until an author-specified date.
7 BULK SUBMISSIONS
For researchers wishing to submit 25 or more related sequences (e.g. the
same gene sequenced in a large number of different organisms), WEBIN offers
a bulk submission procedure. This alternative path through WEBIN allows
submitters to create one representative sequence entry. The submitter tells
EMBL-Bank curators which features of the entry differ between sequences,
and minimal template WEBIN forms are then customised to fit the exact
requirements of that particular set of sequences. The bulk procedure is
highly efficient and less time consuming for the submitter, who no longer
has to duplicate information. The procedure also ensures that EMBL-Bank
curators process related data together and consistently. Because there are
fewer forms, just one form per 10 sequences, bulk submissions are also
much faster over slow networks. A recent addition to the tool allows users
to upload FASTA-format data to populate WEBIN bulk submission forms.
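As a rough illustration of the figures quoted above (25 or more related sequences for the bulk procedure, one form per 10 sequences), the sketch below reads FASTA-format text and estimates how many forms a set of sequences would need. The function names and the returned dictionary are illustrative assumptions; this is not part of WEBIN and does not reproduce its validation.

```python
import math

BULK_THRESHOLD = 25      # from the text: bulk procedure offered for 25 or more sequences
SEQUENCES_PER_FORM = 10  # from the text: one form per 10 sequences


def read_fasta(text):
    """Return a list of (header, sequence) tuples from FASTA-format text."""
    records = []
    header, chunks = None, []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def bulk_summary(records):
    """Summarise a record list against the thresholds quoted in the text."""
    count = len(records)
    return {
        "sequences": count,
        "bulk_eligible": count >= BULK_THRESHOLD,
        "forms": math.ceil(count / SEQUENCES_PER_FORM),
    }
```

Counting records before starting a submission is only a convenience; the database staff remain the authority on whether a set of sequences suits the bulk procedure.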
Alternatively, authors planning to submit very large numbers of similar
sequences should contact the database before submitting the data.
Database staff will then assist in making the submission of this data as
convenient as possible, thus saving the author the time and effort required
to complete numerous submission events individually. When contacting
database staff, authors should indicate the number of sequences they plan
to submit. Database staff will create a series of templates and send
these to the author, who completes them with just the information unique to
each sequence. These templates, once resubmitted, will then be
processed en masse by database curators.
Please contact database staff if you require further information.
e-mail: datasubs@ebi.ac.uk
8 THIRD PARTY ANNOTATION (TPA) SUBMISSIONS
Following the 2002 Collaborative Meeting, DDBJ/EMBL-Bank/GenBank have been
building a Third Party Annotation (TPA) dataset. The TPA data-collection
is a complement to the existing DDBJ/EMBL-Bank/GenBank comprehensive database
of primary nucleotide sequences, which typically result from direct sequencing
of cDNAs, ESTs, genomic DNAs etc.
Primary data are defined as data for which the submitting group has done
the sequencing and annotation, and as 'owner' of these data has privileges
to submit updates/corrections, etc. In contrast, non-primary sequences are
defined as sequences which
a) consist exclusively of DNA from one or several already existing entries
'owned' by other groups or
b) consist of a mixture of new and already existing sequences.
TPA categories and requirements
Users can submit re-annotations/re-assemblies of sequences already
present in DDBJ/EMBL-Bank/GenBank and owned by other groups to be included in the
Third Party Annotation (TPA) data-collection.
Categories of data submissions accepted for TPA include:
a. re-annotation/analysis of sequence(s) from DDBJ/EMBL-Bank/GenBank
b. mixed primary/non-primary TPA sequence including regions of new and
existing sequence (e.g. filling the gaps with HTG or EST or newly
sequenced data)
c. TPA sequences based on trace archive data
d. TPA sequences based on Whole Genome Shotgun (WGS) sequences
Consensus sequences from multiple organisms are not accepted.
TPA entries are submitted to DDBJ/EMBL-Bank/GenBank as part of the process of
publishing biological studies that include the annotation of existing
nucleotide sequences in the primary sequence database. Thus, a publicly
accessible TPA record will be linked to a publication that documents that the
data are supported by biological evidence.
The EBI's submission system WEBIN has been customised to allow submissions
of TPA sequences to EMBL-Bank. WEBIN is available at URL
http://www.ebi.ac.uk/ena/about/page.php?page=submissions.
Third Party Annotation records include mandatory information on the
composition of the TPA sequence to show which spans in a TPA sequence
originated from which contributing primary sequences.
a) TPA-SPAN            base span on TPA sequence
b) PRIMARY_IDENTIFIER  acc.version of contributing EMBL-Bank sequence(s) or
                       trace identifier for Trace Archive sequence(s)
c) PRIMARY_SPAN        base span on contributing EMBL-Bank primary sequence
                       or not_available for Trace Archive sequence(s)
d) COMP                'c' is used to indicate that contributing sequence
                       originates from complementary strand in primary
                       entry
Example:
AH   TPA-SPAN    PRIMARY_IDENTIFIER   PRIMARY_SPAN    COMP
AS   1-426       AC004528.1           18665-19090
AS   427-526     AC001234.2           1-100           c
AS   527-1000    TI55475028           not_available
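To make the column layout of the example concrete, here is a minimal Python sketch that splits an AS line into the four fields described above. The class name, the function names and the span-length comparison are assumptions made for illustration. In particular, the length comparison is not a stated database rule, although the spans in the example do agree in length.

```python
from typing import NamedTuple, Optional, Tuple


class AssemblySpan(NamedTuple):
    tpa_span: Tuple[int, int]                # TPA-SPAN as (start, end)
    primary_identifier: str                  # acc.version or trace identifier
    primary_span: Optional[Tuple[int, int]]  # None when "not_available"
    complement: bool                         # True when COMP is 'c'


def _parse_span(text):
    start, end = text.split("-")
    return int(start), int(end)


def parse_as_line(line):
    """Parse one whitespace-separated AS line into an AssemblySpan."""
    fields = line.split()
    if len(fields) not in (4, 5) or fields[0] != "AS":
        raise ValueError(f"not an AS line: {line!r}")
    primary_span = None
    if fields[3] != "not_available":
        primary_span = _parse_span(fields[3])
    complement = len(fields) == 5 and fields[4] == "c"
    return AssemblySpan(_parse_span(fields[1]), fields[2], primary_span, complement)


def lengths_match(span):
    """Illustrative check only: compare TPA and primary span lengths.

    Returns True when no primary span is given, since nothing can be compared.
    """
    if span.primary_span is None:
        return True
    tpa_len = span.tpa_span[1] - span.tpa_span[0] + 1
    primary_len = span.primary_span[1] - span.primary_span[0] + 1
    return tpa_len == primary_len
```

Applied to the three AS lines of the example, the parser yields one record per contributing sequence, with the Trace Archive line carrying no primary span.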
TPA sequences are exchanged amongst the DDBJ/EMBL-Bank/GenBank database
collaboration. The TPA data-collection is available via the
EBI FTP server at ftp://ftp.ebi.ac.uk/pub/databases/embl/tpa and also
via the ENA browser at http://www.ebi.ac.uk/ena
9 UPDATING YOUR DATA
Once a database entry has been created from a submission, a copy is
sent to the submitter for their reference and for comments or
corrections. However, it often happens that the entry is correct when
it is created but, with the passage of time, becomes out of date: the
authors may make corrections to the sequence itself, or may discover
new features of the sequence. Since such findings are often not
published, the only way to keep entries correct and up to date is if
the authors communicate their new findings to the database. This can
be done by completing the online form (preferred) at
URL http://www.ebi.ac.uk/embl/webin/update.html or by e-mail to
update@ebi.ac.uk, citing relevant accession numbers.
Citation Updates:
One type of update which merits separate mention is that relating to
citations. Most submissions represent data that have not yet been
accepted for publication, and therefore a full journal citation for
the data is not available when the entry is created. Adding this
information at a later date requires that the database staff identify
which submissions correspond to which publications. This task is not
always straightforward, for instance, if the accession number is not
included in the article, or if the submitted and the published data
are not identical. We therefore urge researchers to let us know when
and where data they have submitted to us are published, and to include
relevant accession numbers in such publications.
APPENDIX I. EBI WWW, E-mail and FTP Servers
(a) WWW Server
http://www.ebi.ac.uk/ena
(b) E-mail Server
Users with Internet access can obtain copies of database
entries by sending an e-mail to a file server at the EBI.
To use this facility, send file server commands in an
electronic mail message to the address NetServ@EBI.AC.UK.
(Please do not use the datasubs@ebi.ac.uk address for this.)
Each line of the mail message should consist of a single file
server command, and nothing else.
The most important file server command, to get users started,
is HELP. If the file server receives this command, it will
return a help file to the sender, explaining in some detail
how to use the facility.
To request help information the mail message should contain
the following command:
HELP
For those requiring software, an additional command, HELP
SOFTWARE, will return information on installing the
programs.
Users can also request specific sequences via the File
Server. Information on how to do this is provided in the
HELP file.
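The message format described above (one file server command per line, and nothing else) can be sketched with Python's standard email library. The function name is an assumption, the sketch only builds the message, and it presumes the file server still operates as described here; sending it (for example with smtplib) is left to the reader's own mail setup.

```python
from email.message import EmailMessage

FILE_SERVER = "NetServ@EBI.AC.UK"  # address given in the text


def build_file_server_request(sender, commands):
    """Build a mail message whose body holds one command per line."""
    for command in commands:
        # Each line must consist of a single command and nothing else.
        if not command.strip() or "\n" in command or "\r" in command:
            raise ValueError(f"invalid file server command: {command!r}")
    message = EmailMessage()
    message["From"] = sender
    message["To"] = FILE_SERVER
    message.set_content("\n".join(c.strip() for c in commands) + "\n")
    return message


# Example: request the general help file and the software help file.
request = build_file_server_request("user@example.org", ["HELP", "HELP SOFTWARE"])
```

The sender address in the example is a placeholder; the help file is returned to whichever address the message is actually sent from.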
(c) FTP Server
EBI has an anonymous FTP server operational at the Internet
address FTP.EBI.AC.UK.
Users should log in with the username "anonymous" and give
their e-mail address as the password.
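The anonymous login described above can be sketched with Python's standard ftplib module. The function names are assumptions, the default directory is the TPA path quoted in section 8, and whether the server still accepts connections in this form is not something this sketch can guarantee.

```python
from ftplib import FTP

FTP_HOST = "ftp.ebi.ac.uk"            # host named in the text
TPA_DIR = "/pub/databases/embl/tpa"   # TPA directory quoted in section 8


def anonymous_credentials(email):
    """Return the (username, password) pair for an anonymous login."""
    if "@" not in email:
        raise ValueError("the password should be the user's e-mail address")
    return "anonymous", email


def list_directory(email, path=TPA_DIR, host=FTP_HOST, timeout=30):
    """Log in anonymously and return the file names found under path."""
    user, password = anonymous_credentials(email)
    with FTP(host, timeout=timeout) as ftp:
        ftp.login(user=user, passwd=password)
        return ftp.nlst(path)
```

Calling list_directory("user@example.org") would open a network connection, so it is best tried interactively rather than as part of an automated run.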
APPENDIX II. HOW TO CONTACT THE NUCLEOTIDE SEQUENCE DATABASE
EMBL-Bank:
(a) Email: datasubs@ebi.ac.uk (for data submissions and
enquiries); update@ebi.ac.uk (for updates and notification
of publication)
(b) Postal address: European Nucleotide Archive Submissions,
European Bioinformatics Institute, Wellcome Trust Genome
Campus, Hinxton, Cambridge CB10 1SD, UK.
(c) Telephone: +44-1223-494444 (general)
+44-1223-494499 (submissions)
(d) Telefax: +44-1223-494468 (general)
+44-1223-494472 (submissions)
Last modified: 17-SEP-2010