Direct submissions of BARCODE data to EMBL Nucleotide Sequence Database

Introduction

Responsibility for presentation of BARCODE sequence and its annotation lies with the member databases of the International Nucleotide Sequence Database Collaboration (INSDC, www.insdc.org), made up of DDBJ, EMBL Nucleotide Sequence Database and GenBank. Submissions of BARCODE data can be routed through INSDC member databases and CBOL.

The EMBL Nucleotide Sequence Database (EMBL-Bank, www.ebi.ac.uk/embl) has as its remit the collection and presentation of all nucleotide sequence and annotation in the public domain. To support collection, EMBL-Bank provides a range of tools and services to facilitate submission of sequence and annotation data. The web application, Webin (www.ebi.ac.uk/embl/Submission/webin.html), serves as a portal for submission of a wide variety of data types. Submissions received through Webin are processed rapidly by the EMBL-Bank curation team. Once all of the data required for completion of processing have been provided, EMBL-Bank returns database accession numbers within 2 working days for small-scale submissions (fewer than 25 entries) and within 5 working days for large-scale submissions. Typically, though, processing is completed more quickly than this.

This document describes specific adaptations implemented in Webin that allow rapid and easy submission of data from BARCODING projects. Screen shots are shown as examples of BARCODE submission of the mitochondrial cytochrome oxidase subunit I coding region, but submissions will be extended to other BARCODE loci as they are introduced. Multi-locus BARCODE data, where linked sequence from multiple loci is derived from each specimen, are supported.
Custom web forms and file upload

For a variety of large-scale submissions, Webin is able to distinguish fields that are common to all entries from fields that vary between entries; in most large-scale studies, such as sequencing of EST libraries, BARCODE and cDNA libraries, few fields vary between entries and many are common to the entire set. On the principle that common fields need only be collected once, users are provided with forms and tools to submit a representative sample and to indicate which fields will vary between entries (figure I). From this information, a member of the curation team generates a template, which instructs the web application to present custom web forms in which the user uploads variable field data (figure II). A web form view and a submission overview (figure III) are available for users to check and edit fields. When the user has completed submission of variable data, the curation team generates the appropriate number of complete database entries for loading and distribution.

Figure I. Submission of representative sample: Pages are provided for submitter information, submission of sequence for the representative sample, submission of sample source details, submission of sample citation information, annotation of biological features, flatfile summary and submission of the list of fields that will vary between entries. The figure shows the a) Submitter details, b) Sequence source, c) Literature citation and d) Flatfile summary pages.
Figure II. Custom web forms for variable data upload
Figure III. Submission overview
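
The split between common and variable fields described above can be sketched in a few lines of Python. This is an illustration of the principle only, not Webin's implementation: the function name `expand_entries`, the field names and the specimen values are all hypothetical and do not come from the EMBL-Bank forms.

```python
from typing import Dict, List


def expand_entries(
    common: Dict[str, str],
    variable_fields: List[str],
    variable_rows: List[Dict[str, str]],
) -> List[Dict[str, str]]:
    """Combine the fields shared by all entries with each row of per-entry values."""
    entries = []
    for number, row in enumerate(variable_rows, start=1):
        missing = [name for name in variable_fields if name not in row]
        if missing:
            raise ValueError(f"row {number} is missing variable fields: {missing}")
        entry = dict(common)  # copy, so the shared values are never mutated
        entry.update({name: row[name] for name in variable_fields})
        entries.append(entry)
    return entries


# Hypothetical field names and values, for illustration only.
common = {"gene": "COI", "organelle": "mitochondrion"}
variable_fields = ["organism", "specimen_voucher", "sequence"]
rows = [
    {"organism": "Species one", "specimen_voucher": "VOUCHER-001", "sequence": "ACGTACGT"},
    {"organism": "Species two", "specimen_voucher": "VOUCHER-002", "sequence": "ACGTTCGT"},
]

entries = expand_entries(common, variable_fields, rows)
for entry in entries:
    print(entry)
```

Each row carries only the values that differ, so the common fields are entered once no matter how many entries the submission contains.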
Pre-determined fields for BARCODE data

For CBOL approval, each BARCODE sequence record must include a number of mandatory fields and a number of fields that are strongly recommended (see tables Ia and Ib). Since this list of fields is the same for BARCODE entries from all BARCODE projects, EMBL-Bank submission procedures have been adapted so that any request for web form submission of BARCODE data alerts curators to the list of mandatory and recommended fields. The submitter therefore need not describe all of these fields in order for them to be presented in the web forms. Curators will apply mandatory BARCODE fields and suggest recommended BARCODE fields in web forms. BARCODE fields may be common to all entries in a submission or may vary between entries, so where this is not clear from the representative sample submission, curators will liaise with submitters before creating the custom web forms.

BARCODE fields

Table Ia. Mandatory fields
Table Ib. Recommended fields
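
Submitters who prepare records locally may want to check them against the two field lists before upload. The following is a minimal sketch under stated assumptions: the function `check_fields` is hypothetical, and the field names are placeholders that are not taken from tables Ia and Ib, so they should be replaced with the actual lists.

```python
from typing import Dict, Iterable, List, Tuple


def check_fields(
    record: Dict[str, str],
    mandatory: Iterable[str],
    recommended: Iterable[str],
) -> Tuple[List[str], List[str]]:
    """Return (missing mandatory fields, missing recommended fields) for one record."""

    def absent(names: Iterable[str]) -> List[str]:
        # A field counts as absent if it is not present or holds only whitespace.
        return [name for name in names if not str(record.get(name, "")).strip()]

    return absent(mandatory), absent(recommended)


# Placeholder names only; substitute the fields listed in tables Ia and Ib.
MANDATORY = ["mandatory_field_a", "mandatory_field_b"]
RECOMMENDED = ["recommended_field_a"]

record = {"mandatory_field_a": "some value", "mandatory_field_b": "  "}
errors, warnings = check_fields(record, MANDATORY, RECOMMENDED)

if errors:
    print("Missing mandatory fields:", ", ".join(errors))
if warnings:
    print("Missing recommended fields:", ", ".join(warnings))
```

Keeping the two results separate mirrors the distinction in the text: a missing mandatory field blocks a record, while a missing recommended field only merits a prompt.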
Alternative variable field data entry routes

While the custom web forms provide a suitable tool for the entry of medium-scale variable field data sets, they become less and less viable as a submission option as the number of sequences grows. For this reason, the custom web forms allow upload of variable field information in FASTA format (figure IV). Once data have been uploaded, users can switch between the FASTA upload, the web form view and the overview at will to make minor edits.

Figure IV. FASTA upload
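
As a rough illustration of what generating such a file involves, the sketch below writes one FASTA record per entry with the variable fields carried on the header line. The header layout that Webin expects is determined by the curator-generated template and is not described here, so the pipe-delimited `key=value` layout, the function name `fasta_record` and the field names are all assumptions made for the example.

```python
from typing import Dict


def fasta_record(entry_id: str, fields: Dict[str, str], sequence: str, width: int = 60) -> str:
    """Format one entry as a FASTA record with the sequence wrapped at `width` characters."""
    # Assumed header layout: >id|key=value|key=value (illustrative only).
    header = ">" + "|".join([entry_id] + [f"{key}={value}" for key, value in fields.items()])
    lines = [sequence[start:start + width] for start in range(0, len(sequence), width)]
    return "\n".join([header] + lines)


# Hypothetical entry; identifiers and field names are placeholders.
record = fasta_record(
    "entry1",
    {"organism": "Species one", "specimen_voucher": "VOUCHER-001"},
    "ACGT" * 20,
)
print(record)
```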
FASTA format is widely supported by a number of tools, including many that are open source. However, a certain degree of bioinformatic expertise may be required to generate FASTA files for upload, so EMBL-Bank is able to accept BARCODE submissions in a number of alternative systematic formats. Most are suitable, as long as they can easily be converted to a systematic text format. A number of users choose to submit data in Microsoft Excel spreadsheet format, for example (figure V). Users with formats other than FASTA are advised to indicate at the time of representative sample submission that they intend to upload variable field data in their specific format.

Figure V. Spreadsheet upload
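
To show why a spreadsheet is easy to convert to a systematic text format, the sketch below reads a CSV export of a spreadsheet into one dictionary per entry and rejects rows whose column count does not match the header. It assumes the spreadsheet has been saved as CSV with a header row; the function name `read_rows` and the column names are hypothetical.

```python
import csv
import io
from typing import Dict, List


def read_rows(csv_text: str) -> List[Dict[str, str]]:
    """Parse CSV text with a header row into one dictionary per non-blank row."""
    reader = csv.reader(io.StringIO(csv_text))
    header = [name.strip() for name in next(reader)]
    rows = []
    for number, values in enumerate(reader, start=2):
        values = [value.strip() for value in values]
        if not any(values):
            continue  # skip blank rows, which spreadsheets often leave behind
        if len(values) != len(header):
            raise ValueError(
                f"record {number} has {len(values)} columns, expected {len(header)}"
            )
        rows.append(dict(zip(header, values)))
    return rows


# Hypothetical column names and values, for illustration only.
csv_text = (
    "specimen_voucher,organism\n"
    "VOUCHER-001,Species one\n"
    "\n"
    "VOUCHER-002,Species two\n"
)
rows = read_rows(csv_text)
for row in rows:
    print("\t".join(row.values()))
```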