Minimum information about species barcode nucleotide sequence

The Species BARCODE Data Standard is a biodiversity standard formulated by the Consortium for the Barcode of Life (CBOL) for reporting minimum information about species barcode nucleotide sequences. CBOL specifies requirements for reporting sample provenance and sequence quality, with the aim of creating a reference library of barcode DNA sequences integrated with related biodiversity information such as taxonomy, specimen vouchers and geo-references. Ultimately, DNA barcoding is intended to serve as a global standard for species identification.

The International Barcode of Life project (iBOL) is developing a DNA barcode reference library that will serve as a DNA-based identification system for multicellular life.

The Barcode of Life Data Systems (BOLD) is the central informatics platform for DNA barcoding; it supports the acquisition, storage, analysis and publication of DNA barcode records.

A suitable species barcode marker has to meet several criteria. Ideally, the barcode marker (1) can be easily amplified and sequenced in one read following a standardised protocol, (2) is flanked on both sides by highly conserved regions for reliable primer annealing, and (3) allows organisms to be identified at the species level.

Currently, CBOL approves the following loci as effective barcodes:

  • for metazoa, the cytochrome c oxidase 1 (cox1) gene region
  • for land plants, a two-locus barcode: the ribulose-bisphosphate carboxylase (rbcL) and maturase K (matK) gene regions (with a recommendation to also collect non-coding regions, such as the chloroplast trnH-psbA spacer region)
  • for fungi, the ribosomal internal transcribed spacer (ITS) region
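
As a rough illustration, the list above can be held as a small lookup table, for example to check which marker a submission is expected to carry. This is only a sketch: the dictionary keys, the constant name and the function name are made up for this example and are not CBOL or ENA identifiers.

```python
# Approved barcode loci from the list above, keyed by taxonomic group.
# Keys and names are illustrative assumptions, not official identifiers.
APPROVED_BARCODE_LOCI = {
    "metazoa": ("cox1",),
    "land plants": ("rbcL", "matK"),  # two-locus barcode
    "fungi": ("ITS",),
}


def expected_loci(group):
    """Return the approved barcode loci for a taxonomic group (case-insensitive)."""
    key = group.strip().lower()
    if key not in APPROVED_BARCODE_LOCI:
        raise KeyError(f"no approved barcode locus listed for {group!r}")
    return APPROVED_BARCODE_LOCI[key]


print(expected_loci("Land plants"))  # ('rbcL', 'matK')
```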

INSDC records that meet the criteria of the Species BARCODE Data Standard carry the keyword ‘BARCODE’.
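
One way to test a record for this keyword is to read the keyword lines of its flat file. The sketch below assumes an EMBL-format flat file, in which keywords sit on lines starting with `KW`, are separated by semicolons and end with a full stop. The function name is hypothetical and the record text in the usage example is invented for illustration.

```python
def has_barcode_keyword(flatfile_text):
    """Return True if the KW lines of an EMBL-format record list 'BARCODE'."""
    # Join all keyword lines; the line code 'KW' is followed by three spaces.
    kw_text = " ".join(
        line[5:] for line in flatfile_text.splitlines() if line.startswith("KW   ")
    )
    # Drop the terminating full stop, then split on the ';' separator.
    keywords = [k.strip() for k in kw_text.rstrip(". ").split(";")]
    return "BARCODE" in keywords


# Invented record fragment, for illustration only.
record = "DE   Example cox1 gene fragment\nKW   BARCODE.\n"
print(has_barcode_keyword(record))  # True
```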

The MIMARKS standard includes the Species BARCODE Data Standard, which means that a MIMARKS-compliant dataset is also Species BARCODE compliant.

Species BARCODE data submission

The Species BARCODE reporting requirements are divided into mandatory, highly recommended and optional fields, irrespective of the sequenced marker locus.

A checklist for submitting Species BARCODE sequences of the cytochrome c oxidase 1 (cox1) gene region is available from the Webin submission tool.

Mandatory Species BARCODE checklist

| Field | Description | Example |
|---|---|---|
| Organism name | Formal taxonomic name of this metazoan organism, or informal name if unpublished/unidentified. | Arabidopsis thaliana |
| Bio-repository data | Reference to the physical specimen from which the sequence was obtained (e.g. curated museum collection, living specimen); can be structured or unstructured. | structured: YMUK:12345; unstructured: ABCD-12345 |
| Country | Political name of the country or ocean in which the sequenced sample or isolate was collected. | France, Mediterranean Sea |
| Translation table | Mitochondrial translation table for this organism. Choose between vertebrate (table 2) and invertebrate (table 5) codes. | 2 |
| Codon Start | Required to determine the reading frame: the coordinate of the first base of the first complete codon in the frame to be translated. | 3 |
| Forward Primer Name | Name of the forward direction PCR primer. | ArthFW1 |
| Forward Primer Sequence | Sequences should be given in the IUPAC degenerate-base alphabet, except for modified bases, which must be enclosed in angle brackets. | GACATTGKG<I>T |
| Reverse Primer Name | Name of the reverse direction PCR primer. | ArthRV1 |
| Reverse Primer Sequence | Sequences should be given in the IUPAC degenerate-base alphabet, except for modified bases, which must be enclosed in angle brackets. | CATGRTTAGAC |
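
Two of the mandatory fields can be checked locally before submission. The following sketch shows how the codon start and translation table together determine the protein translation, and how a primer string can be tested against the IUPAC alphabet with angle-bracketed modified bases. The genetic code strings are the NCBI definitions of tables 2 and 5. The function names are hypothetical. Accepting any text inside angle brackets and excluding `U` from primers are simplifying assumptions, not Webin's actual validation rules.

```python
import re

BASES = "TCAG"
# NCBI genetic code strings; codons are ordered TTT, TTC, TTA, TTG, TCT, ...
GENETIC_CODES = {
    2: "FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSS**VVVVAAAADDEEGGGG",  # vertebrate mitochondrial
    5: "FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSSSSVVVVAAAADDEEGGGG",  # invertebrate mitochondrial
}

# IUPAC nucleotide codes, or a modified base enclosed in angle brackets.
# Assumption: any non-bracket text is accepted inside the brackets.
PRIMER_RE = re.compile(r"(?:[ACGTRYSWKMBDHVN]|<[^<>]+>)+", re.IGNORECASE)


def codon_table(table_id):
    """Build a codon -> amino acid mapping for translation table 2 or 5."""
    amino_acids = GENETIC_CODES[table_id]
    codons = [a + b + c for a in BASES for b in BASES for c in BASES]
    return dict(zip(codons, amino_acids))


def translate(seq, codon_start=1, table_id=2):
    """Translate seq, starting at the 1-based coordinate codon_start (1, 2 or 3)."""
    if codon_start not in (1, 2, 3):
        raise ValueError("codon_start must be 1, 2 or 3")
    table = codon_table(table_id)
    frame = seq.upper()[codon_start - 1:]
    # Translate complete codons only; unknown or ambiguous codons become 'X'.
    return "".join(
        table.get(frame[i:i + 3], "X") for i in range(0, len(frame) - 2, 3)
    )


def is_valid_primer(primer):
    """Return True if primer uses only IUPAC codes and <...> modified bases."""
    return PRIMER_RE.fullmatch(primer.strip()) is not None


# AGA is a stop codon in table 2 but codes for serine in table 5.
print(translate("CCATAAGATGA", codon_start=3, table_id=2))  # M*W
print(translate("CCATAAGATGA", codon_start=3, table_id=5))  # MSW
print(is_valid_primer("GACATTGKG<I>T"))                     # True
```

The two translations differ only at the AGA codon, which is why choosing the wrong table for an invertebrate sequence can introduce apparent internal stop codons.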

Highly recommended Species BARCODE checklist

| Field | Description | Example |
|---|---|---|
| Latitude/Longitude | Geographical coordinates of the location where the specimen was collected, in decimal degrees (to 2 places). | 47.94, -12.45 |
| Identified by | The person who identified the organism/sample. | John White |
| Collector | Name of the person who originally collected the sample/organism. | John White |
| Collection Date | Date of collection of the original sample/organism. | 12-Apr-2013 |
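
The coordinate and date formats shown in the examples above can be produced with a few lines of code. This is a minimal sketch: the function names are hypothetical, and the output formats are inferred from the example values in the checklist ("47.94, -12.45" and "12-Apr-2013") rather than from a formal specification.

```python
from datetime import date

# Explicit English month abbreviations, so the output does not depend on locale.
MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]


def format_lat_lon(lat, lon):
    """Format decimal-degree coordinates to 2 decimal places."""
    if not -90 <= lat <= 90:
        raise ValueError("latitude must be between -90 and 90")
    if not -180 <= lon <= 180:
        raise ValueError("longitude must be between -180 and 180")
    return f"{lat:.2f}, {lon:.2f}"


def format_collection_date(d):
    """Format a date as DD-Mon-YYYY, matching the checklist example."""
    return f"{d.day:02d}-{MONTHS[d.month - 1]}-{d.year}"


print(format_lat_lon(47.9412, -12.4461))         # 47.94, -12.45
print(format_collection_date(date(2013, 4, 12)))  # 12-Apr-2013
```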

Optional Species BARCODE checklist

| Field | Description | Example |
|---|---|---|
| Strain Name | Name or identifier of the strain. Often used for mouse and fly lines. | BALB/c |
| Breed | The recognised breed name of the organism. | Friesian Holstein |
| Isolate Name | Name of the individual sample. | MP7 |
| Clone Identifier | Identifier given to each clone in a sequenced library. | lib_1_9 |
| Geographical Area | Political name of the area of the country or ocean in which the sequenced sample or isolate was collected. | North Atlantic Ridge |
| Locality | More specific geographic location where the sequenced material was sourced. Requires 'Geographical Area' to be selected. | Loch Ness |
| Isolation Source | Physical geography of the sampling/isolation site. | rainforest canopy |
| Natural Host | The natural host (scientific name) of the organism from which the sequenced material was taken. | Canis lupus familiaris |
| Developmental Stage | Developmental stage of the organism, either a named stage or a measurement of time. | blastula |
| Cell Type | Cell type from which the sequence was generated. | palisade cell |
| Tissue Type | Tissue type from which the sequence was obtained. | root |
| Sex | Sex of the organism from which the sequence was obtained. | male |
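
The only dependency between optional fields stated in this checklist is that Locality requires Geographical Area. A small pre-submission check for that rule might look like the sketch below. The dictionary-based record and the function name are assumptions made for this example and do not reflect the Webin data model.

```python
def check_locality_dependency(record):
    """Return a list of problems with the Locality / Geographical Area pair."""
    problems = []
    # Locality only makes sense when the broader Geographical Area is given.
    if record.get("Locality") and not record.get("Geographical Area"):
        problems.append("Locality is set but Geographical Area is missing")
    return problems


print(check_locality_dependency({"Locality": "Loch Ness"}))
# ['Locality is set but Geographical Area is missing']
```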
