
EMBL - DE line Standards

EMBL annotation examples

Simple DE lines:
DNA * mRNA * Partial sequences * Exons, clones, strains and isolates

Complex DE lines:
Naturally occurring plasmids * Organelles * Subspecies * Artificial sequences * Regulatory regions *
Hypothetical proteins * ESTs * Transposons and insertion elements * Satellite * Operons

Miscellaneous DE lines:
rRNA * Complete genomes * RNA viruses * Immunogenetic molecules

Simple DE lines

DNA

The product name should be added after the gene name if there are fewer than three genes.

  • Homo sapiens GGT1 gene for gamma-glutamyltransferase 1
  • Salmonella typhimurium rpsG gene for ribosomal protein S7 and fusA gene for elongation factor G
  • Homo sapiens TSC2, NTHL1 and SLC9A3R2 genes

The gene name alone should be used when the product name is the same as the gene name or when the full product name is not known.

  • Human immunodeficiency virus type 1 pol gene
  • Thermus thermophilus partial uvrA gene
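
The two DNA rules above can be sketched in a few lines of Python. This is only an illustration: the function name `dna_de_line` and the `(gene, product)` pair layout are assumptions made for this example, not part of any EMBL tool. A product of `None` stands for "product unknown or identical to the gene name".

```python
def dna_de_line(organism, genes):
    """Build a DE line for a DNA entry (illustrative sketch only).

    genes: list of (gene_name, product_name) pairs. product_name is None
    when the product is unknown or the same as the gene name.
    """
    if len(genes) < 3:
        # Fewer than three genes: add the product name after each gene name.
        parts = []
        for gene, product in genes:
            if product:
                parts.append(f"{gene} gene for {product}")
            else:
                parts.append(f"{gene} gene")
        return f"{organism} " + " and ".join(parts)
    # Three or more genes: list the gene names only.
    names = [gene for gene, _ in genes]
    return f"{organism} {', '.join(names[:-1])} and {names[-1]} genes"


print(dna_de_line("Homo sapiens", [("GGT1", "gamma-glutamyltransferase 1")]))
print(dna_de_line("Homo sapiens",
                  [("TSC2", None), ("NTHL1", None), ("SLC9A3R2", None)]))
```

The handling of partial genes is left out here to keep the sketch focused on the gene and product naming rules.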

If the product is exported into an organelle, this should be indicated in the DE line.

  • Capsicum annuum fnr gene for chloroplast ferredoxin-NADP+ oxidoreductase precursor

mRNA

The gene name should be added after the product.

  • Nicotiana tabacum mRNA for protein phosphatase 2A catalytic subunit (ppa2 gene)

If the gene and product names are the same, only the product name needs to be included in the DE line.

  • Thermus thermophilus mRNA for UvrA protein

If the product is exported into an organelle, this should be indicated in the DE line.

  • Capsicum annuum mRNA for chloroplast ferredoxin-NADP+ oxidoreductase precursor (fnr gene)
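
As a rough illustration of the mRNA pattern (product first, gene name in parentheses afterwards), the following Python sketch may help. The name `mrna_de_line` is hypothetical, and the decision that gene and product names are "the same" is left to the caller, who signals it by omitting the gene.

```python
def mrna_de_line(organism, product, gene=None):
    """Build a DE line for an mRNA entry (illustrative sketch only).

    Pass gene=None when the gene and product names are the same, so that
    only the product name appears in the line.
    """
    line = f"{organism} mRNA for {product}"
    if gene:
        # The gene name follows the product, in parentheses.
        line += f" ({gene} gene)"
    return line


print(mrna_de_line("Nicotiana tabacum",
                   "protein phosphatase 2A catalytic subunit", gene="ppa2"))
print(mrna_de_line("Thermus thermophilus", "UvrA protein"))
```

An organelle-targeted product is handled simply by including the organelle in the product string, as in the Capsicum annuum example above.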

Partial sequences

If the sequence represents only one partial gene, the word "partial" is included in the DE line before the gene name.

  • Homo sapiens partial GGT1 gene for gamma-glutamyltransferase 1

If there is more than one gene in the sequence and not all are partial, then the word "partial" is added after the appropriate genes.

  • Homo sapiens TSC2 (partial), NTHL1 and SLC9A3R2 genes
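
The placement of "partial" could be sketched as follows in Python. The function name `gene_phrase` and the `(name, is_partial)` pair layout are assumptions for this example. The text above does not say what to do when every gene in a multi-gene sequence is partial, so the sketch simply marks each partial gene individually in that case.

```python
def gene_phrase(genes):
    """Return the gene part of a DE line (illustrative sketch only).

    genes: list of (gene_name, is_partial) pairs.
    """
    if len(genes) == 1:
        name, is_partial = genes[0]
        # A single partial gene: "partial" goes before the gene name.
        return f"partial {name} gene" if is_partial else f"{name} gene"
    # Several genes: "(partial)" goes after each partial gene.
    labels = [f"{name} (partial)" if is_partial else name
              for name, is_partial in genes]
    return f"{', '.join(labels[:-1])} and {labels[-1]} genes"


print("Homo sapiens " + gene_phrase([("GGT1", True)]))
print("Homo sapiens " + gene_phrase(
    [("TSC2", True), ("NTHL1", False), ("SLC9A3R2", False)]))
```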

Exons, clones, strains and isolates

For groups of sequences, which might be isolated from different strains, or for genes which have multiple exons, source qualifiers and exon numbers should be added at the end of the DE line to enable a unique DE line to be constructed for each sequence.

  • Homo sapiens GGT1 gene for gamma-glutamyltransferase 1, clone GT1a
  • Lactococcus lactis lacZ gene, strain ATCC7962
  • Homo sapiens partial HLA-DRB1 gene for MHC class II antigen, exon 2
  • Homo sapiens partial HLA-DRB1 gene for MHC class II antigen, exons 2 and 3
  • Homo sapiens partial HLA-DRB1 gene for MHC class II antigen, exons 2-4
  • Homo sapiens partial HLA-DRB1 gene for MHC class II antigen, exon 1 and joined CDS
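
The exon wording in the examples above (a single exon, a pair, a range) can be generated mechanically. The Python sketch below is one possible reading: the name `exon_phrase` is hypothetical, and the treatment of non-contiguous sets of three or more exons (a comma-separated list) is an assumption, since the examples do not cover that case.

```python
def exon_phrase(exons):
    """Format exon numbers for the end of a DE line (illustrative sketch)."""
    nums = sorted(set(exons))
    if not nums:
        raise ValueError("at least one exon number is required")
    if len(nums) == 1:
        return f"exon {nums[0]}"
    if len(nums) == 2:
        return f"exons {nums[0]} and {nums[1]}"
    # Unique sorted integers are contiguous when the span equals the count.
    if nums[-1] - nums[0] == len(nums) - 1:
        return f"exons {nums[0]}-{nums[-1]}"
    # Assumption: non-contiguous exons are listed individually.
    listed = ", ".join(str(n) for n in nums[:-1])
    return f"exons {listed} and {nums[-1]}"


base = "Homo sapiens partial HLA-DRB1 gene for MHC class II antigen"
for exons in ([2], [2, 3], [2, 3, 4]):
    print(f"{base}, {exon_phrase(exons)}")
```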

Complex DE lines

Plasmids

Descriptions for naturally occurring plasmids should include the plasmid name and any genes encoded by the sequence.

  • Escherichia coli plasmid pJB6 yiaK, yiaJ, insB, and insA genes

Organelles

The type of organelle (i.e. mitochondrial, chloroplast, plastid) should be included before the gene name or before "mRNA".

  • Rattus norvegicus mitochondrial COI gene for cytochrome oxidase subunit I
  • Rattus norvegicus mitochondrial mRNA for cytochrome oxidase subunit I (COI gene)

Subspecies

  • Homo sapiens neanderthalensis mitochondrial D-loop, hypervariable region I

The abbreviation subsp. should be included for sequences from plants, fungi and bacteria.

  • Eucalyptus globulus subsp. globulus partial 18S rRNA gene
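
A minimal Python sketch of the subspecies naming follows. The function name `organism_name`, the `group` labels and the `USES_SUBSP` set are all hypothetical. The sketch also assumes, based only on the Homo sapiens neanderthalensis example, that organisms outside the three listed groups take the subspecies name without the abbreviation.

```python
# Groups for which the text asks for the "subsp." abbreviation.
USES_SUBSP = {"plant", "fungus", "bacterium"}


def organism_name(genus, species, subspecies=None, group="animal"):
    """Format an organism name for a DE line (illustrative sketch only)."""
    name = f"{genus} {species}"
    if subspecies:
        if group in USES_SUBSP:
            name += f" subsp. {subspecies}"
        else:
            # Assumption drawn from the example: no abbreviation otherwise.
            name += f" {subspecies}"
    return name


print(organism_name("Eucalyptus", "globulus", "globulus", group="plant"))
print(organism_name("Homo", "sapiens", "neanderthalensis"))
```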

Artificial sequences

Cloning vectors, expression vectors, shuttle vectors or oligonucleotides should be described as artificial sequences.

  • Cloning vector pABC
  • Artificial oligonucleotide primer sequence, 1234-AB

Regulatory regions

When the sequence contains regulatory or untranslated regions, such as a promoter, 5'UTR or 3'UTR, these should be added after the gene name.

  • Homo sapiens CLCN6 gene, promoter region

Hypothetical proteins

Sequences which contain no known genes or proteins are described using numbered ORFs and hypothetical proteins.

  • Mus musculus ORF1 and ORF2 DNA for hypothetical proteins, clone CL123

ESTs

Expressed sequence tags and fragments of DNA which contain no features should always have a clone name in order to provide a unique DE line for each sequence submitted.

  • Mus musculus EST, clone cl123-1
  • Rattus norvegicus genomic fragment, clone 123
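
Since the purpose of the clone name is to keep every DE line in a submission unique, a batch can be checked for repeats before submission. The following Python sketch shows one way to do that; `duplicate_de_lines` is a hypothetical helper, not part of any EMBL submission tool.

```python
from collections import Counter


def duplicate_de_lines(de_lines):
    """Return the DE lines that occur more than once, sorted alphabetically."""
    counts = Counter(line.strip() for line in de_lines)
    return sorted(line for line, n in counts.items() if n > 1)


batch = [
    "Mus musculus EST, clone cl123-1",
    "Mus musculus EST, clone cl123-2",
    "Mus musculus EST, clone cl123-1",
]
print(duplicate_de_lines(batch))
```

Any line reported by the check needs a distinguishing clone name or other identifier.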

Transposons and IS elements

Satellite/Microsatellite

Satellite or microsatellite sequences should always have a clone name (or other identifier) in order to provide a unique DE line for each sequence submitted.

  • Drosophila melanogaster satellite DNA, clone p20D5M9
  • Drosophila melanogaster microsatellite DNA, clone pA334-2E

Operons

When all of the gene names in the operon are known, these should be included in the DE line.

  • Streptomyces lividans galactose operon (galTEK genes)

If the genes in the operon are not known, then the DE line should contain the name of the operon.

  • Streptomyces lividans galactose operon

Miscellaneous DE lines

rRNA

When multiple rRNA sequences are submitted, strain or isolate names should be used in order to provide a unique DE line for each sequence.

  • Contracaecum eudyptulae 5.8S rRNA gene, isolate Ceud10
  • Drosophila melanogaster partial 5.8S rRNA gene, strain Oregon R

Complete Genomes

  • Rattus norvegicus complete mitochondrial genome
  • Human immunodeficiency virus type 1 complete proviral genome, isolate 123

RNA viruses

  • Avian influenza virus H5HA gene for hemagglutinin, genomic RNA
  • Avian influenza virus mRNA for hemagglutinin (H5HA gene)

Immunogenetic molecules: DNA

  • Homo sapiens HLA-B gene for MHC class I antigen, B*123 allele, exon 2

Kappa, lambda and light can also be used to distinguish the immunoglobulin chain type.

  • Homo sapiens IGHV gene for immunoglobulin heavy chain variable region, exon 2

Immunogenetic molecules: mRNA

  • Homo sapiens mRNA for MHC class I antigen (HLA-B gene), B*123 allele
  • Homo sapiens mRNA for immunoglobulin heavy chain variable region (IGHV gene)
  • Synthetically constructed mRNA for Homo sapiens/Mus musculus chimeric immunoglobulin heavy chain variable region
