Locus tags

Locus tags are identifiers applied systematically to every gene in a genome in the context of sequencing projects. The biological community has come to use these tags as surrogate gene names. If the submitters of two different genomes use the same systematic names for two very different genes, the result can be very confusing. To prevent this, INSDC has created a registry of locus tag prefixes. Submitters of eukaryotic and prokaryotic genomes should register a locus tag prefix before submitting their genome annotation to ENA.

Locus tag prefixes can be registered in Webin at the time of project registration. All genome assembly and annotation projects are required to register a sequencing project. If, during the submission process, the project is declared to contain functional annotation, the user will be prompted to register a locus tag prefix. Users can either select their preferred locus tag prefix (subject to availability) or have ENA assign one automatically from one of the following ranges: BN1-BN9999, BQ1-BQ9999 or CZ1-CZ9999.

The locus tag prefix is separated from the tag value by an underscore ‘_’ (e.g. /locus_tag="BN5_00001").
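As a minimal sketch of this prefix/value convention, the following Python function splits a locus tag at the first underscore. The function name split_locus_tag is hypothetical and not part of any ENA tool. It only enforces what is stated here (a non-empty prefix and a non-empty tag value separated by an underscore) and does not check the prefix against the INSDC registry.

```python
def split_locus_tag(locus_tag: str) -> tuple[str, str]:
    """Split a locus tag into (prefix, tag value) at the first underscore.

    Hypothetical helper: it checks only the <prefix>_<value> shape and
    knows nothing about which prefixes are actually registered.
    """
    prefix, sep, value = locus_tag.partition("_")
    if not sep or not prefix or not value:
        raise ValueError(f"expected <prefix>_<value>, got {locus_tag!r}")
    return prefix, value


print(split_locus_tag("BN5_00001"))  # ('BN5', '00001')
```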

Locus tags should be assigned to all protein-coding genes and to non-coding genes such as structural RNAs. Within a genome project submission, /locus_tag should appear on gene, mRNA, CDS, 5'UTR, 3'UTR, intron, exon, tRNA, rRNA, misc_RNA, etc. We discourage the use of the /locus_tag qualifier on repeat_region and misc_feature features in the context of complete genome annotation. The same /locus_tag should be used for all components of a single gene. For example, all of the exon, CDS, mRNA and gene features for a particular gene would have the same /locus_tag. Only one /gene should be associated with any one /locus_tag, i.e. if a /locus_tag is associated with a /gene symbol in any feature, that /gene symbol (and only that /gene symbol) must also be present on every other feature that contains that /locus_tag.
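The one-to-one rule between /locus_tag and /gene can be checked mechanically. The sketch below assumes features have already been parsed into plain dictionaries with a "locus_tag" key and an optional "gene" key. That representation, the function name find_gene_conflicts and the gene symbols abcA and abcB are hypothetical choices for illustration, not an ENA format or validator.

```python
from collections import defaultdict


def find_gene_conflicts(features):
    """Return {locus_tag: gene values} for tags that break the /gene rule.

    A tag is consistent when every feature carrying it has the same /gene
    symbol, or when none of them has a /gene symbol at all.
    """
    genes_by_tag = defaultdict(set)
    for feature in features:
        tag = feature.get("locus_tag")
        if tag is None:
            continue  # features without a /locus_tag are not checked
        # None records "this feature has the tag but no /gene symbol"
        genes_by_tag[tag].add(feature.get("gene"))
    return {tag: genes for tag, genes in genes_by_tag.items() if len(genes) > 1}


features = [
    {"type": "gene", "locus_tag": "BN7_0022", "gene": "abcA"},
    {"type": "CDS", "locus_tag": "BN7_0022", "gene": "abcA"},
    {"type": "gene", "locus_tag": "BN7_0023", "gene": "abcB"},
    {"type": "CDS", "locus_tag": "BN7_0023"},  # /gene missing here
]
print(sorted(find_gene_conflicts(features)))  # ['BN7_0023']
```

Here BN7_0023 is reported because its gene feature carries a /gene symbol while its CDS feature does not.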

Locus tags are added systematically to the genes within a genome and are generally in sequential order along the genome. If a genome center updates a genome and provides additional annotation, the new genes can be assigned the next available sequential /locus_tag. Alternatively, the submitter can leave gaps when initially assigning /locus_tags and later give new annotation tag values that fall within those gaps.


Incremental /locus_tags
Original submission    Revised submission
BN7_0022               BN7_0022
-                      BN7_4568 (new gene)
BN7_0023               BN7_0023


Gaps in original /locus_tags
Original submission    Revised submission
BN7_0020               BN7_0020
-                      BN7_0021 (new gene)
BN7_0030               BN7_0030


Decimal integers
Original submission    Revised submission
BN7_0020               BN7_0020
-                      BN7_0020.1 (new gene)
BN7_0030               BN7_0030
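The gap-based strategy can be sketched in Python as follows. The function names, the step of 10 and the four-digit width are assumptions chosen to match the examples here, not ENA requirements. Any unused value inside a gap is acceptable, and this sketch simply picks the midpoint.

```python
def initial_tags(prefix, count, step=10, width=4):
    """Assign tags in genome order, leaving a gap of `step` between neighbours."""
    return [f"{prefix}_{i * step:0{width}d}" for i in range(1, count + 1)]


def tag_between(left, right):
    """Return a tag midway between two neighbouring tags, or None if no gap.

    Assumes both tags share a prefix and have purely numeric tag values.
    """
    prefix, left_value = left.split("_", 1)
    _, right_value = right.split("_", 1)
    low, high = int(left_value), int(right_value)
    if high - low < 2:
        return None  # no free value; use the next sequential tag instead
    width = len(left_value)
    return f"{prefix}_{(low + high) // 2:0{width}d}"


print(initial_tags("BN7", 3))               # ['BN7_0010', 'BN7_0020', 'BN7_0030']
print(tag_between("BN7_0020", "BN7_0030"))  # BN7_0025
print(tag_between("BN7_0022", "BN7_0023"))  # None
```

When tag_between returns None, as for BN7_0022 and BN7_0023, the new gene would instead take the next sequential tag or a decimal value such as BN7_0022.1.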

It is preferable to use the same numbering convention for all /locus_tags within a project, regardless of whether the gene is a protein-coding gene or a structural RNA, and regardless of which chromosome it lies on.

However, submitters wishing to encode information about chromosome number or RNA type in the /locus_tag value may add this information after the prefix and underscore:

BN7_I00001 for gene 1, chromosome I
BN7_II00001 for gene 1, chromosome II
BN7_r1112 for ribosomal RNA genes
BN7_t1113 for tRNA genes