Variant identifiers

As we saw on the previous page, VCF files include variant identifiers. Identifiers are unique combinations of letters and numbers assigned to known entities, such as genes, variants and proteins. Variant identifiers are particularly useful when searching for information about a variant because they are unambiguous, unique and stable. Descriptive names, by contrast, may be used differently by different people, may be identical between species and may change over time.

A single variant may have identifiers from several databases, and you will see these different types of identifiers used throughout the literature and in other databases. Short variants and structural variants use different types of identifiers. Table 1 lists some common databases and examples of the identifiers they assign.
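Because each database uses its own prefix or layout, the identifiers in Table 1 can often be told apart by their format alone. The Python sketch below guesses an identifier's type, and whether it is used for short or structural variants, by testing it against a list of regular expressions. The ID_PATTERNS list and the classify_variant_id function are hypothetical names for illustration, and the patterns are assumptions inferred only from the single examples in Table 1, not from each database's official format definition (the HGMD pattern in particular is a guess).

```python
import re

# Illustrative patterns inferred only from the example identifiers in Table 1.
# They are assumptions, not the databases' official format definitions, and
# some real identifiers may not fit them.
# re.fullmatch anchors each pattern at both ends, so similar prefixes such as
# "ss" and "essv", or "esv" and "essv", cannot be confused with one another.
ID_PATTERNS = [
    # (pattern, identifier type, kind of variant the identifier is used for)
    (r"ss\d+", "ssID (dbSNP/EVA)", "short"),
    (r"rs\d+", "rsID (dbSNP/EVA)", "short"),
    (r"[A-Za-z0-9_]+(?:\.\d+)?:[cgmnopr]\..+", "HGVS", "short"),
    (r"COSM\d+", "COSMIC ID", "short"),
    (r"C[A-Z]\d+", "HGMD", "short"),  # assumption: "C" + one letter + digits
    (r"RCV\d+", "ClinVar", "short or structural"),  # covers dbSNP and dbVar/DGVa
    (r"VAR_\d+", "UniProt", "short"),
    (r"essv\d+", "DGVa variant call", "structural"),
    (r"nssv\d+", "dbVar variant call", "structural"),
    (r"esv\d+", "DGVa variant region", "structural"),
    (r"nsv\d+", "dbVar variant region", "structural"),
]


def classify_variant_id(identifier: str) -> tuple[str, str]:
    """Guess (identifier type, variant kind) from the shape of an identifier.

    Only the string's format is checked; a match does not prove that the
    identifier exists in any database.
    """
    for pattern, id_type, kind in ID_PATTERNS:
        if re.fullmatch(pattern, identifier.strip()):
            return id_type, kind
    return "unknown", "unknown"


if __name__ == "__main__":
    examples = ["ss335", "rs334", "ENST00000366667.4:c.803T>C", "COSM1290",
                "CD830010", "RCV000016573", "VAR_010085", "essv8691751",
                "nssv1602417", "esv3364878", "nsv916030"]
    for example in examples:
        id_type, kind = classify_variant_id(example)
        print(f"{example:<28} {id_type:<22} {kind}")
```

Since the patterns only look at the shape of the string, a match says nothing about whether the identifier is real or current; you still need to look it up in the relevant database.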

Table 1 Types of variant identifiers

Identifier type | Example | Description
ssID | ss335 | Submitted SNP ID assigned by dbSNP or EVA.
rsID | rs334 | Reference SNP ID assigned by dbSNP or EVA. ssIDs of the same variant type that colocalise are combined into a single rsID for that locus.
HGVS* | ENST00000366667.4:c.803T>C | Describes the variant's location and sequence change relative to a reference sequence, such as a transcript or protein.
COSMIC ID | COSM1290 | ID assigned by COSMIC to somatic variants.
HGMD | CD830010 | ID assigned by HGMD to variants known to be associated with inherited human diseases.
ClinVar | RCV000016573 | ID assigned by ClinVar to variants annotated in dbSNP or dbVar/DGVa, linking them to human health.
UniProt | VAR_010085 | ID assigned by UniProt to variants in reviewed human protein entries.
DGVa variant call | essv8691751 | Submitted structural variant ID assigned by DGVa. These variants are shared with dbVar.
dbVar variant call | nssv1602417 | Submitted structural variant ID assigned by dbVar. These variants are shared with DGVa.
DGVa variant region | esv3364878 | Variant region ID assigned by DGVa. Overlapping submitted variants (essv and nssv) are combined into a single variant region. Because the boundaries of the submitted variants can differ, the boundaries of the variant region may not match those of any individual submitted variant.
dbVar variant region | nsv916030 | Variant region ID assigned by dbVar. Overlapping submitted variants (essv and nssv) are combined into a single variant region. Because the boundaries of the submitted variants can differ, the boundaries of the variant region may not match those of any individual submitted variant.
*HGVS notation can represent variants ambiguously (for example, the same variant can be described relative to different transcripts), so it is important to understand the format and its limitations. The HGVS provides detailed documentation on using this notation, including for indels and intronic variants.
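To see how an HGVS description is put together, the Python sketch below splits only the simplest case, a single-base substitution such as the Table 1 example ENST00000366667.4:c.803T>C, into its parts: reference sequence accession and version, coordinate type, position, and reference and alternative bases. The Substitution type and parse_simple_substitution function are hypothetical names for illustration. This is deliberately not a complete HGVS parser: it returns None for anything else, including intronic positions, insertions, deletions and protein (p.) descriptions, so real work calls for a dedicated HGVS parser or validator.

```python
import re
from typing import NamedTuple, Optional

# Matches only the simplest HGVS form, a single-base substitution:
#   <accession>[.<version>]:<coordinate type>.<position><ref>><alt>
# e.g. ENST00000366667.4:c.803T>C
# Coordinate types accepted here: c. (coding DNA), g. (linear genomic),
# m. (mitochondrial) and n. (non-coding DNA). Everything else, including
# intronic offsets (c.88+1G>T), UTR positions (c.-14G>C), ranges, deletions,
# insertions, duplications and protein (p.) descriptions, is rejected.
SIMPLE_SUBSTITUTION = re.compile(
    r"(?P<accession>[A-Za-z0-9_]+)(?:\.(?P<version>\d+))?"
    r":(?P<coord_type>[cgmn])\."
    r"(?P<position>\d+)(?P<ref>[ACGT])>(?P<alt>[ACGT])"
)


class Substitution(NamedTuple):
    accession: str          # reference sequence, e.g. ENST00000366667
    version: Optional[int]  # accession version, e.g. 4 (None if omitted)
    coord_type: str         # 'c', 'g', 'm' or 'n'
    position: int           # position in that coordinate system
    ref: str                # reference base
    alt: str                # alternative (observed) base


def parse_simple_substitution(description: str) -> Optional[Substitution]:
    """Split a simple HGVS substitution into its parts, or return None."""
    match = SIMPLE_SUBSTITUTION.fullmatch(description.strip())
    if match is None:
        return None  # not a simple substitution (or not HGVS at all)
    if match["ref"] == match["alt"]:
        return None  # a substitution must change the base
    version = match["version"]
    return Substitution(
        accession=match["accession"],
        version=int(version) if version is not None else None,
        coord_type=match["coord_type"],
        position=int(match["position"]),
        ref=match["ref"],
        alt=match["alt"],
    )


if __name__ == "__main__":
    print(parse_simple_substitution("ENST00000366667.4:c.803T>C"))
    print(parse_simple_substitution("ENST00000366667.4:c.88+1G>T"))  # None
```

Two valid descriptions of the same variant written against different transcripts parse to different accessions and positions; nothing in the strings themselves shows that they refer to the same change, which is one source of the ambiguity described above.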