Variation exercises



Exercise 1 — Human population genetics and phenotype data

The SNP rs11725853 in an intron of the human GLRA3 gene has been identified as a genetic risk factor for kidney disease in individuals with type 1 diabetes.

(a) How many alternate alleles does this SNP have? Which allele is the risk (or associated) allele for urinary albumin excretion rate in type 1 diabetes?

(b) Which super-population (AFR, AMR, EAS, EUR or SAS) in the 1000 Genomes Project has the highest frequency of the risk allele?

(c) In which paper(s) is the association between rs11725853 and urinary albumin excretion rate described?

(d) What is the ancestral allele? Is it conserved in the 70* eutherian mammals?

*This number is correct for Ensembl release 93. In later releases this group may contain more than 70 species.
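The exercise is meant to be done in the browser, but the same variant record can also be requested from the Ensembl REST API. The following is a minimal sketch that only builds the request URL for the `variation/:species/:id` endpoint with its optional `pops` and `phenotypes` flags. The helper name `variation_url` is hypothetical, and the public REST server serves the current Ensembl release, so its data may differ from release 93.

```python
from urllib.parse import urlencode

# Public Ensembl REST server (serves the current release, not release 93).
REST_SERVER = "https://rest.ensembl.org"


def variation_url(rsid, species="human", pops=False, phenotypes=False):
    """Build a URL for the Ensembl REST variation endpoint.

    Hypothetical helper for illustration; it does not send the request.
    """
    params = {"content-type": "application/json"}
    if pops:
        # Ask for population allele frequencies (relevant to part b).
        params["pops"] = 1
    if phenotypes:
        # Ask for phenotype associations (relevant to parts a and c).
        params["phenotypes"] = 1
    # safe="/" keeps "application/json" readable in the query string.
    return f"{REST_SERVER}/variation/{species}/{rsid}?{urlencode(params, safe='/')}"


url = variation_url("rs11725853", pops=True, phenotypes=True)
print(url)
```

Opening the printed URL in a browser, or fetching it with any HTTP client, should return a JSON document for the variant.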

Exercise 2 — Exploring a SNP in human

The missense variation rs1801133 in the human MTHFR gene has been linked to elevated levels of homocysteine, an amino acid whose plasma concentration seems to be associated with the risk of cardiovascular diseases, neural tube defects, and loss of cognitive function. This SNP is also referred to as ‘A222V’, ‘Ala222Val’, as well as other HGVS names.

(a) Find the page with information for rs1801133.

(b) Is rs1801133 a missense variant in all transcripts of the MTHFR gene?

(c) Why are the alleles for this variation in Ensembl given as G/A and not as C/T, as in dbSNP and literature?

(d) What is the major allele in rs1801133?

(e) In which paper(s) is the association between rs1801133 and homocysteine levels described?

(f) According to the data imported from dbSNP, the ancestral allele for rs1801133 is G. Ancestral alleles in dbSNP are based on a comparison between human and chimp. Does the sequence at this same position in other primates confirm that the ancestral allele is G?


Exercise 3 — Exploring a SNP in mouse

Madsen et al., in the paper ‘Altered metabolic signature in pre-diabetic NOD mice’ (PLoS One. 2012; 7(4): e35445), have described several regulatory and coding SNPs, some of them in genes residing within the previously defined insulin-dependent diabetes (IDD) regions. The authors report that one of the identified SNPs, rs29522348 in the murine Xdh gene, would lead to an amino acid substitution and could be damaging, as predicted by SIFT.

(a) Where is the SNP located (chromosome and coordinates)?

(b) What is the HGVS recommended nomenclature for this SNP?

(c) Why does Ensembl put the C allele first (C/T)?

(d) Are there differences between the genotypes reported in NOD/LTJ and BALB/cByJ, according to the PERLGEN panel?


Exercise 4 — The VEP

Resequencing of the genomic region of the human CFTR (cystic fibrosis transmembrane conductance regulator (ATP-binding cassette sub-family C, member 7)) gene (ENSG00000001626) has revealed the following variants (alleles defined on the forward strand):

  • G/A at 7: 117,530,985

  • T/C at 7: 117,531,038

  • T/C at 7: 117,531,068

Use the VEP tool in Ensembl and choose the options to see SIFT and PolyPhen predictions.
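The VEP accepts several input formats. One of them is the Ensembl default format, a whitespace-separated line of chromosome, start, end, alleles and strand. The sketch below writes the three variants above in that format, assuming that the first allele in each pair is the reference allele. The helper name `to_vep_line` is hypothetical.

```python
# Variants from the exercise: (chromosome, position, "ref/alt").
# Assumption: the first allele listed is the reference allele.
variants = [
    ("7", 117530985, "G/A"),
    ("7", 117531038, "T/C"),
    ("7", 117531068, "T/C"),
]


def to_vep_line(chrom, pos, alleles, strand="+"):
    """Format one single-nucleotide variant in Ensembl default VEP format."""
    # For a single-nucleotide variant, start and end are the same position.
    # "+" is the forward strand, as stated in the exercise.
    return f"{chrom} {pos} {pos} {alleles} {strand}"


vep_input = "\n".join(to_vep_line(*v) for v in variants)
print(vep_input)
```

The printed lines can be pasted into the VEP input box. Note that the coordinates are written without the thousands separators used in the list above.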

(a) Do these variants result in a change in the proteins encoded by any of the Ensembl genes? Which gene?

(b) Do these variants already exist in the Ensembl database?

(c) Which of these variants is predicted to be the most damaging, based on the SIFT and PolyPhen scores?