Variation exercises

These exercises explore different kinds of variation data. They were written against a specific Ensembl release; you can use the current version instead, but note that some of the answers may differ.


Exercise 12 – Human population genetics and phenotype data


The SNP rs1738074 in the 5’ UTR of the human TAGAP gene has been identified as a genetic risk factor for a few diseases.

(a) In which transcripts is this SNP found?

(b) What is the least frequent genotype for this SNP in the Yoruba (YRI) population from the HapMap set?

(c) What is the ancestral allele? Is it conserved in the eutherian mammals?

(d) With which diseases is this SNP associated? Are there any known risk (or associated) alleles?
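
The questions above are meant to be answered in the browser, but the same record can also be requested programmatically. The following is a minimal sketch, assuming the public Ensembl REST server at rest.ensembl.org and its `variation/:species/:id` endpoint with the optional `pops` and `phenotypes` flags. The helper names `variation_url` and `fetch_json` are hypothetical, and the REST server reflects the current release, so results may differ from the release used for these exercises.

```python
import json
import urllib.request
from urllib.parse import urlencode

# Assumption: the public Ensembl REST server (serves the current release).
REST_SERVER = "https://rest.ensembl.org"


def variation_url(rsid, species="human", **flags):
    """Build a variation/:species/:id URL; flags such as pops=1 become query parameters."""
    url = f"{REST_SERVER}/variation/{species}/{rsid}"
    query = urlencode(sorted(flags.items()))
    return f"{url}?{query}" if query else url


def fetch_json(url):
    """Request a URL and decode the JSON reply (needs network access)."""
    request = urllib.request.Request(url, headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(request) as response:
        return json.load(response)


url = variation_url("rs1738074", pops=1, phenotypes=1)
print(url)
# record = fetch_json(url)  # uncomment when online, then inspect the returned dictionary
```

Sorting the flags keeps the generated URL stable, which makes it easier to compare or cache requests.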


Exercise 13 – Exploring a SNP in human


The missense variation rs1801133 in the human MTHFR gene has been linked to elevated levels of homocysteine, an amino acid whose plasma concentration seems to be associated with the risk of cardiovascular diseases, neural tube defects, and loss of cognitive function. This SNP is also referred to as ‘A222V’, ‘Ala222Val’ as well as other HGVS names.

(a) Find the page with information for rs1801133.

(b) Is rs1801133 a missense variation in all transcripts of the MTHFR gene?

(c) Why are the alleles for this variation in Ensembl given as G/A and not as C/T, like in the literature and dbSNP?

(d) What are the major and the minor alleles of rs1801133?

(e) In which paper is the association between rs1801133 and homocysteine levels described?

(f) According to the data imported from dbSNP, the ancestral allele for rs1801133 is G. Ancestral alleles in dbSNP are based on a comparison between human and chimp. Does the sequence at this same position in four other primates, i.e. gorilla, orangutan, macaque and marmoset, confirm that the ancestral allele is G?

(g) Were both alleles of rs1801133 already present in Neanderthal? To answer this question, have a look at the individual reads at its genomic position in the Neanderthal Genome Browser.


Exercise 14 – Structural variation in human 


In the paper ‘The influence of CCL3L1 gene-containing segmental duplications on HIV-1/AIDS susceptibility’ (Gonzalez et al., Science. 2005 Mar 4;307(5714):1434-40) it is shown that a higher copy number of the CCL3L1 (Chemokine (C-C motif) ligand 3-like 1) gene is associated with lower susceptibility to HIV infection.

(a) Find the human CCL3L1 gene.

(b) Have any CNVs been annotated for this gene? Note: In Ensembl, CNVs are classified as structural variants.
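
As a complement to the browser views, structural variants overlapping a gene can also be listed through the Ensembl REST API. This is a rough sketch, assuming the `lookup/symbol` endpoint (whose reply includes `seq_region_name`, `start` and `end`) and the `overlap/region` endpoint with `feature=structural_variation`. The function names are hypothetical, and the coordinates in the example record are placeholders, not the real location of CCL3L1.

```python
# Assumption: the public Ensembl REST server (serves the current release).
REST_SERVER = "https://rest.ensembl.org"


def gene_lookup_url(symbol, species="homo_sapiens"):
    """URL that returns the location of a gene given its symbol."""
    return f"{REST_SERVER}/lookup/symbol/{species}/{symbol}"


def structural_variant_url(gene_record, species="homo_sapiens"):
    """URL listing structural variants that overlap the region of a looked-up gene."""
    region = "{seq_region_name}:{start}-{end}".format(**gene_record)
    return f"{REST_SERVER}/overlap/region/{species}/{region}?feature=structural_variation"


print(gene_lookup_url("CCL3L1"))

# Placeholder record for illustration only; use the real reply of the lookup request.
placeholder = {"seq_region_name": "1", "start": 1000, "end": 2000}
print(structural_variant_url(placeholder))
```

Both URLs return JSON when requested with the header `Content-Type: application/json`.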


Exercise 15 – VEP (Variant Effect Predictor tool)


Resequencing of the genomic region of the human CFTR (cystic fibrosis transmembrane conductance regulator (ATP-binding cassette sub-family C, member 7)) gene (ENSG00000001626) has revealed the following variants (alleles defined on the forward strand):

  • G/A at 7:117,171,039
  • T/C at 7:117,171,092
  • T/C at 7:117,171,122

(a) Use the VEP tool in Ensembl and choose the options to see SIFT and PolyPhen predictions. Do these variants result in a change in the proteins encoded by any of the Ensembl genes? Which gene? Have the variants already been found?

(b) Go to Region in detail for CFTR. Do you see the VEP track?
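
For part (a), the three variants listed above have to be entered in a format that VEP accepts. The sketch below writes them in VEP's default whitespace-separated format (chromosome, start, end, allele string, strand). It assumes that the first allele of each pair is the reference allele and that the coordinates match the assembly of the Ensembl release used for these exercises; `to_vep_line` is a hypothetical helper name.

```python
def to_vep_line(chrom, pos, ref, alt, strand="+"):
    """Format one single-nucleotide variant as 'chromosome start end ref/alt strand'."""
    # For a SNV, start and end are the same position.
    return f"{chrom} {pos} {pos} {ref}/{alt} {strand}"


# Variants from the exercise; alleles are given on the forward strand.
variants = [
    ("7", 117171039, "G", "A"),
    ("7", 117171092, "T", "C"),
    ("7", 117171122, "T", "C"),
]

vep_input = "\n".join(to_vep_line(*variant) for variant in variants)
print(vep_input)
```

The printed lines can be pasted into the VEP input box or saved to a file and uploaded.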


Exercise 16 – Exploring a SNP in mouse


Madsen et al., in the paper ‘Altered metabolic signature in pre-diabetic NOD mice’ (PLoS One. 2012; 7(4): e35445), have described several regulatory and coding SNPs, some of them in genes residing within the previously defined insulin-dependent diabetes (IDD) regions. The authors report that one of the identified SNPs, located in the murine Xdh gene (rs29522348), would lead to an amino acid substitution and could be damaging, as predicted by SIFT.

(a) On which chromosome and at which coordinates is the SNP located?

(b) What is the HGVS-recommended nomenclature for this SNP?

(c) Why does Ensembl put the C allele first (C/T)?

(d) Are there differences between the genotypes reported in NOD/LTJ and BALB/cByJ?