Variation exercise answers



You'll get identical answers if you use the Ensembl release these exercises were written for. If you want the more up-to-date version, use the current Ensembl site, but some of the answers may differ.


Answer 12 – Human population genetics and phenotype data


(a) There is more than one way to get this answer. Either go to the Variation Table for the human TAGAP gene and Show variants in the 5’UTR, or search Ensembl for rs1738074 directly.

Once you’re in the Variation tab, click on the Genes and regulation link or icon. This SNP is found in three transcripts (ENST00000326965, ENST00000338313, and ENST00000367066).

(b) Click on Population genetics at the left of the variation tab. (Or, click on Explore this variation at the left and click the population genetics icon.)

In Yoruba (CSHL-HAPMAP:HapMap-YRI population) the least frequent genotype is CC, at a frequency of 9.7%. This is also the least frequent genotype in ASW, CHB and HCB, among others. To find out what the three-letter population codes stand for, have a look at our FAQs.
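The genotype and allele frequencies shown on the Population genetics page are related by simple counting. The sketch below illustrates that relationship in Python. The genotype counts are hypothetical placeholders, not values taken from Ensembl, and the function names are made up for this example.

```python
from collections import Counter


def genotype_frequencies(counts):
    """Convert genotype counts (e.g. {"TT": 50}) into frequencies."""
    total = sum(counts.values())
    return {genotype: n / total for genotype, n in counts.items()}


def allele_frequencies(counts):
    """Count each allele once per copy carried in a diploid genotype."""
    alleles = Counter()
    for genotype, n in counts.items():
        for allele in genotype:  # "CT" contributes one C and one T
            alleles[allele] += n
    total = sum(alleles.values())
    return {allele: n / total for allele, n in alleles.items()}


# Hypothetical counts for a T/C SNP in one population (illustration only).
example_counts = {"TT": 50, "CT": 40, "CC": 10}

geno_freqs = genotype_frequencies(example_counts)
least_frequent = min(geno_freqs, key=geno_freqs.get)
allele_freqs = allele_frequencies(example_counts)

print(least_frequent, geno_freqs[least_frequent])
print(allele_freqs)
```

With these placeholder counts the least frequent genotype is CC, and the C allele frequency is the CC frequency plus half the CT frequency.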

(c) Click on phylogenetic context. The ancestral allele is T. Select the 13 eutherian mammals EPO alignment and click on Go. A region containing the SNP (highlighted and placed in the centre) and its flanking sequence is displayed. The T allele is conserved in all but one of the 13 eutherian mammals displayed.

(d) Click on Phenotype Data at the left of the Variation page to see that this variation is associated with diabetes, multiple sclerosis and coeliac disease. There are known risk alleles for both multiple sclerosis and coeliac disease, and the corresponding p-values are provided. The allele A is associated with coeliac disease. Note that the alleles reported by Ensembl are T/C, because Ensembl reports alleles on the forward strand. This suggests that A was reported on the reverse strand in the PubMed article.
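The strand reasoning in (d) can be written down as a small check: if a reported allele is not among the forward-strand alleles but its complement is, the report was probably made on the reverse strand. This is only a sketch, and the function name reported_strand is hypothetical.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reported_strand(reported_allele, forward_alleles):
    """Guess the strand a single-base allele was reported on.

    forward_alleles holds the alleles as given on the forward strand.
    """
    allele = reported_allele.upper()
    forward = {a.upper() for a in forward_alleles}
    if allele in forward:
        return "forward"
    if COMPLEMENT.get(allele) in forward:
        return "reverse"
    return "unknown"


# Ensembl gives T/C on the forward strand; the paper reports A.
strand = reported_strand("A", "T/C".split("/"))
print(strand)
```

This check cannot resolve A/T or C/G SNPs, where every allele is the complement of the other one, so those cases need the strand stated explicitly.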

You can add the DAS source that mirrors data from SNPedia, which shares information about the effects of variations in DNA, citing peer-reviewed scientific publications. Click on SNPedia in the left-hand menu.


Answer 13 – Exploring a SNP in human


(a) Go to the Ensembl homepage.

Type rs1801133 in the Search box.

Click Go.

Click on Variation on the results page.

Click on Human.

Click on the Variation ID link or rs1801133.
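The same variation record can also be looked up without the browser. The sketch below assumes the public Ensembl REST service at rest.ensembl.org and its variation endpoint. It serves the current release, so the data may differ from the release used in these exercises. The function names are made up for this example, and fetch_variation needs network access.

```python
import json
import urllib.request

# Assumption: the public Ensembl REST server (current release).
REST_SERVER = "https://rest.ensembl.org"


def variation_url(rsid, species="human"):
    """Build the REST URL for a variation record, requesting JSON."""
    return f"{REST_SERVER}/variation/{species}/{rsid}?content-type=application/json"


def fetch_variation(rsid, species="human"):
    """Download and decode the variation record (requires network access)."""
    with urllib.request.urlopen(variation_url(rsid, species)) as response:
        return json.load(response)


url = variation_url("rs1801133")
print(url)
# record = fetch_variation("rs1801133")  # uncomment to query the live server
```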

(b) Click on Genes and Regulation in the side menu.

No, rs1801133 is a missense variation in four MTHFR transcripts, i.e. ENST00000376583, ENST00000376585, ENST00000376590 and ENST00000376592. It is downstream of ENST00000418034, though.

(c) In Ensembl, the alleles of rs1801133 are given as G/A because these are the alleles in the forward strand of the genome. In the literature and in dbSNP, the alleles are given as C/T because the MTHFR gene is located on the reverse strand. The alleles in the actual gene and transcript sequences are C/T.
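Converting an allele string between the forward strand and the strand of a reverse-strand gene amounts to taking the reverse complement of each allele. A minimal sketch, with a hypothetical function name:

```python
# Translation table mapping each base to its complement.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def flip_allele_string(allele_string):
    """Reverse-complement every allele in a 'ref/alt' style allele string."""
    return "/".join(
        allele.translate(COMPLEMENT)[::-1] for allele in allele_string.split("/")
    )


# Forward-strand alleles of rs1801133 as given in Ensembl.
gene_strand_alleles = flip_allele_string("G/A")
print(gene_strand_alleles)
```

For single-base alleles the reversal has no effect, but it keeps the function correct for longer alleles such as short insertions.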

(d) Click on Population genetics in the side menu. 

In all populations but two from the 1000 Genomes and HapMap projects, the allele G is the major one. The two exceptions are CLM (Colombian in Medellín; 1000 Genomes) and HCB (Han Chinese in Beijing, China; HapMap).

(e) Click on Phenotype Data in the left hand side menu. The specific study where the association was originally described is given in the Phenotype Data table. Click on pubmed/20031578 for more details.

The association between rs1801133 and homocysteine levels is described in the paper ‘Novel associations of CPS1, MUT, NOX4 and DPEP1 with plasma homocysteine in a healthy population: a genome-wide evaluation of 13,974 participants in the Women’s Genome Health Study’ (Paré et al., Circ Cardiovasc Genet. 2009 Apr;2(2):142-50).

(f) Click on Phylogenetic Context in the side menu.

Select Alignment: 6 primates EPO.

Click Go.

Gorilla, orangutan, macaque and marmoset all have a G in this position, which confirms that G is indeed the ancestral allele at least in the six different primates provided in Ensembl (alignment: 6 primates EPO).

(g) Go to the Neanderthal Genome Browser.

Type rs1801133 in the Search Neandertal text box.

Click Go.

Click on rs1801133 on the results page.

Click on Jump to region in detail.

Click on Configure this page in the side menu.

Click on Variation features.

Select All variations – Normal.

Click SAVE and close.

Draw a box of about 50 bp around rs1801133 (shown in yellow in the centre of the display).

Click on Jump to region on the pop-up menu.

The Sequences track shows that there are four reads for Neanderthal at the position of rs1801133, all with a G, so based on these (very limited) data there is no evidence that both alleles were already present in Neanderthal.


Answer 14 – Structural variation (human) 


(a) Go to the Ensembl homepage.

Select Search: Human and type ccl3l1 in the search box.

Click Go.

Click on Gene on the results page.

Click on Human.

Click on CCL3L1.

(b) Click on Structural Variation in the side menu.

Yes, CNVs have been annotated for this gene by multiple studies, as indicated by the many black and grey bars in the All structural variants track in the display. Details are given in the table below the display.


Can you do this with BioMart?


Answer 15 – VEP (Variant Effect Predictor tool)


(a) Go to the Ensembl homepage and click on the Tools link at the top of the page. Currently there are 5 tools listed on that page. Click on Variant Effect Predictor and enter the three variants as below:

7         117171039            117171039            G/A

7         117171092            117171092            T/C

7         117171122            117171122            T/C
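Each input line above has four whitespace-separated fields: chromosome, start, end and allele string. If you prepare such input from a script, a small parser can catch formatting slips before you paste the lines into the tool. This is a sketch only. The class and function names are made up, and it handles just the four fields used here.

```python
from typing import NamedTuple


class VepInputLine(NamedTuple):
    chromosome: str
    start: int
    end: int
    allele_string: str


def parse_vep_line(line):
    """Split one whitespace-separated input line into its four fields."""
    chromosome, start, end, allele_string = line.split()[:4]
    return VepInputLine(chromosome, int(start), int(end), allele_string)


raw_input = """
7         117171039            117171039            G/A
7         117171092            117171092            T/C
7         117171122            117171122            T/C
"""

variants = [parse_vep_line(line) for line in raw_input.splitlines() if line.strip()]

# Re-emit the variants with single spaces, ready to paste into the input box.
vep_text = "\n".join(" ".join(map(str, variant)) for variant in variants)
print(vep_text)
```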


Variation data can be entered in a variety of formats; see the VEP documentation for more details.

Under the non-synonymous SNP predictions option, select prediction only for SIFT and PolyPhen, then click Next. The output format is either HTML or text. You will get a table with the consequence terms from the Sequence Ontology project (e.g. synonymous, missense, downstream, intronic, 5’ UTR, 3’ UTR) provided by VEP for the listed SNPs. You can also upload the VEP results as a track and view them on Location pages in Ensembl. SIFT and PolyPhen predictions are available for missense SNPs only. For two of the entered positions, the variations have been predicted to be probably damaging/deleterious (coordinate 117171092) and benign/tolerated (coordinate 117171122). All three variations have already been described and are known as rs1800078, rs1800077 and rs35516286 in dbSNP and other sources (databases, literature, etc.).

(b) In order to see your uploaded SNPs as a track in Region in detail, you will need to choose a name for this upload (e.g. VEP) when entering the data into the VEP tool. So you will need to enter the data again. Once you have done that and given a name to the upload, click on any link under the location column (in the VEP results table) to see your newly added VEP track with the three variations in the Location tab (or Region in detail view) in Ensembl.


Answer 16 – Exploring a SNP in mouse


(a) Go to the Ensembl homepage and type rs29522348 in the search box. Click on the Variation link then on Mouse under By Feature type (or alternatively click on Mouse then Variation under By Species).

rs29522348 is located at 17:73924993. Its alleles in Ensembl are given on the forward strand.

(b) This SNP has three HGVS names, one at the genomic DNA level (g.73924993C>T), one at the transcript level (721G>A) and one at the protein level (p.Val241Ile).
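The transcript-level and protein-level names can be checked against each other with a little arithmetic. The sketch below assumes that 721 is a coding-sequence position counted from the first base of the start codon, so each codon covers three consecutive positions. The function names are made up for this example.

```python
import re


def codon_number(coding_position):
    """Codon (amino acid) number containing a 1-based coding position."""
    return (coding_position - 1) // 3 + 1


def position_in_codon(coding_position):
    """Position (1, 2 or 3) of a 1-based coding position within its codon."""
    return (coding_position - 1) % 3 + 1


def protein_position(hgvs_protein):
    """Extract the residue number from a simple name such as p.Val241Ile."""
    match = re.fullmatch(r"p\.([A-Z][a-z]{2})(\d+)([A-Z][a-z]{2})", hgvs_protein)
    if match is None:
        raise ValueError(f"unsupported protein-level name: {hgvs_protein}")
    return int(match.group(2))


codon = codon_number(721)
residue = protein_position("p.Val241Ile")
print(codon, residue, position_in_codon(721))
```

Under that assumption, position 721 is the first base of codon 241, which matches p.Val241Ile: valine codons start with G and isoleucine codons start with A. The transcript-level change (G>A) is the complement of the genomic change (C>T), which is what you would expect for a transcript on the reverse strand.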

(c) In Ensembl, the allele that is present in the reference genome assembly is always put first (C is the allele for the reference mouse genome, strain C57BL/6J).

(d) Click on Individual genotypes in the left-hand side menu. In the summary of genotypes by population, click on Show to see that there are indeed differences between the genotypes reported in those two strains. The genotype reported in NOD/LTJ is TT whereas in BALB/cByJ the genotype is CC.