Exercise solutions

 

Homepage, assemblies and species

Exercise 1 – Panda

(a) Select Panda from the drop-down species list, or click on View full list of all Ensembl species, then choose Panda from the list.

The assembly is ailMel1 (accession GCA_000004335.1).
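The same kind of assembly information can also be requested programmatically. The sketch below is an optional aside, not part of the exercise. It assumes the public Ensembl REST server at rest.ensembl.org and its info/assembly endpoint, and it assumes that ailuropoda_melanoleuca is the species name the server expects for Panda. The function names are made up for illustration. The server reports the assembly in the current release, which may differ from the one named in this handout.

```python
import json
import urllib.request

# Assumed base address of the public Ensembl REST server.
REST_SERVER = "https://rest.ensembl.org"


def assembly_info_url(species: str) -> str:
    """Build the URL of the info/assembly endpoint for one species."""
    return f"{REST_SERVER}/info/assembly/{species}?content-type=application/json"


def fetch_assembly_info(species: str) -> dict:
    """Download and decode the JSON description of a species' assembly.

    Needs network access; it is not called when this block is loaded.
    """
    with urllib.request.urlopen(assembly_info_url(species), timeout=30) as response:
        return json.load(response)


# Building the URL works offline; pass the same species name to
# fetch_assembly_info() when you are online to retrieve the data.
panda_url = assembly_info_url("ailuropoda_melanoleuca")
```

Calling fetch_assembly_info("ailuropoda_melanoleuca") returns a dictionary that you can print and inspect for the assembly details.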

(b) Click on More information and statistics. Statistics are shown in the tables on the right.

The length of the genome is 2,245,312,831 bp.
There are 19,343 coding genes.

Exercise 2 – Zebrafish

Click on Zebrafish on the front page of Ensembl to go to the species homepage.

Under Other assemblies, three previous assemblies are listed, along with the archived releases in which you can find them.

Assembly GRCz10 is available in the archived release 91, assembly Zv9 is available in the archived release 79 and assembly Zv8 is available in the archived release 54.

Exercise 3 – Mosquitoes

(a) Go to metazoa.ensembl.org. Click on View full list of all Ensembl Metazoa species. Type Anopheles into the filter box above the table, on the right.

There are two Anopheles species: Anopheles gambiae and Anopheles darlingi.

(b) Click on Anopheles gambiae, then on More information and statistics.

The genome was revised in April 2014.

Exercise 4 – Bacteria

Go to bacteria.ensembl.org and start typing the name Belliella baltica into the genome search box. It will autocomplete, allowing you to select Belliella baltica DSM 15883 (TaxID 866536) from the drop-down list. Click on More information and statistics.

Belliella baltica has 3,680 coding genes and 53 non-coding genes.

Region in detail

Exercise 5 – Exploring a genomic region in human

(a) Go to the Ensembl homepage www.ensembl.org.

Select Search: Human and type 13:31937000-32633000 in the text box (or leave the Search drop-down list as it is and type human 13:31937000-32633000 in the text box).
Click Go.

This genomic region is located on cytogenetic band q13.1. It is made up of eight contigs, indicated by the alternating light and dark blue coloured bars in the Contigs track. Note that KF455761.1 is a tiny contig that splits AL137143.8 in two. You may need to zoom in to find it.
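The location typed into the search box follows the pattern chromosome:start-end, with both coordinates included in the region. As an optional aside, the sketch below shows how such a string could be split up and measured in Python. The names Region, parse_region and region_in_detail_url are made up for illustration, and the URL layout is an assumption based on the address of the Region in detail page in the browser.

```python
import re
from typing import NamedTuple


class Region(NamedTuple):
    """A genomic interval with 1-based, inclusive coordinates."""
    seq_name: str
    start: int
    end: int

    @property
    def length(self) -> int:
        # Both ends are part of the region, hence the +1.
        return self.end - self.start + 1


# Sequence name, a colon, then start-end (thousands separators tolerated).
_REGION_RE = re.compile(r"^([^:\s]+):([\d,]+)-([\d,]+)$")


def parse_region(text: str) -> Region:
    """Turn a string such as '13:31937000-32633000' into a Region."""
    match = _REGION_RE.match(text.strip())
    if match is None:
        raise ValueError(f"not a chromosome:start-end string: {text!r}")
    name, start, end = match.groups()
    start_i = int(start.replace(",", ""))
    end_i = int(end.replace(",", ""))
    if start_i > end_i:
        raise ValueError("start must not be greater than end")
    return Region(name, start_i, end_i)


def region_in_detail_url(species: str, region: Region) -> str:
    """Assumed address of the Region in detail page for a region."""
    return (
        f"https://www.ensembl.org/{species}/Location/View"
        f"?r={region.seq_name}:{region.start}-{region.end}"
    )


brca2_neighbourhood = parse_region("13:31937000-32633000")
print(brca2_neighbourhood.length)  # 696001
```

The same function accepts sequence names that are not plain numbers, such as the 2L arm used in the Anopheles gambiae exercise.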

(b) Use your mouse to draw a box around the BRCA2 transcripts. Click on Jump to region in the pop-up menu.

(c) Click Configure this page in the side menu (or on the cog wheel icon in the top left hand side of the bottom image).

Type tilepath in the Find a track text box.
Select Tilepath.
Click on the (i) button to find out more.
The tilepath track shows the BAC clones that the assembly was based upon.
Save and close the new configuration by clicking anywhere outside the pop-up window.
There is no single clone that contains the complete BRCA2 gene. The BAC clone RP11-37E23 contains most of the gene, but its very 3’ end is contained in RP11-298P3. This is reflected by the two contigs that make up the complete BRCA2 gene (the Contigs track is on by default). You may find this easier to see if you highlight the 3’ exon on BRCA2.

(d) Click Share this page in the side menu.

Select the link and copy it.

Get your neighbour’s email address and compose an email to them, paste the link into the email and send the message.

When you receive the link from your neighbour, open the email and click on the link. You should be able to view the page with the new configuration and the data tracks they have added to the Location tab. You might see differences where they have specified a slightly different region from yours, or where they have added different tracks.

(e) Click the Export data button in the side menu. Leave the default parameters as they are.
Click Next>.
Click on Text.

Note that the sequence has a header that provides information about the genome assembly (GRCh38), the chromosome, the start and end coordinates and the strand. For example:

>13 dna:chromosome chromosome:GRCh38:13:32311910:32405865:1
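The fields in this header can be pulled apart with a few lines of Python. The sketch below is an optional illustration that assumes the layout seen in the example above: a sequence name, a sequence type, and a colon-separated location giving the coordinate system, assembly, sequence name, start, end and strand. The names FastaHeader and parse_export_header are made up for illustration.

```python
from typing import NamedTuple


class FastaHeader(NamedTuple):
    """Fields of an exported sequence header, as laid out in the example."""
    seq_id: str
    seq_type: str
    coord_system: str
    assembly: str
    seq_name: str
    start: int
    end: int
    strand: int


def parse_export_header(line: str) -> FastaHeader:
    """Split a header line into its parts; assumes the layout shown above."""
    if not line.startswith(">"):
        raise ValueError("a FASTA header line starts with '>'")
    # Three whitespace-separated parts: name, type, location.
    seq_id, seq_type, location = line[1:].split()
    # The location has six colon-separated fields.
    coord_system, assembly, seq_name, start, end, strand = location.split(":")
    return FastaHeader(
        seq_id, seq_type, coord_system, assembly, seq_name,
        int(start), int(end), int(strand),
    )


header = parse_export_header(
    ">13 dna:chromosome chromosome:GRCh38:13:32311910:32405865:1"
)
print(header.assembly)                     # GRCh38
print(header.end - header.start + 1)       # 93956
```

A strand of 1 indicates the forward strand, so the exported sequence in this example reads in the same direction as the chromosome coordinates.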

(f) Click Configure this page in the side menu.
Click Reset configuration.
Close the menu by clicking anywhere outside the pop-up window.

Exercise 6 – Exploring assembly exceptions in human

(a) Go to the Ensembl homepage www.ensembl.org.

Select Search: Human and type 21:32630000-32870000 in the text box (or leave the Search drop-down list as it is and type human 21:32630000-32870000 in the text box).
Click Go.

You will see an area highlighted in red in the middle of this region. Click on the thin dark red bar in the Chromosome or Region view to see the label CHR_HSCHR21_3_CTG1_1:32769079-32843731. Click on What are assembly exceptions? to open a new window that explains assembly exceptions (Alternative sequence).

(b) Assembly exceptions are marked in the Chromosome view at the top.
There are seven assembly exceptions on chromosome 21.

(c) Another option in the pop-up is Compare with reference. Click on this.

Scroll down the page to see the comparison between the assembly exception and primary assembly. Aligned sequences are highlighted in pink and linked together in green.

The assembly exception CHR_HSCHR21_3_CTG1_1 contains an extra region compared to the primary assembly.

Exercise 7 – Exploring a genomic region in Anopheles gambiae

(a) Go to metazoa.ensembl.org.

Select Search: Anopheles gambiae and type 2L:7300000-7450000 in the text box. Click Go.

The region is located on the cytogenetic band 21B.

(b) There are seven genes in this region and two that overlap the ends.

Drag out a box around the second exon of the gene, the second red box from the left, and click on Jump to region. You may wish to drag out more boxes to zoom in further.

Click Configure this page in the side menu (or on the cog wheel icon in the top left hand side of the bottom image).

Type Start/stop codons in the Find a track text box.
Select Start/stop codons.
Close the menu by clicking anywhere outside the window.

Start codons are shown in green. You should see one that coincides with the start of the filled red box of the second exon.

(c) Drag out a box around the green start codon and select Mark region.

Scroll up to the Overview image to drag out a box around the gene and select Jump to region (or use the Zoom out button).

The highlighted region is now visible as a grey dotted line.

Extra Exercise 8 – Exploring CRISPR sites

(a) Go to the Ensembl homepage www.ensembl.org.

Select Search: Human and type 10:110578600-110578700 in the text box (or leave the Search drop-down list as it is and type human 10:110578600-110578700 in the text box).
Click Go.

Click Configure this page. Type crispr in the Find a track text box.
Select the CRISPR Cas9 track and choose the Labels track style.

There are five positive strand and three negative strand CRISPR sites.

(b) The track All phenotype-associated - short variants (SNPs and indels) is shown by default. Click on the variants in this track and on the CRISPR sites to get their identifiers.

CRISPR sites 1074131234, 1074131235 and 1074131236 overlap variants rs113411202 and rs1064797151.

1074131234 and 1074131235 also overlap rs779773957.

1074131233 overlaps rs78663177.
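Two features overlap when they share at least one base. As an optional aside, the sketch below shows this test in Python for 1-based, inclusive coordinates. All feature names and coordinates in it are hypothetical and chosen only to illustrate the logic; they are not the positions of the CRISPR sites or variants in this exercise.

```python
from typing import List, NamedTuple, Tuple


class Feature(NamedTuple):
    """A named interval with 1-based, inclusive coordinates."""
    name: str
    start: int
    end: int


def overlaps(a: Feature, b: Feature) -> bool:
    """True if the two features share at least one base."""
    return a.start <= b.end and b.start <= a.end


def overlapping_pairs(
    sites: List[Feature], variants: List[Feature]
) -> List[Tuple[str, str]]:
    """List every (site, variant) pair of names whose intervals overlap."""
    return [(s.name, v.name) for s in sites for v in variants if overlaps(s, v)]


# Hypothetical coordinates, for illustration only.
sites = [Feature("site_A", 100, 122), Feature("site_B", 130, 152)]
variants = [
    Feature("var_1", 110, 110),  # single base inside site_A
    Feature("var_2", 122, 131),  # touches the end of site_A and the start of site_B
    Feature("var_3", 160, 160),  # outside both sites
]

print(overlapping_pairs(sites, variants))
# [('site_A', 'var_1'), ('site_A', 'var_2'), ('site_B', 'var_2')]
```

Because the coordinates are inclusive, a variant that sits exactly on the last base of a site still counts as overlapping it.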

(c) In the Region image at the bottom of the page, click and drag a box around the negative strand CRISPR sites, then select Mark region in the pop-up window.

In the middle Overview image on the page, click and drag a box around the SMC3 gene, then select Jump to region in the pop-up window.

Count the exons of the SMC3-201 transcript or click on them to get the number of the exon with the marked region.

The negative strand CRISPR sites are found in exon 7 of the SMC3-201 transcript.