Ensembl genes and transcripts exercises


Finding out about Ensembl genes and transcripts


Exercise 1 – Exploring the human MYH9 gene

(a) Find the human MYH9 (myosin, heavy chain 9, non-muscle) gene, and go to the Gene tab.

  • On which chromosome and which strand of the genome is this gene located?
  • How many transcripts (splice variants) are there, and how many are protein-coding?
  • What is the longest transcript, and how long is the protein it encodes?
  • Which transcript is of the best quality?
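
The same questions can also be checked programmatically once a gene record is in hand. The following is a minimal sketch, not a definitive implementation: it assumes a gene record shaped like the JSON returned by the Ensembl REST API gene lookup with expanded transcripts, and the field names it reads (seq_region_name, strand, Transcript, biotype, Exon, Translation, length) are assumptions to verify against the REST documentation. The record used here is made-up toy data, not real MYH9 values, and summarise_gene is a hypothetical helper name.

```python
def spliced_length(transcript):
    """Sum of exon lengths (inclusive coordinates) for one transcript."""
    return sum(e["end"] - e["start"] + 1 for e in transcript.get("Exon", []))


def summarise_gene(gene):
    """Summarise location and transcripts of a gene record (a plain dict).

    Field names are assumed, not confirmed by the exercise sheet.
    """
    transcripts = gene.get("Transcript", [])
    coding = [t for t in transcripts if t.get("biotype") == "protein_coding"]
    longest = max(transcripts, key=spliced_length, default=None)
    strand_names = {1: "forward", -1: "reverse"}
    return {
        "chromosome": gene.get("seq_region_name"),
        "strand": strand_names.get(gene.get("strand")),
        "n_transcripts": len(transcripts),
        "n_protein_coding": len(coding),
        "longest_transcript": longest["id"] if longest else None,
        # Protein length in amino acids; None if the longest transcript is non-coding.
        "longest_protein_aa": (longest or {}).get("Translation", {}).get("length"),
    }


# Toy record (hypothetical values, for illustration only).
toy_gene = {
    "seq_region_name": "1",
    "strand": -1,
    "Transcript": [
        {
            "id": "T1",
            "biotype": "protein_coding",
            "Exon": [{"start": 100, "end": 199}, {"start": 300, "end": 449}],
            "Translation": {"length": 60},
        },
        {
            "id": "T2",
            "biotype": "retained_intron",
            "Exon": [{"start": 100, "end": 180}],
        },
    ],
}

summary = summarise_gene(toy_gene)
print(summary)
```

Answering the exercise itself still means reading the values off the Gene tab; the sketch only shows how the same counts fall out of a structured record.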

(b) Click on Phenotype on the left side of the page. Are there any diseases associated with this gene, according to OMIM (Online Mendelian Inheritance in Man)?

(c) What are some functions of MYH9 according to the Gene Ontology consortium? Have a look at the GO pages for this gene.

(d) In the transcript table, click on the transcript ID for MYH9-201, and go to the Transcript tab.

  • How many exons does it have?
  • Are any of the exons completely or partially untranslated?
  • Is there an associated sequence in UniProtKB/Swiss-Prot? Have a look at the General identifiers for this transcript.
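
Whether an exon is completely or partially untranslated follows from comparing its coordinates with the boundaries of the coding region. The sketch below illustrates that comparison under stated assumptions: exons are given as (start, end) pairs in inclusive genomic coordinates, cds_start and cds_end are the lowest and highest genomic positions of the coding sequence, and all numbers are invented toy values rather than MYH9-201 data. The function name classify_exons is hypothetical.

```python
def classify_exons(exons, cds_start, cds_end):
    """Label each exon relative to the coding region [cds_start, cds_end].

    Coordinates are inclusive genomic positions, so the logic is the same
    for forward- and reverse-strand transcripts.
    """
    labels = []
    for start, end in exons:
        if end < cds_start or start > cds_end:
            # No overlap with the coding region: the whole exon is UTR.
            labels.append("fully untranslated")
        elif start >= cds_start and end <= cds_end:
            # Exon lies entirely inside the coding region.
            labels.append("fully coding")
        else:
            # Exon straddles a coding-region boundary.
            labels.append("partially untranslated")
    return labels


# Toy transcript with four exons (hypothetical coordinates).
toy_exons = [(100, 150), (200, 300), (400, 500), (600, 700)]
labels = classify_exons(toy_exons, cds_start=250, cds_end=650)

for (start, end), label in zip(toy_exons, labels):
    print(f"{start}-{end}: {label}")
```

In the browser, the exon view of the Transcript tab presents the same information visually by marking untranslated sequence differently from coding sequence.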

(e) Are there microarray (oligo) probes that can be used to monitor ENST00000216181 expression?


Exercise 2 – Finding a gene associated with a phenotype

Phenylketonuria is a genetic disorder caused by an inability to metabolise phenylalanine in any body tissue. This results in an accumulation of phenylalanine, which causes seizures and intellectual disability.

(a) Search for phenylketonuria from the Ensembl homepage and narrow down your search to only genes. What gene is associated with this disorder?

(b) How many protein-coding transcripts does this gene have? View all of these in the transcript comparison view.

(c) What is the MIM gene identifier for this gene?


Exercise 3 – Exploring a bacterial gene (Clostridium sporogenes)

Start at http://bacteria.ensembl.org/index.html and select the Clostridium sporogenes str. DSM 795 (GCA_001020205) genome.

(a) What GO: biological process terms are associated with the polC CLSPOx_12590 gene?

(b) Go to the transcript tab for the only transcript, PolC-1. How long is the transcript?

(c) What domains can be found in the protein product of this transcript? How many different domain prediction methods agree with each of these domains?