Browser exercise answers


Information

Answers

These answers were found using version 70 of Ensembl. To get the same answers for vertebrates, use e70.ensembl.org (to see the current version, use ensembl.org, but note that some of the answers may differ).

For non-vertebrates, use www.ensemblgenomes.org. Archive sites are not retained for Ensembl Genomes, so you may not be able to find identical answers; let us know at helpdesk@ensembl.org if you have any questions.

 

Answer 1 – Exploring the human MYH9 gene

Human

(a) Go to the Ensembl homepage (http://www.ensembl.org).

Select Search: Human and type MYH9.

Click Go.

Click on Human on the results page.

Click on Gene.

Click on either the Ensembl ID ENSG00000100345 or the gene name MYH9.

  • Chromosome 22 on the reverse strand.
  • Ensembl has 12 transcripts annotated for this gene.
  • Three transcripts are protein coding.
  • The longest transcript is MYH9-001, which codes for a protein of 1960 amino acids.
  • MYH9-001 has a CCDS record. CCDS is the consensus coding sequence set. These coding sequences (CDS) have been agreed upon by Ensembl, NCBI, UCSC and Havana.

The CCDS set is a collection of reviewed, agreed-upon coding sequences for human and mouse. These sequences are high confidence and unlikely to change in the future.
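
As a quick consistency check on the MYH9-001 protein length: each amino acid is encoded by one three-nucleotide codon, and a complete CDS also ends with a three-nucleotide stop codon. Assuming the CDS is complete (as expected for a transcript with a CCDS record), a 1960-amino-acid protein corresponds to a coding sequence of

$$ L_{\text{CDS}} = 3 \times 1960 + 3 = 5883 \text{ nt (including the stop codon)}. $$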

(b) According to MIM, phenotypes associated with MYH9 include autosomal dominant deafness, Epstein syndrome, and Fechtner syndrome. Click on any of these for more information in the MIM record itself.

(c) Click on ENST00000216181.

  • It has 41 exons. This is shown in the Transcript summary.
  • Click on Exons in the side menu. Exon 1 is completely untranslated, and exons 2 and 41 are partially untranslated (UTR sequence is shown in purple). You can also see this in the cDNA view.
  • Click on General identifiers in the side menu. The Swiss-Prot entry MYH9_HUMAN matches the Ensembl transcript. Click on it to go to UniProtKB, or click align to see the alignment.
  • Have a look at the Ontology table. The Gene Ontology project (http://www.geneontology.org/) annotates gene products with terms from three ontologies: biological process, cellular component, and molecular function. Meiotic spindle organisation, cell morphogenesis, and cytokinesis are some of the roles associated with MYH9-001.
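
Whether an exon is untranslated, partially untranslated or fully coding follows from comparing its boundaries with the start and end of the coding sequence (CDS). The sketch below is a minimal illustration, assuming exon and CDS boundaries are given as 1-based, inclusive positions along the spliced transcript; the function name and the coordinates are hypothetical, not those of ENST00000216181.

```python
from typing import List, Tuple

def classify_exons(exons: List[Tuple[int, int]], cds_start: int, cds_end: int) -> List[str]:
    """Label each exon by how it overlaps the coding sequence (CDS).

    Exons and CDS bounds are 1-based, inclusive positions along the spliced
    transcript (a simplifying assumption for this sketch).
    """
    labels = []
    for start, end in exons:
        if end < cds_start or start > cds_end:
            labels.append("untranslated")            # no overlap with the CDS
        elif start >= cds_start and end <= cds_end:
            labels.append("coding")                  # exon lies wholly inside the CDS
        else:
            labels.append("partially untranslated")  # exon straddles a CDS boundary
    return labels

# Hypothetical four-exon transcript with the CDS running from position 120 to 550
print(classify_exons([(1, 100), (101, 250), (251, 400), (401, 600)], 120, 550))
# ['untranslated', 'partially untranslated', 'coding', 'partially untranslated']
```

The output mirrors the pattern described for MYH9-001: a fully untranslated first exon, and partially untranslated exons where the CDS starts and ends.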

(d) Click on Oligo probes in the side menu. Probesets from Affymetrix, Agilent, Codelink, Illumina, and Phalanx match this transcript sequence, so expression analysis with any of these probesets would reveal information about the expression of this transcript. Hint: this information can sometimes be found in the ArrayExpress Atlas: www.ebi.ac.uk/arrayexpress/

 

Answer 2 – Exploring a genomic region in human

Human

(a) Go to the Ensembl homepage (http://www.ensembl.org/).

Select Search: Human and type 13:32448000-33198000 in the text box (alternatively, leave the Search drop-down list as it is and type human 13:32448000-33198000 in the text box).

Click Go.

This genomic region is located on cytogenetic band q13.1. It is made up of seven contigs, indicated by the alternating light and dark blue coloured bars in the Contigs track.
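
Locations such as 13:32448000-33198000 use a chromosome:start-end form with 1-based coordinates that include both end points, so the region length is end − start + 1. The following is a minimal sketch of parsing such a string; the function and class names are hypothetical, and it only accepts the plain form (no strand suffix or thousands separators).

```python
import re
from typing import NamedTuple

class Region(NamedTuple):
    chrom: str
    start: int
    end: int

    @property
    def length(self) -> int:
        # 1-based, inclusive coordinates: both end points count
        return self.end - self.start + 1

# Only the plain "chromosome:start-end" form; no strand suffix or commas
REGION_PATTERN = re.compile(r"(?P<chrom>[^:\s]+):(?P<start>\d+)-(?P<end>\d+)")

def parse_region(text: str) -> Region:
    match = REGION_PATTERN.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"not a chromosome:start-end location: {text!r}")
    start, end = int(match["start"]), int(match["end"])
    if start > end:
        raise ValueError(f"start {start} is after end {end}")
    return Region(match["chrom"], start, end)

region = parse_region("13:32448000-33198000")
print(region.chrom, region.length)  # 13 750001
```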

(b) Use your mouse to draw a box around the BRCA2 transcripts.

Click on Jump to region in the pop-up menu.

(c) Click Configure this page in the side menu.

Type clones in the Find a track text box.

Select 1Mb clone set, 32k clone set and Tilepath.

Click on the tick.

No single clone appears to contain the complete BRCA2 gene. For example, clone RP11-37E23 contains most of the gene, but not its extreme 3’ end. This is reflected in the two contigs needed to make up the BRCA2 gene (the Contigs track is on by default).
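
Whether any clone holds the whole gene is an interval-containment test: a clone qualifies only if it starts at or before the gene start and ends at or after the gene end. This is a minimal sketch; the clone names and coordinates are hypothetical (not the real positions of RP11-37E23 or BRCA2). For a forward-strand gene such as BRCA2, the 3’ end is the higher coordinate.

```python
from typing import Dict, List, Tuple

Interval = Tuple[int, int]  # (start, end), 1-based and inclusive

def contains(outer: Interval, inner: Interval) -> bool:
    """True if `outer` spans every base of `inner`."""
    return outer[0] <= inner[0] and inner[1] <= outer[1]

def clones_covering(clones: Dict[str, Interval], gene: Interval) -> List[str]:
    """Names of clones whose interval fully contains the gene."""
    return [name for name, span in clones.items() if contains(span, gene)]

# Hypothetical coordinates: clone_A stops short of the gene's 3' end (forward strand)
gene = (10_000, 90_000)
clones = {"clone_A": (5_000, 88_000), "clone_B": (85_000, 180_000)}
print(clones_covering(clones, gene))  # []
```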

(d) Click Configure this page in the side menu.

Type refseq in the Find a track text box.

Select Human RefSeq import – Expanded with labels.

Click on the tick.

Click on individual transcript models (RefSeq or otherwise) to retrieve more information about them.

RefSeq has annotated one transcript for the BRCA2 gene, NM_000059.3. This transcript is almost identical to Ensembl transcript BRCA2-001 (ENST00000380152); both encode a 3418 aa protein. The RefSeq transcript has a shorter 5’ UTR and a longer 3’ UTR.

(e) Click Share this page in the side menu.

Select the link and copy it.

Go into your email account and compose an email to yourself.

Paste the link in, then send.

Open the email and click on your link.

(f) Click Export data in the side menu.

Click Next>.

Click on Text.

Note that the sequence has a header that provides information about the genome assembly (GRCh37), the chromosome, the start and end coordinates and the strand. For example:

>13 dna:chromosome chromosome:GRCh37:13:32883613:32978196:1
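
The location field at the end of this header appears to pack six colon-separated values: coordinate system, assembly, sequence name, start, end and strand. The sketch below parses the example header above; it assumes the location is always the last whitespace-separated field with exactly six parts, and the field names are our own labels rather than an official schema.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SequenceLocation:
    coord_system: str  # e.g. "chromosome"
    assembly: str      # e.g. "GRCh37"
    name: str          # sequence region name, e.g. "13"
    start: int         # 1-based, inclusive
    end: int           # 1-based, inclusive
    strand: int        # 1 = forward, -1 = reverse

    @property
    def length(self) -> int:
        return self.end - self.start + 1

def parse_header(header: str) -> SequenceLocation:
    """Parse the location field of a header such as
    '>13 dna:chromosome chromosome:GRCh37:13:32883613:32978196:1'.

    Assumes the last whitespace-separated field holds six colon-separated values.
    """
    location = header.lstrip(">").split()[-1]
    coord_system, assembly, name, start, end, strand = location.split(":")
    return SequenceLocation(coord_system, assembly, name, int(start), int(end), int(strand))

loc = parse_header(">13 dna:chromosome chromosome:GRCh37:13:32883613:32978196:1")
print(loc.assembly, loc.name, loc.start, loc.end, loc.strand, loc.length)
# GRCh37 13 32883613 32978196 1 94584
```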

(g) Click Configure this page in the side menu.

Click Reset configuration.

Click on the tick.

 

Answer 3 – Exploring a bacterial gene

Bacillus subtilis

(a) Go to the Ensembl Bacteria homepage (http://bacteria.ensembl.org).

Start typing Bacillus subtilis subsp. subtilis strain 168 into the species search box. The name autocompletes as you type; once you can see the correct name, click on it.

Click on More information and statistics.

The references listed show the sequencing project published in 1997 (Kunst et al.). The gene annotation comes from ENA and UniProtKB.

(b) Click your browser's back button to return to the species homepage.

Search for dhaS and click on the gene name link. You are now in the Gene tab.

Click Orthologues in the side menu.

Orthologues have been predicted for 225 genomes in release 17 of Ensembl Bacteria.

Pan-taxonomic Compara compares homologues across a selected subset of species from Ensembl and Ensembl Genomes (including some plants and vertebrates).

(c) Click on the transcript ID CAB13823 in the Transcript table at the top of the view.

Click on Protein sequence at the top left for the amino acid sequence.

Click on the Protein Summary link at the top left to view the Aldehyde_DH domain as annotated by various resources (for example, Superfamily and Pfam).

 

Answer 4 – Exploring a plant gene (Vitis vinifera, grape)

Grape

(a) Go to http://plants.ensembl.org/index.html. Select Vitis vinifera from the All genomes – select a species drop-down menu, or click on View full list of all Ensembl Plants species and then choose V. vinifera. Type MADS4 in the search box and click on the gene name link MADS4 [VIT01s0010g03900]. Click on Orthologues in the side menu. One gene in Arabidopsis thaliana is predicted to be an orthologue of the grape MADS4. It is located at 1:8593637-8596105 (reverse strand), and the orthologue type assigned to MADS4 when comparing A. thaliana and V. vinifera is 1-to-1 (only one orthologue copy is found in each species).
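
The orthologue type label depends on how many genes each species contributes to the orthologous relationship. The sketch below is simplified and only turns those counts into a label; it assumes the orthologues are already known (Ensembl Compara infers them from gene trees, which is not modelled here), and the function name is hypothetical.

```python
def orthology_type(genes_in_a: int, genes_in_b: int) -> str:
    """Classify an orthologous group by how many genes each species contributes."""
    if genes_in_a < 1 or genes_in_b < 1:
        raise ValueError("each species must contribute at least one gene")
    if genes_in_a == 1 and genes_in_b == 1:
        return "1-to-1"
    if genes_in_a == 1 or genes_in_b == 1:
        return "1-to-many"
    return "many-to-many"

# Grape MADS4 and its A. thaliana orthologue: one copy in each species
print(orthology_type(1, 1))  # 1-to-1
print(orthology_type(1, 3))  # 1-to-many
```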

(b) Click on Genomic alignments (text), select the pairwise alignment Arabidopsis thaliana – blastz from the Alignment drop-down menu and click Go. To display the start and stop codons in the alignment, click on Configure this page and select the START/STOP codons display option under Codons. These codons are displayed in the sequence against a yellow background.
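
The exact rules the alignment view uses for highlighting are not described here. As a rough illustration only, the following sketch reports ATG start codons and the standard stop codons (TAA, TAG, TGA) in one reading frame of an ungapped DNA sequence; gap characters from alignment rows are not handled, and the function name and example sequence are hypothetical.

```python
from typing import List, Tuple

START_CODON = "ATG"
STOP_CODONS = {"TAA", "TAG", "TGA"}

def find_start_stop(seq: str, frame: int = 0) -> List[Tuple[int, str, str]]:
    """Return (0-based position, codon, label) for each ATG or stop codon read in `frame`."""
    seq = seq.upper()
    hits = []
    # Step through complete codons only: i + 3 must not run past the end
    for i in range(frame, len(seq) - 2, 3):
        codon = seq[i:i + 3]
        if codon == START_CODON:
            hits.append((i, codon, "START"))
        elif codon in STOP_CODONS:
            hits.append((i, codon, "STOP"))
    return hits

print(find_start_stop("ccATGGCTTGAaa", frame=2))
# [(2, 'ATG', 'START'), (8, 'TGA', 'STOP')]
```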

(c) Click on the transcript tab named VIT01s0010g03900.t01 and then on Variation table in the side menu. Several variations have been mapped to this transcript, but only one is a missense variation, i.e. one that changes the encoded amino acid. The possible amino acids are R (arginine) and S (serine).
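
A missense variant is one where the alternative allele changes the encoded amino acid. The following minimal sketch assumes a single-nucleotide change within a known codon, read with the standard genetic code. The function name is hypothetical, and the AGA to AGC change is only an example of an R/S substitution, not the actual codon affected by the variant in this transcript.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code (NCBI table 1), codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

def classify_snv(ref_codon: str, position: int, alt_base: str) -> str:
    """Classify a single-base change at 0-based `position` (0-2) within a codon."""
    ref_codon = ref_codon.upper()
    alt_codon = ref_codon[:position] + alt_base.upper() + ref_codon[position + 1:]
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        return "synonymous"
    if alt_aa == "*":
        return "stop gained"
    if ref_aa == "*":
        return "stop lost"
    return f"missense {ref_aa}/{alt_aa}"

# Hypothetical codon: AGA (arginine) -> AGC (serine)
print(classify_snv("AGA", 2, "C"))  # missense R/S
```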