Data upload exercises


Exercise 1 - Attach URLs of large files

Larger files, such as BAM files generated by NGS, need to be attached by a URL. There is a BAM file of human chromosome 20 RNAseq data online at:

Here you can see a number of BAM files (.bam) with corresponding index files (.bam.bai). We are interested in GRCh38.20.illumina.merged.1.bam (the BAM file) and GRCh38.20.illumina.merged.1.bam.bai (its index file). When attaching a BAM file to Ensembl, the index file must be in the same folder as the BAM file, but you only need to give Ensembl the URL of the BAM file itself.
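The same-folder rule can be sketched in a few lines of Python. This is only an illustration of the naming convention described above (the index has the BAM file's name with .bai appended); the function name index_url_for and the example.org address are hypothetical placeholders, not part of Ensembl.

```python
from urllib.parse import urlsplit, urlunsplit


def index_url_for(bam_url: str) -> str:
    """Return the URL where the index of bam_url is expected to live.

    Assumes the convention from the exercise text: the index sits in the
    same folder as the BAM file and is named <file>.bam.bai.
    """
    parts = urlsplit(bam_url)
    if not parts.path.endswith(".bam"):
        raise ValueError(f"not a BAM URL: {bam_url}")
    # Append ".bai" to the path only, so any query string stays intact.
    return urlunsplit(parts._replace(path=parts.path + ".bai"))


# Hypothetical host; substitute the real folder URL from the exercise.
bam = "https://example.org/data/GRCh38.20.illumina.merged.1.bam"
print(index_url_for(bam))
```

Opening the printed address in a browser before attaching the BAM file is a quick way to confirm that the index really is next to it.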

(a) Attach and view the BAM file of human chromosome 20 RNAseq data.

(b) Go to the region on chromosome 20 that contains the CDH22 gene. Configure the page to show your added track in the ‘Unlimited’ style. What is the relationship between the number of RNAseq reads and the exons of CDH22?

(c) Zoom in on exon 1 of CDH22 so that you can see the sequence of the individual RNAseq reads.

(d) Remove the track from your Region in detail view.

Exercise 2 - Track Hubs

(a) Add the ENCODE Analysis Hub to the Region in detail view for the genomic region surrounding the BRCA2 gene.

Hint: You will need to add this Track Hub to, and view it on, the human GRCh37 genome assembly.

(b) Turn on all the available tracks relating to Histone Modification Peaks and Transcription Factor Peaks in HeLa-S3 cells.

(c) Which Transcription Factors and Histone Modifications are annotated in this region?

(d) Add the tracks showing Signals for the ENCODE Histone Modifications and Transcription Factors that have peaks in this region. Compare the signal intensity to the location of annotated peaks.

(e) Remove the ENCODE Analysis Hub from your list of custom tracks.
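Both exercises start by finding a gene's region on a particular assembly. As an optional aside, the sketch below shows how the same lookup might be scripted against the Ensembl REST API. It assumes the lookup/symbol endpoint and the separate GRCh37 REST server; the helper names (lookup_url, fetch_gene, region_string) are made up for this example, and the coordinates in the demo record are placeholders, not real gene positions.

```python
import json
from urllib.parse import quote
from urllib.request import Request, urlopen

# Assumed REST servers: the main site serves GRCh38, and GRCh37 has its own host.
REST_SERVERS = {
    "GRCh38": "https://rest.ensembl.org",
    "GRCh37": "https://grch37.rest.ensembl.org",
}


def lookup_url(symbol: str, assembly: str = "GRCh38",
               species: str = "homo_sapiens") -> str:
    """Build the lookup-by-symbol URL for the chosen assembly."""
    return f"{REST_SERVERS[assembly]}/lookup/symbol/{species}/{quote(symbol)}"


def fetch_gene(symbol: str, assembly: str = "GRCh38") -> dict:
    """Fetch the gene record as JSON (needs network access)."""
    request = Request(lookup_url(symbol, assembly),
                      headers={"Content-Type": "application/json"})
    with urlopen(request) as response:
        return json.load(response)


def region_string(record: dict) -> str:
    """Format a gene record as chromosome:start-end for the location box."""
    return f"{record['seq_region_name']}:{record['start']}-{record['end']}"


# Placeholder record with made-up coordinates, only to show the output format.
demo = {"seq_region_name": "13", "start": 100, "end": 200}
print(lookup_url("BRCA2", assembly="GRCh37"))
print(region_string(demo))
```

Calling region_string(fetch_gene("BRCA2", assembly="GRCh37")) would give a location on the same assembly as the Track Hub in Exercise 2, whereas the default server reports GRCh38 coordinates as used in Exercise 1.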