Exercises 3

1. Annotated Features

Annotated Features represent the results of an analysis of raw or processed signal data. They correspond to regions of the genome enriched for specific events (such as TF binding or histone marks), i.e. they are 'peak calls'.
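To make 'peak call' concrete, here is a deliberately naive sketch. It marks runs of bins whose signal reaches a fixed threshold and reports each run as a 1-based, inclusive interval. This is not the SWEmbl algorithm named in the feature sets below. The signal values and the `call_peaks` helper are hypothetical.

```python
from typing import List, Tuple


def call_peaks(signal: List[float], threshold: float, bin_size: int = 1) -> List[Tuple[int, int]]:
    """Return 1-based, inclusive (start, end) intervals of consecutive bins with signal >= threshold.

    Bin i (0-based) covers positions i*bin_size + 1 .. (i + 1)*bin_size.
    Hypothetical helper: a naive threshold-and-merge caller, not SWEmbl.
    """
    peaks = []
    start = None  # index of the first bin in the current above-threshold run
    for i, value in enumerate(signal):
        if value >= threshold and start is None:
            start = i
        elif value < threshold and start is not None:
            peaks.append((start * bin_size + 1, i * bin_size))
            start = None
    if start is not None:  # close a run that reaches the end of the signal
        peaks.append((start * bin_size + 1, len(signal) * bin_size))
    return peaks


# Hypothetical read-depth signal in 10 bp bins (made-up values)
signal = [0, 1, 5, 8, 7, 1, 0, 6, 6, 0]
print(call_peaks(signal, threshold=5, bin_size=10))  # [(21, 50), (71, 90)]
```

Real peak callers also model background noise and assess significance. The fixed threshold here only shows how a signal track becomes a set of discrete regions.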

Compare the number of annotated features in the region Y:5000000-40000000 between the Human feature sets:

K562_DNase1_ENCODE_Duke_SWEmbl_R0025_D150

HepG2_DNase1_ENCODE_Duke_SWEmbl_R0025_D150

What are the differences and why?
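One way to frame this comparison is as an overlap count: a feature is counted if it shares at least one base with Y:5000000-40000000. The sketch below makes two assumptions:

- Coordinates are 1-based and inclusive, as Ensembl uses.
- Features that only partially overlap the region still count.

The set names and coordinates are hypothetical placeholders, not data from the two feature sets above. In the exercise itself, the features come from the Ensembl API.

```python
from typing import Dict, List, Tuple

Feature = Tuple[str, int, int]  # (seq_region, start, end), 1-based inclusive


def count_in_region(features: List[Feature], seq_region: str, start: int, end: int) -> int:
    """Count features on seq_region that overlap [start, end] by at least one base."""
    return sum(
        1
        for name, f_start, f_end in features
        if name == seq_region and f_start <= end and f_end >= start
    )


# Hypothetical peak calls: NOT real data from either Ensembl feature set
feature_sets: Dict[str, List[Feature]] = {
    "set_A": [("Y", 6_000_000, 6_000_300), ("Y", 39_999_900, 40_000_200), ("X", 7_000_000, 7_000_150)],
    "set_B": [("X", 7_000_000, 7_000_150)],
}

for name, features in feature_sets.items():
    print(name, count_in_region(features, "Y", 5_000_000, 40_000_000))
```

Here `set_A` reports 2, because the feature straddling 40,000,000 is counted, and `set_B` reports 0.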

 

2. Motif Features

Motif features represent putative binding sites based on alignments of position weight matrices (PWMs) from JASPAR. MotifFeatures are always associated with AnnotatedFeatures representing transcription factor (TF) binding. More information about how we integrate these into the regulatory build process can be found here.
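As a rough illustration of what aligning a PWM to sequence means, the sketch below does two things:

- It converts a count matrix into log2-odds scores, using a pseudocount and a uniform background.
- It slides the matrix along the forward strand and reports windows that score above a threshold.

The matrix is hypothetical rather than a real JASPAR profile. The pseudocount, background and threshold are all assumptions. This is not a description of the exact procedure used to build Ensembl's motif features.

```python
import math
from typing import Dict, List, Tuple

BASES = "ACGT"


def log_odds_pwm(counts: Dict[str, List[int]], pseudocount: float = 0.5,
                 background: float = 0.25) -> List[Dict[str, float]]:
    """Convert a column-wise count matrix into per-position log2-odds scores."""
    width = len(counts["A"])
    pwm = []
    for i in range(width):
        total = sum(counts[b][i] for b in BASES) + 4 * pseudocount
        pwm.append({b: math.log2((counts[b][i] + pseudocount) / total / background) for b in BASES})
    return pwm


def scan(sequence: str, pwm: List[Dict[str, float]], threshold: float) -> List[Tuple[int, float]]:
    """Return (1-based start, score) for every forward-strand window scoring >= threshold."""
    width = len(pwm)
    hits = []
    for i in range(len(sequence) - width + 1):
        window = sequence[i:i + width]
        score = sum(pwm[j][base] for j, base in enumerate(window))
        if score >= threshold:
            hits.append((i + 1, round(score, 2)))
    return hits


# Hypothetical 4-column count matrix (not a real JASPAR entry) favouring "TGAC"
counts = {
    "A": [0, 0, 10, 0],
    "C": [0, 0, 0, 10],
    "G": [0, 10, 0, 0],
    "T": [10, 0, 0, 0],
}
pwm = log_odds_pwm(counts)
print(scan("AATGACTTGAGT", pwm, threshold=5.0))  # [(3, 7.23)]
```

The near-match "TGAG" at position 8 scores about 2.84, so it falls below this threshold. A real scan would also check the reverse complement.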

Get the 'motif' regulatory attributes associated with the Human Regulatory Feature 'ENSR00001227187' and print their properties.

Hint: use 'motif' as a parameter for regulatory_attributes.

Print the properties of the annotated features associated with the motif feature.

 

3. Binding Matrices and Motif Strength

Each MotifFeature is associated with a PWM, which is represented by the 'BindingMatrix' class. The MotifFeature score represents the relative binding affinity with respect to the PWM defined in the BindingMatrix.
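One common way to express a score relative to a PWM is min-max normalisation: 1.0 for the best possible site for the matrix and 0.0 for the worst. The sketch below assumes that convention and uses a hypothetical log-odds matrix. It is not necessarily how the BindingMatrix class computes its relative affinity.

```python
from typing import Dict, List

BASES = "ACGT"
PWM = List[Dict[str, float]]


def score(sequence: str, pwm: PWM) -> float:
    """Sum of per-position scores for a sequence exactly as wide as the matrix."""
    if len(sequence) != len(pwm):
        raise ValueError("sequence length must equal matrix width")
    return sum(column[base] for column, base in zip(pwm, sequence))


def relative_score(sequence: str, pwm: PWM) -> float:
    """Min-max normalise the score to [0, 1]; 1.0 is the best possible site for this matrix."""
    best = sum(max(column.values()) for column in pwm)
    worst = sum(min(column.values()) for column in pwm)
    return (score(sequence, pwm) - worst) / (best - worst)


# Hypothetical log-odds matrix: +2 for the consensus base of "TGAC", -2 otherwise
pwm = [{b: (2.0 if b == c else -2.0) for b in BASES} for c in "TGAC"]
print(relative_score("TGAC", pwm))  # 1.0
print(relative_score("TGAG", pwm))  # 0.75
```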

Using the motif feature obtained in exercise 2, get the associated BindingMatrix and print some of its details.

Check the potential effect of changes in the sequence of the motif feature on its relative strength.
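A simple way to explore how sequence changes affect motif strength is in silico mutagenesis: substitute each base in turn and recompute the relative score. The sketch below makes two assumptions:

- The relative score is a min-max normalised matrix score.
- The matrix is hypothetical, with the fourth position deliberately weakly constrained.

The function names are illustrative only.

```python
from typing import Dict, List, Tuple

BASES = "ACGT"
PWM = List[Dict[str, float]]


def relative_score(sequence: str, pwm: PWM) -> float:
    """Min-max normalised matrix score (1.0 = best possible site). Assumed convention."""
    raw = sum(column[base] for column, base in zip(pwm, sequence))
    best = sum(max(c.values()) for c in pwm)
    worst = sum(min(c.values()) for c in pwm)
    return (raw - worst) / (best - worst)


def substitution_effects(sequence: str, pwm: PWM) -> List[Tuple[int, str, str, float]]:
    """Score every single-base substitution; returns (1-based position, ref, alt, delta)."""
    reference = relative_score(sequence, pwm)
    effects = []
    for i, ref in enumerate(sequence):
        for alt in BASES:
            if alt == ref:
                continue
            mutant = sequence[:i] + alt + sequence[i + 1:]
            effects.append((i + 1, ref, alt, relative_score(mutant, pwm) - reference))
    return effects


# Hypothetical matrix: strongly constrained "TGA", then a weakly constrained 4th position
pwm = [{b: (2.0 if b == c else -2.0) for b in BASES} for c in "TGA"]
pwm.append({"A": 0.0, "C": 0.5, "G": 0.0, "T": 0.0})

for pos, ref, alt, delta in substitution_effects("TGAC", pwm):
    print(f"{pos} {ref}>{alt} {delta:+.3f}")
```

With this matrix, each substitution at positions 1 to 3 lowers the relative score by 0.32, while substitutions at position 4 lower it by only 0.04.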

Check the GERP conservation scores along the motif. Compare them with the JASPAR matrix.
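One way to make this comparison quantitative takes two steps:

- Compute the information content of each matrix column, which measures how strongly that position prefers particular bases.
- Correlate it with the per-base GERP scores, where higher scores indicate stronger evolutionary constraint.

Both the count matrix and the GERP values below are hypothetical. With real data, the counts would come from the JASPAR matrix and the scores from GERP over the motif's coordinates. If the motif lies on the reverse strand, reverse the scores first.

```python
import math
from typing import Dict, List

BASES = "ACGT"


def information_content(counts: Dict[str, List[int]]) -> List[float]:
    """Per-column information content in bits against a uniform background (max 2 bits)."""
    width = len(counts["A"])
    ic = []
    for i in range(width):
        total = sum(counts[b][i] for b in BASES)
        entropy = 0.0
        for b in BASES:
            p = counts[b][i] / total
            if p > 0:  # treat 0 * log2(0) as 0
                entropy -= p * math.log2(p)
        ic.append(2.0 - entropy)
    return ic


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation coefficient of two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# Hypothetical count matrix and per-base GERP scores: illustrative values only
counts = {"A": [8, 4, 2, 0], "C": [0, 0, 2, 0], "G": [0, 4, 2, 0], "T": [0, 0, 2, 8]}
gerp = [3.1, 1.2, -0.5, 2.8]

ic = information_content(counts)
print([round(v, 2) for v in ic])  # [2.0, 1.0, 0.0, 2.0]
print(round(pearson(ic, gerp), 3))
```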

 

Answer

Nathan explains the answers to these questions in this 11-minute video. You can download his sample scripts and outputs:

1. sample script and output

2. sample script and output

3. sample script and output