Exercises 2

1. DataSets

DataSets are meta containers for data: each one links the data produced by an analysis (FeatureSets) to the underlying raw data (ResultSets).

a) Create a script which fetches all available DataSets for Human.

b) How many are there?

c) Now get the 'RegulatoryFeatures:MultiCell' DataSet and print the display label of its product FeatureSet and of each of its supporting sets.

Hint: Use the DataSetAdaptor methods.
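
As a starting point for the hint above, here is a minimal sketch of how a DataSetAdaptor might be obtained through the Ensembl Registry. It assumes the public Ensembl MySQL server (ensembldb.ensembl.org, user 'anonymous') and an installed Ensembl core and funcgen Perl API whose release still provides DataSets. The host and user are assumptions that do not come from this exercise sheet, so adjust them to your own setup.

```perl
use strict;
use warnings;
use Bio::EnsEMBL::Registry;

# Assumption: the public Ensembl database server with anonymous access.
my $registry = 'Bio::EnsEMBL::Registry';
$registry->load_registry_from_db(
    -host => 'ensembldb.ensembl.org',
    -user => 'anonymous',
);

# Ask the registry for the funcgen DataSetAdaptor for Human.
my $dsa = $registry->get_adaptor('Human', 'funcgen', 'dataset');
die "Could not get a DataSetAdaptor; check your API and database versions\n"
    unless defined $dsa;

# The adaptor's fetch methods can now be used to answer the questions above.
print "Got adaptor of class ", ref($dsa), "\n";
```

The same get_adaptor call with a different last argument (for example 'featureset') returns the other funcgen adaptors used in these exercises.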

2. FeatureSets

FeatureSets hold processed data or features, e.g. peak calls, or the output of a high-level analysis, e.g. the Regulatory Build.

a) Print the names of the FeatureSets for the Human 'GM12878' cell type.

b) Print the names of the FeatureSets for the Human 'CTCF' feature type.

c) Is the Human FeatureSet 'VISTA enhancer set' associated with any cell type or feature type?

d) Trick question: Get the supporting data for the VISTA FeatureSet.

Hint: Most adaptors have a fetch_by_name method.
DataSetAdaptor->fetch_by_product_FeatureSet will fetch the DataSet containing the supporting/raw data for a FeatureSet.
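
The two hints above can be combined roughly as follows. This is only a sketch under the same assumptions as before: the public Ensembl server and an API release that still includes DataSets. The fetch_by_name, fetch_by_product_FeatureSet and get_supporting_sets calls follow the funcgen API as used in these exercises, and the code deliberately handles both the case where a DataSet is returned and the case where it is not.

```perl
use strict;
use warnings;
use Bio::EnsEMBL::Registry;

# Assumption: the public Ensembl database server with anonymous access.
my $registry = 'Bio::EnsEMBL::Registry';
$registry->load_registry_from_db(
    -host => 'ensembldb.ensembl.org',
    -user => 'anonymous',
);

my $fsa = $registry->get_adaptor('Human', 'funcgen', 'featureset');
my $dsa = $registry->get_adaptor('Human', 'funcgen', 'dataset');
die "Could not get the funcgen adaptors\n"
    unless defined $fsa && defined $dsa;

# Look the FeatureSet up by name, then ask for the DataSet it is the product of.
my $fset = $fsa->fetch_by_name('VISTA enhancer set');
die "FeatureSet not found\n" unless defined $fset;

my $dset = $dsa->fetch_by_product_FeatureSet($fset);

if (defined $dset) {
    # get_supporting_sets returns a reference to an array of sets.
    my $supporting = $dset->get_supporting_sets;
    printf "DataSet %s has %d supporting set(s)\n",
        $dset->name, scalar @{$supporting};
    print "\t", $_->name, "\n" for @{$supporting};
}
else {
    print "No DataSet found for FeatureSet ", $fset->name, "\n";
}
```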


Nathan explains the answers to these questions in this 7 min video. You can download his sample scripts and outputs:

1. sample script and output

2. sample script and output