
Loading data

Dataset description

We’ll be using the “00_logFCs.tsv” file, which can be downloaded from this link on the Sanger website (outside of EMBL-EBI). It is one of the eight files in essentiality_matrices.zip and contains a matrix of depletion log fold changes for 17,995 genes, scored in each of 325 cell lines [18]. All but one of the cell lines in this matrix (HT29v1.1) are cancer cell lines [19].

Note: The essentiality_matrices.zip archive is rather large (241.5 MB) and may take a while to download and unzip. The “00_logFCs.tsv” file itself is 104.9 MB.
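Since only one of the eight files is needed, it may be quicker to extract just that file rather than unzip the whole archive. The following is a minimal sketch using Python's standard zipfile module. The helper name extract_member is hypothetical, and the sketch assumes the archive sits in the working directory and that the file may be stored inside a subfolder of the archive, so it matches on the file name only.

```python
import zipfile
from pathlib import Path


def extract_member(zip_path, member_name, out_dir="."):
    """Extract the single archive entry whose file name equals member_name.

    Hypothetical helper: the internal folder layout of the archive is
    not known here, so entries are matched on their base name only.
    """
    with zipfile.ZipFile(zip_path) as archive:
        matches = [n for n in archive.namelist() if Path(n).name == member_name]
        if not matches:
            raise FileNotFoundError(f"{member_name} not found in {zip_path}")
        # extract() returns the path of the file it created
        return Path(archive.extract(matches[0], path=out_dir))


if __name__ == "__main__":
    archive_path = Path("essentiality_matrices.zip")  # assumed location
    if archive_path.exists():
        print(extract_member(archive_path, "00_logFCs.tsv"))
```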

To load the data file, it first needs to be converted to a CSV file. One way to do this is to open the TSV file in a spreadsheet program such as Excel and save it as a CSV file. We have also made the CSV file available for direct download: download CSV file.
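The conversion can also be scripted, which avoids opening a file of this size in a spreadsheet program. The following is a minimal sketch using only Python's standard csv module. The function name tsv_to_csv is hypothetical, and the sketch assumes the TSV file is UTF-8 encoded and sits in the working directory. It copies the file row by row, so the whole matrix is never held in memory.

```python
import csv
from pathlib import Path


def tsv_to_csv(tsv_path, csv_path):
    """Copy a tab-separated file to a comma-separated file, row by row.

    Returns the number of rows written, including the header row.
    """
    rows_written = 0
    with open(tsv_path, newline="", encoding="utf-8") as src, \
            open(csv_path, "w", newline="", encoding="utf-8") as dst:
        reader = csv.reader(src, delimiter="\t")
        writer = csv.writer(dst)
        for row in reader:
            writer.writerow(row)
            rows_written += 1
    return rows_written


if __name__ == "__main__":
    source = Path("00_logFCs.tsv")  # assumed to be in the working directory
    if source.exists():
        count = tsv_to_csv(source, source.with_suffix(".csv"))
        print(f"Wrote {count} rows to {source.with_suffix('.csv')}")
```

Because the values are copied as text, gene names and numbers are written exactly as they appear in the TSV file.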

Model development