Accessing raw and processed data

Finding the raw data

The paper states that the participants gave informed consent for the raw data to be shared, provided that they are held securely in a controlled-access resource. The European Genome-phenome Archive (EGA) is EMBL-EBI’s controlled-access data resource, which means that to access data held at the EGA you must apply and be approved. This is a relatively straightforward process, and access is usually granted to qualified investigators for appropriate use.

There are 33 studies at EGA that were submitted by WTCCC; a screenshot of these studies is shown in Figure 17.

Figure 17 The WTCCC data are held at the controlled-access EGA across 33 studies. Users must apply to the data owners for access before they can download these datasets (view in EGA).

Once you have been granted access, you can download the raw data for use in your own studies.

Accessing processed data

In addition to the raw WTCCC data, we can also access the processed data. Because the paper used Genome-Wide Association Study (GWAS) methods, we can find the results in the GWAS Catalog by searching with the article’s PubMed ID (Figure 18).

Figure 18 The processed data from the WTCCC publication are available at the GWAS Catalog (view in GWAS Catalog).
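
The same PubMed ID search can also be run programmatically. The following is a minimal sketch, assuming the GWAS Catalog REST API exposes a `findByPublicationIdPubmedId` study search and returns studies under `_embedded.studies` with `accessionId` and `diseaseTrait` fields. The helper names (`studies_url`, `extract_studies`, `fetch_studies`) are hypothetical, so check the current API documentation before relying on them.

```python
import json
import urllib.parse
import urllib.request

# Assumed base URL of the GWAS Catalog REST API; verify against the current docs.
GWAS_API = "https://www.ebi.ac.uk/gwas/rest/api"


def studies_url(pubmed_id: str) -> str:
    """Build the study-search URL for one PubMed ID."""
    query = urllib.parse.urlencode({"pubmedId": pubmed_id})
    return f"{GWAS_API}/studies/search/findByPublicationIdPubmedId?{query}"


def extract_studies(payload: dict) -> list[tuple[str, str]]:
    """Pull (accession, trait) pairs out of a decoded JSON response.

    Assumes studies sit under payload["_embedded"]["studies"], each with an
    "accessionId" string and a "diseaseTrait" object holding a "trait" string.
    """
    studies = payload.get("_embedded", {}).get("studies", [])
    return [
        (s.get("accessionId", ""), (s.get("diseaseTrait") or {}).get("trait", ""))
        for s in studies
    ]


def fetch_studies(pubmed_id: str, timeout: float = 30.0) -> list[tuple[str, str]]:
    """Query the API over the network and return (accession, trait) pairs."""
    with urllib.request.urlopen(studies_url(pubmed_id), timeout=timeout) as resp:
        return extract_studies(json.load(resp))
```

Calling `fetch_studies` with the article’s PubMed ID should return one entry per study recorded for that publication, which can then be compared with the list shown in the GWAS Catalog web interface.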

The data from the WTCCC GWAS can be downloaded and reused in meta-analyses. For example, Zeggini et al. combined these data with two other GWAS to uncover SNPs associated with type 2 diabetes [2].
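
To illustrate what combining GWAS results can involve, here is a minimal sketch of a fixed-effect, inverse-variance weighted meta-analysis for a single SNP. This is a common textbook approach and is not necessarily the method used by Zeggini et al.; the function name `fixed_effect_meta` is hypothetical, and the inputs are assumed to be per-study effect sizes (for example log odds ratios) with their standard errors.

```python
import math


def fixed_effect_meta(
    betas: list[float], std_errors: list[float]
) -> tuple[float, float, float]:
    """Pool per-study effect sizes for one SNP by inverse-variance weighting.

    Returns (pooled effect, pooled standard error, z-score).
    """
    if not betas or len(betas) != len(std_errors):
        raise ValueError("need one standard error per effect size")
    # Each study is weighted by the inverse of its variance, so more
    # precise studies contribute more to the pooled estimate.
    weights = [1.0 / se**2 for se in std_errors]
    total_weight = sum(weights)
    pooled_beta = sum(w * b for w, b in zip(weights, betas)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    return pooled_beta, pooled_se, pooled_beta / pooled_se
```

In practice a meta-analysis also has to harmonise alleles, strands and genome builds across studies before pooling, which this sketch leaves out.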

It is also possible to browse specific variants using the GWAS Catalog as a starting point, as we saw in case study 2.

To learn more about a particular variant, we can follow the link from the GWAS Catalog to Ensembl and analyse the associated information further. From there, we can probe the effect of specific variants on protein structure and function using UniProt and PDBe, as we did in case study 2.
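
The step from a variant identifier to its Ensembl record can also be scripted. The sketch below assumes the Ensembl REST API’s `variation/:species/:id` endpoint and a JSON response containing `name`, `most_severe_consequence` and `mappings` entries with a `location` field. The helper names (`variation_url`, `summarise_variant`, `fetch_variant`) are hypothetical, and the rsID you pass in is whichever variant you picked up from the GWAS Catalog.

```python
import json
import urllib.parse
import urllib.request

# Assumed base URL of the Ensembl REST API; verify against the current docs.
ENSEMBL_REST = "https://rest.ensembl.org"


def variation_url(rsid: str, species: str = "human") -> str:
    """Build the variation lookup URL for one variant identifier."""
    path = f"variation/{urllib.parse.quote(species)}/{urllib.parse.quote(rsid)}"
    return f"{ENSEMBL_REST}/{path}?content-type=application/json"


def summarise_variant(record: dict) -> dict:
    """Keep a few fields of interest from a decoded variation record.

    Assumes the record has "name", "most_severe_consequence" and a list of
    "mappings", each carrying a "location" string.
    """
    return {
        "name": record.get("name"),
        "consequence": record.get("most_severe_consequence"),
        "locations": [m.get("location") for m in record.get("mappings", [])],
    }


def fetch_variant(rsid: str, timeout: float = 30.0) -> dict:
    """Query Ensembl over the network and return a short summary."""
    with urllib.request.urlopen(variation_url(rsid), timeout=timeout) as resp:
        return summarise_variant(json.load(resp))
```

The summary only indicates where the variant maps and its predicted consequence; following up on protein-level effects still means moving on to UniProt and PDBe.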