Accessing raw and processed data
Finding the raw data
The paper states that the participants gave informed consent for the raw data to be shared, provided that it is held securely in a controlled-access resource. The European Genome-phenome Archive (EGA) is EMBL-EBI’s controlled-access data resource: to access data held at the EGA, you must apply and be approved. This is a relatively straightforward process, and access is usually granted to qualified investigators for appropriate use.
There are 33 studies at EGA that were submitted by WTCCC; a screenshot of these studies is shown in Figure 17.
Once you have been granted access, you can download the raw data for use in your own studies.
Accessing processed data
In addition to the raw WTCCC data, we can also access the processed data. Because the paper used genome-wide association study (GWAS) methods, we can find the results in the GWAS Catalog by searching with the article’s PubMed ID (Figure 18).
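The same PubMed ID search can also be scripted. The following is a minimal sketch, assuming the GWAS Catalog REST API exposes a `findByPublicationIdPubmedId` search endpoint under the base URL shown and returns HAL-style JSON with an `_embedded.studies` list. The endpoint path, the field names and the helper function names are assumptions for illustration, so check the current GWAS Catalog API documentation before relying on them.

```python
import json
import urllib.parse
import urllib.request
from typing import List

# Assumed base URL of the GWAS Catalog REST API; verify against current docs.
GWAS_API = "https://www.ebi.ac.uk/gwas/rest/api"


def studies_url(pubmed_id: str) -> str:
    """Build the search URL for all studies linked to one PubMed ID."""
    query = urllib.parse.urlencode({"pubmedId": pubmed_id})
    return f"{GWAS_API}/studies/search/findByPublicationIdPubmedId?{query}"


def extract_accessions(payload: dict) -> List[str]:
    """Pull study accession IDs out of a HAL-style JSON response.

    Assumes each study object carries an 'accessionId' field;
    studies without one are skipped.
    """
    studies = payload.get("_embedded", {}).get("studies", [])
    return [s["accessionId"] for s in studies if "accessionId" in s]


def fetch_accessions(pubmed_id: str) -> List[str]:
    """Query the catalog (network access required) and return accession IDs."""
    with urllib.request.urlopen(studies_url(pubmed_id), timeout=30) as resp:
        return extract_accessions(json.load(resp))
```

Calling `fetch_accessions` with the article’s PubMed ID should list the study accessions that the web search in Figure 18 displays, provided the assumptions above hold.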
The data from the WTCCC GWAS can be downloaded and reused in meta-analyses. For example, Zeggini et al. combined these data with two other GWAS to uncover SNPs associated with type II diabetes [2].
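To illustrate what combining results across studies can look like, here is a minimal sketch of a fixed-effect, inverse-variance-weighted meta-analysis for a single SNP. This is a generic textbook approach and not necessarily the method used by Zeggini et al.; the function name is hypothetical, and the inputs are assumed to be per-study effect estimates (for example, log odds ratios) with their standard errors.

```python
import math
from typing import Sequence, Tuple


def fixed_effect_meta(
    betas: Sequence[float], ses: Sequence[float]
) -> Tuple[float, float, float]:
    """Combine per-study effect sizes for one SNP.

    Each study is weighted by the inverse of its variance (1 / se^2).
    Returns the pooled effect, its standard error and the z-score.
    """
    if len(betas) != len(ses) or len(betas) == 0:
        raise ValueError("betas and ses must be non-empty and of equal length")
    if any(se <= 0 for se in ses):
        raise ValueError("standard errors must be positive")

    weights = [1.0 / se ** 2 for se in ses]
    total_weight = sum(weights)

    pooled_beta = sum(w * b for w, b in zip(weights, betas)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    z_score = pooled_beta / pooled_se
    return pooled_beta, pooled_se, z_score
```

With made-up inputs such as `fixed_effect_meta([0.2, 0.4], [0.1, 0.1])`, both studies receive equal weight, so the pooled effect is their mean and the pooled standard error is smaller than either study’s own, which is why meta-analysis can reveal associations that no single study detects.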
It is also possible to browse specific variants using the GWAS Catalog as a starting point, as we saw in case study 2.
To learn more about a particular variant we can link from the GWAS Catalog to Ensembl to further analyse the associated information. In turn, you can probe the effect of specific variants on protein structure and function using UniProt and PDBe as we did in case study 2.
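The Ensembl step can also be done programmatically. The sketch below assumes the Ensembl REST API `variation/:species/:id` endpoint and the response fields `name`, `most_severe_consequence` and `mappings`; the helper names are hypothetical, and the exact fields returned may differ between Ensembl releases.

```python
import json
import urllib.parse
import urllib.request

# Public Ensembl REST server.
ENSEMBL_REST = "https://rest.ensembl.org"


def variation_url(rs_id: str, species: str = "human") -> str:
    """Build the URL that returns one variant record as JSON."""
    return (
        f"{ENSEMBL_REST}/variation/{urllib.parse.quote(species)}/"
        f"{urllib.parse.quote(rs_id)}?content-type=application/json"
    )


def summarise(record: dict) -> dict:
    """Keep only the name, most severe consequence and genomic locations."""
    return {
        "name": record.get("name"),
        "most_severe_consequence": record.get("most_severe_consequence"),
        "locations": [
            f"{m.get('seq_region_name')}:{m.get('start')}-{m.get('end')}"
            for m in record.get("mappings", [])
        ],
    }


def fetch_variant(rs_id: str) -> dict:
    """Fetch and summarise a variant (network access required)."""
    with urllib.request.urlopen(variation_url(rs_id), timeout=30) as resp:
        return summarise(json.load(resp))
```

For example, `fetch_variant("rs334")` would retrieve a summary for the variant examined in case study 2.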