Introduction to public genetic variation data
A wealth of genetic variation data is generated within the scientific community to investigate diverse subject areas, from human disease to selective breeding in common bean varieties.
As new genetic variation data continues to be generated, it is also important for the community to have access to previously generated data, so that it can be re-evaluated and re-used to test both new and previously established hypotheses.
What EMBL-EBI databases and resources are available for sharing, exploring and understanding genetic variation data?
European Variation Archive (EVA): a database of genetic variation data. Datasets are submitted to EVA by the community to aid data sharing and reuse.
Ensembl: a genome browser that provides a single point of access to annotated genomes. It includes information about genetic variants, population genetics and tools for exploring your own variant data.
GWAS Catalog: a quality-controlled, manually curated database of published genome-wide association studies (GWAS). The GWAS karyotype diagram provides an interactive way of exploring all SNP-trait associations.
UniProt: EMBL-EBI’s resource for protein sequence and annotation data. You can use UniProt’s protein feature viewer to explore variants in relation to protein sequences, structure and function.
Each of these databases uses standardised ways of identifying and classifying variants.
For example, the Sequence Ontology (SO) provides a standard nomenclature for categorising variants according to where they fall relative to genes and other genomic features (Figure 1). For an overview of the identifiers used by different databases, see the section on variant identifiers in part I of this course.
In addition to the Sequence Ontology terms, an IMPACT measure, agreed by Ensembl and SnpEff, provides a subjective classification of the severity of each class. Terms commonly used by Ensembl to describe variants are shown in Figure 2.
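To make the relationship between SO terms and the IMPACT measure concrete, here is a minimal Python sketch. It assumes the four IMPACT levels used by Ensembl (HIGH, MODERATE, LOW, MODIFIER) and maps only a small, illustrative subset of SO consequence terms. The names `IMPACT_BY_TERM` and `most_severe_impact` are hypothetical helpers, not part of any Ensembl or SnpEff library, and the full, authoritative mapping should be taken from the Ensembl documentation.

```python
# Illustrative subset only: a few SO consequence terms and the IMPACT
# level Ensembl assigns to them. This is not the complete mapping.
IMPACT_BY_TERM = {
    "stop_gained": "HIGH",
    "frameshift_variant": "HIGH",
    "splice_donor_variant": "HIGH",
    "missense_variant": "MODERATE",
    "inframe_deletion": "MODERATE",
    "synonymous_variant": "LOW",
    "splice_region_variant": "LOW",
    "intron_variant": "MODIFIER",
    "intergenic_variant": "MODIFIER",
}

# IMPACT levels ordered from most to least severe.
IMPACT_ORDER = ["HIGH", "MODERATE", "LOW", "MODIFIER"]


def most_severe_impact(terms):
    """Return the most severe IMPACT among the given SO terms.

    Terms missing from the illustrative mapping are ignored.
    Returns None if no term is recognised.
    """
    impacts = [IMPACT_BY_TERM[t] for t in terms if t in IMPACT_BY_TERM]
    if not impacts:
        return None
    # The lowest index in IMPACT_ORDER is the most severe level.
    return min(impacts, key=IMPACT_ORDER.index)
```

For example, a variant annotated as both `intron_variant` in one transcript and `missense_variant` in another would be summarised as MODERATE by this helper.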
There are many different starting points for exploring publicly available genetic variation data. The next section features case studies that will illustrate four different ways you can access genetic variation data using a gene, variant, phenotype or publication as a starting point.
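As one illustration of a variant-based starting point, the sketch below builds a request for the Ensembl REST API's variation endpoint (`/variation/:species/:id`) in Python. The helper names `variation_url` and `fetch_variation` are hypothetical, and the structure of the returned JSON is not assumed here, so inspect the response yourself before relying on any particular field. The case studies themselves use the web interfaces rather than programmatic access.

```python
import json
import urllib.parse
import urllib.request

# Base address of the public Ensembl REST service.
ENSEMBL_REST = "https://rest.ensembl.org"


def variation_url(variant_id, species="human"):
    """Build the REST URL for a variant identifier such as 'rs334'."""
    path = "/variation/{}/{}".format(
        urllib.parse.quote(species), urllib.parse.quote(variant_id)
    )
    # Ask the service to return JSON.
    return ENSEMBL_REST + path + "?content-type=application/json"


def fetch_variation(variant_id, species="human"):
    """Fetch the variant record and decode it from JSON (needs network access)."""
    with urllib.request.urlopen(variation_url(variant_id, species)) as response:
        return json.load(response)
```

Calling `fetch_variation("rs334")` would retrieve the record for the variant featured in case study 2, provided the service is reachable.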