Summary
What is genetic variation?
Genetic variation is fundamental to the evolution of all species and is what makes us individuals. Mutations are the original source of genetic variation. In humans, recombination contributes to genetic variation by shuffling parental DNA and creating new combinations of variants.
Genetic variation is commonly divided into three main forms:
- Single base-pair substitution, also known as single nucleotide polymorphism (SNP)
- Insertion or deletion, also known as ‘indel’
- Structural variation
The term variant is used to refer to a specific region of the genome which differs between two genomes. Different versions of the same variant are called alleles. The term reference allele refers to the base that is found in the reference genome. The alternative allele refers to any base, other than the reference, that is found at that locus.
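As a rough illustration of how the reference and alternative alleles relate to the forms of variation listed above, the sketch below classifies a short variant by comparing the two alleles. The function name `classify_variant` and the category labels are assumptions made for this example. Structural variation is deliberately left out, since it cannot be recognised reliably from a pair of short allele strings.

```python
def classify_variant(ref: str, alt: str) -> str:
    """Classify a short variant from its reference and alternative alleles.

    Illustrative only: the labels are chosen for this example and
    structural variants are not handled.
    """
    if ref == alt:
        return "no variation"
    if len(ref) == 1 and len(alt) == 1:
        # One base replaced by another base at the same locus.
        return "SNP"
    if len(alt) > len(ref):
        # The alternative allele carries extra bases.
        return "insertion"
    if len(alt) < len(ref):
        # The alternative allele is missing bases.
        return "deletion"
    # Same length, more than one base long.
    return "multi-base substitution"


print(classify_variant("A", "G"))
print(classify_variant("A", "AT"))
print(classify_variant("ATG", "A"))
```

Insertions and deletions are reported separately here, although both fall under the 'indel' form described above.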
Genetic differences between individuals lead to differences in phenotype, traits or risk of developing a disease. A Mendelian trait is one that is controlled by a single locus, for example a single SNP. For complex phenotypes, multiple variants in the genome, together with environmental factors, may increase or decrease the likelihood of an individual having a certain trait.
What are the effects of genetic variation?
Variants can be categorised based on where they fall with respect to genes and other genomic features. Variants falling within a coding region are classified as synonymous, missense or nonsense variants based on how they affect the codon they fall within.
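The coding-region categories can be made concrete with a small sketch that translates the reference and alternative codons using the standard genetic code and compares the results. This is a minimal illustration and not how any particular annotation tool works. The function name `classify_coding_change` is an assumption, and the extra "stop lost" label, which is not one of the three categories named above, is included only so that a changed stop codon is not mislabelled as missense.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_coding_change(ref_codon: str, alt_codon: str) -> str:
    """Compare the amino acids encoded by two DNA codons ('*' marks a stop)."""
    ref_aa = CODON_TABLE[ref_codon.upper()]
    alt_aa = CODON_TABLE[alt_codon.upper()]
    if ref_aa == alt_aa:
        return "synonymous"   # same amino acid, protein sequence unchanged
    if alt_aa == "*":
        return "nonsense"     # an amino acid codon becomes a stop codon
    if ref_aa == "*":
        return "stop lost"    # assumption: label added for this example only
    return "missense"         # one amino acid replaced by another


print(classify_coding_change("GAA", "GAG"))  # Glu -> Glu
print(classify_coding_change("GAG", "GTG"))  # Glu -> Val
print(classify_coding_change("TGG", "TGA"))  # Trp -> stop
```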
Variants in transcription factor binding sites are called TF binding site variants.
The effects of variants on protein structure can vary dramatically depending on the type of protein and the extent of variation.
Studying genetic variation
Some common steps in genetic variation studies include variant calling, variant analysis and prediction of variant effects on protein structure and function.
Variant calling is the process by which we identify variants from sequence data. It involves the alignment of whole-genome or whole-exome sequencing data to a reference genome in order to identify where the aligned reads differ from the reference genome. The results are stored in a VCF file.
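The idea of comparing aligned reads to a reference can be sketched with a toy example. It assumes the reads are already aligned without gaps and uses a simple majority rule. Real variant callers rely on statistical models, base and mapping qualities, and gapped alignments, so this is only an illustration of the principle. The function name `call_variants` and the thresholds `min_depth` and `min_fraction` are hypothetical.

```python
from collections import Counter


def call_variants(reference, reads, min_depth=2, min_fraction=0.5):
    """Report positions where most aligned reads disagree with the reference.

    reads: list of (start, sequence) pairs, with a 0-based start position
    and an ungapped alignment (a simplifying assumption).
    Returns a list of (1-based position, reference base, alternative base).
    """
    # Count the bases observed at each reference position (a simple pileup).
    pileup = [Counter() for _ in reference]
    for start, seq in reads:
        for offset, base in enumerate(seq):
            pos = start + offset
            if 0 <= pos < len(reference):
                pileup[pos][base] += 1

    calls = []
    for pos, counts in enumerate(pileup):
        depth = sum(counts.values())
        if depth < min_depth:
            continue  # too few reads cover this position to make a call
        base, count = counts.most_common(1)[0]
        if base != reference[pos] and count / depth > min_fraction:
            calls.append((pos + 1, reference[pos], base))
    return calls


reference = "ACGTACGT"
reads = [(0, "ACGAAC"), (2, "GAACGT"), (1, "CGTACG")]
print(call_variants(reference, reads))
```

In this toy input, two of the three reads covering the fourth base carry an A where the reference has a T, so that position is the only one reported.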
VCF is the standard file format for storing variation data. VCF files are tab-delimited text files. Each data line in the file describes a particular variant, and each column provides a specific piece of information about that variant. VCFs are the preferred format because they are unambiguous, scalable and flexible.
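To show what the tab-delimited layout looks like in practice, the sketch below parses a single VCF data line into its eight fixed columns (CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO). It is a minimal example that ignores header lines and per-sample genotype columns. The class and function names are assumptions made for illustration, and the example record is only sample data. For real analyses a dedicated VCF library is a safer choice.

```python
from dataclasses import dataclass


@dataclass
class VcfRecord:
    chrom: str
    pos: int
    variant_id: str
    ref: str
    alt: list          # one entry per alternative allele
    qual: str
    filter_status: str
    info: dict         # INFO keys mapped to their values, or True for flags


def parse_vcf_line(line: str) -> VcfRecord:
    """Parse the eight fixed columns of one VCF data line."""
    fields = line.rstrip("\n").split("\t")
    chrom, pos, variant_id, ref, alt, qual, filter_status, info = fields[:8]

    info_dict = {}
    if info != ".":
        for entry in info.split(";"):
            key, _, value = entry.partition("=")
            # Entries without '=' are flags, stored here as True.
            info_dict[key] = value if value else True

    # Multiple alternative alleles are comma-separated in the ALT column.
    return VcfRecord(chrom, int(pos), variant_id, ref, alt.split(","),
                     qual, filter_status, info_dict)


line = "20\t14370\trs6054257\tG\tA\t29\tPASS\tNS=3;DP=14;AF=0.5;DB"
record = parse_vcf_line(line)
print(record.chrom, record.pos, record.ref, record.alt, record.info["AF"])
```

Values in the INFO column are kept as strings here; converting them to numbers would require the type declarations from the file header.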
Variant identifiers are unique combinations of letters and numbers that are assigned to known variants. Different types of identifiers are used for short variants and structural variants.
Predictions can be useful for understanding why a particular variant may cause a particular phenotype. However, prediction can never be as valuable as experimental data. Any predictions determined bioinformatically should be followed up experimentally.
Three common types of genetic variation study are genome-wide association studies (GWAS), studies on the functional consequences of variants and population genetics studies.
What’s next?
We recommend taking a look at part II of our human genetic variation course, in which we will learn how to explore publicly available genetic variation data.