Systematic description and prediction of variations and their effects

29/04/2014 - Garden Room at 14:00 - External Seminar
Mauno Vihinen
(Professor of Medical Structural Biology, Lund University)
Next-generation sequencing methods are used to produce ever-increasing amounts of sequence and variation information. The bottleneck has moved from sequencing to interpreting the significance and meaning of variations. Two approaches that help in interpretation will be described. Variation Ontology (VariO) allows systematic annotation of the effects, consequences and mechanisms of variations. PON-P2 is a high-accuracy tool for filtering and predicting disease-related variations.

VariO describes variation effects at the three major biological levels: DNA, RNA and protein. Each of them has four main sublevels. Variation type terms describe the type of variation. Function terms annotate the general function(s) affected by the variation. Structure terms describe the affected structural features. Property terms define diverse features. VariO is for annotating the variant, not the wild-type features or properties, and requires a reference against which the changes are indicated. VariO is versatile and can be used for variations ranging from genomic multiplications to single nucleotide or amino acid changes, whether of genetic or non-genetic origin, and in any organism. VariO annotations are position-specific. The AmiVario viewer can be used to visualize the terms, and the VariOtator tool has been developed to help in annotation. VariO can be used for annotations in, e.g., locus-specific variation databases (LSDBs), central variation databases, and variation type-specific databases.

One of the first tasks in exome and complete genome sequencing projects, after the actual sequencing, is the prioritization and filtering of likely harmful variations. Several tools have been developed for this purpose; however, their performance has been suboptimal. We developed a new tool called PON-P2 for filtering harmful amino acid substitutions. The method is based on machine learning and utilizes a random forest classifier.
It has been trained and tested with a dataset of 32,000 experimentally verified disease-causing or benign cases available in VariBench, a database for variation benchmarks. To our knowledge, PON-P2 has the highest accuracy among tools of this kind. In addition, it is very fast and accepts submissions in several formats.

VariO: http://variationontology.org
PON-P2: http://structure.bmc.lu.se/PON-P2

Relevant references:
Nair, P. S. and Vihinen, M. (2013) VariBench: A benchmark database for variations. Hum. Mutat. 34, 42-49.
Niroula, A., Urolagin, S. and Vihinen, M. PON-P2: Prediction method for fast and reliable identification of harmful variants. (revised).
Thusberg, J., Olatubosun, A. and Vihinen, M. (2011) Performance of mutation pathogenicity prediction methods. Hum. Mutat. 32, 358-368.
Vihinen, M. (2014) Variation Ontology for annotation of variation effects and mechanisms. Genome Res. 24, 356-364.
Vihinen, M. (2014) Variation Ontology: Annotator guide. J. Biomed. Semantics 5:9.
Hosted by: Gary Saunders