Enright Group : Research
Background
Complete genome sequencing projects are generating enormous amounts of data. Although progress has been rapid, a large proportion of genes in any given genome are either unannotated or have a poorly characterised function. The goal of our laboratory is to predict and describe the functions of genes, proteins and, in particular, regulatory RNAs and their interactions in living organisms. Our lab is entirely computational and our work involves the development of algorithms, protocols and datasets for functional genomics. Our research currently focuses on determining the functions of regulatory RNAs. We are also interested in the analysis of biological networks, protein-protein interactions, clustering algorithms and visualisation techniques.
Decoding microRNA function and regulation
The recent discovery of widespread translational regulation by microRNAs (miRNAs) highlights the enormous diversity and complexity of gene regulation in living systems and the need for computational techniques to help understand these systems. We developed the miRanda algorithm for miRNA target detection in collaboration with the Computational Biology Center at Memorial Sloan-Kettering Cancer Center in New York. Recently, we have predicted large-scale miRNA-target networks for mammalian, fish and insect genomes using the miRanda algorithm and cross-species sequence analysis as part of the miRBase database. The lab will continue to develop and improve methods for computational detection of miRNA target sites to investigate other possible aspects of miRNA target specificity, including sequence and structural motifs.
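As a rough illustration of what sequence-based target site detection involves, the sketch below scans a 3'UTR for perfect complementarity to a miRNA "seed" (nucleotides 2-8). This is a widely used simplification and is not the miRanda algorithm itself; the function names, the seed boundaries and the toy sequences are assumptions made for this example only.

```python
# Minimal seed-match scan (illustrative only; NOT the miRanda algorithm).
# Assumption: a candidate site is a perfect Watson-Crick match to miRNA
# nucleotides 2-8. All names and sequences here are hypothetical.

COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna):
    """Return the reverse complement of an RNA string (A, C, G, U only)."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def find_seed_sites(mirna, utr, seed_start=1, seed_end=8):
    """Return 0-based UTR positions of perfect seed-complementary sites.

    The default slice [1:8] corresponds to miRNA nucleotides 2-8 (1-based).
    """
    seed = mirna[seed_start:seed_end]
    site = reverse_complement(seed)  # sequence expected in the target mRNA
    positions = []
    start = utr.find(site)
    while start != -1:
        positions.append(start)
        start = utr.find(site, start + 1)  # allow overlapping matches
    return positions
```

For example, `find_seed_sites("UGAGGUAGUAGGUUGUAUAGUU", "AAACUACCUCAGGGCUACCUCUU")` returns `[3, 14]`, the two positions where the site `CUACCUC` occurs in the toy UTR. A real predictor would also score the alignment beyond the seed, the duplex stability and cross-species conservation.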
Much of our work involves close collaboration with experimental labs interested in the function of small RNAs in their system of interest. We develop novel algorithms and techniques for the analysis of primary data from such experiments (e.g. microarray). One example of this is the Sylamer algorithm for associating miRNA or siRNA effects with gene expression data. We also work on methods for the analysis of miRNA expression from both microarray and new-technology sequencing approaches.
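One way to associate a small RNA with expression changes is to ask whether a short sequence word (for example a seed-complementary site) is over-represented among the 3'UTRs of the most strongly affected genes. The sketch below shows that general idea with a hypergeometric test. It is a simplified illustration under our own assumptions, not the Sylamer implementation; the function names and the fixed `cutoff` parameter are hypothetical.

```python
# Word enrichment at the top of a ranked gene list (illustrative only;
# not the Sylamer implementation). Assumption: genes are pre-sorted, e.g.
# from most down-regulated to most up-regulated, and each is represented
# by its 3'UTR sequence.
from math import comb


def hypergeom_upper_tail(k, population, successes, draws):
    """P(X >= k) for X ~ Hypergeometric(population, successes, draws)."""
    total = comb(population, draws)
    upper = min(successes, draws)
    favourable = sum(
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(k, upper + 1)
    )
    return favourable / total


def word_enrichment(ranked_utrs, word, cutoff):
    """P-value for seeing at least the observed number of word-containing
    UTRs among the first `cutoff` sequences of the ranked list."""
    has_word = [word in utr for utr in ranked_utrs]
    total_with_word = sum(has_word)
    top_with_word = sum(has_word[:cutoff])
    return hypergeom_upper_tail(
        top_with_word, len(ranked_utrs), total_with_word, cutoff
    )
```

A small p-value at a given cutoff suggests that the word is concentrated among the top-ranked genes. In practice one would scan many cutoffs and many words, and correct for multiple testing.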
RNA Functional Genomics
We also work on the prediction and analysis of other regulatory RNAs including piwi-associated RNAs (piRNAs) and small non-coding RNAs (sncRNAs) in bacteria. Part of this work involves prediction of the transcriptional units of common RNAs and their upstream regulatory factors. We are also interested in the evolution of regulatory RNAs and developing phylogenetic techniques appropriate for short non-coding RNA. Our long-term goal is to combine regulatory RNA target prediction, secondary effects and upstream regulation into complex regulatory networks that may help us better understand the context of RNA in complex cellular networks.
Studying Regulatory RNAs in Model Systems
Through our collaborations we work on understanding the role of RNA regulation in multiple diverse biological systems. These include zebrafish development, mouse knock-out models, neuronal development, disease and cancer models, and embryonic stem cells. Typically these experiments involve identification of miRNAs through profiling techniques followed by experimental perturbation of miRNAs of interest. High-throughput techniques such as microarrays and new-technology sequencing are used to determine the effect of individual miRNAs in the system of interest.
Classification and Clustering
Protein family analysis aims to describe the function of a protein by placing it into an evolutionary context with other related proteins. Genes which have recently diverged from a common ancestor are usually easy to detect by virtue of very close sequence homology and frequently perform the same (or very similar) function across species. This classification becomes more difficult, however, for distantly related sequences, where homology is not readily detectable at the sequence level. Another difficulty involves determining whether two homologous genes are directly related through a speciation event (orthologs) or whether they are related by virtue of gene duplication (paralogs). Additionally, eukaryotic genomes are problematic as they tend to contain proteins with complex domain architectures and widespread 'promiscuous' domains which hinder accurate classification. We use sequence clustering methods to overcome these problems and group sequences together based on shared sequence similarity and domain architecture. The Markov Cluster Algorithm (MCL), developed by Stijn van Dongen, is exceptionally fast and accurate enough for large-scale sequence clustering of many hundreds of thousands of protein sequences. We will continue to develop and improve sequence classification approaches using MCL in conjunction with other techniques for large-scale, accurate and hierarchical classification of protein sequences.
Visualisation of Biological Data
We try to combine our ideas and algorithms for graph-based clustering and analysis of biological data using visualisation tools. One of our methods, BioLayout, is now integrated with the OpenGL 3D system for fast display of complex graphs. A new version (BioLayout Express 3D) is now available which integrates this visualisation with MCL-based sequence clustering and data mining of annotations. We have tested this approach using large-scale gene expression data.
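The MCL clustering referred to in the two sections above simulates flow on a similarity graph by alternating two matrix operations: expansion (squaring a column-stochastic matrix) and inflation (raising each entry to a power and re-normalising the columns). The sketch below is a small dense, pure-Python toy meant only to show those two steps. It is not van Dongen's implementation, which is sparse and far more efficient; the function names, the fixed iteration count and the `1e-6` threshold are assumptions.

```python
# Toy Markov clustering on a dense adjacency matrix (illustrative only;
# not the reference MCL implementation). Assumption: `adjacency` is a
# square, symmetric list of lists with non-negative weights.


def normalise_columns(matrix):
    """Scale every column so that it sums to 1 (column-stochastic)."""
    n = len(matrix)
    sums = [sum(matrix[i][j] for i in range(n)) for j in range(n)]
    return [[matrix[i][j] / sums[j] for j in range(n)] for i in range(n)]


def mat_mul(a, b):
    """Plain O(n^3) matrix product of two square matrices."""
    n = len(a)
    return [
        [sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
        for i in range(n)
    ]


def mcl(adjacency, inflation=2.0, iterations=50):
    """Return clusters as sorted lists of node indices."""
    n = len(adjacency)
    # Self-loops are added before normalising, as is commonly done for MCL.
    m = [
        [float(adjacency[i][j]) + (1.0 if i == j else 0.0) for j in range(n)]
        for i in range(n)
    ]
    m = normalise_columns(m)
    for _ in range(iterations):
        m = mat_mul(m, m)  # expansion: flow spreads along longer paths
        # inflation: strong flows are boosted, weak flows decay
        m = normalise_columns([[x ** inflation for x in row] for row in m])
    # Rows that still carry flow act as attractors; the columns with
    # non-zero entries in such a row form that attractor's cluster.
    clusters = set()
    for row in m:
        members = frozenset(j for j, x in enumerate(row) if x > 1e-6)
        if members:
            clusters.add(members)
    return sorted(sorted(c) for c in clusters)
```

Higher `inflation` values generally give finer-grained clusters. For protein sequences, the edge weights would typically be derived from pairwise similarity scores rather than the 0/1 values used in a toy graph.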
Analysis of physical and functional interactions between proteins
High-throughput experimental techniques for determining protein-protein interactions (e.g. Yeast Two-Hybrid) are now widely available. We have been involved in the development of complementary computational techniques which aim to predict physical and functional interactions between proteins based on genomic sequence data. For example, the detection of fused composite proteins in one organism, which correspond directly to orthologous un-fused component proteins in other organisms, is a fingerprint that these protein pairs may interact. Other techniques involve the detection of genes which share phylogenetic profiles or gene locality across many genomes. Further evidence of interaction can be derived from the detection of correlated mutation in alignments derived from protein sequence families. Recently, we have shown that the clustering of proteins in the context of their position in interaction networks can be used to infer their biological function or process.
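The phylogenetic profile idea mentioned above can be sketched briefly: each gene family is described by the set of genomes in which it is present, and families with very similar presence patterns become candidates for functional linkage. The code below is a minimal illustration that uses Jaccard similarity; the similarity measure, the threshold and all names are assumptions for this example rather than the method used in the lab's published work.

```python
# Phylogenetic profile comparison (illustrative only). Assumption: each
# profile is the set of genome identifiers in which a gene family occurs,
# and Jaccard similarity is used as a simple stand-in for profile scoring.


def jaccard(profile_a, profile_b):
    """Jaccard similarity of two presence sets (0.0 if both are empty)."""
    a, b = set(profile_a), set(profile_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def candidate_pairs(profiles, threshold=0.8):
    """Return (gene1, gene2, score) for pairs whose profiles are similar."""
    genes = sorted(profiles)
    pairs = []
    for i, g1 in enumerate(genes):
        for g2 in genes[i + 1:]:
            score = jaccard(profiles[g1], profiles[g2])
            if score >= threshold:
                pairs.append((g1, g2, score))
    return pairs
```

Families present in nearly all genomes, or in only one, carry little signal, so a practical analysis would filter such profiles and assess significance against a background model.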