MicroCosm Targets is an online resource for microRNA target prediction in metazoa.
Sylamer is a new tool for the analysis of word enrichment in ordered gene lists. It is appropriate for assessing miRNA or siRNA effects in gene-expression data.

Enright Group : Research




Background

Anton Enright

Complete genome sequencing projects are generating enormous amounts of data. Although progress has been rapid, a large proportion of the genes in any given genome are either unannotated or have a poorly characterised function. The goal of our laboratory is to predict and describe the functions of genes, proteins and, in particular, regulatory RNAs and their interactions in living organisms.

Our lab is entirely computational, and our work involves the development of algorithms, protocols and datasets for functional genomics. Our research currently focuses on determining the functions of regulatory RNAs. We are also interested in the analysis of biological networks, protein-protein interactions, clustering algorithms and visualisation techniques.

Decoding microRNA function and regulation

The recent discovery of widespread translational regulation by microRNAs (miRNAs) highlights the enormous diversity and complexity of gene regulation in living systems, and the need for computational techniques to help understand these systems. We developed the miRanda algorithm for miRNA target detection in collaboration with the Computational Biology Center at Memorial Sloan-Kettering Cancer Center in New York. Recently, we have predicted large-scale miRNA-target networks for mammalian, fish and insect genomes using the miRanda algorithm and cross-species sequence analysis as part of the miRBase database. The lab will continue to develop and improve methods for the computational detection of miRNA target sites, and to investigate other possible aspects of miRNA target specificity, including sequence and structural motifs.
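To give a rough idea of what target-site detection involves, the sketch below scans a 3' UTR for sites complementary to a miRNA "seed" (nucleotides 2-8). This is a deliberately simplified stand-in and not the miRanda algorithm, which scores full miRNA-target alignments. The function names and the example sequences are assumptions made purely for illustration.

```python
# Watson-Crick pairing only; G:U wobble pairs are ignored in this sketch.
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna):
    """Return the reverse complement of an RNA string."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def find_seed_sites(mirna, utr):
    """Return 0-based UTR positions that perfectly pair with the miRNA seed.

    The seed is taken as miRNA nucleotides 2-8 (Python slice 1:8).
    DNA input is accepted and converted to RNA.
    """
    mirna = mirna.upper().replace("T", "U")
    utr = utr.upper().replace("T", "U")
    site = reverse_complement(mirna[1:8])

    positions = []
    start = utr.find(site)
    while start != -1:
        positions.append(start)
        start = utr.find(site, start + 1)  # allow overlapping matches
    return positions


# Toy sequences, invented for this example.
toy_mirna = "UGAGGUAGUAGGUUGUAUAGUU"
toy_utr = "AAACUACCUCAGGGCUACCUCUU"
print(find_seed_sites(toy_mirna, toy_utr))  # [3, 14]
```

A perfect seed match is only a first filter. Conservation across species and pairing beyond the seed would need to be considered in any realistic predictor.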

Double-stranded RNA (siRNA) entering the Argonaute complex. The 3' overhang of the RNA inserts into a cavity on the PAZ domain of the bilobal Ago complex (PDB: 28FS). Image produced with PyMOL.

Much of our work involves close collaboration with experimental labs interested in the function of small RNAs in their system of interest. We develop novel algorithms and techniques for the analysis of primary data from such experiments (e.g. microarrays). One example is the Sylamer algorithm for associating miRNA or siRNA effects with gene-expression data. We also work on methods for the analysis of miRNA expression from both microarray and new-technology sequencing approaches.
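The idea of testing word enrichment along an ordered gene list can be sketched as follows. For each cutoff in the ranked list, a hypergeometric test asks whether the leading genes contain a given word in their 3' UTRs more often than expected by chance. This is a simplified illustration that counts genes containing the word at least once. It is not the Sylamer implementation, and the function names and toy sequences are assumptions.

```python
from math import comb


def hypergeom_sf(k, N, K, n):
    """P(X >= k) when drawing n items from N, of which K are 'successes'."""
    upper = min(K, n)
    total = comb(N, n)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / total


def enrichment_profile(ranked_utrs, word):
    """Return (cutoff, p-value) pairs for every leading bin of the ranked list.

    ranked_utrs: 3' UTR sequences ordered by, e.g., expression change.
    word: the short sequence whose enrichment is tested.
    """
    has_word = [word in utr for utr in ranked_utrs]
    N = len(has_word)
    K = sum(has_word)
    profile = []
    for n in range(1, N + 1):
        k = sum(has_word[:n])  # genes with the word among the top n
        profile.append((n, hypergeom_sf(k, N, K, n)))
    return profile


# Toy ranked list (invented): the word occurs only in the first three UTRs.
ranked = ["AAGCACUUAA", "GCACUUGG", "CCGCACUU", "AAAAAA", "CCCCCC", "GGGGGG"]
profile = enrichment_profile(ranked, "GCACUU")
best_cutoff, best_p = min(profile, key=lambda pair: pair[1])
print(best_cutoff)  # 3
```

Plotting the p-value against the cutoff gives a curve of the kind shown in enrichment landscape plots: a sharp minimum near the top of the list suggests that genes carrying the word respond together.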

Sylamer Output
Sylamer Results for the miR-430 microRNA in Zebrafish

RNA Functional Genomics

We also work on the prediction and analysis of other regulatory RNAs, including piwi-associated RNAs (piRNAs) and small non-coding RNAs (sncRNAs) in bacteria. Part of this work involves predicting the transcriptional units of common RNAs and their upstream regulatory factors. We are also interested in the evolution of regulatory RNAs and in developing phylogenetic techniques appropriate for short non-coding RNAs. Our long-term goal is to combine regulatory RNA target prediction, secondary effects and upstream regulation into regulatory networks that may help us better understand the role of RNA in complex cellular networks.


Studying Regulatory RNAs in Model Systems

Through our collaborations we work on understanding the role of RNA regulation in diverse biological systems. These include zebrafish development, mouse knock-out models, neuronal development, disease and cancer models, and embryonic stem cells. Typically these experiments involve the identification of miRNAs through profiling techniques, followed by experimental perturbation of the miRNAs of interest. High-throughput techniques such as microarrays and new-technology sequencing are then used to determine the effect of individual miRNAs in the system of interest.


Classification and Clustering

Markov Clustering
Markov clustering of a simple graph. Stochastic flow through the input network (left) is detected and modulated across iterations to produce a final clustering (right).

Protein family analysis aims to describe the function of a protein by placing it into an evolutionary context with other related proteins. Genes which have recently diverged from a common ancestor are usually easy to detect by virtue of very close sequence homology and frequently perform the same (or very similar) function across species. This classification becomes more difficult, however, for distantly related sequences, where homology is not readily detectable at the sequence level. Another difficulty involves determining whether two homologous genes are directly related through a speciation event (orthologs) or whether they are related by virtue of gene duplication (paralogs). Additionally, eukaryotic genomes are problematic as they tend to contain proteins with complex domain architectures and widespread 'promiscuous' domains which hinder accurate classification.

We use sequence clustering methods to overcome these problems and group sequences together based on shared sequence similarity and domain architecture. The Markov Cluster Algorithm (MCL), developed by Stijn van Dongen, is exceptionally fast and accurate enough for large-scale clustering of many hundreds of thousands of protein sequences. We will continue to develop and improve sequence classification approaches using MCL in conjunction with other techniques for large-scale, accurate and hierarchical classification of protein sequences.
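The core loop of Markov clustering can be sketched in a few lines. The graph is turned into a column-stochastic matrix, and two steps alternate until the matrix stops changing: expansion (matrix squaring, which spreads flow along longer paths) and inflation (element-wise powering followed by renormalisation, which strengthens strong flows and weakens weak ones). The version below is a minimal dense-matrix sketch in pure Python for small graphs, not the MCL program itself. The parameter defaults and the way clusters are read from the final matrix are assumptions chosen for simplicity.

```python
def normalize_columns(matrix):
    """Scale each column in place so that it sums to 1."""
    n = len(matrix)
    for j in range(n):
        column_sum = sum(matrix[i][j] for i in range(n))
        if column_sum > 0:
            for i in range(n):
                matrix[i][j] /= column_sum
    return matrix


def mcl(adjacency, inflation=2.0, max_iter=100, tol=1e-9):
    """Cluster a small undirected graph given as a square adjacency matrix."""
    n = len(adjacency)
    # Add self-loops, then convert to a column-stochastic matrix.
    m = [[float(adjacency[i][j]) + (1.0 if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    normalize_columns(m)

    for _ in range(max_iter):
        # Expansion: square the matrix.
        expanded = [[sum(m[i][k] * m[k][j] for k in range(n)) for j in range(n)]
                    for i in range(n)]
        # Inflation: element-wise power, then renormalise the columns.
        inflated = normalize_columns([[x ** inflation for x in row] for row in expanded])
        delta = max(abs(inflated[i][j] - m[i][j]) for i in range(n) for j in range(n))
        m = inflated
        if delta < tol:
            break

    # Each row that still carries flow defines a cluster of the columns it attracts.
    clusters = set()
    for i in range(n):
        members = frozenset(j for j in range(n) if m[i][j] > 1e-6)
        if members:
            clusters.add(members)
    return sorted(sorted(cluster) for cluster in clusters)


# Two separate triangles: nodes 0-2 and nodes 3-5.
two_triangles = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]
print(mcl(two_triangles))  # [[0, 1, 2], [3, 4, 5]]
```

For sequence clustering, the adjacency entries would typically be similarity scores between proteins rather than 0/1 values, and a sparse matrix representation would be needed at the scale described above.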


Visualisation of Biological Data

We try to combine our ideas and algorithms for graph-based clustering and analysis of biological data with visualisation tools. One of our methods, BioLayout, is now integrated with the OpenGL 3D system for fast display of complex graphs. A new version (BioLayout Express 3D) is now available, which integrates this visualisation with MCL-based sequence clustering and data mining of annotations. We have tested this approach using large-scale gene-expression data.

BioLayout
Clustered Gene Expression Network Screenshot

Analysis of physical and functional interactions between proteins

High-throughput experimental techniques for determining protein-protein interactions (e.g. yeast two-hybrid) are now widely available. We have been involved in the development of complementary computational techniques that aim to predict physical and functional interactions between proteins based on genomic sequence data. For example, the detection of a fused composite protein in one organism that corresponds directly to orthologous un-fused component proteins in other organisms is an indication that the component proteins may interact. Other techniques involve the detection of genes that share phylogenetic profiles or gene locality across many genomes. Further evidence of interaction can be derived from the detection of correlated mutations in alignments of protein sequence families. Recently, we have shown that clustering proteins according to their position in interaction networks can be used to infer their biological function or process.
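The gene-fusion idea can be illustrated with a small sketch. Given sequence-similarity hits of candidate component proteins against candidate composite proteins, it reports pairs of components that match non-overlapping regions of the same composite. The input format, the function name and the overlap threshold are assumptions made for this example. A fuller analysis would also need further checks, for instance that the two components are not themselves similar to each other.

```python
from collections import defaultdict
from itertools import combinations


def find_fusion_pairs(hits, max_overlap=0):
    """Find component pairs that match distinct regions of one composite.

    hits: iterable of (component_id, composite_id, start, end), where start
          and end are inclusive coordinates of the match on the composite.
    max_overlap: largest overlap (in residues) still treated as 'distinct'.
    Returns a sorted list of (composite_id, component_a, component_b).
    """
    by_composite = defaultdict(list)
    for component, composite, start, end in hits:
        by_composite[composite].append((component, start, end))

    pairs = set()
    for composite, regions in by_composite.items():
        for (a, a_start, a_end), (b, b_start, b_end) in combinations(regions, 2):
            if a == b:
                continue  # two hits from the same component are not a fusion
            overlap = min(a_end, b_end) - max(a_start, b_start) + 1
            if overlap <= max_overlap:
                pairs.add((composite, *sorted((a, b))))
    return sorted(pairs)


# Toy hits (invented): P and Q cover separate halves of composite C,
# while R overlaps both of them and S is the only hit on D.
toy_hits = [
    ("P", "C", 1, 150),
    ("Q", "C", 160, 300),
    ("R", "C", 100, 200),
    ("S", "D", 1, 100),
]
print(find_fusion_pairs(toy_hits))  # [('C', 'P', 'Q')]
```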

Gene Fusion
Stereo structure of the fused three-component, tri-functional protein para-nitrobenzyl esterase from Bacillus subtilis. PDB ID: 1C7J. Figure produced using PyMOL.

Selected References




Detection of microRNA binding and siRNA off-targets from expression data.
van Dongen S, Abreu-Goodger C, Enright AJ
Nature Methods. 2008. PMID: 18978784 DOI: 10.1038/nmeth.1267

Genomic analysis of human microRNA transcripts.
Saini HK, Griffiths-Jones S, Enright AJ
Proc Natl Acad Sci U S A. 2007. PMID: 17965236 DOI: 10.1073/pnas.0703890104

Zebrafish MiR-430 Promotes Deadenylation and Clearance of Maternal mRNAs.
Giraldez AJ, Mishima Y, Rihel J, Grocock RJ, Van Dongen S, Inoue K, Enright AJ, Schier AF
Science. 2006. PMID: 16484454

MicroRNAs Regulate Brain Morphogenesis in Zebrafish.
Giraldez AJ, Cinalli RM, Glasner ME, Enright AJ, Thomson MJ, Baskerville S, Hammond SM, Bartel DP, Schier AF
Science. 2005. PMID: 15774722

MicroRNA targets in Drosophila.
Enright AJ, John B, Gaul U, Tuschl T, Sander C, Marks DS
Genome Biol. 2003;5;R1. PMID: 14709173

An efficient algorithm for large-scale detection of protein families.
Enright AJ, Van Dongen S, Ouzounis CA
Nucleic Acids Res. 2002;30;1575-84. PMID: 11917018

Protein interaction maps for complete genomes based on gene fusion events.
Enright AJ, Iliopoulos I, Kyrpides NC, Ouzounis CA
Nature. 1999;402;86-90. PMID: 10573422


