A powerful HMMER for data mining

1 Jun 2015 - 11:30

HMMER, a fast, sensitive search tool, helps biologists find sequence relationships deep in evolutionary time. HMMER algorithms are now available through a dedicated website at EMBL-EBI.

  • HMMER software implements a powerful new generation of mathematical techniques for identifying hundreds of thousands of related sequences;
  • HMMER results help researchers infer the function of a protein and its evolutionary history; 
  • New, open-source web interface offers fast, easy-to-use search and visualisation;
  • Results are easier to interpret thanks to filters for taxonomy and domain architecture.

Hinxton, 1 June 2015 - Researchers looking to understand the function and evolutionary history of a protein can now use HMMER algorithms through a dedicated website. HMMER, which uses sophisticated probability models and searches large sequence databases in seconds, has been incorporated into protein data services at EMBL-EBI including Pfam and InterPro.

HMMER was originally developed by Sean Eddy, currently at the Howard Hughes Medical Institute’s Janelia Research Campus in the US, and the project turned into a 10-year collaboration with Rob Finn, now head of Protein Family resources at EMBL-EBI. The most recent version of the website is faster and more robust, and is widely available with a full suite of tools to help researchers interpret protein data.

“We wanted HMMER to be more accessible, because it is an incredibly powerful tool for looking at protein function,” explains Rob. “Five years ago, the same kind of search would take hours to perform – now you can search more interactively, and the iterative search lets you start with a single sequence and pull in hundreds of thousands of related sequences. That takes you straight to these very distant relationships, which let you infer function and evolutionary history of a protein. So in my own research, I can now follow my train of thought without being interrupted by annoying holding pages saying that my search is running.”

The HMMER website lets users filter the input and output sequences by taxonomy or domain architecture, making it easier to interpret results. It also provides an intuitive visual interface that lets users navigate seamlessly between analyses just by clicking icons or figure elements such as histogram bars, table entries and taxonomy trees.

“Here at Janelia we’re focused on the probability theory and computer engineering that power HMMER’s algorithms,” says Sean. “What Rob and his team have done with the new EMBL-EBI website is to make HMMER far more useful and accessible to biologists. The new website is super fast, but even more than that, it incorporates a lot of creative data visualisation and interactive tools, based on Rob’s long experience with protein sequence analysis. Rob’s work takes the HMMER project to a new level.”

HMMER is both fast and extremely sensitive, detecting distant relationships and identifying fragments of sequences. The deep homologous relationships HMMER can find enable many aspects of computational biology research, from characterising metagenomic communities to understanding the evolution of development.

What’s next? The HMMER web tools are already used as a curation platform for Pfam, and can be used to add expert community knowledge to public data resources. This could expand the number of high-quality, annotated protein family entries freely available to researchers worldwide.

The next step for the collaborators is to extend the software to accommodate DNA searches, which involves far larger datasets.

Source articles

Finn RD, Clements J, Arndt W, et al. (2015) HMMER web server: 2015 update. Nucleic Acids Res. DOI: 10.1093/nar/gkv397

Wheeler TJ and Eddy SR (2013) nhmmer: DNA homology search with profile HMMs. Bioinformatics 29:2487-2489. DOI: 10.1093/bioinformatics/btt403

Eddy SR (2011) Accelerated profile HMM searches. PLoS Comput Biol. 7:e1002195. DOI: 10.1371/journal.pcbi.1002195

Johnson LS, Eddy SR and Portugaly E (2010) Hidden Markov model speed heuristic and iterative HMM search procedure. BMC Bioinform. 11:431. DOI: 10.1186/1471-2105-11-431

Eddy SR (2009) A new generation of homology search tools based on probabilistic inference. Genome Inform. 23:205-211
