Rob Finn

Team Leader, Sequence Families

Dr Rob Finn leads EMBL-EBI’s Sequence Families team, which is responsible for the InterPro, Pfam, Rfam and RNAcentral data resources. The team also looks after EMBL-EBI’s fast-growing Metagenomics data service. In addition, Rob runs a small research group that probes the functions of microbial 'dark matter'.

Rob joined EMBL-EBI from the Janelia Research Campus in the US, where he led a group that designed fast, web-based, interactive protein-sequence searches and annotations. Between 2001 and 2010, he was the project leader for Pfam at the Wellcome Trust Sanger Institute in the UK. Rob’s academic background is in microbiology and he holds a PhD in biochemistry from Imperial College, London.

ORCID iD: 0000-0001-8626-2148

Tel: +44 (0) 1223 492 679

Finn team

The Sequence Families team is responsible for InterPro, Pfam, Rfam and RNAcentral. These services form the backbone of analysis capabilities in EBI Metagenomics.

Like all EMBL-EBI services, these data resources are freely available to all.

About our data services

Our services use complex mathematical models tailored for life-science research. For example, the HMMER3 algorithm offers fast detection of distantly related proteins and is available through our HMMER website infrastructure. We aim to simplify access to curated, complex data, and to maximise biological knowledge by extending annotation based on sequence similarity.

A unified view of sequence families

InterPro integrates protein-family data from 11 major sources, classifying the different protein family, domain and functional site definitions hierarchically to provide a unified view of diverse data. Pfam, a member database of InterPro, generates new protein family entries and has the largest sequence coverage of any of the InterPro member databases. Both InterPro and Pfam have a number of important applications, including the automatic annotation of proteins for UniProtKB/TrEMBL and genome annotation projects.

Rfam classifies non-coding RNA sequences into families using probabilistic models known as covariance models (CMs), which take into account both sequence and secondary-structure information. Rfam is uniquely placed to annotate non-coding RNAs in genome projects and is a major contributing database to RNAcentral, a sequence resource launched in 2014.

InterProScan allows users to identify InterPro entries on protein sequences, and is available as either a web service or a downloadable software package. 

Understanding environmental samples

Our Metagenomics data service enables researchers to submit sequence data and associated descriptive metadata about environmental samples to public nucleotide archives. Once the data are deposited, our team helps ensure that they are functionally analysed (using an InterPro-based pipeline), taxonomically analysed and visualised via a web interface.

This graphic shows the workflow for our metagenomics analysis.


We participate in EMBL-EBI's Training Programme, offering courses in metagenomics and other approaches to sequence analysis.


We welcome new collaborations in all areas, and are particularly interested in working with people who have developed new tools for analysis, or who are working with metagenomics datasets generated with new sequencing technology.


Combined experimental/computational fellowships: ESPODs

Our team is recruiting postdoctoral fellows through the joint EMBL-EBI–Wellcome Trust Sanger Institute ESPOD programme. ESPOD fellowships are for researchers who take both an experimental and computational approach to their work.



Team members

  • Alexandre Almeida
  • Joanna Argasinska
  • Matthias Blum
  • Miguel Boland
  • Boris Burkov
  • Hsin Yu Chang
  • Amy Cottage
  • Hubert Denise
  • Sara El-Gebali
  • Matthew Fraser
  • David Jakubec
  • Ioanna Kalvari
  • Jaina Mistry
  • Alex Mitchell
  • Gift Nuka
  • Typhaine Paysan-Lafosse
  • Anton Petrov
  • Simon Potter
  • Matloob Qureshi
  • Lorna Richardson
  • Gustavo Salazar-Orejuela
  • Amaia Sangrador
  • Maxim Scheremetjew
  • Blake Sweeney