Using alignments to improve Rfam families: a case of curating non-coding RNA families
Before starting to work on this mini-project you will need to read this paper: Non-coding RNA analysis using the Rfam database https://doi.org/10.1002/cpbi.51
It would also help to work through this short, self-paced online tutorial about Rfam: https://www.ebi.ac.uk/training/online/courses/rfam-quick-tour/
Scenario
Rfam is a database of non-coding RNA (ncRNA) families. Each family is represented by a Multiple Sequence Alignment (MSA), a Secondary Structure (SS) and a Covariance Model (CM) [1]. Additionally, Rfam families contain metadata, including references to literature, ontology terms, and descriptions of the reported function of the ncRNA. Furthermore, each family is linked to a Wikipedia page. These pages can be edited by any user, provide wide-reaching cross references, and can be reviewed by the public. Every one of these elements can be improved to produce a more accurate RNA family.
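To make the alignment and secondary structure components more concrete, here is a minimal Python sketch of how they can sit together in a Stockholm-format file, with the consensus structure on a `#=GC SS_cons` line. The toy alignment, its sequence names and the helper functions `parse_stockholm` and `base_pairs` are invented for illustration and are not part of any Rfam tool. The sketch assumes a single-block alignment and handles only `<` and `>` pairs, whereas real Rfam structures can use further bracket types.

```python
from typing import Dict, List, Tuple

# Toy Stockholm-format alignment. Names and residues are invented for illustration.
TOY_STOCKHOLM = """\
# STOCKHOLM 1.0
#=GF ID   toy_hairpin
seq1/1-18         GGCAUCGAAAGAUGCCUU
seq2/1-18         GGCGUCGAAAGACGCCUU
#=GC SS_cons      <<<<<<....>>>>>>..
//
"""


def parse_stockholm(text: str) -> Tuple[Dict[str, str], str]:
    """Return ({sequence name: aligned sequence}, consensus structure string)."""
    sequences: Dict[str, str] = {}
    ss_cons = ""
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line == "//":
            continue
        if line.startswith("#=GC SS_cons"):
            # Third whitespace-separated field holds the structure string.
            ss_cons += line.split()[2]
        elif line.startswith("#"):
            # Header and other annotation lines are ignored in this sketch.
            continue
        else:
            name, aligned = line.split()
            sequences[name] = sequences.get(name, "") + aligned
    return sequences, ss_cons


def base_pairs(ss_cons: str) -> List[Tuple[int, int]]:
    """Return 0-based (left, right) column pairs for '<' and '>' brackets only."""
    stack: List[int] = []
    pairs: List[Tuple[int, int]] = []
    for column, symbol in enumerate(ss_cons):
        if symbol == "<":
            stack.append(column)
        elif symbol == ">":
            pairs.append((stack.pop(), column))
    return pairs


sequences, ss_cons = parse_stockholm(TOY_STOCKHOLM)
pairs = base_pairs(ss_cons)
for left, right in pairs:
    # Show which residues each sequence places in the paired columns.
    print(left, right, [seq[left] + seq[right] for seq in sequences.values()])
```

In the toy data, columns 3 and 12 (0-based) hold an A-U pair in one sequence and a G-C pair in the other. This kind of compensatory change is what supports a consensus base pair when you inspect an alignment.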
During this mini-project you will examine and improve miRNA families.
miRNAs are a type of ncRNA, found in plants and animals, that regulate gene expression post-transcriptionally. Rfam is currently synchronising its miRNA collection with miRBase, the authoritative community resource for miRNA biology [2].
Project aims
- You will evaluate data from both the Rfam and miRBase databases to improve Rfam alignments, reviewing the phylogenetic distribution of each miRNA, signature sequences such as the mature sequence, and comparisons between the previous and newly constructed CMs.
- You will also update the Rfam metadata, which includes miRBase references, and add short descriptions for the families.
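One check implied by the first aim is whether a mature miRNA sequence can be found inside a sequence taken from an alignment. The following is a minimal Python sketch of that idea, not the procedure used by Rfam or miRBase. The function name `find_mature` and the toy sequences are hypothetical (real mature miRNAs are around 22 nucleotides long), and it assumes `-` and `.` are the gap characters and that an exact match is required.

```python
from typing import Optional, Tuple


def find_mature(aligned: str, mature: str) -> Optional[Tuple[int, int]]:
    """Return 1-based (start, end) of `mature` in the ungapped sequence, or None."""
    # Remove alignment gaps and normalise to upper-case RNA letters.
    ungapped = aligned.replace("-", "").replace(".", "").upper().replace("T", "U")
    query = mature.upper().replace("T", "U")
    start = ungapped.find(query)
    if start == -1:
        return None
    return start + 1, start + len(query)


# Toy example: a gapped row from an alignment and a short stand-in "mature" sequence.
toy_row = "GG-CAUCGAAAGAUGCC.UU"
print(find_mature(toy_row, "CAUCGA"))
print(find_mature(toy_row, "AAAAAAAA"))
```

A `None` result for a sequence would be a prompt to look at that row more closely, for example to check whether it differs from the mature sequence by a mismatch or does not belong in the family.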
Useful references
[1] Rfam 14: expanded coverage of metagenomic, viral and microRNA families https://doi.org/10.1093/nar/gkaa1047
[2] miRBase: from microRNA sequences to function https://doi.org/10.1093/nar/gky1141
[3] Non-coding RNA analysis using the Rfam database https://doi.org/10.1002/cpbi.51
[4] Rfam 13.0: shifting to a genome-centric resource for non-coding RNA families https://doi.org/10.1093/nar/gkx1038
There is also a webinar about Rfam taking place on 17th May. It is at the same time as the Practical Biocuration course; however, the recording will be made available shortly after the webinar and may be of interest:
https://www.ebi.ac.uk/training/events/annotating-genomes-non-coding-rnas-using-rfam-and-infernal/.