Understanding data in RNAcentral
RNAcentral identifiers
In RNAcentral, each distinct ncRNA sequence is assigned a Unique RNA Sequence identifier (URS ID), which is stable across releases. The same sequence can be observed in several species. To distinguish between these occurrences, RNAcentral provides species-specific identifiers, which consist of the URS ID joined by an underscore with the NCBI taxid of the species in which the sequence occurs.
Example:
URS00005EB5B7_9606 refers to the human hsa-let-7a-1 microRNA
URS00005EB5B7_9913 refers to the same sequence in cow
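The structure of these identifiers can be illustrated with a short Python sketch. The function names are hypothetical, and the pattern ("URS" followed by 10 uppercase hexadecimal characters, then an optional underscore and numeric taxid) is an assumption inferred from the examples above rather than an official specification.

```python
import re

# Assumed pattern, inferred from the examples in this section:
# "URS" + 10 uppercase hex characters, optionally followed by "_<NCBI taxid>".
_ID_PATTERN = re.compile(r"^(URS[0-9A-F]{10})(?:_(\d+))?$")


def split_rnacentral_id(identifier):
    """Split an identifier into (URS ID, taxid); taxid is None if absent."""
    match = _ID_PATTERN.match(identifier.strip())
    if match is None:
        raise ValueError(f"Not a recognised RNAcentral identifier: {identifier!r}")
    urs_id, taxid = match.groups()
    return urs_id, (int(taxid) if taxid is not None else None)


def species_specific_id(urs_id, taxid):
    """Join a URS ID and an NCBI taxid into a species-specific identifier."""
    return f"{urs_id}_{int(taxid)}"


if __name__ == "__main__":
    print(split_rnacentral_id("URS00005EB5B7_9606"))
    print(species_specific_id("URS00005EB5B7", 9913))
```

Splitting both example identifiers yields the same URS ID with different taxids (9606 for human, 9913 for cow), which reflects that they describe one sequence observed in two species.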
A URS ID can be marked inactive if it has no current cross references. For example, at the time of writing URS0000631465 is inactive. This tRNA sequence was present in Rfam version 13.0, but later Rfam versions did not include it. Since there are no other cross references to this sequence, it is marked as inactive.
URS IDs are never deleted and can always be accessed on the RNAcentral website using direct URLs in the following format: https://rnacentral.org/rna/<URS ID>. However, inactive sequences are removed from the sequence similarity and text search results, and from all FTP files except rnacentral_inactive.fasta.gz.
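As a minimal sketch, the direct URL format described above can be built in Python as follows. The function name rna_url is hypothetical, and the code only constructs the URL string; it does not contact the website or check whether the sequence is active.

```python
from urllib.parse import quote

RNA_URL_PREFIX = "https://rnacentral.org/rna/"  # URL format given in the text above


def rna_url(urs_id):
    """Return the direct RNAcentral URL for a URS ID (active or inactive)."""
    urs_id = urs_id.strip()
    if not urs_id:
        raise ValueError("URS ID must not be empty")
    # quote() percent-encodes any unexpected characters; a plain URS ID is unchanged.
    return RNA_URL_PREFIX + quote(urs_id, safe="")


if __name__ == "__main__":
    # URS0000631465 is the inactive example mentioned in the text.
    print(rna_url("URS0000631465"))
```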
Sequence naming and RNA type
RNAcentral provides descriptions and RNA type for all sequences. These annotations are essential to understanding the function of any RNA sequence, but there are some important factors to consider.
First, these annotations are computed automatically from the descriptions and RNA types provided by the member databases. No description or RNA type is assigned manually in RNAcentral, although sequences from certain member databases, such as GENCODE or HGNC, may be manually curated. Additionally, member databases may disagree on an annotation. In such cases, RNAcentral strives to pick the annotations that are most consistent with the available data.
Gene entries
Since release v26, RNAcentral has had gene-level entries. Each gene is assigned a unique RNAcentral gene identifier (RNACG ID), and the genes view returns key properties such as length, function, location and a list of related transcripts. Gene entries can be accessed on the RNAcentral website using direct URLs in the following format: https://rnacentral.org/genes/<RNACG ID>, for example: https://rnacentral.org/genes/RNACG10416967208.5
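The gene URL format can be sketched in Python in the same way. The function name gene_url is hypothetical, and the check that the identifier starts with "RNACG" is an assumption based on the example above, not a documented validation rule. The identifier is passed through exactly as given.

```python
from urllib.parse import quote

GENE_URL_PREFIX = "https://rnacentral.org/genes/"  # URL format given in the text above


def gene_url(rnacg_id):
    """Return the direct RNAcentral URL for a gene identifier (RNACG ID)."""
    rnacg_id = rnacg_id.strip()
    # Assumption: gene identifiers begin with "RNACG", as in the example above.
    if not rnacg_id.startswith("RNACG"):
        raise ValueError(f"Expected an identifier starting with 'RNACG': {rnacg_id!r}")
    return GENE_URL_PREFIX + quote(rnacg_id, safe="")


if __name__ == "__main__":
    print(gene_url("RNACG10416967208.5"))
```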