
UniRef at the EBI


UniRef is the UniProt NREF (UniProt Reference Clusters) database.

The two major objectives of UniRef are:

(i) to facilitate sequence merging in UniProt, and
(ii) to allow faster and more informative sequence similarity searches.
Although the UniProt Knowledgebase is much less redundant than UniParc, it still contains a certain level of redundancy, because fully automatic merging would risk combining similar sequences that belong to different proteins. However, such automatic procedures are extremely useful in compiling the UniRef databases, which obtain complete coverage of sequence space while hiding redundant sequences (but not their descriptions) from view.

A high level of redundancy results in several problems, including slow database searches and long lists of similar or identical alignments that can obscure novel matches in the output. Thus, a more even sampling of sequence space is advantageous. This can be achieved by clustering closely similar sequences to yield a representative subset of sequences. Therefore, we have created several non-redundant databases with different sequence identity cut-offs. In the UniRef90 and UniRef50 databases, no pair of sequences in the representative set has more than 90% or more than 50% mutual sequence identity, respectively. The UniRef100 database presents identical sequences and sub-fragments as a single entry with protein IDs, sequences, bibliography, and links to protein databases.
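The idea of reducing a sequence set to representatives at a given identity cut-off can be illustrated with a minimal sketch. This is not the procedure used to build UniRef, which is not described here; it is a simple greedy clustering written for illustration only. The function names `identity` and `cluster`, the example sequences, and in particular the toy identity measure (an ungapped, left-aligned position-by-position comparison rather than a real alignment) are all assumptions.

```python
from typing import Dict, List


def identity(a: str, b: str) -> float:
    """Toy identity measure (assumption): fraction of matching positions in an
    ungapped, left-aligned comparison, relative to the shorter sequence.
    A real pipeline would compute identity from a sequence alignment."""
    shorter = min(len(a), len(b))
    if shorter == 0:
        return 0.0
    matches = sum(x == y for x, y in zip(a, b))
    return matches / shorter


def cluster(seqs: Dict[str, str], cutoff: float) -> Dict[str, List[str]]:
    """Greedy clustering: map each representative ID to its member IDs.

    Sequences are visited longest first. A sequence joins the first
    representative it matches with identity above the cut-off; otherwise it
    becomes a new representative. No two representatives therefore exceed
    the cut-off with each other under this identity measure."""
    clusters: Dict[str, List[str]] = {}
    for seq_id in sorted(seqs, key=lambda k: (-len(seqs[k]), k)):
        for rep in clusters:
            if identity(seqs[seq_id], seqs[rep]) > cutoff:
                clusters[rep].append(seq_id)
                break
        else:
            clusters[seq_id] = [seq_id]
    return clusters


# Hypothetical example sequences, 20 residues each.
EXAMPLE = {
    "A": "MKTAYIAKQRQISFVKSHFS",
    "B": "MKTAYIAKQRQISFVKSHFT",  # 19/20 positions identical to A
    "C": "MKTAYIAKQRQIGGGGGGGG",  # 12/20 positions identical to A
    "D": "WWWWWWWWWWWWWWWWWWWW",  # nothing in common with the others
}

if __name__ == "__main__":
    print(cluster(EXAMPLE, 0.9))
    print(cluster(EXAMPLE, 0.5))
```

With the 0.9 cut-off only B is absorbed into A's cluster, while the 0.5 cut-off also absorbs C. This mirrors how a lower identity cut-off yields fewer, broader clusters and a smaller representative set.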

Accessing the database


Query the database


UniRef100      UniRef90       UniRef50       Explanation
UniRef Search  UniRef Search  UniRef Search  Database queries
SRS            SRS            SRS            Database queries
dbfetch        dbfetch        dbfetch        Entry and sequence retrieval
Wu-BLAST2      Wu-BLAST2      Wu-BLAST2      Sequence similarity searches
NCBI-BLAST2    NCBI-BLAST2    NCBI-BLAST2    Sequence similarity searches
SSEARCH        SSEARCH        SSEARCH        Sequence similarity searches
FASTA          FASTA          FASTA          Sequence similarity searches

FTP server
