UniRef

The UniProt Reference Clusters (UniRef) are three databases of clustered sequences drawn from UniProtKB and selected UniParc records. Clustering hides redundant sequences while still covering the full sequence space, and the three resolutions allow faster similarity searches:

UniRef100 combines identical sequences and sub-fragments with 11 or more residues from any organism into a single UniRef entry.

UniRef90 is built by clustering UniRef100 sequences such that each cluster is composed of sequences that have at least 90% sequence identity to, and 80% overlap with, the longest sequence in the cluster (the seed sequence).

UniRef50 is built by clustering UniRef90 seed sequences that have at least 50% sequence identity to, and 80% overlap with, the longest sequence in the cluster.
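The UniRef90 and UniRef50 membership criteria above can be sketched as a simple threshold check. This is a minimal illustration, not the actual UniRef pipeline: the `Alignment` record and the `meets_threshold` function are hypothetical names, and the sketch assumes that identity is measured over the aligned positions and that overlap is the fraction of the seed sequence covered by the alignment.

```python
from dataclasses import dataclass

# Identity thresholds taken from the descriptions above.
UNIREF90_IDENTITY = 0.90
UNIREF50_IDENTITY = 0.50
# Overlap threshold shared by UniRef90 and UniRef50.
MIN_OVERLAP = 0.80


@dataclass(frozen=True)
class Alignment:
    """Hypothetical summary of a pairwise alignment of a candidate to a seed."""
    identical_residues: int  # aligned positions where both residues match
    aligned_length: int      # number of aligned positions
    seed_length: int         # length of the seed (longest) sequence


def meets_threshold(aln: Alignment, min_identity: float,
                    min_overlap: float = MIN_OVERLAP) -> bool:
    """Return True if the candidate satisfies both the identity and overlap cutoffs.

    Assumption: identity = identical / aligned positions,
                overlap  = aligned positions / seed length.
    """
    if aln.aligned_length <= 0 or aln.seed_length <= 0:
        return False
    identity = aln.identical_residues / aln.aligned_length
    overlap = aln.aligned_length / aln.seed_length
    return identity >= min_identity and overlap >= min_overlap


if __name__ == "__main__":
    close_homolog = Alignment(identical_residues=92, aligned_length=100, seed_length=110)
    distant_homolog = Alignment(identical_residues=60, aligned_length=100, seed_length=110)
    short_fragment = Alignment(identical_residues=95, aligned_length=100, seed_length=200)

    for name, aln in [("close", close_homolog),
                      ("distant", distant_homolog),
                      ("fragment", short_fragment)]:
        print(name,
              "UniRef90:", meets_threshold(aln, UNIREF90_IDENTITY),
              "UniRef50:", meets_threshold(aln, UNIREF50_IDENTITY))
```

In this sketch a sequence that is highly similar to the seed but covers only half of it fails both levels, which is the purpose of the overlap requirement: it keeps short fragments from being grouped with much longer proteins.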
