Similar Proteins tab

The Similar Proteins tab lets you explore proteins with similar 3D structures. This matters because sequence-based searches (e.g. BLAST), while powerful, often miss distantly related proteins whose amino acid sequences have diverged significantly over evolutionary time.

The tab is divided into two sections, each with its own table:

  1. Similar structures
  2. Structure similarity cluster

1. Similar structures

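This section uses Foldseek, a fast and accurate algorithm that compares protein shapes directly. Foldseek simplifies each 3D structure into a linear, one-dimensional string written in its ‘3Di alphabet’, in which every letter describes the local structural environment of one residue. Structures can then be compared with the same fast alignment techniques used for sequences.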

The search results are organised into tabs, one for each dataset searched:

  • PDB structures: Experimentally determined structures from the Protein Data Bank.
  • AFDB50 predicted structures: Predicted structures from the AFDB, clustered at 50% sequence identity to reduce redundancy.
Foldseek structural similarity results. The table lists structurally similar proteins found in the PDB. Users can select a result to superimpose its structure onto the query protein in the 3D viewer for direct comparison.

Each row in the results table includes a pairwise sequence alignment, residue range, E-value (statistical significance), sequence identity, and quality metrics like resolution (for PDB structures) and average pLDDT (for predicted structures). You can select and superimpose structures in the 3D viewer to compare their structural alignment.

The alignment is accompanied by an RMSD (Root-Mean-Square Deviation) value, which measures the typical distance between corresponding atoms in the superimposed structures. Lower values indicate a closer structural match.
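
As a rough illustration of what an RMSD value summarises, the sketch below computes it for two sets of coordinates that are assumed to be already superimposed and paired atom by atom. The function name and the toy coordinates are made up for this example; it does not perform the superposition itself and is not the code used by the viewer.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two paired lists of (x, y, z) points.

    Assumes the structures are already superimposed and that the i-th point
    in coords_a corresponds to the i-th point in coords_b.
    """
    if not coords_a or len(coords_a) != len(coords_b):
        raise ValueError("coordinate lists must be non-empty and of equal length")
    # Sum of squared distances between each pair of corresponding atoms.
    squared_sum = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared_sum / len(coords_a))


# Toy coordinates (in angstroms), invented for illustration only.
query = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
target = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0), (2.0, 0.0, 1.0)]

print(rmsd(query, target))  # every atom is displaced by 1 angstrom
```

Because the deviations are squared before averaging, a few badly matched atoms raise the RMSD more than many small mismatches do.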

Download Options

You can download the search results, individual data files, and the aligned 3D models from this interface.

  • Results Data: From the main results table, you can export several types of data for offline analysis:
    • Tabular data: Download up to 10,000 rows of the similarity search results.
    • Structure coordinates: Download the original coordinate files for individual entries (in PDB or mmCIF format).
    • Validation data: For structures from the PDB, you can download the validation data as an XML file.
    • Predicted Aligned Error: For predicted structures, you can also download the Predicted Aligned Error (PAE) data as a JSON file.
  • Aligned Structures: To download the superimposed 3D models that you have aligned in the viewer, follow these steps:
    • First, select the desired entries in the results table.
    • Toggle on the “Align in 3D” option.
    • Once the structures are aligned, you can download their superimposed coordinates as individual mmCIF files.
Foldseek structural similarity results. The “Similar Proteins” view after running a Foldseek search. Several structures have been selected using the “Align in 3D” toggle and are shown superimposed in the viewer. The “Download data” menu is also open, showing the option to export the tabular search results.
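
Once a PAE file has been downloaded, it can be inspected with a few lines of code. The sketch below is a minimal example that assumes the JSON holds a square matrix under a key named "predicted_aligned_error", optionally wrapped in a one-element list. That layout, the helper name summarise_pae, and the tiny inline example are assumptions made for illustration, so check them against your own file before relying on them.

```python
import json


def summarise_pae(pae_json_text):
    """Return basic statistics for a PAE matrix stored as JSON text.

    Assumed layout (verify against your file): either a dict, or a
    one-element list containing a dict, with the matrix stored under
    the key "predicted_aligned_error".
    """
    data = json.loads(pae_json_text)
    record = data[0] if isinstance(data, list) else data
    matrix = record["predicted_aligned_error"]
    values = [value for row in matrix for value in row]
    return {
        "residues": len(matrix),
        "mean_pae": sum(values) / len(values),
        "max_pae": max(values),
    }


# Tiny made-up 2x2 matrix standing in for a real download.
example = '[{"predicted_aligned_error": [[0, 4], [6, 0]]}]'
summary = summarise_pae(example)
print(summary)
```

For a real file, read its contents with open(path).read() and pass the text to the same function.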

2. Structure similarity cluster

The AlphaFold Database also incorporates information about structurally similar proteins by grouping them into clusters. This feature is based on a two-phase clustering process:

  • Sequence-based clustering (AFDB50/MMseqs2): The initial step groups the 214 million UniProtKB protein sequences in the AFDB using MMseqs2.
  • Structure-based clustering (AFDB/Foldseek): The representative proteins from the first step are then further clustered using Foldseek.
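
The two-phase idea above can be sketched with a toy example. The code below uses a simple greedy scheme as a stand-in; it is not the algorithm implemented by MMseqs2 or Foldseek, and the protein names, families, folds, and similarity tests are all hypothetical.

```python
def greedy_cluster(items, similar):
    """Assign each item to the first representative it resembles,
    or make it the representative of a new cluster."""
    clusters = {}
    for item in items:
        for representative in clusters:
            if similar(representative, item):
                clusters[representative].append(item)
                break
        else:
            clusters[item] = [item]
    return clusters


def two_phase_cluster(items, sequence_similar, structure_similar):
    """Cluster by sequence first, then cluster the representatives by structure."""
    by_sequence = greedy_cluster(items, sequence_similar)
    by_structure = greedy_cluster(list(by_sequence), structure_similar)
    # Expand each structural cluster back to all of its underlying members.
    return {
        representative: [
            member
            for sequence_rep in sequence_reps
            for member in by_sequence[sequence_rep]
        ]
        for representative, sequence_reps in by_structure.items()
    }


# Hypothetical proteins labelled with a made-up (sequence family, fold) pair.
proteins = {
    "P1": ("famA", "foldX"),
    "P2": ("famA", "foldX"),
    "P3": ("famB", "foldX"),
    "P4": ("famC", "foldY"),
}


def same_family(a, b):
    return proteins[a][0] == proteins[b][0]


def same_fold(a, b):
    return proteins[a][1] == proteins[b][1]


clusters = two_phase_cluster(list(proteins), same_family, same_fold)
print(clusters)
```

In this toy data, P3 shares no sequence family with P1 or P2, yet it joins their cluster in the second phase because it has the same fold. This is the kind of distant relationship that structure-based clustering is meant to capture.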

On each protein’s page, you’ll find a table listing the other members of its cluster. The table links to the pages of those proteins, so you can quickly compare their structures and explore potential functional relationships. Exploring cluster members can reveal evolutionary relationships, functional similarities, and unexpected connections between proteins.

Structure similarity cluster members. This table lists other proteins that belong to the same structural cluster, enabling easy comparison and exploration of potentially related proteins.