
Course summary

Pfam

Pfam is a protein sequence database used to analyse novel protein sequences and predict their domain architecture, which helps to infer a protein's possible function and its relationship to distant homologues.

Pfam can be searched in several ways: by UniProt ID/accession, Pfam ID/accession, PDB ID, keyword, or protein/DNA sequence in FASTA format. All data is freely available and can be downloaded.
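As a rough illustration of how these query types differ in form, the sketch below guesses the type of a search string from its format. The function name `classify_query` and the returned labels are hypothetical and are not part of Pfam. The patterns assume that Pfam accessions are "PF" followed by five digits, that PDB IDs are four characters starting with a digit, and that UniProt accessions follow the published UniProt accession format.

```python
import re

# Format patterns for the identifier types mentioned above.
# classify_query is a hypothetical helper, not a Pfam function.
PFAM_ACCESSION = re.compile(r"PF\d{5}")
PDB_ID = re.compile(r"[0-9][A-Za-z0-9]{3}")
UNIPROT_ACCESSION = re.compile(
    r"[OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2}"
)


def classify_query(query: str) -> str:
    """Guess the query type from its format alone (no database lookup)."""
    text = query.strip()
    if text.startswith(">"):
        # A FASTA record starts with a header line beginning with '>'.
        return "fasta"
    if PFAM_ACCESSION.fullmatch(text):
        return "pfam_accession"
    if UNIPROT_ACCESSION.fullmatch(text):
        return "uniprot_accession"
    if PDB_ID.fullmatch(text):
        return "pdb_id"
    # Anything else is treated as a free-text keyword.
    return "keyword"


if __name__ == "__main__":
    for example in ["PF00069", "P12345", "1ABC", "kinase"]:
        print(example, "->", classify_query(example))
```

A format match only suggests the identifier type; whether the record actually exists still has to be checked by running the search.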

Pfam includes structural information from experimentally solved structures (from PDB) and from structure predictions (AlphaFold and trRosetta models). This is of great help because far more protein sequences are available than experimentally determined structures. AlphaFold models also make it possible to visualise the full-length structure, which is particularly important for understanding how domains are organised in space and how this relates to protein function.

InterPro

InterPro is a classification resource for protein families, domains and functional sites, which integrates the following protein signature databases: CDD, SFLD, PROSITE, PRINTS, Pfam, SMART, TIGRFAMs, PIRSF, SUPERFAMILY, Gene3D and PANTHER. By uniting the member databases, InterPro capitalises on their individual strengths, producing a powerful diagnostic tool and integrated resource.

InterPro can help if you have a sequence or set of sequences and want to know the protein family to which they belong, the domains they contain and/or what their function is.

The InterPro website offers different types of search (sequence, text, domain architecture), as well as a Browse feature that allows you to filter the data. It is supported by an API with six main data types: entry, protein, structure, taxonomy, proteome and set. All the data is freely available and can be downloaded in different formats for further analysis.
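The six data types can be pictured as the first segment of an API address. The sketch below only builds such addresses and does not contact the server. The base address `https://www.ebi.ac.uk/interpro/api`, the path layout (data type, then source database, then accession) and the helper name `build_url` are assumptions made for illustration, so check the InterPro API documentation before relying on them.

```python
from urllib.parse import quote

# Assumed base address of the InterPro API; verify against the official docs.
BASE_URL = "https://www.ebi.ac.uk/interpro/api"

# The six main data types named in the course text.
DATA_TYPES = ("entry", "protein", "structure", "taxonomy", "proteome", "set")


def build_url(data_type: str, *path_parts: str) -> str:
    """Build an API address for one data type (hypothetical helper).

    path_parts are appended in order, e.g. a source database and an accession.
    """
    if data_type not in DATA_TYPES:
        raise ValueError(f"unknown data type: {data_type!r}")
    segments = [data_type, *path_parts]
    # Percent-encode each segment so unexpected characters cannot alter the path.
    return BASE_URL + "/" + "/".join(quote(str(s), safe="") for s in segments) + "/"


if __name__ == "__main__":
    # Assumed layout: /entry/<member database>/<accession>/
    print(build_url("entry", "pfam", "PF00069"))
    # Assumed layout: /protein/<source database>/<accession>/
    print(build_url("protein", "uniprot", "P12345"))
```

The resulting address could then be fetched with any HTTP client. Building and validating the path separately keeps a typo in the data type from turning into a failed request.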

You can now explore the Pfam and InterPro resources with your favourite structures or proteins.