Bulk downloads (FTP)
The File Transfer Protocol (FTP) is a well-established, reliable protocol for transferring large amounts of data. The FTP server is the best option for downloading large datasets, such as all the predicted structures for a particular organism. This method does not require coding experience, and it lets you access previous versions of the database. The FTP area hosts the proteomes of 48 organisms as TAR files, each containing compressed PDB and mmCIF files.
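Once a proteome archive has been downloaded, it can be useful to check its contents before unpacking it. The following is a minimal sketch, not an official tool: the function name count_formats and the file extensions it checks (.pdb, .cif, with an optional .gz suffix) are assumptions made for illustration, so adjust them to match the names you actually see in the archive.

```python
import tarfile
from collections import Counter


def count_formats(tar_path):
    """Count archive members by structure format without extracting them.

    Assumption: PDB members end in .pdb or .pdb.gz and mmCIF members
    end in .cif or .cif.gz. Anything else is counted as "other".
    """
    counts = Counter()
    # Mode "r" lets tarfile detect any archive-level compression itself.
    with tarfile.open(tar_path, "r") as archive:
        for member in archive:
            if not member.isfile():
                continue  # skip directories and links
            name = member.name.lower()
            if name.endswith((".pdb", ".pdb.gz")):
                counts["pdb"] += 1
            elif name.endswith((".cif", ".cif.gz")):
                counts["mmcif"] += 1
            else:
                counts["other"] += 1
    return counts
```

Calling count_formats on a local archive path (for example a hypothetical "proteome.tar") returns a Counter with the keys "pdb", "mmcif" and "other".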
Note that the FTP site is the only way to access predictions for very large human proteins (over 2,700 amino acids). These are provided as overlapping fragments of 1,400 amino acids each, named with an -F suffix followed by the fragment number (e.g., Q8WZ42-F1, Q8WZ42-F2).
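When working with fragmented predictions, it helps to group the fragment identifiers by protein. The sketch below assumes identifiers follow exactly the pattern shown above (an accession, then -F, then a number); the helper name group_fragments and the regular expression are illustrative and not part of any official API. Real file names may carry extra prefixes or suffixes that would need to be stripped first.

```python
import re
from collections import defaultdict

# Assumed pattern: <accession>-F<number>, e.g. Q8WZ42-F1.
FRAGMENT_RE = re.compile(r"^(?P<accession>[A-Za-z0-9]+)-F(?P<index>\d+)$")


def group_fragments(names):
    """Map each accession to the sorted list of its fragment numbers.

    Names that do not match the assumed pattern are ignored.
    """
    groups = defaultdict(list)
    for name in names:
        match = FRAGMENT_RE.match(name)
        if match is None:
            continue
        groups[match["accession"]].append(int(match["index"]))
    return {acc: sorted(indices) for acc, indices in groups.items()}
```

For example, group_fragments(["Q8WZ42-F2", "Q8WZ42-F1"]) returns {"Q8WZ42": [1, 2]}.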
Note also that Predicted Aligned Error (PAE) data and Multiple Sequence Alignments (MSAs) are not included in FTP downloads.
To simplify navigating the FTP server, we have created a Google Colab notebook. This tool helps you browse the FTP directory and download only the specific PDB or mmCIF files you need without extracting entire proteome archives.
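If you have already downloaded a whole proteome archive, you can still pull out only the files you need locally. The sketch below is not how the Colab notebook works; it is a separate illustration that assumes the members are gzip-compressed (names ending in .gz) and that each member's file name contains the protein accession. The function name extract_matching and its parameters are hypothetical.

```python
import gzip
import tarfile
from pathlib import Path


def extract_matching(tar_path, out_dir, wanted, suffix=".cif.gz"):
    """Extract only members whose file name contains a wanted accession.

    Assumptions: members are regular files ending in `suffix`, and
    a trailing .gz means the member is gzip-compressed.
    Returns the list of paths written.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    with tarfile.open(tar_path, "r") as archive:
        for member in archive:
            # Use only the base name so files cannot land outside out_dir.
            base = Path(member.name).name
            if not (member.isfile() and base.endswith(suffix)):
                continue
            if not any(token in base for token in wanted):
                continue
            data = archive.extractfile(member).read()
            if base.endswith(".gz"):
                data = gzip.decompress(data)
                base = base[: -len(".gz")]
            target = out_dir / base
            target.write_bytes(data)
            written.append(target)
    return written
```

For example, extract_matching("proteome.tar", "structures", ["Q8WZ42"]) would write the decompressed mmCIF files for that accession into a "structures" folder; pass suffix=".pdb.gz" to select PDB files instead. Both paths here are placeholders.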