Quick access

The European Bioinformatics Institute FTP site provides access to data from SIFTS.

Data for an individual PDB entry can be found at a path like this:

ftp://ftp.ebi.ac.uk/pub/databases/msd/sifts/xml/1xyz.xml.gz - where 1xyz is the PDB code.

or at a path like this:

ftp://ftp.ebi.ac.uk/pub/databases/msd/sifts/split_xml/xy/1xyz.xml.gz - where 'xy' is the second and third characters of the PDB code and 1xyz is the PDB code itself.
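As an illustration of the two path patterns above, the following minimal Python sketch builds both URLs from a PDB code. The function name `sifts_xml_urls` and the dictionary keys are hypothetical, and lowercasing the code is an assumption based on the lowercase example paths rather than a documented rule.

```python
FTP_BASE = "ftp://ftp.ebi.ac.uk/pub/databases/msd/sifts"


def sifts_xml_urls(pdb_code: str) -> dict:
    """Return the two candidate SIFTS XML URLs for a four-character PDB code.

    Hypothetical helper: the name and return shape are illustrative only.
    """
    # Assumption: file names use the lowercase form of the PDB code.
    code = pdb_code.strip().lower()
    if len(code) != 4 or not (code.isascii() and code.isalnum()):
        raise ValueError(f"expected a four-character PDB code, got {pdb_code!r}")

    # The split_xml subdirectory is named after the second and third characters.
    middle = code[1:3]
    return {
        "xml": f"{FTP_BASE}/xml/{code}.xml.gz",
        "split_xml": f"{FTP_BASE}/split_xml/{middle}/{code}.xml.gz",
    }


if __name__ == "__main__":
    for name, url in sifts_xml_urls("1xyz").items():
        print(name, url)
```

The sketch only constructs the URLs; it does not check that a file exists for the given entry.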

Residue-level cross-reference data from the PDBe database is available in the xml_remediated directory. This directory contains one file per PDB entry, in the same format as before. Note that the tab-delimited ASCII text files and the PDBe view web pages correspond to the xml_remediated files.

The legacy residue-level cross-reference data is available in XML format in the xml directory. This directory also contains one file per PDB entry.

The SIFTS chain-level mapping data for all PDB chains is available both in a tab-delimited ASCII format, located in the "flatfiles/tsv" directory, and in a comma-separated value (CSV) format, located in the "flatfiles/csv" directory.
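Once downloaded, a gzipped flat file in either format could be read along the following lines. This is a minimal sketch using only the Python standard library; the function name `iter_flatfile_rows` is hypothetical, and skipping lines that begin with `#` is an assumption about possible comment lines, not something stated here. Any header row is passed through unchanged.

```python
import csv
import gzip


def iter_flatfile_rows(path: str):
    """Yield each row of a local .tsv.gz or .csv.gz file as a list of strings.

    Hypothetical helper: the delimiter is chosen from the file extension.
    """
    if path.endswith(".tsv.gz"):
        delimiter = "\t"
    elif path.endswith(".csv.gz"):
        delimiter = ","
    else:
        raise ValueError(f"expected a .tsv.gz or .csv.gz file, got {path!r}")

    # Open the gzip stream in text mode so csv.reader receives strings.
    with gzip.open(path, "rt", encoding="utf-8", newline="") as handle:
        for row in csv.reader(handle, delimiter=delimiter):
            # Assumption: blank lines and '#' comment lines carry no data.
            if not row or row[0].startswith("#"):
                continue
            yield row
```

Because the function is a generator, large summary files can be processed row by row without loading the whole file into memory.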

The following summary files are available:

pdb_chain_uniprot.tsv.gz
pdb_chain_uniprot.csv.gz
A summary of the PDBe to UniProt residue-level mapping, showing the start and end residues of the mapping using SEQRES, PDB sequence and UniProt numbering.
pdb_chain_taxonomy.tsv.gz
pdb_chain_taxonomy.csv.gz
A summary of the NCBI tax_id(s), scientific_name(s) and chain type for each PDB chain that has been processed.
pdb_pubmed.tsv.gz
pdb_pubmed.csv.gz
A summary of the PubMed id(s) associated with each PDB entry, together with an ordinal number.
pdb_chain_enzyme.tsv.gz
pdb_chain_enzyme.csv.gz
A summary of the EC number(s) for each PDB chain that has been processed.
pdb_chain_go.tsv.gz
pdb_chain_go.csv.gz
A summary of the GO identifier(s) for each PDB chain that has been processed.
pdb_chain_interpro.tsv.gz
pdb_chain_interpro.csv.gz
A summary of the InterPro identifier(s) for each PDB chain that has been processed.
pdb_chain_pfam.tsv.gz
pdb_chain_pfam.csv.gz
A summary of the Pfam domain identifier(s) (derived via the UniProt mapping) for each PDB chain that has been processed.
pdb_chain_cath_uniprot.tsv.gz
pdb_chain_cath_uniprot.csv.gz
A summary of the CATH identifier(s) and UniProt primary accession number(s) for each PDB chain that has been processed.
pdb_chain_scop_uniprot.tsv.gz
pdb_chain_scop_uniprot.csv.gz
A summary of the SCOP identifier(s) and UniProt primary accession number(s) for each PDB chain that has been processed.
uniprot_pdb.tsv.gz
uniprot_pdb.csv.gz
A summary of the UniProt to PDB mappings, showing the UniProt accession followed by a semicolon-separated list of four-letter PDB codes.
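The semicolon-separated layout described for the uniprot_pdb files can be turned into a lookup table with a few lines of Python. This is a minimal sketch: the function name `build_uniprot_index` is hypothetical, it assumes the accession is the first field and the PDB code list is the second, and the values in the usage example are made-up placeholders rather than real records. Header or comment rows, if present, would need to be filtered out by the caller.

```python
def build_uniprot_index(rows):
    """Map each UniProt accession to a list of PDB codes.

    `rows` is any iterable of already-split rows (lists of strings).
    Assumption: field 0 is the accession, field 1 is the ';'-separated list.
    """
    index = {}
    for row in rows:
        if len(row) < 2:
            # Skip rows that do not have both expected fields.
            continue
        accession = row[0].strip()
        # Split on ';' and drop empty pieces, e.g. from a trailing separator.
        codes = [code.strip() for code in row[1].split(";") if code.strip()]
        index.setdefault(accession, []).extend(codes)
    return index


if __name__ == "__main__":
    # Placeholder rows for illustration only; not real SIFTS data.
    sample_rows = [["ACC1", "1xyz;2xyz"], ["ACC2", "3xyz"]]
    print(build_uniprot_index(sample_rows))
```

Using `setdefault` with `extend` means that an accession appearing on more than one row accumulates all of its PDB codes instead of being overwritten.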