Quick access

SIFTS data can be accessed through the SIFTS module of the PDBe REST API:

http://www.ebi.ac.uk/pdbe/api/doc/sifts.html

Alternatively, the European Bioinformatics Institute FTP site provides access to data from SIFTS.

Individual PDB entry data can either be found in a path like this:

ftp://ftp.ebi.ac.uk/pub/databases/msd/sifts/xml/1xyz.xml.gz - where 1xyz is the PDB code

or in a path like this:

ftp://ftp.ebi.ac.uk/pub/databases/msd/sifts/split_xml/xy/1xyz.xml.gz - where 'xy' are the second and third characters of the PDB code and 1xyz is the PDB code itself.
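The two path layouts above can be built from a PDB code with a small helper. This is only a sketch: the function name sifts_xml_urls is hypothetical, and lower-casing the code is an assumption based on the lower-case example paths shown above.

```python
SIFTS_BASE = "ftp://ftp.ebi.ac.uk/pub/databases/msd/sifts"


def sifts_xml_urls(pdb_code):
    """Return (flat_url, split_url) for the per-entry SIFTS XML file.

    Hypothetical helper: the code is lower-cased on the assumption that
    file names follow the lower-case style of the documented examples.
    """
    code = pdb_code.strip().lower()
    if len(code) != 4 or not code.isalnum():
        raise ValueError(f"expected a four-character PDB code, got {pdb_code!r}")
    # Layout 1: all entries in a single xml directory.
    flat_url = f"{SIFTS_BASE}/xml/{code}.xml.gz"
    # Layout 2: sub-directory named after the second and third characters.
    split_url = f"{SIFTS_BASE}/split_xml/{code[1:3]}/{code}.xml.gz"
    return flat_url, split_url


if __name__ == "__main__":
    for url in sifts_xml_urls("1xyz"):
        print(url)
```

The helper only builds the locations; downloading and decompressing the file is left to whichever FTP or HTTP client is in use.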

Residue-level cross-reference data from the PDBe database is available in the xml_remediated directory. This directory contains one file for each PDB entry, in the same format as the legacy XML files. Note that the tab-delimited ASCII text format files and PDBe view web pages correspond to the xml_remediated files.

The legacy residue-level cross-reference data is available in XML format in the xml directory. This directory contains one file for each PDB entry.

The SIFTS chain-level mapping data for all PDB chains is available in two formats: tab-delimited ASCII text in the "flatfiles/tsv" directory and comma-separated values (CSV) in the "flatfiles/csv" directory.
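A downloaded flat file in either format could be read along the following lines. This is a minimal sketch, not an official reader: the function name read_flatfile is hypothetical, and it assumes that each file has a single header row of column names and that any line starting with "#" is a comment to be skipped.

```python
import csv
import gzip


def read_flatfile(path):
    """Yield one dict per data row of a local .tsv.gz or .csv.gz file.

    Assumptions (not stated in the text above): the first non-comment
    line holds the column names, and lines starting with '#' are comments.
    """
    # Pick the delimiter from the file extension.
    delimiter = "\t" if path.endswith(".tsv.gz") else ","
    with gzip.open(path, "rt", encoding="utf-8", newline="") as handle:
        rows = (line for line in handle if not line.startswith("#"))
        yield from csv.DictReader(rows, delimiter=delimiter)
```

For example, list(read_flatfile("pdb_chain_uniprot.tsv.gz")) would load every row of a local copy of that file into memory; iterating over the generator instead keeps memory use low for the larger files.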

The following summary files are available:

pdb_chain_uniprot.tsv.gz
pdb_chain_uniprot.csv.gz
A summary of the PDBe to UniProt residue-level mapping, showing the start and end residues of the mapping using SEQRES, PDB sequence and UniProt numbering.
pdb_chain_taxonomy.tsv.gz
pdb_chain_taxonomy.csv.gz
A summary of the NCBI tax_id(s), scientific_name(s) and chain type for each PDB chain that has been processed.
pdb_pubmed.tsv.gz
pdb_pubmed.csv.gz
A summary of the PubMed id(s) associated with each PDB entry, together with an ordinal number.
pdb_chain_enzyme.tsv.gz
pdb_chain_enzyme.csv.gz
A summary of the EC number(s) for each PDB chain that has been processed.
pdb_chain_go.tsv.gz
pdb_chain_go.csv.gz
A summary of the GO identifier(s) for each PDB chain that has been processed.
pdb_chain_interpro.tsv.gz
pdb_chain_interpro.csv.gz
A summary of the InterPro identifier(s) for each PDB chain that has been processed.
pdb_chain_pfam.tsv.gz
pdb_chain_pfam.csv.gz
A summary of the Pfam domain identifier(s) (derived via the UniProt mapping) for each PDB chain that has been processed.
pdb_chain_cath_uniprot.tsv.gz
pdb_chain_cath_uniprot.csv.gz
A summary of the CATH identifier(s) and UniProt primary accession number(s) for each PDB chain that has been processed.
pdb_chain_scop_uniprot.tsv.gz
pdb_chain_scop_uniprot.csv.gz
A summary of the SCOP identifier(s) and UniProt primary accession number(s) for each PDB chain that has been processed.
pdb_chain_scop2_uniprot.tsv.gz
pdb_chain_scop2_uniprot.csv.gz
A summary of the SCOP2 identifier(s) and UniProt primary accession number(s) for each PDB chain that has been processed.
pdb_chain_scop2b_sf_uniprot.tsv.gz
pdb_chain_scop2b_sf_uniprot.csv.gz
A summary of the SCOP2B identifier(s) and UniProt primary accession number(s) for each PDB chain that has been processed. SCOP2B is an expansion of SCOP2 domain annotations (at superfamily level) to every PDB entry with the same UniProt accession that has at least 80% SCOP2 domain coverage.
uniprot_pdb.tsv.gz
uniprot_pdb.csv.gz
A summary of the UniProt to PDB mappings, showing the UniProt accession followed by a semicolon-separated list of four-letter PDB codes.
pdb_chain_ensembl.tsv.gz
pdb_chain_ensembl.csv.gz
A summary of the Ensembl identifier(s) and UniProt primary accession number(s) for each PDB chain that has been processed.
pdb_chain_hmmer.tsv.gz
pdb_chain_hmmer.csv.gz
A summary of the Pfam domain identifier(s) (automatically calculated using HMMER) for each PDB chain that has been processed.
uniprot_segments_observed.tsv.gz
uniprot_segments_observed.csv.gz
A summary of the UniProt to PDBe residue-level mapping (observed residues only), showing the start and end residues of the mapping using SEQRES, PDB sequence and UniProt numbering.
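As an illustration of working with one of these summaries, the semicolon-separated PDB list described for uniprot_pdb.tsv.gz could be split into a lookup table roughly as follows. This is a sketch under assumptions: the function name parse_uniprot_pdb is hypothetical, the accession and the PDB list are taken to be the first two tab-separated columns, a header row is assumed by default, and lines starting with "#" are treated as comments. The sample values in the usage part are made up.

```python
import csv


def parse_uniprot_pdb(lines, has_header=True):
    """Map each UniProt accession to its list of PDB codes.

    'lines' is any iterable of text lines, for example a file opened
    with gzip.open(path, "rt"). Column order and the header row are
    assumptions, not guarantees about the real file.
    """
    rows = (line for line in lines if line.strip() and not line.startswith("#"))
    reader = csv.reader(rows, delimiter="\t")
    if has_header:
        next(reader, None)  # discard the assumed column-name row
    mapping = {}
    for row in reader:
        if len(row) < 2:
            continue  # skip malformed rows rather than fail
        accession, codes = row[0], row[1]
        mapping[accession] = [code for code in codes.split(";") if code]
    return mapping


if __name__ == "__main__":
    # Made-up sample data in the assumed layout.
    sample = ["ACCESSION\tPDB_CODES\n", "X00001\t1abc;2def\n"]
    print(parse_uniprot_pdb(sample))
```

Keeping the parser independent of how the lines are obtained makes it easy to check against a few lines of the real file before relying on the assumed layout.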