File format of seqdata.dat

This file lists protein sequence annotations (UniProt id and accession number, E.C. number, and Gene Ontology classification).

Note that the data include superseded PDB entries.

Each annotation is on a separate line, with whitespace-separated fields giving:

  • PDB code and chain identifier.
  • Annotation type:
    • SWS_CODE = Swiss-Prot (i.e. UniProt) entry code, e.g. LYS_BPT4
    • SWS_ID = Swiss-Prot (i.e. UniProt) accession number, e.g. P00720
    • EC_NO = Enzyme Commission (E.C.) number
    • GO_ID_C = Gene Ontology identifier (Cellular component)
    • GO_ID_F = Gene Ontology identifier (Molecular Function)
    • GO_ID_P = Gene Ontology identifier (Biological process)
  • Annotation value (the code, accession number, E.C. number or GO identifier itself)
  • Source of annotation:
    • MSD = Macromolecular Structure Database (now known as PDBe)
    • PDB = header records in PDB file
    • GO = Gene Ontology Consortium
  • Other chains in this PDB entry sharing the same annotation, as a comma-separated list (omitted if there are none)
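
As a rough illustration, a single line in this format could be parsed in Python as sketched below. This is not part of any official tooling: the Annotation class and the parse_line function are hypothetical names. The sketch assumes that fields are separated by whitespace, that the first four characters of the first field are the PDB code with the chain identifier following, and that annotation values contain no spaces.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Annotation:
    """One line of seqdata.dat (hypothetical container, for illustration)."""
    pdb_code: str
    chain: str
    kind: str          # e.g. "EC_NO", "SWS_ID", "GO_ID_F"
    value: str
    source: str        # "MSD", "PDB" or "GO"
    other_chains: List[str] = field(default_factory=list)


def parse_line(line: str) -> Annotation:
    """Split one annotation line into its fields."""
    fields = line.split()
    if len(fields) not in (4, 5):
        raise ValueError(f"unexpected number of fields: {line!r}")
    entry, kind, value, source = fields[:4]
    # The fifth field is present only when other chains share the annotation.
    other = fields[4].split(",") if len(fields) == 5 else []
    # Assumption: 4-character PDB code followed by the chain identifier.
    return Annotation(entry[:4], entry[4:], kind.rstrip(":"), value, source, other)


ann = parse_line("13pkA EC_NO:    2.7.2.3      MSD   B,C,D")
print(ann.pdb_code, ann.chain, ann.kind, ann.value, ann.other_chains)
```

Lines with an unexpected number of fields raise a ValueError rather than being skipped silently, which makes any departure from the assumed layout visible early.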

Example

139lA EC_NO:    3.2.1.17     MSD
139lA SWS_CODE: LYS_BPT4     MSD
139lA SWS_ID:   P00720       MSD
13gsA EC_NO:    2.5.1.18     PDB   B
13pkA EC_NO:    2.7.2.3      MSD   B,C,D
13pkA GO_ID_F:  0004618      GO    B,C,D
13pkA GO_ID_P:  0006096      GO    B,C,D
13pkA SWS_CODE: PGKC_TRYBB   MSD   B,C,D
13pkA SWS_ID:   P07378       MSD   B,C,D
140lA EC_NO:    3.2.1.17     MSD
140lA SWS_CODE: LYS_BPT4     MSD
140lA SWS_ID:   P00720       MSD
141lA EC_NO:    3.2.1.17     MSD
141lA SWS_CODE: LYS_BPT4     MSD
141lA SWS_ID:   P00720       MSD
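
Because one line can stand for several chains, it may be convenient to expand the listing into one record per chain. The Python sketch below does this for a few of the example lines. The function name annotations_by_chain is hypothetical, and the code assumes whitespace-separated fields and a four-character PDB code followed directly by the chain identifier.

```python
from collections import defaultdict

EXAMPLE = """\
13gsA EC_NO:    2.5.1.18     PDB   B
13pkA EC_NO:    2.7.2.3      MSD   B,C,D
13pkA GO_ID_F:  0004618      GO    B,C,D
139lA SWS_ID:   P00720       MSD
"""


def annotations_by_chain(text):
    """Map (pdb_code, chain) to a list of (annotation type, value) pairs."""
    result = defaultdict(list)
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        entry, kind, value = fields[0], fields[1].rstrip(":"), fields[2]
        # Assumption: 4-character PDB code followed by the chain identifier.
        pdb_code, first_chain = entry[:4], entry[4:]
        shared = fields[4].split(",") if len(fields) > 4 else []
        # Record the annotation for the named chain and every chain sharing it.
        for chain in [first_chain] + shared:
            result[(pdb_code, chain)].append((kind, value))
    return dict(result)


table = annotations_by_chain(EXAMPLE)
print(table[("13pk", "C")])
# prints [('EC_NO', '2.7.2.3'), ('GO_ID_F', '0004618')]
```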