
Goldman Group Projects

Hidden Markov Models (HMMs) for Modelling Protein Secondary Structure and Sequence Evolution

This page contains information regarding a project which uses HMMs to model secondary structure along protein-coding amino acid sequences. The models are used in phylogenetic analyses, and can be used for protein structure prediction.

This project is supported by the BBSRC.

Six papers have been published on this work: follow this link to see the references.




The links below are mentioned in some of these papers:
  • software (1): the PASSML and PASSML-TM programs are available here. PASSML and PASSML-TM can analyse data using the models described in the above papers. Contact Pietro Liò for further information.
  • software (2): Jeff Thorne's C program for analysing data using the models described in the above papers is available as a compressed tar file structevol.tar.Z. It includes basic instructions. Contact Jeff Thorne for further information.
  • parameter estimates (1): values of the parameters r_ij described in the 1998 Genetics paper are available here (38 x 38 matrix: the 38 states are ordered Hb1, ..., Hb10, Eb1, ..., Eb6, Tb1, Tb2, Cb1, He1, ..., He10, Ee1, ..., Ee6, Te1, Te2, Ce1). [See also this link]
  • parameter estimates (2): values of the parameters a^k_ij described in the 1998 Genetics paper are available for k = Hb, Eb, Tb, Cb, He, Ee, Te and Ce (each a 20 x 20 matrix: amino acids are ordered A, R, N, D, C, Q, E, G, H, I, L, K, M, F, P, S, T, W, Y, V). [See also this link]
  • data sets: the 11 sets of aligned sequences analysed in the 1998 Genetics paper are available as follows: ADPGP, GP120, P17ALL, P17B, SUSY, XYLA, ADHAN, ADHPL, GDH, GPA, PEPCK. Single-letter amino acid codes are used, plus `-' for alignment gaps and `?' for residues deemed untrustworthy (typically due to alignment uncertainty), which are treated as unknown.
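
The row and column orderings listed above can be turned into lookup tables when reading the matrix files. The following is a minimal Python sketch, not part of PASSML or structevol: the names state_labels, STATE_INDEX, AA_INDEX and encode are hypothetical, and mapping both `-' and `?' to None is a simplifying assumption made only for illustration.

```python
# Hypothetical helpers; only the orderings themselves come from this page.

def state_labels():
    """Return the 38 state names in the order used by the 38 x 38 matrix."""
    labels = []
    for suffix in ("b", "e"):  # Hb.., Eb.., Tb.., Cb1 first, then He.., Ee.., Te.., Ce1
        labels += [f"H{suffix}{i}" for i in range(1, 11)]  # 10 H states
        labels += [f"E{suffix}{i}" for i in range(1, 7)]   # 6 E states
        labels += [f"T{suffix}{i}" for i in range(1, 3)]   # 2 T states
        labels.append(f"C{suffix}1")                       # 1 C state
    return labels

# Amino acid order used by each 20 x 20 matrix.
AMINO_ACIDS = "ARNDCQEGHILKMFPSTWYV"

STATE_INDEX = {name: i for i, name in enumerate(state_labels())}
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}

def encode(seq):
    """Map each residue to its matrix index.

    Assumption: gaps ('-') and untrusted residues ('?') both become None.
    Any other unexpected character raises KeyError.
    """
    return [None if ch in "-?" else AA_INDEX[ch] for ch in seq]
```

For example, STATE_INDEX["He1"] gives the zero-based row of state He1 in the 38 x 38 matrix, and encode applied to one aligned sequence gives the indices into a 20 x 20 matrix.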
