Matches to existing PDB structures
Description
Sequence matches to proteins of known structure in the PDB are found using a FASTA sequence search. Matching sequences are aligned to the query sequence using a simple pile-up procedure. The resulting multiple sequence alignment can be annotated with various structural and sequence information, including:
The alignment and annotation are performed by the Sequences Annotated by Structure (SAS) program. A separate link to SAS at the bottom of the page will generate a new search against the current PDB. (This is useful if ProFunc was run some time ago and you would like to check the sequence against the most up-to-date version of the structural database.)
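The "simple pile-up procedure" mentioned above is not spelled out on this page. As a rough illustration only, the sketch below assumes that each FASTA hit comes with a pairwise alignment to the query and that residues inserted relative to the query are simply dropped, so every hit ends up on the query's own coordinates. The function name `pile_up` and the tuple layout are hypothetical and are not part of SAS.

```python
def pile_up(query, pairwise):
    """Stack pairwise alignments onto the ungapped query sequence.

    pairwise: list of (name, aligned_query, aligned_hit, query_start),
    where the two aligned strings have equal length, '-' marks a gap,
    and query_start is the 0-based query position where the match begins.
    This layout is an assumption made for the sketch.
    """
    rows = {}
    for name, q_aln, h_aln, start in pairwise:
        row = ["-"] * len(query)      # unmatched query positions stay as gaps
        pos = start
        for q_char, h_char in zip(q_aln, h_aln):
            if q_char == "-":
                continue              # insertion relative to the query: dropped
            row[pos] = h_char         # may itself be '-' (deletion in the hit)
            pos += 1
        rows[name] = "".join(row)
    return rows


if __name__ == "__main__":
    query = "MKTAYIAK"
    hits = [
        ("1abcA", "KTA-YI", "KSAGYV", 1),
        ("2xyzB", "MKTA", "M-TA", 0),
    ]
    print(query)
    for name, row in pile_up(query, hits).items():
        print(row, name)
```

Anchoring every row to the query keeps all rows the same length, at the cost of hiding insertions in the matched sequences.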
View options
There are three options in the "View" box.
Structures. This option opens a new page listing the matched sequences. You can then use the check boxes to select which structures you want to view superposed in RasMol. Each structure can be given a different colour, or all structures can be coloured according to the current SAS annotation.
FASTA output. This opens a new window containing the original FASTA output on which the SAS alignment was based. To download the alignment, shift-click on the link instead.
All annotations on target sequence. This opens a window containing the secondary structure diagram of the query sequence with all the structural annotations from all the other sequences mapped onto it. This can highlight the functionally important regions in the query structure based on the sequence and structural information from the matched sequences. A PostScript version of the diagram is available for viewing or download.
Alignment options
The annotations of the sequences can be modified using any of the three options in the "Modify alignment" box. Click on the "APPLY" icon to apply the changes you have selected.
Annotate by. Here you can determine how the residues in the sequences are coloured. There are 9 options, as listed above, the default being to colour by Residue type. A key at the bottom of the alignment describes whichever is the current colour scheme.
Number. This option individually numbers each sequence using the residue numbering in the source PDB file. This is useful for identifying specific residues and locating them in the 3D structure.
Show secondary structure. The "Yes" option will add an extra line of secondary structure annotation below each sequence. Here helices are indicated by a string of "H"s, strands by "E"s and random coil by hyphens.
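If you copy a secondary structure line out of the alignment, a few lines of Python can turn it into a list of segments. This is a minimal sketch that assumes only the three symbols described above ("H", "E" and "-"); the function name `ss_segments` is hypothetical, and the positions it reports are 1-based positions within the copied string, not PDB residue numbers.

```python
from itertools import groupby

# Symbols as described for the "Show secondary structure" option.
LABELS = {"H": "helix", "E": "strand", "-": "coil"}


def ss_segments(ss_line):
    """Return (label, first, last) for each run of identical symbols."""
    segments = []
    pos = 0
    for code, run in groupby(ss_line):
        length = len(list(run))
        segments.append((LABELS.get(code, "other"), pos + 1, pos + length))
        pos += length
    return segments


if __name__ == "__main__":
    for label, first, last in ss_segments("--HHHH--EEE-"):
        print(f"{label:6s} {first:3d}-{last:<3d}")
```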
The alignment
The sequence alignment is shown in blocks of 65 residue positions each. Each sequence's PDB code and chain are shown on the left. Clicking on the PDB code will take you to that protein structure's PDBsum page.
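The block layout can be reproduced for an alignment you have saved locally. The sketch below assumes the alignment is held as a dictionary mapping a label (for example PDB code plus chain) to an aligned sequence string; that data layout and the name `alignment_blocks` are assumptions for illustration, while the width of 65 is taken from the description above.

```python
BLOCK_WIDTH = 65  # residue positions per block, as described above


def alignment_blocks(rows, width=BLOCK_WIDTH):
    """Yield successive blocks of the alignment, each at most `width` columns."""
    length = max((len(seq) for seq in rows.values()), default=0)
    for start in range(0, length, width):
        yield {label: seq[start:start + width] for label, seq in rows.items()}


if __name__ == "__main__":
    rows = {"query": "A" * 150, "1abcA": "C" * 150}
    for block in alignment_blocks(rows):
        for label, chunk in block.items():
            print(f"{label:8s} {chunk}")
        print()
```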
To the right of each sequence is the protein name. In some cases you will also see a number, n, shown as "×n", to the right of the sequence. This indicates that the sequence represents n other identical sequences in the PDB. The representative sequence will also have an asterisk by its PDB code. Its annotation will come from all the sequences that it represents (for example all contacts to ligands in any of the other structures will be mapped onto the representative sequence to give a consensus annotation). A full list of the duplicate sequences is given at the bottom of the page.
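The grouping of identical sequences and the consensus annotation can be pictured as follows. This is only a sketch: it assumes the first sequence encountered becomes the representative (the page does not say how SAS chooses it) and models ligand contacts as sets of residue positions. The names `collapse_duplicates` and `consensus_contacts` are hypothetical.

```python
def collapse_duplicates(entries):
    """Group identical sequences.

    entries: list of (label, sequence) pairs.
    Returns a list of (representative, sequence, n_others, other_labels),
    where n_others corresponds to the "xn" count shown in the alignment.
    Assumption: the first label seen for a sequence is the representative.
    """
    groups = {}
    for label, seq in entries:
        groups.setdefault(seq, []).append(label)
    return [(labels[0], seq, len(labels) - 1, labels[1:])
            for seq, labels in groups.items()]


def consensus_contacts(labels, contacts):
    """Union of the contact positions of all sequences in one group."""
    merged = set()
    for label in labels:
        merged |= contacts.get(label, set())
    return merged


if __name__ == "__main__":
    entries = [("1abcA", "MKT"), ("2xyzB", "MKT"), ("3defA", "MKV"), ("4ghiC", "MKT")]
    contacts = {"1abcA": {1}, "2xyzB": {2}, "4ghiC": {1, 3}}
    for rep, seq, n, others in collapse_duplicates(entries):
        star = "*" if n else " "
        merged = sorted(consensus_contacts([rep] + others, contacts))
        print(f"{rep}{star} {seq} x{n} contacts={merged}")
```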
Below the alignment are the alignment statistics as reported by the FASTA search. Any duplicate sequences are also listed here. A check box next to each PDB code allows you to include or exclude any of the sequences from the alignment. Click on the "SELECT" box to apply your selection changes. Any excluded sequences will be removed from the alignment and listed separately at the bottom of the page.
The selection short-cuts above the "SELECT" box can either instantly
select/deselect all sequences, or invert the current selection. These
short-cuts are useful where there are very many sequences in the