ArchSchema documentation
ArchSchema is a Java Web Start application. Running it from the
ArchSchema home page, or from links to it on other sites (eg PDBsum),
requires that you have the most recent version of the Java runtime
environment from Sun installed on your machine. Web Start applications
are run via javaws.
Download and install JRE
To obtain Java, use the link below:
Java SE Runtime Environment (JRE)
Follow the appropriate installation instructions in the documentation
on the downloads page. Note that other versions of Java (eg GNU Java)
may not run the program correctly.
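Before launching ArchSchema it can help to confirm that the Java launchers are actually on your PATH. The following is a minimal sketch in Python, not part of ArchSchema itself; the function name find_java_tools is hypothetical. Note that Java Web Start was removed from Oracle's JDK in version 11, so a missing javaws on a newer installation is to be expected.

```python
import shutil


def find_java_tools(which=shutil.which):
    """Return a dict mapping each launcher name to its path, or None if absent.

    The `which` parameter defaults to shutil.which and exists only so the
    lookup can be replaced in tests; it is an illustrative design choice.
    """
    return {tool: which(tool) for tool in ("java", "javaws")}


if __name__ == "__main__":
    # Report which launchers were found on the current PATH.
    for tool, path in find_java_tools().items():
        print(f"{tool}: {path if path else 'not found on PATH'}")
```

If javaws is reported as missing, the local jar file described next may be the more practical route.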
If you wish, you can install your own, local copy of ArchSchema. You can
then perform your own searches directly. Go to the ArchSchema download page to obtain the ArchSchema jar file.
Initial screen
On launching ArchSchema from scratch you will get the following screen.

Figure: Initial screen

Enter UniProt sequence id or Pfam domain id
Enter the UniProt id or accession code (eg Q76RF1_HHV8 or Q76RF1,
respectively) of the protein sequence you're interested in into the top
text box. Alternatively, enter the Pfam domain id in the lower text
box (see the initial screen above). Then click on the appropriate Search box.
The example below uses UniProt sequence Q76RF1.
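The difference between the identifier types can be illustrated with a short Python sketch. This is not ArchSchema's own input handling, which is not described here. The accession pattern is the one published in the UniProt help pages; the shapes assumed for a UniProt id (NAME_SPECIES) and for a Pfam identifier ("PF" followed by five digits) are assumptions made for illustration, as is the function name classify_query.

```python
import re

# UniProt accession pattern, as published in the UniProt help pages.
ACCESSION = re.compile(
    r"[OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2}"
)
# Assumed shape of a UniProt id (entry name): NAME_SPECIES, eg Q76RF1_HHV8.
ENTRY_NAME = re.compile(r"[A-Z0-9]{1,10}_[A-Z0-9]{1,5}")
# Assumed shape of a Pfam identifier: "PF" followed by five digits.
PFAM = re.compile(r"PF[0-9]{5}")


def classify_query(text):
    """Guess which kind of identifier the user has typed."""
    query = text.strip().upper()
    if PFAM.fullmatch(query):
        return "pfam"
    if ENTRY_NAME.fullmatch(query):
        return "uniprot_id"
    if ACCESSION.fullmatch(query):
        return "uniprot_accession"
    return "unknown"


if __name__ == "__main__":
    for example in ("Q76RF1_HHV8", "Q76RF1", "PF00069"):
        print(example, "->", classify_query(example))
```

Under these assumptions the first two examples would go in the top text box and the third in the lower one.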
Panel layout
If the search returns a reasonable set of results (ie not too many
domains or too many sequences), the right-hand panel will split into
two, with the graph plotted in the upper part and a key to the plot
given in the lower part.
Figure: Panel layout, showing 1. Graph panel, 2. Data panel and 3. Graph criteria panel
If the search returns too many sequences or Pfam domains, you will be
asked how you would like to filter the results (see the section on the
Search Criteria Panel).
1. Graph panel

Figure: Mouse operations for navigating about the graph
The graph panel shows the plot of related Pfam domain architectures.
Each node shows a set of coloured boxes representing the sequence of
Pfam domains defining the particular architecture. Tall boxes
correspond to Pfam-A domains, whereas small boxes correspond to Pfam-B
domains. Some example nodes are shown below.
The red underlines in (c) indicate that there is structural
information in the PDB for two of the domains: one or more complete
structures for the first domain, and one or more partial or
fragmentary structures for the third.
The lines between the nodes on the plot show the relationships between the
architectures, based on the similarity of their domain compositions.
The node corresponding to the architecture of the original search
sequence is identified by a grey background.
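To give a feel for what "similarity of domain compositions" can mean, here is a minimal Python sketch. The measure ArchSchema actually uses is not stated in this document; the Jaccard index, the threshold of 0.5, the function names and the domain labels A to D are all stand-ins chosen for illustration.

```python
def jaccard(first, second):
    """Jaccard index of two domain lists, ignoring order and repeats."""
    set_a, set_b = set(first), set(second)
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 1.0


def related_pairs(architectures, threshold=0.5):
    """Return (name, name, score) for every pair scoring at least threshold.

    `architectures` maps a node name to its list of domain labels.
    """
    names = sorted(architectures)
    pairs = []
    for index, first in enumerate(names):
        for second in names[index + 1:]:
            score = jaccard(architectures[first], architectures[second])
            if score >= threshold:
                pairs.append((first, second, score))
    return pairs


if __name__ == "__main__":
    # Hypothetical architectures; the labels are not real Pfam domains.
    example = {
        "parent": ["A", "B", "C"],
        "node1": ["A", "B"],
        "node2": ["C", "D"],
    }
    for first, second, score in related_pairs(example):
        print(f"{first} - {second}: {score:.2f}")
```

In this toy data only node1 and parent share enough domains to be joined by a line.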
Navigation about the plot
Navigation around the plot uses the mouse operations shown above (taken
from the Graph Criteria Panel, described later).
Of these operations, the most informative is left-clicking on a node.
This provides information in the Data Panel (see below)
describing the node's constituent domains and listing the protein
sequences that have the given architecture. The data panel also
identifies which sequences, if any, have whole or partial structures
in the PDB.
Clicking the small black triangles between the panels allows you to
close/open either panel. You can also adjust the size of a panel by
left-click dragging the separator. Note that the Graph Criteria
Panel cannot be reduced beyond a certain point.
2. Data panel
The data panel shows the detailed information about specific nodes.
Initially the parent sequence's node is shown, as below,
together with a key to the plot and some statistics.
Below the key is a list of the Pfam domains making up the parent
sequence's architecture. This gives the Pfam identifiers and the names
and descriptions of each domain. The hyperlink on the Pfam identifier
will take you to that domain's page in the Pfam database (which
should open in your web browser).
Below this list is a table listing all the domains on the plot,
in decreasing order of occurrence. The last two columns in this table
show the number of occurrences of each domain on the plot and the
number of architectures it occurs in.
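The two counts in that table can be sketched in Python as follows. This is an illustration rather than ArchSchema's code: it assumes that a domain repeated within one architecture counts once per repeat towards its occurrences but only once towards its number of architectures, and the function name and domain labels are hypothetical.

```python
from collections import Counter


def domain_table(architectures):
    """Return (domain, occurrences, architectures) rows, most frequent first.

    `architectures` is a list of domain lists, one per node on the plot.
    """
    occurrences = Counter()
    architecture_counts = Counter()
    for domains in architectures:
        occurrences.update(domains)               # every appearance counts
        architecture_counts.update(set(domains))  # once per architecture
    rows = [
        (domain, occurrences[domain], architecture_counts[domain])
        for domain in occurrences
    ]
    # Decreasing order of occurrence; ties are broken alphabetically.
    rows.sort(key=lambda row: (-row[1], row[0]))
    return rows


if __name__ == "__main__":
    for row in domain_table([["A", "B", "A"], ["A"], ["B", "C"]]):
        print(row)
```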
Clicking the small black triangles separating the Data panel
from the Graph panel allows you to close/open either panel. You
can also left-click-drag the border between the two panels to adjust
their relative size.
Node details
If you left-click on any architecture node in the Graph panel,
the data panel will display the following information about that node:
The first table above shows the Pfam domains making up the selected
node's architecture.
The second table lists all the UniProt sequences that have this domain
architecture. The table gives the UniProt ref (ie accession number),
which is hyperlinked to the appropriate page in UniProt, the UniProt
code, a marker indicating whether there are one or more 3D structures of
this protein in the PDB, and the protein name.
A full green tick in the PDB column indicates that there is at least
one 3D structure in the PDB of the full-length protein. A dashed green
tick indicates that, at best, the PDB contains partial structures of
the protein only. Clicking on either type of tick will take you to a
PDBsum page listing the 3D structures that are available.
Note that, if the node corresponds to a huge number of sequences, only
some of these will be listed. For a full list, use the Pfam
database. Alternatively, if you are just interested in proteins from a
particular species, or only in those having 3D structures, go to the
Search Criteria Panel and make the appropriate selections before
generating the plot again.
3. Graph criteria panel
The graph criteria panel allows you to alter some of the parameters
affecting the display of the graph.
You may first want to click on the Freeze Graph button to stop the
graph jiggling as it strives to optimize the relative placement of the
nodes.

The Plot options radio buttons allow you to switch on either
the UniProt sequences belonging to each Pfam architecture, or the
PDB structures.

These will then appear, attached to their parent nodes, as below:
Figure (left to right): UniProt sequences, PDB structures, Many satellite nodes
The nodes in the far right example are coloured pink to indicate that
there are too many sequences to display, so only a selected number are
being shown.
Below the Plot options are two sliders. The top slider allows
you to adjust the typical lengths between the nodes on the plot to get a
more compact or more distended plot.

The bottom slider is useful if you have a large graph with many
architecture nodes. It can be used to prune some of the outer nodes.

Clicking the small black triangles separating the Graph Criteria Panel
from the Graph Panel allows you to close/open either panel.
The Search tab at the top of this panel will replace it by the
Search Criteria Panel, described next.
Search Criteria Panel
The search criteria panel allows you to either initiate a new search,
or to filter the sequences returned by the current one. The latter
may be particularly useful if the initial search returns a huge number
of sequences.
The large text box lists all the domains in the current sequence. It
shows their Pfam identifiers, names and numbers of
sequences. Initially, all are selected. Click on any domain to select
it, and shift-click to add others to the selection.

The radio buttons above this list allow you to specify whether the
architectures to be returned should contain all the selected
domains (ie the AND option) or one or more of the
selected domains (ie the OR option).

Note that, if you select all the domains and use the AND
operator, you will, of course, only get a single node on the plot:
namely, that of the parent architecture.
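The AND and OR options can be illustrated with a small Python sketch. It is a simplified model rather than ArchSchema's search code; the function name, the node names and the domain labels A to D are hypothetical.

```python
def filter_architectures(architectures, selected, mode="AND"):
    """Keep the architectures that match the selected domains.

    AND keeps architectures containing every selected domain;
    OR keeps architectures containing at least one of them.
    """
    wanted = set(selected)
    if mode not in ("AND", "OR"):
        raise ValueError("mode must be 'AND' or 'OR'")

    def keep(domains):
        present = set(domains)
        if mode == "AND":
            return wanted <= present
        return bool(wanted & present)

    return {
        name: domains
        for name, domains in architectures.items()
        if keep(domains)
    }


if __name__ == "__main__":
    example = {
        "parent": ["A", "B", "C"],
        "node1": ["A", "B"],
        "node2": ["C", "D"],
    }
    print(sorted(filter_architectures(example, ["A", "B", "C"], "AND")))
    print(sorted(filter_architectures(example, ["C"], "OR")))
```

In this toy data, selecting all three parent domains with AND leaves only the parent node, while OR on domain C keeps the parent and node2.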
The Organism button lists all the species for which proteins
containing any of the above domains were found in the initial
search. Each shows the species and number of protein sequences. Select
which species to filter the sequences by.

The Sequences button allows you to limit the search to just
those proteins for which there is a complete or partial 3D structure in
the PDB.

Once you have made your selections, click on the Plot Graph
button to retrieve the data and plot the graph.

Note that, if your selection criteria happen to exclude the parent
sequence, it will still be returned by the search for reference purposes.