ArchSchema documentation

ArchSchema is a Java Web Start application. To run it from the ArchSchema home page, or from links to it on other sites (eg PDBsum), you need the most recent version of the Java Runtime Environment (JRE) from Sun installed on your machine. Web Start applications are launched via javaws.

Download and install JRE

To obtain Java, use the link below:

Java SE Runtime Environment (JRE)

Follow the appropriate installation instructions in the documentation on the downloads page. Note that other versions of Java (eg GNU Java) may not run the program correctly.

If you wish, you can install your own local copy of ArchSchema. You can then perform your own searches directly. Go to the ArchSchema download page to obtain the ArchSchema jar file.

Initial screen

On launching ArchSchema from scratch you will get the following screen.

Figure: Initial screen

Enter UniProt sequence id or Pfam domain id

Enter the UniProt id or accession code (eg Q76RF1_HHV8 or Q76RF1, respectively) of the protein sequence you're interested in into the top text box. Alternatively, enter the Pfam domain id in the lower text box (see above right). Then click on the appropriate Search button.
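The difference between the two UniProt forms can be checked mechanically. The sketch below is not part of ArchSchema; it is a minimal illustration that assumes the accession pattern published in the UniProt help pages and a simplified pattern for the id (entry name) form. The function name classify_uniprot_query is hypothetical.

```python
import re

# Accession pattern as documented in the UniProt help pages.
ACCESSION = re.compile(
    r"^(?:[OPQ][0-9][A-Z0-9]{3}[0-9]"
    r"|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2})$"
)

# Simplified pattern for the id form (assumption): a mnemonic,
# an underscore, then a species code of up to five characters.
ENTRY_NAME = re.compile(r"^[A-Z0-9]{1,10}_[A-Z0-9]{1,5}$")


def classify_uniprot_query(text):
    """Return 'accession', 'id' or 'unknown' for a UniProt search string."""
    query = text.strip().upper()
    if ACCESSION.match(query):
        return "accession"
    if ENTRY_NAME.match(query):
        return "id"
    return "unknown"


if __name__ == "__main__":
    for example in ("Q76RF1", "Q76RF1_HHV8", "not a protein"):
        print(example, "->", classify_uniprot_query(example))
```

Either form should be acceptable in the top text box; the check above only shows how the two can be told apart.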

The example below uses UniProt sequence Q76RF1.

Panel layout

If the search returns a reasonable set of results (ie not too many domains or too many sequences), the right-hand panel will split into two, with the graph plotted in the upper part and a key to the plot given in the lower part.

Figure: Panel layout, showing (1) the Graph panel, (2) the Data panel and (3) the Graph criteria panel.

If the search returns too many sequences or Pfam domains, you will be asked how you would like to filter the results (see section on Search Selection Criteria).

1. Graph panel


Mouse operations for navigating about the graph

The graph panel shows the plot of related Pfam domain architectures.

Each node shows a set of coloured boxes representing the sequence of Pfam domains defining the particular architecture. Tall boxes correspond to Pfam-A domains, whereas small boxes correspond to Pfam-B domains. Some example nodes are shown below.

Figure: Example nodes (a), (b) and (c).
The red underlines in (c) indicate that there is structural information in the PDB for two of the domains: one or more complete structures for the first domain, and one or more partial or fragmentary structures for the third.

The lines between the nodes on the plot show the relationships between the architectures, based on the similarity of their domain compositions.
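This documentation does not state how the similarity between domain compositions is measured. Purely as an illustration of the idea, the sketch below uses Jaccard similarity on the sets of domains, which is an assumption and not necessarily what ArchSchema computes. The domain labels "A", "B" and "C" are hypothetical.

```python
def composition_similarity(arch_a, arch_b):
    """Jaccard similarity between the domain sets of two architectures.

    Each architecture is a sequence of domain identifiers. Domain order
    and repeat counts are ignored in this simplified measure.
    """
    a, b = set(arch_a), set(arch_b)
    if not a and not b:
        return 1.0  # two empty architectures are treated as identical
    return len(a & b) / len(a | b)


if __name__ == "__main__":
    # One shared domain out of three distinct domains overall.
    print(composition_similarity(["A", "B"], ["B", "C"]))
```

Under a measure of this kind, architectures sharing most of their domains score close to 1 and would be linked, while unrelated architectures score 0.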

The node corresponding to the architecture of the original search sequence is identified by a grey background.

Navigation about the plot

Navigation around the plot is as described on the left (taken from the Graph Criteria Panel, described later).

Of these operations, the most informative is left-clicking on a node. This provides information in the Data Panel (see below) describing the node's constituent domains and listing the protein sequences that have the given architecture. The data panel also identifies which sequences, if any, have whole or partial structures in the PDB.

Clicking the small black triangles between the panels allows you to close/open either panel. You can also adjust the size of a panel by left-click dragging the separator. Note that the Graph Criteria Panel cannot be reduced beyond a certain point.

2. Data panel

The data panel shows the detailed information about specific nodes.

Initially the parent sequence's node is shown, as below, together with a key to the plot and some statistics.

Below the key is a list of the Pfam domains making up the parent sequence's architecture. This gives the Pfam identifiers and the names and descriptions of each domain. The hyperlink on the Pfam identifier will take you to that domain's page in the Pfam database (which should open in your web browser).

Below this list is a table listing all the domains on the plot, in decreasing order of occurrence. The last two columns in this table show the number of occurrences of each domain on the plot and the number of architectures it occurs in.
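The two counts in that table can be derived from the architectures alone. The following sketch assumes that "occurrences" means the total number of times a domain appears across all architecture nodes, repeats included; the function name domain_table and the domain labels are hypothetical.

```python
from collections import Counter


def domain_table(architectures):
    """Build (domain, occurrences, n_architectures) rows.

    architectures is a list of domain-id lists, one list per node on
    the plot. Rows are sorted by decreasing number of occurrences,
    with ties broken alphabetically by domain id.
    """
    occurrences = Counter()
    architecture_count = Counter()
    for domains in architectures:
        occurrences.update(domains)              # counts every repeat
        architecture_count.update(set(domains))  # counts each node once
    rows = [(d, occurrences[d], architecture_count[d]) for d in occurrences]
    rows.sort(key=lambda row: (-row[1], row[0]))
    return rows


if __name__ == "__main__":
    plot = [["A", "B"], ["A", "A", "C"], ["B"]]
    for row in domain_table(plot):
        print(row)
```

A domain repeated within one architecture raises its occurrence count but adds only one to its architecture count, which is why the two columns can differ.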

Clicking the small black triangles separating the Data panel from the Graph panel allows you to close/open either panel. You can also left-click-drag the border between the two panels to adjust their relative size.

Node details

If you left-click on any architecture node in the Graph panel, the data panel will display the following information about that node:

The first table above shows the Pfam domains making up the selected node's architecture.

The second table lists all the UniProt sequences that have this domain architecture. The table gives the UniProt ref (ie accession number), which is hyperlinked to the appropriate page in UniProt, the UniProt code, a marker indicating whether there are one or more 3D structures of this protein in the PDB, and the protein name.

A full green tick in the PDB column indicates that there is at least one 3D structure in the PDB of the full-length protein. A dashed green tick indicates that, at best, the PDB contains partial structures of the protein only. Clicking on either type of tick will take you to a PDBsum page listing the 3D structures that are available.

Note that, if the node corresponds to a huge number of sequences, only some of these will be listed. For a full list, use the Pfam database. Alternatively, if you are only interested in proteins from a particular species, or only in those having 3D structures, go to the Search Criteria Panel and make the appropriate selections before generating the plot again.

3. Graph criteria panel

The graph criteria panel allows you to alter some of the parameters affecting the display of the graph.

The first thing to do may be to click on the Freeze Graph button to freeze the graph's jiggling as it strives to optimize the relative placement of the nodes.

The Plot options radio buttons allow you to switch on either the UniProt sequences belonging to each Pfam architecture, or the PDB structures.

These will then appear, attached to their parent nodes, as below:

Figure: Examples, from left to right: UniProt sequences, PDB structures, many satellite nodes.

The nodes in the far right example are coloured pink to indicate that there are too many sequences to display, so only a selected number are being shown.

Below the Plot options are two sliders. The top slider allows you to adjust the typical lengths between the nodes on the plot to get a more compact or more distended plot.

The bottom slider is useful if you have a large graph with many architecture nodes. It can be used to prune some of the outer nodes.
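How ArchSchema decides which nodes count as "outer" is not described here. One plausible reading, sketched below as an assumption, is to keep only the nodes within a given number of edges of the parent architecture. The function name prune_outer_nodes, the max_hops parameter and the node labels are hypothetical.

```python
from collections import deque


def prune_outer_nodes(edges, parent, max_hops):
    """Return the set of nodes within max_hops edges of the parent node.

    edges is an iterable of (node, node) pairs describing an undirected
    graph. A breadth-first search from the parent records each node's
    hop distance and stops expanding once max_hops is reached.
    """
    neighbours = {}
    for a, b in edges:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)

    distance = {parent: 0}
    queue = deque([parent])
    while queue:
        node = queue.popleft()
        if distance[node] >= max_hops:
            continue  # do not expand beyond the hop limit
        for nxt in neighbours.get(node, ()):
            if nxt not in distance:
                distance[nxt] = distance[node] + 1
                queue.append(nxt)
    return set(distance)


if __name__ == "__main__":
    graph = [("P", "A"), ("A", "B"), ("B", "C"), ("P", "D")]
    print(sorted(prune_outer_nodes(graph, "P", 1)))
```

Lowering max_hops in this sketch plays the role of moving the slider towards a smaller, less cluttered plot.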

Clicking the small black triangles separating the Graph Criteria Panel from the Graph Panel allows you to close/open either panel.

The Search tab at the top of this panel will replace it by the Search Criteria Panel, described next.

Search Criteria Panel

The search criteria panel allows you to either initiate a new search, or to filter the sequences returned by the current one. The latter may be particularly useful if the initial search returns a huge number of sequences.

The large text box lists all the domains in the current sequence. It shows their Pfam identifiers, names and number of sequences. Initially, all are selected. Click on any domain to select it, and shift-click to add others to the selection.

The radio buttons above this list allow you to specify whether the architectures to be returned should contain all the selected domains (ie the AND option) or one or more of the selected domains (ie the OR option).

Note that, if you select all the domains and use the AND operator you will, of course, only get a single node on the plot: namely, that of the parent architecture.
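The AND and OR options correspond to two simple set tests. The sketch below is a minimal illustration using hypothetical architecture names and domain labels; it treats AND as "the architecture contains every selected domain", which is an assumption about how the option is applied.

```python
def select_architectures(architectures, selected, mode="OR"):
    """Return the names of architectures matching the selected domains.

    architectures maps an architecture name to its list of domain ids.
    mode "AND" keeps architectures containing every selected domain;
    mode "OR" keeps those containing at least one of them.
    """
    wanted = set(selected)
    if mode not in ("AND", "OR"):
        raise ValueError("mode must be 'AND' or 'OR'")

    matches = []
    for name, domains in architectures.items():
        present = set(domains)
        if mode == "AND" and wanted <= present:
            matches.append(name)
        elif mode == "OR" and wanted & present:
            matches.append(name)
    return matches


if __name__ == "__main__":
    plot = {
        "parent": ["A", "B"],
        "arch1": ["A"],
        "arch2": ["B", "C"],
        "arch3": ["A", "C"],
    }
    print(select_architectures(plot, ["A", "B"], mode="AND"))
    print(select_architectures(plot, ["A", "B"], mode="OR"))
```

In this example data, selecting both of the parent's domains with AND leaves only the parent architecture, whereas OR returns every architecture sharing at least one domain with it.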

The Organism button lists all the species for which proteins containing any of the above domains were found in the initial search. Each shows the species and number of protein sequences. Select which species to filter the sequences by.

The Sequences button allows you to limit the search to just those proteins for which there is a complete or partial 3D structure in the PDB.

Once you have made your selections, click on the Plot Graph button to retrieve the data and plot the graph.

Note that, if your selection criteria happen to exclude the parent sequence, it will still be returned by the search for reference purposes.
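The combined effect of the Organism and Sequences filters, including the rule that the parent sequence is always kept, can be sketched as follows. The record type, field names and example values are all hypothetical and are not taken from ArchSchema itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SequenceRecord:
    accession: str
    species: str
    has_structure: bool  # True for a complete or partial 3D structure


def filter_sequences(records, parent_accession, species=None,
                     structures_only=False):
    """Apply the species and structure filters, always keeping the parent.

    species is an optional collection of species names to keep; None
    means no species filter. structures_only keeps only sequences with
    a complete or partial structure.
    """
    kept = []
    for record in records:
        if record.accession == parent_accession:
            kept.append(record)  # parent is retained for reference
            continue
        if species is not None and record.species not in species:
            continue
        if structures_only and not record.has_structure:
            continue
        kept.append(record)
    return kept


if __name__ == "__main__":
    data = [
        SequenceRecord("PARENT", "virus", False),
        SequenceRecord("SEQ1", "human", True),
        SequenceRecord("SEQ2", "human", False),
        SequenceRecord("SEQ3", "mouse", True),
    ]
    for rec in filter_sequences(data, "PARENT", species={"human"},
                                structures_only=True):
        print(rec.accession)
```

Here the parent passes neither filter, yet it still appears in the result alongside the one sequence that satisfies both.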