
Tempura webservice documentation

A test version of a Perl client for programmatic access to Tempura has been developed and is currently being tested. Once it reaches a beta-release state it will be made available for download from the Tempura homepage. This documentation reflects the current state of the service and is therefore subject to change.

The Tempura server has been selected for WebService development as part of [project name], and to act as a prototype for the development of WebService access to the ProFunc server. Web Services are integrative technologies; to ensure that software from various sources works well together, they are built on open standards and architectural styles such as:
  • Simple Object Access Protocol (SOAP), a messaging protocol for transporting information.
  • Web Services Description Language (WSDL), a standard method of describing Web Services and their capabilities.
  • Representational state transfer (REST), a software architecture style.
The choice of webservice format depends on the nature of the application and how it interacts with the service, and each format has its own advantages and disadvantages. In REST each unique URL represents an object, and these objects are manipulated using HTTP requests (GET, PUT, DELETE or POST). The main advantages of REST web services are:
  • Lightweight
  • Flexible
  • Uses common protocol (HTTP)

In contrast, SOAP-based web services have:
  • Greater overhead than REST
  • A restriction to the exchange of XML documents
  • A requirement for additional software (a SOAP stack)
  • A controlled interface (WSDL) which forms a contract
  • Strict types and type mappings

For the Tempura WebService it was decided that, given the existing service, a REST approach would be more appropriate. The next major consideration was whether to run Tempura as a synchronous or an asynchronous WebService. Synchronous services are equivalent to a user running a command on a console or terminal and waiting for it to complete. However, this requires the client to stay connected to the server for the whole run and is only really suitable for database searches that can be executed in up to 5 minutes.
Asynchronous services are those where the user submits a job and receives a job identifier in return (the same as running a UNIX command in the background and obtaining a job id). The user can later query the status with the job identifier and retrieve any results. This submission mode is recommended when submitting batch jobs or large database searches. One advantage of this mode is that it is resilient to system or network failures, because the results of jobs are stored for a set time period after the job has completed. As the Tempura service works using a job queuing system, the asynchronous job submission model is the most appropriate.
The final aspect to consider was limiting requests. As a webservice is not restricted by time or number of submissions, a single user could tie up the service indefinitely or crash it with an unreasonable request (e.g. submission of every structure in the PDB). The EBI policy for most of the webservices it offers is that no more than 25 jobs should be submitted at any one time by a single user. In the case of Tempura we have limited any negative effects by preventing jobs from being submitted once the Tempura job queue has reached 30 jobs waiting to run.
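The queue rule described above amounts to a simple admission check. The following is an illustrative sketch only, not the server's actual code: the names `MAX_WAITING_JOBS` and `accepts_submission` are hypothetical, and only the limit of 30 waiting jobs comes from the text.

```python
MAX_WAITING_JOBS = 30  # limit quoted in the text; the constant name is hypothetical


def accepts_submission(waiting_jobs: int, limit: int = MAX_WAITING_JOBS) -> bool:
    """Return True if a new job may be queued.

    Submissions are refused once the number of jobs waiting to run
    has reached the limit.
    """
    if waiting_jobs < 0:
        raise ValueError("waiting_jobs cannot be negative")
    return waiting_jobs < limit


# Show the decision on either side of the limit.
for waiting in (0, 29, 30, 45):
    print(waiting, accepts_submission(waiting))
```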

Details of the Tempura WebService

A single script is used for all stages of the process (job submission, status checking and results retrieval) and is documented in the script's help option (--help).
IMPORTANT: The Tempura webservice Perl client requires the LWP::UserAgent Perl module. The script will not run unless this module is installed in a directory on Perl's module search path (@INC, which can be extended with the PERL5LIB environment variable).

There are three request types for the Tempura service:
  1. Submit - provide the service with a protein structure (PDB code or uploaded file) and a list of amino acids of interest for the template generation process.
  2. Status - check the progress of a submitted job
  3. Get Results - retrieve any results (or error reports) for a given job number

Each type of call has a particular expected syntax which is described in detail, with examples, in the help option of the script. The most important element of formatting is the residue list for the template generation process. Here the residues are expected to be listed in the following way:
ResidueType_ResidueNumber_Chain,ResidueType_ResidueNumber_Chain, ...
For example if you were interested in Tyrosine 7 of chain A, Threonine 15 of chain A and Aspartate 88 of chain B then the expected format is:
"TYR_7_A,THR_15_A,ASP_88_B"
Where there is no chain identifier, the letter is left out but the preceding underscore is still required:
"TYR_7_,THR_15_,ASP_88_"
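The residue list format above can be built and checked programmatically before a job is submitted. The sketch below is a minimal illustration and not part of the Tempura client: the `Residue` type and the `format_residues` and `parse_residues` functions are hypothetical names, and the validation assumes three-letter residue types, integer residue numbers and a chain identifier of at most one character.

```python
import re
from typing import List, NamedTuple


class Residue(NamedTuple):
    name: str    # three-letter residue type, e.g. "TYR"
    number: int  # residue number in the structure
    chain: str   # chain identifier, or "" when the structure has none


# One entry is TYPE_NUMBER_CHAIN, where CHAIN may be empty (assumed pattern).
_ENTRY = re.compile(r"^([A-Za-z]{3})_(-?\d+)_([A-Za-z0-9]?)$")


def format_residues(residues: List[Residue]) -> str:
    """Join residues into the comma-separated list expected by --residues."""
    return ",".join(f"{r.name.upper()}_{r.number}_{r.chain}" for r in residues)


def parse_residues(text: str) -> List[Residue]:
    """Split a residue list back into its parts, rejecting malformed entries."""
    residues = []
    for entry in text.split(","):
        match = _ENTRY.match(entry.strip())
        if match is None:
            raise ValueError(f"malformed residue entry: {entry!r}")
        name, number, chain = match.groups()
        residues.append(Residue(name.upper(), int(number), chain))
    return residues


print(format_residues([Residue("TYR", 7, "A"), Residue("ASP", 88, "B")]))
print(parse_residues("TYR_7_,THR_15_,ASP_88_"))
```

Validating the list on the client side catches a missing trailing underscore before a job is queued.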

Submission

At present, the residue list is used to filter the automatically generated list of templates down to those containing at least one of the selected residues. The webservice also has an option to force the template generation process to build a single template from only the selected residues and to use that template for scanning. This option is not yet activated but will be ready for the next version of the Tempura webservice.
On submission of a job using the Tempura WebService, the user gets back a job identifier (e.g. "tempura8496_X-SNIP-X_1150010911_X-SNIP-X_LEU_24_A,GLU_23_A,TRP_38_A_X-SNIP-X_NO") which can be used to check the status of the job and get the results back. The job identifier contains information on the options selected, the location of the results and the residues selected. This job identifier is more complicated than it needs to be and is expected to be fixed in version 2 of the WebService.
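Since the identifier format is expected to change, clients are probably best off treating it as an opaque string. For debugging, the example identifier can be split on the separator that appears in it, as sketched below. The helper name `split_job_id` is hypothetical, and the meaning of each field is not documented here, so the code returns the raw fields without labelling them.

```python
SEPARATOR = "_X-SNIP-X_"  # separator observed in the example identifier


def split_job_id(job_id: str) -> list:
    """Split a job identifier into its separator-delimited fields.

    The interpretation of each field is an assumption left to the reader;
    this function only performs the split.
    """
    return job_id.split(SEPARATOR)


example = (
    "tempura8496_X-SNIP-X_1150010911_X-SNIP-X_"
    "LEU_24_A,GLU_23_A,TRP_38_A_X-SNIP-X_NO"
)
fields = split_job_id(example)
print(fields)
```

The identifier contains commas, so it is safest to quote it when passing it on a shell command line.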

Job Status Checking

The status checking can return one of the following possibilities:
  • DONE - job has finished
  • RUNNING - job is in the queue or has been started
  • NOT_FOUND - job cannot be found, check job number
  • ERROR - the job encountered an error, get results for more detailed error reports
  • BUSY - there are too many jobs in the queue (30 or more waiting to run), resubmit at a later time
If the status is either "DONE" or "ERROR" then the results (or the error log) can be retrieved. For any other status there is nothing to retrieve, because the results do not yet exist or the job identifier is invalid.
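The status values above lend themselves to a simple polling loop on the client side. The following is a sketch under stated assumptions rather than the client's actual logic: `check_status` is a hypothetical callable that returns one of the status strings (for example by running the client script with "--usage status"), and the polling interval and the maximum number of checks are arbitrary illustrative values.

```python
import time
from typing import Callable

RESULT_STATES = {"DONE", "ERROR"}  # results or an error log can be fetched
KNOWN_STATES = RESULT_STATES | {"RUNNING", "NOT_FOUND", "BUSY"}


def results_available(status: str) -> bool:
    """True when it is worth asking the service for results."""
    return status in RESULT_STATES


def wait_for_job(
    check_status: Callable[[], str],
    interval: float = 60.0,
    max_checks: int = 100,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll until the job leaves the RUNNING state and return the final status."""
    for _ in range(max_checks):
        status = check_status().strip()
        if status not in KNOWN_STATES:
            raise ValueError(f"unexpected status: {status!r}")
        if status != "RUNNING":
            return status  # DONE, ERROR, NOT_FOUND or BUSY
        sleep(interval)
    raise TimeoutError(f"job still RUNNING after {max_checks} checks")


# Demonstration with a canned sequence of statuses instead of a real service call.
canned = iter(["RUNNING", "RUNNING", "DONE"])
final = wait_for_job(lambda: next(canned), interval=0, sleep=lambda seconds: None)
print(final, results_available(final))
```

Passing the status check in as a callable keeps the loop independent of how the service is actually contacted.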

Job Results Retrieval

With the "getresults" option, the job identifier is submitted to the WebService. If there are any results to return, the user receives a plain text dump of the top matches from the Tempura server. In early versions of the WebService the results were returned as a URL to the Tempura results pages, but this was deemed to be of limited use when constructing workflows. The current version of the Tempura WebService returns results in the plain text format that the Tempura server's CGI scripts normally use to generate its results pages. As a consequence the output is not intuitive to read, so we have made the details of this format available online at http://www.ebi.ac.uk/thornton-srv/databases/tempura/tempuraWS_format.html.

The results returned can then be parsed by the user and used as input for other webservices as part of a larger workflow. At the moment there is no way to retrieve the structural superpositions generated for each hit, but this is planned for future versions of the WebService.


Help output from Tempura webservice Perl client


The Tempura server generates templates from a submitted structure and uses them to scan the Protein Data Bank (PDB) for local similarities.

For more information on Tempura refer to http://www.ebi.ac.uk/thornton-srv/databases/tempura/tempura_documentation.html

This script can be used for:
  1. Submitting Jobs
  2. Checking status of jobs
  3. Retrieving results of jobs

This service runs in an asynchronous manner and the results are stored for up to 1 month.
  1. Job Submission Usage:

    WStempura_client.pl --usage submit --type existingpdb --email test@ebi.ac.uk --pdbcode 1aab --residues GLY_25_,THR_34_
    OR


    WStempura_client.pl --usage submit --type upload --email test@ebi.ac.uk --infile struct.pdb --residues THR_55_A,TRP_149_B

    Returns: jobid
  2. Job Status Usage:

    WStempura_client.pl --usage status --jobid [JOBID]
    Returns: string indicating the status of the job
    • DONE - job has finished
    • RUNNING - job is running
    • NOT_FOUND - job cannot be found
    • ERROR - the job has encountered an error
    • BUSY - too many webservice jobs scheduled - resubmit at a later date
    Please note: We currently restrict submission of jobs to the webservice to times when the number of jobs in the tempura queue is less than 30.

  3. Get Results Usage:

    WStempura_client.pl --usage getresults --jobid [JOBID]
    Returns: list of top hits as plain text or error report. For details of the results format see http://www.ebi.ac.uk/thornton-srv/databases/tempura/tempuraWS_format.html


[WSTempura_client.pl Option Details]

Option : Format : Values
--usage: str : either "submit" or "status" or "getresults"
--infile [Structure File]: file : protein structure file in PDB format
OR
--pdbcode [CODE]: str id : 4-character PDB identifier code (e.g. "5p21")
--type : str : either "upload" or "existingpdb"
--email : str : email address of user
--residues: list : comma separated list of residues to be used in the template generation process from the submitted/uploaded structure. This must be in the format "Residue name_Residue number_Chain identifier" (e.g. "SER_19_A,THR_55_A,TRP_149_B"). If there is no chain identifier, put nothing after the final underscore (e.g. "GLY_25_,THR_34_,GLY_55_").
--forced : str : either "y" or "n" (default is "n") - forces single template using submitted residues: not recommended as significance of results cannot be relied upon.
-h, --help : : prints this help text
-j, --jobid : str : jobid that was returned
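When the client is driven from a workflow, the command lines shown in the help text can be assembled from the documented options. The sketch below only builds argument lists and does not contact the service. The function names `submit_command` and `job_command` are hypothetical, and whether the script is run directly or through the perl interpreter depends on the local installation.

```python
import shlex
from typing import List, Optional

CLIENT = "WStempura_client.pl"  # script name as used in the help text


def submit_command(
    email: str,
    residues: str,
    pdbcode: Optional[str] = None,
    infile: Optional[str] = None,
) -> List[str]:
    """Build the argument list for a submission (existing PDB code or upload)."""
    if (pdbcode is None) == (infile is None):
        raise ValueError("give exactly one of pdbcode or infile")
    args = [CLIENT, "--usage", "submit"]
    if pdbcode is not None:
        args += ["--type", "existingpdb", "--email", email, "--pdbcode", pdbcode]
    else:
        args += ["--type", "upload", "--email", email, "--infile", infile]
    return args + ["--residues", residues]


def job_command(usage: str, jobid: str) -> List[str]:
    """Build the argument list for a status check or results retrieval."""
    if usage not in ("status", "getresults"):
        raise ValueError("usage must be 'status' or 'getresults'")
    return [CLIENT, "--usage", usage, "--jobid", jobid]


# shlex.join renders an argument list as a safely quoted shell command line.
print(shlex.join(submit_command("test@ebi.ac.uk", "GLY_25_,THR_34_", pdbcode="1aab")))
print(shlex.join(job_command("status", "JOBID")))
```

An argument list built this way can be passed to `subprocess.run(args, capture_output=True, text=True)`, which avoids shell quoting problems with the commas in residue lists and job identifiers.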
