LWP

The standard method for accessing Hypertext Transfer Protocol (HTTP) 1) 2) based services from Perl 3) is libwww-perl (LWP) 4), which is available from CPAN. Since REST Web Services are based on HTTP, LWP can be used to access any REST service.

Installation

The libwww-perl modules are bundled with many Perl distributions and are often installed by default. If the modules are missing, or need to be re-installed or updated, they can be installed from CPAN.
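Before re-installing, it may help to check which libwww-perl modules Perl can already load. The following is a minimal sketch using only core Perl; installed_version is a hypothetical helper name, not part of LWP.

```perl
use strict;
use warnings;

# Return the installed version of a module, 'unknown' if it loads but
# declares no version, or undef if it cannot be loaded at all.
sub installed_version {
    my ($module) = @_;
    (my $file = "$module.pm") =~ s{::}{/}g;    # LWP::UserAgent -> LWP/UserAgent.pm
    return unless eval { require $file; 1 };
    my $version = $module->VERSION;
    return defined $version ? $version : 'unknown';
}

for my $module (qw(LWP LWP::UserAgent LWP::Simple)) {
    my $version = installed_version($module);
    print "$module: ", (defined $version ? $version : 'not installed'), "\n";
}
```

Where a module is reported as not installed, one common route is the cpan client (for example cpan LWP), although the appropriate method depends on the Perl distribution in use.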
For an overview of how to install a module see Installing Perl Modules.

HTTP GET

HTTP GET is the simplest of the HTTP requests and is used to get a document given a URL. To use GET, the URL of the required Web Service resource is needed. Depending on the service this may be a static URL or, more commonly, a URL that has to be constructed from the parameters of the request. The following examples illustrate the process using the dbfetch, WSDbfetch (REST) and SRS services.

dbfetch

The dbfetch service (http://www.ebi.ac.uk/Tools/dbfetch/dbfetch) provides a generic interface to retrieve data entries given an identifier (Id or accession) from a wide range of biological databases available at EMBL-EBI. Two styles of URL can be used to access dbfetch.
The dbfetch documentation (http://www.ebi.ac.uk/Tools/dbfetch/dbfetch) details the valid values for the database name ({DB}), data format ({FORMAT}) and data style ({STYLE}). The identifier list ({IDS}) is a comma-separated list of entry identifiers, which can be Ids, names or accessions. For example, the rat and mouse WAP proteins can be retrieved from UniProtKB using the identifier list WAP_RAT,WAP_MOUSE.
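The two URL styles could be assembled for this example as sketched below. The query style uses the field names (db, id, format, style) that the service accepts. The path layout of the document style ({DB}/{IDS}/{FORMAT} with style as a query parameter) is an assumption that should be checked against the dbfetch documentation. The helper names are hypothetical, and no URL escaping is performed, which is only safe for simple values such as these.

```perl
use strict;
use warnings;

# Base URL of the dbfetch service, as given in the text.
my $base_url = 'http://www.ebi.ac.uk/Tools/dbfetch/dbfetch';

# Document style (assumed layout): parameters as path segments,
# with the style passed as a query parameter.
sub dbfetch_document_url {
    my ($db, $ids, $format, $style) = @_;
    return "$base_url/$db/$ids/$format?style=$style";
}

# Query style: parameters passed as key=value pairs.
sub dbfetch_query_url {
    my ($db, $ids, $format, $style) = @_;
    return "$base_url?db=$db&id=$ids&format=$format&style=$style";
}

# Rat and mouse WAP proteins from UniProtKB, in raw UniProt format.
my @params = ('uniprotkb', 'WAP_RAT,WAP_MOUSE', 'uniprot', 'raw');
print dbfetch_document_url(@params), "\n";
print dbfetch_query_url(@params), "\n";
```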
LWP offers a number of methods for retrieving documents from a URL. The simplest are the functions exported by LWP::Simple, which take a URL such as the document style dbfetch URL.
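As an illustration, the LWP::Simple functions get and getstore might be used as follows. This is a sketch, not the sample project's client: fetch_entries and save_entries are hypothetical wrapper names, and the document style URL layout is an assumption. The service is only contacted when an output file name is given on the command line.

```perl
use strict;
use warnings;
use LWP::Simple qw(get getstore is_success);

# Document style dbfetch URL (path layout assumed; see the dbfetch documentation).
my $url = 'http://www.ebi.ac.uk/Tools/dbfetch/dbfetch/uniprotkb/WAP_RAT,WAP_MOUSE/uniprot?style=raw';

# Fetch the document into a string; get() returns undef on failure.
sub fetch_entries {
    my ($entry_url) = @_;
    my $content = get($entry_url);
    die "Unable to get $entry_url\n" unless defined $content;
    return $content;
}

# Fetch the document straight into a file; getstore() returns the HTTP status code.
sub save_entries {
    my ($entry_url, $file) = @_;
    my $status = getstore($entry_url, $file);
    die "Request failed with HTTP status $status\n" unless is_success($status);
    return $file;
}

# Only contact the service when an output file name is supplied.
if (@ARGV) {
    print fetch_entries($url);
    save_entries($url, $ARGV[0]);
}
```

LWP::Simple reports very little about a failed request, which is one reason to move to LWP::UserAgent for anything beyond a quick fetch.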
The same request can be made using the more powerful LWP::UserAgent.

Exercise 1: RESTful dbfetch

In the sample project a dbfetch client using LWP::UserAgent is provided (examples/REST/LWP/dbfetch_lwp_useragent_get.pl). Starting from this client, use dbfetch to get the EMBL-Bank entries with accessions M28668, M60493 and M76128. See the dbfetch and WSDbfetch REST documentation for details of the valid values for the parameters and the structure of the request URL.

Sample solution: solutions/REST/LWP/q1_dbfetch_lwp.pl

SRS

While dbfetch provides a useful interface for entry retrieval, it is not a general query system. One option for performing queries is SRS (http://srs.ebi.ac.uk/). SRS offers a URL-based interface which can be used to perform complex multi-database queries in a single request. The simplest form of an SRS URL retrieves a list of entry identifiers in DB:ID format, for example to retrieve the entries in UniProtKB which contain the term “azurin”:

http://srs.ebi.ac.uk/srsbin/cgi-bin/wgetz?[uniprot-all:azurin*]

By default only the first 30 entries are returned. To get the number of entries matching the query the “cResult” page can be used:

http://srs.ebi.ac.uk/srsbin/cgi-bin/wgetz?-page+cResult+[uniprot-all:azurin*]

This returns the number of entries matched by the query:

170 entries for [uniprot-all:azurin*]
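The count request could be scripted with LWP::UserAgent along these lines. This is a minimal sketch: parse_entry_count and srs_entry_count are hypothetical helper names, and the pattern assumes the cResult page always reports the count in the form shown above. The service is only contacted when the script is run with --run.

```perl
use strict;
use warnings;
use LWP::UserAgent;

# SRS wgetz endpoint and query, as given in the text.
my $srs_url = 'http://srs.ebi.ac.uk/srsbin/cgi-bin/wgetz';
my $query   = '[uniprot-all:azurin*]';

# Extract the count from a line such as "170 entries for [uniprot-all:azurin*]".
# Returns undef if no count is found.
sub parse_entry_count {
    my ($text) = @_;
    return $text =~ /(\d+)\s+entries\b/ ? $1 : undef;
}

# Request the cResult page and return the number of matching entries.
sub srs_entry_count {
    my ($ua, $query_str) = @_;
    my $response = $ua->get("$srs_url?-page+cResult+$query_str");
    die 'http status: ' . $response->code . ' ' . $response->message . "\n"
        unless $response->is_success;
    return parse_entry_count($response->decoded_content);
}

# Only contact the service when explicitly asked to.
if (@ARGV && $ARGV[0] eq '--run') {
    my $ua    = LWP::UserAgent->new();
    my $count = srs_entry_count($ua, $query);
    print defined $count ? "$count entries\n" : "No count found\n";
}
```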
Given the number of entries found, the results can be retrieved in chunks by using the -bv option to set the first entry to return and the -lv option to set the number of entries to return:

http://srs.ebi.ac.uk/srsbin/cgi-bin/wgetz?-bv+1+-lv+30+[uniprot-all:azurin*]

http://srs.ebi.ac.uk/srsbin/cgi-bin/wgetz?-bv+31+-lv+30+[uniprot-all:azurin*]
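Generating the chunk URLs for a known entry count is simple arithmetic, sketched below in core Perl. The helper name srs_chunk_urls is hypothetical; the total of 170 and the chunk size of 30 are taken from the example above.

```perl
use strict;
use warnings;

# SRS wgetz endpoint, as given in the text.
my $srs_url = 'http://srs.ebi.ac.uk/srsbin/cgi-bin/wgetz';

# Build one URL per chunk: -bv is the first entry of the chunk,
# -lv the number of entries requested.
sub srs_chunk_urls {
    my ($query, $total, $chunk_size) = @_;
    my @urls;
    for (my $begin = 1; $begin <= $total; $begin += $chunk_size) {
        push @urls, "$srs_url?-bv+$begin+-lv+$chunk_size+$query";
    }
    return @urls;
}

my @urls = srs_chunk_urls('[uniprot-all:azurin*]', 170, 30);
print "$_\n" for @urls;
```

For 170 entries in chunks of 30 this yields six URLs, the last of which starts at entry 151 and returns the remaining 20 entries.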
As well as getting the identifiers of the entries matching the query, the complete entry can be obtained using the “SeqSimpleView” view:

http://srs.ebi.ac.uk/srsbin/cgi-bin/wgetz?-view+SeqSimpleView+[uniprot-all:azurin*]

or, to get fasta formatted sequence, the “FastaSeqs” view:

http://srs.ebi.ac.uk/srsbin/cgi-bin/wgetz?-view+FastaSeqs+[uniprot-all:azurin*]

For more information about SRS URLs see the Linking to SRS guide.

Exercise 2: REST and SRS

In the sample project a dbfetch client using LWP::UserAgent is provided (examples/REST/LWP/dbfetch_lwp_useragent_get.pl). Starting from this client, use SRS to find the number of entries in the EMBL Coding Sequences database (EMBLCDS) which contain the gene name CFTR. See the Linking to SRS guide for details of how to construct a URL for SRS and the SRS Query Language Quick Guide for details of how to construct the query string.

Hint: use the SRS web interface (http://srs.ebi.ac.uk/) to perform the query and copy the query string created into the URL you construct.

Sample solution: solutions/REST/LWP/q2_srs_lwp.pl

HTTP POST

While HTTP GET is well suited to retrieving information, the request data is carried in the URL, which restricts the amount of data that can be sent. Since HTTP POST sends the data independently of the URL, POST is used where large amounts of data or complex parameters need to be transferred.

dbfetch

The dbfetch service accepts HTTP POST requests as well as HTTP GET requests. This is useful when using long lists of identifiers.
Unlike HTTP GET, a POST request can only be performed using LWP::UserAgent (examples/REST/LWP/dbfetch_lwp_useragent_post.pl):

    # Load LWP
    use LWP::UserAgent;

    # Parameters for the request
    my $db     = 'uniprotkb';          # Database: UniProtKB
    my $id     = 'WAP_RAT,WAP_MOUSE';  # Entry identifiers
    my $format = 'uniprot';            # Result format
    my $style  = 'raw';                # Result style

    # Create a user agent
    my $ua = LWP::UserAgent->new();

    # URL for service (endpoint)
    my $url = 'http://www.ebi.ac.uk/Tools/dbfetch/dbfetch';

    # Populate POST data fields (key => value pairs)
    my (%post_data) = (
        'db'     => $db,
        'id'     => $id,
        'format' => $format,
        'style'  => $style
    );

    # Perform the request
    my $response = $ua->post($url, \%post_data);

    # Check for HTTP error codes
    die 'http status: ' . $response->code . ' ' . $response->message
        unless ($response->is_success);

    # Output the entry
    print $response->content();

Proxies

In some environments it is necessary to configure an HTTP proxy before a client can connect to external services. LWP supports the configuration of proxies either from environment variables or by setting the proxy explicitly on the user agent.
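Both approaches are available as LWP::UserAgent methods, as in the sketch below. The proxy address proxy.example.org:8080 and the no_proxy domains are placeholders chosen for illustration; real values depend on the local network.

```perl
use strict;
use warnings;
use LWP::UserAgent;

my $ua = LWP::UserAgent->new();

# Option 1: pick up proxy settings from environment variables such as http_proxy.
$ua->env_proxy;

# Option 2: set the proxy explicitly for the chosen URL schemes.
# proxy.example.org:8080 is a placeholder, not a real proxy.
$ua->proxy(['http', 'ftp'], 'http://proxy.example.org:8080/');

# Hosts in these domains are contacted directly, bypassing the proxy.
$ua->no_proxy('localhost', 'example.org');

# Calling proxy() with only a scheme returns the current setting.
print 'HTTP proxy: ', $ua->proxy('http'), "\n";
```

In practice only one of the two options would normally be used. An explicit setting made after env_proxy, as here, replaces whatever was read from the environment for those schemes.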
User-Agent
HTTP clients usually provide information about what they are, allowing services to handle specific clients differently if necessary, and giving service providers some information about how their services are being used. By default LWP sets the HTTP User-Agent header (see RFC2616 section 14.43) to something like libwww-perl/#.###, where #.### is the libwww-perl version. A more specific value can be set on the user agent:

    # $OSNAME is provided by the English module
    use English qw(-no_match_vars);

    # Modify the user-agent to add a more specific prefix (see RFC2616 section 14.43)
    $ua->agent("Example-Client/1.0 ($OSNAME) " . $ua->agent());

Note: while the HTTP specification does not define a limit on the size of HTTP headers, web server implementations often limit the maximum size of an HTTP header to 8KB or 16KB. If the server limit for an HTTP header is exceeded, a “400 Bad Request” will be returned by the server.

Sample Clients

Most REST Web Services at EMBL-EBI have sample clients which provide command-line access to the service and example code. For Perl, most of the clients are based on LWP.
1) RFC1945 - Hypertext Transfer Protocol – HTTP/1.0 - http://www.faqs.org/rfcs/rfc1945.html
2) RFC2616 - Hypertext Transfer Protocol – HTTP/1.1 - http://www.faqs.org/rfcs/rfc2616.html
3) Perl - http://www.perl.com/
4) libwww-perl - http://gitorious.org/libwww-perl