
LWP

The standard way to access Hypertext Transfer Protocol (HTTP) 1) 2) based services from Perl is libwww-perl (LWP) (CPAN). Since REST Web Services are based on HTTP, LWP can be used to access any REST service.

Installation

The libwww-perl modules are bundled with many Perl distributions and are often installed by default. However, if the modules are missing, or need to be re-installed or updated, use one of the following methods:

  1. Using an operating system package manager install/update the appropriate package. For example:
    1. On Debian Linux or Debian based Linux distributions (e.g. Bio-Linux, Linux Mint or Ubuntu), update/install the libwww-perl package:
      sudo apt-get install libwww-perl
    2. On RedHat based Linux distributions (e.g. CentOS, Fedora, Red Hat Enterprise Linux, and Scientific Linux) install/update the perl-libwww-perl package:
      yum install perl-libwww-perl
    3. If using MacPorts on OS X install/update the p5-libwww-perl package.
    4. If using Fink on OS X install/update the package corresponding to your Perl version, for example libwww-pm5162 for Perl 5.16.2.
  2. Using CPAN to install/update Bundle::LWP (CPAN). For example:
    1. Using the cpan client:
      cpan -i Bundle::LWP
    2. Using the CPAN shell:
      $ perl -MCPAN -e shell
      > install Bundle::LWP
  3. Downloading the LWP distribution (see CPAN) and installing manually

For an overview of how to install a module see Installing Perl Modules.
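
To confirm that the installation worked, the modules can be loaded and their versions printed. The following is a minimal sketch using only core Perl; the helper name module_version is hypothetical and not part of LWP.

```perl
use strict;
use warnings;

# Hypothetical helper: return a module's version string,
# or undef if the module cannot be loaded.
sub module_version {
    my ($module) = @_;

    # Convert a module name (LWP::UserAgent) to a file path (LWP/UserAgent.pm)
    (my $file = "$module.pm") =~ s{::}{/}g;
    return undef unless eval { require $file; 1 };

    # VERSION is provided for every package by UNIVERSAL
    my $version = $module->VERSION;
    return defined $version ? $version : 'unknown';
}

# Report the status of the modules used in this tutorial
for my $module (qw(LWP LWP::UserAgent LWP::Simple)) {
    my $version = module_version($module);
    print "$module: ",
        (defined $version ? "version $version" : 'not installed'), "\n";
}
```

Any module reported as not installed can then be installed using one of the methods listed above.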

HTTP GET

HTTP GET is the simplest of the HTTP requests and is used to get a document given a URL. So to use GET, the URL of the required Web Service resource is needed. Depending on the service this may be a static URL or, more commonly, a URL that has to be constructed from the parameters for the request. The following examples illustrate the process using the dbfetch and WSDbfetch (REST) services.

dbfetch

The dbfetch service (http://www.ebi.ac.uk/Tools/dbfetch/dbfetch) provides a generic interface to retrieve data entries given an identifier (Id or accession) from a wide range of biological databases available at EMBL-EBI. Two styles of URL can be used to access dbfetch:

  1. Parametrised URL:
    http://www.ebi.ac.uk/Tools/dbfetch/dbfetch?db={DB}&id={IDS}&format={FORMAT}&style={STYLE}
  2. Document style URL:
    http://www.ebi.ac.uk/Tools/dbfetch/dbfetch/{DB}/{IDS}/{FORMAT}

The dbfetch documentation (http://www.ebi.ac.uk/Tools/dbfetch/dbfetch) details the valid values for the database name ({DB}), data format ({FORMAT}) and data style ({STYLE}). The identifier list ({IDS}) is a comma separated list of entry identifiers. The identifiers can be either Ids, names or accessions. For example to retrieve the rat and mouse WAP proteins from UniProtKB:

  1. Parametrised URL:
    http://www.ebi.ac.uk/Tools/dbfetch/dbfetch?db=uniprotkb&id=WAP_RAT,WAP_MOUSE&format=uniprot&style=raw
  2. Document style URL:
    http://www.ebi.ac.uk/Tools/dbfetch/dbfetch/uniprotkb/WAP_RAT,WAP_MOUSE/uniprot
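
Both URL styles can be assembled from the request parameters. The sketch below uses only core Perl; the helper names parametrised_url and document_url are hypothetical, and it assumes the parameter values contain only characters that are safe to place in a URL (no escaping is performed).

```perl
use strict;
use warnings;

# dbfetch service endpoint
my $BASE_URL = 'http://www.ebi.ac.uk/Tools/dbfetch/dbfetch';

# Hypothetical helper: build a parametrised URL from key => value pairs.
# Parameters which are not defined are left out of the query string.
sub parametrised_url {
    my (%p) = @_;
    my @pairs;
    for my $key (qw(db id format style)) {
        push @pairs, "$key=$p{$key}" if defined $p{$key};
    }
    return $BASE_URL . '?' . join('&', @pairs);
}

# Hypothetical helper: build a document style URL.
# The database and identifiers are required, the format is optional.
sub document_url {
    my (%p) = @_;
    my $url = "$BASE_URL/$p{db}/$p{id}";
    $url .= "/$p{format}" if defined $p{format};
    return $url;
}

# Parameters for the rat and mouse WAP proteins from UniProtKB
my %params = (
    db     => 'uniprotkb',
    id     => join(',', qw(WAP_RAT WAP_MOUSE)),
    format => 'uniprot',
    style  => 'raw',
);

print parametrised_url(%params), "\n";
print document_url(%params), "\n";
```

If the values may contain reserved characters, a module such as URI (installed alongside LWP) can be used to perform the escaping instead of plain string concatenation.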

LWP offers a number of methods for retrieving documents from a URL, for example using the document style URL above:

  1. LWP::Simple::get($url) (examples/REST/LWP/dbfetch_lwp_simple_get.pl)
    # Load LWP
    use LWP::Simple;
     
    # Parameters for the request
    my $db = 'uniprotkb'; # Database: UniProtKB
    my $id = 'WAP_RAT,WAP_MOUSE'; # Entry identifiers
    my $format = 'uniprot'; # Result format
     
    # Construct document style URL for entry
    my $baseUrl = 'http://www.ebi.ac.uk/Tools/dbfetch/dbfetch';
    my $url = "$baseUrl/$db/$id";
    $url .= "/$format" if(defined($format));
     
    # Get the document from the URL
    my $content = get($url);
     
    # Check for fetch failure
    die 'Unable to get document from: ' . $url unless (defined($content));
     
    # Output the entries
    print $content;
  2. LWP::UserAgent->get($url) (examples/REST/LWP/dbfetch_lwp_useragent_get.pl)
    # Load LWP
    use LWP::UserAgent;
     
    # Parameters for the request
    my $db = 'uniprotkb'; # Database: UniProtKB
    my $id = 'WAP_RAT,WAP_MOUSE'; # Entry identifiers
    my $format = 'uniprot'; # Result format
     
    # Create a user agent
    my $ua = LWP::UserAgent->new();
     
    # Construct document style URL for entry
    my $baseUrl = 'http://www.ebi.ac.uk/Tools/dbfetch/dbfetch';
    my $url = "$baseUrl/$db/$id";
    $url .= "/$format" if(defined($format));
     
    # Perform the request
    my $response = $ua->get($url);
     
    # Check for HTTP error codes
    die 'http status: ' . $response->code . ' ' . $response->message unless ($response->is_success); 
     
    # Output the entries
    print $response->content();

Using the more powerful LWP::UserAgent methods is recommended, since these give full access to all the information in the response, including the status code, the headers and the document content. This makes it simpler to deal with errors originating from the Web Service and eases debugging of the client.
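
To illustrate the information available, the sketch below inspects HTTP::Response objects, which is the type returned by the LWP::UserAgent request methods. So that it runs without network access, the responses are built by hand rather than fetched; the helper name summarise_response and the response contents are made up for illustration.

```perl
use strict;
use warnings;
use HTTP::Response;

# Hypothetical helper: summarise a response for logging or debugging.
sub summarise_response {
    my ($response) = @_;
    my $type = $response->header('Content-Type');
    return sprintf('%s [%s] %d bytes',
        $response->status_line,                   # status code and message
        (defined $type ? $type : 'unknown type'), # a response header
        length($response->content));              # the document content
}

# Hand-built responses standing in for the result of $ua->get($url)
my $found = HTTP::Response->new(200, 'OK',
    ['Content-Type' => 'text/plain'], "ID   WAP_RAT\n");
my $missing = HTTP::Response->new(404, 'Not Found');

for my $response ($found, $missing) {
    print summarise_response($response), "\n";
    if ($response->is_success) {
        print "  request succeeded, the content can be used\n";
    }
    else {
        print "  request failed, report status code ", $response->code, "\n";
    }
}
```

In a real client the same methods would be called on the response returned by the user agent.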

Exercise 1: RESTful dbfetch

In the sample project a dbfetch client using LWP::UserAgent is provided (examples/REST/LWP/dbfetch_lwp_useragent_get.pl). Starting from this client use dbfetch to get the EMBL-Bank entries with accessions: M28668, M60493 and M76128.

See the dbfetch and WSDbfetch REST documentation for details of the valid values for the parameters and the structure of the request URL.

Sample solution: solutions/REST/LWP/q1_dbfetch_lwp.pl

HTTP POST

While HTTP GET is well suited to retrieving information, there are practical restrictions on the amount of data that can be sent using GET, since all the parameters have to fit in the URL and servers and clients commonly limit URL length. For transferring large amounts of data or complex parameters an alternative method has to be used. HTTP POST sends the data in the body of the request, independently of the URL, so POST is used where complex or large data needs to be transferred.

dbfetch

The dbfetch service accepts HTTP POST requests as well as HTTP GET requests. This is useful when requesting a long list of identifiers.

LWP::Simple does not provide a function for POST, so unlike HTTP GET, a POST request has to be performed using LWP::UserAgent (examples/REST/LWP/dbfetch_lwp_useragent_post.pl):

# Load LWP
use LWP::UserAgent;
 
# Parameters for the request
my $db = 'uniprotkb'; # Database: UniProtKB
my $id = 'WAP_RAT,WAP_MOUSE'; # Entry identifiers
my $format = 'uniprot'; # Result format
my $style = 'raw'; # Result style
 
# Create a user agent
my $ua = LWP::UserAgent->new();
 
# URL for service (endpoint)
my $url = 'http://www.ebi.ac.uk/Tools/dbfetch/dbfetch';
 
# Populate POST data fields (key => value pairs)
my %post_data = (
    'db'     => $db,
    'id'     => $id,
    'format' => $format,
    'style'  => $style,
);
 
# Perform the request
my $response = $ua->post($url, \%post_data);
 
# Check for HTTP error codes
die 'http status: ' . $response->code . ' ' . $response->message unless ($response->is_success); 
 
# Output the entries
print $response->content();
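
To see how the parameters travel in a POST request, the request can be built and inspected without sending it. The sketch below uses HTTP::Request::Common, which is installed with LWP; holding the identifiers in an array (@ids) is an assumption made here to show how a list can be joined into the comma separated form.

```perl
use strict;
use warnings;
use HTTP::Request::Common qw(POST);

# Entry identifiers held as a list, then joined with commas for dbfetch
my @ids = qw(WAP_RAT WAP_MOUSE);

# URL for service (endpoint)
my $url = 'http://www.ebi.ac.uk/Tools/dbfetch/dbfetch';

# Build the POST request, without sending it
my $request = POST($url, [
    db     => 'uniprotkb',
    id     => join(',', @ids),
    format => 'uniprot',
    style  => 'raw',
]);

# The URL carries no parameters: they are form-encoded in the request body
print 'Method:       ', $request->method, "\n";
print 'URL:          ', $request->uri, "\n";
print 'Content-Type: ', $request->header('Content-Type'), "\n";
print 'Body:         ', $request->content, "\n";
```

A request built this way can be sent with $ua->request($request), which returns the same kind of response object as $ua->post().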

Proxies

In some environments it is necessary to configure an HTTP proxy before a client can connect to external services. LWP supports the configuration of proxies through:

  • Environment variables (see Perl and Proxies): http_proxy, ftp_proxy, no_proxy, etc.
  • Specification when creating the user agent:
    my $ua = LWP::UserAgent->new(
        env_proxy => 1, # Read proxy configuration from environment.
    );
  • User agent methods:
    • Set protocols to use a proxy:
      $ua->proxy(['http', 'ftp'], 'http://proxy.example.org:8000/');
    • Domains for which proxy should not be used:
      $ua->no_proxy('localhost', 'example.org');
    • Load proxy details from environment:
      $ua->env_proxy();
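
When debugging connection problems it can help to check which proxy the user agent will use for each protocol. Calling the proxy method with only a protocol name returns the current setting. The following is a minimal sketch; proxy.example.org is a placeholder host, as in the examples above.

```perl
use strict;
use warnings;
use LWP::UserAgent;

# Create a user agent and configure the proxy explicitly
my $ua = LWP::UserAgent->new();
$ua->proxy(['http', 'ftp'], 'http://proxy.example.org:8000/');
$ua->no_proxy('localhost', 'example.org');

# Report the proxy configured for each protocol
for my $scheme (qw(http ftp https)) {
    my $proxy = $ua->proxy($scheme);
    print "$scheme: ",
        (defined $proxy ? $proxy : 'no proxy configured'), "\n";
}
```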

User-Agent

HTTP clients usually provide information about what they are, allowing services to handle specific clients differently if necessary, and giving service providers some information about how their services are being used. By default LWP sets the HTTP User-Agent header (see RFC2616 section 14.43) to something like libwww-perl/5.831, where the version number (5.831) is the version of LWP. If additional identification of the client is required a more specific product token (see RFC2616 section 3.8) should be added to the beginning of the User-Agent string:

# Modify the user-agent to add a more specific prefix (see RFC2616 section 14.43)
# $^O is the operating system name (also available as $OSNAME after 'use English;')
$ua->agent("Example-Client/1.0 ($^O) " . $ua->agent());

Note: while the HTTP specification does not define a limit on the size of HTTP headers, web server implementations often do limit the maximum size of an HTTP header to 8KB or 16KB. If the server limit for an HTTP header is exceeded a “400 Bad Request” will be returned by the server.
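
The product token can also be set when the user agent is created. In this sketch the agent string ends with a space, which LWP treats as a request to append its default libwww-perl token; Example-Client/1.0 is a placeholder client name.

```perl
use strict;
use warnings;
use LWP::UserAgent;

# The trailing space asks LWP to append its default libwww-perl/#.### token
my $ua = LWP::UserAgent->new(agent => "Example-Client/1.0 ($^O) ");

# Show the User-Agent string that will be sent with each request
print $ua->agent(), "\n";
```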

Sample Clients

Most REST Web Services at EMBL-EBI have sample clients which provide command-line access to the service and example code. For Perl most of the clients are based on LWP, for example:

Service               Sample client
ClustalW2 (REST)      clustalw2_lwp.pl
InterProScan (REST)   iprscan_lwp.pl
NCBI BLAST (REST)     ncbiblast_lwp.pl
WSDbfetch (REST)      dbfetch_lwp.pl


1) RFC1945 - Hypertext Transfer Protocol – HTTP/1.0 - http://www.faqs.org/rfcs/rfc1945.html
2) RFC2616 - Hypertext Transfer Protocol – HTTP/1.1 - http://www.faqs.org/rfcs/rfc2616.html
 
tutorials/06_programming/perl/rest/lwp.txt · Last modified: 2014/03/27 11:29 by hpm