urllib/urllib2

The standard way to access Hypertext Transfer Protocol (HTTP) 1) 2) based services from Python 3) is to use urllib 4) or urllib2 5). Since REST Web Services are based on HTTP, urllib and urllib2 can be used to access any REST service.

Installation

The urllib and urllib2 modules are part of the standard Python distribution and should be installed as part of the Python installation.

HTTP GET

HTTP GET is the simplest of the HTTP requests and is used to get a document given a URL. To use GET, the URL of the required Web Service resource is needed. Depending on the service this may be a static URL or, more commonly, a URL that has to be constructed from the parameters of the request. The following example illustrates the process.

WSDbfetch (REST)

Using the WSDbfetch (REST) service (dbfetch_urllib.py):

```python
import urllib2

# Defaults
dbName = 'uniprotkb'
entryId = 'wap_rat'
format = None

# Construct URL: <base>/<database>/<entry>[/<format>]
baseUrl = 'http://www.ebi.ac.uk/Tools/dbfetch/dbfetch'
url = baseUrl + '/' + dbName + '/' + entryId
if format is not None:
    url += '/' + format

# Get the entry
fh = urllib2.urlopen(url)
result = fh.read()
fh.close()

# Print the entry (the trailing comma suppresses the extra newline)
print result,
```

HTTP POST

While HTTP GET works well for retrieving information, there are restrictions on the amount of data that can be sent using GET. For transferring large amounts of data or complex parameters an alternative method has to be used. Since HTTP POST sends the data in the request body, independently of the URL, POST is used where complex or large data needs to be transferred.
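The difference between the two methods can be sketched without contacting any server. The example below is only an illustration: the service URL and the parameter names are hypothetical, and the import fallback is an assumption added so that the sketch also runs on Python 3, where these functions live in urllib.parse and urllib.request. It encodes one set of parameters and attaches them first to the URL (GET) and then to the request body (POST).

```python
try:
    # Python 2: the modules described on this page
    from urllib import urlencode
    from urllib2 import Request
except ImportError:
    # Python 3: the same functions, relocated
    from urllib.parse import urlencode
    from urllib.request import Request

# Hypothetical service URL and parameters, for illustration only.
base_url = 'http://www.example.org/service'
params = [('db', 'uniprotkb'), ('id', 'wap_rat')]

# Encode the parameters as key=value pairs joined by '&'.
encoded = urlencode(params)

# GET: the parameters travel in the URL and there is no request body.
get_req = Request(base_url + '?' + encoded)

# POST: the URL stays unchanged and the parameters travel in the body.
post_req = Request(base_url, encoded.encode('ascii'))

print('%s %s' % (get_req.get_method(), get_req.get_full_url()))
print('%s %s' % (post_req.get_method(), post_req.get_full_url()))
```

Supplying a data argument is what switches the request from GET to POST; nothing is sent until the request object is passed to urlopen.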
WU-BLAST (REST)

WU-BLAST (REST) requires a POST to submit the search parameters (wublast_urllib.py):

```python
import sys
import urllib
import urllib2

# Base URL for service
baseUrl = 'http://www.ebi.ac.uk/Tools/services/rest/wublast'

# Query sequence
seq = """>Q8E5Q5_STRA3
MKLSKRYRFWQKVIKALGVLALIATLVLVVYLYKLGILNDSNELKDLVHKYEFWGPMIFI
VAQIVQIVFPVIPGGVTTVAGFLIFGPTLGFIYNYIGIIIGSVILFWLVKFYGRKFVLLF
MDQKTFDKYESKLETSGYEKFFIFCMASPISPADIMVMITGLSNMSIKRFVTIIMITKPI
SIIGYSYLWIYGGDILKNFLN"""

# Structure containing parameters
params = {
    'email': 'email@example.org',
    'program': 'blastp',
    'database': 'uniprotkb_swissprot',
    'stype': 'protein',
    'sequence': seq
}

# Submit job
submitUrl = baseUrl + '/run'
postData = urllib.urlencode(params)

# Errors are indicated by HTTP status codes.
try:
    # Passing data to urlopen makes the request a POST.
    fh = urllib2.urlopen(submitUrl, postData)
    jobId = fh.read()
    fh.close()
except urllib2.HTTPError as ex:
    # Trap exception and output the document to get error message.
    print >>sys.stderr, ex.read()
    raise

# Print job identifier
print jobId
```

HTTP Status Messages

Services may use many different methods for reporting errors to the client. One common method is to use HTTP status codes to indicate the error and a custom status message to describe it. Unfortunately urllib2 overrides HTTP status messages and replaces them with standardised messages derived from the HTTP specification. One possible workaround is to use the error document received to report the message:

```python
# submitUrl and postData are defined as in the WU-BLAST example above.
# Errors are indicated by HTTP status codes.
try:
    # Make the request.
    fh = urllib2.urlopen(submitUrl, postData)
    jobId = fh.read()
    fh.close()
except urllib2.HTTPError as ex:
    # Trap exception and output the document to get error message.
    print >>sys.stderr, ex.read()
    # Re-throw exception to get stack trace.
    raise
```

Proxies

Generally urllib and urllib2 will automatically configure the required proxy settings from the system settings.
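To check what was picked up automatically, the detected settings can be inspected with getproxies(). The sketch below also shows one way explicit settings might be supplied through a ProxyHandler. The proxy address is hypothetical, and the import fallback is an assumption added so that the sketch also runs on Python 3.

```python
try:
    # Python 2: the modules described on this page
    from urllib import getproxies
    from urllib2 import ProxyHandler, build_opener
except ImportError:
    # Python 3: the same names, relocated
    from urllib.request import getproxies, ProxyHandler, build_opener

# Mapping of URL scheme to proxy URL, as detected from the system settings.
# An empty mapping means that no proxy was detected.
detected = getproxies()
print(detected)

# Hypothetical proxy address, for illustration only.
explicit = {'http': 'http://proxy.example.org:3128'}

# An opener built with an explicit ProxyHandler uses the given mapping
# instead of the detected one.
handler = ProxyHandler(explicit)
opener = build_opener(handler)
# opener.open(url) would now send http:// requests through that proxy.
```

No request is made by this sketch; only the opener is constructed.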
If required, these system settings can be overridden; see the Python documentation for details.

User-Agent

HTTP clients usually provide information about what they are. This allows services to handle specific clients differently if necessary, and gives service providers information about how their services are being used. By default the HTTP User-Agent header (see RFC2616 section 14.43) is set to something like Python-urllib/ followed by a version number.
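The default value used by a given installation can be read from a freshly built opener, as in the minimal sketch below. No request is sent. The import fallback is an assumption added so that the sketch also runs on Python 3, and the variable names are illustrative.

```python
try:
    # Python 2: the module described on this page
    from urllib2 import build_opener
except ImportError:
    # Python 3: the same function, relocated
    from urllib.request import build_opener

# A new opener carries the default headers that are added to every request.
opener = build_opener()
default_headers = dict(opener.addheaders)

# The header name is stored as 'User-agent'.
default_user_agent = default_headers['User-agent']
print(default_user_agent)
```

The version number in the printed string depends on the Python release in use.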
If additional identification of the client is required, a more specific product token (see RFC2616 section 3.8) should be added to the beginning of the User-Agent string.

Note: while the HTTP specification does not define a limit on the size of HTTP headers, web server implementations often limit the maximum size of an HTTP header to 8KB or 16KB. If the server limit for an HTTP header is exceeded, a “400 Bad Request” will be returned by the server.

urllib

For urllib the user agent can be set for all requests by replacing the URLopener used to create the connections. For example:

```python
# Modify the user-agent to add a more specific prefix (see RFC2616 section 14.43)
import urllib

class AppURLopener(urllib.FancyURLopener):
    # The version attribute is sent as the User-Agent header.
    version = 'Example-Client/1.0 Python-urllib/%s' % urllib.__version__

# Install the customised opener for all subsequent urllib requests.
urllib._urlopener = AppURLopener()
```

urllib2

For urllib2 the user agent has to be specified in a User-Agent header for each request, for example:

```python
# Modify the user-agent to add a more specific prefix (see RFC2616 section 14.43)
import urllib2, sys

user_agent = 'Example-Client/1.0 Python-urllib/%s' % sys.version[:3]
http_headers = { 'User-Agent' : user_agent }
# url is the service URL, constructed as in the earlier examples.
req = urllib2.Request(url, None, http_headers)
# The request object is then passed to urllib2.urlopen(req) in place of the URL.
```

Sample Clients

Most REST Web Services at EMBL-EBI have sample clients which provide command-line access to the service and serve as example code. For Python, some of the clients are based on urllib/urllib2.
Further Reading

1) RFC1945 - Hypertext Transfer Protocol – HTTP/1.0 - http://www.faqs.org/rfcs/rfc1945.html
2) RFC2616 - Hypertext Transfer Protocol – HTTP/1.1 - http://www.faqs.org/rfcs/rfc2616.html
3) Python - http://www.python.org/