
FAQ

UniProt Knowledgebase Release [2012_01]


Is this a sequence retrieval tool?

No. The UniProtJAPI is not a sequence retrieval tool but a data integration framework for Java applications. Several sequence retrieval tools already exist, for instance WSDbfetch, a web service implementation of Dbfetch. These services return data in various formats, including XML, RDF, FASTA and flat file. To process that information, however, you must write parsers that transform the data into suitable data structures. Sometimes this can be done without further knowledge of the UniProt structure; the protein sequence, for instance, can simply be parsed from its corresponding sequence line. Difficulties arise when, for instance, splice variants of a protein sequence are required. In that case both the comment lines and the feature lines must be parsed together, which requires a deeper understanding of the intrinsic UniProt structures. Frequent changes to line types and data in the formats above further complicate parsing UniProt data. Consequently, the end user may end up dealing with maintenance issues instead of concentrating on the data. This is where the UniProtJAPI moves beyond simple sequence retrieval into a data integration role for Java applications.
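To illustrate the "simple" case mentioned above, here is a minimal sketch of a hand-written parser that pulls the sequence out of a flat-file entry. The class name FlatFileSequenceParser and the sample entry are made up for illustration; the sketch assumes only that the sequence block starts with an SQ line and that the entry ends with a // line. It is not part of the UniProtJAPI, and it shows how little this approach covers: splice variants, for example, would need the comment and feature lines as well.

```java
/** Hypothetical helper (not a UniProtJAPI class): reads the sequence from a flat-file entry. */
class FlatFileSequenceParser {

    /** Returns the residues found between the SQ header line and the closing // line. */
    static String parseSequence(String entry) {
        StringBuilder sequence = new StringBuilder();
        boolean inSequence = false;
        for (String line : entry.split("\\r?\\n")) {
            if (line.startsWith("SQ ")) {
                inSequence = true;                 // header line; residues follow
            } else if (line.startsWith("//")) {
                break;                             // end of the entry
            } else if (inSequence) {
                // Sequence lines are indented and split into blocks; drop all whitespace.
                sequence.append(line.replaceAll("\\s+", ""));
            }
        }
        return sequence.toString();
    }

    public static void main(String[] args) {
        // Made-up entry, shortened to the lines this sketch cares about.
        String entry =
              "ID   EXAMPLE_TEST            Reviewed;          15 AA.\n"
            + "SQ   SEQUENCE   15 AA;  0 MW;  0000000000000000 CRC64;\n"
            + "     MKTAYIAKQR QISFV\n"
            + "//\n";
        System.out.println(parseSequence(entry));
    }
}
```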


Is there a diagram of how the UniProtJAPI works?

Below is a simplified diagram showing, at an abstract level, the overall architecture of the UniProtJAPI.


[Diagram: abstract architecture of the UniProtJAPI]

Can I get access to the source code for the UniProtJAPI?

Yes, the UniProtJAPI source code is available on request.


What databases are accessible via the UniProtJAPI?

The UniProt Knowledgebase (UniProtKB) is the central access point for extensive curated protein information, including function, classification, and cross-references. The UniProt Reference Clusters (UniRef) databases combine closely related sequences into a single record to speed up searches. The UniProt Archive (UniParc) is a comprehensive repository that reflects the history of all protein sequences. The UniProt Metagenomic and Environmental Sequences (UniMES) database is a repository developed specifically for metagenomic and environmental data. InterPro is a database of protein families, domains and functional sites in which identifiable features found in known proteins can be applied to unknown protein sequences. The GOA project provides high-quality Gene Ontology (GO) annotations for proteins in the UniProt Knowledgebase (UniProtKB).


Domain Model

Below is a simple domain model showing, at a class level, the relationships between the databases described above.


[Diagram: simple data model]

What technologies are used in the UniProtJAPI?

The UniProtJAPI is based on remote services implemented as plain Java classes and exposed through the Spring HTTP invoker, which combines the simplicity of HTTP communication with Java's built-in object serialization. Communication between the client and the server goes through Apache Commons HttpClient. The Kraken framework provides access to the various databases and to the search and BLAST services.


Class Diagram

The diagram below shows how the HTTP invoker lets us provide interfaces that you can use to call the services remotely.


[Diagram: HTTP invoker class diagram]

What is the Kraken framework?

The Kraken framework supports applications that process protein and protein-related data, including data selection, browsing, data mining and other high-throughput applications. It uses Lucene to provide a full-text and field-based index over all protein data, Oracle's Berkeley DB as a file-system database for entry retrieval and storage, and Spring for configuration management. The advantage of using the UniProtJAPI is that all these technologies are accessed remotely, so your application is a thin client on top of this complex and resource-intensive framework.


How does Spring Remoting work?

The remoting mechanism uses remote service exporters to create the remote service endpoints that client programs call. Service exporters also manage any registries used to look up the remote services. Proxy factory beans create the proxies that the client uses to connect to the remote services. Finally, the HTTP invoker lets the client make remote calls across HTTP, passing Java objects in both directions using Java serialization.
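The idea behind the proxy and the serialized call can be sketched with the Java standard library alone. The example below is not Spring code and does not use HTTP: it assumes a hypothetical GreetingService interface, uses java.lang.reflect.Proxy in place of a proxy factory bean, and replaces the network hop with an in-memory serialization round trip. All class names are invented for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;

/** Hypothetical service interface shared by client and server. */
interface GreetingService {
    String greet(String name);
}

/** Server-side implementation: a plain Java class. */
class GreetingServiceImpl implements GreetingService {
    public String greet(String name) {
        return "Hello, " + name;
    }
}

class RemotingSketch {

    /** Serializes and deserializes a value, standing in for the trip across HTTP. */
    static Object roundTrip(Object value) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(value);
        }
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return in.readObject();
        }
    }

    /** Plays the role of a proxy factory: returns a client-side proxy for the interface. */
    static GreetingService createProxy(final GreetingService target) {
        InvocationHandler handler = new InvocationHandler() {
            public Object invoke(Object proxy, Method method, Object[] args) throws Throwable {
                Object[] received = (Object[]) roundTrip(args);   // serialized "request"
                Object result = method.invoke(target, received);  // dispatch on the "server"
                return roundTrip(result);                         // serialized "response"
            }
        };
        return (GreetingService) Proxy.newProxyInstance(
                GreetingService.class.getClassLoader(),
                new Class<?>[] {GreetingService.class},
                handler);
    }

    public static void main(String[] args) {
        GreetingService service = createProxy(new GreetingServiceImpl());
        System.out.println(service.greet("UniProt"));
    }
}
```

The client code only ever sees the interface, which is why the arguments and return values must be serializable.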

What's the difference between a normal Iterator and an EntryIterator?

The EntryIterator is a standard iterator. In addition, it implements the Iterable interface and has a getResultSize() method, which returns the number of results that are accessible. This method does not cause a request to the server; the value is known as soon as the EntryIterator object is returned from the remote service. The initial calls to the iterator's next() and hasNext() methods fully initialize the iterator, making additional calls to the remote service to fetch the necessary objects.

Why don't you return a List of entries rather than EntryIterators?

The advantage of the EntryIterator is that it uses 'lazy loading', so if your result set contains perhaps 1 million entries, it will be as quick to access as an iterator with 100 entries.
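The lazy-loading behaviour can be illustrated with a small, self-contained sketch. This is not the EntryIterator implementation: PageSource and LazyResultIterator are hypothetical names, and fetching in fixed-size pages is an assumption made for the example. What it shows is an iterator that knows its result size up front and contacts its source only when next() needs data.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

/** Hypothetical stand-in for the remote service: returns one page of results. */
interface PageSource<T> {
    List<T> fetch(int offset, int limit);
}

/** Illustrative lazy iterator; not the UniProtJAPI EntryIterator itself. */
class LazyResultIterator<T> implements Iterator<T>, Iterable<T> {
    private final PageSource<T> source;
    private final int resultSize;   // known up front, no extra request needed
    private final int pageSize;
    private List<T> page = new ArrayList<T>();
    private int indexInPage = 0;
    private int consumed = 0;

    LazyResultIterator(PageSource<T> source, int resultSize, int pageSize) {
        this.source = source;
        this.resultSize = resultSize;
        this.pageSize = pageSize;
    }

    public int getResultSize() {
        return resultSize;
    }

    public boolean hasNext() {
        return consumed < resultSize;
    }

    public T next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        if (indexInPage >= page.size()) {
            // Current page exhausted (or nothing loaded yet): fetch the next one.
            page = source.fetch(consumed, Math.min(pageSize, resultSize - consumed));
            indexInPage = 0;
        }
        consumed++;
        return page.get(indexInPage++);
    }

    public void remove() {
        throw new UnsupportedOperationException();
    }

    public Iterator<T> iterator() {
        return this;
    }
}
```

Creating such an iterator costs the same whether the result set has 100 or 1 million entries, because no entries are loaded until they are requested.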

What is the advantage of using expression languages to access Attributes rather than navigating the object graph myself?

If you wish to access a large number of entries but don't need all the data from each object, this provides a very efficient mechanism for getting your data. It also provides another level of protection against format changes that may occur in the underlying domain model. If, for example, you are only interested in GO terms and use Attributes to access the data, the only time you will need to update the jars is when the GO data changes; all other format changes are handled on the servers.

Why is this not a web service?

Using the HTTP invoker architecture rather than web services limits the scope of this work to Java-based applications, but in return it simplifies access to the data.

How reliable is the system, and is it maintained?

The system is supported and maintained by the UniProt production team at the EBI, with updates released in line with the UniProt updates.

Can I use the service in commercial software?

Please refer to the licenses and disclaimers.

How can I turn on/off logging?

The UniProtJAPI uses log4j for logging. Because the log4j library makes no assumptions about its environment, it has no default appenders. If you use the UniProtJAPI without configuring any appenders, you will get the following message at startup:


log4j:WARN No appenders could be found for logger (org.springframework.util.ClassUtils).
log4j:WARN Please initialize the log4j system properly.
    

To use logging, make sure the log4j jar is on your classpath and configure your own appenders. The standard UniProtJAPI distribution includes the log4j jar and a default log4j.properties file that shows only messages of level WARN and above on the console when the examples are run. To silence logging altogether, set the root logger level to OFF instead of WARN. Below is the log4j.properties file used in this case. For more information on using log4j, please refer to the Apache project (http://logging.apache.org/log4j/).



# Set root logger level to WARN and its only appender to A1.
log4j.rootLogger=WARN, A1

# A1 is set to be a ConsoleAppender.
log4j.appender.A1=org.apache.log4j.ConsoleAppender

# A1 uses PatternLayout.
log4j.appender.A1.layout=org.apache.log4j.PatternLayout
log4j.appender.A1.layout.ConversionPattern=%-4r [%t] %-5p %c %x - %m%n

What are the startup times when you use the API?

Java 5.0 brought improvements in scalability and performance, with a new emphasis on startup time and memory footprint. However, there is still a small startup delay while the JVM initializes. Once your application is running, a number of services are initialized, including the HTTP connection; this is the biggest overhead during startup. After this, accessing the services incurs no additional startup time.

How can I use the UniProtJAPI through a Proxy?

The UniProtJAPI does have a mechanism to support proxies. At class-loading time (i.e. when your application starts), the UniProtJAPI enumeration searches your application's classpath for a file called uniprotjapi.properties. If it finds this file, it loads its contents into the application's context.


The uniprotjapi.properties file allows you to specify some of the startup parameters of the UniProtJAPI. Specifically, the following four key/value pairs let you define a proxy for the UniProtJAPI to pass through:


#PROXY INFO
#Leave username and password empty if no credentials are required
username=aname
password=apw
proxy.host=patience.ebi.ac.uk
proxy.port=8111
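To check how a file like this is read, the following sketch parses the same four keys with java.util.Properties. It is not the UniProtJAPI's own loader: ProxySettings is a hypothetical class, and treating a missing proxy.host as "no proxy" is an assumption made for the example. For a file on the classpath, the same Properties object could be filled from ClassLoader.getResourceAsStream("uniprotjapi.properties") instead of a string.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

/** Hypothetical holder for the proxy settings; not a UniProtJAPI class. */
class ProxySettings {
    final String host;
    final int port;
    final String username;
    final String password;

    private ProxySettings(String host, int port, String username, String password) {
        this.host = host;
        this.port = port;
        this.username = username;
        this.password = password;
    }

    /** Builds the settings from loaded properties, or returns null if no proxy host is set. */
    static ProxySettings from(Properties props) {
        String host = props.getProperty("proxy.host", "").trim();
        if (host.isEmpty()) {
            return null;   // assumption: no host means no proxy
        }
        int port = Integer.parseInt(props.getProperty("proxy.port", "").trim());
        return new ProxySettings(host, port,
                props.getProperty("username", "").trim(),
                props.getProperty("password", "").trim());
    }

    /** Parses properties-file text; lines starting with # are treated as comments. */
    static ProxySettings parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return from(props);
    }

    /** Empty username means the proxy needs no credentials. */
    boolean hasCredentials() {
        return !username.isEmpty();
    }
}
```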
    


