The Ontology Lookup Service (OLS) webservice was developed during work on
the PRIDE project to provide
a single programmatic point of query for multiple ontologies and controlled
vocabularies, with consistent return formats.
The Ontology Lookup Service (OLS) was created to integrate publicly available
biomedical ontologies into a single database. All modified ontologies are updated daily.
A listing of currently available ontologies is available here.
The database can be queried online to obtain information on a single term or to browse a complete
ontology using AJAX (Asynchronous Javascript and XML). Auto-completion provides a user-friendly
search mechanism. An AJAX-based ontology viewer is available to browse a complete ontology
or subsets of it. A programmatic interface is available to query the webservice using SOAP.
The service is described by a WSDL descriptor file available here.
- Components in OLS
The OLS core is responsible for data model object manipulation and database operations
via the OJB object-relation bridge.
The OLS webservice resides in a servlet container and provides a SOAP interface for
querying the database. You can view the WSDL service descriptor here.
This offers platform-independent programmatic access to the webservice API.
Application data loaders will connect to an external CVS repository where the
OBO-formatted ontology flat files are kept, download the latest versions locally and process
the files into data model objects that can be persisted to the database. The loaders will also
use Lucene indexers to create searchable text indexes which will be used to perform full-text
case-insensitive queries. Lucene is used rather than DBMS-specific text searching to obtain
consistent results across the various technological platforms onto which the OLS can be installed.
- Software dependencies
The OLS requires the following external dependencies to compile and run. These files must be obtained from their
individual projects. Some of these are distributed as part of the OLS CVS codebase and are indicated
as such:
Java clients to the webservice will require the following components, all of which are included in the OLS
| Artifact ID | Version |
| ols-client.jar | 1.14 |
| axis.jar | 1.4 |
| commons-discovery.jar | 0.2 |
| commons-logging.jar | 1.0.3 |
| log4j.jar | 1.2.8 |
| jaxrpc.jar | 1.1 |
| saaj.jar | 1.2 |
| wsdl4j.jar | 1.5.1 |
- Obtaining OLS
OLS is available for download from a CVS repository and can be checked out through anonymous (pserver)
CVS with the following commands. When prompted for a password for anonymous, simply press the Enter key.
cvs -d :pserver:anonymous@pride-proteome.cvs.sourceforge.net:/cvsroot/pride-proteome login
cvs -z3 -d :pserver:anonymous@pride-proteome.cvs.sourceforge.net:/cvsroot/pride-proteome co ook
The CVS head is the most up-to-date version. Please be aware that this version is not
guaranteed to be bug-free, as OLS is under active development. The latest stable release is:
OLS 1.14 (CVS Tag: ols_1400)
- Creating the OLS database
The OLS database creation scripts for Oracle and MySQL can be found in the
rdbms/ folder and are named ols-oracle.ddl and ols-mysql.ddl
respectively. If you wish to use another database, it is recommended to simply adapt the existing
scripts to work with your database.
- Database Link Configuration
The OLS uses Apache OJB as a layer
between model objects and the database layer. The database connection descriptor that will be used to
access the database is configured in the conf/repository_database.xml file. The source tree
you have obtained will have some options pre-configured in this file, specifically for Oracle and MySQL.
You will need to update the dbalias, username and password fields
for the appropriate connection, as well as the subprotocol field if using Oracle. If necessary,
you can find more documentation on OJB configuration
here.
<!-- The Oracle Production JDBC Connection for OLS. -->
<jdbc-connection-descriptor
jcd-alias="OLS_ORACLE"
default-connection="false"
platform="Oracle"
driver="oracle.jdbc.driver.OracleDriver"
jdbc-level="2.0"
protocol="jdbc"
subprotocol="oracle:thin:@[YOUR DATABASE HOST:YOUR DATABASE PORT]"
dbalias="[YOUR DB ALIAS]"
username="[YOUR USERNAME]"
password="[YOUR PASSWORD]"
batch-mode="true"
useAutoCommit="0"
ignoreAutoCommitExceptions="false"
>
<object-cache class="org.apache.ojb.broker.cache.ObjectCacheDefaultImpl">
<attribute attribute-name="timeout" attribute-value="10000"/>
<attribute attribute-name="autoSync" attribute-value="true"/>
<attribute attribute-name="cachingKeyType" attribute-value="0"/>
</object-cache>
<!-- to ensure that stale connections are not kept in the pool -->
<!-- equivalent to autoReconnect=true, but db implementation independent -->
<!-- will also run Evictor Thread model to remove stale db connections as task -->
<connection-pool
maxActive="50"
minEvictableIdleTimeMillis="60000"
timeBetweenEvictionRunsMillis="120000"
numTestsPerEvictionRun="10"
testOnBorrow="true"
testOnReturn="false"
testWhileIdle="true"
validationQuery="select 1 from dual"
/>
<sequence-manager className="org.apache.ojb.broker.util.sequence.SequenceManagerHighLowImpl">
<attribute attribute-name="grabSize" attribute-value="500"/>
<attribute attribute-name="autoNaming" attribute-value="false"/>
<attribute attribute-name="globalSequenceId" attribute-value="true"/>
<attribute attribute-name="globalSequenceStart" attribute-value="10000"/>
</sequence-manager>
</jdbc-connection-descriptor>
<!-- The MySQL JDBC Connection for OLS. -->
<jdbc-connection-descriptor
jcd-alias="OLS_mysql"
default-connection="false"
platform="MySQL"
jdbc-level="2.0"
driver="com.mysql.jdbc.Driver"
protocol="jdbc"
subprotocol="mysql"
dbalias="//[YOUR DATABASE HOST]/[YOUR DATABASE NAME]"
username="[YOUR USERNAME]"
password="[YOUR PASSWORD]"
batch-mode="true"
useAutoCommit="0"
ignoreAutoCommitExceptions="false"
>
<object-cache class="org.apache.ojb.broker.cache.ObjectCacheDefaultImpl">
<attribute attribute-name="timeout" attribute-value="10000"/>
<attribute attribute-name="autoSync" attribute-value="true"/>
<attribute attribute-name="cachingKeyType" attribute-value="0"/>
</object-cache>
<!-- to ensure that stale connections are not kept in the pool -->
<!-- equivalent to autoReconnect=true, but db implementation independent -->
<!-- will also run Evictor Thread model to remove stale db connections as task -->
<connection-pool
maxActive="50"
minEvictableIdleTimeMillis="60000"
timeBetweenEvictionRunsMillis="120000"
numTestsPerEvictionRun="10"
testOnBorrow="true"
testOnReturn="false"
testWhileIdle="true"
validationQuery="select 1"
/>
<sequence-manager className="org.apache.ojb.broker.util.sequence.SequenceManagerHighLowImpl">
<attribute attribute-name="grabSize" attribute-value="500"/>
<attribute attribute-name="autoNaming" attribute-value="false"/>
<attribute attribute-name="globalSequenceId" attribute-value="true"/>
<attribute attribute-name="globalSequenceStart" attribute-value="10000"/>
</sequence-manager>
</jdbc-connection-descriptor>
Take note of the jcd-alias for your connection, as the value you
specify there (in the examples above: OLS_ORACLE or OLS_mysql) will be used to
configure your local installation. The OLS will simply use this name to look up all the details you specify
here. The OLS is designed to work in a platform-independent manner as much as possible, and has been tested
on Oracle (8-10) and MySQL (4.1-5). If you decide to use another database, you will need to configure the proper
dependency for the correct JDBC driver in the build.xml and build-ws.xml files.
Both the Oracle and MySQL dependencies are already configured.
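To see how the descriptor fields fit together, it can help to write out the JDBC URL they describe. The sketch below assumes that the URL is formed by joining protocol, subprotocol and dbalias with colons, which is what the two descriptors above suggest. The class name, method name, host names and database names are hypothetical placeholders, not part of OLS or OJB.

```java
/** Hypothetical helper for illustration only; not part of OLS or OJB. */
final class JdbcUrlSketch {

    /** Joins the three descriptor attributes with colons (an assumption, see the text above). */
    static String jdbcUrl(String protocol, String subprotocol, String dbalias) {
        return protocol + ":" + subprotocol + ":" + dbalias;
    }

    public static void main(String[] args) {
        // Placeholder host, port and database names; replace them with your own values.
        System.out.println(jdbcUrl("jdbc", "oracle:thin:@dbhost.example.org:1521", "OLSDB"));
        System.out.println(jdbcUrl("jdbc", "mysql", "//dbhost.example.org/ols"));
    }
}
```

Printing the URL this way is a quick check that the host, port and alias values you entered produce a string your JDBC driver will accept.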
- Configuring OLS
As mentioned above, the OLS requires many external dependencies to compile. The build files are
configured to look for the external jars in a single common library directory. This configuration variable
needs to be set for your local build environment. Please look in the build.properties file and
set the common.lib.dir variable to point to the directory where all the external jars
are located.
The main configuration file for the OLS is located in conf/ols-config.properties.in. The
configuration keys and their meaning are explained in the table below. IMPORTANT NOTE: any changes to
your configurations must be done in the conf/ols-config.properties.in and not the
conf/ols-config.properties, which is auto-generated and will be over-written each time the build
process is run. Certain properties are automatically generated by the system and should not be changed.
| ols.version.number | The current version number of the OLS. Automatically generated, do not modify. |
| ols.version.date | The date when the OLS was built. Automatically generated, do not modify. |
| ols.dbalias | The jcd-alias of the database connection used to load and query ontologies. |
| lucene.index.path | The fully qualified path of a directory where Lucene can read and write the text indexes. It must be writable by the loader processes and readable by the web application. |
| loader.cacherefresh.url | A fully qualified URL that is called after new ontologies are loaded or refreshed, to clear cached data. |
| loader.csv.anonymous | A boolean flag to indicate if anonymous CVS should be used. If true, anonymous CVS is used to update the ontologies. If false, developer CVS over SSH is used. You will need to have such access on the OBO project for this to succeed. |
| loader.csv.username | The username to use for CVS. Use anonymous if using anonymous CVS. |
| loader.csv.password | The password to use for CVS. Leave blank if using anonymous CVS. |
| loader.csv.local.repository | The fully qualified path of a directory that points to the OBO CVS repository where the ontologies are located. It must be writable by the loader processes. |
| dot.executable | The fully qualified path to the "dot" executable. This is required to create the graphical term hierarchy diagrams. Dot is available from AT&T at http://www.research.att.com/sw/tools/graphviz/ |
| mesh.raw.file | A path to the ASCII MeSH dump (d2006.bin) file available from the National Library of Medicine. This file is required in the post-processing of MeSH synonyms. This path must be readable by the loader process. |
| mesh.dump.file | A path to a file that will be used to store a serialized data structure of the raw MeSH file. This will speed up the post-processing of the MeSH ontology if it needs to be reloaded (as the ASCII file is only released once a year). This path must be writable by the loader process. |
| log4j.* | All these entries reflect Log4J configurations. More information on configuring Log4J can be found here. |
| log4j.appender.UKFile.File | This is the only Log4J entry that requires special attention. It must point to a file that is writable by the loaders and by the web application for logging and debugging. |
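Before running the build, it can be useful to confirm that the hand-edited keys have values. The following is a minimal sketch of such a check using only java.util.Properties. The class is hypothetical and not part of the OLS codebase, and the choice of which keys to treat as mandatory is an assumption based on the table above.

```java
import java.io.IOException;
import java.io.Reader;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

/** Hypothetical pre-build check; not part of the OLS codebase. */
final class OlsConfigCheck {

    // Assumption: these keys from the table above must be set by hand before building.
    static final String[] KEYS_TO_SET = {
        "ols.dbalias", "lucene.index.path", "loader.csv.local.repository", "dot.executable"
    };

    /** Returns the keys that are absent or blank in the given properties. */
    static List<String> missingKeys(Properties props) {
        List<String> missing = new ArrayList<>();
        for (String key : KEYS_TO_SET) {
            String value = props.getProperty(key);
            if (value == null || value.trim().isEmpty()) {
                missing.add(key);
            }
        }
        return missing;
    }

    public static void main(String[] args) throws IOException {
        // Defaults to the template file named above; pass another path as the first argument.
        String path = args.length > 0 ? args[0] : "conf/ols-config.properties.in";
        Properties props = new Properties();
        try (Reader in = Files.newBufferedReader(Paths.get(path))) {
            props.load(in);
        }
        System.out.println("Keys still to configure: " + missingKeys(props));
    }
}
```

Run it from the root of the source tree so that the default relative path resolves to the template file.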
- Building OLS
Two ant build files are distributed, build.xml and build-ws.xml. The first
can be used to build the core OLS jar. The second can be used to build the webapp WAR file and deploy it to
your Tomcat container.
Once all the required dependencies are installed and the properties file is properly configured, you
can build the core OLS jar by typing the following command: ant jar. You should get the following output:
$ ant jar
Buildfile: build.xml
init:
[mkdir] Created dir: build
[mkdir] Created dir: build\classes
[mkdir] Created dir: dist
compile:
[javac] Compiling 89 source files to build\classes
jar:
[copy] Copying 1 file to conf
[jar] Building jar: dist\ols-1.12.jar
[jar] Building jar: dist\ols-client-1.12.jar
[copy] Copying 1 file to \dev\common.lib
BUILD SUCCESSFUL
Total time: 6 seconds
Once OLS core has been successfully built, you can now proceed to build the web application.
The build-ws.xml file contains multiple tasks to automatically deploy the war file to remote
Tomcat servers. Please note however that these are designed primarily for internal use at the EBI, as our
production environment is configured to use two distinct servers that are load-balanced through a common
externally visible URL. This will most probably not reflect your local setup. We have therefore provided
an ant task to create a WAR file and a deployment descriptor that can be used to manually deploy
the web application.
To build the web application, use the following command: ant -f build-ws.xml war.
Please note that you must have compiled the OLS core with ant jar before running this command.
It should produce the following output:
$ ant -f build-ws.xml war
Buildfile: build-ws.xml
init:
internal-war-init:
internal-java2wsdl:
[axis-java2wsdl] Java2WSDL uk.ac.ebi.ook.web.interfaces.Query
internal-wsdl2java:
[axis-wsdl2java] WSDL2Java wsdl\OntologyQuery.wsdl
[axis-wsdl2java] OntologyQuerySoapBindingImpl.java already exists, WSDL2Java will not overwrite it.
jar:
init:
compile:
[javac] Compiling 5 source files to build\classes
jar:
[copy] Copying 1 file to conf
[jar] Building jar: dist\ols-1.12.jar
[jar] Building jar: dist\ols-client-1.12.jar
[copy] Copying 1 file to dev\common.lib
internal-war:
init:
src:
[zip] Building zip: dist\ols-1.12-src.zip
[war] Building war: dist\ols-20060718-1134-prod.war
internal-package:
[copy] Copying 1 file to webapp\conf
BUILD SUCCESSFUL
Total time: 28 seconds
The internal-war-init target defines the name that will be given to the WAR file
and also defines a service.location property. This is the URL that will be integrated in
the WSDL file as the SOAP service location. It currently defaults to the installation at the EBI. If you
wish to change this to your local installation, you must update the target in the build-ws.xml file.
<target name="internal-war-init">
<property name="war.file" value="${name}-${stamp}-prod.war"/>
<property name="service.location" value="http://www.ebi.ac.uk/${service.url}"/>
</target>
The build produces a correctly timestamped WAR file in the dist directory
as well as a deployment descriptor file (context.xml) that can be used to deploy to Tomcat. You
can use this file with the Tomcat
Manager application.
<?xml version='1.0' encoding='utf-8'?>
<Context docBase="/ebi/www/prod/deploy/ols-20060201-1327-prod.war"
path="/ontology-lookup" debug="1" reloadable="true" crossContext="true">
<Logger className="org.apache.catalina.logger.FileLogger"
debug="9" prefix="ols-ontology-lookup.log." suffix=".txt"
timestamp="true"
/>
<!-- Link to the user database we will get roles from -->
<ResourceLink name="users" global="UserDatabase"
type="org.apache.catalina.UserDatabase"/>
<!-- For the automatic email mechanism to work, this flag needs to be set to true: -->
<Parameter name="send_automatic_email" value="true" override="false"/>
<Parameter name="support_email" value="rcote@ebi.ac.uk" override="false"/>
<Parameter name="from_address" value="ols@ebi.ac.uk" override="false"/>
<Parameter name="mail_server" value="mailserv.ebi.ac.uk" override="false"/>
<Parameter name="email_account_name" value="" override="false"/>
<Parameter name="email_password" value="" override="false"/>
<Parameter name="email_protocol" value="smtp" override="false"/>
</Context>
While most settings will be automatically updated during the build process, several settings will
need to be manually set. The most important one is the docBase parameter, which needs to point
to the full path to the WAR file on the server where it will be deployed in your servlet container. It defaults
to the server.deploy.dir variable of the server.properties file.
The OLS can be configured to send automatic email notices when errors occur on the web application.
If you wish to enable this, set send_automatic_email to true. If automatic emails are enabled, you
will need to configure the appropriate values to the parameters above. Note that the EBI mail server will not
relay emails coming from outside the ebi.ac.uk domain.
- Data Loading
The database model was inspired by the relevant portion of the BioSQL
database schema. Versions of the database schema currently exist for MySQL™ and
Oracle™. Ontology loaders feed the database by parsing OBO-formatted flat files and
creating an object map that is persisted to the database using Apache
ObjectRelationalBridge (OJB). All relevant information is extracted from the OBO
file, including term accessions, names, synonyms, definitions, comments,
relationships with other terms and cross-references with other ontologies and
databases. The OLS does not do any curation on loaded ontologies, meaning that
the data that is in the source flat file is loaded faithfully.
The OBO project maintains all of its ontologies in a CVS repository,
making it easy to keep the database up-to-date. Updated files are obtained on a
daily basis and any modified ontology will be loaded to the database. No loss of
service is experienced during this process as the old version of the ontology is
kept alive until the new one is fully loaded. Once loaded, the new version is set
live and the old one is deleted.
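The "keep the old version alive until the new one is fully loaded" idea can be illustrated in memory. The sketch below is only an analogy, assuming a map of accessions to term names stands in for a loaded ontology. The OLS performs this switch in its database, and the class, method names and sample accession here are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicReference;

/** In-memory analogy of the live/old version switch; not the OLS implementation. */
final class LiveOntologySwap {

    // The version currently served to queries. Readers never see a half-loaded ontology.
    private final AtomicReference<Map<String, String>> live =
            new AtomicReference<Map<String, String>>(new HashMap<String, String>());

    /** Looks up a term name in whichever version is live right now. */
    String termName(String accession) {
        return live.get().get(accession);
    }

    /** Sets a fully loaded version live and returns the old one so it can be deleted. */
    Map<String, String> publish(Map<String, String> fullyLoaded) {
        return live.getAndSet(new HashMap<>(fullyLoaded));
    }

    public static void main(String[] args) {
        LiveOntologySwap ontology = new LiveOntologySwap();
        Map<String, String> version1 = new HashMap<>();
        version1.put("EX:0000001", "example term"); // made-up sample data
        ontology.publish(version1);
        System.out.println(ontology.termName("EX:0000001"));
    }
}
```

The design point is that the new version is built completely off to the side and becomes visible in one step, so queries are answered from the old version until that moment.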
Ontologies need to be checked out of the OBO sourceforge CVS repository
once before the loader process can be properly initiated. To checkout the ontologies,
you need to perform the following commands:
cvs -d :pserver:anonymous@obo.cvs.sourceforge.net:/cvsroot/obo login
cvs -d :pserver:anonymous@obo.cvs.sourceforge.net:/cvsroot/obo checkout obo/ontology
There is no password for anonymous CVS so simply press return if prompted.
This will download all the ontology files to the [CURRENT DIRECTORY]/obo/ontology folder.
Keep note of this path as it will be required to configure the loader process later on.
Once the ontology has been persisted, another process will create an
Apache Lucene text index that will be used later on for case-insensitive full text
queries. Terms are indexed on the preferred term name as well as on any annotated
synonyms. Lucene has several advantages as a text-searching technology platform over
RDBMS-based queries. It is very efficient at indexing and searching, it has a very
powerful search syntax that can be used to limit and refine queries and it is
platform independent, meaning that users do not need to rely on RDBMS-specific
technologies to obtain good performance.
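The indexing idea described above (terms findable by preferred name or synonym, regardless of case) can be sketched without Lucene. The code below is a deliberately simplified plain-Java stand-in and does not use or imitate the Lucene API. The class name and sample data are hypothetical, and it only supports single-word lookups.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Simplified stand-in for a text index; this is not Lucene. */
final class TermTextIndexSketch {

    // Lower-cased word -> accessions of terms whose name or synonyms contain that word.
    private final Map<String, Set<String>> postings = new HashMap<>();

    /** Indexes a term on its preferred name and on every synonym. */
    void add(String accession, String preferredName, List<String> synonyms) {
        index(accession, preferredName);
        for (String synonym : synonyms) {
            index(accession, synonym);
        }
    }

    private void index(String accession, String text) {
        for (String word : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!word.isEmpty()) {
                postings.computeIfAbsent(word, k -> new TreeSet<>()).add(accession);
            }
        }
    }

    /** Case-insensitive single-word lookup. */
    Set<String> search(String word) {
        Set<String> hits = postings.get(word.toLowerCase(Locale.ROOT));
        return hits == null ? Collections.<String>emptySet() : hits;
    }

    public static void main(String[] args) {
        TermTextIndexSketch index = new TermTextIndexSketch();
        index.add("EX:0000001", "Cell Cycle", Arrays.asList("mitotic cycle")); // made-up data
        System.out.println(index.search("CELL"));
    }
}
```

Lucene adds much more on top of this, such as the query syntax mentioned above, which is why the OLS relies on it instead of a hand-written index.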
The easiest way to ensure that the ontologies are up to date is to
create a daily process (using cron on Linux, for example) to run the loader code.
As a sample, here is the crontab file that currently performs this task at the EBI:
HOME=/homes/rcote
SHELL=/usr/local/bin/tcsh
MAILTO=rcote@ebi.ac.uk
ORACLE_BASE=/sw/arch/dbtools/oracle
ORACLE_HOME=/sw/arch/dbtools/oracle/product/9.2.0
TNS_ADMIN=/sw/arch/dbtools/oracle/product/9.2.0/network/admin/
ORACLE_PATH=/sw/arch/dbtools/oracle/product/9.2.0/bin
ORA_NLS32=/sw/arch/dbtools/oracle/product/9.2.0/ocommon/nls/admin/data
#LD_PRELOAD=/homes/oracle/libcwait.so
LD_LIBRARY_PATH=/sw/arch/dbtools/oracle/product/9.2.0/lib
ANT_HOME=/sw/common/share/java/ant
JAVA_HOME=/usr/java/j2sdk1.4.2_07
PATH=$HOME/bin:$JAVA_HOME/bin:$ANT_HOME/bin:$ORACLE_HOME/bin:/usr/local/bin:/bin:/usr/bin:/usr/X11R6/bin:.:/ebi/sp/pro3/bin
CVS_RSH=ssh
0 10 * * * $HOME/loader/runloader.sh
More information on using cron can be found on Wikipedia.
This crontab entry runs every morning at 10:00 and starts a shell
script that runs the loaders, which query the CVS repository, download the latest ontology
files, parse them, load them into the database and finally run the Lucene indexers.
Once done, the output is emailed to the address specified in the MAILTO variable.
The shell script that is run is the following:
#!/bin/tcsh
cd $HOME/loader
(ant loadAll > $HOME/loader/stdout.txt) >& $HOME/loader/stderr.txt
echo "Loader script terminated."
set resultCode=`cat $HOME/loader/resultcode.txt`
#All OK - output the log to see what's been loaded.
if ($resultCode == "0") then
echo "STDOUT:";cat $HOME/loader/stdout.txt
endif
#OK but no ontologies updated
if ($resultCode == "1") then
echo "No ontologies loaded"
endif
#PB/IO ERROR
if ($resultCode == "2") then
echo "An error occurred:"
echo "STDOUT:";cat $HOME/loader/stdout.txt
echo "STDERR:";cat $HOME/loader/stderr.txt
endif
#CVS ERROR
if ($resultCode == "3") then
echo "An error occurred:"
echo "STDOUT:";cat $HOME/loader/stdout.txt
echo "STDERR:";cat $HOME/loader/stderr.txt
endif
The script starts ant and redirects the standard out and standard error to corresponding
text files for logging. The ant task writes its status exit code to a text file that is read by the
shell script and the appropriate output is sent to the MAILTO email address for confirmation.
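If you prefer to process the status file from Java instead of a shell script, the four codes handled by the script above can be modelled as an enum. This is a sketch: the enum and its constant names are hypothetical, and the meanings are taken from the comments in the shell script.

```java
/** Meaning of the codes written to resultcode.txt, as read from the shell script above. */
enum LoaderResult {
    LOADED(0, false),          // all OK, the log shows what has been loaded
    NOTHING_UPDATED(1, false), // OK, but no ontologies were updated
    PB_IO_ERROR(2, true),      // labelled "PB/IO ERROR" in the script
    CVS_ERROR(3, true);        // labelled "CVS ERROR" in the script

    final int code;
    final boolean error;

    LoaderResult(int code, boolean error) {
        this.code = code;
        this.error = error;
    }

    /** Parses the content of resultcode.txt, ignoring surrounding white space. */
    static LoaderResult parse(String fileContent) {
        int value = Integer.parseInt(fileContent.trim());
        for (LoaderResult result : values()) {
            if (result.code == value) {
                return result;
            }
        }
        throw new IllegalArgumentException("Unknown loader result code: " + value);
    }

    public static void main(String[] args) {
        LoaderResult result = parse("2\n");
        System.out.println(result + (result.error ? " (error)" : " (ok)"));
    }
}
```

Codes other than 0 to 3 are rejected here because the script above does not say what they would mean.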
The ant task that is run is the following:
<target name="loadAll" >
<java fork="true" classname="uk.ac.ebi.ook.loader.impl.ConfigurableOBOLoader" failonerror="false" resultproperty="resultcode">
<classpath refid="full.path"/>
<jvmarg line="-Xms256m -Xmx1024m"/>
<!--<arg value="-f"/>
<arg value="MI"/>
<arg value="GO"/>-->
</java>
<echo message="${resultcode}" file="resultcode.txt"/>
</target>
A few notes on the above ant task. A loading process will occasionally fail
(usually when the sourceforge CVS repository is offline or when the OBOEdit codebase is updated).
In that case you can force a refresh of all ontologies or of specific ones. Uncommenting the <arg value="-f"/>
argument forces the reload of ontologies. If no other parameter is supplied, all ontologies are reloaded.
If other parameters are supplied with the -f argument (for example, MI and GO in the above task), only those ontologies
are refreshed.
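The argument convention described above can be summarised in a few lines of Java. This is a hypothetical sketch of the semantics only, not the actual parsing code of ConfigurableOBOLoader. It assumes the ontology labels follow the -f flag, as in the commented-out arguments of the ant target.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

/** Hypothetical illustration of the -f convention; not the OLS loader's own code. */
final class ForcedReloadArgs {

    /**
     * Returns an empty Optional when -f is absent (no forced reload).
     * Otherwise returns the labels that follow -f; an empty list means "all ontologies".
     */
    static Optional<List<String>> forcedReload(String... args) {
        List<String> all = Arrays.asList(args);
        int flag = all.indexOf("-f");
        if (flag < 0) {
            return Optional.empty();
        }
        List<String> labels = new ArrayList<>(all.subList(flag + 1, all.size()));
        return Optional.of(labels);
    }

    public static void main(String[] args) {
        System.out.println(forcedReload());                 // no forced reload
        System.out.println(forcedReload("-f"));             // force all ontologies
        System.out.println(forcedReload("-f", "MI", "GO")); // force only MI and GO
    }
}
```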
- Using the OLS web interface
An interactive front-end was created using Java Server Pages (JSP) in
the Struts Framework. From the OLS homepage, users can search for ontology terms
using an auto-completing form. Users can select a specific ontology or search
across all loaded ontologies. As users type a search term, a query is sent to a
Java Servlet using Asynchronous JavaScript and XML (AJAX) once a search string is
at least 3 characters long (excluding white spaces). A collection of close matches
is sent back to the user and displayed in a drop-down menu.
Queries are done on the preferred term name as well as on any synonyms. If the
exact term is in the list, the user can select it to obtain the preferred term
accession id. Once a term is selected, a further AJAX request will return the
definition for this term as well as any annotations associated with it
(including definitions, comments and known synonyms). If the number of possible
terms matching the search term exceeds a cut-off limit, the user has the
possibility to see the full list by selecting the "… and more" option.
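The two rules in this paragraph (query only once three non-white-space characters have been typed, and truncate long result lists with a final "… and more" entry) can be sketched as follows. The class is hypothetical, and the cut-off value is passed in as a parameter because the text above does not state the actual limit used by the OLS.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch of the auto-completion rules described above. */
final class AutoCompleteSketch {

    static final int MIN_QUERY_LENGTH = 3;           // from the description above
    static final String MORE = "\u2026 and more";    // the final entry shown when truncated

    /** True once the typed text, ignoring white space, has at least three characters. */
    static boolean shouldQuery(String typed) {
        return typed.replaceAll("\\s+", "").length() >= MIN_QUERY_LENGTH;
    }

    /** Returns at most cutoff matches, followed by the MORE entry if matches were dropped. */
    static List<String> suggestions(List<String> matches, int cutoff) {
        if (matches.size() <= cutoff) {
            return new ArrayList<>(matches);
        }
        List<String> shown = new ArrayList<>(matches.subList(0, cutoff));
        shown.add(MORE);
        return shown;
    }

    public static void main(String[] args) {
        System.out.println(shouldQuery("mi"));
        System.out.println(shouldQuery("mito"));
        System.out.println(suggestions(Arrays.asList("a", "b", "c"), 2));
    }
}
```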
Users can also browse ontologies using a dynamically generated tree
structure. Once an ontology is selected, the root terms of that ontology are
displayed in the ontology browser. Clicking on a tree node will send an AJAX
request to a Java Servlet which will return the child terms for this parent
term and update the browser. Selecting a term will display its
definition and any annotations. It will also update the term hierarchy, which is
a generated image that displays all the paths to the root(s) of the ontology
for that given term.
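The term hierarchy image shows every path from the selected term up to the root(s). A minimal sketch of how such paths can be enumerated is given below, assuming the ontology is available as a map from each term to its direct parents and contains no cycles. The class name and the sample term labels are hypothetical, and this is not the code the OLS uses to draw its diagrams.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: enumerate all paths from a term to the ontology root(s). */
final class PathsToRootSketch {

    /** parents maps a term to its direct parents; a term without an entry is a root. */
    static List<List<String>> pathsToRoots(String term, Map<String, List<String>> parents) {
        List<List<String>> result = new ArrayList<>();
        walk(term, parents, new ArrayDeque<String>(), result);
        return result;
    }

    private static void walk(String term, Map<String, List<String>> parents,
                             Deque<String> path, List<List<String>> result) {
        path.addLast(term);
        List<String> up = parents.get(term);
        if (up == null || up.isEmpty()) {
            result.add(new ArrayList<>(path)); // reached a root: record one complete path
        } else {
            for (String parent : up) {
                walk(parent, parents, path, result);
            }
        }
        path.removeLast(); // backtrack before exploring a sibling parent
    }

    public static void main(String[] args) {
        Map<String, List<String>> parents = new HashMap<>();
        parents.put("D", Arrays.asList("B", "C")); // D has two parents
        parents.put("B", Collections.singletonList("A"));
        parents.put("C", Collections.singletonList("A"));
        System.out.println(pathsToRoots("D", parents));
    }
}
```

Because a term can have several parents, the number of paths can be larger than the depth of the term, which is why the hierarchy is drawn as a graph instead of a single chain.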
Relationships between terms are colour-coded to quickly provide an
additional level of information. The three most significant relationships that
comprise close to 98% of the relationships loaded in the OLS ("is a", 72%, "part of",
25% and "develops from", less than 1%) have been highlighted. Though several
ontologies have defined custom relationship types, their usage is limited overall.
To keep the interface simple, these relationships are colour-coded as "others" but
hovering the mouse cursor over these terms will display the relationship type in
the browser.
Users can also browse a subset of the ontology. This can be done by
clicking on the "browse" button from the main page after a term has been selected
from the autocompletion selections or by clicking on the "zoom" button from the
ontology browser. This will re-root the browser on the selected term.
Although it would have been possible to generate a complete,
fully-browsable tree for small ontologies, this would rapidly become cumbersome
and inefficient for large ontologies such as GO, which has in excess of 20,000
terms. Using AJAX methodology, the tree is built up gradually as the user browses
the ontology.
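The incremental approach can be sketched as a tree that fetches the children of a node only when that node is first expanded. In the sketch below a function argument stands in for the AJAX request to the servlet. The class is hypothetical, and remembering already expanded nodes is an assumption of this example, not something the text above states about the OLS browser.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical sketch of a tree that loads child terms on demand. */
final class LazyTreeSketch {

    private final Function<String, List<String>> fetchChildren; // stands in for the AJAX request
    private final Map<String, List<String>> loaded = new HashMap<>();
    private int requests = 0;

    LazyTreeSketch(Function<String, List<String>> fetchChildren) {
        this.fetchChildren = fetchChildren;
    }

    /** Returns the children of a term, fetching them only on the first expansion. */
    List<String> expand(String termId) {
        List<String> children = loaded.get(termId);
        if (children == null) {
            requests++;
            children = fetchChildren.apply(termId);
            loaded.put(termId, children);
        }
        return children;
    }

    /** Number of fetches performed so far. */
    int requestCount() {
        return requests;
    }

    public static void main(String[] args) {
        // Made-up child identifiers derived from the parent identifier.
        LazyTreeSketch tree = new LazyTreeSketch(id -> Arrays.asList(id + ".1", id + ".2"));
        System.out.println(tree.expand("root"));
        System.out.println(tree.expand("root.1"));
        System.out.println("fetches: " + tree.requestCount());
    }
}
```

Only the nodes the user actually opens are ever requested, so the cost of browsing depends on the path taken and not on the size of the ontology.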
- Direct AJAX links
AJAX is a technology that allows complex interactions with a back-end
server that are transparent to the user. In the case of the ontology browser, for
example, selecting a term will update the metadata section of the display without
reloading the whole page.
Application developers can take advantage of this without needing to
install the OLS locally. The XML that is processed by the AJAX libraries can be
accessed directly using specifically-formatted URLs:
- To obtain term metadata: http://www.ebi.ac.uk/ontology-lookup/ajax.view?q=termmetadata&termid=[ID]&ontologyname=[LABEL]
  Where ID is a required term ID (e.g. GO:0007067) and LABEL is the optional short label of an ontology (e.g. GO).
- To obtain term name suggestions: http://www.ebi.ac.uk/ontology-lookup/ajax.view?q=termautocomplete&termname=[NAME]&ontologyname=[LABEL]
  Where NAME is a required term name substring (e.g. mito) and LABEL is the optional short label of an ontology (e.g. GO). If LABEL is left blank,
  each returned term name is prefixed with the label of the ontology in which it was found.
- To obtain a full term name: http://www.ebi.ac.uk/ontology-lookup/ajax.view?q=termname&termid=[ID]&ontologyname=[LABEL]
  Where ID is a required term ID (e.g. GO:0007067) and LABEL is the optional short label of an ontology (e.g. GO).
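The three URL patterns above can be assembled programmatically. The sketch below only builds the URL strings, using the base address and parameter names exactly as listed. The class and method names are hypothetical, and percent-encoding the parameter values is an assumption of this example (it is standard practice for query strings, but the text above shows the values unencoded).

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper that builds the direct AJAX URLs listed above. */
final class OlsAjaxUrls {

    static final String BASE = "http://www.ebi.ac.uk/ontology-lookup/ajax.view";

    static String termMetadata(String termId, String ontologyLabel) {
        return build("termmetadata", "termid", termId, ontologyLabel);
    }

    static String termAutocomplete(String nameSubstring, String ontologyLabel) {
        return build("termautocomplete", "termname", nameSubstring, ontologyLabel);
    }

    static String termName(String termId, String ontologyLabel) {
        return build("termname", "termid", termId, ontologyLabel);
    }

    // A null ontology label is sent as an empty ontologyname parameter ("left blank").
    private static String build(String query, String key, String value, String ontologyLabel) {
        String label = ontologyLabel == null ? "" : ontologyLabel;
        return BASE + "?q=" + query
                + "&" + key + "=" + encode(value)
                + "&ontologyname=" + encode(label);
    }

    private static String encode(String text) {
        return URLEncoder.encode(text, StandardCharsets.UTF_8); // requires Java 10 or later
    }

    public static void main(String[] args) {
        System.out.println(termMetadata("GO:0007067", "GO"));
        System.out.println(termAutocomplete("mito", null));
        System.out.println(termName("GO:0007067", "GO"));
    }
}
```

The resulting strings can be passed to any HTTP client, and the XML that comes back is the same document the AJAX libraries in the browser receive.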
- Writing clients to OLS
In Java, a client class is already available in the OLS client jar. To use
it, simply import the following classes into your code. Note that Map and Iterator
are not required, but many of the methods return a Map.
For users who need to configure a proxy server to connect to external networks,
this can be achieved in two ways: