Why isn't my favourite database available in RDF?

We appreciate that users might wish to see all of EMBL-EBI's databases made available as RDF - the more databases included, the more valuable Linked Data becomes as an integration platform. However, our database teams have limited resources, and must prioritise their development activities accordingly. Transforming EMBL-EBI's large and complex databases into RDF and maintaining the various services is not a trivial undertaking. By creating this platform, we hope to lay the groundwork that may make it easier for more databases to produce RDF in the future, as well as to evaluate the demand. If you would like to see a particular database made available as RDF, please let us know so that we can pass on your feedback.

What is the license for using this data?

The licenses for using this data are specific to each service. You can find the license information by visiting the individual services in question. This information is also available as part of each dataset description, and can therefore be accessed via a SPARQL query or the Linked Data browser. See our provenance page and our documentation on using SPARQL for how to retrieve this information with a query.

What are Linked Data and the Semantic Web?

In its most common usage, Linked Data refers to the use of RDF technologies to make data available on the web. More specifically, it is about linking together multiple distinct datasets using a common set of URIs, such that they can be easily integrated using the same technology. The primary advantage of making data available in this form is that it makes it much easier for programmatic queries to be executed over multiple datasets by a computer. This is because it does not require a human to first create special code to understand the syntax (e.g. file formats) of the data. By breaking down these syntactic barriers, the process of data integration becomes concentrated on the meaning or semantics of the data themselves, and these semantics are made explicit in the data. For this reason, RDF datasets on the web are sometimes collectively referred to as the Semantic Web. Note that RDF technologies provide methods to allow the semantics of data to be expressed. They can help deal with some aspects of semantics (e.g. URIs can make it easier to know when two identifiers from different databases are the same thing), but they do not solve the problem of semantics per se.

What is RDF?

The acronym "RDF" is most often used to mean one of two things:

  • The Resource Description Framework (RDF), a family of web standards maintained by the World Wide Web Consortium (W3C). It can be used as a way to represent, share and interact with data on the web. RDF encompasses a number of different technologies including a data model, syntax schemas, serialisation formats, and a query language (SPARQL). See our introduction to RDF for more details.
  • The RDF/XML file format, which usually uses the .rdf file extension. The term is often used this way in certain contexts, but there are in fact several widely used file formats for representing the RDF data model, each with its own advantages. For example, a Turtle (.ttl) file is still said to contain RDF data, even though it is not an RDF/XML file.
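
To illustrate the difference between the RDF data model and its file formats, here is a minimal sketch in Python that writes the same single statement in two syntaxes, N-Triples and Turtle. The example.org URI, the prefix names and the helper functions are hypothetical, and the serialisers only handle the simple case shown (URI subject and predicate, plain string object).

```python
# One RDF statement held as a (subject, predicate, object) tuple.
# The example.org URI is hypothetical and used for illustration only.
TRIPLE = (
    "http://example.org/city/London",
    "http://www.w3.org/2000/01/rdf-schema#label",
    "London",
)
PREFIXES = {
    "city": "http://example.org/city/",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
}


def quote(text):
    """Escape backslashes and double quotes for a plain string literal."""
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'


def to_ntriples(triple):
    """Write the triple as one N-Triples line (full URIs in angle brackets)."""
    s, p, o = triple
    return f"<{s}> <{p}> {quote(o)} ."


def to_turtle(triple, prefixes):
    """Write the same triple as Turtle, abbreviating URIs where a prefix fits."""
    def shorten(uri):
        for prefix, namespace in prefixes.items():
            local = uri[len(namespace):]
            # Only abbreviate when the remainder is a simple alphanumeric name.
            if uri.startswith(namespace) and local.isalnum():
                return f"{prefix}:{local}"
        return f"<{uri}>"

    s, p, o = triple
    header = "\n".join(f"@prefix {k}: <{v}> ." for k, v in prefixes.items())
    return f"{header}\n\n{shorten(s)} {shorten(p)} {quote(o)} ."


print(to_ntriples(TRIPLE))
print(to_turtle(TRIPLE, PREFIXES))
```

Both outputs describe exactly the same statement; only the syntax differs.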

What is a URI?

A URI, or Uniform Resource Identifier, is a globally unique identifier string that follows a standard syntax. URIs lie at the heart of the Semantic Web, because they are used to identify anything that can be referred to within the RDF data model: subjects and predicates are always URIs, and objects are either URIs or primitives. Sometimes URIs identify real physical objects or people (e.g. "London"), sometimes they represent abstract concepts (e.g. "a city"), or the connection between two things (e.g. "the geographical location").

Because URIs are unique to a specific "thing", a single URI can never refer to more than one thing at the same time. This means that data expressed in two different RDF datasets that contain references to the same URI are talking about the same "thing" and can therefore be integrated programmatically and semantically. For example, a simple identifier like 9606 might mean one of several different "things" depending on the dataset it is contained within: tirofiban hydrochloride (ChEBI) or Homo sapiens (NCBI Taxonomy). This makes it impossible for a computer program to unambiguously identify what 9606 means without explicit instructions. By contrast, any two datasets that include a reference to the same full URI are guaranteed to be talking about the same thing. This subtle but important distinction is crucial in enabling the generic integration of data on the web, because software need no longer be limited to dealing only with the specific data it was designed for by its human creator.
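
The 9606 example can be sketched in a few lines of Python. The two namespaces below are hypothetical placeholders, not the URIs actually used for ChEBI or NCBI Taxonomy; the point is only that qualifying a local identifier with a namespace removes the collision.

```python
# Two toy datasets keyed by bare local identifiers. "9606" appears in both,
# but refers to a different thing in each, as described in the text.
chebi_by_id = {"9606": "tirofiban hydrochloride"}
taxonomy_by_id = {"9606": "Homo sapiens"}

# Naive merge on bare identifiers: one record silently overwrites the other.
naive = {**chebi_by_id, **taxonomy_by_id}

# Hypothetical namespaces, for illustration only.
CHEBI_NS = "http://example.org/chebi/"
TAXONOMY_NS = "http://example.org/taxonomy/"


def qualify(records, namespace):
    """Turn each local identifier into a full URI by prepending a namespace."""
    return {namespace + local_id: label for local_id, label in records.items()}


# Merge on full URIs: both records survive and stay distinguishable.
merged = {**qualify(chebi_by_id, CHEBI_NS), **qualify(taxonomy_by_id, TAXONOMY_NS)}

print(len(naive), len(merged))
```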

A URL, or Uniform Resource Locator, is a specific type of URI that implies a location on the web: most commonly HTTP URLs (URIs that begin with http:). In practice this distinction has largely been lost: most URIs used within RDF are HTTP URLs, and it is considered good practice for these URLs to resolve to something relevant on the web.

What is SPARQL?

SPARQL is a network protocol and query language for RDF. It is partially analogous to SQL (Structured Query Language, used for interacting with relational databases). SPARQL combines web standards such as HTTP with other components of the Resource Description Framework, and defines a syntax for retrieving RDF data (triples). A SPARQL query is a pattern of instructions and limitations for retrieving a certain collection of data. When a SPARQL endpoint receives a SPARQL query, it will search the RDF statements contained within it and return those that satisfy the pattern.

For example, a SPARQL query might represent a request such as:

Give me the names and email addresses of all friends of Tim who are employed by any company in the Cambridge area and have an interest in cycling.

SPARUL is an extension of SPARQL that specifies update/write operations: adding, updating and deleting data.
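
To make the idea of a query as a pattern more concrete, the following toy matcher in Python evaluates a simplified version of the request about Tim's friends against a handful of in-memory triples. It is a sketch of the matching principle only, not of how a real SPARQL endpoint is implemented; the `ex:` names and the data are invented for illustration, and the employer and email conditions are left out for brevity.

```python
def match_pattern(pattern, triple, binding):
    """Try to extend `binding` so that `pattern` matches `triple`.

    Pattern terms starting with "?" are variables. Returns the extended
    binding, or None if the pattern does not match.
    """
    result = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            # Bind the variable, or check it against its existing value.
            if result.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return result


def query(patterns, triples):
    """Return every variable binding that satisfies all patterns at once."""
    bindings = [{}]
    for pattern in patterns:
        bindings = [
            extended
            for binding in bindings
            for triple in triples
            if (extended := match_pattern(pattern, triple, binding)) is not None
        ]
    return bindings


# Invented data: (subject, predicate, object) statements.
TRIPLES = [
    ("ex:tim", "ex:hasFriend", "ex:alice"),
    ("ex:tim", "ex:hasFriend", "ex:bob"),
    ("ex:alice", "ex:interest", "ex:cycling"),
    ("ex:bob", "ex:interest", "ex:chess"),
    ("ex:alice", "ex:name", "Alice"),
    ("ex:bob", "ex:name", "Bob"),
]

# "Names of all friends of Tim who have an interest in cycling."
PATTERNS = [
    ("ex:tim", "ex:hasFriend", "?friend"),
    ("?friend", "ex:interest", "ex:cycling"),
    ("?friend", "ex:name", "?name"),
]

results = query(PATTERNS, TRIPLES)
print(results)
```

Each pattern narrows the set of candidate bindings, so only friends that satisfy every condition remain.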

Technical details

Does the RDF Platform support HTTPS?

Yes, the RDF Platform website does support HTTPS. This includes the SPARQL endpoints and Linked Data Browsers. Clicking around the site should keep you within an HTTPS browsing environment. However, there are some caveats:

  • The UniProt SPARQL endpoint does not currently support HTTPS.
  • The URIs used within the data themselves are http URIs and our URL resolver does not currently support HTTPS. This means that if you follow a link in the Linked Data Browser that takes you to one of these http locations, the connection will be unencrypted. These external links are marked with an icon. It also means you cannot obtain RDF via content negotiation over HTTPS.
We are currently considering how to improve this support, and would be interested in receiving your feedback about any specific requirements.
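
As a rough illustration of content negotiation over plain http, the sketch below builds a request that asks for an RDF representation of a resource through the `Accept` header, using only the Python standard library. The resource URI is a hypothetical placeholder, and which media types the resolver actually offers is an assumption to check against the service.

```python
import urllib.request


def rdf_request(uri, media_type="application/rdf+xml"):
    """Build (but do not send) a GET request asking for RDF for `uri`."""
    if not uri.startswith("http://"):
        # The text above notes that content negotiation is not available over HTTPS.
        raise ValueError("this sketch expects a plain http URI")
    return urllib.request.Request(uri, headers={"Accept": media_type})


# Hypothetical resource URI, for illustration only.
request = rdf_request("http://example.org/resource/123", "text/turtle")
print(request.get_method(), request.full_url, request.get_header("Accept"))

# To send it, pass the request to urllib.request.urlopen(request) and read the
# response body; that call is left out here so the sketch stays offline.
```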

What are the limitations of the SPARQL endpoints?

We provide a public SPARQL endpoint so that you can explore how the data are modelled and run occasional targeted queries. It is not intended for regular heavy use, or as an alternative means of downloading large datasets. We therefore apply some limits in the configuration of the endpoint, which can mean that certain queries will not work. For example:

  • The maximum number of rows that will be returned is 100,000.
  • There is a maximum query execution time of 10 minutes.
For more control over access to the data, we recommend you run your own SPARQL endpoint containing our data, for which we provide instructions.
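
One way to stay within these limits on the client side is to put an explicit LIMIT on every query before sending it. The sketch below does this and encodes the query as a standard SPARQL Protocol GET URL; the endpoint address is a hypothetical placeholder, and the helper assumes the query passed in does not already end with its own LIMIT clause.

```python
import urllib.parse

MAX_ROWS = 100_000      # row cap stated above
MAX_SECONDS = 10 * 60   # execution time cap stated above, usable as a client timeout


def build_query_url(endpoint, query, limit=1000):
    """Append an explicit LIMIT and return a GET URL for the SPARQL endpoint."""
    if not 0 < limit <= MAX_ROWS:
        raise ValueError(f"limit must be between 1 and {MAX_ROWS}")
    bounded = f"{query.rstrip()}\nLIMIT {limit}"
    return endpoint + "?" + urllib.parse.urlencode({"query": bounded})


# Hypothetical endpoint address, for illustration only.
url = build_query_url(
    "https://example.org/sparql",
    "SELECT ?s ?p ?o WHERE { ?s ?p ?o }",
    limit=10,
)
print(url)
```

A small default limit keeps exploratory queries cheap; raise it deliberately when a larger result is really needed.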

Which version of SPARQL is supported?

Our SPARQL endpoint currently supports most of the features of the SPARQL 1.1 Query Protocol (5th January 2012 working draft). You can use BIND and VALUES clauses, property path expressions and RDFS inferencing.
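
As a small illustration of these features, the Python sketch below assembles a query that combines a VALUES clause, a property path (`rdfs:subClassOf+`) and a BIND expression. The class URIs and helper names are hypothetical, and the query has not been run against the platform's data.

```python
def values_clause(variable, uris):
    """Build a SPARQL 1.1 VALUES clause restricting `variable` to the given URIs."""
    joined = " ".join(f"<{uri}>" for uri in uris)
    return f"VALUES ?{variable} {{ {joined} }}"


def ancestor_query(class_uris):
    """Build a query listing all superclasses of the given classes."""
    return "\n".join([
        "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>",
        "SELECT ?cls ?ancestor ?ancestorIri",
        "WHERE {",
        f"  {values_clause('cls', class_uris)}",
        # Property path: one or more rdfs:subClassOf steps.
        "  ?cls rdfs:subClassOf+ ?ancestor .",
        # BIND: expose the ancestor URI as a plain string.
        "  BIND(STR(?ancestor) AS ?ancestorIri)",
        "}",
    ])


# Hypothetical class URIs, for illustration only.
query_text = ancestor_query(["http://example.org/ClassA", "http://example.org/ClassB"])
print(query_text)
```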

What does "production" mean?

EMBL-EBI's data resource teams engage in a large variety of activities, many of which produce useful output for our users. This includes a number of resources or collaborations that may produce data or applications in the area of semantics and Semantic Web technologies. However, not all of these are considered production services. A production service should be:

  • kept up to date with primary databases
  • covered by current funding
  • currently maintained
  • supported by a fault-tolerant systems architecture
The RDF Platform aims for all of its output to become production services, so that users can have greater confidence when deciding to incorporate these services into their own research environment.