URI based identification systems

We have tried to provide objective information on this page. If you believe it contains inaccuracies, omissions or misinterpretations, then please contact us so that we can rectify the situation.

The main purpose of any identification scheme should be to address the following issues:

There are several URI schemes besides the MIRIAM one. The most common or best-known efforts are described below, together with an explanation of some of the technical points these systems entail. An additional point to bear in mind is that there is a difference between the location of a physical resource (URL) and a URI. Both can be used as identifiers, but they have different limitations.

Existing Schemes

Life Science Identifiers

LSIDs are Life Science Identifiers, and were developed by IBM to provide location-independent resource identifiers. For a data provider to associate their records with LSIDs, they are required to register a namespace identifier with the LSID Authority. LSIDs take the form 'URN label:Authority identifier:Namespace Identifier:Object Identifier:Optional Revision Identifier', for example: urn:lsid:ubio.org:namebank:11815.
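Because an LSID is a colon-delimited URN, its parts can be recovered mechanically. The following is a minimal sketch, not a reference implementation of the LSID specification; it assumes that no component contains a colon and that the "urn" and "lsid" labels may appear in any letter case. The names LSID and parse_lsid are illustrative only.

```python
from typing import NamedTuple, Optional


class LSID(NamedTuple):
    """Parts of an LSID: urn:lsid:authority:namespace:object[:revision]."""
    authority: str
    namespace: str
    object_id: str
    revision: Optional[str] = None


def parse_lsid(lsid: str) -> LSID:
    """Split an LSID string into its components.

    Assumptions: no component contains a colon, and the 'urn' and 'lsid'
    labels are matched case-insensitively.
    """
    parts = lsid.split(":")
    if len(parts) not in (5, 6):
        raise ValueError(f"expected 5 or 6 colon-separated parts, got {len(parts)}")
    if parts[0].lower() != "urn" or parts[1].lower() != "lsid":
        raise ValueError("an LSID must start with 'urn:lsid:'")
    if not all(parts[2:]):
        raise ValueError("LSID components must not be empty")
    revision = parts[5] if len(parts) == 6 else None
    return LSID(parts[2], parts[3], parts[4], revision)


print(parse_lsid("urn:lsid:ubio.org:namebank:11815"))
# LSID(authority='ubio.org', namespace='namebank', object_id='11815', revision=None)
```

Note that parsing only recovers the parts; actually retrieving data for an LSID additionally requires the authority's resolution service.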

Current limitations:

Life Science Record Names

LSRNs are Life Science Record Names, and are used to identify records in Life Science databases. This effort comprises a centralised repository of identifiers and schemas, and defines a prefix (e.g. GO: for Gene Ontology) for each registered database. It also provides mappings from LSRNs to the current URLs of the records. The registry itself is based on the cross-referenced resources used in Gene Ontology and similar sources. While an LSRN itself is not a valid URI, it can be used to construct one: LSRNs take the form 'database_abbreviation:record_identifier', for example: PMID:18078503, which can be transformed into http://lsrn.org/PMID:18078503.
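The transformation from an LSRN to a URI is a simple string operation: prefix the record name with the LSRN base URL. The sketch below only illustrates that construction (LSRN itself is deprecated); the percent-encoding of the record identifier and the function name lsrn_to_url are assumptions for illustration.

```python
from urllib.parse import quote

LSRN_BASE = "http://lsrn.org/"


def lsrn_to_url(lsrn: str) -> str:
    """Turn 'database_abbreviation:record_identifier' into an lsrn.org URI."""
    # Split at the first colon only, so 'GO:0008150' gives ('GO', '0008150').
    prefix, sep, record_id = lsrn.partition(":")
    if not sep or not prefix or not record_id:
        raise ValueError(f"not an LSRN: {lsrn!r}")
    # Assumption: the record identifier is percent-encoded; the ':' separator is kept.
    return f"{LSRN_BASE}{prefix}:{quote(record_id, safe='')}"


print(lsrn_to_url("PMID:18078503"))  # http://lsrn.org/PMID:18078503
```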

Note: The information stored by the LSRN repository has been subsumed into the MIRIAM Registry. LSRN is deprecated and will not be developed further; it has been replaced by MIRIAM URIs using the Identifiers.org resolving system.

Current limitations:

Shared Names

The Shared Names initiative aims to assign URIs as names for publicly available biomedical information records. While it is still a developing effort, the initial list of Shared Names will be drawn from Gene Ontology, Enzyme and Pfam database cross-references. The Shared Name URIs are proposed to resolve to a selectable format encoding of a record, for example HTML, RDF or XML. Amongst the issues that remain to be resolved is the form of the URI itself, which may be similar to http://sharename.org/uniprot/Q9ULZ0. The URI would be resolved to a URL through redirection using a set of active servers, probably including the PURL server.

Current limitations:

PURLs

PURLs are Persistent Uniform Resource Locators, and were introduced in 1995 by OCLC to provide a single URL that permanently identifies a resource on the World Wide Web. A PURL works as a simple redirection from an intermediate resolution service, which associates the PURL with a target URL and returns that target to the client. This provides a persistent URI, in the form of a URL, whose resolved location may be modified over time as necessary. The advantage of PURLs lies in their URL form, which means they are directly usable on the Web. An example of a PURL is: http://purl.obofoundry.org/obo/OBI_0000225.
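The redirection idea can be modelled as a table from each PURL to exactly one target, where only the target is ever updated. This is a toy, in-memory sketch rather than how any real PURL server is built; the class name, the target URLs and the use of HTTP status 302 for the redirect are assumptions for illustration.

```python
from typing import Dict, Tuple


class PurlResolver:
    """Toy PURL-style resolver: each persistent URL maps to exactly one target URL."""

    def __init__(self) -> None:
        self._targets: Dict[str, str] = {}

    def set_target(self, purl: str, target: str) -> None:
        """Create a PURL, or repoint an existing one; the PURL itself never changes."""
        self._targets[purl] = target

    def resolve(self, purl: str) -> Tuple[int, str]:
        """Return (HTTP status, Location header) as a redirecting server would."""
        target = self._targets.get(purl)
        if target is None:
            return 404, ""
        return 302, target


resolver = PurlResolver()
purl = "http://purl.obofoundry.org/obo/OBI_0000225"
resolver.set_target(purl, "http://old-host.example.org/OBI_0000225")  # hypothetical target
resolver.set_target(purl, "http://new-host.example.org/OBI_0000225")  # moved; PURL unchanged
print(resolver.resolve(purl))  # (302, 'http://new-host.example.org/OBI_0000225')
```

Clients keep citing the same PURL while the service owner repoints it; the dictionary also makes plain that each PURL has only one target at any time.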

Current limitations:

DOI

DOIs are Digital Object Identifiers, and were first used by applications in 2000. The system was created by the International DOI Foundation (IDF) as a means to provide persistent identifiers for objects on the World Wide Web. Once assigned, a DOI is resolved, as with PURLs, through a simple redirection.
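In practice, that redirection is usually reached by appending the DOI to the public https://doi.org/ proxy. The sketch below only builds such a URL; the optional "doi:" prefix handling, the percent-encoding choice and the DOI 10.1000/xyz123 are assumptions for illustration (the DOI is hypothetical).

```python
from urllib.parse import quote

DOI_PROXY = "https://doi.org/"


def doi_to_url(doi: str) -> str:
    """Build a resolver URL for a DOI of the form '10.<registrant>/<suffix>'."""
    doi = doi.strip()
    if doi.lower().startswith("doi:"):
        doi = doi[len("doi:"):]
    prefix, sep, suffix = doi.partition("/")
    if not sep or not prefix.startswith("10.") or not suffix:
        raise ValueError(f"not a DOI: {doi!r}")
    # Keep '/' as-is; percent-encode characters that are unsafe in a URL path.
    return DOI_PROXY + quote(doi, safe="/")


print(doi_to_url("doi:10.1000/xyz123"))  # https://doi.org/10.1000/xyz123
```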

Current limitations:

MIRIAM URIs

The MIRIAM Registry system was originally created to provide support for the use of MIRIAM URIs, which are part of the MIRIAM Guidelines. Namespace information stored in the MIRIAM Registry can be used to generate URIs, which themselves can be resolved through the Identifiers.org system. This system resolves URIs to an intermediate page listing all possible resolving locations for an identifier. An example of a MIRIAM URI resolved through the Identifiers.org system. Nowadays, these URIs can be, and are being, used outside the context of models.
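Generating a URI from registry information amounts to combining a namespace with a record identifier. The sketch below assumes the URI pattern http://identifiers.org/<namespace>/<identifier> and uses "pubmed" and "go" as example namespace names; the Namespace class and miriam_uri function are hypothetical, not the MIRIAM Registry's own API.

```python
from dataclasses import dataclass
from urllib.parse import quote


@dataclass(frozen=True)
class Namespace:
    """Hypothetical registry entry: a data collection's namespace and URI root."""
    name: str
    uri_root: str = "http://identifiers.org/"  # assumed root for generated URIs


def miriam_uri(ns: Namespace, identifier: str) -> str:
    """Generate a URI of the assumed form <root><namespace>/<identifier>."""
    # Keep ':' so identifiers such as 'GO:0008150' stay readable.
    return f"{ns.uri_root}{ns.name}/{quote(identifier, safe=':')}"


print(miriam_uri(Namespace("pubmed"), "18078503"))  # http://identifiers.org/pubmed/18078503
```

The generated URI names the record, not any particular provider, so the listing of resolving locations can change without the URI changing.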

Current limitations:
Advantages:

Identifier scheme characteristics comparison

Scheme        Uniqueness  Standard compliance  Resolvability  Creation    Curation
DOI           yes         no                   yes            restricted  no
LSID          yes         yes                  no             restricted  no
LSRN          yes         no                   yes            open        no
PURL          no          yes                  yes            open        no
Shared Name   yes         yes                  yes            ?           ?
MIRIAM URI    yes         yes                  yes            open        yes

(Creation: "open" means the procedure is publicly available and accessible at no cost; "restricted" means it is not.)


Uniqueness

Whether the scheme ensures that each entity receives only one identifier, i.e. that two different identifiers (in a given scheme) cannot be assigned to the same entity.

For example, one can find the following (different) PURLs which identify the same thing:

Standard compliance

Represents the adherence of a scheme to a defined, publicly available standard that has some measure of community approval (for example URI, URN, ...).

Resolvability

Represents the capacity for an identifier (employing a particular scheme) to be resolved to physical location(s) on the World Wide Web. Even though many of these solutions were initially designed to simply identify a piece of information, it is now becoming increasingly important to be able to access the data or information about the data.

URL-based technologies can, in theory, handle this kind of access directly. However, solutions like Shared Names and PURLs still require a resolver, which performs the redirection between the persistent URL and the non-persistent one actually used by a specific resource. Solutions based on PURLs (and therefore potentially Shared Names too) also have another, more important, drawback: there can only be a one-to-one mapping between the identifier (a URL in this case) and the physical location where the information can be obtained.

Solutions relying on a resolver can in theory (and, in the case of the MIRIAM Registry, actually do) handle a one-to-many mapping between the identifier (a URI) and the physical locations (URLs) that serve the data. The key difference between these two methodologies lies in whether the identifier for the information is separated from the resource that actually provides that information. Schemes such as PURL lock the two pieces of information together, and hence can only provide a one-to-one mapping. URI schemes such as MIRIAM dissociate the identifier from the physical resource, allowing a more flexible one-to-many mapping.
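The contrast between the two methodologies can be made concrete with two lookup tables. In this sketch, all URLs, the collection name "examplecollection" and the {id} URL-template convention are hypothetical; it only illustrates the shape of the mappings, not any real registry's data model.

```python
from typing import Dict, List
from urllib.parse import quote

# One-to-one (PURL-style): the identifier is itself a URL bound to one target.
PURL_TABLE: Dict[str, str] = {
    "http://purl.example.org/record/42": "http://provider-a.example.org/entry?id=42",
}

# One-to-many (MIRIAM-style): the identifier names the record; the registry keeps
# one URL template per resource that serves the data collection.
RESOURCE_TEMPLATES: Dict[str, List[str]] = {
    "examplecollection": [
        "http://provider-a.example.org/entry?id={id}",
        "http://provider-b.example.org/view/{id}",
    ],
}


def resolve_one_to_one(purl: str) -> str:
    """Exactly one location per identifier."""
    return PURL_TABLE[purl]


def resolve_one_to_many(namespace: str, identifier: str) -> List[str]:
    """Every known location for the identifier, one per registered resource."""
    encoded = quote(identifier, safe=":")
    return [t.format(id=encoded) for t in RESOURCE_TEMPLATES.get(namespace, [])]


print(resolve_one_to_many("examplecollection", "42"))
# ['http://provider-a.example.org/entry?id=42', 'http://provider-b.example.org/view/42']
```

Adding a new provider in the one-to-many case means appending a template to the registry; every existing identifier then resolves to the new location as well, without being reissued.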

All solutions providing resolvability are initially vulnerable to a single point of failure: the resolver itself. However, all of them can overcome this issue by setting up mirrors of their resolvers.
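With mirrors in place, a client can fall back from one resolver to the next, so resolution fails only if every mirror is unavailable. In this sketch, each mirror is modelled as a plain function rather than an HTTP call, and the function names and location URL are hypothetical.

```python
from typing import Callable, Iterable, List, Optional

# A mirror is modelled as a function from identifier to location (None if unknown);
# in practice it would be an HTTP request to a resolver host.
Mirror = Callable[[str], Optional[str]]


def resolve_with_mirrors(identifier: str, mirrors: Iterable[Mirror]) -> str:
    """Ask each resolver mirror in turn and return the first location found."""
    failures: List[Exception] = []
    for mirror in mirrors:
        try:
            location = mirror(identifier)
        except OSError as exc:  # e.g. ConnectionError from an unreachable mirror
            failures.append(exc)
            continue
        if location is not None:
            return location
    raise LookupError(f"no mirror resolved {identifier!r} ({len(failures)} unreachable)")


def primary_resolver(identifier: str) -> Optional[str]:
    raise ConnectionError("primary resolver is down")  # simulated outage


def mirror_resolver(identifier: str) -> Optional[str]:
    return f"http://provider.example.org/{identifier}"  # hypothetical location


print(resolve_with_mirrors("42", [primary_resolver, mirror_resolver]))
# http://provider.example.org/42
```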

Creation

Describes the procedure by which an identifier can be created in a particular scheme. We mainly focus on the openness (whether the procedure is publicly available and accessible) and the cost of this procedure. Its timeliness is not considered.

Curation

Whether there exists a central system or procedure to ensure that the identifiers created adhere to their scheme. This should include various elements, some related to the scheme (such as validity, consistency and coherency), others related to the attached physical locations (such as their existence and maintenance).
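One scheme-related curation check is validating each identifier against a pattern recorded for its namespace. The sketch below assumes that each namespace has a regular expression describing valid identifiers; the patterns and the function invalid_identifiers are illustrative, not taken from any registry. Checks on physical locations (existence, maintenance) are not covered here.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical per-namespace patterns describing what a valid identifier looks like.
ID_PATTERNS: Dict[str, str] = {
    "pubmed": r"^\d+$",
    "go": r"^GO:\d{7}$",
}


def invalid_identifiers(records: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Return the (namespace, identifier) pairs that fail their namespace's pattern.

    Identifiers in an unregistered namespace are also reported as invalid.
    """
    bad = []
    for namespace, identifier in records:
        pattern = ID_PATTERNS.get(namespace)
        if pattern is None or re.fullmatch(pattern, identifier) is None:
            bad.append((namespace, identifier))
    return bad


print(invalid_identifiers([("pubmed", "18078503"), ("go", "0008150")]))
# [('go', '0008150')]
```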

Other characteristics

Coverage

It is important, when comparing such technologies, to consider their coverage: which datasets can be identified. Unfortunately, such a measure is not easy to obtain for most solutions, except for LSRN and the MIRIAM Registry.