URI based identification systems
We have tried to provide objective information on this page. If you believe it contains inaccuracies, omissions or misinterpretations, then please contact us so that we can rectify the situation.
The major purpose of any identification scheme should be to address the following issues:
- persistence: association of an identifier permanently with a piece of information
- uniqueness: association of a single identifier with a piece of information
- unambiguousness: association of an identifier unmistakably with a piece of information
There are several URI schemes besides the MIRIAM one. The most common or well-known efforts are described below, together with an explanation of some technical points that the systems entail. An additional point to bear in mind is that there is a difference between the location of a physical resource (URL) and a URI. Both can be used as identifiers, but they have different limitations.
Life Science Identifiers
LSIDs are Life Science Identifiers, and were developed by IBM to provide location-independent resource identifiers. For a data provider to associate their records with LSIDs, they are required to register a namespace identifier with the LSID Authority. LSIDs take the form 'URN label:Authority identifier:Namespace Identifier:Object Identifier:Optional Revision Identifier', for example:
- To use LSIDs, data providers must register their resources with the LSID Authority.
- LSIDs, as they do not use HTTP URIs, are not directly resolvable.
- Resolving an LSID will only ever generate a single referenced URL.
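The colon-separated form described above can be illustrated with a small parser. This is only a sketch: it assumes the URN label is the literal prefix `urn:lsid`, and the LSID used in the usage note is a made-up value, not a registered identifier.

```python
from typing import NamedTuple, Optional


class LSID(NamedTuple):
    """Components of an LSID, in the order given in the text above."""
    authority: str
    namespace: str
    object_id: str
    revision: Optional[str] = None


def parse_lsid(text: str) -> LSID:
    """Split an LSID string into its colon-separated components."""
    parts = text.split(":")
    # Assumption: the URN label occupies two fields ("urn" and "lsid"),
    # followed by authority, namespace, object and an optional revision.
    if len(parts) not in (5, 6):
        raise ValueError(f"expected 5 or 6 colon-separated fields, got {len(parts)}")
    if parts[0].lower() != "urn" or parts[1].lower() != "lsid":
        raise ValueError("an LSID is expected to start with 'urn:lsid:'")
    authority, namespace, object_id = parts[2:5]
    if not (authority and namespace and object_id):
        raise ValueError("authority, namespace and object identifier must be non-empty")
    revision = parts[5] if len(parts) == 6 else None
    return LSID(authority, namespace, object_id, revision)
```

For instance, `parse_lsid("urn:lsid:example.org:proteins:P12345:2")` yields the authority `example.org`, the namespace `proteins`, the object identifier `P12345` and the revision `2`. Parsing only checks the shape of the identifier; resolving it still requires a separate LSID resolution step.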
Life Science Record Names
LSRNs are Life Science Record Names, and are used to identify records in Life Science databases. This effort comprises a centralised repository of identifiers and schemas, and defines a prefix (e.g. GO: for Gene Ontology) for registered databases. It also provides mappings of LSRNs to current URL records. The registry itself is based on the cross-referenced resources used in Gene Ontology and similar sources. While an LSRN itself is not a valid URI, it can be used to construct one: LSRNs take the form 'database_abbreviation:record_identifier', for example:
PMID:18078503, which can be transformed into http://lsrn.org/PMID:18078503.
Note: The information stored by the LSRN repository has been subsumed into the MIRIAM Registry. LSRN will not be developed further (deprecated), and has been replaced by MIRIAM URIs using the Identifiers.org resolving system.
- Anyone can edit the registry. Some changes (for example, those affecting resolvability) must be specifically approved or rejected by an editor; others are committed directly without any review.
- There are no services provided to interact with the registry (for example to generate LSRNs).
- An LSRN needs to be modified to generate a URI.
- Replaced by MIRIAM URIs.
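The transformation from an LSRN to a URI mentioned above amounts to validating the 'database_abbreviation:record_identifier' form and prepending a base address. The following is a minimal sketch of that step, kept for illustration since LSRN is deprecated; the function names and the validation pattern are our own assumptions, not part of any LSRN service.

```python
import re
from typing import Tuple

# Base address taken from the example in the text above.
LSRN_BASE = "http://lsrn.org/"

# Assumed shape: a prefix without colons or spaces, a colon, then the record id.
LSRN_PATTERN = re.compile(r"^([^:\s]+):(\S+)$")


def split_lsrn(lsrn: str) -> Tuple[str, str]:
    """Return (database_abbreviation, record_identifier) for an LSRN."""
    match = LSRN_PATTERN.match(lsrn)
    if match is None:
        raise ValueError(f"not of the form 'database:record': {lsrn!r}")
    return match.group(1), match.group(2)


def lsrn_to_uri(lsrn: str) -> str:
    """Build the URI form of an LSRN by prepending the base address."""
    split_lsrn(lsrn)  # raises ValueError if the LSRN is malformed
    return LSRN_BASE + lsrn
```

With the example from the text, `lsrn_to_uri("PMID:18078503")` gives `http://lsrn.org/PMID:18078503`.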
Shared Names
The Shared Names initiative aims to assign URIs as names for publicly available biomedical information records. While still a developing effort, the initial list of Shared Names will be drawn from Gene Ontology, Enzyme and Pfam database cross-references. The Shared Name URIs are proposed to resolve to selectable format encodings of a record, for example HTML, RDF or XML. Among the issues that remain to be resolved is the form of the URI itself, which may be similar to http://sharename.org/uniprot/Q9ULZ0. The URI would be resolved to a URL through redirection using a set of active servers, probably including the PURL server.
- Shared Names is currently in a conceptual phase, with no working examples.
- The main entry page and the issues list would appear to suggest that no progress has been made recently.
- This initiative is hard to judge in its present state, due to its immaturity.
Persistent Uniform Resource Locators
PURLs are Persistent Uniform Resource Locators, and were introduced in 1995 by the OCLC to provide a single URL that permanently identifies a resource on the World Wide Web. A PURL functions as a simple redirection through an intermediate resolution service, which associates the PURL with a target URL and returns that target to the client. This provides a persistent URI, in the form of a URL, where the location to which it resolves may be modified over time, as necessary. The advantage of PURLs lies in their URL form, which means they are directly usable on the Web. An example of a PURL is: http://purl.obofoundry.org/obo/OBI_0000225.
- Resolving a PURL will only ever generate a single referenced URL.
- The ability to generate a PURL requires registration with the system.
- Registration must be completed for each PURL resolver.
- Uniqueness is problematic, since there may be different resolvers, registered by different users, that issue PURLs resolving to the same physical URL.
- There is no centralised governance of PURL Resolvers, and so no means to search for existing services which may be reused.
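The redirection behaviour described above can be pictured as a lookup table in which the PURL stays fixed while its target may change. The sketch below is an in-memory illustration only, not the API of any real PURL server; the class, its method names and the target URLs in the usage note are hypothetical.

```python
from typing import Dict


class PurlResolver:
    """Toy resolver: each PURL maps to exactly one target URL."""

    def __init__(self) -> None:
        self._targets: Dict[str, str] = {}

    def register(self, purl: str, target: str) -> None:
        """Create a new PURL; registration is required before use."""
        if purl in self._targets:
            raise ValueError(f"already registered: {purl}")
        self._targets[purl] = target

    def retarget(self, purl: str, new_target: str) -> None:
        """Point an existing PURL at a new location; the PURL is unchanged."""
        if purl not in self._targets:
            raise KeyError(purl)
        self._targets[purl] = new_target

    def resolve(self, purl: str) -> str:
        """Return the single URL to which the client would be redirected."""
        return self._targets[purl]
```

As a usage example, registering `http://purl.obofoundry.org/obo/OBI_0000225` against a hypothetical target such as `http://example.org/obi/v1/OBI_0000225`, and later calling `retarget` with `http://example.org/obi/v2/OBI_0000225`, leaves the PURL itself untouched while `resolve` returns the new location. The table holds one target per PURL, which is the one-to-one limitation noted in the list above.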
Digital Object Identifiers
DOIs are Digital Object Identifiers, and were first used by applications in 2000. The system was created by the International DOI Foundation (IDF) as a means to provide a persistent identifier for objects on the World Wide Web. Once assigned, as with PURLs, a DOI is resolved through a simple redirection.
- To use DOIs, data providers must register with an IDF appointed agency, which requires a fee, though the system overall is 'not-for-profit'.
- Resolving a DOI will only ever generate a single referenced URL.
- DOIs do not reuse data providers' identifiers.
MIRIAM URIs
The MIRIAM Registry system was originally created to provide support for the use of MIRIAM URIs, which are part of the MIRIAM Guidelines. Namespace information stored in the MIRIAM Registry can be used to generate URIs, which themselves can be resolved through the Identifiers.org system. This system resolves URIs to an intermediate page listing all possible resolving locations for an identifier. An example of a MIRIAM URI resolved through the Identifiers.org system. Nowadays, these URIs can be, and are being, used outside the context of models.
- The ability to use a MIRIAM URI requires the existence of the corresponding data collection in the MIRIAM Registry. Requests for new data collections can be made using this form and can be submitted freely by anyone.
- Since MIRIAM URIs use HTTP URLs, they are directly resolvable.
- MIRIAM URIs do not belong to data providers, and do not need to be registered by them.
- Anybody can submit new data collections and therefore contribute to the creation of new MIRIAM URIs. The requirements for new data collections are listed on the website.
- The data provided by the MIRIAM Registry is curated: everything has been checked and can be trusted. Error reports and modification requests for existing entries can also be submitted.
- The MIRIAM Registry has a dedicated team of curators and developers, and is not reliant on editors to approve additional resources.
- MIRIAM Registry is continuously updating its records in order to offer the most up to date information available.
- MIRIAM URIs resolve to an intermediate page, listing all possible resolving locations (or resources), and users of the system can select the most appropriate hyperlink for their needs.
- The resolvability of MIRIAM URIs is monitored daily (automatically). This allows rapid identification of any changes in the way the data can be accessed for any resources, and allows curators to take remediatory action.
- MIRIAM URIs are used and supported by numerous tools and data resources.
- MIRIAM Registry provides programmatic access to its data via SOAP and REST Web Services.
- MIRIAM Registry provides access to its entire dataset via an XML export.
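The automatic resolvability monitoring mentioned in the list above can be pictured as a periodic sweep over every known location of every identifier. The sketch below is not the Registry's actual monitoring code: the function name is hypothetical, and the `probe` callable (which would perform the real network check) is supplied by the caller so that the logic stays independent of any HTTP library.

```python
from typing import Callable, Dict, List


def find_broken_locations(
    locations: Dict[str, List[str]],
    probe: Callable[[str], bool],
) -> Dict[str, List[str]]:
    """Return, per identifier, the locations for which the probe failed.

    `locations` maps an identifier to the URLs that are expected to serve it;
    `probe` returns True when a URL is currently reachable.
    """
    broken: Dict[str, List[str]] = {}
    for identifier, urls in locations.items():
        failed = [url for url in urls if not probe(url)]
        if failed:
            # Only identifiers with at least one failing location are reported.
            broken[identifier] = failed
    return broken
```

A curator-facing report could then be built from the returned dictionary, for example by running the sweep once a day and comparing the result with that of the previous run.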
Identifier scheme characteristics comparison
|Shared Name||yes||yes||yes||? (*)||? (*)|
|MIRIAM URI||yes||yes||yes (*)||yes|
Whether or not two different identifiers (in a given scheme) can be assigned to the same entity.
For example, one can find the following (different) PURLs which identify the same thing:
Represents the capacity for an identifier (employing a particular scheme) to be resolved to physical location(s) on the World Wide Web. Even though many of these solutions were initially designed to simply identify a piece of information, it is now becoming increasingly important to be able to access the data or information about the data.
URL-based technologies can theoretically handle this kind of access directly. However, solutions like Shared Names and PURLs still require a resolver, which performs the redirection between the persistent URL and the non-persistent one actually used by a specific resource. Solutions based on PURLs (so potentially Shared Names too) also have another, more important, drawback: there can only be a one-to-one mapping between the identifier (a URL in this case) and the physical location where the information can be obtained.
Solutions relying on a resolver can theoretically (and, in the case of the MIRIAM Registry, actually do) handle a one-to-many mapping between the identifier (a URI) and the physical locations (URLs) that serve the data. The key difference between these two methodologies lies in whether the identifier for the information is separated from the resource that actually provides that information. Schemes such as PURL lock the two together, and hence can only provide a one-to-one mapping. URI schemes such as MIRIAM dissociate the identifier from the physical resource, allowing a more flexible one-to-many mapping.
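The contrast between the two mappings can be made concrete with two lookup tables. This is a sketch for illustration only: every URI and URL in it is invented, and the function names are hypothetical rather than taken from any of the systems discussed.

```python
from typing import Dict, List

# Hypothetical data: all identifiers and locations below are made up.
ONE_TO_ONE: Dict[str, str] = {
    "http://purl.example.org/record/42": "http://provider-a.example.org/entry?id=42",
}

ONE_TO_MANY: Dict[str, List[str]] = {
    "http://ids.example.org/collection/42": [
        "http://provider-a.example.org/entry?id=42",
        "http://provider-b.example.org/records/42",
    ],
}


def locations(identifier: str) -> List[str]:
    """Return every physical location known for an identifier."""
    if identifier in ONE_TO_MANY:
        return list(ONE_TO_MANY[identifier])
    if identifier in ONE_TO_ONE:
        return [ONE_TO_ONE[identifier]]
    raise KeyError(identifier)


def drop_provider(host: str) -> None:
    """Simulate a provider going offline in the one-to-many table."""
    for identifier, urls in ONE_TO_MANY.items():
        ONE_TO_MANY[identifier] = [url for url in urls if host not in url]
```

In the one-to-many table, calling `drop_provider("provider-a.example.org")` still leaves the identifier with a working location at the second provider. In the one-to-one table, losing the single target leaves nothing to fall back on until the entry is edited.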
All solutions providing resolvability are initially exposed to a single point of failure: the resolver itself. However, all of them can overcome this issue by setting up mirrors of their resolvers.
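From a client's point of view, mirrors help only if the client falls back from one resolver to the next. A minimal sketch of that fallback follows; it assumes a caller-supplied `lookup` function that contacts one mirror and raises `ConnectionError` when that mirror is unreachable. The function name and the signature of `lookup` are hypothetical.

```python
from typing import Callable, List, Sequence


def resolve_with_mirrors(
    identifier: str,
    mirrors: Sequence[str],
    lookup: Callable[[str, str], str],
) -> str:
    """Ask each mirror in turn and return the first successful answer."""
    errors: List[str] = []
    for mirror in mirrors:
        try:
            # lookup(mirror, identifier) is expected to return the resolved URL.
            return lookup(mirror, identifier)
        except ConnectionError as exc:
            # Remember the failure and move on to the next mirror.
            errors.append(f"{mirror}: {exc}")
    raise ConnectionError("all mirrors failed: " + "; ".join(errors))
```

Mirrors are tried in the order given, so the preferred resolver should be listed first. Only connection failures trigger the fallback; any other exception raised by `lookup` is propagated to the caller.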
Describes the procedure by which an identifier for a particular scheme can be created. We mainly focus on the openness (publicly available and accessible) and cost of this creation procedure. The timeliness of the process is not considered.
Whether there exists a central system or procedure to ensure that the identifiers created adhere to their scheme. This should include various elements, some related to the scheme (such as validity, consistency and coherency), others related to the attached physical locations (such as existence and maintenance).
It is important, when comparing such technologies, to consider the coverage of these solutions: which datasets can be identified. Unfortunately, such a measure is not easy to obtain for most solutions, except for LSRN and the MIRIAM Registry.