Frequently Asked Questions

This FAQ addresses many of the common questions regarding the Registry and its relationship with Identifiers.org.

General questions and definitions

What is the Registry?

The Registry provides the necessary information for the generation and resolution of unique and perennial identifiers for life science data. Those identifiers are of the URI form and make use of Identifiers.org to provide direct access to the identified data records on the Web. Examples of these identifiers:

In order to fulfil this task, the Registry lists data collections and is composed of several services and resources:

The Registry is available at: http://www.ebi.ac.uk/miriam/.

Additionally, the Registry hosts the core information required by the resolving infrastructure Identifiers.org.

What is MIRIAM, and what does it stand for?

MIRIAM is an acronym for the Minimal Information Required In the Annotation of Models. It is important to distinguish between the MIRIAM Guidelines and the Registry. Both are part of the wider BioModels.net initiative.

Because it was originally designed to meet the needs of the computational modelling community, the Registry was previously referred to as the 'MIRIAM Resources' and the 'MIRIAM Registry'. To reflect its increased scope and diverse utility, it is now known as the 'Registry' or the 'Identifiers.org Registry'.

What is a data collection?

A data collection is defined as a set of data of the same kind, which usually originates from a specific data provider or is associated with a particular database. This primary data set may be available from a number of different physical locations on the Web or 'resources', but the core or essential information content across all those resources should be identical. By referring to a 'collection', we are able to reference this data set independently of where it is located. For example, protein entities could be referred to using UniProt, while a book could use ISBN. These are both 'data collections'.

The Registry stores for each data collection the list of resource(s) (or physical locations) that provide access to the data. Additional information is recorded; for more details, see below.

The scope or domain of each collection must be strictly defined: where a data provider references many types of data (for example genes and proteins), the collection scope must be clearly demarcated. One way to achieve this is by 'sub-classing' the different collections which may be provided by a particular data provider.

Note: data collections were historically known as 'data types', but this nomenclature was changed as its meaning was ambiguous.

Why can't I find my favourite data collection in the Registry?

Besides browsing, there are two main methods to determine whether a particular collection exists in the Registry database: a keyword-based search function, and a tag and collection cloud. Both methods are described below. Failure to locate a collection could be due to one of the following reasons:

Are there plans to include more data collections?

The content of the Registry is based on users' needs. New data collections are therefore primarily submitted by individual users or projects. However, we do perform internal curation: submitted data collections are retained in the curation pipeline until any issues preventing their publication are addressed, or until they can be made public with one or more restrictions attached.

We also collaborate with various projects, which leads to the submission of whole batches of collections in order to cover specific requirements. For example, in collaboration with the Bio2RDF effort, we have incorporated a large proportion of data collections from their listings into the Registry.

What is a resource?

Resources are services which provide information about entities or elements from a collection. To be recorded in the Registry, they must be accessible online. For each collection, many resources may exist which provide information about its constituent entities. Some directly mirror the main resource, while others may provide slightly different views on the same core information. For example, PubMed is a collection which has a main resource, as well as other resources through which the same information can be accessed. Where possible, the Registry identifies the 'primary' resource.

Which collections are suitable for the Registry?

There are a number of requirements for a collection to be suitable for addition to the Registry. Collections which do not comply with these requirements can still potentially be incorporated, but will likely be associated with restriction(s). Each rule and rationale for its necessity is listed below:

What are 'restrictions'?

Restrictions indicate the potential limitation(s) of (re)using the URIs identifying entities from some data collections. Collections with restrictions can be used (for example in cross-references), but the user must be aware that these will be limited in some way, and that it may be advisable to find a more suitable alternative collection. Collections with such restrictions are clearly indicated in the Registry. The current list of restrictions is as follows:

What information is stored in the Registry?

The Registry records for each data collection the following information:

In addition, most of the elements recorded by the Registry (such as collections and resources) have unique identifiers associated with them.

Identifiers.org

What is Identifiers.org?

Identifiers.org provides a complete identification infrastructure. This system relies on an identification scheme using resolvable URIs, that is, URIs which can be used directly online.

The aim is to provide a community-agreed means to create stable, perennial and globally unique identifiers, which can be used to reference data with a primary focus on life sciences. Additionally, the system decouples the identification of records from the physical locations on the web where they can be retrieved.

What are Identifiers.org URIs?

They are stable, perennial and globally unique URIs which can be used to annotate a wide range of entities or concepts from a plethora of fields, including proteins, diseases, publications and ontological terms. They can be used unchanged for a wide range of tasks, as they are both unique and directly usable online.

Identifiers.org URIs are persistent since they shield the user from changes in the underlying, resource specific, URLs.

Example: http://identifiers.org/pubmed/23584831, which is the URI that identifies the latest publication about this system in PubMed.

Identifiers.org URIs are composed of three main parts:

Additional examples, showing the different types of URIs:

This URI scheme relies on a list of community agreed 'namespaces' which are recorded in the Registry. Please refer to the list of detailed examples for more information about how to use the URIs.

Why do we need Identifiers.org URIs?

The use of URIs allows one to:

Resolvable persistent URLs

With the progress of Linked Data and Semantic Web initiatives, as well as feedback and suggestions from the community, it is necessary to provide identifiers in the form of resolvable URIs (URLs). These persistent URL forms are better suited to the Linked Data vision, and provide users with the clickable links they asked for.

How to generate Identifiers.org URIs?

The first step is to make sure the data you need to identify comes from a collection which is already recorded in the Registry. You can launch a search to find that out. If the collection is not present in the Registry, then you will need to submit it.

Then, there are two main ways to create the URIs you need.

Option 1: use the Registry web services to generate the URIs for you. Two main methods can help with this: getURI and convertURN.

Option 2: retrieve the necessary 'namespace' of the data collection record from the Registry and just build up the URI with any tool you want. Please refer to the structure of Identifiers.org URIs for more details.
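As a minimal sketch of Option 2, a URI can be assembled by simple string concatenation. This assumes the URI follows the shape of the PubMed example given earlier in this FAQ (the Identifiers.org address, then the namespace, then the identifier supplied by the data provider). The function name build_identifiers_uri is hypothetical and is not part of any Registry library. The namespace passed in must be the one recorded in the Registry for the collection.

```python
# Base address taken from the example URI in this FAQ.
IDENTIFIERS_ORG_BASE = "http://identifiers.org"


def build_identifiers_uri(namespace: str, entity_id: str) -> str:
    """Hypothetical helper: join a Registry namespace and an entity
    identifier into an Identifiers.org URI.

    The entity identifier is used as supplied by the data provider,
    without any encoding.
    """
    namespace = namespace.strip().strip("/")
    entity_id = entity_id.strip()
    if not namespace or not entity_id:
        raise ValueError("both a namespace and an entity identifier are required")
    return f"{IDENTIFIERS_ORG_BASE}/{namespace}/{entity_id}"


if __name__ == "__main__":
    # Rebuilds the PubMed example URI from this FAQ.
    print(build_identifiers_uri("pubmed", "23584831"))
```

The helper only concatenates strings; it does not check that the namespace actually exists in the Registry, so that check still has to be done beforehand.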

What types of legacy URIs exist?

Initially, the Registry associated up to two official URIs with each collection: one URN and one URL. These URIs were based on some existing identifiers (for example LSIDs) and some domain names, whenever possible. This can still be observed in the list of deprecated URIs maintained for the collections. However, it became apparent that there were various issues with these inconsistent URIs.

During the Super-hackathon about Standards and ontologies for Systems Biology in Okinawa, Japan (January 2008), there was a consensus to provide only URNs. Those were of the form: urn:miriam:pubmed:16333295. For more information about the URN Namespace that was previously used by the Registry, please refer to the RFC, also available from SourceForge.

Since 2011, these URNs have been superseded by Identifiers.org URIs. The latter are built in a very similar way (they share the same 'namespaces'), with the benefit that Identifiers.org URIs do not require any encoding of the entity identifier (the one supplied by the data provider). Both forms, URNs and URLs, are inter-convertible and the Registry's web services can perform this task.
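The correspondence between the two forms can be sketched as follows. This is an illustration only, not the implementation used by the Registry's web services. It assumes that a URN has exactly the form urn:miriam:<namespace>:<identifier>, that the identifier part is percent-encoded in the URN, and that the resulting URI follows the shape of the PubMed example in this FAQ. The function name urn_to_identifiers_uri is hypothetical.

```python
from urllib.parse import unquote

URN_PREFIX = "urn:miriam:"                    # prefix of the legacy URNs described above
IDENTIFIERS_ORG_BASE = "http://identifiers.org"


def urn_to_identifiers_uri(urn: str) -> str:
    """Hypothetical helper: convert a legacy MIRIAM URN into an
    Identifiers.org URI, assuming the two forms share the same namespace."""
    if not urn.startswith(URN_PREFIX):
        raise ValueError(f"not a MIRIAM URN: {urn!r}")
    # Split the remainder at the first colon: namespace, then encoded identifier.
    namespace, sep, encoded_id = urn[len(URN_PREFIX):].partition(":")
    if not sep or not namespace or not encoded_id:
        raise ValueError(f"malformed MIRIAM URN: {urn!r}")
    # Assumption: the identifier is percent-encoded in the URN;
    # Identifiers.org URIs carry it unencoded.
    return f"{IDENTIFIERS_ORG_BASE}/{namespace}/{unquote(encoded_id)}"


if __name__ == "__main__":
    # The URN quoted earlier in this answer.
    print(urn_to_identifiers_uri("urn:miriam:pubmed:16333295"))
```

For production use, the Registry's own web services remain the authoritative way to perform this conversion.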

Create, edit and update collections

How are new collections added to the Registry?

It is important to note that the content of the Registry is based on users' needs. The curators may add new data collections from time to time as they encounter them, but generally the creation of new collections should be considered as being 'on demand' from the community. New data collections are therefore primarily submitted by individual users or projects as they need them.

The main method by which the community requests the addition of a new collection is through the SourceForge request page. This submission process is open to absolutely everyone. There are some suggestions on how to make a submission.

Community requests may of course entail the incorporation of large sets of collections. In this case, we suggest that you contact us, so that we can provide you with a more convenient way to handle your submissions.

In all cases, we do perform internal curation: submitted data collections are retained in the curation pipeline until any issues preventing their publication are addressed, or until they can be made public with one or more restrictions attached.

What is the best way to make a submission to the Registry?

There are a number of checks that can be done in advance to speed entry of a requested collection into the Registry:

Once the form is completed, and the request submitted, an online confirmation of your request is displayed in the browser window. Please do not expect to receive an email confirmation of your request, particularly if you did not provide an email address with the submission form.

What is the process by which submitted collections are made publicly available for use?

Once submitted, requested collections enter a curation pipeline. This is to ensure suitability for use in annotation, and to confirm that such use would not infringe copyright and license issues of the corresponding resource. In effect, the resource is evaluated against the principles for Registry inclusion. In some cases the curator may wish to contact the submitter to clarify some aspect of the submission, for example to determine which particular collection the user would like to be able to reference if the submission is ambiguous. Also, the curator may contact the provider of the resource to clarify any issues, for example with identifier range, or with copyright issues. Once this process is completed, the new collection is 'published' in the Registry, for use by the community.

Who maintains URIs and their associated data?

As with submissions, it is important to note that collection entries are not updated automatically should their associated resources change. We endeavour to regularly check the validity of, and access to, the associated resources, and do our best to keep the information up to date. However, if you are a frequent user of a particular collection and notice an issue with, for example, dereferencing an Identifiers.org URI, please let us know. Similarly, if you are the maintainer of a resource and you are aware of some upcoming change that would affect our systems, please contact us and let us know. For this purpose, you can use our support form. Thank you in advance for your assistance.

How do you modify an existing collection?

Absolutely anyone can request the modification of an existing collection. To do so, one must locate the collection of interest, for example through browsing. Once the details of a collection are displayed, a 'Suggest modifications to this data collection' link is shown on each page. This brings up a form containing an editable version of the completed information for that collection. Modifications can be freely made here, to specify precisely any suggested changes. Additional information that is not suitable for entry within the fixed fields of this form may be supplied within the 'User notes' field. Suggested modifications need to be confirmed by a curator prior to publication to the main site.

Who can delete an Identifiers.org URI or collection?

Identifiers.org URIs entered in the Registry are not subject to complete removal, but may become deprecated. This preserves legacy references that utilised the deprecated URI, without the need to specifically revisit those and update them. Deprecated URIs therefore can still be used to access the same information, but any new annotations created should use the latest URI roots stored in the Registry. Any existing deprecated URIs are clearly listed in the main entry page for each collection.

Collections entered in the Registry cannot be deleted completely either, but a collection can be deprecated if it is no longer maintained and the underlying data becomes unavailable. In this case, the collection is clearly marked as deprecated, and users are advised to use the suggested alternatives, or to find another collection with which to generate Identifiers.org URIs.

Services and availability

How do I find a collection suitable for a particular kind of annotation?

Since the number of collections listed in the Registry is continuously increasing, it can become difficult to find a suitable collection to annotate a particular kind of entity, for instance a 'protein'. In addition, there may be a substantial number of suitable collections with which one could generate an appropriate Identifiers.org URI. To address these issues, the Registry uses a tagging system: each registered collection is associated with one or more tags or categories, each of which indicates the kind of data provided by that collection. For instance, the UniProt collection is tagged with 'protein' and 'sequence'. The tags associated with a particular collection are listed on the main collection page, and are represented as icons.

There are two ways in which the listed collections can be searched:

Are there any plans to improve the tagging system?

Currently, the tag system is based upon a set of approximately 40 terms that were created in an ad hoc fashion when the system was implemented. This is sufficient in number to allow the association of each registered collection with two or three tags. The tags are of a coarse granularity, describing resource and collection specific information such as the type of data recorded ('sequence', 'expression', 'phenotype'), the subject of those data ('gene', 'protein', 'drug'), the domain area to which they relate ('disease', 'pharmacogenomics', 'neuroscience'), and taxonomic associations of the data ('mammalian', 'human'). The purpose of the tags at this coarse level is to allow users to quickly locate suitable collections, using high level keywords.

Additional tags can be created as needed by curators, or when requested by users. In addition, resource and collection descriptions will be further enhanced by the incorporation of information from ontologies such as the Biomedical Resource Ontology. These modifications will allow users to ascertain the appropriateness of use of particular data collections, and will improve existing search facilities.

We would welcome any suggestions for additional tags which would be useful to the community, as well as for tag information that could be added to existing collections. Such requests can be submitted as described, making use of the 'User notes' section.

Is there any way to determine the reliability of a resource before I use it?

Some resources are more reliable than others, for the same collection. Therefore, users may wish to make a choice as to which resource to use. For this reason, we have implemented a 'Health Check' status and history for the resources listed in the Registry. This allows one to differentiate the stability or reliability of the various resources recorded in the Registry. The features of this utility are listed below:

The Health Check status is meant only to signify broadly the uptime of a particular resource, and is in no way to be perceived as an accurate depiction of true reliability. For example, our system is unable to reliably detect uptime for pages whose content is loaded via JavaScript, and these will be listed as 'probably up'. In addition, some resources may be slow at the particular time of day when our checks are executed, and consequently may be listed as 'down', while they are actually 'up'.

The colour-coded display of uptime in the summary table follows this scheme:

For Registry curators, this 'Health Check' also provides an early warning system for changes made to collections listed in the Registry. This allows a rapid update of the information held, for example, where servers have moved to a new location, or where access URLs or identifiers have been modified. This minimises connectivity issues and other disruptions that users of our systems would otherwise experience.

We welcome feedback from users of this system, and any further suggestions on how it could be improved. We also encourage data providers to contact us if they have any concerns about the information we display, or ideas on how this could be improved in the future. We can be contacted through the contact page.

Why can I not find a health check for my resource?

Once a resource is registered against a particular data collection, there is a small delay before the newly incorporated information is used to perform routine health checks. Since the health checks are performed daily at a particular time, newly registered resources usually appear in the automated checks on the following day. Hence, there is potentially a 24 hour delay between the registration of a new resource and its incorporation into the health check system.

Which services are available?

The Registry provides various services to handle URIs, particularly for resolving URIs:

Using the web services

The easiest way to use the Web Services provided by the Registry is to use the provided Java library. The documentation gives full examples of use.

For information about the usage of the REST Web Services, please refer to the documentation page.

Where can I get the source code?

The source code for the Registry Library is available, under the terms of the GNU General Public License, through the SourceForge project.

The source code of Registry Web Services and Registry Web Application is available, under the same license, in the SVN repository of the SourceForge project.

Please note that SourceForge is not our main hosting repository (we use the EBI internal Subversion system), therefore the code available from SourceForge might be out of date. Please contact us if you have any enquiries about the software and its source code.

Questions about users and curators

Who are the Registry curators?

Curators are users from the BioModels.net initiative who have curation rights on the Registry.

Which general features are available to Registry users?

All users can perform several actions through the web interface:

Registry citation and contact information

How to cite the Registry and Identifiers.org

The following list of publications can be cited to reference Identifiers.org, the Registry and the associated services:

How to contact us

If you have any queries or concerns that are not addressed in this FAQ, you can contact us for further assistance using the links on the contact page.