Executive/public summary
EMBL-EBI supports open science and believes that open access to data is a key driver for scientific discovery. To strengthen open science and minimise barriers to reuse, EMBL-EBI is setting out a roadmap to rationalise licence information on its data resources in the context of Findable, Accessible, Interoperable and Reusable (FAIR) data principles. EMBL-EBI adds value to the data submitted to its resources and makes data FAIR through curation, annotation, and by linking it to other relevant data resources.
EMBL-EBI hosts 45 data resources, each with its own governance structures, history, technical landscape and partner institutes. These factors need to be taken into consideration when making fundamental decisions on matters such as licensing.
The majority of EMBL-EBI data resources use the institute’s Terms of Use, which combine statements of our commitment to open science with a definition of the behaviours expected of those using the data EMBL-EBI makes available. The Terms of Use also clarify that EMBL-EBI does not impose any additional restrictions on the use of data over those provided by the data owner. Therefore, it is possible to employ the EMBL-EBI Terms of Use in combination with more specific data licences.
Our commitment
EMBL-EBI will minimise barriers to reuse of data in EMBL-EBI resources by adopting the Creative Commons (CC) licence framework across all its data resources within the next five years.
We will:
- Standardise the licences used across EMBL-EBI resources.
- Use licences that present the lowest barriers to data reuse. CC0 is preferred over CC-BY. CC0 is most in line with the spirit of the EMBL-EBI Terms of Use.
- State the licence explicitly on the resource as a whole and at the record level, in both human and machine-readable formats.
In some cases EMBL-EBI’s Terms of Use may have to be retained (for example, for a data resource that aggregates content with existing multiple licence types).
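As an illustration of stating a licence in both human- and machine-readable form, the following minimal sketch builds a plain-text licence statement and a schema.org description carrying a `license` property, once for a resource as a whole and once for a single record. It is not a description of how any EMBL-EBI resource implements this. The identifiers, names and helper functions are hypothetical, the use of SPDX identifiers and schema.org markup is an assumption, and CC-BY version 4.0 is assumed because no version is named here.

```python
import json

# SPDX identifier -> (full licence name, canonical licence URL).
# Assumption: CC-BY 4.0 is used for illustration; the text names no version.
LICENCES = {
    "CC0-1.0": (
        "Creative Commons Zero v1.0 Universal",
        "https://creativecommons.org/publicdomain/zero/1.0/",
    ),
    "CC-BY-4.0": (
        "Creative Commons Attribution 4.0 International",
        "https://creativecommons.org/licenses/by/4.0/",
    ),
}


def human_readable(spdx_id):
    """Return a one-line licence statement suitable for display on a page."""
    name, url = LICENCES[spdx_id]
    return f"Licence: {name} ({url})"


def machine_readable(identifier, name, spdx_id, schema_type="Dataset"):
    """Return a schema.org description (as a dict) with an explicit licence URL."""
    _, url = LICENCES[spdx_id]
    return {
        "@context": "https://schema.org",
        "@type": schema_type,
        "identifier": identifier,
        "name": name,
        "license": url,
    }


# Hypothetical resource-level and record-level descriptions.
resource = machine_readable("example-resource", "Example resource", "CC0-1.0", "DataCatalog")
record = machine_readable("EX-0001", "Example record", "CC0-1.0")

print(human_readable("CC0-1.0"))
print(json.dumps(resource, indent=2))
print(json.dumps(record, indent=2))
```

Keeping a single lookup table for licence names and URLs means the human-readable statement and the machine-readable markup cannot drift apart.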
Citing data
EMBL-EBI strongly encourages its users to cite the appropriate data sources as a matter of good scientific practice; the use of CC0 licences only means that we are not making it a legal requirement to do so.
Why CC0?
- It is most in line with the spirit of EMBL-EBI’s Terms of Use and places data in the public domain without constraints. We believe that this approach to research data sharing strengthens open science and scientific progress.
- It is the best way to encourage remixing and reuse as it makes clear to any user – academic, commercial or otherwise – that the data are not owned by anyone and therefore can be used freely.
- It saves researchers time when reusing the data, which speeds up the process of science.
Note:
It is possible that, at a resource level, content on the website infringes copyright without the resource being aware of it. Should such content be identified, it is removed in response to a “take-down” request from the third party or their legal representative, which resolves the issue.
Dataset-level licensing
EMBL-EBI will take steps to:
- Include CC licences in development roadmaps and in conversations within the institute’s governance structures.
- Encourage homogeneity of licences for datasets within a data resource. For example, encourage submitters to accept a CC0 licence rather than offering a choice.
- Make the licence machine readable.
- Make the notation standard across EMBL-EBI.
- Use EMBL-EBI Terms of Use to cover back records that may not be individually licensed.
Note:
Dataset-level licensing allows for variable licensing if required, but this creates its own challenges for reuse and remixing, so it should be a last resort.
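The steps above can be sketched in code. The example below is a minimal sketch under stated assumptions, not an EMBL-EBI implementation: it resolves the effective licence of each record, falling back to the Terms of Use for back records that carry no individual licence, and reports whether a resource is homogeneously licensed. The record structure, the `licence` field, the `EMBL-EBI-Terms-of-Use` label and the function names are all hypothetical.

```python
from collections import Counter

# Hypothetical label for the Terms of Use; it is not an SPDX identifier.
TERMS_OF_USE = "EMBL-EBI-Terms-of-Use"


def effective_licence(record):
    """Return the record's own licence, or the Terms of Use if none is stated."""
    return record.get("licence") or TERMS_OF_USE


def licence_summary(records):
    """Count how many records fall under each effective licence."""
    return Counter(effective_licence(r) for r in records)


def is_homogeneous(records):
    """True when every record in the resource shares one effective licence."""
    return len(licence_summary(records)) <= 1


# Hypothetical records from one resource.
records = [
    {"id": "EX-1", "licence": "CC0-1.0"},
    {"id": "EX-2", "licence": "CC-BY-4.0"},
    {"id": "EX-3"},  # back record with no individual licence
]

print(dict(licence_summary(records)))
print(is_homogeneous(records))
```

A summary of this kind makes variable licensing within a resource visible, which is useful when treating it as a last resort.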
Monitoring progress towards standardised licensing
In 2021, we set our ambition to move towards dataset-level CC0 licensing wherever possible. The licensing status at the start of 2021 across our main resources is shown in yellow below, with the status in 2024 shown in green:

This shows that, of the 41 EMBL-EBI-hosted resources analysed:
- 68% of resources now use primarily Terms of Use-based or CC0 licensing
- 22% use CC-BY licensing
- Between 2021 and 2024, three new data resources were added (AlphaFoldDB, CancerModels.org and DECIPHER), while three resources were retired and consolidated into other databases (ArrayExpress, Enzyme Portal and IntEnz). The HUGO Gene Nomenclature Committee will be administered by the University of Cambridge from 2024, but is reported here.
- The remaining 2 resources with non-standard licensing are:
  - DECIPHER – an interactive web-based database which incorporates a suite of tools designed to aid the interpretation of genomic variants. As DECIPHER contains data derived from clinical data, additional restrictions on use are put in place by data owners in order to protect patient privacy.
  - SureChEMBL – a resource containing compounds extracted from the full text, images and attachments of patent documents. Additional restrictions on use are put in place by the data owners who licensed their data for use in SureChEMBL.
The aim of the EMBL-EBI Terms of Use is to ‘impose no additional restriction on the use of the contributed data than those provided by the data owner’. As resources transition to greater adoption of CC0 licensing, many now contain a mixture of differently licensed records. For example, BioStudies and BioImage Archive make many datasets available under the Terms of Use, but apply per-dataset CC0 or CC-BY licensing where this is the wish of the data owner.
This allows CC0 licensing to be implemented where data owners are ready for it, while respecting the requirements of data owners or collaborating partners who want to make their data openly accessible but are not yet able or ready to move to CC0. For them, the permissive Terms of Use are offered as the next-best option.
Working with our collaborators and delivery partners, we will continue to advocate for and take forward adoption of CC0 licensing wherever possible, and will continue to provide updates on progress on the journey towards more standardised, more permissive licensing.