DUO: streamlining access to biomedical datasets

Data Use Ontology (DUO). Credit: Karen Arnott/EMBL-EBI

10 Nov 2021 - 11:05

Summary

  • The GA4GH Data Use Ontology (DUO) provides standard terms and definitions for use of biomedical data obtained through informed consent
  • Standardising this vocabulary allows for automated access to sensitive biomedical data 
  • Biomedical research depends on shared access to human data, and streamlining consent to use these data will facilitate discovery

10 November 2021, Cambridge – A large group of international experts within the Global Alliance for Genomics and Health (GA4GH), including users at EMBL’s European Bioinformatics Institute (EMBL-EBI) and the Broad Institute of MIT and Harvard, have developed the Data Use Ontology (DUO) standard. DUO annotations make the terms and definitions in data consent forms and data sharing policies standardised and machine readable, which helps streamline consent to use controlled-access biomedical datasets.

Human biomedical datasets often contain sensitive information about individuals. Consequently, data owners must take great care when making these data available for secondary use by other researchers. Access to and sharing of such data must comply with legal, ethical, and informed consent rules and regulations. Navigating this complex landscape can slow research down, as researchers require permission to access sensitive datasets.

Infographic showing how DUO is used for depositing and requesting data: DUO sits in the centre, with people depositing data on one side and people requesting data on the other.


Using the Data Use Ontology (DUO) standard will streamline these processes by supporting a data authorisation and access framework for granting researchers permission to reuse biomedical datasets based on their credentials and research purposes. As described in a recent publication in a special issue of Cell Genomics, DUO, together with the GA4GH Passport standard, accelerates responsible sharing of biomedical datasets worldwide.

Moving away from manual workflows

To retrieve and reuse biomedical datasets, researchers must submit a data access request to a Data Access Committee. This can be a long process: a typical workflow involves manually reviewing each application against the data use letter, which specifies how the dataset can be used, to determine whether access should be granted.

“Research consent forms often do not provide clear information about biomedical dataset sharing. Not having a standard way of representing data use conditions can really slow research down when trying to re-use those datasets,” said Melanie Courtot, Metadata Standards Coordinator at EMBL-EBI and co-lead of the DUO development team. “Using the GA4GH DUO standard is a first step to help researchers around the globe access the biomedical data they need quickly and easily.”

“I’m incredibly thankful to the many contributors to DUO,” added Melanie. “This was a huge collaborative effort that received input from many international experts and users.”

A machine-readable vocabulary

DUO is a machine-readable standard vocabulary of data use terms. It aims to describe data use conditions and to automate the matching of specific datasets against data access requests. Pairing DUO with the GA4GH Passport further automates this step by providing a researcher's authentication and authorisation levels.
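
The matching idea can be illustrated with a minimal Python sketch. It is not the GA4GH matching algorithm or any institution's implementation. The four DUO identifiers are real permission terms, but the `COVERS` table, the `Dataset` and `Request` classes, the `is_compatible` function, and the boolean `non_commercial_only` flag (standing in for a DUO modifier term) are hypothetical simplifications introduced here for illustration.

```python
from dataclasses import dataclass
from typing import Optional

# DUO permission terms (identifiers from the DUO vocabulary).
NRES = "DUO:0000004"  # no restriction
GRU = "DUO:0000042"   # general research use
HMB = "DUO:0000006"   # health or medical or biomedical research
DS = "DUO:0000007"    # disease specific research

# Illustrative assumption: the research purposes each permission term covers.
# A real system would derive this from the ontology rather than hard-code it.
COVERS = {
    NRES: {"general", "biomedical", "disease"},
    GRU: {"general", "biomedical", "disease"},
    HMB: {"biomedical", "disease"},
    DS: {"disease"},
}


@dataclass(frozen=True)
class Dataset:
    permission: str                    # one DUO permission term
    disease: Optional[str] = None      # only meaningful when permission is DS
    non_commercial_only: bool = False  # hypothetical stand-in for a modifier


@dataclass(frozen=True)
class Request:
    purpose: str                       # "general", "biomedical" or "disease"
    disease: Optional[str] = None
    commercial: bool = False


def is_compatible(dataset: Dataset, request: Request) -> bool:
    """Return True if the request falls within the dataset's use conditions."""
    # Unknown permission terms cover nothing, so they never match.
    if request.purpose not in COVERS.get(dataset.permission, set()):
        return False
    # Disease-specific datasets only match requests about the same disease.
    if dataset.permission == DS and dataset.disease != request.disease:
        return False
    # A non-commercial restriction rules out commercial requesters.
    if dataset.non_commercial_only and request.commercial:
        return False
    return True


retina = Dataset(DS, disease="retinal dystrophy", non_commercial_only=True)
print(is_compatible(retina, Request("disease", "retinal dystrophy")))  # True
print(is_compatible(retina, Request("general")))                       # False
```

In practice the disease would be an ontology term rather than a free-text string, and a human Data Access Committee may still review requests that an automated check cannot decide.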

Screenshot of a page from the EGA website, showing an example of the DUO codes associated with the dataset named “Genetic landscape of inherited retinal dystrophies”.


“Over 200,000 datasets worldwide have already been annotated with DUO terms and it has been successfully implemented in various institutions such as the Broad Institute and the Wellcome Sanger Institute,” said Tommi Nyrönen, Head of Node at ELIXIR Finland and co-lead of the GA4GH Data Use and Researcher Identities Work Stream, which developed DUO. “There’s still more to be done; we want to include different data types and the Resource Entitlement Management System tool we are developing and deploying with EGA will leverage both DUO and Passport implementation.”

Further reading

This work was published as part of a Cell Genomics special issue focusing on the work of GA4GH. To find out more, see the list of articles below.

Source article

LAWSON, J., et al. (2021) The Data Use Ontology to streamline responsible access to human biomedical datasets. Cell Genomics. Published online 10 November; DOI:10.1016/j.xgen.2021.100028

Related articles in the Cell Genomics GA4GH special issue

REHM, H.L., et al. (2021) GA4GH: international policies and standards for data sharing across genomic research and healthcare. Cell Genomics. Published online 10 November; DOI:10.1016/j.xgen.2021.100029

THOROGOOD, A., et al. (2021) International Federation of Genomic Medicine Databases Using GA4GH Standards. Cell Genomics. Published online 10 November; DOI:10.1016/j.xgen.2021.100032

VOISIN, C., et al. (2021) GA4GH Passport standard for digital identity and access permissions. Cell Genomics. Published online 10 November; DOI:10.1016/j.xgen.2021.100030

CABILI, M.N., et al. (2021) Empirical validation of an automated approach to data use oversight. Cell Genomics. Published online 10 November; DOI:10.1016/j.xgen.2021.100031

WAGNER, A., et al. (2021) The GA4GH Variation Representation Specification (VRS): a Computational Framework for the Precise Representation and Federated Identification of Molecular Variation. Cell Genomics. Published online 10 November; DOI:10.1016/j.xgen.2021.100027

Contact the news team

Vicky Hatch | Communications Officer

vhatch@ebi.ac.uk

Oana Stroe | Senior Communications Officer

stroe@ebi.ac.uk
