Provenance

It is good practice not only to make RDF data available but also to include a dataset description, a so-called VoID file, that provides metadata about the dataset itself. For datasets in health care and life sciences (HCLS), a W3C standard proposes the structure of these metadata files. We aim to satisfy this standard in our dataset descriptions by enforcing the MUST and MUST NOT criteria, while encouraging our data providers to supply additional information about their datasets.

Structure of HCLS VoID files

Every dataset description consists of at least three levels: summary, version, and distribution. The summary level describes a dataset independently of any specific version or format. The version level captures the version-specific characteristics of a dataset, and the distribution level captures metadata about one specific form (for example, a file format) of one specific version of a dataset.
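As a rough illustration of how the three levels hang together, the sketch below walks from a summary-level resource to its distributions using the linking properties that appear in the mandatory field lists on this page (`pav:hasCurrentVersion` and `dct:hasDistribution`). Triples are modelled as plain Python tuples, and the `ex:` identifiers and the helper names `objects` and `distributions_of` are hypothetical, chosen only for this example.

```python
# Linking properties as listed on this page.
PAV_HAS_CURRENT_VERSION = "http://purl.org/pav/hasCurrentVersion"
DCT_IS_VERSION_OF = "http://purl.org/dc/terms/isVersionOf"
DCT_HAS_DISTRIBUTION = "http://purl.org/dc/terms/hasDistribution"


def objects(triples, subject, predicate):
    """Return all objects of (subject, predicate, ?) in a list of triples."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def distributions_of(triples, summary):
    """Follow summary -> current version -> distributions."""
    result = []
    for version in objects(triples, summary, PAV_HAS_CURRENT_VERSION):
        result.extend(objects(triples, version, DCT_HAS_DISTRIBUTION))
    return result


# Hypothetical identifiers, for illustration only.
triples = [
    ("ex:dataset", PAV_HAS_CURRENT_VERSION, "ex:dataset-v1"),
    ("ex:dataset-v1", DCT_IS_VERSION_OF, "ex:dataset"),
    ("ex:dataset-v1", DCT_HAS_DISTRIBUTION, "ex:dataset-v1.ttl"),
]

print(distributions_of(triples, "ex:dataset"))
```

In a real description the triples would come from a parsed `_void.ttl` file rather than a hand-written list.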

Summary level

Mandatory fields for the summary level:

<http://purl.org/dc/terms/Dataset>
<http://purl.org/pav/hasCurrentVersion>
<http://purl.org/dc/terms/title>
<http://purl.org/dc/terms/publisher>
<http://purl.org/dc/terms/description>

The summary level is NOT allowed to have these fields:

<http://rdfs.org/ns/void#dataDump>
<http://purl.org/dc/terms/creator>

Version level

Mandatory fields for the version level:

<http://purl.org/dc/terms/Dataset>
<http://purl.org/dc/terms/isVersionOf>
<http://purl.org/dc/terms/hasDistribution>
<http://purl.org/dc/terms/title>
<http://purl.org/dc/terms/description>
<http://purl.org/dc/terms/creator>
<http://purl.org/dc/terms/publisher>
<http://purl.org/pav/version>

The version level is NOT allowed to have:

<http://rdfs.org/ns/void#dataDump>

Distribution level

The distribution level is defined through:

<http://rdfs.org/ns/void#Dataset>
<http://rdfs.org/ns/void#dataDump>
<http://www.w3.org/ns/dcat#Distribution>
<http://purl.org/dc/terms/title>
<http://purl.org/dc/terms/description>
<http://purl.org/dc/terms/creator>
<http://purl.org/dc/terms/format>
<http://purl.org/dc/terms/license>
<http://purl.org/dc/terms/publisher>
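The three lists above can be read as a small rule table. The following is a minimal sketch of how such a check might look, not the validator used by the platform. It assumes triples are given as plain `(subject, predicate, object)` tuples, and it treats a listed IRI as present if it occurs either as a predicate on the subject or as the object of an `rdf:type` statement, since the lists mix classes and properties. The names `RULES` and `check_level` and the `ex:` identifiers are hypothetical.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
DCT = "http://purl.org/dc/terms/"
PAV = "http://purl.org/pav/"
VOID = "http://rdfs.org/ns/void#"
DCAT = "http://www.w3.org/ns/dcat#"

# MUST / MUST NOT IRIs per level, copied from the lists on this page.
RULES = {
    "summary": {
        "must": [DCT + "Dataset", PAV + "hasCurrentVersion", DCT + "title",
                 DCT + "publisher", DCT + "description"],
        "must_not": [VOID + "dataDump", DCT + "creator"],
    },
    "version": {
        "must": [DCT + "Dataset", DCT + "isVersionOf", DCT + "hasDistribution",
                 DCT + "title", DCT + "description", DCT + "creator",
                 DCT + "publisher", PAV + "version"],
        "must_not": [VOID + "dataDump"],
    },
    "distribution": {
        "must": [VOID + "Dataset", VOID + "dataDump", DCAT + "Distribution",
                 DCT + "title", DCT + "description", DCT + "creator",
                 DCT + "format", DCT + "license", DCT + "publisher"],
        "must_not": [],
    },
}


def check_level(triples, subject, level):
    """Return (missing, forbidden) IRIs for `subject` at the given level."""
    used = set()
    for s, p, o in triples:
        if s != subject:
            continue
        # For rdf:type statements the class IRI counts; otherwise the predicate.
        used.add(o if p == RDF_TYPE else p)
    rules = RULES[level]
    missing = [iri for iri in rules["must"] if iri not in used]
    forbidden = [iri for iri in rules["must_not"] if iri in used]
    return missing, forbidden


# Hypothetical, deliberately incomplete summary-level description.
example = [
    ("ex:summary", RDF_TYPE, DCT + "Dataset"),
    ("ex:summary", DCT + "title", "Example dataset"),
    ("ex:summary", DCT + "creator", "ex:someone"),
]

missing, forbidden = check_level(example, "ex:summary", "summary")
print("missing:", missing)
print("forbidden:", forbidden)
```

For this example the check reports `pav:hasCurrentVersion`, `dct:publisher`, and `dct:description` as missing and `dct:creator` as forbidden at the summary level. A full validation against the standard would also need to cover value types and the SHOULD and MAY requirements, which this sketch ignores.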

Example VoID Files

The VoID files that drive the RDF platform can be accessed through our GitHub repository or through the SPARQL endpoint (see the Datasets page). Each named graph in the database has a corresponding named graph with the _void.ttl ending. For more information on the naming scheme for named graphs, read our page on using SPARQL.

Additional information

This structure represents the MUST and MUST NOT fields described in the standard. We enforce these fields to be present in the descriptions of datasets that are part of the RDF platform. We encourage people to also implement the SHOULD and MAY requirements, even though we do not enforce this at the moment.

For more information on dataset descriptions, we refer once more to the W3C standard on HCLS dataset descriptions. An online validator for the standard, based on Shape Expressions (ShEx), can be found here.