Documentation

efovalidator validates the Experimental Factor Ontology (EFO). It reads EFO in OWL format, validates the contents and writes a log file of validation errors. Optionally, it fixes some simple errors and inconsistencies.

Description

efovalidator detects various types of error and inconsistency that can creep into an ontology through manual editing and automated updating. These include syntactic issues such as malformed URIs, and semantic issues such as clashes between labels and synonyms. It can also perform a set of supplied tests for whether specific classes and relations exist. Optionally, it will perform some simple error correction. efovalidator can be incorporated into an ontology production process to assist with quality assurance, highlighting errors that might otherwise go unnoticed.



Availability

Download efovalidator from http://www.ebi.ac.uk/fgpt/sw/downloads/.



Usage

efovalidator [-h] [-i file] [-u url] [-l log] [-t file] [-o file] [-c file] [-n file] [-v] [-x]

efovalidator is invoked using the script validate.sh. For example, to read EFO from the file efo.owl, validate EFO and log validation messages to efo.log:

$ validate.sh -i efo.owl -l efo.log

When you first use efovalidator, you must generate validate.sh for your specific environment using the bundled script build.sh, which takes the path to java and the efovalidator installation directory, e.g.:

$ build.sh /usr/bin/java/ /usr/local/efovalidator/

build.sh creates validate.sh which you can then use.

Command-line options

Option  Alternative   Value  Description
-h      --help               Print (to stdout) usage information and the bug-reporting address, then exit.
-i      --infile      file   Read EFO from OWL-format file.
-l      --log         file   Validate EFO and log validation messages to file.
-o      --outfile     file   Fix validation errors and write a fixed EFO to OWL-format file.
-t      --test        file   Unit test EFO. Unit testing messages will be logged to the file specified by -l.
-u      --url         url    Read EFO in OWL-format from url.
-c      --capitals    file   Read valid terms with capitals from file.
-n      --namespaces  file   Read valid namespaces from file.
-v      --verbose            Turn on output (to stdout) with summary of validation report.
-x      --xref               Process cross-references to other ontologies.

Modes of Operation

Mode              Description                Option  Option description
Validate          Validate an ontology       -l      Validation message log file
Validate and fix  Validate and fix ontology  -o      Fixed (output) ontology file
Unit test         Unit test an ontology      -t      Unit test definition file

Exit status

The application returns the following values to the operating system:

  • 0 (No critical errors but possible warnings)
  • 1 (Command invoked cannot execute validation)
  • 3 (Critical errors)

A warning is raised when a class synonym is non-unique or duplicates a label. All other failed validation checks are "critical".

Depending on the status, efovalidator will print to <stdout> one of the following messages:

  • "No critical validation errors"
  • "Command invoked cannot execute validation"
  • "Possible validation error(s). Please see the log file: logFileName"
  • "Critical validation error(s). Please see the log file: logFileName"
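
A calling script can branch on the exit status. The following Python sketch maps the three documented values to their meanings. The names describe_exit_status and run_validator are illustrative and not part of efovalidator, and the location of validate.sh is an assumption.

```python
import subprocess

# Exit statuses as documented above; any other value is treated as unknown.
EXIT_STATUS = {
    0: "No critical errors but possible warnings",
    1: "Command invoked cannot execute validation",
    3: "Critical errors",
}


def describe_exit_status(code):
    """Return the documented meaning of an efovalidator exit status."""
    return EXIT_STATUS.get(code, "Unknown exit status: %d" % code)


def run_validator(args, script="./validate.sh"):
    """Run validate.sh (path is an assumption) and return (status, meaning)."""
    completed = subprocess.run([script] + list(args))
    return completed.returncode, describe_exit_status(completed.returncode)
```

A build script might call run_validator(["-i", "efo.owl", "-l", "efo.log"]) and stop the release when the returned status is 3.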



Usage examples

To see a list of all command-line options, run efovalidator with the -h option:

$ ./validate.sh -h
usage: efovalidator
-c,--capitals <file> Read valid terms with capitals from <file>.
-h,--help Print (to stdout) usage information and the bug-reporting address, then exit.
-i,--infile <file> Read EFO from OWL-format <file>.
-l,--log <file> Validate EFO and log validation messages to <file>.
-n,--namespaces <file> Read valid namespaces from <file>.
-o,--outfile <file> Fix validation errors and write a fixed EFO to OWL-format <file>.
-t,--testlog <file> Unit test EFO and log unit testing messages to <file>.
-u,--url <url> Read EFO in OWL-format from <url>.
-v,--verbose Turn on output (to stdout) with summary of validation report.
-x,--xref Process cross-references to other ontologies.

To validate EFO loaded from an input file:

$ ./validate.sh -i efo.owl
___ADD_SESSION_HERE___

To validate and fix EFO loaded from an input file:

$ ./validate.sh -i efo.owl -o efo.fix
___ADD_SESSION_HERE___

To unit test EFO loaded from a URL:

$ ./validate.sh -u http://bar.ebi.ac.uk:8080/trac/export/head/branches/curator/ExperimentalFactorOntology/ExFactorInOWL/releasecandidate/efo_release_candidate.owl -t efo.tests
___ADD_SESSION_HERE___

To unit test and validate EFO loaded from URL (URL and log file specified in properties file):

$ ./validate.sh -u -t efo.tests -l
___ADD_SESSION_HERE___



Notes

Validation checks

Validation checks will include:

  • The EFO input file is in valid OWL format.

  • A single ontology version number is specified, in an <owl:versionInfo> element, e.g.:

    <owl:versionInfo>2.22</owl:versionInfo>

  • Date is specified in an <rdfs:comment> element with a value prefixed by Date:, e.g.:

    <rdfs:comment>Date: 15th March 2012</rdfs:comment>

  • Every class ID is a URI formed from a valid namespace name. Valid namespaces include the default (EFO) namespace and the namespaces of acceptable imports:

    • http://www.ebi.ac.uk/efo/
    • http://purl.org/obo/owl/CL#
    • http://purl.org/obo/owl/PATO#
    • http://www.w3.org/2002/07/owl#
    • http://purl.obolibrary.org/obo/
    • http://ourl.obolibrary.org/obo/
    • http://purl.obolibrary.org/obo#
    • http://purl.org/obo/owl/OBO_REL#
    • http://www.w3.org/2001/XMLSchema#
    • http://purl.obolibrary.org/obo/GO_
    • http://purl.org/obo/owl/NCBITaxon#
    • http://www.ifomis.org/bfo/1.1/snap#
    • http://www.ifomis.org/bfo/1.1/span#
    • http://www.obofoundry.org/ro/ro.owl#
    • http://orange.ebi.ac.uk:14086/urigen#
    • http://www.w3.org/2000/01/rdf-schema#
    • http://www.w3.org/1999/02/22-rdf-syntax-ns#
    • http://www.geneontology.org/formats/oboInOwl#
    • http://www.orphanet.org/rdfns#

    The above are default valid namespaces, but these can be configured.

  • EFO IDs used in class IRIs are valid. The fragment of an IRI in the EFO namespace must have the form:

    EFO_XXXXXXX
    where XXXXXXX is a 7-digit number.
  • Annotations of EFO URIs are valid. Where a class was imported from an ontology and replaced an existing EFO class, the new class ID will use the imported namespace and the old ID (an EFO URI) may be retained as an annotation, using the <EFO_URI> element, e.g.:

    <EFO_URI>http://www.ebi.ac.uk/efo/EFO_0003190</EFO_URI>

    In such cases, the <EFO_URI> annotation must include the EFO namespace and a valid EFO ID which does not exist in EFO.

  • Every class has a single label, specified in the <rdfs:label> element, e.g.:

    <rdfs:label>norfloxacin</rdfs:label>

  • Class labels are unique.

  • Class synonyms are unique. Synonyms are defined in the <alternative_term> element, e.g.:

    <alternative_term>SAM</alternative_term>

  • Class labels and synonyms collectively are unique. A synonym for a class should never be the label of another class.

  • No class has an excessive number (currently 20) of synonyms.

  • Obsolete classes are marked as organizational using the <organizational_class> element as follows:

    <organizational_class>true</organizational_class>

    Note that True and TRUE are also acceptable.

  • No annotation (on the ontology or a class) has an empty value. For example, the following is reported:

    <alternative_term></alternative_term>

  • Annotation (on the ontology or a class) has no leading or trailing whitespace.

  • Term capitalisation. Labels or synonyms with capitals are now reported (as non-critical errors).
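
A few of the checks listed above can be sketched in Python. This is a minimal illustration under stated assumptions, not efovalidator's implementation: the function names are hypothetical, the input is assumed to be RDF/XML, and only the single version number, the EFO ID form and the empty or whitespace-padded annotation checks are covered.

```python
import re
import xml.etree.ElementTree as ET

OWL_NS = "http://www.w3.org/2002/07/owl#"
EFO_NS = "http://www.ebi.ac.uk/efo/"
EFO_ID = re.compile(r"EFO_\d{7}")  # EFO_ followed by exactly 7 digits


def version_count(owl_xml):
    """Count <owl:versionInfo> elements; exactly one is expected."""
    root = ET.fromstring(owl_xml)
    return len(list(root.iter("{%s}versionInfo" % OWL_NS)))


def valid_efo_iri(iri):
    """True if an IRI in the EFO namespace ends in EFO_ plus 7 digits."""
    if not iri.startswith(EFO_NS):
        return False
    return EFO_ID.fullmatch(iri[len(EFO_NS):]) is not None


def annotation_problems(value):
    """Return a list of problems found in one annotation value."""
    problems = []
    if value == "":
        problems.append("no value")
    elif value != value.strip():
        problems.append("leading or trailing whitespace")
    return problems
```

Each function checks one condition in isolation, so a caller can collect all problems for a class rather than stopping at the first one.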

Unit tests

Unit tests specified in the unit test definition file are performed. These may include checks of whether a set of OWL classes persists, and checks of whether sub-class relationships are defined.

Term capitalisation

efovalidator can be configured with a list of valid capitalisations (which are allowed to slip through unreported):

validate.sh -i efo_release_candidate.owl -c efo.caps

where efo.caps looks like this:

C28H45O2R
MCF7 cell
CV system

This allows reporting on trivial variations in capitals e.g. "Drug Usage" or "Cancers, Stomach" but not on valid, significant cases, e.g. "C28H45O2R", "MCF7 cell", "CV system" etc.
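
The filtering idea can be sketched as follows. How efovalidator actually detects capitals and matches terms against the list is not documented here, so this sketch assumes that a term is reported when it contains any upper-case letter and is not an exact match for an allowed term. The function name unexpected_capitals is hypothetical.

```python
def unexpected_capitals(terms, allowed):
    """Return terms that contain a capital letter and are not allowed."""
    allowed = set(allowed)
    return [t for t in terms
            if t not in allowed and any(ch.isupper() for ch in t)]


allowed = ["C28H45O2R", "MCF7 cell", "CV system"]
terms = ["Drug Usage", "MCF7 cell", "norfloxacin", "Cancers, Stomach"]
print(unexpected_capitals(terms, allowed))
# prints ['Drug Usage', 'Cancers, Stomach']
```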

Case sensitivity when checking terms

The checks for unique terms (labels and synonyms) are case-sensitive within a single class, but case-insensitive between classes. This prevents erroneous reporting of label:synonym duplication within a single class where only the capitalisation differs. Case-sensitive comparison between different classes, although useful in some cases, isn't implemented, as it would have meant missing many bad duplications.
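
The two comparison rules can be sketched like this. It is an illustration only: the function name duplicate_terms, the dictionary layout and the class IDs in the usage note are assumptions, not efovalidator's data model.

```python
from collections import defaultdict


def duplicate_terms(classes):
    """Find duplicated terms.

    classes maps a class ID to its list of terms (label plus synonyms).
    Within a class, terms are compared exactly (case-sensitive).
    Between classes, terms are compared after lower-casing.
    Returns (within, between): within is a set of (class_id, term) pairs,
    between is a set of lower-cased terms used by more than one class.
    """
    within = set()
    owners = defaultdict(set)
    for class_id, terms in classes.items():
        seen = set()
        for term in terms:
            if term in seen:
                within.add((class_id, term))
            seen.add(term)
            owners[term.lower()].add(class_id)
    between = {t for t, ids in owners.items() if len(ids) > 1}
    return within, between
```

With a class holding "SAM" and "sam", nothing is reported within that class; if a second class holds "Sam", the lower-cased term "sam" is reported as a duplicate between the two classes.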



Configuration

efovalidator may be configured using Java properties defined in a file (.efovalidator.properties) in the invocation directory, e.g.:

efovalidator.infile = efo.owl
efovalidator.url = http://bar.ebi.ac.uk:8080/trac/export/head/branches/curator/ExperimentalFactorOntology/ExFactorInOWL/releasecandidate/efo_release_candidate.owl
efovalidator.outfile = efofix.owl
efovalidator.log = efo.log
efovalidator.testlog = efotest.log
efovalidator.verbose = false
efovalidator.xref = false

This provides default values for the command-line options. For example, given the properties file above, the following command-line would read EFO from efo.owl and write a log file called efo.log:

$ ./validate.sh -i -l

The properties correspond to, and are overridden by, the command-line options of the same name. For example, here EFO will be read from efo2.owl; the value set using -i takes precedence over the value of efovalidator.infile:

$ ./validate.sh -i efo2.owl -l

The properties file is written (to the current working directory) whenever efovalidator is run. The values written are those set on the command-line in the last run, or default values (if not set).

Developers downloading the code may also configure efovalidator using Spring XML. The Spring configuration has a lower precedence than both command-line options and Java properties. Default property values are defined in a bundled file (efovalidator.properties).
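
The precedence order described above (command-line options, then Java properties, then Spring defaults) can be sketched as follows. The parser handles only simple "key = value" lines rather than the full Java properties syntax, and the names parse_properties and resolve are hypothetical.

```python
def parse_properties(text):
    """Parse simple 'key = value' lines; comments (#, !) and blanks are skipped."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line[0] in "#!":
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def resolve(name, cli, props, spring_defaults):
    """Return the effective value: command line, then properties, then Spring."""
    if cli.get(name) is not None:
        return cli[name]
    key = "efovalidator." + name
    if key in props:
        return props[key]
    return spring_defaults.get(name)
```

Here a command-line value of None stands for an option given without a value (as in "-i -l"), so the properties file supplies it.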

Valid namespaces

You can configure the valid namespaces like this:

validate.sh -i efo_release_candidate.owl -n efo.ns

where efo.ns looks like this:

http://www.ebi.ac.uk/efo
http://purl.org/obo/owl/UO
http://purl.org/obo/owl/CL
http://www.orphanet.org/rdfns
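
A namespace check against such a file might look like the sketch below. It assumes that entries are separated by whitespace and that a class IRI is valid when it starts with one of the listed namespaces; both are assumptions about behaviour not specified here, and the function names are hypothetical.

```python
def load_namespaces(text):
    """Split the file contents on whitespace into a list of namespaces."""
    return text.split()


def in_valid_namespace(iri, namespaces):
    """True if the IRI starts with one of the namespaces."""
    return any(iri.startswith(ns) for ns in namespaces)
```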



Data files

Input file format (OWL-format ontology)

efovalidator requires an input file in any of the standard serialization formats of the Web Ontology Language (OWL).

Input file format (unit test file)

The unit test definition file format is as follows:

___THIS_WILL_CHANGE_VERY_SOON___

Output file format

The log file format is as follows:

___THIS_WILL_CHANGE_VERY_SOON___



See also

There are a couple of validators that perform complementary (syntactic) checks:



Issues and support

Issues and feature requests

To request a new feature or if you think you've found a bug, please use the JIRA Tracker.

Support

If you need help using this tool, please email fgpt@ebi.ac.uk.

Contact

For more information or to get involved please email Jon Ison.



Acknowledgements

This tool was developed by Jon Ison and the EBI Functional Genomics Production Team.

We gratefully acknowledge the support of our funders.


