BOOTStrep - Bootstrapping Of Ontologies and Terminologies STrategic REsearch Project
BOOTStrep is a pan-European, interdisciplinary project in the IST program of the EC's Sixth Framework.
The project started in April 2006 and will terminate in March 2009. The overall project budget is 4.2M €.
To exploit existing terminological resources in the biomedical domain to generate a new resource (the BioLexicon) based on a common representation framework. The BioLexicon includes terminology from domain-specific literature to fill gaps in existing resources and is interlinked to ontological concepts (the BioOntology).
To create a repository of biological facts from the literature (the FactStore) via automatic text processing. The FactStore is interoperable with the BioLexicon and the BioOntology.
To develop open access NLP tools for text-based knowledge harvesting in order to support information extraction and text mining in the biomedical domain.
Website for the Gene Regulation Ontology (GRO) is available now!
GRO has been developed as part of the BOOTStrep project and is hosted at the EBI. More information can be found
Beisswanger,E., Lee,V., Kim,J.J., Rebholz-Schuhmann,D., Splendiani,A., Dameron,O., Schulz,S., Hahn,U. Gene Regulation Ontology (GRO): Design Principles and Use Cases. Stud Health Technol Inform. 2008;136:9-14.
The Gene Regulation Ontology (GRO) has been submitted to the Open Biomedical Ontologies (OBO) library!
The BOOTStrep Gene Regulation Ontology has been submitted to the Open Biomedical Ontologies (OBO) library and is currently under review. In the meantime, it can be found at
MedEvi is a novel search engine that retrieves and aligns sentences from Medline abstracts.
MedEvi has been developed as part of the BOOTStrep project. The search engine identifies sentences in Medline abstracts that contain the query terms. All matching sentences are sorted, prioritized and aligned according to the query terms.
Kim,J.J., Pezik,P., and Rebholz-Schuhmann,D. (2008) MedEvi: Retrieving textual evidence of relations between biomedical concepts from Medline. Bioinformatics 2008 (open online access).
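The retrieval idea described for MedEvi can be illustrated with a minimal sketch. This is not MedEvi's actual implementation: the function names (split_sentences, find_evidence, align), the naive sentence splitter, and the ranking by distance between query terms are all assumptions made for illustration only.

```python
import re


def split_sentences(abstract):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", abstract) if s.strip()]


def find_evidence(abstracts, query_terms):
    """Return sentences that contain every query term.

    Sentences are ranked by how close together the terms first occur
    (an assumed prioritisation, not the published one).
    """
    terms = [t.lower() for t in query_terms]
    hits = []
    for abstract in abstracts:
        for sentence in split_sentences(abstract):
            lowered = sentence.lower()
            positions = [lowered.find(t) for t in terms]
            if all(p >= 0 for p in positions):
                span = max(positions) - min(positions)
                hits.append((span, sentence))
    hits.sort(key=lambda hit: hit[0])
    return [sentence for _, sentence in hits]


def align(sentence, anchor_term, width=40):
    """Pad or trim the left context so the anchor term starts at column `width`."""
    idx = sentence.lower().find(anchor_term.lower())
    if idx < 0:
        return sentence  # nothing to align on
    left = sentence[:idx][-width:]
    return left.rjust(width) + sentence[idx:]


if __name__ == "__main__":
    abstracts = [
        "p53 regulates apoptosis. Unrelated sentence here.",
        "In stressed cells, apoptosis is often triggered after p53 is activated.",
    ]
    for s in find_evidence(abstracts, ["p53", "apoptosis"]):
        print(align(s, "apoptosis", width=20))
```

Printing the aligned sentences places the chosen query term in the same column on every line, which makes the surrounding context easy to compare across sentences.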
Evaluation of the Term Repository against standard corpora
The BOOTStrep consortium is developing a lexical resource called the BioLexicon. At present, the core content, called the Term Repository, has been generated and shared with the partners, who augment it with terms from the literature (NaCTeM/UoM) and feed the results into a database schema that fulfils the standard requirements of a lexical resource (CNR, Pisa). The content of the Term Repository has been assessed against the corpus of the BioCreAtIvE II / Task 1b challenge (gene name normalisation).
Pezik, P., Jimeno, A., Lee, V., Rebholz-Schuhmann, D. (2008) Static Dictionary Features for Term Polysemy Identification. In Proceedings of the Language Resources and Evaluation Conference (LREC-2008), workshop on "Building and evaluating resources for biomedical text mining". Marrakech (Morocco), 28-30 May 2008.
Access to the BioLexicon
The content of the BioLexicon is available in different formats:
XML interchange format (XIF): The collected terms are contained in special XML-formatted files, and the whole set of files is called the Term Repository. The different XIF files of the Term Repository can be accessed
The BioLexicon will also be available as a dump of a relational database (MySQL). The database dump has already been generated, but it still requires some maintenance to produce the leanest, most efficient version of the database.
Several resources are available from the project partners (the list and links will be updated until the end of November).