
NET projects' details

BioInvestigation Index

An infrastructure to represent and store, in a common way, the experimental metadata of biological, biomedical and environmental studies. (This project is closely related to the ISA-TAB format, ontology and MIBBI activities described below.)

A PROTOTYPE INSTANCE IS UP AND RUNNING AT: http://www.ebi.ac.uk/bioinvindex


Rationale

Nowadays, researchers are able to perform multi-assay studies in which the same sample is run through the full range of ‘omics (e.g., genomics, transcriptomics, proteomics and metabolomics) and conventional technologies, in combination. For example, consider the reporting of a complex multi-assay study looking at the effect of a compound on a number of subjects by characterizing the urine metabolic profile (by mass spectroscopy), measuring liver protein and gene expression (by mass spectrometry and DNA microarrays, respectively), and conducting conventional histological analysis. It is pivotal that such complex metadata (i.e., sample characteristics, study design, assay execution, sample-data relationships) are reported in a standard manner, so that the final results (data) they contextualize can be correctly interpreted. Relevant EBI systems, such as ArrayExpress, Pride and ENA-Reads, are built to store microarray-based, proteomics and next-generation sequencing-based experiments, respectively. These systems have different submission and download formats, and diverse representations of the metadata and terminologies used. Nothing yet exists to archive metabolomics-based assays and other conventional biomedical/environmental assays. The BioInv Index infrastructure aims to be a single entry point for researchers wishing to deposit their multi-assay study datasets and/or easily download similar datasets.

Infrastructure components

  • ISAcreator, a standalone Java application designed to enable users to structure or edit experimental metadata in ISA-TAB format and package it with the corresponding data files for submission to the BioInv Index database. Its core functionalities include:
    • Excel-like functionalities and look, coupled with dynamic graphical view;
    • support for searching and using OBO Foundry ontologies accessed via OLS;
    • configuration module to define fields that comply with the relevant minimum requirements and/or to customize the allowed values of ISA-TAB fields;
    • wizard to provide a knowledge-driven assisted creation mode.
  • Import layer, a core component performing the following tasks:
    • reads and loads the ISA-TAB formatted experimental metadata into the BioInv Index database;
    • dispatches data files to other EBI databases, such as ArrayExpress, Pride and ENA-Reads, in the formats they require;
    • stores in file archives the data files for which no repository yet exists (e.g. metabolomics and conventional assays).
  • BioInv Index database for storing the experimental metadata and associated data files:
    • its model is based on the ISA-TAB structure, and can be easily mapped to FuGE model;
    • a web interface will enable browsing, retrieval and search by experimental metadata.
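The Import layer's dispatch step can be pictured as a simple routing table. The following is only a minimal Python sketch, assuming each data file has already been tagged with its assay technology; the `DataFile` class, the `route` function and the technology labels are hypothetical and are not taken from the BioInv Index code. Conversion into each repository's submission format is not shown.

```python
from dataclasses import dataclass

# Hypothetical technology labels mapped to the EBI repositories named above.
# The labels are illustrative assumptions, not BioInv Index vocabulary.
REPOSITORY_FOR_TECHNOLOGY = {
    "DNA microarray": "ArrayExpress",
    "protein mass spectrometry": "Pride",
    "next-generation sequencing": "ENA-Reads",
}

# Fallback for assays with no dedicated repository yet
# (e.g. metabolomics and conventional assays).
FILE_ARCHIVE = "file archive"


@dataclass
class DataFile:
    path: str
    technology: str


def route(files):
    """Group data file paths by the repository that should receive them."""
    plan = {}
    for f in files:
        target = REPOSITORY_FOR_TECHNOLOGY.get(f.technology, FILE_ARCHIVE)
        plan.setdefault(target, []).append(f.path)
    return plan


if __name__ == "__main__":
    files = [
        DataFile("liver_1.cel", "DNA microarray"),
        DataFile("liver_1_proteins.raw", "protein mass spectrometry"),
        DataFile("urine_1_profile.txt", "metabolomics"),
    ]
    print(route(files))
    # {'ArrayExpress': ['liver_1.cel'], 'Pride': ['liver_1_proteins.raw'], 'file archive': ['urine_1_profile.txt']}
```

Keeping the mapping in one table means a new repository (for example, a future metabolomics archive) could be supported by adding a single entry rather than changing the routing logic.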

Funding source - EU IP Carcinogenomics

This activity is funded by the EU Integrated Project Carcinogenomics (PL 037712), which aims to develop in vitro toxicogenomics-based predictive models for carcinogenic chemicals. Alternative tests that refine, reduce or replace animal use (the 3 R's) are foreseen by the new EU Chemical Policy REACH and the 7th Amendment to the Cosmetics Directive. The Carcinogenomics datasets will be submitted to the BioInvIndex, curated and stored in the relevant repositories. The datasets will be made public at the end of the Carcinogenomics project and made accessible via the BioInvIndex query interface.

Nutri-BASE

A set of generic and nutrition-specific plugins for BASE, a widely used LIMS, for the annotation and submission of array-based data to ArrayExpress.

Rationale

Array-based nutrigenomics investigations are information intensive and no Laboratory Information Management System (LIMS) exists to annotate both the microarray assay and the conventional nutritional endpoints (e.g. biometrics, clinical chemistry measurements) and export MIAME-compliant information to ArrayExpress.

Plugins for BASE

We have developed Nutri-BASE, a web-service-enabled LIMS based on BASE v2 and customized to meet the annotation requirements of users working in the nutrition domain. We work closely with our user base and the BASE core developers to create:

  • Controlled vocabulary (CV) manager
  • Tab2Mage import and export to ArrayExpress
  • Support for Affymetrix data files
  • Standalone normalization plugin based on RMAExpress
  • Web service to access other Nutri-BASE instances remotely

The latest version can be downloaded from the Nutri-BASE project page.

Funding source - EU NoE NuGO

This activity is funded by the European Nutrigenomics Organisation (NuGO), an EU Network of Excellence that aims to make future nutrigenomics research easier. The partners have designed two custom-made NuGO Affymetrix chips, for mouse and human, and we assisted by selecting the Ensembl gene predictions. The experiments employing these arrays have been annotated and stored in NuGO instances of Nutri-BASE. Ultimately, these experiments and other published datasets will be exported to ArrayExpress.

Minimum Information for Biological and Biomedical Investigations (MIBBI)

A portal for extant minimum information checklists (e.g. MIAME for microarray, MIAPE for proteomics) and a Foundry to promote their gradual integration. (This activity is closely related to the BioInvIndex project described above.)

Rationale

Throughout the biological and biomedical sciences, the need for prescriptive checklists specifying the key information to include when reporting experimental results is growing. However, such 'minimum information' (MI) checklists are usually developed independently, within particular biologically or technologically delineated domains. Consequently, the full range of checklists can be difficult to establish without intensive searching, and tracking their evolution is non-trivial; they are also inevitably partially redundant with one another.

MIBBI components

We work towards a common web-based resource for MI checklists to serve the biomedical and environmental communities working in functional genomics/systems biology, fields that routinely combine information from multiple biological domains and technology platforms. In particular, we work closely with the MI checklist communities and NEBC to develop:

  • Portal, a 'one-stop shop' for researchers, journal editors, reviewers, and funders, providing a quick and simple way to discover (whether there are) guidelines for a particular domain, and what related resources might exist.
  • Foundry, supporting investigation of the boundaries, overlaps and gaps between the checklists and ultimately promoting the creation of orthogonal modules and their gradual integration.
  • MIcheckout tool, compiling the required MI checklist from a set of orthogonal modules.
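The MIcheckout idea of assembling one checklist from modules can be sketched in a few lines of Python. This is a hypothetical illustration only: the module names, their items and the `checkout` function are assumptions, not content from any published MIBBI checklist or from the MIcheckout tool itself. The de-duplication step covers the case where modules are not yet fully orthogonal.

```python
# Hypothetical orthogonal modules, each listing the reporting items it
# requires. Module names and items are illustrative assumptions only.
MODULES = {
    "sample": ["organism", "sample source", "sample treatment"],
    "microarray assay": ["array design", "hybridization protocol"],
    "proteomics assay": ["instrument", "sample treatment"],
}


def checkout(selected):
    """Compile one checklist from the selected modules.

    Items are kept in first-seen order; an item required by more than
    one module (an overlap the Foundry aims to remove) is listed once.
    """
    checklist = []
    seen = set()
    for name in selected:
        for item in MODULES[name]:
            if item not in seen:
                seen.add(item)
                checklist.append(item)
    return checklist


if __name__ == "__main__":
    print(checkout(["sample", "proteomics assay"]))
    # ['organism', 'sample source', 'sample treatment', 'instrument']
```

A study combining several technologies would simply select several assay modules alongside the shared sample module, so the shared items are reported once rather than once per checklist.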

For a full list of international participants, and up to date developments, please refer to the MIBBI webpage and workshop series below.

Funding source - UK NERC Bioinformatics Center and UK BBSRC

This activity is funded by the NERC Bioinformatics Center (NEBC). It started as a collaborative project between the Proteomics Standards Initiative (PSI), the MGED Reporting Structure for Biological Investigation (MGED RSBI) and the Genomic Standards Consortium (GSC).

Ontology

Contribution to the collaborative development of OBI, an ontology for the description of experimental metadata, and of naming conventions for ontology engineering under the OBO Foundry. (This activity is closely related to the BioInvIndex project described above.)

Rationale

The increased cost of interpreting experimental procedures and exploring data has encouraged several scientific communities to develop and adopt ontology-based knowledge representations to extend the power of their computational approaches. A wide variety of ontologies relevant to the biological and medical domains are available through the OBO Foundry portal, and the number of such artifacts is growing rapidly.

Ontology for Biomedical Investigations (OBI)

We work as part of the OBI consortium to develop an integrated ontology for the description of biological and clinical investigations. OBI will include:

  • a set of 'universal' terms that are applicable across various biological and technological domains, and
  • domain-specific terms relevant only to a given domain.

OBI will support the consistent annotation of biomedical investigations, regardless of the particular field of study. The ontology will model the design of an investigation, the protocols and instrumentation used, the materials used, the data generated and the type of analysis performed on it.

For a full list of international participants, and up to date developments, please refer to the OBI webpage and workshop series below.

Open Biomedical Ontologies (OBO) Foundry

Integration of ontologies is extremely desirable. However, heterogeneity in format and style poses serious obstacles to such integration. In particular, inconsistencies in naming conventions can impair the readability and navigability of ontology class hierarchies, and hinder their alignment and integration. While other sources of diversity are tremendously complex and challenging, agreeing on a set of common naming conventions is an achievable goal.

We work with the OBO Foundry coordinators and communities to develop common naming conventions to help the formal annotation of ontologies. A survey has been carried out to establish which naming conventions are currently employed by OBO Foundry ontologies and to determine what their special requirements regarding the naming of entities might be.

For more information on the naming conventions and the survey, please refer to the OBO Foundry webpage and workshop series below.

Text mining methods to enrich ontology

(coming soon!)

Funding source - UK BBSRC and EU NoE Semantic Mining

This activity is funded by the UK BBSRC e-Science Programme (BB/D524283/1). Visits to collaborators are funded by the Semantic Interoperability and Data Mining in Biomedicine (NoE 507505).

ISA-TAB format

Collaborative development, with the MGED RSBI working group, of ISA-TAB, a tab-delimited format for representing experimental metadata that can be rendered in FuGE XML format. (This activity is closely related to the BioInvIndex project described above, and to the workshop series.)

Rationale

Several groups participate in the Functional Genomics Experiment (FuGE) project, which is developing a single generic data model that will underpin a variety of XML-based formats by providing a common framework. However, a complete set of FuGE-based modules (or other interoperable XML formats) to communicate studies employing omics-based technologies is still some way off. Nevertheless, there is a pressing need to report (submit and/or exchange) such complex studies in a standard format.

The Investigation / Study / Assay tab-delimited (ISA-TAB) format

ISA-TAB is a simple format for submitting or exchanging studies that employ omics technologies alongside more conventional methodologies; it assists in structuring metadata and in describing the relationship of samples to data. The ISA-TAB proposal builds on the successful uptake of the MAGE-TAB format, which supports the management, exchange and submission of microarray-based experiment data and metadata. Like MAGE-TAB before it, ISA-TAB is simply a format with which to communicate information. Neither minimum requirements nor the use of controlled terminologies are within the scope of this proposal.
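To illustrate how a tab-delimited table can carry sample-to-data relationships, here is a minimal Python sketch using only the standard `csv` module. The column headers (`Sample Name`, `Assay Name`, `Raw Data File`), the file names and the `sample_to_data` helper are assumptions made for illustration; the ISA-TAB specification defines the actual file layout and column names.

```python
import csv
import io

# A tiny, hypothetical assay table: tab-separated, one row per
# sample-to-data relationship. The headers are assumptions in the style of
# MAGE-TAB/ISA-TAB column names, not a quotation of the specification.
ASSAY_TABLE = (
    "Sample Name\tAssay Name\tRaw Data File\n"
    "liver_1\thyb_1\tliver_1_rep1.cel\n"
    "liver_1\thyb_2\tliver_1_rep2.cel\n"
    "liver_2\thyb_3\tliver_2.cel\n"
)


def sample_to_data(text):
    """Map each sample to the data files derived from it, in row order."""
    links = {}
    for row in csv.DictReader(io.StringIO(text), delimiter="\t"):
        links.setdefault(row["Sample Name"], []).append(row["Raw Data File"])
    return links


if __name__ == "__main__":
    print(sample_to_data(ASSAY_TABLE))
    # {'liver_1': ['liver_1_rep1.cel', 'liver_1_rep2.cel'], 'liver_2': ['liver_2.cel']}
```

Because each row is a plain line of tab-separated text, the same table can be edited in a spreadsheet by users with little bioinformatics support and still be read programmatically.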

Once FuGE-based modules or other interoperable XML formats become available, ISA-TAB can continue to serve those with little or no bioinformatics support, as well as finding utility as a user-friendly presentation layer for XML-based formats (via an XSL transformation).

For a full list of international participants, and up to date developments, please refer to the ISA-TAB webpage and workshop series below.

Funding source - EU IP Carcinogenomics and UK BBSRC

Workshop series on standards and ontologies

A set of workshops on omics standards designed to advance the coordinated development of MIBBI, OBI, FuGE and ISA-TAB. (This activity is closely related to the BioInvIndex project described above.)

Rationale

Standards and ontologies are being developed specifically to target a particular omics technology or a particular biologically delineated community. However, because they remain bounded by a particular discipline, standardisation efforts in general are fragmented and cannot be easily integrated. The reuse of standards and ontologies will benefit the entire scientific community by simplifying the job of data integration. It will also ease the task of software developers, vendors and equipment manufacturers by reducing the time and cost of implementing standards-compliant products. Fortunately, some synergistic activities have already begun to foster the harmonisation and consolidation of reporting standards.

Our workshops are designed to:

  • advance the coordinated development of synergistic reporting standards, including minimal descriptors (MIBBI), format (FuGE) and terminology (OBI and OBO Foundry)
  • identify stable subsets of those projects’ outputs that can be implemented and tested
  • develop interim solutions, e.g. ISA-TAB, to tackle today's need for reporting (submitting and/or exchanging) both data and metadata while these synergistic projects remain works in progress

The integrative elements binding these workshops together are largely informed by the requirements of the BioInvIndex project and the opinions of the several standards working groups.

Represented communities and deliverables

Individuals to be brought together are the key developers of the standards working groups operating in the functional genomics domain, namely:

We plan to document the progress and outcomes of the workshops and to publish reports in open access journals. Standards are effective only if they are widely used, and thus it is crucial that the process of creating and refining reporting standards, and of testing them in working systems, is publicly documented in an effective manner. The long-term deliverables are the various standards themselves, which must be formulated in ways that are clear and understandable to the relevant target communities (both end users and other developers).

Workshops schedule and reports

We plan to hold several workshops until June 2010:

Funding source - UK BBSRC and EU Elixir

This activity is funded by the UK BBSRC e-Science Programme (BB/E025080/1) and partly by the Elixir project.
