
Cheminformatics and Metabolism Team - Research

The understanding and simulation of metabolic networks is currently hindered by a significant lack of information on the structural identity and physical properties of the biochemical metabolites in the organisms under investigation. Methods developed by our group make it possible to determine the structure of metabolites quickly by stochastic screening of large candidate spaces, using spectroscopic data [12,16,7,4] (note 1). Our SENECA system is built around a stochastic structure generator that is guided by a spectroscopy-based scoring function.
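The idea of a stochastic generator steered by a spectrum-based score can be illustrated with a small toy search. The sketch below is not SENECA's implementation: it assumes a simulated-annealing acceptance rule, represents a candidate as a plain list of atom types rather than a molecular graph, and uses a hypothetical lookup table (TOY_SHIFTS) in place of a real spectrum prediction. All names and numbers in it are illustrative assumptions.

```python
import math
import random

# Hypothetical toy "prediction": each atom type maps to one 13C shift (ppm).
# These values are illustrative only, not taken from any database.
TOY_SHIFTS = {"CH3": 15.0, "CH2": 30.0, "CH": 45.0, "C_ar": 128.0, "C=O": 170.0}


def predict_spectrum(candidate):
    """Return the sorted toy shifts for a candidate (a list of atom types)."""
    return sorted(TOY_SHIFTS[atom] for atom in candidate)


def score(candidate, observed):
    """Higher is better: negative summed deviation between sorted peak lists."""
    predicted = predict_spectrum(candidate)
    return -sum(abs(p - o) for p, o in zip(predicted, sorted(observed)))


def mutate(candidate, rng):
    """Stochastic move: replace one randomly chosen atom type."""
    new = list(candidate)
    new[rng.randrange(len(new))] = rng.choice(list(TOY_SHIFTS))
    return new


def stochastic_search(observed, steps=2000, t_start=50.0, t_end=0.1, seed=0):
    """Simulated-annealing style search for the candidate that best fits `observed`."""
    rng = random.Random(seed)
    current = [rng.choice(list(TOY_SHIFTS)) for _ in observed]
    current_score = score(current, observed)
    best, best_score = current, current_score
    for step in range(steps):
        # Geometric cooling from t_start towards t_end.
        temperature = t_start * (t_end / t_start) ** (step / steps)
        proposal = mutate(current, rng)
        proposal_score = score(proposal, observed)
        delta = proposal_score - current_score
        # Always accept improvements; accept worse moves with Boltzmann probability.
        if delta >= 0 or rng.random() < math.exp(delta / temperature):
            current, current_score = proposal, proposal_score
            if current_score > best_score:
                best, best_score = current, current_score
    return best, best_score
```

Calling stochastic_search([15.0, 30.0, 30.0, 170.0]) returns the best candidate found and its score. In a real system the mutation step would rearrange bonds in a molecular graph, and the score would come from predicted spectra rather than a fixed table.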

To perform this scoring, we need fast and accurate methods for predicting mass and NMR spectra. Here, we employ machine learning methods such as support vector machines to correlate graph-based molecular descriptors with database knowledge [26,25]. The resulting prediction engines then serve as judges in the SENECA scoring function and can also be used independently of it.
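To make the link between graph-based descriptors and database knowledge concrete, here is a minimal sketch. It assumes a very simple atom descriptor (element counts one and two bonds away from the atom of interest) and, to stay dependency-free, swaps the support vector machine for a nearest-neighbour lookup. The function names, the heavy-atom-only graphs, and the training shifts are hypothetical and chosen only for illustration.

```python
from collections import Counter


def atom_descriptor(atoms, bonds, index, elements=("C", "H", "O", "N")):
    """Count the elements one and two bonds away from atom `index`."""
    neighbours = {i: set() for i in range(len(atoms))}
    for a, b in bonds:
        neighbours[a].add(b)
        neighbours[b].add(a)
    first = neighbours[index]
    # Second sphere: neighbours of neighbours, excluding the first sphere and the atom itself.
    second = set().union(*(neighbours[n] for n in first)) - first - {index}
    count_first = Counter(atoms[i] for i in first)
    count_second = Counter(atoms[i] for i in second)
    return (tuple(count_first[e] for e in elements)
            + tuple(count_second[e] for e in elements))


def predict_shift(descriptor, training):
    """Nearest-neighbour stand-in for a trained regressor (not an SVM)."""
    def squared_distance(other):
        return sum((x - y) ** 2 for x, y in zip(descriptor, other))
    _, nearest_shift = min(training, key=lambda pair: squared_distance(pair[0]))
    return nearest_shift


# Toy "database": heavy-atom graph of a C-C-O chain with illustrative shifts (ppm).
chain_atoms = ["C", "C", "O"]
chain_bonds = [(0, 1), (1, 2)]
training = [
    (atom_descriptor(chain_atoms, chain_bonds, 0), 18.0),  # terminal carbon
    (atom_descriptor(chain_atoms, chain_bonds, 1), 58.0),  # carbon bonded to oxygen
]

# Query: the oxygen-bearing carbon of a C-C-C-O chain.
query_atoms = ["C", "C", "C", "O"]
query_bonds = [(0, 1), (1, 2), (2, 3)]
query = atom_descriptor(query_atoms, query_bonds, 2)
predicted = predict_shift(query, training)
```

In this toy case the query carbon is matched to the training carbon that is also bonded to oxygen. A real prediction engine would use richer descriptors and a regression model trained on many database entries.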

In this context, a concerted effort is required to create a database of biological metabolites and their spectral and physicochemical properties for systems biology. Such a database will, for example, serve as a source of dereplication (note 2) data and of training data for our spectrum prediction engines. In this data warehouse, data from diverse sources (various analytical and spectroscopic instruments, NMR, MS, LC, GC) will need to be integrated and combined with existing knowledge from systems biology databases. Two of our projects, one past and one current, will help us to set up such a repository or to contribute to its development: our open access, open submission, open source database NMRShiftDB for organic structures and their NMR data [17,15], and our current effort to create CMLSpect, a standard markup language for the representation of spectroscopic data within the framework of the Chemical Markup Language (CML), developed together with Cambridge University, UK [22].
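As a rough illustration of how such a database could support dereplication, the sketch below compares an observed peak list with stored reference peak lists and reports the entries that explain every observed peak within a tolerance. The matching rule, the tolerance, and the library entries (compound_A and so on) are hypothetical assumptions, not part of NMRShiftDB or CMLSpect.

```python
def match_fraction(query, reference, tolerance=0.5):
    """Fraction of query peaks that find an unused reference peak within `tolerance` (ppm)."""
    unused = sorted(reference)
    hits = 0
    for peak in sorted(query):
        for i, ref in enumerate(unused):
            if abs(peak - ref) <= tolerance:
                hits += 1
                del unused[i]  # each reference peak may be matched only once
                break
    return hits / len(query) if query else 0.0


def dereplicate(query, library, tolerance=0.5, threshold=1.0):
    """Return names of library entries whose peak lists match the query, best first."""
    scored = [
        (match_fraction(query, peaks, tolerance), name)
        for name, peaks in library.items()
        if len(peaks) == len(query)  # only compare spectra with the same peak count
    ]
    return [name for fraction, name in sorted(scored, reverse=True)
            if fraction >= threshold]


# Hypothetical reference library: name -> list of 13C shifts (ppm), illustrative values.
library = {
    "compound_A": [18.0, 58.0],
    "compound_B": [25.0, 25.0, 64.0],
    "compound_C": [18.0, 70.0],
}

known = dereplicate([18.2, 57.8], library)
```

Here only compound_A accounts for both observed peaks, so it is the single candidate returned. A production system would also need to handle intensities, multiplicities, and data from several instrument types.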


1 Numbers in square brackets refer to entries in my publication list.

2 Dereplication: the fast identification of known metabolites based on their spectroscopic fingerprint.