
The understanding and simulation of metabolic networks is currently
hindered by a significant lack of information on the structural
identity and physical properties of biochemical metabolites in
organisms under investigation. Methods developed by our group provide
means to quickly determine the structure of metabolites by
stochastic screening of large candidate spaces based on spectroscopic
methods [12,16,7,4].
Our so-called SENECA system is based on a stochastic
structure generator which is guided by a spectroscopy-based scoring
function.
In order to perform this scoring, we need precise and fast methods for
the prediction of mass and NMR spectra.
Here, we employ machine learning methods such as support vector
machines to correlate graph-based molecular descriptors with database
knowledge [26,25]. The resulting prediction engines are then used as
judges in our SENECA scoring function or elsewhere.
In this context, a concerted effort is required to create a database of
biological metabolites and their spectral and physicochemical
properties for systems biology. Such a database will, for example,
serve as a source of dereplication data as well as of training data
for our spectrum prediction engines.
In this data warehouse, data from diverse sources (various
analytical and spectroscopic instruments, NMR, MS, LC, GC) will need
to be integrated and combined with already existing knowledge from
systems biology databases. Two past and current projects will aid us
in instantiating or contributing to the development of such a
repository: our open-access, open-submission, open-source
database NMRShiftDB for organic structures and their NMR data [17,15],
and our current effort to create CMLSpect, a standard markup language
for the representation of spectroscopic data within the framework of
the Chemical Markup Language (CML), developed together with
Cambridge University, UK [22].
Numbers in square brackets refer to my publication list.
The fast identification of known metabolites based on their spectroscopic fingerprint
