EDITtoTrEMBL: A distributed approach to high-quality automated protein sequence annotation.
Many databases in molecular biology face the problem that
the ever-increasing rate of data production can no longer be handled
by traditional methods, especially human curation. Therefore,
a number of projects are currently investigating methods for
automated sequence annotation. This paper describes the EBI's
approach to tackling this problem for protein sequences by integrating
arbitrary analysis programs into a distributed and highly
flexible environment. Our software framework allows sequences to be
treated individually according to their particular properties,
which is achieved through a high-level description of the
preconditions and capabilities of the analysis modules. This not
only improves the overall performance of the annotation process, as
unnecessary steps are avoided, but also enhances its quality since
dependencies between different modules are taken into account. We
have implemented a prototype and use it in the
production of TrEMBL releases.
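
The idea of selecting modules by their declared preconditions and capabilities can be illustrated with a minimal sketch. It is not the EDITtoTrEMBL implementation: the `Module` class, the `plan` function, the module names and the fact labels are all hypothetical, and the sketch assumes for simplicity that a module always yields the facts it declares.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Set


@dataclass(frozen=True)
class Module:
    """Hypothetical high-level description of an analysis module."""
    name: str
    requires: FrozenSet[str]  # preconditions: facts that must hold for the sequence
    provides: FrozenSet[str]  # capabilities: facts assumed to be added after a run


def plan(initial_facts: Set[str], modules: List[Module]) -> List[str]:
    """Return the names of the applicable modules in a valid execution order.

    A module is scheduled once all of its preconditions are known facts.
    Modules whose preconditions can never be met are skipped entirely.
    """
    facts = set(initial_facts)
    pending = list(modules)
    order: List[str] = []
    progress = True
    while progress:
        progress = False
        for module in list(pending):  # iterate over a copy; pending is mutated
            if module.requires <= facts:
                order.append(module.name)
                facts |= module.provides
                pending.remove(module)
                progress = True
    return order


# Illustrative module descriptions (assumed names, not taken from the paper).
MODULES = [
    Module("localisation_annotator",
           frozenset({"signal_peptide"}), frozenset({"subcellular_location"})),
    Module("signal_peptide_predictor",
           frozenset({"eukaryote"}), frozenset({"signal_peptide"})),
    Module("transmembrane_predictor",
           frozenset({"protein_sequence"}), frozenset({"transmembrane"})),
    Module("prokaryote_specific_analysis",
           frozenset({"prokaryote"}), frozenset({"prokaryote_feature"})),
]

if __name__ == "__main__":
    print(plan({"protein_sequence", "eukaryote"}, MODULES))
```

For a eukaryotic sequence, the prokaryote-specific module is never run, and the localisation module is scheduled only after the module it depends on. This mirrors the two benefits named above: unnecessary steps are avoided and dependencies between modules are respected.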