
EDITtoTrEMBL: A distributed approach to high-quality automated protein sequence annotation.

 

  Many databases in molecular biology face the problem that the ever-increasing rate of data production can no longer be handled with traditional methods, especially human curation. A number of projects are therefore investigating methods for automated sequence annotation. This paper describes the EBI's approach to this problem for protein sequences: the integration of arbitrary analysis programs into a distributed and highly flexible environment. Our software framework treats each sequence individually according to its particular properties, which is achieved through a high-level description of the preconditions and capabilities of the analysis modules. This not only improves the overall performance of the annotation process, since unnecessary steps are avoided, but also enhances its quality, since dependencies between different modules are taken into account. We have implemented a prototype and use it in the production of TrEMBL releases.
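  The idea of describing analysis modules by their preconditions and capabilities can be illustrated with a minimal Python sketch. It is not the EDITtoTrEMBL implementation: the names Module, annotate and MODULES, the fact keys ("sequence", "eukaryote", "signal_peptide", "keyword", "length") and the toy analysis functions are all hypothetical and chosen only for illustration. Each module declares which facts about an entry it requires and which it provides; the dispatcher runs a module only once its requirements are met, so inapplicable steps are skipped and dependent modules run in a valid order.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, List, Tuple

Facts = Dict[str, object]


@dataclass(frozen=True)
class Module:
    """Hypothetical description of one analysis module."""
    name: str
    requires: FrozenSet[str]       # preconditions: facts that must be known
    provides: FrozenSet[str]       # capabilities: facts the module can add
    run: Callable[[Facts], Facts]  # toy analysis returning new facts


def annotate(entry: Facts, modules: List[Module]) -> Tuple[Facts, List[str]]:
    """Run every applicable module once; return the facts and execution order."""
    facts = dict(entry)
    executed: List[str] = []
    pending = list(modules)
    while True:
        # A module is runnable if its preconditions hold and it can still
        # contribute at least one fact that is not yet known.
        runnable = [m for m in pending
                    if m.requires.issubset(facts)
                    and not m.provides.issubset(facts)]
        if not runnable:
            break
        module = runnable[0]
        facts.update(module.run(facts))
        executed.append(module.name)
        pending.remove(module)  # each module runs at most once
    return facts, executed


# Illustrative modules (assumptions), deliberately listed out of order.
MODULES = [
    Module("keyword_assignment",
           frozenset({"signal_peptide"}), frozenset({"keyword"}),
           lambda f: {"keyword": "Signal" if f["signal_peptide"] else None}),
    Module("signal_peptide_check",
           frozenset({"sequence", "eukaryote"}), frozenset({"signal_peptide"}),
           # Placeholder test, not a real signal peptide predictor.
           lambda f: {"signal_peptide": str(f["sequence"]).startswith("M")}),
    Module("length",
           frozenset({"sequence"}), frozenset({"length"}),
           lambda f: {"length": len(str(f["sequence"]))}),
]

if __name__ == "__main__":
    print(annotate({"sequence": "MKT"}, MODULES)[1])
    print(annotate({"sequence": "MKT", "eukaryote": True}, MODULES)[1])
```

  In this sketch, an entry that lacks the "eukaryote" fact triggers only the length module, whereas an entry that carries it also triggers the signal peptide check and then the keyword assignment that depends on its result.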


