GOA - Project Outline

Continual advances in proteome research have led to an increase in the number of sequences, from a wide range of species, that require addition to the UniProtKB/Swiss-Prot Protein Knowledgebase and its supplement, UniProtKB/TrEMBL (Bairoch and Apweiler, 2000); the majority of these lack functional characterisation. To fully exploit the potential of this vast quantity of data, the UniProtKB/Swiss-Prot group has intensified its efforts to capture all available biological information related to these sequences and, in particular, to the human proteome.

Crucial to our efforts is the integration of in-house resources with those of external database groups. Integration and data exchange require resolving the inconsistencies that exist between databases. For example, the use of different vocabularies to describe gene function can hinder searching across multiple proteins and species for common characteristics, whereas a common vocabulary facilitates the identification of relationships and shared properties between gene products from different species. This problem has been addressed by the creation of the Gene Ontology resource (The Gene Ontology Consortium, 2001), a dynamic controlled vocabulary that can be applied to all organisms even as knowledge of gene and protein roles in cells accumulates and changes.

The Gene Ontology Consortium has developed three separate ontologies (molecular function, biological process and cellular component) to describe gene products, and these allow molecular characteristics to be annotated across species. Each vocabulary is structured as a directed acyclic graph (DAG), in which any term may have more than one parent as well as zero, one, or more children. This allows a much richer description of biology than would be possible with a strict hierarchy, in which each term has only one parent. Currently the GO vocabulary consists of more than 17,000 terms, all of which will, in time, have strict definitions for their usage.
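The directed acyclic graph structure described above can be illustrated with a minimal Python sketch. The term names (`root`, `a`, `b`, `c`, `d`) and the helper functions `ancestors` and `children` are hypothetical and chosen only for illustration; they are not real GO terms or part of any GO software.

```python
from collections import deque

# Hypothetical toy vocabulary: each term maps to its direct parents.
# The names are illustrative only and are not real GO identifiers.
PARENTS = {
    "root": [],
    "a": ["root"],
    "b": ["root"],
    "c": ["a"],
    "d": ["a", "b"],  # a term with more than one parent
}


def ancestors(term, parents=PARENTS):
    """Return every term reachable from `term` by following parent links."""
    seen = set()
    queue = deque(parents[term])
    while queue:
        current = queue.popleft()
        if current not in seen:
            seen.add(current)
            queue.extend(parents[current])
    return seen


def children(term, parents=PARENTS):
    """Return the direct children of `term`; the set may be empty."""
    return {child for child, direct in parents.items() if term in direct}
```

In this sketch, `ancestors("d")` collects terms along both parent paths, which is what distinguishes a DAG from a tree: an annotation made to `d` would be implied for `a`, `b` and `root` alike.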
UniProtKB/Swiss-Prot has joined the Gene Ontology (GO) Consortium and has adopted its standard vocabulary to characterise the activities of proteins in the UniProtKB/Swiss-Prot, UniProtKB/TrEMBL and InterPro (Apweiler et al., 2001) databases. The group has initiated the Gene Ontology Annotation (GOA) project to provide assignments of GO terms to gene products for all organisms with completely sequenced genomes, using a combination of electronic assignment and manual annotation. By annotating all characterised proteins with GO terms and facilitating the transfer of this knowledge to similar uncharacterised proteins, the UniProtKB/Swiss-Prot group anticipates making a valuable contribution to biotechnological research through a better understanding of all proteomes.

The GOA project is supported by Grants QRLT-2001-00015 and QLRI-2000-00981 of the European Commission and a supplementary NIH grant, 1R01HGO2273-01.

Contact

For information, comments or suggestions, please email us at goa@ebi.ac.uk