What is UniProt-GOA?

Why do we need Gene Ontology annotation?

As proteomics research gains momentum, biologists need new ways to access and analyse information on proteins. To exploit the potential of these data fully, we need to capture all the available biological information related to each protein, including consistent descriptions of protein function. The process of adding extra biological information to the data is called annotation.

What is the UniProt-Gene Ontology Annotation project?

The Gene Ontology (GO) is a dynamic controlled vocabulary that details the biological processes, molecular functions and cellular components of a generic cell. The UniProt-Gene Ontology Annotation (UniProt-GOA) project uses GO to describe proteins in UniProtKB. The project has assigned GO terms to all complete and incomplete proteomes that exist in UniProtKB, using a combination of manual and automatic annotation. For more information about the Gene Ontology, see the GO quick tour.

Manual annotation is carried out by curators who directly assign GO terms to proteins based on evidence from the scientific literature. Each manually determined GO association is given an evidence code that describes the type of evidence in the literature supporting the annotation. More information about our manual annotation procedure can be found on the UniProt-GOA website.

Automatic annotation is a rapid way of assigning GO terms to gene products on a large scale, and the UniProt-GOA project is the main producer of automatic annotations in the GO Consortium. Details of the different automatic annotation pipelines can be found on the UniProt-GOA website.
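To make the distinction between manual and automatic annotation concrete, the following minimal Python sketch counts the two kinds of annotation in a GO annotation file. It assumes the tab-separated GAF layout used by the GO Consortium, in which lines starting with "!" are comments, the second column holds the protein identifier, the fifth the GO term, the seventh the evidence code and the ninth the GO aspect. It also assumes that the evidence code IEA marks electronically inferred annotations and treats every other code as manual, which is a simplification. The function names and the sample rows are invented for illustration and are not taken from UniProt-GOA.

```python
from collections import Counter

# Evidence code assumed to mark electronically inferred (automatic) annotations.
ELECTRONIC_CODE = "IEA"


def parse_gaf_line(line):
    """Return (protein_id, go_id, evidence_code, aspect), or None for comment/blank lines."""
    if not line.strip() or line.startswith("!"):
        return None
    cols = line.rstrip("\n").split("\t")
    # Zero-based indices 1, 4, 6 and 8 correspond to GAF columns 2, 5, 7 and 9.
    return cols[1], cols[4], cols[6], cols[8]


def count_by_method(lines):
    """Count annotations as 'automatic' (IEA) or 'manual' (any other evidence code)."""
    counts = Counter()
    for line in lines:
        record = parse_gaf_line(line)
        if record is None:
            continue
        method = "automatic" if record[2] == ELECTRONIC_CODE else "manual"
        counts[method] += 1
    return counts


# Invented sample rows (placeholder accessions and references), for illustration only.
SAMPLE_ROWS = [
    "!gaf-version: 2.2",
    "\t".join(["UniProtKB", "P00001", "GENE1", "enables", "GO:0003677",
               "PMID:0000000", "IDA", "", "F", "", "", "protein",
               "taxon:9606", "20200101", "UniProt"]),
    "\t".join(["UniProtKB", "P00002", "GENE2", "located_in", "GO:0005634",
               "GO_REF:0000002", "IEA", "", "C", "", "", "protein",
               "taxon:9606", "20200101", "InterPro"]),
]

if __name__ == "__main__":
    print(count_by_method(SAMPLE_ROWS))
```

The same functions could be applied to a downloaded annotation file by passing an open file handle to count_by_method, since it only needs an iterable of lines.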

UniProt-GOA is updated on a monthly basis, in accordance with the latest data released by UniProtKB, Ensembl, Ensembl Genomes and InterPro (Figure 1). GO annotations are also imported from other members of the GO Consortium and its collaborators.

By annotating all characterised proteins with GO terms and helping to transfer this knowledge to similar uncharacterised proteins, we hope to contribute to a better understanding of all proteomes. The success of GO can be measured by the number of databases that use it to annotate and exchange biological knowledge. The UniProt-GOA project has made an important contribution to this global effort.

 

Figure 1. Sources and flow of data for the UniProt-GOA resource. UniProt-GOA contains protein sequences from UniProtKB, which are annotated (both manually and automatically) using GO terms. Data can be downloaded as complete files or filtered annotation sets. Data can be searched using QuickGO. All data are exchanged with the GO Consortium.