Manual Annotation Efforts

Manual annotation is the direct assignment of GO terms to proteins by curators, based on evidence extracted while reviewing the published scientific literature. Each manually made GO association is supplied with one of 13 evidence codes, which describes the broad category of evidence in the cited paper that supports the annotation. More information on evidence codes can be found in the Evidence Code Guide.

The UniProt-GOA group is involved in several manual annotation projects, each of which has its own priority list of proteins.


In addition, if you have any gene products which have not been manually annotated recently, please e-mail goa@ebi.ac.uk and we will endeavour to add them to our priority lists.

UniProt-GOA’s Manual Annotation Workflow

(1) UniProt-GOA curators are involved in several projects, each with its own protein target list, so we tend to annotate on a protein-centric rather than a paper-centric basis. As the UniProt-GOA project provides GO annotation to the UniProt Knowledgebase, UniProtKB accessions are the primary sequence identifiers used. We also annotate to protein isoforms (e.g. Q4VCS5-2).
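As a minimal sketch, assuming Python 3 and a hypothetical helper named `parse_protein_id`, the following splits an identifier such as Q4VCS5-2 into its UniProtKB accession and isoform number. The accession pattern follows the format published in the UniProt documentation; treating the `-N` suffix as an isoform number is an assumption based on the example above.

```python
import re
from typing import NamedTuple, Optional

# UniProtKB accession format, plus an optional "-N" isoform suffix.
ACCESSION_RE = re.compile(
    r"^(?P<acc>[OPQ][0-9][A-Z0-9]{3}[0-9]"
    r"|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2})"
    r"(?:-(?P<isoform>[0-9]+))?$"
)


class ProteinId(NamedTuple):
    accession: str          # canonical UniProtKB accession
    isoform: Optional[int]  # isoform number, or None for the canonical entry


def parse_protein_id(identifier: str) -> ProteinId:
    """Split an identifier such as 'Q4VCS5-2' into accession and isoform."""
    match = ACCESSION_RE.match(identifier.strip())
    if match is None:
        raise ValueError(f"not a UniProtKB accession: {identifier!r}")
    isoform = match.group("isoform")
    return ProteinId(match.group("acc"), int(isoform) if isoform else None)


print(parse_protein_id("Q4VCS5-2"))  # ProteinId(accession='Q4VCS5', isoform=2)
print(parse_protein_id("Q4VCS5"))    # ProteinId(accession='Q4VCS5', isoform=None)
```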

(2) Literature searches are usually performed using resources such as PubMed, CiteXplore and iHOP [1]. A search for a well-studied human protein frequently returns hundreds of papers, most of which will not be appropriate for GO annotation: for example, clinical papers, or papers retrieved only because the gene name is also a commonly used word (e.g. ‘OAT’). In such cases curators may decide to focus their curation efforts on a range of the most recent articles to get an overview of the protein’s functional attributes.
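A search that returns hundreds of hits can be narrowed to the most recent papers roughly as below. This is only a sketch: `SearchHit` and `most_recent` are hypothetical names, the hits are assumed to have already been retrieved from a literature resource, and the identifiers in the usage lines are placeholders.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class SearchHit:
    pmid: str   # PubMed identifier of the paper
    year: int   # publication year
    title: str


def most_recent(hits: list[SearchHit], limit: int) -> list[SearchHit]:
    """Return at most `limit` hits, newest first."""
    # sorted() is stable (also with reverse=True), so same-year hits
    # keep their original search-result order.
    return sorted(hits, key=lambda hit: hit.year, reverse=True)[:limit]


hits = [
    SearchHit("pmid-a", 2003, "Older study"),
    SearchHit("pmid-b", 2008, "Recent study"),
    SearchHit("pmid-c", 2006, "Intermediate study"),
]
print([hit.pmid for hit in most_recent(hits, 2)])  # ['pmid-b', 'pmid-c']
```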

(3) Triage usually starts with the curator scanning the titles of the papers returned by the literature search; if a title looks promising, the curator goes on to read the abstract. The abstract is usually enough to tell whether the paper will be useful for GO annotation and, if so, the paper is saved.
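The title-then-abstract triage can be expressed as a two-stage filter. In this hypothetical sketch the screening decisions are passed in as functions (`title_ok`, `abstract_ok`) standing in for the curator’s judgement; the point is only that the abstract is consulted after the title has passed.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable


@dataclass
class Paper:
    pmid: str
    title: str
    abstract: str


# Stand-in for the curator's yes/no judgement on a piece of text.
Screen = Callable[[str], bool]


def triage(papers: list[Paper], title_ok: Screen, abstract_ok: Screen) -> list[Paper]:
    """Keep papers whose title and then abstract both pass screening."""
    saved = []
    for paper in papers:
        if not title_ok(paper.title):
            continue  # rejected on the title; the abstract is never read
        if abstract_ok(paper.abstract):
            saved.append(paper)  # saved for reading in full
    return saved
```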

(4) If a curator has exhaustively searched for experimental evidence appropriate for GO annotation in one or all of the ontologies and found none, an annotation can be made to the root term of the ontology (e.g. molecular function) using the evidence code ‘ND’ (No biological Data available). In the molecular function example, this indicates that, as of the date of the annotation, no evidence was available for any molecular function of this protein.
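A minimal sketch of the ND rule, assuming that the ontologies in which evidence was found are tracked as a set of aspect names (a hypothetical representation). The three identifiers are the standard GO root terms; the function name `nd_annotations` and the dictionary layout are illustrative only, and the reference identifier that a real annotation also needs is omitted for brevity.

```python
# Root terms of the three GO ontologies.
ROOT_TERMS = {
    "molecular_function": "GO:0003674",
    "biological_process": "GO:0008150",
    "cellular_component": "GO:0005575",
}


def nd_annotations(accession: str, aspects_with_evidence: set, date: str) -> list:
    """Build an ND annotation to the root of every ontology lacking evidence."""
    return [
        {"accession": accession, "go_id": go_id, "evidence": "ND", "date": date}
        for aspect, go_id in ROOT_TERMS.items()
        if aspect not in aspects_with_evidence
    ]
```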

(5) Papers are read in full. When reading a paper, curators must identify the taxonomic origin of the protein being described; this may involve following up references to methods in a different paper or, in some cases, contacting the author.
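The taxon check can be thought of as a three-way decision. This sketch assumes taxa are recorded as NCBI Taxonomy identifiers (e.g. 9606 for human, 10090 for mouse) and uses a hypothetical function name; an unknown taxon is flagged for follow-up rather than guessed.

```python
from typing import Optional


def taxon_status(entry_taxon: int, paper_taxon: Optional[int]) -> str:
    """Compare the entry's NCBI taxon ID with the taxon described in the paper."""
    if paper_taxon is None:
        # Origin not stated: check cited methods papers or contact the author.
        return "follow-up"
    return "match" if paper_taxon == entry_taxon else "mismatch"


print(taxon_status(9606, 10090))  # mismatch: human entry, mouse protein in paper
```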

(6) When evidence for a GO annotation has been identified, the curator must find an appropriate GO term, for example using a GO browser such as QuickGO. The curator looks for the most specific term supported by the evidence presented in the paper.
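Choosing the most specific term amounts to discarding any candidate that is an ancestor of another candidate, since an annotation to a child term already implies its parents. The sketch below uses hypothetical function names and a toy hierarchy with a few term names, heavily simplified relative to the real GO graph, and follows only plain parent links.

```python
from __future__ import annotations


def ancestors(term: str, parents: dict[str, set[str]]) -> set[str]:
    """Return every term reachable from `term` by following parent links."""
    seen: set[str] = set()
    stack = list(parents.get(term, ()))
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(parents.get(parent, ()))
    return seen


def most_specific(candidates: set[str], parents: dict[str, set[str]]) -> set[str]:
    """Drop every candidate that is an ancestor of another candidate."""
    implied: set[str] = set()
    for term in candidates:
        implied |= ancestors(term, parents)
    return candidates - implied


# Toy hierarchy for illustration only; not the real GO structure.
toy_parents = {
    "protein kinase activity": {"kinase activity"},
    "kinase activity": {"catalytic activity"},
    "catalytic activity": {"molecular_function"},
}
print(most_specific({"kinase activity", "protein kinase activity"}, toy_parents))
# {'protein kinase activity'}
```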

(7) Depending on the type of experiment or statement presented in the paper, an appropriate evidence code must be chosen to support the association between the GO term and the protein accession. Details of the available evidence codes can be found in the Evidence Code Guide. Preference is given to annotations with an experimental evidence code.
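One way to express the preference for experimental evidence is to sort candidate annotations so that experimental codes come first. The set below lists the GO experimental evidence codes (EXP, IDA, IPI, IMP, IGI, IEP); the dictionary-based annotation format and the function name are assumptions for illustration.

```python
# Experimental evidence codes used in GO annotation.
EXPERIMENTAL = {"EXP", "IDA", "IPI", "IMP", "IGI", "IEP"}


def prefer_experimental(annotations: list) -> list:
    """Return annotations with experimentally supported ones first."""
    # False sorts before True; sorted() keeps the original order within each group.
    return sorted(annotations, key=lambda ann: ann["evidence"] not in EXPERIMENTAL)


candidates = [
    {"term": "A", "evidence": "TAS"},
    {"term": "B", "evidence": "IDA"},
]
print([ann["evidence"] for ann in prefer_experimental(candidates)])  # ['IDA', 'TAS']
```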

(8) The core GO annotation, which minimally includes a GO term, an evidence code and a reference identifier, is entered into UniProt-GOA’s annotation tool, Protein2GO. Each annotation is automatically given a creation date.
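The minimal core annotation can be modelled as a small record whose creation date is filled in automatically. This is a hypothetical data structure for illustration and does not describe Protein2GO’s actual internal schema.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass(frozen=True)
class CoreAnnotation:
    accession: str   # UniProtKB accession, optionally with an isoform suffix
    go_id: str       # GO term identifier, e.g. "GO:0005575"
    evidence: str    # evidence code, e.g. "IDA"
    reference: str   # identifier of the supporting paper, e.g. a PubMed ID
    created: date = field(default_factory=date.today)  # set automatically
```

Making the record frozen means an entered annotation cannot be silently edited in place; a change would create a new record.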

(9) Once a protein has been comprehensively annotated, curators perform a BLAST search to find highly similar proteins. Appropriate annotations may then be copied to similar proteins in less well-studied species, increasing the coverage of proteins that would otherwise have no manual GO annotation. Copied annotations are given the evidence code ‘ISS’ (Inferred from Sequence or Structural Similarity).
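The copying step can be sketched as follows, assuming BLAST results have already been parsed into hypothetical `BlastHit` records. The identity and coverage thresholds are arbitrary placeholders, not UniProt-GOA cut-offs, and in practice the curator decides which annotations are appropriate to copy. The sketch records the source protein in a with/from value, as GO annotation files do for ISS annotations, and skips ND annotations, which record an absence of evidence rather than a transferable function.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class BlastHit:
    accession: str    # similar protein, e.g. from a less well-studied species
    identity: float   # percent identity to the annotated protein
    coverage: float   # fraction of the annotated protein covered by the alignment


def transfer_iss(source: str, annotations: list[dict], hits: list[BlastHit],
                 min_identity: float = 90.0, min_coverage: float = 0.9) -> list[dict]:
    """Copy annotations from `source` to sufficiently similar hits as ISS."""
    copied = []
    for hit in hits:
        if hit.accession == source:
            continue  # the query usually appears among its own hits
        if hit.identity < min_identity or hit.coverage < min_coverage:
            continue  # not similar enough to justify transfer
        for ann in annotations:
            if ann["evidence"] == "ND":
                continue  # ND records absence of evidence; nothing to transfer
            copied.append({
                "accession": hit.accession,
                "go_id": ann["go_id"],
                "evidence": "ISS",
                "with_from": f"UniProtKB:{source}",  # protein the annotation came from
            })
    return copied
```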


[1] Hoffmann, R., Valencia, A.
A gene network for navigating the literature.
Nature Genetics 36, 664 (2004).