- Which file should I use to get the fullest set of GO annotations for a species?
- Why can't I see an annotation in a UniProtKB record when it appears in the gene association file?
- Why are the species-specific files and the UniProt multi-species gene association file different?
- How do I change between UniProt accessions and other identifiers, e.g. Ensembl, EMBL, RefSeq Gene ID, UniGene?
- What GO tools can I use to display/compare annotations to my selected proteins/genes?
- Is it correct to assume that gene products annotated to a child of a GO term will automatically be considered part of the parent term?
- Who do I contact if there is an annotation error in one of your files?
- How do I download a bulk set of GO annotations? What formats are there?
- GO slims - What are they? How can I make one?
- What do the evidence codes mean?
- How do I cite GOA?
1. Which file should I use to get the fullest set of GO annotations for a species?
In the GO Consortium, there are a number of model organism databases that are the authoritative source of GO annotations for their respective species. These groups also integrate annotations from other sources including GOA (a multi-species resource) on a regular basis.
GOA also provides a number of species-specific files for human, mouse, rat, zebrafish, Arabidopsis, chicken, dog, cow, pig, Dictyostelium, worm, yeast and Drosophila. The annotations in these files are based on entries in the UniProtKB gene-centric reference proteome (GCRP), protein complexes (Complex Portal), and non-coding RNAs (RNACentral). GOA integrates manual annotations from all other GO Consortium groups, as well as a number of external annotating groups where the annotated gene product identifier can be mapped to one of the three identifiers we support (UniProtKB, Complex Portal, and RNACentral IDs).
Both model organism group and GOA species-specific files are available on our FTP site.
2. Why can't I see an annotation in a UniProtKB record when it appears in the gene association file?
There could be a number of reasons for this:
A. If it appears that a manual annotation is missing:
If the GO annotation has been recently created, then UniProtKB may not yet have cross-referenced the annotation; there can be a time lag of up to 3 months.
B. If it appears that an electronic annotation is missing:
If you are looking at a curated UniProtKB entry (i.e. one in the Swiss-Prot section of UniProtKB), then not all electronic annotations are displayed here. Only annotations from certain methods, such as the HAMAP2GO and EC2GO mappings, are included.
In addition, sets of GO annotations displayed in UniProtKB are filtered to try to provide a comprehensive yet concise set of cross-references. To get from the UniProtKB record to the QuickGO browser (which will show the most up-to-date and full set of manual and electronic annotations for a protein) click on the '[View the complete GO annotation on QuickGO]' link at the bottom of the GO cross-references section of the UniProtKB entry.
However, if none of these reasons appears to apply to your missing annotation, please let us know and we will investigate!
3. Why are the species-specific files and the UniProt multi-species gene association file different?
The GOA UniProt gene association file contains all manual and electronic annotations that GOA has assigned to UniProtKB entries. This dataset contains annotations to more than 800,000 different species (https://www.ebi.ac.uk/GOA/uniprot_release) and is redundant for electronic annotations where two different electronic methods have assigned the same or a less granular GO term.
The species-specific files are created using the reference complete proteome sets to determine the protein composition of the files. The species-specific files can contain annotations to both reviewed (Swiss-Prot) and unreviewed (TrEMBL) UniProtKB accessions. Users who want only the reviewed (Swiss-Prot) subset of UniProt protein annotations can continue to identify it using the information supplied in the gp_information.goa_uniprot file, which can be found here: ftp://ftp.ebi.ac.uk/pub/databases/GO/goa/UNIPROT/gp_ .
We aim to remove electronic annotations from the species-specific files that have been created by the same technique and that have predicted the same or less granular GO terms.
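The redundancy rule described above can be sketched as follows. This is only an illustration under stated assumptions, not the pipeline GOA actually runs: the ancestor table is a hand-written toy, the accession P12345 is a placeholder, and the function name `remove_redundant` is hypothetical.

```python
# Toy ancestor table (hand-written for illustration; real data would be
# derived from the GO ontology file).
ANCESTORS = {
    "GO:0004713": {"GO:0004672", "GO:0016301"},  # protein tyrosine kinase activity
    "GO:0004672": {"GO:0016301"},                # protein kinase activity
    "GO:0016301": set(),                         # kinase activity
}

def remove_redundant(annotations):
    """Drop an annotation when the same method has annotated the same protein
    to the identical term or to a more specific (descendant) term."""
    unique = set(annotations)  # collapses exact duplicates
    kept = []
    for protein, term, method in sorted(unique):
        covered = any(
            p == protein and m == method and term in ANCESTORS.get(t, set())
            for p, t, m in unique
        )
        if not covered:
            kept.append((protein, term, method))
    return kept

# (protein, GO term, electronic method); all rows are made-up examples.
annotations = [
    ("P12345", "GO:0004713", "InterPro2GO"),
    ("P12345", "GO:0004672", "InterPro2GO"),  # less granular, same method: dropped
    ("P12345", "GO:0004672", "EC2GO"),        # different method: kept
    ("P12345", "GO:0004713", "InterPro2GO"),  # exact duplicate: dropped
]
filtered = remove_redundant(annotations)
```

Note that the annotation from the second method survives, which mirrors why the multi-species file stays redundant across different electronic methods.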
4. How do I change between UniProt accessions and other identifiers, e.g. Ensembl, EMBL, RefSeq Gene ID, UniGene?
UniProt provides a mapping service that converts UniProt accessions to and from accessions in multiple databases, including the EMBL/GenBank/DDBJ nucleotide sequence databases, Ensembl, GeneID and RefSeq. This service can be found here. If you require more information or help using this service, you can mail firstname.lastname@example.org .
5. What GO tools can I use to display/compare annotations to my selected proteins/genes?
Besides QuickGO, the GO Consortium provides other tools on its site that are useful for GO analysis. They can be found here: http://amigo.geneontology.org/amigo/software_list
6. Is it correct to assume that gene products annotated to a child of a GO term will automatically be considered part of the parent term?
Yes, it is safe to assume this, since every GO term must follow the true path rule: if the child term describes the gene product, then all its parent terms must also apply. So if a gene product is annotated to ‘protein tyrosine kinase activity’, the parent terms such as ‘protein kinase activity’ and ‘peptidyl-tyrosine phosphorylation’ also apply to the gene product.
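As a rough sketch of what the true path rule implies in practice, the snippet below expands a gene product's direct annotations to include every ancestor term. The parent table is a toy built from the terms named above plus one assumed extra level ("kinase activity"); the accession P12345 and the function names are placeholders, not part of any GOA tool.

```python
# Toy parent table using the terms from the answer above; the
# "kinase activity" level is an added assumption for illustration.
PARENTS = {
    "protein tyrosine kinase activity": [
        "protein kinase activity",
        "peptidyl-tyrosine phosphorylation",
    ],
    "protein kinase activity": ["kinase activity"],
}

def all_ancestors(term, parents):
    """Return every term reachable by following parent links from `term`."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for parent in parents.get(current, []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen

def propagate(direct_annotations, parents):
    """Add all implied ancestor terms to each gene product's direct terms."""
    implied = {}
    for gene_product, terms in direct_annotations.items():
        full = set(terms)
        for term in terms:
            full |= all_ancestors(term, parents)
        implied[gene_product] = full
    return implied

direct = {"P12345": {"protein tyrosine kinase activity"}}
implied = propagate(direct, PARENTS)
```

Here `implied["P12345"]` contains the direct term together with its three ancestors from the toy table.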
7. Who do I contact if there is an annotation error in one of your files?
If you find an annotation error, please e-mail GOA (email@example.com) with as much detail as possible regarding the annotation in question. We will then either be able to correct it or pass it on to the database responsible for the annotation so that they may correct it.
8. How do I download a bulk set of GO annotations? What formats are there?
All GOA GO annotations to UniProtKB accessions are available from:
There is a ReadMe.txt file that explains the different annotation files available. We generate two file formats that are used across the GO Consortium. The Gene Annotation File (GAF) is a 17-column tab-delimited file. The format conforms to the GO Consortium specification, so it shows GO IDs rather than GO term names. In addition to GAF, we offer the Gene Product Annotation Data file (GPAD), a 12-column tab-delimited file that is more normalized than GAF. If you need a more customised annotation file, our QuickGO tool lets you filter for the annotations you are interested in and export them as a TSV file. The TSV format includes both the GO term name and the GO term ID, allowing you to quickly see which GO term a gene product has been annotated to.
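For readers who want to process a GAF file programmatically, a minimal parsing sketch follows. It assumes the standard 17-column layout and that comment lines start with `!`. The dictionary keys are names chosen for this example rather than official field identifiers, and the sample record is entirely made up.

```python
import io

# Key names chosen for this sketch, following the 17-column GAF order.
GAF_COLUMNS = [
    "db", "db_object_id", "db_object_symbol", "qualifier", "go_id",
    "db_reference", "evidence_code", "with_from", "aspect",
    "db_object_name", "db_object_synonym", "db_object_type", "taxon",
    "date", "assigned_by", "annotation_extension", "gene_product_form_id",
]

def parse_gaf(handle):
    """Yield one dict per annotation line, skipping comments and blank lines."""
    for line in handle:
        if line.startswith("!") or not line.strip():
            continue
        # Strip only the line ending so that trailing empty columns survive.
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) != len(GAF_COLUMNS):
            raise ValueError(f"expected 17 columns, got {len(fields)}")
        yield dict(zip(GAF_COLUMNS, fields))

# Made-up record for demonstration; not a real annotation.
sample_fields = [
    "UniProtKB", "P12345", "EXAMPLE1", "enables", "GO:0004713",
    "PMID:0000000", "IDA", "", "F", "Example protein", "", "protein",
    "taxon:9606", "20150101", "UniProt", "", "",
]
sample = "!gaf-version: 2.2\n" + "\t".join(sample_fields) + "\n"
records = list(parse_gaf(io.StringIO(sample)))
```

For a real download, the `io.StringIO` object would be replaced by an open file handle, for example one returned by `gzip.open(path, "rt")` for a compressed file.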
9. GO slims - What are they? How can I make one?
GO slims are cut-down versions of the GO ontologies containing terms that cover the main aspects of each of the three GO ontologies. They give a broad overview of the ontology content without the detail of the specific fine-grained terms.
As each community has different needs, a variety of GO slim files have been archived on the GO home page by Consortium members. Further documentation and links to these slims can be found at: http://www.geneontology.org/GO.slims.shtml
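To illustrate how a slim gives a broad overview, the sketch below maps fine-grained annotations up to a small set of slim terms and counts gene products per slim term. Everything here is an assumption made for illustration: the parent table is a simplified toy rather than the real GO graph, the slim is invented, and the accessions are placeholders. Established slimming tools may apply more refined rules.

```python
from collections import Counter

# Toy, simplified parent chains (illustrative only, not the real GO graph).
PARENTS = {
    "protein tyrosine kinase activity": ["protein kinase activity"],
    "protein kinase activity": ["kinase activity"],
    "kinase activity": ["catalytic activity"],
    "catalytic activity": ["molecular_function"],
    "protein binding": ["binding"],
    "binding": ["molecular_function"],
}

# Hypothetical slim: a couple of broad terms chosen for this example.
SLIM = {"catalytic activity", "binding"}

def with_ancestors(term):
    """Return the term itself plus every ancestor in the toy table."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in PARENTS.get(stack.pop(), []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen

def slim_counts(annotations):
    """Count how many gene products fall under each slim term."""
    counts = Counter()
    for gene_product, terms in annotations.items():
        slim_terms = set()
        for term in terms:
            slim_terms |= with_ancestors(term) & SLIM
        counts.update(slim_terms)  # each gene product counted once per slim term
    return counts

annotations = {
    "P12345": {"protein tyrosine kinase activity", "protein binding"},
    "Q67890": {"protein kinase activity"},
}
counts = slim_counts(annotations)
```

With this toy input, both gene products are counted under "catalytic activity" and one under "binding".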
10. What do the evidence codes mean?
Every annotation submitted to GO must be attributed to a source, such as a literature reference, another database or a computational analysis. In addition, each annotation must indicate what kind of evidence in the cited source supports the association between the gene product and the GO term. More detailed documentation on the meaning and usage of evidence codes can be found on the GO web site at: http://www.geneontology.org/page/guide-go-evidence-codes
11. How do I cite GOA?
If you use any data obtained from GOA or QuickGO in a publication, please cite the following paper:
Huntley RP, Sawford T, Mutowo-Meullenet P, Shypitsyna A, Bonilla C, Martin MJ, O’Donovan C
The GOA database: Gene Ontology annotation updates for 2015.
Nucleic Acids Res. 2015 Jan; 43:D1057-63