News Archive

UniProt-GOA news

All the news relating to UniProt-GOA.

Date News Releases
04 July, 2017 We are pleased to announce that the latest set of UniProt-GOA release files are available from ftp://ftp.ebi.ac.uk/pub/databases/GO/goa/
06 June, 2017 We are pleased to announce that the latest set of UniProt-GOA release files are available from ftp://ftp.ebi.ac.uk/pub/databases/GO/goa/
09 May, 2017 We are pleased to announce that the latest set of UniProt-GOA release files are available from ftp://ftp.ebi.ac.uk/pub/databases/GO/goa/
11 April, 2017 We are pleased to announce that the latest set of UniProt-GOA release files are available from ftp://ftp.ebi.ac.uk/pub/databases/GO/goa/
17 March, 2017 We are pleased to announce that the latest set of UniProt-GOA release files are available from ftp://ftp.ebi.ac.uk/pub/databases/GO/goa/
15 February, 2017 We are pleased to announce that the latest set of UniProt-GOA release files are available from ftp://ftp.ebi.ac.uk/pub/databases/GO/goa/
18 January, 2017 We are pleased to announce that the latest set of UniProt-GOA release files are available from ftp://ftp.ebi.ac.uk/pub/databases/GO/goa/
30 November, 2016 This UniProt-GOA release sees the introduction of a new version of the automatic prediction pipeline based on sets of orthologs from Ensembl and EnsemblGenomes.

In the previous version of the pipeline, the determination of the ortholog sets and the projection of GO terms between orthologs were both done by Ensembl. In the new version, Ensembl continues to provide the orthologs (which now include those from EnsemblProtists and EnsemblMetazoa, in addition to the original Ensembl Compara, EnsemblPlants, and EnsemblFungi), but the GO projection is now done by UniProt-GOA, which ensures that the most up-to-date annotation data is used as the source of the projections.

The projected annotations all have evidence code ECO:0000265 (sequence orthology evidence used in automatic assertion), which maps up to the IEA GO evidence, and have GO_REF:0000107 (https://github.com/geneontology/go-site/blob/master/metadata/gorefs/gore...) as their reference; in addition, the with/from column contains both the UniProtKB accession and MOD identifier of the protein from which the term was projected.
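As a minimal sketch of how the projected annotations could be picked out of a GAF file, the helper below keeps the rows whose reference column contains GO_REF:0000107 and splits the with/from column into its individual identifiers. The column positions are assumed from the GAF 2.x layout, and the function name is illustrative rather than part of any UniProt-GOA tool.

```python
from typing import Iterable, Iterator, List, Tuple

# 0-based column positions, assumed from the GAF 2.x layout.
GAF_OBJECT_ID = 1   # DB Object ID (e.g. a UniProtKB accession)
GAF_GO_ID = 4       # GO term identifier
GAF_REFERENCE = 5   # DB:Reference, pipe-separated
GAF_WITH_FROM = 7   # with/from, pipe-separated

PROJECTION_REF = "GO_REF:0000107"


def projected_annotations(
    lines: Iterable[str],
) -> Iterator[Tuple[str, str, List[str]]]:
    """Yield (object id, GO id, with/from identifiers) for projected rows."""
    for line in lines:
        if line.startswith("!") or not line.strip():
            continue  # skip header/comment lines and blank lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) <= GAF_WITH_FROM:
            continue  # not a complete annotation row
        if PROJECTION_REF in cols[GAF_REFERENCE].split("|"):
            sources = [s for s in cols[GAF_WITH_FROM].split("|") if s]
            yield cols[GAF_OBJECT_ID], cols[GAF_GO_ID], sources
```

The function accepts any iterable of lines, so an open file handle on a locally downloaded GAF file can be passed directly. In GAF files the evidence column of these rows is expected to hold the GO code IEA; the ECO identifier itself appears in the GPAD files.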

If you have any comments or questions about this, or any other aspect of the GOA release, please contact us at goa@ebi.ac.uk
02 November, 2016 We are pleased to announce that the latest set of UniProt-GOA release files are available from ftp://ftp.ebi.ac.uk/pub/databases/GO/goa/
05 October, 2016 We are pleased to announce that the latest set of UniProt-GOA release files are available from ftp://ftp.ebi.ac.uk/pub/databases/GO/goa/
19 September, 2016 Changes to GOA annotation files
===============================

Following discussions with the GO Consortium, we made some changes at the time of the last GOA release (08 June 2016) to the set of annotation files that we publish.

Note that in the following text, {species} is one of: human, chicken, cow, pig, dog, mouse, rat, arabidopsis, zebrafish, worm, yeast, fly, or dicty


i) Changes to species-specific annotation sets

We now publish the following four annotation sets per species, which we provide in both GAF and GPAD format, the format being indicated by the file suffix:

- goa_{species}.[gaf|gpa] - annotations to canonical accessions from the UniProt reference proteome
- goa_{species}_isoform.[gaf|gpa] - annotations to isoforms from the UniProt reference proteome
- goa_{species}_complex.[gaf|gpa] - annotations to complexes
- goa_{species}_rna.[gaf|gpa] - annotations to RNAs

Note that annotations to proteins that are not part of the UniProt reference proteome for a species are still available in the full goa_uniprot annotation set (goa_uniprot_all.[gaf|gpa]).
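The naming pattern above can be expanded mechanically. The following sketch lists the eight annotation file names for one species; the function and constant names are illustrative only, and the species list is the one given earlier in this announcement.

```python
from typing import List

SPECIES = (
    "human", "chicken", "cow", "pig", "dog", "mouse", "rat",
    "arabidopsis", "zebrafish", "worm", "yeast", "fly", "dicty",
)

# One suffix per annotation set: canonical, isoform, complex, RNA.
CATEGORY_SUFFIXES = ("", "_isoform", "_complex", "_rna")

# File extensions for GAF and GPAD format respectively.
FORMAT_EXTENSIONS = ("gaf", "gpa")


def annotation_files(species: str) -> List[str]:
    """Return the species-specific annotation file names for one species."""
    if species not in SPECIES:
        raise ValueError(f"no species-specific files for {species!r}")
    return [
        f"goa_{species}{suffix}.{ext}"
        for suffix in CATEGORY_SUFFIXES
        for ext in FORMAT_EXTENSIONS
    ]
```

For example, `annotation_files("mouse")` starts with `goa_mouse.gaf` and `goa_mouse.gpa` and ends with `goa_mouse_rna.gpa`.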

These files replace the old species-specific annotation sets:

- gene_association.goa_{species}
- gp_association.goa_{species}
- gene_association.goa_ref_{species}
- gp_association.goa_ref_{species}


ii) Changes to species-specific metadata files

Metadata files contain information (name, symbol, synonyms, etc) about both annotated and unannotated gene products.

We now publish the following four metadata files per species:

- goa_{species}.gpi - metadata for canonical accessions from the UniProt reference proteome (GCRP) for the species
- goa_{species}_isoform.gpi - metadata for isoforms from the UniProt GCRP
- goa_{species}_complex.gpi - metadata for complexes
- goa_{species}_rna.gpi - metadata for RNAs

These files contain entries for all gene products in that particular category, whether they are annotated or not, and they replace the old species-specific metadata files, gp_information.goa_{species} and gp_information.goa_ref_{species}.


iii) Changes to other annotation sets

For consistency with the new naming convention used for the species-specific files, we have changed the names of the following files; their content, however, remains unchanged:

- gene_association.goa_uniprot is now goa_uniprot_all.gaf
- gp_association.goa_uniprot is now goa_uniprot_all.gpa
- gene_association.goa_ref_uniprot is now goa_uniprot_gcrp.gaf
- gp_association.goa_ref_uniprot is now goa_uniprot_gcrp.gpa
- gp_information.goa_uniprot is now goa_uniprot_all.gpi
- gp_information.goa_ref_uniprot is now goa_uniprot_gcrp.gpi
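Scripts that still refer to the old names could be updated with a simple lookup. This is only a sketch: the mapping reproduces the six renames listed above, and the helper name is hypothetical.

```python
# Old file name -> new file name, as listed in the announcement above.
RENAMED_FILES = {
    "gene_association.goa_uniprot": "goa_uniprot_all.gaf",
    "gp_association.goa_uniprot": "goa_uniprot_all.gpa",
    "gene_association.goa_ref_uniprot": "goa_uniprot_gcrp.gaf",
    "gp_association.goa_ref_uniprot": "goa_uniprot_gcrp.gpa",
    "gp_information.goa_uniprot": "goa_uniprot_all.gpi",
    "gp_information.goa_ref_uniprot": "goa_uniprot_gcrp.gpi",
}


def current_name(file_name: str) -> str:
    """Translate an old file name; names that were not renamed pass through."""
    return RENAMED_FILES.get(file_name, file_name)
```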


If you have any comments or questions about these changes, please email us at goa@ebi.ac.uk.

07 July, 2016 Changes to GOA annotation files
===============================

Following discussions with the GO Consortium, we made some changes at the time of the last GOA release (08 June 2016) to the set of annotation files that we publish.

Note that in the following text, {species} is one of: human, chicken, cow, pig, dog, mouse, rat, arabidopsis, zebrafish, worm, yeast, fly, or dicty


i) Changes to species-specific annotation sets

We now publish the following four annotation sets per species, which we provide in both GAF and GPAD format, the format being indicated by the file suffix:

- goa_{species}.[gaf|gpa] - annotations to canonical accessions from the UniProt reference proteome
- goa_{species}_isoform.[gaf|gpa] - annotations to isoforms from the UniProt reference proteome
- goa_{species}_complex.[gaf|gpa] - annotations to complexes
- goa_{species}_rna.[gaf|gpa] - annotations to RNAs

Note that annotations to proteins that are not part of the UniProt reference proteome for a species are still available in the full goa_uniprot annotation set (goa_uniprot_all.[gaf|gpa]).

These files replace the old species-specific annotation sets:

- gene_association.goa_{species}
- gp_association.goa_{species}
- gene_association.goa_ref_{species}
- gp_association.goa_ref_{species}


ii) Changes to species-specific metadata files

Metadata files contain information (name, symbol, synonyms, etc) about both annotated and unannotated gene products.

We now publish the following four metadata files per species:

- goa_{species}.gpi - metadata for canonical accessions from the UniProt reference proteome (GCRP) for the species
- goa_{species}_isoform.gpi - metadata for isoforms from the UniProt GCRP
- goa_{species}_complex.gpi - metadata for complexes
- goa_{species}_rna.gpi - metadata for RNAs

These files contain entries for all gene products in that particular category, whether they are annotated or not, and they replace the old species-specific metadata files, gp_information.goa_{species} and gp_information.goa_ref_{species}.


iii) Changes to other annotation sets

For consistency with the new naming convention used for the species-specific files, we have changed the names of the following files; their content, however, remains unchanged:

- gene_association.goa_uniprot is now goa_uniprot_all.gaf
- gp_association.goa_uniprot is now goa_uniprot_all.gpa
- gene_association.goa_ref_uniprot is now goa_uniprot_gcrp.gaf
- gp_association.goa_ref_uniprot is now goa_uniprot_gcrp.gpa
- gp_information.goa_uniprot is now goa_uniprot_all.gpi
- gp_information.goa_ref_uniprot is now goa_uniprot_gcrp.gpi


If you have any comments or questions about these changes, please email us at goa@ebi.ac.uk.

08 June, 2016 Changes to GOA annotation files
===============================

Following discussions with the GO Consortium, we have made some changes to the set of annotation files that we publish.

Note that in the following text, {species} is one of: human, chicken, cow, pig, dog, mouse, rat, arabidopsis, zebrafish, worm, yeast, fly, or dicty


i) Changes to species-specific annotation sets

With effect from this GOA release, we now publish the following four annotation sets per species, which we provide in both GAF and GPAD format, the format being indicated by the file suffix:

- goa_{species}.[gaf|gpa] - annotations to canonical accessions from the UniProt reference proteome
- goa_{species}_isoform.[gaf|gpa] - annotations to isoforms from the UniProt reference proteome
- goa_{species}_complex.[gaf|gpa] - annotations to complexes
- goa_{species}_rna.[gaf|gpa] - annotations to RNAs

Note that annotations to proteins that are not part of the UniProt reference proteome for a species are still available in the full goa_uniprot annotation set (goa_uniprot_all.[gaf|gpa]).

These files replace the old species-specific annotation sets:

- gene_association.goa_{species}
- gp_association.goa_{species}
- gene_association.goa_ref_{species}
- gp_association.goa_ref_{species}


ii) Changes to species-specific metadata files

Metadata files contain information (name, symbol, synonyms, etc) about both annotated and unannotated gene products.

With effect from this GOA release, we now publish the following four metadata files per species:

- goa_{species}.gpi - metadata for canonical accessions from the UniProt reference proteome (GCRP) for the species
- goa_{species}_isoform.gpi - metadata for isoforms from the UniProt GCRP
- goa_{species}_complex.gpi - metadata for complexes
- goa_{species}_rna.gpi - metadata for RNAs

These files contain entries for all gene products in that particular category, whether they are annotated or not, and they replace the old species-specific metadata files, gp_information.goa_{species} and gp_information.goa_ref_{species}.


iii) Changes to other annotation sets

For consistency with the new naming convention used for the species-specific files, we have changed the names of the following files; their content, however, remains unchanged:

- gene_association.goa_uniprot is now goa_uniprot_all.gaf
- gp_association.goa_uniprot is now goa_uniprot_all.gpa
- gene_association.goa_ref_uniprot is now goa_uniprot_gcrp.gaf
- gp_association.goa_ref_uniprot is now goa_uniprot_gcrp.gpa
- gp_information.goa_uniprot is now goa_uniprot_all.gpi
- gp_information.goa_ref_uniprot is now goa_uniprot_gcrp.gpi


If you have any comments or questions about these changes, please email us at goa@ebi.ac.uk.

11 May, 2016 Changes to GOA annotation files
===============================

Following discussions with the GO Consortium, we shall be making some changes to the set of annotation files that we publish. These changes will be implemented at the next GOA release, which is scheduled for the week of 6th June.

i) Changes to species-specific annotation sets

Note that in the following text, {species} is one of: human, chicken, cow, pig, dog, mouse, rat, arabidopsis, zebrafish, worm, yeast, fly, or dicty

For each species, we currently publish two sets of annotations, in both GAF 2.1 and GPAD 1.1 format:

- gene_association.goa_{species} and gp_association.goa_{species} - these files contain annotations to proteins that are part of the UniProt complete proteome for the species *plus* isoforms; they also contain annotations to complexes and RNAs

- gene_association.goa_ref_{species} and gp_association.goa_ref_{species} - these files contain annotations to proteins that are part of the UniProt reference proteome for the species (the “Gene Centric Reference Proteome”, or GCRP)

With effect from the next GOA release, we shall cease production of the above files, and replace them with the following four annotation sets per species, which we will provide in both GAF and GPAD format, the format being indicated by the file suffix:

- goa_{species}.[gaf|gpa] - annotations to canonical accessions from the UniProt reference proteome
- goa_{species}_isoform.[gaf|gpa] - annotations to isoforms from the UniProt reference proteome
- goa_{species}_complex.[gaf|gpa] - annotations to complexes
- goa_{species}_rna.[gaf|gpa] - annotations to RNAs

Note that annotations to proteins that are not part of the UniProt reference proteome for a species will still be available in the full goa_uniprot annotation set (goa_uniprot_all.[gaf|gpa]).

ii) Changes to species-specific metadata files

For each species, we also currently publish two metadata files, in GPI 1.1 format, which contain information (name, symbol, synonyms, etc) about both annotated and unannotated gene products:

- gp_information.goa_{species} - contains metadata for all proteins that are part of the UniProt complete proteome for the species, plus metadata for complexes and RNAs from that species

- gp_information.goa_ref_{species} - contains metadata only for those proteins that are part of the UniProt reference proteome for the species

With effect from the next GOA release, we shall cease production of the above two files for each species, and replace them with the following:

- goa_{species}.gpi - metadata for canonical accessions from the UniProt reference proteome (GCRP) for the species
- goa_{species}_isoform.gpi - metadata for isoforms from the UniProt GCRP
- goa_{species}_complex.gpi - metadata for complexes
- goa_{species}_rna.gpi - metadata for RNAs

Note that the files will contain entries for all gene products for that particular category, whether they are annotated or not.

iii) Changes to other annotation sets

For consistency with the new naming convention used for the species-specific files, we shall be changing the names of the following files; their content, however, will remain unchanged:

- gene_association.goa_uniprot will be renamed to goa_uniprot_all.gaf
- gp_association.goa_uniprot will be renamed to goa_uniprot_all.gpa
- gene_association.goa_ref_uniprot will be renamed to goa_uniprot_gcrp.gaf
- gp_association.goa_ref_uniprot will be renamed to goa_uniprot_gcrp.gpa
- gp_information.goa_uniprot will be renamed to goa_uniprot_all.gpi
- gp_information.goa_ref_uniprot will be renamed to goa_uniprot_gcrp.gpi


Examples of the new species-specific file sets, for human and mouse, are available at ftp://ftp.ebi.ac.uk/pub/contrib/goa/new-files/

If you have any comments or questions about these changes, please email us at goa@ebi.ac.uk.

Changes to annotations created by logical inference
===================================================

With effect from this release, we have made a change to the format of annotations that are created by logical inference based on inter-ontology links between molecular function and biological process terms, or between biological process and cellular component terms. These annotations can be identified easily, as they have "GOC" in the assigned_by column of the GAF and GPAD files.

In previous releases, these logically inferred annotations retained the reference, with/from, and evidence code of the annotation to the asserted GO term from which they were derived. Now, however, inferred annotations will have GO_REF:0000108 (http://www.geneontology.org/cgi-bin/references.cgi#GO_REF:0000108) as their reference, and one of the two evidence codes ECO:0000364 (evidence based on logical inference from manual annotation used in automatic assertion) or ECO:0000366 (evidence based on logical inference from automatic annotation used in automatic assertion), depending on whether the annotation to the asserted term is manual or electronic, respectively; the with/from column is also populated with the identifier of the GO term in the original asserted annotation.
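A rough sketch of how a consumer might recognise these inferred annotations in a GPAD file follows. It assumes the GPAD 1.1 column order (reference in column 5, ECO evidence code in column 6, with/from in column 7, assigned_by in column 10); the function name is illustrative only.

```python
from typing import Optional

# 0-based column positions, assumed from the GPAD 1.1 layout.
GPAD_REFERENCE = 4
GPAD_EVIDENCE = 5
GPAD_ASSIGNED_BY = 9

INFERENCE_REF = "GO_REF:0000108"

# ECO code -> kind of asserted annotation the inference was derived from.
INFERRED_FROM = {
    "ECO:0000364": "manual",
    "ECO:0000366": "automatic",
}


def inferred_source(line: str) -> Optional[str]:
    """Return 'manual' or 'automatic' for an inferred annotation, else None."""
    if line.startswith("!"):
        return None  # header/comment line
    cols = line.rstrip("\n").split("\t")
    if len(cols) <= GPAD_ASSIGNED_BY:
        return None  # not a complete annotation row
    if cols[GPAD_ASSIGNED_BY] != "GOC":
        return None
    if INFERENCE_REF not in cols[GPAD_REFERENCE].split("|"):
        return None
    return INFERRED_FROM.get(cols[GPAD_EVIDENCE])
```

Rows assigned by GOC that predate this release would return None, because they still carry the reference and evidence code of the asserted annotation.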

Again, if you have any comments or questions about these changes, please email us at goa@ebi.ac.uk.
13 April, 2016 - Proposed changes to GOA annotation files

Following discussions with the GO Consortium, we are planning to make some changes to the set of species-specific annotation files that we publish.

For each of the following species, we currently publish two sets of annotations, one based on the UniProt complete proteome, and one based on the UniProt reference proteome:

human chicken cow pig dog mouse rat arabidopsis zebrafish worm yeast fly dicty

We intend to cease production of the current goa_{species} and goa_ref_{species} files, and replace them with the following four annotation sets per species, which we will provide in both GAF and GPAD format (note that the names of the files are as yet undecided):

1. annotations to canonical accessions from the UniProt reference proteome
2. annotations to isoforms from the UniProt reference proteome
3. annotations to complexes
4. annotations to RNAs

Note that annotations to proteins that are not part of the UniProt reference proteome for a species will still be available in the full goa_uniprot annotation set.

We plan to implement these changes in the GOA release that is scheduled for the week of May 9th 2016.

Comments on this proposal should be sent to goa@ebi.ac.uk

- New beta version of QuickGO

We are pleased to announce the public availability of the beta-test version of a major new update to QuickGO.

The new version is available at http://www.ebi.ac.uk/QuickGO-Beta

Please take a moment to try it out and let us know what you think using the "Send feedback" button.
16 March, 2016 - Proposed changes to GOA annotation files

Following discussions with the GO Consortium, we are planning to make some changes to the set of species-specific annotation files that we publish.

For each of the following species, we currently publish two sets of annotations, one based on the UniProt complete proteome, and one based on the UniProt reference proteome:

human chicken cow pig dog mouse rat arabidopsis zebrafish worm yeast fly dicty

We intend to cease production of the current goa_{species} and goa_ref_{species} files, and replace them with the following four annotation sets per species, which we will provide in both GAF and GPAD format (note that the names of the files are as yet undecided):

1. annotations to canonical accessions from the UniProt reference proteome
2. annotations to isoforms from the UniProt reference proteome
3. annotations to complexes
4. annotations to RNAs

Note that annotations to proteins that are not part of the UniProt reference proteome for a species will still be available in the full goa_uniprot annotation set.

We plan to implement these changes in the GOA release that is scheduled for the week of May 9th 2016.

Comments on this proposal should be sent to goa@ebi.ac.uk
16 February, 2016 Following discussions with the GO Consortium, we are planning to make some changes to the set of species-specific annotation files that we publish.

For each of the following species, we currently publish two sets of annotations, one based on the UniProt complete proteome, and one based on the UniProt reference proteome:

human chicken cow pig dog mouse rat arabidopsis zebrafish worm yeast fly dicty

We intend to cease production of the current goa_{species} and goa_ref_{species} files, and replace them with the following four annotation sets per species, which we will provide in both GAF and GPAD format (note that the names of the files are as yet undecided):

1. annotations to canonical accessions from the UniProt reference proteome
2. annotations to isoforms from the UniProt reference proteome
3. annotations to complexes
4. annotations to RNAs

Note that annotations to proteins that are not part of the UniProt reference proteome for a species will still be available in the full goa_uniprot annotation set.

Note also that this change will not be implemented before the GOA release that is scheduled for the week of March 14th 2016.

Comments on this proposal should be sent to goa@ebi.ac.uk
21 January, 2016 Following discussions with the GO Consortium, we are planning to make some changes to the set of species-specific annotation files that we publish.

For each of the following species, we currently publish two sets of annotations, one based on the UniProt complete proteome, and one based on the UniProt reference proteome:

human chicken cow pig dog mouse rat arabidopsis zebrafish worm yeast fly dicty

We intend to cease production of the current goa_{species} and goa_ref_{species} files, and replace them with the following four annotation sets, which we will provide in both GAF and GPAD format (note that the names of the files are as yet undecided):

1. annotations to canonical accessions from the UniProt reference proteome
2. annotations to isoforms from the UniProt reference proteome
3. annotations to complexes
4. annotations to RNAs

Note that annotations to proteins that are not part of the UniProt reference proteome for a species will still be available in the full goa_uniprot annotation set.

Note also that this change will not be implemented before the GOA release that is scheduled for the week of March 14th 2016.

Comments on this proposal should be sent to goa@ebi.ac.uk
06 January, 2016 We are pleased to announce that the latest set of UniProt-GOA release files are available from ftp://ftp.ebi.ac.uk/pub/databases/GO/goa/
09 December, 2015 We are pleased to announce that the latest set of UniProt-GOA release files are available from ftp://ftp.ebi.ac.uk/pub/databases/GO/goa/
11 November, 2015 We are pleased to announce that the latest set of UniProt-GOA release files are available from ftp://ftp.ebi.ac.uk/pub/databases/GO/goa/
14 October, 2015 We are pleased to announce that the latest set of UniProt-GOA release files are available from ftp://ftp.ebi.ac.uk/pub/databases/GO/goa/
16 September, 2015 We are pleased to announce that in this release we now incorporate 1.5 million annotations from the UniProt Unified Rule (UniRule) system. Rules are devised and tested by experienced curators using experimental data from manually annotated UniProt entries as templates.
These annotations use the evidence code ECO:0000256 ('match to sequence model evidence used in automatic assertion') which maps up to 'IEA' and reference GO_REF:0000104.
22 July, 2015 We are pleased to announce that the latest set of UniProt-GOA release files are available from ftp://ftp.ebi.ac.uk/pub/databases/GO/goa/
23 June, 2015 We are pleased to announce that the latest set of UniProt-GOA release files are available from ftp://ftp.ebi.ac.uk/pub/databases/GO/goa/
27 May, 2015 We are pleased to announce that the latest set of UniProt-GOA release files are available from ftp://ftp.ebi.ac.uk/pub/databases/GO/goa/
28 April, 2015 We are pleased to announce that the latest set of UniProt-GOA release files are available from ftp://ftp.ebi.ac.uk/pub/databases/GO/goa/
31 March, 2015 As announced at the time of the previous (04 March 2015) UniProt-GOA release, this release sees a significant reduction in the number of electronic GO annotations, caused by the deletion of redundant bacterial proteomes in UniProt; this change is described at http://www.uniprot.org/changes/proteome_redundancy
04 March, 2015 1) RNA annotations

We are pleased to announce that this UniProt-GOA release introduces annotations to non-coding RNA.

The scope of the project has recently been expanded, and our annotation files now incorporate annotations to proteins, identified by UniProtKB (www.uniprot.org) accessions; macromolecular complexes, identified by IntAct Complex Portal (www.ebi.ac.uk/intact/complex) identifiers; and ncRNAs, identified by RNAcentral (rnacentral.org) identifiers.
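Since a single file can now mix proteins, complexes, and ncRNAs, it may be useful to tally annotations by their source database. The sketch below counts rows by the first column, which holds the database name in both GAF and GPAD; the exact database labels that appear in the files are not assumed here, and the function name is illustrative.

```python
from collections import Counter
from typing import Iterable


def count_by_database(lines: Iterable[str]) -> Counter:
    """Count annotation rows by the database named in the first column."""
    counts: Counter = Counter()
    for line in lines:
        if line.startswith("!") or not line.strip():
            continue  # skip header/comment lines and blank lines
        database = line.split("\t", 1)[0]
        counts[database] += 1
    return counts
```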


2) Reducing redundancy in proteomes

The UniProt Knowledgebase (UniProtKB) has witnessed an exponential growth in the last few years with a two-fold increase in the number of entries in 2014. This follows the vastly increased submission of multiple genomes for the same or closely related organisms. This increase has been accompanied by a high level of redundancy in UniProtKB/TrEMBL and many sequences are over-represented in the database. This is especially true for bacterial species where different strains of the same species have been sequenced and submitted (e.g. 1,692 strains of Mycobacterium tuberculosis, corresponding to 5.97 million entries). To reduce this redundancy, we have developed a procedure to identify highly redundant proteomes within species groups using a combination of manual and automatic methods. We are going to apply this procedure to bacterial proteomes (which constitute 82% of UniProtKB/TrEMBL as of release 2015_02) and sequences corresponding to redundant proteomes (46.9 million entries) will be removed from UniProtKB. These sequences will still be available in the UniParc sequence archive dataset within UniProt. In the future, we will no longer create new UniProtKB/TrEMBL records for proteomes identified as redundant.

Protein sequences belonging to proteomes that are not identified as redundant will remain in UniProtKB. All proteomes will remain searchable through the UniProt website's Proteomes pages. Sequences corresponding to redundant proteomes will be available for download from UniParc and you will also be directed to alternate non-redundant proteome(s) available for the same species. The history (i.e. previous versions) of redundant UniProtKB records will remain available.

This removal of redundant proteomes from UniProtKB is expected to lead to a significant reduction in the number of (mainly electronic) GO annotations, and the effects will be seen in the next GOA release, which is scheduled for the week commencing Monday, 30th March, 2015.
04 February, 2015 We are pleased to announce that the latest set of UniProt-GOA release files are available from ftp://ftp.ebi.ac.uk/pub/databases/GO/goa/
07 January, 2015 We are pleased to announce that the latest set of UniProt-GOA release files are available from ftp://ftp.ebi.ac.uk/pub/databases/GO/goa/
27 November, 2014 All annotation files that we provide, in both GAF and GPAD format, now contain a GO-version tag in their header which gives the IRI of the version of the GO that was current when the files were published, for example:

!GO-version: http://purl.obolibrary.org/obo/go/releases/2014-11-13/go.owl

This will allow consumers of our annotation files to link a specific set of annotations to a specific version of the ontology.
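A consumer could read the tag with a few lines of code. This is a minimal sketch that assumes header lines start with "!" and precede all annotation rows; the function name is illustrative.

```python
from typing import Iterable, Optional

GO_VERSION_TAG = "!GO-version:"


def go_version(lines: Iterable[str]) -> Optional[str]:
    """Return the ontology IRI from the file header, or None if absent."""
    for line in lines:
        if not line.startswith("!"):
            break  # first non-header line: the header is over
        if line.startswith(GO_VERSION_TAG):
            return line[len(GO_VERSION_TAG):].strip()
    return None
```

Given the header line shown above, `go_version` returns `http://purl.obolibrary.org/obo/go/releases/2014-11-13/go.owl`. Because the function stops at the first non-header line, it reads only the start of even a very large file.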

29 October, 2014

In the October 1st release there was a database-related problem which meant the number of annotations based on the automatic mapping of UniProtKB Subcellular Location terms in UniProtKB entries to GO terms had slightly reduced. This has now been resolved and these annotations are back up to expected levels.


01 October, 2014 1) We are pleased to announce that in this release we now incorporate GO annotations to IntAct Complex Portal identifiers.
The IntAct Complex Portal (http://www.ebi.ac.uk/intact/complex/) is a manually curated resource of macromolecular complexes. An example of a curated complex is EBI-1224506, the human mitochondrial electron transport complex III (http://www.ebi.ac.uk/intact/complex/details/EBI-1224506).

These annotations are currently visible in our annotation files, except those that are based on UniProt reference proteomes as these contain annotations only to UniProtKB entries. The annotations are not visible in the current version of QuickGO (www.ebi.ac.uk/QuickGO), but will be available from the new version, which is due for release in the near future.

2) The number of annotations based on the automatic mapping of UniProtKB Subcellular Location terms in UniProtKB entries to GO terms has reduced slightly in this release. This is as a result of a database-related problem, which will be resolved by the time of the next release.
03 September, 2014 We are pleased to announce that the latest set of UniProt-GOA release files are available from ftp://ftp.ebi.ac.uk/pub/databases/GO/goa/
10 July, 2014 In this release we have improved the accuracy of automatic annotations by removing those annotations that violate taxon constraints. Some GO terms are applicable only to certain taxa and this is encoded in the GO taxon constraints. For example, if a GO term that is valid for use only with eukaryotes, e.g. GO:0000165 'MAPK cascade', is applied to a bacterial protein, the annotation would be incorrect and it would be deleted.
This process has resulted in the deletion of approximately 106,000 incorrect electronic annotations.
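The check can be illustrated with a toy example. This is not the production pipeline: it models only "only in taxon" constraints, the constraint table holds just the MAPK cascade example from the text, and the lineages are supplied by the caller (the real constraints and taxonomy come from GO and the NCBI taxonomy).

```python
from typing import Dict, FrozenSet, Iterable, List, Tuple

Annotation = Tuple[str, str, int]  # (protein accession, GO id, NCBI taxon id)

# GO term -> taxon the term is restricted to (illustrative, one entry only).
ONLY_IN_TAXON: Dict[str, int] = {
    "GO:0000165": 2759,  # MAPK cascade: only in Eukaryota
}


def violates_constraint(go_id: str, lineage: FrozenSet[int]) -> bool:
    """True if the term is restricted to a taxon absent from the lineage."""
    required = ONLY_IN_TAXON.get(go_id)
    return required is not None and required not in lineage


def drop_violations(
    annotations: Iterable[Annotation],
    lineages: Dict[int, FrozenSet[int]],
) -> List[Annotation]:
    """Keep annotations that satisfy the constraints.

    `lineages` maps a taxon id to the set of its ancestor taxon ids,
    including itself. Annotations with an unknown lineage are kept.
    """
    kept = []
    for protein, go_id, taxon in annotations:
        lineage = lineages.get(taxon)
        if lineage is not None and violates_constraint(go_id, lineage):
            continue  # e.g. a eukaryote-only term on a bacterial protein
        kept.append((protein, go_id, taxon))
    return kept
```

With abbreviated lineages such as `{83333: frozenset({83333, 562, 2}), 9606: frozenset({9606, 2759})}`, a GO:0000165 annotation to an Escherichia coli K-12 protein (taxon 83333) is dropped, while the same term on a human protein (taxon 9606) is kept.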
10 June, 2014 We are pleased to announce that the latest set of UniProt-GOA release files are available from ftp://ftp.ebi.ac.uk/pub/databases/GO/goa/
13 May, 2014 As a result of database-related issues, annotations provided by Human Protein Atlas (HPA) were omitted from our last set of release files.
We apologise for this omission, which was not noticed until after the files had been published, but we are pleased to say that the issue has been rectified, and the HPA annotations have been fully restored in this release.
15 April, 2014 We are pleased to announce that the latest set of UniProt-GOA release files are available from ftp://ftp.ebi.ac.uk/pub/databases/GO/goa/
18 March, 2014 We are now incorporating manual annotations from the Alzheimer's Project at the University of Toronto. Sejal Patel, an MSc student from the University of Toronto, is focusing on curating genes associated with Alzheimer’s disease that have been significant in previous genome-wide association studies. For further information on this project, a brief project plan can be found at http://wiki.geneontology.org/index.php/Alzheimer%27s_Disease_Annotation_Project, or contact Sejal Patel, email sejalr.patel@mail.utoronto.ca
19 February, 2014 1. In this release we made further improvements to the pipeline that creates the GO Consortium 'inferred' annotations to reduce redundancy. This has caused a large decrease in the number of annotations that are assigned by 'GOC'.

2. Since January we have included annotations from a new project "Parkinson's UK-UCL", which is a project led by Dr. Ruth Lovering at University College London to annotate proteins involved in Parkinson's disease. Further information on this project can be found at http://www.ucl.ac.uk/cardiovasculargeneontology/cardiovascular/newsletters, or contact the project co-ordinator, Paul Denny, email p.denny@ucl.ac.uk.
22 January, 2014 1. As of this release we have suspended submission to the GO Consortium (GOC) of species-specific Gene Association Files if another group is responsible for the provision of GO annotations to that species. This affects the following files:

gene_association.goa_arabidopsis
gene_association.goa_mouse
gene_association.goa_rat
gene_association.goa_zebrafish

These files will no longer be available from the GOC annotation download webpage (http://www.geneontology.org/GO.downloads.annotations.shtml) nor the GOC ftp site (ftp://ftp.geneontology.org/pub/go/gene-associations/submission/).
Users will still be able to get annotations for all of these species from the UniProt multispecies file on the GOC website (http://www.geneontology.org/GO.downloads.annotations.shtml#unfilter).

The above species-specific files will continue to be made available from the UniProt-GOA ftp site (ftp://ftp.ebi.ac.uk/pub/databases/GO/goa/).

2. As of the UniProt-GOA release in February 2014, we will remove all of the archived species-specific files mentioned above from the GOC CVS repository. These archived files will still be available from ftp://ftp.ebi.ac.uk/pub/databases/GO/goa/old/

3. In this release there is a substantial increase in GO Consortium 'inferred' annotations. These annotations are automatically created based on inter-ontology links between Molecular Function and Biological Process terms and between Biological Process and Cellular Component terms. The increase is due to enhancements to the pipeline to take account of the GO hierarchy.

4. We are pleased to announce that this release of UniProt-GOA gene association files includes manual annotations for Trypanosoma brucei and Leishmania major that have been created by the GeneDB project.

Details of GeneDB can be found at: http://www.genedb.org/Homepage
11 December, 2013 Changes to the provision of UniProt GO annotation files to the GO Consortium.

1. As of this release we are additionally supplying the GO Consortium (GOC) with a set of species-specific annotation files for human, dog, pig, cow and chicken that are based on UniProt reference proteomes and provide one protein per gene. The protein accessions included in these files are the protein sequences annotated in Swiss-Prot or the longest TrEMBL transcript if there is no Swiss-Prot record.
The files will be available in both GAF2.0 and GPAD1.1 format and can be identified by the inclusion of "ref" in the file name, e.g. gp_association.goa_ref_human. The GAF2.0 files will be available from the GOC annotation downloads page (http://www.geneontology.org/GO.downloads.annotations.shtml) and the GOC ftp site (ftp://ftp.geneontology.org/pub/go/gene-associations/); the locations of the GPAD files will be announced at a later date.

These files are already available from the UniProt-GOA ftp site: ftp://ftp.ebi.ac.uk/pub/databases/GO/goa/

2. As of our January 2014 release, we will suspend submission to the GO Consortium (GOC) of species-specific Gene Association Files if another group is responsible for the provision of GO annotations to that species. This affects the following files:

gene_association.goa_arabidopsis
gene_association.goa_mouse
gene_association.goa_rat
gene_association.goa_zebrafish

These files will no longer be available from the GOC annotation download webpage (http://www.geneontology.org/GO.downloads.annotations.shtml) nor the GOC ftp site (ftp://ftp.geneontology.org/pub/go/gene-associations/submission/).
Users will still be able to get annotations for all of these species from the UniProt multispecies file on the GOC website (http://www.geneontology.org/GO.downloads.annotations.shtml#unfilter).

The above species-specific files will continue to be made available from the UniProt-GOA ftp site (ftp://ftp.ebi.ac.uk/pub/databases/GO/goa/).
14 November, 2013 UniProt-GOA are delighted to announce that we now supply more than 200 million annotations to over 30 million proteins.


All annotations are available in the 125th release of the UniProt-GOA 'UniProt' file. The vast majority of the annotations (99%) are created by automatic annotation methods, including InterPro to GO and Swiss-Prot Keyword to GO mappings.
For more information on the automatic annotation methods that we incorporate, please see http://www.ebi.ac.uk/GOA/ElectronicAnnotationMethods.
For details on our manual annotation practices, please see http://www.ebi.ac.uk/GOA/ManualAnnotationEfforts.


UniProt-GOA would like to thank all of the contributing groups that have helped us reach this milestone. Links to our collaborating groups are displayed on our website, http://www.ebi.ac.uk/GOA.
16 October, 2013 We are pleased to announce that this release of UniProt-GOA gene association files includes manual annotations for several bacterial and viral species as provided by the Community Assessment of Community Annotation with Ontologies (CACAO), a project to provide large-scale manual community annotation of gene function using the Gene Ontology.

Details of the CACAO project can be found at: http://gowiki.tamu.edu/wiki/index.php/Category:CACAO
The CACAO annotations can be viewed in QuickGO: http://www.ebi.ac.uk/QuickGO/GAnnotation?source=CACAO
18 September, 2013 As of our previous release, dated 24th July 2013, we have rationalized the attribution display for UniProt- and Ensembl-created annotations.
This attribution is shown in the 'assigned_by' column, which is column 15 of the Gene Association Files (GAF) and column 10 of the Gene Product Association Data files (GPAD). UniProt-created annotations all now have the attribution 'UniProt'; this includes UniProt manual annotation and automatic annotation based on EC, HAMAP, UniProt keywords, UniProt subcellular location and UniPathway. Ensembl-created annotations all now have the attribution 'Ensembl'; this includes the automatic annotation from Ensembl vertebrates, EnsemblFungi and EnsemblPlants/Gramene.
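
As a rough illustration of how the 'assigned_by' column can be read, the following Python sketch pulls the attribution out of a tab-separated annotation line and filters lines by source. The function names are hypothetical and the code assumes standard tab-delimited GAF or GPAD lines, with header lines starting with '!'; it is not part of any UniProt-GOA tooling.

```python
def assigned_by(line, file_format="gaf"):
    """Return the 'assigned_by' value of one tab-separated annotation line."""
    # 1-based column positions as stated above: 15 in GAF, 10 in GPAD.
    column = {"gaf": 15, "gpad": 10}[file_format]
    fields = line.rstrip("\n").split("\t")
    return fields[column - 1]


def filter_by_source(lines, source, file_format="gaf"):
    """Yield only the annotation lines attributed to `source`."""
    for line in lines:
        if line.startswith("!"):
            continue  # skip header/comment lines
        if assigned_by(line, file_format) == source:
            yield line
```

For example, filter_by_source(open_file, "UniProt") would keep only the UniProt-attributed lines of a GAF file, where open_file is any iterable of lines.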
24 July, 2013 We are pleased to announce that the latest set of UniProt-GOA release files are available from ftp://ftp.ebi.ac.uk/pub/databases/GO/goa/
26 June, 2013

We are pleased to announce that this release of UniProt-GOA gene association files includes manual annotations for the filamentous fungi, Aspergillus, that have been created by the AspGD Project, as well as manual annotations for Pseudomonas aeruginosa that have been created by the Pseudomonas aeruginosa Community Annotation Project (PseudoCAP).

30 May, 2013 UniProt-GOA are pleased to announce the release of a set of species-specific annotation files that are based on UniProt reference proteomes that provide one protein per gene. The protein accessions included in these files are the protein sequences annotated in Swiss-Prot or the longest TrEMBL transcript if there is no Swiss-Prot record.

The files are released in both GAF2.0 and GPAD format and there is an accompanying Gene Product Information file, which contains additional information on all the protein accessions in the species' reference proteome.

The files can be identified by the inclusion of "ref" in the file name, e.g. gp_association.goa_ref_human or gene_association.goa_ref_mouse and can be downloaded from the UniProt-GOA ftp site (ftp://ftp.ebi.ac.uk/pub/databases/GO/goa/).

Further details on the file formats can be obtained from the readme in each species directory.
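
The "one protein per gene" rule described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the record fields (gene, accession, reviewed, length) are hypothetical, and where a gene has more than one Swiss-Prot entry the sketch simply takes the longest, which the announcement does not specify.

```python
def _rank(protein):
    # Reviewed (Swiss-Prot) entries outrank TrEMBL entries; length breaks ties.
    return (protein["reviewed"], protein["length"])


def pick_representatives(proteins):
    """Map each gene to the accession of its single representative protein."""
    best = {}
    for protein in proteins:
        current = best.get(protein["gene"])
        if current is None or _rank(protein) > _rank(current):
            best[protein["gene"]] = protein
    return {gene: protein["accession"] for gene, protein in best.items()}
```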
30 April, 2013 As of this release we are now supplying the species-specific annotation files in GPAD1.1 format. The files can be accessed from each species directory here: ftp://ftp.ebi.ac.uk/pub/databases/GO/goa/
Details of the file format can be found in the readme file at the same location.
03 April, 2013 We are pleased to announce that this release of UniProt-GOA gene association files includes manual annotations for archaeal and bacterial species that have been created by the Microbial ENergy processes Gene Ontology Project (MENGO).

Details of the MENGO project can be found at: http://mengo.vbi.vt.edu/
05 March, 2013 As of this release we are now supplying the UniProt annotation file in GPAD1.1 format. You can find the details of this format here: ftp://ftp.ebi.ac.uk/pub/databases/GO/goa/UNIPROT/gp_association_readme
06 February, 2013 In this release we have changed the referencing of GO Cellular Component annotations from the Human Protein Atlas (HPA) and LIFEdb. Previously all HPA annotations were referenced by PMID:18029348 and all LIFEdb annotations by PMID:11256614. These papers describe the pilot studies and methodology used to obtain the annotations rather than experiments for the individual protein localizations.

It was decided that these annotations would be more correctly described using a GO reference (GO_REF), which is an abstract describing the methodology behind a set of annotations.

The HPA annotations are now referenced by GO_REF:0000052 and the LIFEdb annotations by GO_REF:0000054. Both references are described here: http://www.geneontology.org/cgi-bin/references.cgi
28 November, 2012

Improvements to the UniPathway annotation pipeline.

In this release we are now obtaining cross-references for the UniPathway resource directly from UniProt. This has resulted in over a ten-fold increase in the number of annotations produced by this method. The UniPathway2GO annotation method is currently providing over 1.4 million annotations to more than 1.3 million proteins.
04 September, 2012 Reduced redundancy of InterPro GO annotations.

In this release we have altered our display of InterPro2GO annotations in order to reduce the redundancy. This has resulted in a large reduction in the number of GO annotations from InterPro from 83 million to around 51 million, a decrease of approximately 32 million annotations.

Previously, when different InterPro domains predict the same GO ID to the same protein, we displayed these as separate annotations. We have changed this so that all the InterPro domains that predict the same GO ID for the same protein will be piped together in the 'with' field of a single annotation line, ensuring no loss of data.

An example:
Previously there were 4 annotations linking GO:0030170 to A0A000 from InterPro2GO:

A0A000 GO:0030170 pyridoxal phosphate binding F IEA InterPro:IPR004839 20120512 InterPro
A0A000 GO:0030170 pyridoxal phosphate binding F IEA InterPro:IPR010961 20120512 InterPro
A0A000 GO:0030170 pyridoxal phosphate binding F IEA InterPro:IPR015421 20120512 InterPro
A0A000 GO:0030170 pyridoxal phosphate binding F IEA InterPro:IPR015422 20120512 InterPro

This data will instead be represented as:

A0A000 GO:0030170 pyridoxal phosphate binding F IEA InterPro:IPR004839|InterPro:IPR010961|InterPro:IPR015421|InterPro:IPR015422 20120512 InterPro
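
The collapsing step shown in this example could be sketched in Python roughly as below. The tuple layout (accession, GO ID, term name, aspect, evidence, 'with' value, date, source) and the function name are assumptions made for illustration; this is not the code used to build the release files.

```python
def merge_with_field(rows):
    """Collapse rows that differ only in their 'with' value into one row each."""
    merged = {}
    for acc, go_id, name, aspect, evidence, with_id, date, source in rows:
        key = (acc, go_id, name, aspect, evidence, date, source)
        ids = merged.setdefault(key, [])
        if with_id not in ids:
            ids.append(with_id)  # keep first-seen order, drop exact repeats
    return [
        (acc, go_id, name, aspect, evidence, "|".join(ids), date, source)
        for (acc, go_id, name, aspect, evidence, date, source), ids in merged.items()
    ]
```

Applied to the four A0A000 rows above, the function returns a single row whose 'with' value is the four InterPro identifiers joined by pipes.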

11 July, 2012

New EnsemblFungi annotations.

UniProt-GOA are pleased to announce the inclusion in their database of electronic GO annotations created by EnsemblFungi. The annotations are created by projection of manual GO annotations from Saccharomyces cerevisiae or Schizosaccharomyces pombe proteins onto proteins from one or more target species based on gene orthology obtained from Ensembl Compara. This release contains over 41,000 annotations to over 9,000 proteins covering 36 taxonomies, including Ashbya gossypii, Emericella nidulans and Aspergillus species. The annotations can be viewed and downloaded from the QuickGO browser here.

16 May, 2012 1. New UniPathway2GO mapping provides an additional 113,285 GO annotations.

UniProt-GOA are pleased to announce that in collaboration with the Swiss Institute of Bioinformatics, INRIA (Rhone-Alpes) and Laboratoire d'Ecologie Alpine (Grenoble), we are able to offer an additional 113,285 GO annotations that describe the pathway(s) that 105,041 UniProtKB entries are involved in. 48% of these annotations apply a GO term that either uniquely describes a protein's involvement in a certain process, or supplies a more granular term than is supplied by other UniProt electronic annotation methods.

UniPathway is a manually curated resource of enzyme-catalyzed and spontaneous chemical reactions that provides a hierarchical representation of metabolic pathways. Descriptions of the pathway(s) that a particular protein is involved in are included in UniProtKB records. Further information on the UniPathway resource is available at
http://www.unipathway.org/obiwarehouse/unipathway .

So far, 425 UniPathway pathway terms have been manually mapped to GO terms. The mapping is available from: http://bit.ly/XIDc4f


The reference cited for these annotations is GO_REF:0000041 (for further details see: http://www.geneontology.org/cgi-bin/references.cgi#GO_REF:0000041 )

2. Inclusion of multiple identifiers in the 'with/from' column of UniProt-GOA annotations

UniProt-GOA is able to include in this release those GO annotations that have applied more than one identifier in the 'with/from' annotation field (column 8 of the GAF2.0 format). This means that 12,317 annotations from external annotation groups are now more fully represented in the UniProt-GOA files, ensuring we are able to provide a more comprehensive set of GO annotations to users.
03 April, 2012

1. Annotation post-processing

As of this release, UniProt-GOA will be displaying some IEA annotations that have been subject to minor post-processing by UniProt to correct the assigned GO term. The focus of the changes is to ensure taxonomic correctness of annotated GO terms, using data supplied by the GO taxon rules:

http://bit.ly/XcZw9s


Why is UniProt editing IEA annotations?

Edited IEA annotations are made by UniProt-GOA when the annotation originally supplied by the automatic annotation pipeline is incorrect for a UniProtKB protein and cannot be easily fixed by the annotation-contributing group without an unnecessarily high loss of correct annotations. This editing results in a conservatively-changed annotation, displaying an equivalent, correct GO term.

An example of this is the vaccinia virus DNA topoisomerase IB protein (UniProtKB:P68697), which has a prediction to 'GO:0005694; chromosome' from an electronic annotation source (InterPro). A slight change to this annotation prediction would lead to the correct term, 'GO:0044383; host chromosome', being supplied, which would be in accordance with the GO taxon rules, which require the term 'chromosome' only to be applied to cellular organisms ( http://www.ebi.ac.uk/QuickGO/GTerm?id=GO:0005694#info=3 )

Which annotation sets will be changed in this way?

UniProt-GOA will only carry out such changes to automatic annotation sets when it is the group primarily responsible for supplying the data for the proteome to the GO Consortium, and where the change to the annotation set at source is considered inappropriate. Such changes are made with the express agreement of the automatic annotation groups.

How to identify globally edited annotations?

All automatic annotations that are transformed by the UniProt-GOA processing will use a GO_REF reference that indicates to the user that such changes have occurred and which points users to this page: http://www.geneontology.org/cgi-bin/references.cgi . For example, UniProtKB:P68697, as described in the example above would be displayed with the accompanying GO_REF:0000042. Details of the annotation post-processing rules can be found at http://www.ebi.ac.uk/QuickGO/AnnotationPostProcessing.html .

2. Removal of automatic annotations that conflict with NOT-qualified manual GO annotations

In this release all IEA-evidenced annotations have been filtered to remove those that conflict with a NOT-qualified manual GO annotation for the same protein, where the manual annotation applies a GO identifier that is the same as, or a parent of, the GO identifier applied by the IEA method.

This filtering step has resulted in the removal of 8,000 incorrect IEA annotations from the UniProt file.
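
The filtering rule described above can be illustrated with a short Python sketch. It assumes annotations are dictionaries with hypothetical keys (protein, go_id, evidence, qualifier), that a toy `parents` mapping stands in for the GO hierarchy, and that "parent" covers any ancestor term; none of these details are confirmed by the release notes.

```python
def ancestors(term, parents):
    """Return every term reachable from `term` by following parent links."""
    seen, stack = set(), [term]
    while stack:
        for parent in parents.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def filter_conflicting_iea(annotations, parents):
    """Drop IEA annotations contradicted by a NOT-qualified manual annotation."""
    # Collect, per protein, the GO terms negated by manual (non-IEA) annotations.
    negated = {}
    for ann in annotations:
        if ann["evidence"] != "IEA" and "NOT" in ann["qualifier"].split("|"):
            negated.setdefault(ann["protein"], set()).add(ann["go_id"])

    kept = []
    for ann in annotations:
        if ann["evidence"] == "IEA":
            lineage = ancestors(ann["go_id"], parents) | {ann["go_id"]}
            if lineage & negated.get(ann["protein"], set()):
                continue  # same term, or a parent of it, is negated manually
        kept.append(ann)
    return kept
```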

06 March, 2012 1. Change to the references applied by UniProtKB Keyword and UniProtKB Subcellular Location 2 GO annotations

As of this release, references supplied in annotations from two UniProtKB automatic annotation pipelines have changed.

Annotations created from mappings between GO and the UniProtKB keywords and UniProtKB Subcellular Location controlled vocabularies previously cited the GO reference GO_REF:0000004 and GO_REF:0000023, respectively.

However, terms from these UniProtKB controlled vocabularies are applied differently to UniProt Swiss-Prot and TrEMBL entries; UniProtKB terms are manually annotated to UniProtKB/Swiss-Prot entries, whereas UniProtKB/TrEMBL entries are annotated from data supplied by the underlying nucleic acid databases and/or by the UniProt automatic annotation program. As advised in our December 2011 release, we have now changed the cited references in the supplied GO annotations to highlight these differences.

From the current release onwards, the UniProt-GOA annotation set will use:
GO_REF:0000037 or GO_REF:0000038 instead of GO_REF:0000004 for UniProtKB keyword annotations
GO_REF:0000039 or GO_REF:0000040 instead of GO_REF:0000023 for UniProtKB Subcellular Location annotations

Further descriptions of all these references are available at: http://www.geneontology.org/cgi-bin/references.cgi
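
Users comparing annotation sets from before and after this change may want to map the new references back to the ones they replace. A minimal Python sketch of such a lookup, using only the pairings listed above (the names LEGACY_GO_REF and legacy_reference are hypothetical):

```python
# New reference -> the reference it replaced, as listed in this announcement.
LEGACY_GO_REF = {
    "GO_REF:0000037": "GO_REF:0000004",  # UniProtKB keyword annotations
    "GO_REF:0000038": "GO_REF:0000004",
    "GO_REF:0000039": "GO_REF:0000023",  # UniProtKB Subcellular Location annotations
    "GO_REF:0000040": "GO_REF:0000023",
}


def legacy_reference(reference):
    """Return the pre-change reference, or the input if it was not changed."""
    return LEGACY_GO_REF.get(reference, reference)
```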

2. UniProt-GOA are delighted to announce that we now supply more than 120 million annotations to over 14 million proteins.

All annotations are available in the 106th release of the UniProt-GOA 'UniProt' file. The vast majority of the annotations (99%) are created by automatic annotation methods, including InterPro to GO and Swiss-Prot Keyword to GO mappings.
For more information on the automatic annotation methods that we incorporate, please see http://www.ebi.ac.uk/GOA/ElectronicAnnotationMethods.html .
For details on our manual annotation practices, please see http://www.ebi.ac.uk/GOA/ManualAnnotationEfforts.html .

UniProt-GOA would like to thank all of the contributing groups that have helped us reach this milestone. Links to our collaborating groups are displayed on our website, http://www.ebi.ac.uk/GOA .
10 January, 2012 Improvements to the HAMAP2GO annotation set

We are very pleased to announce that this UniProt-GOA release contains an improved set of HAMAP2GO annotations. The previous HAMAP2GO pipeline was unable to use the full complexity of the manually-curated UniProtKB HAMAP rules; however, this has changed due to recent developments. The HAMAP source now predicts 3,458,519 GO annotations.
13 December, 2011 Scheduled change to the references applied by UniProtKB Keyword and UniProtKB Subcellular Location 2 GO annotations

We would like to alert users to an intended change in the references supplied in annotations from two UniProtKB automatic annotation pipelines.

Annotations created from mappings between GO and the UniProtKB keywords and UniProtKB Subcellular Location controlled vocabularies currently cite the GO reference GO_REF:0000004 and GO_REF:0000023, respectively.

However, terms from these UniProtKB controlled vocabularies are applied differently to UniProt Swiss-Prot and TrEMBL entries; UniProtKB terms are manually annotated to UniProtKB/Swiss-Prot entries, whereas UniProtKB/TrEMBL entries are annotated from data supplied by the underlying nucleic acid databases and/or by the UniProt automatic annotation program. We intend to change the cited references in the supplied GO annotations to highlight these differences.

Therefore, from the February 2012 release onwards, the UniProt-GOA annotation set will use:
GO_REF:0000037 or GO_REF:0000038 instead of GO_REF:0000004 for UniProtKB keyword annotations
GO_REF:0000039 or GO_REF:0000040 instead of GO_REF:0000023 for UniProtKB Subcellular Location annotations

Further descriptions of all these references are available at: http://www.geneontology.org/cgi-bin/references.cgi
20 September, 2011 UniProt-GOA are delighted to announce that we now supply more than 100 million annotations to over 11 million proteins.

All annotations are available in the 100th release of the UniProt-GOA 'UniProt' file. The vast majority of the annotations (99%) are created by automatic annotation methods, including InterPro to GO and Swiss-Prot Keyword to GO mappings.
For more information on the automatic annotation methods that we incorporate, please see http://www.ebi.ac.uk/GOA/ElectronicAnnotationMethods.html .
For details on our manual annotation practices, please see http://www.ebi.ac.uk/GOA/ManualAnnotationEfforts.html .

UniProt-GOA would like to thank all of the contributing groups that have helped us reach this milestone. Links to our collaborating groups are displayed on our website, http://www.ebi.ac.uk/GOA .
24 August, 2011

In this release, the UniProt GO annotation files contain a greatly reduced number of protein binding GO annotations from the IntAct database. A subset of presumed reliable interactions is now extracted from the IntAct dataset with export determined using a simple scoring system developed by IntAct, coupled to a score threshold that has been deliberately chosen to exclude interactions supported by only one experimental observation. Further details of how interactions are scored can be found at the IntAct website ( http://www.ebi.ac.uk/intact/pages/faq/faq.xhtml#4 ). This simple score-based filter is used in combination with a set of defined rules that excludes certain types of data, such as interactions that have been inferred but not experimentally proven.

UniProt-GOA are pleased to announce the inclusion in their database of electronic GO annotations created by EnsemblPlants/Gramene. The annotations are created by projection of GO annotations from Arabidopsis thaliana or Oryza sativa proteins onto proteins from one or more target species based on gene orthology obtained from Ensembl Compara. This first release contains almost 230,000 annotations to over 50,000 proteins covering 16 taxonomies, including poplar, maize, sorghum, grape and Physcomitrella. We hope this will be a valuable resource for the non-model plant species community. The annotations can be viewed and downloaded from the QuickGO browser here.

28 July, 2011 1. With the imminent closure of the International Protein Index (IPI), we would like to alert users of the Human, Mouse, Rat, Zebrafish, Chicken and Cow UniProt GO annotation files (files named: gene_association.goa_[species], e.g., gene_association.goa_human) that our July GO annotation release now uses UniProt Complete Proteome sets to determine the protein composition of these files. Further information on UniProt Complete Proteome sets is available here .

This change has had a dramatic effect on the gene_association.goa_human file, which has increased in annotation count by 43.7%, as the file now includes GO annotations both to reviewed (Swiss-Prot) and unreviewed (TrEMBL) UniProtKB accessions. Any user wishing to identify only the reviewed (Swiss-Prot) UniProt protein annotation subset will be able to continue to do so using the information supplied in the gp_information.goa_uniprot file, which can be found here.
Alternatively, users can download the reviewed UniProtKB human GO annotation set from the UniProt QuickGO browser using this link .

Users of the Zebrafish species-specific gene association file will, however, notice a decrease in annotations this release, caused by major changes to the Ensembl Zebrafish ENSP identifier set that has resulted in a decreased set of electronic GO annotations obtained from the application of Ensembl Compara orthology data, as well as a decrease in ZFIN manual GO annotations caused by differences in the set of UniProtKB accessions selected for the Zebrafish UniProtKB proteome set and those currently mapped to ZFIN identifiers.

2. An additional restriction on the HAMAP2GO electronic GO annotation pipeline, filtering out all HAMAP2GO predictions for proteins in the Metazoan kingdom, has enabled us to remove incorrect GO annotation predictions for a number of species.

3. Species-specific UniProt GO annotation gene association files that include a filtering step to remove redundant electronic GO annotation predictions are now available from the GOA ftp site for the UniProt Complete Proteome sets of Dictyostelium discoideum (gene_association.goa_dicty.gz), Canis familiaris (gene_association.goa_dog.gz), Drosophila melanogaster (gene_association.goa_fly.gz), Caenorhabditis elegans (gene_association.goa_worm.gz), Saccharomyces cerevisiae (gene_association.goa_yeast) and Sus scrofa (gene_association.goa_pig.gz).
29 June, 2011 With the imminent closure of the International Protein Index (IPI), we would like to alert users of the Human, Mouse, Rat, Zebrafish, Chicken and Cow UniProt GO annotation files (files named: gene_association.goa_[species], e.g., gene_association.goa_human) that our July GO annotation release will use the UniProt Complete Proteome sets to determine the protein composition of these files. Further information on UniProt Complete Proteome sets is available here .

This change will mean that from July the gene_association.goa_human file will increase in size, as it will include GO annotations both to reviewed (Swiss-Prot) and unreviewed (TrEMBL) UniProtKB accessions. Any user wishing to identify the reviewed UniProt protein annotation subset will be able to continue to do so using the information supplied in the gp_information.goa_uniprot file, which can be found here. Alternatively, users can download the reviewed UniProtKB human GO annotation set from the UniProt QuickGO browser using this link.
09 June, 2011 1. Inferred Cellular Component GO annotations now included in the UniProt-GOA annotation set.

We are pleased to announce an additional set of Cellular Component GO annotations available in this release that have been automatically generated from the 'occurs_in' relationship, made available as intersection tags in Biological Process terms in the GO OBO v1.2 format.

For example:

[Term]
id: GO:0033579 ! protein amino acid galactosylation in endoplasmic reticulum
intersection_of: GO:0042125 ! protein amino acid galactosylation
intersection_of: occurs_in GO:0005783 ! endoplasmic reticulum

This information is used to describe Biological Processes that occur in a specific Cellular Component. As many GO users do not currently reason over the GO inter-ontology relationships, the set of inferred annotations has been generated to improve the consistency of the annotation set. Such GO annotations are produced when an annotation has been made (either manually or electronically) to a process term that, either directly or via one of its parent terms, has an 'occurs_in' relationship to a component term and where the component term (or one of its children) has not already been used in the annotation set for the same gene product identifier. The generated inferred annotations apply the same gene product identifier, reference and evidence code as the asserted Process annotation and are generated from all sources of GO annotations, with only 'NOT'-qualified annotations being excluded. All such inferred GO annotations can be identified by the 'GOC' value in the 'assigned_by' field (column 15).

These relationships are also visible in the QuickGO Ancestor Chart display, for example: http://www.ebi.ac.uk/QuickGO/GTerm?id=GO:0033579#term=ancchart
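
The inference rule described in this item can be sketched in Python as below. This is an illustration only, not the pipeline itself: the annotation dictionaries, the function names and the three lookup tables (`parents` for process-term parents, `occurs_in` for process-to-component links, `children` for component-term children) are all assumed structures.

```python
def closure(term, edges):
    """Return `term` plus every term reachable from it through `edges`."""
    seen, stack = {term}, [term]
    while stack:
        for nxt in edges.get(stack.pop(), ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def infer_components(annotations, parents, occurs_in, children):
    """Create 'GOC' component annotations implied by 'occurs_in' links."""
    used = {}  # GO terms already annotated to each gene product
    for ann in annotations:
        used.setdefault(ann["protein"], set()).add(ann["go_id"])

    inferred, emitted = [], set()
    for ann in annotations:
        if "NOT" in ann["qualifier"].split("|"):
            continue  # NOT-qualified annotations are excluded
        # Check the annotated term itself and each of its parent terms.
        for term in closure(ann["go_id"], parents):
            component = occurs_in.get(term)
            if component is None:
                continue
            # Skip if the component, or one of its children, is already used.
            if closure(component, children) & used[ann["protein"]]:
                continue
            key = (ann["protein"], component, ann["reference"], ann["evidence"])
            if key not in emitted:
                emitted.add(key)
                inferred.append({
                    "protein": ann["protein"],
                    "go_id": component,
                    "reference": ann["reference"],
                    "evidence": ann["evidence"],
                    "qualifier": "",
                    "assigned_by": "GOC",
                })
    return inferred
```

With occurs_in = {"GO:0033579": "GO:0005783"}, as in the stanza above, a protein annotated to GO:0033579 but not to GO:0005783 would gain one inferred GO:0005783 annotation.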

2. Annotations that apply the new manual GO evidence codes IBA, IBD, IKR and IRD are now available in the UniProt-GOA annotation set.
For further information on these recently created manual evidence codes, please consult the GO website for code definitions: http://www.geneontology.org/GO.evidence.shtml . These types of evidenced GO annotations are currently being created by the GO Consortium's Reference Genome project, identified by the 'RefGenome' value in the assigned_by field (column 15).
04 May, 2011 UniProt-GOA now incorporates annotations from external groups that use a GO reference (GO_REF) or a MOD-specific reference that can be converted to an equivalent GO_REF using the mappings defined in http://www.geneontology.org/doc/GO.references in their reference field. Previously, UniProt-GOA only accepted annotations that used a PubMed identifier in this field.
An example of a GO_REF that we are now accepting is GO_REF:0000015 'Use of the 'No biological data' (ND) evidence code for Gene Ontology terms'.
A description and complete list of the GO_REFs available can be found at http://www.geneontology.org/cgi-bin/references.cgi .
08 April, 2011 A greater diversity of identifiers in the 'with' field (column 8) of manual annotations and a rise in the number of Reference Genome annotations integrated.

Over the last month we have been working to provide a more complete display of the manual annotations that we integrate into the UniProt-GOA dataset from external annotation groups. Whereas previously the 'with' field (column 8) in our annotation file was left empty if a manual annotation did not include either a UniProtKB or GO identifier, our files now display 43 different gene, protein and chemical identifier types (such as WormBase, CHEBI and EcoCyc identifiers) in this field. This development ensures that integrated manual GO annotations display with the full set of information that curation groups have used when translating experimental data into a GO annotation.

The UniProt-GOA files also now contain a larger set of the manual annotations supplied by the GO Consortium's Reference Genome project (source: RefGenome). This project has generated inferred annotations for 47 species using GO Consortium manual annotations and phylogenetic trees from gene families. The Reference Genome project is fully described here: http://www.geneontology.org/GO.refgenome.shtml .
09 February, 2011 Inferred Biological Process GO annotations now included in the UniProt-GOA annotation set.

We are pleased to announce an additional set of GO annotations available in this release that have been automatically generated from the Molecular Function (MF) -> Biological Process (BP) inter-ontology relationships present in the GO OBO v1.2 format.

As many GO users do not currently reason over the GO inter-ontology relationships, a set of inferred annotations has been generated to improve the consistency of the Biological Process annotation set. These GO annotations are produced when an annotation has been made (either manually or electronically) to a Molecular Function term that, either directly or via one of its parent terms, has a relationship to a Biological Process term and where the Process term (or one of its children) has not already been used in the annotation set for the same gene product identifier. This inferred annotation set applies the same gene product identifier, reference and evidence code as the asserted function annotation and is generated from all sources of GO annotations, with only 'NOT'-qualified annotations being excluded. All such inferred GO annotations can be identified by the 'GOC' value in the 'assigned_by' field (column 15).
14 December, 2010 GOA proteomes files are now created using UniProtKB proteome data

As of this latest release, UniProt-GOA are creating the proteomes gene association files using the UniProtKB proteomes data. The UniProt-GOA proteomes files are available here: http://www.ebi.ac.uk/GOA/proteomes.html

The proteome sets provided by UniProt-GOA were previously based on those defined by the Integr8 project. Integr8 is planning to close after the launch of Ensembl Genomes as the next-generation interface for genome-scale data from non-vertebrate species. UniProtKB is taking over responsibility for the maintenance of the complete proteome sets, which can be found here: http://www.uniprot.org/taxonomy/complete-proteomes

There are two main consequences that should be noted:
1) The proteomes are available only for those species that have a complete genome, as defined by INSDC. For the complete genomes, please see http://www.ebi.ac.uk/genomes and the NCBI Project database, http://www.ncbi.nlm.nih.gov/sites/entrez?db=genomeprj
2) The entries within the proteome are the direct translation of the sequenced reference genome; this logic is applied to all species. Any species-specific rules that Integr8 used have not been propagated to the new sets. The proteome sets are subsets of the GOA-UniProt gene association file, and consequently if a protein accession is deemed not to be present in a particular proteome then any manual annotations made to this protein will not be visible in the proteome set; they will, however, still be available from the GOA-UniProt gene association file.
17 November, 2010 Important news about intended change to the GOA proteomes files

The current proteome sets provided by GOA are based on those defined by the Integr8 project. Integr8 is planning to close after the launch of Ensembl Genomes as the next-generation interface for genome-scale data from non-vertebrate species. UniProtKB is taking over responsibility for the maintenance of the complete proteome sets, which can be found here:
http://www.uniprot.org/taxonomy/complete-proteomes

There are two main consequences that should be noted:
1) The proteomes are available only for those species that have a complete genome, as defined by INSDC. For the complete genomes, please see http://www.ebi.ac.uk/genomes and the NCBI Project database, http://www.ncbi.nlm.nih.gov/sites/entrez?db=genomeprj
2) The entries within the proteome are the direct translation of the sequenced reference genome; this logic is applied to all species. Any species-specific rules that Integr8 used have not been propagated to the new sets. The proteome sets will be subsets of the GOA-UniProt gene association file, and consequently if a protein accession is deemed not to be present in a particular proteome then any manual annotations made to this protein will not be visible in the proteome set; they will, however, still be available from the GOA-UniProt gene association file.

We plan to implement this change in the next UniProt-GOA release, which is due on or around 14 December 2010.
19 October, 2010 Important news about intended change to the GOA proteomes files

The current proteome sets provided by GOA are based on those defined by the Integr8 project. Integr8 is planning to close after the launch of Ensembl Genomes as the next-generation interface for genome-scale data from non-vertebrate species. UniProtKB is taking over responsibility for the maintenance of the complete proteome sets, which can be found here:
http://www.uniprot.org/taxonomy/complete-proteomes

There are two main consequences that should be noted:
1) The proteomes are available only for those species that have a complete genome, as defined by INSDC. For the complete genomes, please see http://www.ebi.ac.uk/genomes and the NCBI Project database, http://www.ncbi.nlm.nih.gov/sites/entrez?db=genomeprj

2) The entries within the proteome are the direct translation of the sequenced reference genome; this logic is applied to all species. Any species-specific rules that Integr8 used have not been propagated to the new sets. The proteome sets will be subsets of the GOA-UniProt gene association file, and consequently if a protein accession is deemed not to be present in a particular proteome then any manual annotations made to this protein will not be visible in the proteome set; they will, however, still be available from the GOA-UniProt gene association file.

Addition of annotations from the GO Consortium Reference Genome project

We are pleased to announce that this release of UniProt-GOA gene association files includes manual annotations from the GO Consortium Reference Genome project. GO terms based on experimental data from the scientific literature are used to annotate ancestral genes in phylogenetic trees from the PANTHER database ( http://www.pantherdb.org ) by sequence similarity (evidence code ISS), and unannotated descendants of these ancestral genes are inferred to have inherited these same GO annotations by descent. The annotations are created using a tool called PAINT (Phylogenetic Annotation and INference Tool).
Further information on this method can be found at: http://gocwiki.geneontology.org/index.php/PAINT

New user interface of QuickGO is now released

The beta testing phase of the new QuickGO user interface is now complete. The new interface is available at the usual QuickGO URL, http://www.ebi.ac.uk/QuickGO . We would very much welcome any feedback about the new interface. If you notice a problem please contact us with the URL of the affected page, what you were trying to accomplish and the name and version of the browser you are using. If you are unsure about how to use a particular feature of QuickGO, please do not hesitate to email us at goa@ebi.ac.uk with your questions.
21 September, 2010 We are pleased to announce that this release of UniProt-GOA gene association files includes manual annotations for Mycobacterium tuberculosis that have been created by MTBbase
28 July, 2010 Please note that GOA has now stopped producing the EC2GO mapping file and it will no longer be available from our ftp site. The production of EC2GO has been passed to the GO Consortium, and the file can be accessed from the GO Consortium ftp site at ftp://ftp.geneontology.org/pub/go/external2go/ec2go or from GO CVS at http://cvsweb.geneontology.org/cgi-bin/cvsweb.cgi/go/external2go/ec2go .

We apologise for a delay in bringing you the GOA proteome sets. These will be available the first week in August.
03 June, 2010 1. Addition of annotations from CGD, Plasmodium falciparum GeneDB and PAMGO
2. New files provided describing specifically associations and annotation object
data for the UniProtKB gene association file.
3. Change in contents of column 1 in all GOA-UniProtKB gene association files.

1. We are pleased to announce that this release of UniProt-GOA gene
association files includes manual annotations for Candida albicans,
Plasmodium falciparum and Agrobacterium tumefaciens UniProtKB accessions
that have been created by the Candida Genome Database
(http://www.candidagenome.org/), Plasmodium falciparum GeneDB
(http://www.genedb.org/genedb/malaria) and Agrobacterium Genome
Consortium, PAMGO project (http://www.agrobacterium.org/) respectively,
from files these groups have submitted to the GO Consortium.

2. The UniProt-GOA group is making available two new files that have
been generated from gene_association.goa_uniprot. These files have
split between them the information specifically required to describe a GO
annotation (gp_association.goa_uniprot) and to describe the proteins
for which annotations are provided (gp_information.goa_uniprot). Use of these
two files instead of the gene association file has the advantage of reduced
redundancy in the information supplied, resulting in a combined size of
126MB less than the gene_association.goa_uniprot file. However the
format of these two files is subject to ongoing discussions by the GO
Consortium, so their exact format may change over time. Readmes
to describe the format of these files are available from the ftp site,
alongside these two files. Files can be downloaded from:

ftp://ftp.ebi.ac.uk/pub/databases/GO/goa/UNIPROT/gp_association.goa_unip...
ftp://ftp.ebi.ac.uk/pub/databases/GO/goa/UNIPROT/gp_information.goa_unip...

3. The contents of column 1 (DB) of all UniProt-GOA gene association
files has changed in this release, so that UniProtKB accession numbers
are only identified by the namespace 'UniProtKB' instead of the previous
values of either: 'UniProtKB/TrEMBL' or 'UniProtKB/Swiss-Prot'. This
change has occurred in response to requests from GO tool providers.
However the new gp_information.goa_uniprot.gz file described above does
contain the UniProtKB subset (Swiss-Prot/TrEMBL) information in column 2
for all UniProtKB proteins that have been GO annotated.
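
For users who need to compare files from before and after this change, the column 1 collapse described in point 3 can be sketched as below. This is only an illustration, not UniProt-GOA's own code: the function name and the accession in the usage note are hypothetical, and lines are assumed to be tab-delimited with '!' marking comment lines.

```python
OLD_DB_VALUES = ("UniProtKB/Swiss-Prot", "UniProtKB/TrEMBL")


def normalise_db_column(gaf_line: str) -> str:
    """Return the line with column 1 (DB) collapsed to 'UniProtKB'.

    Comment lines (starting with '!') and lines from other databases
    are returned unchanged, apart from the trailing newline being removed.
    """
    line = gaf_line.rstrip("\n")
    if line.startswith("!"):
        return line
    fields = line.split("\t")
    if fields[0] in OLD_DB_VALUES:
        fields[0] = "UniProtKB"
    return "\t".join(fields)
```

For example, a line beginning 'UniProtKB/TrEMBL', a tab, then the made-up accession Q12345 would come back beginning 'UniProtKB', a tab, then Q12345, while a line whose first column is 'PDB' is left alone.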
07 May, 2010 Please note that in this release all UniProt-GOA gene association files now correctly attribute the InterPro group as the source of annotations generated by the InterPro2GO electronic annotation pipeline. This means that the value in column 15 (Assigned_By) has changed from 'UniProtKB' to 'InterPro' where column 6 (DB:Reference) displays the reference 'GO_REF:0000002'.
22 April, 2010 UniProt-GOA news:
1. UniProt-GOA files supplied in GAF2.0 format
2. New PDB gene association file

1. UniProt-GOA files supplied in GAF2.0 format
In line with GO Consortium intentions, the GO annotation files available from the UniProt-GOA project are now supplied in a 17-column GAF format (GAF2.0) instead of the previous 15 columns. This new association format has been prepared with the GO Consortium's 1st June 2010 switchover date in mind, as after this date the GO Consortium's gene-associations ftp directory (ftp://ftp.geneontology.org/pub/go/gene-associations/) will only supply users with GAF2.0 formatted files.

However, if you require the older, 15-column GAF1.0 format for any of the UniProt-GOA files, a Perl conversion script is now made available at: ftp://ftp.ebi.ac.uk/pub/databases/GO/GOA/tools/gaf_convert.pl

This script will assist users to convert any UniProt-GOA file from the newer GAF2.0 format to GAF1.0; it is used thus:

perl gaf_convert.pl < input_file > output_file

where input_file is a GAF2.0 annotation file, and output_file is the generated GAF1.0 file.

The format of the new GAF2.0 annotation file is described in the GO web page below:
http://www.geneontology.org/GO.format.gaf-2_0.shtml

In essence, this format change means that two columns have been added to the end of the previous tab-delimited file format:

Column 16 (Annotation Extension). It is intended that this column will provide additional cross references to other ontologies that can be used to qualify or enhance a GO annotation. As discussions are ongoing within the GO Consortium as to the exact format of this field, the column is currently empty in all UniProt-GOA files.

Column 17 (Gene Product Form ID). This column will specifically identify which gene product is being annotated. Where possible, annotations from the UniProt-GOA group will supply isoform identifiers in column 17 to identify isoform-specific functionality.

It will not be mandatory for users to use the information supplied in columns 16 and 17; it is intended that this data will enhance the annotation descriptiveness, but will not radically modify the interpretation of the annotation information supplied in columns 1 to 15.
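
Because the two new columns are appended to the end of each tab-delimited line, a tool that only needs columns 1 to 15 can, in principle, drop the trailing fields. The sketch below illustrates that idea in Python. It is not a port of gaf_convert.pl, whose internals are not described here, and the function name is hypothetical. Comment lines (starting with '!') are passed through untouched, so any version header in the input is not rewritten.

```python
GAF1_COLUMNS = 15  # GAF1.0 width; GAF2.0 appends columns 16 and 17


def gaf2_to_gaf1(line: str) -> str:
    """Keep only the first 15 tab-delimited fields of an annotation line."""
    line = line.rstrip("\n")
    if line.startswith("!"):
        # Comment/header lines are returned as-is (assumption: '!' prefix).
        return line
    fields = line.split("\t")
    return "\t".join(fields[:GAF1_COLUMNS])
```

Lines that already have 15 or fewer fields are returned unchanged, so the function can be applied to mixed input without harm.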

2. New PDB gene association file
We are pleased to include in this release a new format for the PDB gene association file. This file has been generated from a collaboration between the InterPro, PDB and UniProt-GOA teams, and once again is able to offer annotations to PDB chain identifiers. In addition, further sources of GO annotations are now associated with PDB chains, to provide a more comprehensive PDB GO annotation resource. Manual and electronic GO annotations are now provided in this file from two sources:

1. where an InterPro entry matches a PDB chain, annotations supplied by the InterPro2GO electronic method are assigned to the chain identifier (for further details on this method see: http://www.geneontology.org/cgi-bin/references.cgi#GO_REF:0000002).

2. PDB chains are additionally supplied with manual and electronic GO annotations (excluding InterPro2GO) when a PDB chain maps with at least 90% identity to a UniProtKB accession (more specifically with the UniProtKB's CHAIN feature), whereupon manual and electronic annotations are supplied to the PDB chain identifier from the matching UniProtKB accession.
04 March, 2010 It is intended that the contents of column 1 in UniProt-GOA gene association files will change at our next monthly release. Currently column 1 displays the values 'UniProtKB/TrEMBL' or 'UniProtKB/Swiss-Prot' to indicate which section of UniProtKB an accession belongs to (please see our readme for further information on our annotation format: http://www.ebi.ac.uk/GOA/goaHelp.html ). However, these two different UniProtKB descriptions have caused processing issues for GO Consortium tools. We therefore intend to provide information on whether a protein is a member of the UniProtKB Swiss-Prot or TrEMBL sets in a tab-separated, supplementary gene product information file that will be released alongside UniProt-GOA gene association files, and column 1 will be changed to consistently display 'UniProtKB' for all UniProtKB accessions. Please contact us if you have any concerns regarding this format change.
18 December, 2009 We are pleased to announce that this release of UniProt-GOA gene association files includes manual annotations created by the EcoCyc and EcoWiki groups.
30 November, 2009 We are pleased to announce that this release of UniProt-GOA gene association files excludes annotations that apply a secondary or obsolete UniProtKB accession number
in either column 2 (DB_Object_ID) or column 8 (With).
This has meant that 3,929 (4.5%) manual annotations integrated from external annotation groups have been filtered from UniProt-GOA files. We are working closely with
all collaborating databases to help ensure that their set of UniProtKB accessions are regularly updated to valid primary UniProtKB accessions.
08 October, 2009 Please note that the contents of the gene_association.goa_pdb file have changed as of this release, as UniProt-GOA is no longer able to provide annotations to specific PDB chains. In future, annotations in this file will only associate PDB entries with GO terms. As an example, an annotation line in the PDB file has changed from:

PDB 102M_A q_A GO:0005506 GOA:interpro IEA InterPro:IPR000971 F protein_structure taxon:9755 20090629 UniProtKB
to:
PDB 102M 102M GO:0005506 GOA:interpro IEA InterPro:IPR000971 F protein_structure taxon:9755 20091001 UniProtKB

Please contact us if this change is of concern.
July 30th 2009 Please note that UniProt-GOA is intending to stop production of the gene_association.goa_uniprot_slimmed.gz file after this release.

Annotations to GO slims can now be generated using UniProt-GOA's QuickGO tool. QuickGO provides a range of features to help users to create GO slims and map annotations; the tool also integrates pre-defined slims, including the UniProt-GOA slim. Information on QuickGO's annotation slimming facilities is available at: http://www.ebi.ac.uk/QuickGO/GMultiTerm and http://www.ebi.ac.uk/QuickGO/tutorial.html

Other files provided in the UniProt-GOA/goslim/ ftp directory will continue to be maintained. Please contact us if you have any concerns regarding this change in the UniProt-GOA release procedure.
July 2nd 2009 Important News about intended file format change:

Previously, annotation lines produced by electronic methods (as indicated by the presence of the 'IEA' evidence code in column 7) contained two identifiers piped together in the reference column: an internal UniProt-GOA keyword (e.g. GOA:interpro) and a GO reference identifier (e.g. GO_REF:0000002). As of this file release, the UniProt-GOA internal reference (e.g. GOA:interpro) and pipe have been removed from this field, so that only a GO_REF identifier is provided.

Full descriptions of the methods referenced by a GO reference identifier can be found at:  http://www.geneontology.org/cgi-bin/references.cgi

This format change affects all annotation lines that had the following reference field contents:

GOA:interpro|GO_REF:0000002
GOA:hamap|GO_REF:0000020
GOA:spkw|GO_REF:0000004
GOA:spec|GO_REF:0000003
GOA:compara|GO_REF:0000019
GOA:spsl|GO_REF:0000023

We are also pleased to announce that column 1 ('DB') of the gene association files now differentiates (where possible) between UniProtKB/TrEMBL and UniProtKB/Swiss-Prot.
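
Users holding older files may want to bring the reference field into line with the new form described in this entry. A minimal Python sketch of that clean-up follows. It assumes the internal keyword always carries the 'GOA:' prefix and is separated from the GO_REF identifier by a pipe, as in the examples listed. The function name is hypothetical, and this is not the code used to produce the release files.

```python
def strip_internal_reference(reference: str) -> str:
    """Drop 'GOA:...' keywords from a piped reference field.

    'GOA:interpro|GO_REF:0000002' becomes 'GO_REF:0000002'.
    Fields without a 'GOA:' keyword are returned unchanged.
    """
    parts = reference.split("|")
    kept = [part for part in parts if not part.startswith("GOA:")]
    # If nothing but an internal keyword was present, keep the original
    # value rather than emit an empty reference field.
    return "|".join(kept) if kept else reference
```

Applied to column 6 of each IEA line, this leaves other kinds of reference untouched.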
May 27th 2009 Important News about intended file format change:

Please note that UniProt-GOA is intending to implement two changes to the format of the gene association files that will affect annotations supplied in future UniProt-GOA releases. These changes will affect columns 1 and 6:

1. DB field (column 1). Column 1 of the gene association file is used to identify the database which has supplied the sequence identifier displayed in column 2. Its value is very often 'UniProtKB'. However recent changes to other fields in the UniProt-GOA gene association files have made it difficult for users to identify whether an UniProtKB accession originates from the UniProtKB/Swiss-Prot or UniProtKB/TrEMBL databases. Therefore it is intended that when a UniProtKB accession is provided in column 2, column 1 will in future display either 'UniProtKB/Swiss-Prot' or 'UniProtKB/TrEMBL'.

2. DB:Reference field (column 6). Changes to this field were advertised last month, however it has not been possible to carry out the required work in time for this release. Therefore the changes outlined again below will be implemented in the next UniProt-GOA file release: Currently annotation lines produced by electronic methods (as indicated by the presence of the 'IEA' evidence code in column 7) contain two identifiers piped together in the reference column; an internal UniProt-GOA keyword (e.g. GOA:interpro) and a GO reference identifier (e.g. GO_REF:0000002). As of the next file release, the UniProt-GOA internal reference (e.g. GOA:interpro) and pipe will be removed from this field, so that only a GO_REF identifier will be provided.

Full descriptions of the methods referenced by a GO reference identifier can be found at: http://www.geneontology.org/cgi-bin/references.cgi

This format change will affect all annotation lines with the following reference field contents:

GOA:interpro|GO_REF:0000002
GOA:hamap|GO_REF:0000020
GOA:spkw|GO_REF:0000004
GOA:spec|GO_REF:0000003
GOA:compara|GO_REF:0000019
GOA:spsl|GO_REF:0000023
April 24th 2009 Important News about intended file format change:

Please note that UniProt-GOA intends to change the content of the DB:Reference field (Column 6) for all electronic annotations supplied in our gene association files at the next file release.

Currently annotation lines produced by electronic methods (as indicated by the presence of the 'IEA' evidence code in column 7) contain two identifiers piped together in the reference column: an internal UniProt-GOA keyword (e.g. GOA:interpro) and a GO reference identifier (e.g. GO_REF:0000002). As of the next file release, the UniProt-GOA internal reference (e.g. GOA:interpro) and pipe will be removed from this field, so that only a GO_REF identifier will be provided.

Full descriptions of the methods referenced by a GO reference identifier can be found at: http://www.geneontology.org/cgi-bin/references.cgi

This format change will affect all annotation lines with the following reference field contents:

GOA:interpro|GO_REF:0000002
GOA:hamap|GO_REF:0000020
GOA:spkw|GO_REF:0000004
GOA:spec|GO_REF:0000003
GOA:compara|GO_REF:0000019
GOA:spsl|GO_REF:0000023
February 6th 2009 Please note
Please note that the gene_association.goa_human file provided in the next UniProt-GOA release will no longer be made using the IPI non-redundant human protein set ( http://www.ebi.ac.uk/IPI/ ). Instead, the next version of this file will use the complete human proteome now available in UniProtKB/Swiss-Prot ( http://www.uniprot.org/news/2008/09/02/release ). This change will enable us to provide a non-redundant set of annotations for the human proteome; please therefore expect a sharp drop in both the number of distinct sequence identifiers and the total number of electronic annotations in the new file.

The name and format of this human file will remain the same, however annotations will be assigned to proteins only from the 'UniProtKB' (column 1) database source. Human IPI identifiers will continue to be included in column 11 of annotations.

In addition, the cross-references file for human IPI set (human.xrefs.gz), will no longer be provided. Instead, identifier mapping will be possible using the UniProt ID mapping file, available from: http://bit.ly/Y2ieMK

idmapping.dat.gz is a tab-delimited table, which includes mappings for 20 different sequence identifier types (and will be expanded in time for the next file release to include IPI identifiers).

A readme for this file is available from:  http://bit.ly/WjFiwk
September 18th 2008 UniProt-GOA gene association file format changes in this release
Changes to the contents of columns 3, 10 and 11 in all of the UniProt-GOA gene association files except gene_association.goa_pdb.gz.

Column 3 (DB_Object_Symbol)
Previous content: UniProt identifier (e.g. PRG4_HUMAN)
New content: Primary gene symbol when available (e.g. PRG4), otherwise the value present in column 2 (either a UniProtKB accession, IPI, Ensembl, VEGA, HINV, TAIR or RefSeq peptide identifier).

Column 10 (DB_Object_Name)
Previous content: A list of gene symbols and protein name. (e.g. PRG4, MSF, SZP: Proteoglycan-4 precursor)
New content: protein name only when available (e.g. Proteoglycan-4 precursor). Otherwise blank.

Column 11 (Synonym)
Previous content: IPI identifiers when available (e.g. IPI00024825)
New content: A pipe-delimited list of alternative gene symbol synonyms, IPI and UniProtKB identifiers (e.g. MSF|SZP|IPI00024825|PRG4_HUMAN).
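
Since column 11 now mixes several identifier types in one pipe-delimited list, downstream scripts may need to separate them again. The Python sketch below shows one way to do so. The classification rules are assumptions made for illustration, not part of the file specification: tokens of the form 'IPI' followed by digits are treated as IPI identifiers, tokens containing an underscore as UniProtKB identifiers, and everything else as a gene symbol synonym. The function name is hypothetical.

```python
def split_synonyms(column_11: str) -> dict:
    """Group the pipe-delimited tokens of column 11 by apparent type."""
    groups = {"ipi": [], "uniprot_id": [], "gene_symbol": []}
    for token in filter(None, column_11.split("|")):
        if token.startswith("IPI") and token[3:].isdigit():
            groups["ipi"].append(token)
        elif "_" in token:
            # Heuristic only: entry names such as PRG4_HUMAN contain '_'.
            groups["uniprot_id"].append(token)
        else:
            groups["gene_symbol"].append(token)
    return groups
```

With the example value above, MSF and SZP are grouped as gene symbols, IPI00024825 as an IPI identifier and PRG4_HUMAN as a UniProtKB identifier. Gene symbols that themselves contain an underscore would be misclassified by this heuristic.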
June 18th 2008 Proposed UniProt-GOA gene association file format change
The UniProt-GOA group is intending to change the contents of columns 3, 10 and 11 in all UniProt-GOA gene association files except gene_association.goa_pdb.gz. The changes being proposed are outlined below and will ensure that the format of the affected columns is in line with GO Consortium requirements. While the ordering of identifiers in these columns will change, no identifiers will be removed. These changes will come into effect at the end of July (Release 65 of UniProt-GOA UniProt).

Column 3 (DB_Object_Symbol)
Current content: UniProt identifier (e.g. PRG4_HUMAN)
Future content: Primary gene symbol when available (e.g. PRG4), otherwise will contain locus name or will repeat the value present in column 2 (either a UniProtKB accession, IPI, Ensembl, VEGA, HINV, TAIR or RefSeq peptide identifier).

Column 10 (DB_Object_Name)
Current content: A list of gene symbols and protein name. (e.g. PRG4, MSF, SZP: Proteoglycan-4 precursor)
Future content: protein name only when available (e.g. Proteoglycan-4 precursor). Otherwise will be left blank.

Column 11 (Synonym)
Current content: IPI identifiers when available (e.g. IPI00024825)
Future content: A pipe-delimited list of alternative gene symbol synonyms, IPI and UniProtKB identifiers (e.g. MSF|SZP|IPI00024825|PRG4_HUMAN).

Please contact us if you have any concerns about these proposed changes.
May 27th 2008 In this release 1,255 GO annotations which applied the now-illegal evidence code 'NR' have been removed.
These annotations had been applied to human proteins at the beginning of the human annotation effort. However, no reference had been provided to indicate the source of evidence supporting the GO term-protein association, and they are therefore no longer considered sufficiently reliable to include in the UniProt-GOA dataset.
May 6th 2008 In this release we have included Reactome manual annotations to the 'EXP' evidence code. Please note that this release does not contain electronic annotation using Ensembl Compara projections. This is due to a database problem in Ensembl Compara, which we hope to correct in the next release.
April 2nd 2008 The UniProt-GOA group has started to produce, in collaboration with the British Heart Foundation-funded Cardiovascular Gene Ontology Annotation Initiative, a BHF-UCL gene association file, which contains all GO annotations for 4,027 human proteins implicated in cardiovascular development or disease. The BHF-UCL file is a sub-set of the UniProt gene association file, and will provide the cardiovascular community with a discrete set of relevant GO annotations. This file will be updated by UniProt-GOA monthly and can be downloaded from:
ftp://ftp.ebi.ac.uk/pub/databases/GO/GOA/bhf-ucl/gene_association.goa_bhf-ucl.gz

The University College London-based Gene Ontology Annotation Initiative aims to prioritise the manual annotation of genes associated with the cardiovascular system. For further information on this project, please see:
http://www.cardiovasculargeneontology.com
http://wiki.geneontology.org/index.php/Cardiovascular_Gene_Page
December 18th 2007 Please note that in this release we use the abbreviation 'UniProtKB' instead of 'UniProt'. This follows the naming conventions set by the UniProt Consortium.

We have added 2 new gp2protein files in this release containing all UniProtKB and RefSeq accessions for human and chicken in the following files:

gp2protein.human.gz
gp2protein.chicken.gz
November 14th 2007 New Subcellular Location2GO mapping (spsl2go).
The UniProt-GOA group are pleased to announce that in collaboration with Serenella Ferro Rojas of the Swiss Institute of Bioinformatics, we are able to offer a new external database mapping file. So far, 269 Subcellular Location terms from the Comment (CC) lines of UniProtKB entries have been manually mapped to GO terms. The new mapping has been applied electronically to enhance the electronic (IEA coded) GO annotation in our latest UniProt-GOA release. This mapping has provided 418,185 new associations with 396,734 proteins.
The spsl2go mapping file is available from the UniProt-GOA ftp site ( ftp://ftp.ebi.ac.uk/pub/databases/GO/GOA/external2go/spsl2go ) and the Gene Ontology Consortium website ( http://www.geneontology.org/external2go/spsl2go ). The GO reference for this mapping is GO_REF:0000023.

PRSF2GO mapping file now available.
This month, the UniProt-GOA group has created a PRSF2GO mapping file. This file is a subset of the InterPro2GO mapping, with PRSF identifiers used instead of InterPro identifiers.
The prsf2go mapping file is available from the UniProt-GOA ftp site ( ftp://ftp.ebi.ac.uk/pub/databases/GO/GOA/external2go/prsf2go ) and the Gene Ontology Consortium website ( http://www.geneontology.org/external2go/prsf2go ).

Ensembl Compara pipeline.
This month, the Ensembl Compara electronic annotation pipeline will provide annotations to an increased set of species, including macaque, chimpanzee, guinea pig, Xenopus, Tetraodon, Fugu, zebrafish and Aedes aegypti.
For full details of this method, please see: http://www.ebi.ac.uk/GOA/compara_go_annotations.html

External Dates.
Annotation dates supplied in column 14 of the UniProt-GOA gene association files have now been corrected for annotations integrated from external model organism databases.

UniProt-GOA Survey.
The UniProt-GOA group has now added a short user survey to our website, which we invite you to look at, especially if:
- you have a set of proteins that you would like to suggest be prioritised for manual annotation
- you would like to join our expert panel to review a specified set of fully manually annotated proteins
- there are additional files that UniProt-GOA could supply that you would find useful
The survey is available at: http://www.ebi.ac.uk/GOA/contactus.html
September 20th 2007 New for this release are mapping files of UniProt accessions to GeneID and RefSeq identifiers:

ftp://ftp.ebi.ac.uk/pub/databases/GO/GOA/gp2protein/gp2protein.geneid.gz
ftp://ftp.ebi.ac.uk/pub/databases/GO/GOA/gp2protein/gp2protein.refseq.gz
August 24th 2007 IEA assignments based on Swiss-Prot keyword assignments have increased due to more manual mappings, while mappings to Enzyme Commission numbers have dropped. The latter decrease is under investigation and may be related to a forthcoming EC number format change at UniProtKB. Partial EC numbers where dashes replace part of the complete EC number (e.g. EC 3.4.24.-) are currently used for 2 different cases:
  • when the catalytic activity of the protein is not known exactly
  • when the protein catalyzes a reaction that is known but not yet included in the IUBMB EC list.
To distinguish between these two meanings, UniProtKB will soon start to use the letter 'n' instead of a dash for the latter case.
July 25th 2007 This month UniProt-GOA released over 20 million annotations, to more than 3 million proteins. All annotations are available in the UniProt-GOA UniProt file. This increase is largely due to contributions from the Swiss-Prot Keyword to GO mapping as well as the InterPro to GO mapping automatic (IEA) annotation techniques.
June 7th 2007 This month there was a decrease in the number of manual annotations in a couple of the UniProt-GOA gene association files. This is due to a retrofit that UniProt-GOA has carried out to improve the quality of the 'ISS'-evidenced manual annotations. In the current UniProt-GOA release, 'ISS'-evidenced annotations are only present where they reference a primary annotation that has been annotated with an experimental evidence code (either 'IDA', 'IPI', 'IMP', 'IGI' or 'IEP'). This change resulted in the deletion of approximately 13,500 UniProt-GOA 'ISS' annotations which did not conform to this standard.
March 6th 2007 InterPro2GO annotations in species-specific files.
Users of the species-specific UniProt-GOA gene association files (for the human, mouse, rat, arabidopsis, zebrafish, chicken and cow IPI proteomes), may notice the disappearance of a few InterPro2GO annotations this month, as InterPro2GO annotations originating from Prosite pattern matches without a 'known' match status are no longer incorporated into these files. This change will not affect the UniProt-GOA UniProt gene association file and proteomes files as such a restriction has always been in place.

Ensembl Compara pipeline. We would like to clarify that the electronic annotations provided by the Ensembl Compara pipeline (see UniProt-GOA News Release from December 2006) use manual GO annotations from UniProt-GOA's January release in combination with ortholog data from the current (43) Ensembl release.
December 2006 New electronic GO annotation method.
We are pleased to announce that in collaboration with the Ensembl group, UniProt-GOA is able to offer a new electronic GO annotation method, providing an additional 59,424 annotations to the human, mouse, rat, Drosophila melanogaster and Anopheles gambiae proteomes.

Using the gene orthology obtained from the Ensembl Compara system, GO terms from a source species have been projected onto one or more target species. Only one-to-one and apparent one-to-one orthologies are used, and only manually annotated GO terms with an evidence type of IDA, IEP, IGI, IMP or IPI are projected. GO annotations using this technique will receive the evidence code 'IEA' (Inferred from Electronic Annotation). The projections and the resulting annotations will be updated monthly. In the gene association files, the Ensembl protein identifier of the annotation source will be indicated in column 8 ('With') and column 15 will contain the value 'Ensembl'.
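
The projection rules in the paragraph above can be illustrated with a short Python sketch. It is a simplified model, not the Ensembl or UniProt-GOA pipeline: the Annotation type, the project function, the orthology labels 'one_to_one' and 'apparent_one_to_one', and the protein identifiers used in any example are all hypothetical names chosen for illustration.

```python
from typing import Iterable, List, NamedTuple, Tuple

EXPERIMENTAL_CODES = {"IDA", "IEP", "IGI", "IMP", "IPI"}
# Hypothetical labels for the orthology types that qualify for projection.
PROJECTABLE_ORTHOLOGY = {"one_to_one", "apparent_one_to_one"}


class Annotation(NamedTuple):
    protein: str           # annotated protein identifier
    go_id: str             # GO term identifier
    evidence: str          # evidence code (column 7)
    with_id: str = ""      # column 8 ('With')
    assigned_by: str = ""  # column 15


def project(annotations: Iterable[Annotation],
            orthologs: Iterable[Tuple[str, str, str]]) -> List[Annotation]:
    """Project experimentally evidenced GO terms onto orthologs.

    orthologs holds (source_protein, target_protein, orthology_type) triples.
    """
    by_source = {}
    for ann in annotations:
        if ann.evidence in EXPERIMENTAL_CODES:
            by_source.setdefault(ann.protein, []).append(ann)

    projected = []
    for source, target, kind in orthologs:
        if kind not in PROJECTABLE_ORTHOLOGY:
            continue
        for ann in by_source.get(source, []):
            # Projected annotations get 'IEA', with the source protein
            # recorded in 'With' and 'Ensembl' as the assigning group.
            projected.append(
                Annotation(target, ann.go_id, "IEA", source, "Ensembl"))
    return projected
```

An ISS annotation on the source protein, or an ortholog pair that is not one-to-one, produces no projected annotation in this model.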

DB:Reference column change. The format of column 6 (DB:Reference) has changed for all electronic (IEA) annotations. Alongside the keyword stating the UniProt-GOA method applied, a GO_REF identifier referencing a full description of this method is now also present. These two values are separated by a pipe symbol. Values in this field for 'IEA'-evidenced annotations will be one of the following:

GOA:interpro|GO_REF:0000002
GOA:hamap|GO_REF:0000020
GOA:spkw|GO_REF:0000004
GOA:spec|GO_REF:0000003
GOA:compara|GO_REF:0000019


The full descriptions referenced by the GO_REF identifier can be found at:
http://www.geneontology.org/doc/GO.references
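
As an illustration of the new column 6 format, the following Python sketch splits a pipe-separated value into its method keyword and GO_REF identifier. The function name `parse_iea_reference` and the `KNOWN_METHODS` table are assumptions made for this example; the keyword and GO_REF pairs are copied from the list above.

```python
# Method keyword -> GO_REF pairs, copied from the list in this news item.
KNOWN_METHODS = {
    "GOA:interpro": "GO_REF:0000002",
    "GOA:hamap": "GO_REF:0000020",
    "GOA:spkw": "GO_REF:0000004",
    "GOA:spec": "GO_REF:0000003",
    "GOA:compara": "GO_REF:0000019",
}


def parse_iea_reference(field):
    """Split a column 6 value such as 'GOA:interpro|GO_REF:0000002'.

    Returns (method, go_ref); either element is None if it is absent.
    """
    parts = [part.strip() for part in field.split("|")]
    method = next((p for p in parts if p.startswith("GOA:")), None)
    go_ref = next((p for p in parts if p.startswith("GO_REF:")), None)
    return method, go_ref


def is_known_pair(field):
    """Check that the method keyword and GO_REF match the listed pairs."""
    method, go_ref = parse_iea_reference(field)
    return method in KNOWN_METHODS and KNOWN_METHODS[method] == go_ref
```

Scripts that previously compared column 6 against the bare keyword would need a split of this kind before matching.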
September 2006 Gene association file filtering.
Please note that both the filtered and unfiltered versions of the UniProt-GOA UniProt gene association file are available from the GO Consortium ftp site (ftp.geneontology.org). The filtered version does not contain annotations for those species where a different Consortium group is primarily responsible for annotating the species to GO.
If you would like to download an unfiltered UniProt-GOA UniProt gene association file, please use either the UniProt-GOA ftp site:
ftp://ftp.ebi.ac.uk/pub/databases/GO/GOA/UNIPROT/gene_association.goa_uniprot.gz

Or the submissions folder in the GO Consortium ftp site:
http://bit.ly/YBX5t1

Species which are not present in the filtered version of the gene_association.goa_uniprot.gz file on the GO Consortium site include:

Danio rerio, Drosophila melanogaster, Mus musculus, Rattus norvegicus, Arabidopsis thaliana, all rice species, Bacillus anthracis str. Ames, Campylobacter jejuni RM1221, Candida albicans, Caenorhabditis elegans, Coxiella burnetii RSA 493, Dehalococcoides ethenogenes 195, Dictyostelium sp., Dictyostelium discoideum, Geobacter sulfurreducens PCA, Glossina morsitans morsitans, Leishmania major, Listeria monocytogenes str. 4b F2365, Methylococcus capsulatus str. Bath, Pseudomonas syringae pv. tomato str. DC3000, Plasmodium falciparum, Saccharomyces cerevisiae, Schizosaccharomyces pombe, Shewanella oneidensis MR-1, Silicibacter pomeroyi DSS-3, Trypanosoma brucei and Vibrio cholerae O1 biovar eltor.

Further information on this filtering script can be found at:
http://www.geneontology.org/GO.annotation.shtml#taxon
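
Users who start from the unfiltered file and want a similar species filter locally could apply something like the Python sketch below. It is not the GO Consortium filtering script. It assumes the taxon is held in column 13 of the tab-separated gene association file in the form `taxon:<id>`, and the `EXCLUDED_TAXA` set is a hypothetical three-species subset of the list above; `keep_line` and `filter_lines` are names invented for this example.

```python
# Hypothetical subset of the excluded species, as NCBI Taxonomy IDs:
# 10090 Mus musculus, 10116 Rattus norvegicus, 7955 Danio rerio.
EXCLUDED_TAXA = {"10090", "10116", "7955"}


def keep_line(line, excluded=EXCLUDED_TAXA):
    """Return True if a gene association line should stay in the filtered file."""
    if line.startswith("!"):
        return True  # comment/header lines are kept unchanged
    cols = line.rstrip("\n").split("\t")
    if len(cols) < 13:
        return True  # too short to judge; leave it for a separate validator
    # Column 13 may hold several taxa separated by '|'; the first one is
    # taken here as the species of the annotated protein.
    primary_taxon = cols[12].split("|")[0].replace("taxon:", "")
    return primary_taxon not in excluded


def filter_lines(lines, excluded=EXCLUDED_TAXA):
    """Keep only the lines whose species is not in the excluded set."""
    return [line for line in lines if keep_line(line, excluded)]
```

The function works on any iterable of lines, so it can be applied to an open file handle as well as to a list.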
August 2006 New manual annotation integrations.
UniProt-GOA now integrates human cellular component manual annotation from LIFEdb and manual annotation from DictyBase.
June 2nd 2006 UniProt-GOA hits the 10 million annotation mark!
There are now over 10 million manual and electronic annotations in the UniProt-GOA-UniProt file, covering over 100,000 species.

New manual annotation integrations.
Curators at the Roslin Institute have been trained in GO annotation and we are now incorporating their annotation into our releases.

January 27th 2006 UniProt-GOA Cow 1.0 Released
The first version of cow UniProt-GOA has been released.
November 21st 2005 New manual annotation integrations.
UniProt-GOA now includes manual annotation from Gramene.
September 7th 2005 New electronic annotation integrations.
This release of UniProt-GOA contains annotations to Vega data in Human, Mouse and Zebrafish files.

New manual annotation integrations.
UniProt-GOA now includes manual annotation from Reactome.

July 21st 2005 New manual annotation integrations.
UniProt-GOA now includes external annotation from GDB and AgBase. UniProt-GOA would like to thank GDB (especially Resham Kulkarni) and AgBase (especially Fiona McCarthy) for their help in integrating these annotations.
June 9th 2005 New species-specific file released.
This release contains the first release of UniProt-GOA CHICKEN. In future, this file will include manual annotations from Fiona McCarthy and Shane Burgess of AgBase (Mississippi State University).

UniProt-GOA Chicken 1.0 Released
The first version of chicken UniProt-GOA has been released.

April 22nd 2005 New manual annotation integrations.
This release contains new annotation from the ZFIN database.
March 15th 2005 UniProt-GOA PDB 18.0 Released
The 'gene_association.goa_pdb' gene association file provided by the UniProt-GOA group at the EBI contains GO assignments to PDB entries. As of this month this file has been altered so that PDB entries are only assigned GO terms based on matched InterPro domain ids. This file no longer contains annotations from sources where GO terms have been assigned to entire UniProt protein accessions (i.e. from GOA:manual, GOA:SPKW, GOA:SPEC or GOA:HAMAP sources). This change has been made to avoid assigning GO terms to PDB chains where some terms might only be correct for the corresponding whole protein.
February 10th 2005 New electronic annotation integrations.
This release of UniProt-GOA Human includes annotation to the H-Invitational Database (H-InvDB).

UniProt-GOA Arabidopsis 1.0 Released
We are pleased to introduce the first release of Arabidopsis UniProt-GOA.

UniProt-GOA Zebrafish 1.0 Released
We are pleased to introduce the first release of zebrafish UniProt-GOA.

January 11th 2005 UniProt-GOA Proteome Sets 1.0 Released
We are pleased to introduce GO annotation to 218 proteome sets in this release.  For information please follow this link.
August 11th 2004 Annotation increase
This month UniProt-GOA increases by over 1 million GO annotations.

New manual annotation integrations.
This release includes GO annotation created by the Rat Genome Database (RGD), GeneDB (Schizosaccharomyces pombe) and HGNC.

April 14th 2004 New Download : UniProt-GOA-slim
GO-slim is a list of high-level GO terms that cover the main aspects of each of the three GO ontologies. As each community has different needs, a variety of GO-slim files have been archived on the GO home page by Consortium members.

UniProt-GOA has created its own GO-slim to summarize the GO annotation of each completed proteome on the Proteome Analysis pages. As an additional service, we have now made a mapping of all our GO annotations to these GO-slim terms available for download on the EBI FTP site.

Users wishing to use a different set of GO-slim terms or a different association file are advised to use the map2slim.pl script archived on the GO Database website. This script uses the GO MySQL database and requires prior knowledge of the Perl API.
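
The general idea of mapping an annotation to GO-slim terms can be sketched in Python as a walk up the ontology from the annotated term until a slim term is reached. This is a simplified illustration under stated assumptions, not the algorithm used by map2slim.pl or by UniProt-GOA: the function `map_to_slim`, the `parents` dictionary and the toy term names are all hypothetical.

```python
def map_to_slim(term, parents, slim_terms):
    """Return the slim terms reachable from `term` via parent links.

    On each upward path the walk stops at the first slim term it meets,
    so only the most specific slim terms are reported.
    """
    found = set()
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        if current in slim_terms:
            found.add(current)
            continue  # do not climb past a slim term
        stack.extend(parents.get(current, ()))
    return found


# Toy ontology: each term maps to a list of its direct parents.
toy_parents = {
    "leaf": ["mid_a", "mid_b"],
    "mid_a": ["slim_x"],
    "mid_b": ["slim_y"],
    "slim_x": ["root"],
    "slim_y": ["root"],
}
toy_slim = {"slim_x", "slim_y", "root"}
```

With this toy data, `map_to_slim("leaf", toy_parents, toy_slim)` returns the two slim terms `slim_x` and `slim_y`, and the root is not reported because both paths stop earlier.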
February 25th 2004 New electronic annotation integrations.
This release includes annotation to RefSeq NP/XP entries; please see the README for more details.
December 17th 2003 New HAMAP2GO mapping.
The HAMAP database (High-quality Automated and Manual Annotation of microbial Proteomes) at SIB aims to automatically annotate proteins originating from microbial genome sequencing projects. In collaboration with Karine Michoud (UniProt, Geneva), HAMAP families have been manually mapped to GO terms. As with our use of InterPro2GO, this new mapping has been applied electronically to enhance the electronic (IEA-coded) GO annotation in this UniProt-GOA release. This mapping has updated the GO annotation of 36380 microbial proteins.

The mapping file is available here for local use.

BioCreAtIvE data release.
The data suppressed during the BioCreAtIvE Competition has now been released. UniProt-GOA-Human has approx. 340 more proteins with manual GO annotations and UniProt-GOA-UniProt has approx. 1000 more proteins with manual GO annotations.

UniProt release and changes to UniProt-GOA.
With the recent release of UniProt we have made some changes to UniProt-GOA. Please see the readme for more information on the content of the gene association files. Also, please note the change in the location of gene_association.goa_sptr.gz.

The original SPTR files will continue to be updated until February. All archived SPTR UniProt-GOA releases will continue to exist.

November 15th 2003 UniProt-GOA Mouse 1.0 Released
Version 1.0 of mouse UniProt-GOA has been released.

UniProt-GOA Rat 1.0 Released
Version 1.0 of rat UniProt-GOA has been released.
October 17th 2003 UniProt-GOA creates test and training set for BioCreAtIvE Competition.
Increasingly, UniProt-GOA data is being used to validate electronic GO annotation predictions extracted from the scientific literature. In collaboration with the BioLINK group, and as part of the BioCreAtIvE Competition (Critical Assessment of Information Extraction systems in biology), UniProt-GOA is providing training and test sets as well as curators for the manual evaluation of the competing automatic information extraction techniques. To create the test set, the UniProt-GOA group are currently manually associating GO terms to hundreds of human proteins using 200 articles from J. Biol. Chem. This data will not be released into the UniProt-GOA-Human association file until December (after the competition evaluation). For these reasons, users should be aware that the October release of UniProt-GOA-Human is mostly an electronic update.

EC2GO mapping and obsolete annotations.
Obsolete GO annotations have been removed due to an update of the ec2go mapping.

July 15th 2003 UniProt-GOA SPTr 10.0 Released
Version 10.0 Swiss-Prot & TrEMBL GO annotation has been released. Now approx. 76% of Swiss-Prot and TrEMBL has GO annotation.
March 28th 2003 GO annotation cross referenced in Swiss-Prot and TrEMBL.
Electronic and manual GO annotation has been cross referenced in the TrEMBL database. Only manual GO annotation has been cross referenced in the Swiss-Prot database. GO annotations with evidence codes ND (No Data) or NR (Not Recorded) have been suppressed. This annotation can be accessed via SRS or the Expasy NiceProt view.

EBI’s Human GO annotation cross referenced in LocusLink.
LocusLink begins to replace the human GO annotation supplied by the former Proteome group with that being provided by UniProt-GOA-Human. When GO annotation is available for any protein associated with a LocusID, the previous annotation is being suppressed.
March 12th 2003 New Version of QuickGO Browser is Released.
QuickGO is a fast web-based browser developed at the EBI to allow users to search and browse GO and GO annotation (UniProt-GOA) project data. The search and display interfaces have recently been updated.

Improvements include:

  • The QuickGO browser is more stable and supports more users than the previous version.
  • You can now view all the evidence codes associated with a GO annotation, or choose to view only manual annotation associated with experimental evidence codes.
  • You can view GO terms in a de-normalised tree or graphical output to trace more complex paths.
  • Common concurrent assignments based on UniProt-GOA have been updated.

New UniProt-GOA paper in Genome Research.
A detailed description of the implementation of GO in Swiss-Prot, TrEMBL and InterPro has been released online in this month's Genome Research. Information on the new version of QuickGO is also documented here. Download pdf of UniProt-GOA paper.

February 28th 2003 New manual annotation integrations.
UniProt-GOA SPtr now includes an extra 32657 manual annotations covering 8648 proteins resulting from the integration of data from MGI (Mouse), SGD (Yeast) and FlyBase (Fly). See the readme for more details.
February 2003 File Format Change.
UniProt-GOA is now in a position to integrate the manual annotations of other GO Consortium members, e.g. FlyBase (Fly), SGD (Yeast) and MGI (Mouse). We will be integrating all manual annotations with PUBMED references except those that are inferred from sequence similarity (GO evidence code ISS). To acknowledge the database source of this annotation, UniProt-GOA are adding an extra column, which will be implemented in the next UniProt-GOA-SPTR release. This may affect any scripts run on the UniProt-GOA-SPTR association file.
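
The selection rule stated above (manual, PUBMED-referenced, not ISS) can be expressed as a short Python predicate. This is only a sketch of the rule as worded in this news item, not the UniProt-GOA integration code. The function name `is_integrated` is hypothetical, and treating both `PUBMED:` and `PMID:` as reference prefixes is an assumption.

```python
def is_integrated(evidence_code, reference):
    """Return True if an external annotation meets the stated criteria.

    The annotation must be manual (not IEA), must not be inferred from
    sequence similarity (ISS), and must cite a PubMed reference.
    """
    is_manual = evidence_code != "IEA"
    not_similarity_based = evidence_code != "ISS"
    # Assumption: PubMed references appear with a PUBMED: or PMID: prefix.
    has_pubmed = reference.upper().startswith(("PUBMED:", "PMID:"))
    return is_manual and not_similarity_based and has_pubmed
```

Because the new source column changes the number of columns per line, scripts that index columns from the end of the line rather than from the start are the ones most likely to need updating.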
December 2002 New UniProt-GOA cross reference in EMBL-Bank (rel 73).
UniProt-GOA has been cross-referenced directly in the EMBL Nucleotide Sequence Database. 734286 coding sequences (CDS features) in EMBL now have a cross-reference to UniProt-GOA (e.g. /db_xref="GOA:P01100"), and these are hyperlinked in the EBI SRS server to the GO annotation displayed in our QuickGO browser.
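
For users processing EMBL flat files locally, the new qualifier can be picked out with a regular expression. The Python sketch below is illustrative only: the function name `goa_accessions` is hypothetical, and the sample text is a made-up fragment that reuses the `GOA:P01100` example from this news item.

```python
import re

# Matches qualifiers of the form /db_xref="GOA:<accession>".
GOA_XREF = re.compile(r'/db_xref="GOA:([^"]+)"')


def goa_accessions(text):
    """Return the accessions of all GOA cross-references found in text."""
    return GOA_XREF.findall(text)


# Made-up fragment of a CDS feature for demonstration purposes.
sample = (
    'FT                   /db_xref="GOA:P01100"\n'
    'FT                   /db_xref="UniProtKB/Swiss-Prot:P01100"\n'
)
print(goa_accessions(sample))
```

Only the qualifier whose database tag is `GOA` is matched, so other cross-references on the same feature are ignored.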
UniProt-GOA collaborates with:

Cardiovascular Gene Ontology Annotation Initiative, DictyBase, Ensembl Compara, Enzyme Nomenclature, FlyBase, GO Consortium, GUDMAP, Gramene, HAMAP, HGNC, HPA, IntAct, InterPro, LIFEdb, MGI, MTBbase, Reactome, RI, RGD, SGD, SwissProt, TAIR, TIGR, WormBase, ZFIN, The Evidence & Conclusion Ontology