GOA - README

1. Contents

  1. Contents
  2. Introduction
  3. Differences in the UniProt gene association file from GO and GOA ftp sites
  4. List of files and file formats
  5. The non-redundant human proteome set
  6. The non-redundant IPI higher eukaryotic proteome sets
  7. Ancillary mappings
  8. Assignment of GO terms to UniProtKB/Ensembl data
  9. Additional information on Manual Annotation in GOA
  10. Addition of GO assignments from other data sources
  11. Further information on the PDB association file
  12. Contacts
  13. Copyright Notice

2. Introduction


GOA (GO Annotation@UniProt) is a project run by the European Bioinformatics Institute that aims to provide assignments of gene products to the Gene Ontology (GO) resource. The goal of the Gene Ontology Consortium is to produce a dynamic controlled vocabulary that can be applied to all eukaryotes, even while the knowledge of gene and protein roles in cells is still accumulating and changing.

In the GOA project, this vocabulary is applied to all proteins described in the UniProt (Swiss-Prot and TrEMBL) knowledgebase.

GOA also provides non-redundant, species-specific annotation sets using either the complete proteome set available from UniProtKB or the International Protein Index (IPI), where sequence identifiers from the GOA, Ensembl, H-Invitational Database, TAIR, RefSeq and Vega groups are combined.

GOA manual annotations are created by EBI curators from the GOA, UniProt and IntAct groups. The dataset is supplemented with manual GO annotation from external model organism databases: AgBase, BHF-UCL, DictyBase, Ensembl, FlyBase, GDB, GeneDB (S.pombe), Gramene, HGNC, MGI, Reactome, RGD, Roslin, SGD, TAIR, TIGR, WormBase, ZFIN, the IntAct protein-protein interaction database, LIFEdb, the Human Protein Atlas and the Proteome Inc dataset (see section 10). The source of an annotation is always indicated in column 15 ('assigned by') of an association file.

The following describes the philosophy behind the EBI curated annotation dataset:

GOA curators prioritise human proteins for GO annotation, especially those proteins which:
  1. have no GO annotation,
  2. have disease relevance, and
  3. are important for high-throughput method analyses.

In GOA our aim is to capture the most recent papers that provide experimental evidence for the unique features of a given protein. Our approach is protein-centric rather than paper-centric: we do not read every paper that could be used to assign the same GO term. However, when a curator reads experimental evidence that further verifies a function, redundant annotations to the same term are created using the different references, as this can give greater confidence in a GO annotation.

For further information please refer to our web site at: http://www.ebi.ac.uk/GOA

External Contributors to the GOA Gene Association Files:

AgBase: http://www.agbase.msstate.edu
BHF-UCL: http://www.cardiovasculargeneontology.com/
DictyBase: http://dictybase.org
Ensembl: http://www.ensembl.org
FlyBase: http://www.flybase.org
GDB (Human Genome Database): http://www.gdb.org
GeneDB (S.pombe): http://www.genedb.org/genedb/pombe
Gramene: http://www.gramene.org
HGNC (HUGO Gene Nomenclature Committee): http://www.gene.ucl.ac.uk/nomenclature
Human Protein Atlas: http://www.proteinatlas.org/
IntAct: http://www.ebi.ac.uk/intact (see section 10)
LIFEdb: http://www.lifedb.de
MGI (Mouse Genome Informatics): http://www.informatics.jax.org
Proteome Inc. (see section 10)
Reactome: http://www.reactome.org
RGD (Rat Genome Database): http://rgd.mcw.edu
Roslin Institute: http://www.ri.bbsrc.ac.uk
SGD (Saccharomyces Genome Database): http://www.yeastgenome.org
TAIR (The Arabidopsis Information Resource): http://www.arabidopsis.org
TIGR (The Institute for Genomic Research): http://www.tigr.org
WormBase: http://www.wormbase.org
ZFIN (Zebrafish Information Network): http://zfin.org

3. Differences in the UniProt gene association file from GO and GOA ftp sites.


Please note that in addition to the human, chicken and cow gene association files, filtered and unfiltered versions of the GOA UniProt gene association file are available from the GO Consortium ftp site (ftp.geneontology.org). The filtered version of the UniProt file does not contain annotations for those species where a different Consortium group is primarily responsible for annotating the species to GO.

If you would like to download an unfiltered GOA UniProt gene association file, please use either the GOA ftp site: ftp://ftp.ebi.ac.uk/pub/databases/GO/goa/UNIPROT/gene_association.goa_uniprot.gz

Or the submissions folder in the GO Consortium ftp site:

ftp://ftp.geneontology.org/pub/go/gene-associations/submission/gene_association.goa_uniprot.gz

Species which are not present in the filtered version of the gene_association.goa_uniprot.gz file on the GO Consortium site include:

Danio rerio, Drosophila melanogaster, Mus musculus, Rattus norvegicus, Arabidopsis thaliana, all rice species, Bacillus anthracis str. Ames, Campylobacter jejuni RM1221, Candida albicans, Caenorhabditis elegans, Coxiella burnetii RSA 493, Dehalococcoides ethenogenes 195, Dictyostelium sp., Dictyostelium discoideum, Geobacter sulfurreducens PCA, Glossina morsitans morsitans, Leishmania major, Listeria monocytogenes str. 4b F2365, Methylococcus capsulatus str. Bath, Pseudomonas syringae pv. tomato str. DC3000, Plasmodium falciparum, Saccharomyces cerevisiae, Schizosaccharomyces pombe, Shewanella oneidensis MR-1, Silicibacter pomeroyi DSS-3, Trypanosoma brucei and Vibrio cholerae O1 biovar eltor.

Further information on this filtering script can be found at:

http://www.geneontology.org/GO.annotation.shtml#taxon
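
The effect of this kind of filtering can be pictured with a short sketch. This is not the GO Consortium's filtering script, only an illustration of dropping rows by species. It assumes that the species is held in column 13 (Taxon_ID, see section 4), that this column may hold more than one 'taxon:' value separated by '|' with the annotated species first, and that lines starting with '!' are comments. The function name filter_by_taxon is hypothetical.

```python
def filter_by_taxon(lines, excluded_taxa):
    """Yield association-file lines whose Taxon_ID is not in excluded_taxa.

    excluded_taxa is a set of strings such as {"taxon:10090"}.
    """
    for line in lines:
        # Assumption: lines starting with '!' are comments and pass through.
        if line.startswith("!"):
            yield line
            continue
        cols = line.rstrip("\r\n").split("\t")
        if len(cols) < 13:
            continue  # skip rows too short to carry a Taxon_ID
        # Assumption: if several taxon values are present, the first one
        # is the species of the annotated protein.
        taxon = cols[12].split("|")[0]
        if taxon not in excluded_taxa:
            yield line
```

For example, passing {"taxon:10090", "taxon:7955"} as excluded_taxa would drop the mouse and zebrafish rows and keep everything else.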

4. List of files and file formats


The GOA project produces the following gene association files:
  1. gene_association.goa_uniprot

    Locations:
    ftp://ftp.geneontology.org/pub/go/gene-associations/submission/gene_association.goa_uniprot.gz
    ftp://ftp.ebi.ac.uk/pub/databases/GO/goa/UNIPROT/gene_association.goa_uniprot.gz

    This file contains all GO assignments for the UniProt KnowledgeBase (UniProtKB).

  2. gene_association.goa_human

    Locations:
    ftp://ftp.geneontology.org/pub/go/gene-associations/gene_association.goa_human.gz
    ftp://ftp.ebi.ac.uk/pub/databases/GO/goa/HUMAN/gene_association.goa_human.gz

    This file contains the GO assignments for the non-redundant human proteome set. Please note that as of February 2009 this file is constructed using only proteins from UniProtKB/Swiss-Prot.

  3. gene_association.goa_mouse

    Locations:
    ftp://ftp.geneontology.org/pub/go/gene-associations/submission/gene_association.goa_mouse.gz
    ftp://ftp.ebi.ac.uk/pub/databases/GO/goa/MOUSE/gene_association.goa_mouse.gz

    This file contains the GO assignments for the proteins of the non-redundant mouse proteome set.

  4. gene_association.goa_rat

    Locations:
    ftp://ftp.geneontology.org/pub/go/gene-associations/submission/gene_association.goa_rat.gz
    ftp://ftp.ebi.ac.uk/pub/databases/GO/goa/RAT/gene_association.goa_rat.gz

    This file contains the GO assignments for the proteins of the non-redundant rat proteome set.

  5. gene_association.goa_arabidopsis

    Locations:
    ftp://ftp.geneontology.org/pub/go/gene-associations/submission/gene_association.goa_arabidopsis.gz
    ftp://ftp.ebi.ac.uk/pub/databases/GO/goa/ARABIDOPSIS/gene_association.goa_arabidopsis.gz

    This file contains the GO assignments for the proteins of the non-redundant Arabidopsis proteome set.

  6. gene_association.goa_chicken

    Locations:
    ftp://ftp.geneontology.org/pub/go/gene-associations/gene_association.goa_chicken.gz
    ftp://ftp.ebi.ac.uk/pub/databases/GO/goa/CHICKEN/gene_association.goa_chicken.gz

    This file contains the GO assignments for the proteins of the non-redundant chicken proteome set.

  7. gene_association.goa_cow

    Locations:
    ftp://ftp.geneontology.org/pub/go/gene-associations/gene_association.goa_cow.gz
    ftp://ftp.ebi.ac.uk/pub/databases/GO/goa/COW/gene_association.goa_cow.gz

    This file contains the GO assignments for the proteins of the non-redundant cow proteome set.

  8. gene_association.goa_zebrafish

    Locations:
    ftp://ftp.geneontology.org/pub/go/gene-associations/submission/gene_association.goa_zebrafish.gz
    ftp://ftp.ebi.ac.uk/pub/databases/GO/goa/ZEBRAFISH/gene_association.goa_zebrafish.gz

    This file contains the GO assignments for the proteins of the non-redundant zebrafish proteome set.

  9. gene_association.goa_pdb

    Locations:
    ftp://ftp.geneontology.org/pub/go/gene-associations/submission/gene_association.goa_pdb.gz
    ftp://ftp.ebi.ac.uk/pub/databases/GO/goa/PDB/gene_association.goa_pdb.gz


    This file contains the GO assignments for the proteins present in the pdb database.

  10. gene_association.goa_bhf-ucl

    Location:
    ftp://ftp.ebi.ac.uk/pub/databases/GO/goa/bhf-ucl/gene_association.goa_bhf-ucl.gz

    This file contains all GO annotations available for proteins implicated in cardiovascular development and disease. The set of identifiers included in this proteome set was compiled by the Cardiovascular GO Annotation Initiative, funded by the British Heart Foundation: http://www.cardiovasculargeneontology.com/

    We comply with the file format described by the Gene Ontology Consortium for annotation files
    (http://www.geneontology.org/GO.annotation.html#file).

    Since we deal with proteins rather than genes, the semantics of some fields in our files may be slightly different to other gene association files.

    1. DB
      Database from which the annotated entry has been taken.
      For the UniProtKB, Human and Proteomes gene association files: UniProtKB (UniProt:Swiss-Prot/TrEMBL)
      For the species-specific association files created using IPI (Arabidopsis, chicken, cow, mouse, rat or zebrafish): one of UniProtKB, UniProtKB/Swiss-Prot, UniProtKB/TrEMBL, ENSEMBL (Ensembl), HINV (H-Invitational Database), TAIR, RefSeq or VEGA.
      For the PDB association file: PDB

    2. DB_Object_ID
      A unique identifier in the DB for the item being annotated. Here: an accession number or identifier of the annotated protein (or protein chain for the gene_association.goa_pdb file).
      For the UniProtKB, Human and Proteomes gene association files: either a UniProtKB accession number or IPI identifier.
      For IPI species-specific association files (Arabidopsis, chicken, cow, mouse, rat or zebrafish): one of UniProtKB, Ensembl, VEGA, HINV, TAIR or RefSeq peptide identifiers.
      For the PDB association file: a PDB entry identifier (could be any non-control ASCII character).
      Examples: O00165, O43526-1, PENSP00000241656, OTTDARP00000014036, HIT000018908, AT1G12760.2, NP_671756, 117E

    3. DB_Object_Symbol
      A (unique and valid) symbol (gene name) to which DB_Object_ID is matched. An officially approved gene symbol will be added to this field when available. Alternatively, other gene symbols or locus names will be applied. If no symbols are available, the identifier applied in column 2 will be used. N.B. the contents of this field changed in August 2008.
      Examples: G6PC, CYB561, MGCQ309F3, C10H14ORF1, ENSBTAP00000000027, NP_671756, 117E_A

    4. Qualifier
      This column is used for flags that modify the interpretation of an annotation. This field may be equal to: NOT, colocalizes_with, contributes_to, NOT | contributes_to, NOT | colocalizes_with
      Example: NOT

    5. GO ID
      The GO identifier for the term attributed to the DB_Object_ID.
      Example: GO:0005634

    6. DB:Reference
      Reference cited to support the annotation. For annotations methods which cannot reference a paper as being the direct source of an annotation, this field will contain a GO_REF identifier. See section 8 and http://www.geneontology.org/doc/GO.references for an explanation of the reference types used.
      Examples: PUBMED:9058808, GOA:interpro|GO_REF:0000002, GOA:hamap|GO_REF:0000020, GOA:spkw|GO_REF:0000004, GOA:spec|GO_REF:0000003, GOA:compara|GO_REF:0000019, GOA:spsl|GO_REF:0000023, GO_REF:0000024

    7. Evidence
      One of either EXP, IMP, IC, IGC, IGI, IPI, ISS, IDA, IEP, IEA, TAS, NAS, NR, ND or RCA.
      Example: TAS

    8. With
      An additional identifier to support annotations using certain evidence codes (including IEA, IPI, IGI, IC and ISS evidences).
      Examples: UniProtKB:O00341, InterPro:IPR001878, Ensembl:ENSG00000136141, GO:0000001, EC:3.1.22.1

    9. Aspect
      One of the three ontologies: P (biological process), F (molecular function) or C (cellular component).
      Example: P

    10. DB_Object_Name
      Name of protein. The full UniProt protein name will be present here, if available from UniProtKB. If a name cannot be added, this field will be left empty.
      Examples: Glucose-6-phosphatase, Cellular tumor antigen p53, Coatomer subunit beta

    11. Synonym
      Gene_symbol [or other text]. Alternative gene symbol(s), IPI identifier(s) and UniProtKB/Swiss-Prot identifiers are provided pipe-separated, if available from UniProtKB. If none of these identifiers have been supplied, the field will be left empty.
      Examples: RNF20|BRE1A|IPI00690596|BRE1A_BOVIN, IPI00706050, MMP-16|IPI00689864

    12. DB_Object_Type
      What kind of entity is being annotated. Here: protein (or protein_structure for the gene_association.goa_pdb file).
      Example: protein

    13. Taxon_ID
      Identifier for the species being annotated.
      Example: taxon:9606

    14. Date
      The date of last annotation update in the format 'YYYYMMDD'.
      Example: 20050101

    15. Assigned_By
      Attribute describing the source of the annotation. One of either UniProtKB, AgBase, BHF-UCL, DictyBase, Ensembl, FB, GDB, GeneDB, GR (Gramene), HGNC, LIFEdb, MGI, Reactome, RGD, Roslin Institute, SGD, TAIR, TIGR, ZFIN, IntAct, PINC (Proteome Inc.) or WormBase.
      Example: UniProtKB
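
The 15-column layout described above can be read with a few lines of code. The following is a minimal sketch, not an official parser: it assumes one annotation per line, tab-separated columns in the order listed, and comment lines starting with '!'. The field names, the Annotation type and the helper functions are hypothetical names chosen for this example.

```python
from collections import namedtuple

# One name per column, in the order given in this section.
# 'with' is a reserved word in Python, so column 8 is called with_id here.
GAF_FIELDS = [
    "db", "db_object_id", "db_object_symbol", "qualifier", "go_id",
    "db_reference", "evidence", "with_id", "aspect", "db_object_name",
    "synonym", "db_object_type", "taxon_id", "date", "assigned_by",
]
Annotation = namedtuple("Annotation", GAF_FIELDS)


def parse_gaf_line(line):
    """Return an Annotation for one data line, or None for blank/comment lines."""
    # Assumption: lines starting with '!' are comments.
    if not line.strip() or line.startswith("!"):
        return None
    cols = line.rstrip("\r\n").split("\t")
    if len(cols) != len(GAF_FIELDS):
        raise ValueError(
            "expected %d columns, got %d" % (len(GAF_FIELDS), len(cols))
        )
    return Annotation(*cols)


def synonyms(annotation):
    """Split the pipe-separated Synonym column (column 11) into a list."""
    return [s for s in annotation.synonym.split("|") if s]
```

A parsed row can then be inspected by name, for instance annotation.go_id or annotation.assigned_by, and empty columns come back as empty strings.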

  11. xrefs.goa

    Locations:
    ftp://ftp.ebi.ac.uk/pub/databases/GO/goa/MOUSE/mouse.xrefs.gz
    ftp://ftp.ebi.ac.uk/pub/databases/GO/goa/RAT/rat.xrefs.gz
    ftp://ftp.ebi.ac.uk/pub/databases/GO/goa/ARABIDOPSIS/arabidopsis.xrefs.gz
    ftp://ftp.ebi.ac.uk/pub/databases/GO/goa/ZEBRAFISH/zebrafish.xrefs.gz
    ftp://ftp.ebi.ac.uk/pub/databases/GO/goa/CHICKEN/chicken.xrefs.gz
    ftp://ftp.ebi.ac.uk/pub/databases/GO/goa/COW/cow.xrefs.gz

    N.B. As the human gene association file from GOA is no longer constructed using the IPI resource, users are now invited to make use of the UniProtKB identifier mapping file, available from:
    ftp://ftp.ebi.ac.uk/pub/databases/uniprot/current_release/knowledgebase/idmapping/idmapping.dat.gz
    The README for this file's format is available from:
    ftp://ftp.ebi.ac.uk/pub/databases/uniprot/current_release/knowledgebase/idmapping/README

In addition to the principal IPI files with mappings of UniProtKB/Ensembl/Vega to GO, files have been prepared describing the relationship between the entries in this set and other databases, such as the EMBL/Genbank/DDBJ nucleotide sequence databases, HUGO, and Entrez Gene and RefSeq at the NCBI. Each xrefs file is tab-delimited (multiple entries in individual fields are separated by commas), with each row in the file representing one protein in the IPI set. The fields are as follows:
  1. Database from which master entry of this IPI entry has been taken. One of either SP (UniProtKB/Swiss-Prot), TR (UniProtKB/TrEMBL), ENSEMBL (Ensembl), REFSEQ_STATUS (where STATUS corresponds to the RefSeq entry revision status), VEGA (Vega), TAIR (TAIR Protein data set) or HINV (H-Invitational Database).
  2. UniProtKB accession number or Vega ID or Ensembl ID or RefSeq ID or TAIR Protein ID or H-InvDB ID.
  3. International Protein Index identifier.
  4. Supplementary UniProtKB/Swiss-Prot entries associated with this IPI entry.
  5. Supplementary UniProtKB/TrEMBL entries associated with this IPI entry.
  6. Supplementary Ensembl entries associated with this IPI entry. Havana curated transcripts are preceded by the key HAVANA: (e.g. HAVANA:ENSP00000237305;ENSP00000356824;).
  7. Supplementary list of RefSeq STATUS:ID couples (separated by a semi-colon ';') associated with this IPI entry (RefSeq entry revision status details).
  8. Supplementary TAIR Protein entries associated with this IPI entry.
  9. Supplementary H-Inv Protein entries associated with this IPI entry.
  10. Protein identifiers (cross reference to EMBL/Genbank/DDBJ nucleotide databases).
  11. List of HGNC number, HGNC official gene symbol couples (separated by a semi-colon ';') associated with this IPI entry.
  12. List of NCBI Entrez Gene gene number, Entrez Gene Default Gene Symbol couples (separated by a semi-colon ';') associated with this IPI entry.
  13. UNIPARC identifier associated with the sequence of this IPI entry.
  14. UniGene identifiers associated with this IPI entry.
  15. CCDS identifiers associated with this IPI entry.
  16. RefSeq GI protein identifiers associated with this IPI entry.
  17. Supplementary Vega entries associated with this IPI entry.


The mouse, rat, zebrafish and arabidopsis xref files have the following differences:

  • Column 11 in the mouse file contains the MGI (Mouse Genome Informatics) identifier and symbol for the genes
  • Column 11 in the rat file contains the RGD (Rat Genome Database) identifier and symbol for the genes.
  • Column 11 in the zebrafish file contains the ZFIN (Zebrafish information network) identifier and symbol for the genes.
  • Column 11 in the arabidopsis file contains the TAIR Gene (The Arabidopsis Information Resource) symbol and locus identifier for the genes.
  • Column 11 does not contain any data for chicken and cow.
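
As a rough sketch of how a row of an xrefs file might be read, the code below splits one tab-delimited line into the 17 fields listed above and returns them by name. It assumes every row carries exactly 17 columns. Values are left as raw strings, because the separators used inside a field differ between columns. The field names and the function parse_xrefs_row are hypothetical and were chosen only for this example.

```python
# Hypothetical names for the 17 columns, in the order listed above.
# Column 11 ("gene_db") holds HGNC, MGI, RGD, ZFIN or TAIR data depending
# on the species, and is empty for chicken and cow.
XREF_FIELDS = [
    "master_db", "master_id", "ipi_id", "swissprot", "trembl", "ensembl",
    "refseq", "tair", "hinv", "embl_protein_ids", "gene_db", "entrez_gene",
    "uniparc", "unigene", "ccds", "refseq_gi", "vega",
]


def parse_xrefs_row(line):
    """Split one xrefs line into a dict keyed by the names in XREF_FIELDS."""
    cols = line.rstrip("\r\n").split("\t")
    # Assumption: every row has exactly 17 tab-separated columns.
    if len(cols) != len(XREF_FIELDS):
        raise ValueError(
            "expected %d columns, got %d" % (len(XREF_FIELDS), len(cols))
        )
    return dict(zip(XREF_FIELDS, cols))
```

Looking up row["gene_db"] then gives the species-specific gene database entry for the mouse, rat, zebrafish and arabidopsis files.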


N.B. Entrez Gene is the successor database to LocusLink. For species covered by LocusLink, it will still be possible to access the data using the Entrez Gene identifiers.

5. The non-redundant human proteome set


In February 2009, the production of the gene_association.goa_human file changed from using the International Protein Index (IPI) to using the complete human proteome set available from UniProtKB/Swiss-Prot (http://www.uniprot.org/news/2008/09/02/release).

The name and format of this human file have remained the same; however, annotations are now assigned only to proteins from the 'UniProtKB' database source (column 1). Human IPI identifiers continue to be included in column 11 of annotations.

In addition, new releases of the cross-references file for the human IPI set (human.xrefs.gz) will no longer be provided. Instead, identifier mapping is possible using the UniProt ID mapping file, available from: ftp://ftp.ebi.ac.uk/pub/databases/uniprot/current_release/knowledgebase/idmapping/idmapping.dat.gz

idmapping.dat.gz is a tab-delimited table, which includes mappings for 20 different sequence identifier types, including IPI identifiers.

A readme for this file is available from: ftp://ftp.ebi.ac.uk/pub/databases/uniprot/current_release/knowledgebase/idmapping/README
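
A lookup from IPI identifiers to UniProtKB accessions could be built along the following lines. This is a sketch under stated assumptions: it assumes each line of idmapping.dat holds three tab-separated values (UniProtKB accession, identifier type, identifier) and that IPI mappings carry the type label "IPI". Both points should be checked against the README above before use. The function name load_id_mapping is hypothetical.

```python
from collections import defaultdict


def load_id_mapping(lines, id_type="IPI"):
    """Map each external identifier of the given type to UniProtKB accessions.

    Assumption: each line is 'accession<TAB>id_type<TAB>identifier'.
    """
    mapping = defaultdict(set)
    for line in lines:
        parts = line.rstrip("\r\n").split("\t")
        if len(parts) != 3:
            continue  # skip lines that do not fit the assumed layout
        accession, kind, identifier = parts
        if kind == id_type:
            mapping[identifier].add(accession)
    return dict(mapping)
```

In practice the lines would come from the decompressed file, for example by iterating over gzip.open(path, "rt") from the Python standard library.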

6. The non-redundant IPI higher eukaryotic proteome sets


The non-redundant mouse, rat, arabidopsis, zebrafish, chicken and cow files are produced using the monthly IPI (International Protein Index) releases, which provide a top-level overview of the main databases that describe proteomes: UniProtKB, Ensembl, TAIR, Vega, H-Invitational and NCBI's RefSeq. IPI assigns stable identifiers to clusters of matching proteins from its contributing databases.

Information on how the IPI sets are obtained can be found at:
http://www.ebi.ac.uk/IPI/Algorithm.html

IPI sets can be downloaded from:
ftp://ftp.ebi.ac.uk/pub/databases/IPI/current

7. Ancillary mappings


Mappings between UniProtKB and EMBL/Genbank/DDBJ are derived from the cross references to these databases found in UniProt entries. Mappings between UniProt and HUGO, Entrez Gene and RefSeq are derived from various publicly available sources of information that allow the electronic tracking of identifiers between databases. Contentious or contradictory data is referred to a curator for judgement.

8. Assignment of GO terms to UniProtKB/Ensembl data


In this release, we have used the following ten data sources to assign GO terms to proteins.
  1. PUBMED:nnnnnnnn
    All such annotations are manually curated and can contain any of the evidence codes available, except 'IEA' (see section 4). Curators have read the abstract or full paper with the PubMed identifier nnnnnnnn and assigned the GO terms manually. Where a journal is not indexed by PubMed then an internal identifier is provided eg: PBTnnnnnnnn. The GOA manual annotation set is created by the curators from the GOA, UniProt and IntAct groups, and is also supplemented with manual annotation (excluding annotation containing the ISS and IEA codes) from external model organism databases, see section 2. Please contact goa@ebi.ac.uk for details.

  2. GOA:interpro|GO_REF:0000002
    Transitive assignment of GO terms based on InterPro classification. For any protein that has been annotated with one or more InterPro domains, the corresponding GO terms are obtained from a translation table of InterPro entries to GO terms (interpro2go) generated manually by the InterPro team at EBI. The mapping file is available at: http://www.geneontology.org/external2go/interpro2go.

  3. GOA:hamap|GO_REF:0000020
    GO terms are manually assigned to each HAMAP family rule. HAMAP family rules are a collection of orthologous microbial protein families, from bacteria, archaea and plastids, generated manually by expert curators. The assigned GO terms are then transferred to all the proteins that belong to each HAMAP family. Only GO terms from the molecular function and biological process ontologies are assigned. GO annotations using this technique will receive the evidence code Inferred from Electronic Annotation (IEA). These annotations are updated monthly by HAMAP and are available for download on both GO and GOA EBI ftp sites. HAMAP (High-quality Automated and Manual Annotation of Microbial proteins) is a project based at the Swiss Institute of Bioinformatics (Gattiker et al. 2003, Comp. Biol and Chem. 27: 49-58).
    For further information, please see: http://www.expasy.org/sprot/hamap

  4. GOA:spkw|GO_REF:0000004
    Transitive assignment using Swiss-Prot keywords. This method is used for any database record that has one or more Swiss-Prot keywords assigned. Each keyword is mapped to the corresponding GO term in the spkw2go file, which was originally constructed manually by MGI curators and is now maintained by the GOA team at EBI. The mapping file is available at:
    http://www.geneontology.org/external2go/spkw2go.

  5. GOA:spec|GO_REF:0000003
    Transitive assignment using Enzyme Commission identifiers. This method is used for any database entry, such as a protein record in Swiss-Prot or TrEMBL, that has had an Enzyme Commission number assigned. The corresponding GO term is determined using the EC cross-references in the GO molecular function ontology. Also see Hill et al., Genomics (2001) 74:121-128. The mapping file is available at: http://www.geneontology.org/external2go/ec2go.

  6. GOA:compara|GO_REF:0000019
    GO terms from a source species are projected onto one or more target species based on gene orthology obtained from the Ensembl Compara system. Only one-to-one and apparent one-to-one orthologies are used, and only GO annotations with an evidence type of IDA, IEP, IGI, IMP or IPI are projected. Projected GO annotations using this technique will receive the evidence code Inferred from Electronic Annotation (IEA). The UniProtKB protein accession of the annotation source will be indicated in the 'With' column of the GOA association file.

  7. GOA:spsl|GO_REF:0000023
    Transitive assignment of GO terms based on Swiss-Prot Subcellular Location vocabulary annotation. The UniProt Consortium has developed a Subcellular Location vocabulary (SPSL) to annotate UniProt Knowledgebase entries (in CC_SUBC LOCATION lines). The GOA curators at EBI have manually mapped this vocabulary to the GO cellular component ontology. This mapping file, spsl2go, is used to obtain corresponding GO terms for any UniProtKB entry that has SPSL annotation; the mapping file is available from:
    ftp://ftp.ebi.ac.uk/pub/databases/GO/goa/external2go/spsl2go

  8. GO_REF:0000024
    Method for transferring manual annotations to an entry based on a curator's judgment of its similarity to a putative ortholog which has annotations with experimental evidence. Annotations are created when a curator judges that the sequence of a protein shows high similarity to another protein that has annotation(s) supported by experimental evidence (IDA, IGI, IMP, IPI or IEP). Annotations resulting from the transfer of GO terms display the 'ISS' evidence code and include an accession for the protein from which the annotation was projected in the 'with' field (column 8). This field can contain either a UniProtKB Accession or an IPI (International Protein Index) identifier. Further information on this method can be found at:
    http://www.ebi.ac.uk/GOA/ISS_method.html

  9. GO_REF:0000015
    The Gene Ontology (GO) Consortium created the evidence code "ND" to indicate "no biological data available". This code is used for annotations to any of the three terms 'molecular function unknown ; GO:0005554', 'biological process unknown ; GO:0000004' or 'cellular component unknown ; GO:0008372'. The use of any of these three GO terms, attributed to this reference and supported by the ND evidence code, signifies that a curator has examined the available literature and sequence for this gene and that, as of the date of the annotation to the unknown term, there is no information supporting an annotation to any GO term in that ontology. (Note that ND can be used with any one (or two) of the 'unknown' terms, even if there is data available to support annotation to a term from one or both of the other ontologies; e.g., ND can be used with GO:0008372 if the function and process are known but component is not).

  10. GO_REF:0000029
    Method for GO terms that were manually assigned to UniProt KnowledgeBase accessions using either a NAS or TAS evidence code, by applying information extracted from a publicly available, manually curated UniProtKB entry. Such GO annotations were submitted by the GOA-UniProt group from 2001; however, this annotation practice was discontinued in 2007.
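
The transitive methods in this list (InterPro, Swiss-Prot keywords, EC numbers, subcellular location) share one idea: look up each cross-reference of a protein in a mapping table and emit the GO terms found there. The sketch below illustrates that idea with in-memory dictionaries. It is not the GOA pipeline and does not read the external2go files. It assumes such annotations carry the IEA evidence code and show the mapped identifier in the 'with' column. The function name and the record layout are hypothetical.

```python
def transitive_annotations(protein_xrefs, xref2go,
                           reference="GOA:interpro|GO_REF:0000002",
                           with_prefix="InterPro"):
    """Derive (accession, GO ID, reference, evidence, with) records.

    protein_xrefs maps a protein accession to its cross-references
    (for example InterPro entries); xref2go maps a cross-reference to
    a list of GO IDs.
    """
    seen = set()
    records = []
    for accession, xrefs in protein_xrefs.items():
        for xref in xrefs:
            # Cross-references with no entry in the table yield nothing.
            for go_id in xref2go.get(xref, ()):
                # Assumption: evidence is IEA and 'with' names the xref.
                record = (accession, go_id, reference, "IEA",
                          "%s:%s" % (with_prefix, xref))
                if record not in seen:
                    seen.add(record)
                    records.append(record)
    return records
```

The same function could serve the keyword or EC mappings by passing a different reference string and with_prefix, for example "GOA:spec|GO_REF:0000003" and "EC".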

9. Additional information on Manual Annotation in GOA


For information on manual annotation guidelines and the usage of manual evidence codes please see:

http://www.geneontology.org/GO.annotation.html
http://www.geneontology.org/GO.evidence.html

Usage of the ISS code within GOA

There are three ways in which a curator can use the ISS evidence code:
  1. If a curator reads a paper that provides functional information for a protein and also states an orthology between it and another protein, then manual annotation can be transferred to the ortholog. The ortholog's annotation will contain the evidence code 'ISS', and the original literature identifier is displayed in the DB:Reference field (column 6). In the ortholog's annotation, whatever was in the 'with' column of the original protein's annotation is replaced by the sequence identifier (UniProtKB accession number) of the original protein. This allows the source of the 'ISS' annotation to be traced.

  2. If a curator is confident that a protein shows high similarity to another protein (e.g. from using BLAST) and it seems reasonable to infer that the two proteins have a common function, then manual annotation can be transferred to an ortholog. The ortholog's annotation will contain the evidence code 'ISS', an accession for the protein from which the annotation was projected will be present in the 'with' field (column 8), and the reference field (column 6) will contain GO_REF:0000024. Further information on this method can be found at:
    http://www.ebi.ac.uk/GOA/ISS_method.html

  3. If sequence similarity and functional information are reported in two different papers, then the primary annotation can be transferred to an ortholog. The ortholog's annotation will contain the evidence code 'ISS', and the identifier of the paper which describes the sequence similarity is displayed in the DB:Reference field (column 6). Any information previously contained in the 'with' column of the original entry is replaced, in the ortholog's annotation, by the original entry's accession number. This allows the source of the annotation to be traced.

N.B. For all of the methods described above, only annotations that have an experimental evidence code (either: IDA, IEP, IGI, IMP or IPI) can be further transferred to other proteins. In addition, annotations having the 'NOT' qualifier cannot be transferred by ISS.
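
The eligibility rule in the N.B. above, together with the second ISS method, can be expressed as a short check. This is only an illustrative sketch of the rule as stated here, not a tool used by GOA curators. The dictionary keys, the function names and the 'UniProtKB:' prefix placed in the 'with' value are assumptions made for this example.

```python
# Experimental evidence codes from which ISS transfer is allowed.
EXPERIMENTAL_CODES = {"IDA", "IEP", "IGI", "IMP", "IPI"}


def can_transfer_by_iss(evidence, qualifier=""):
    """True if an annotation may be transferred to another protein by ISS."""
    # The Qualifier column may combine flags, e.g. 'NOT | contributes_to'.
    flags = {flag.strip() for flag in qualifier.split("|")}
    return evidence in EXPERIMENTAL_CODES and "NOT" not in flags


def transfer_by_similarity(source, target_accession):
    """Build the ISS annotation for target_accession (second ISS method).

    source is a dict with keys db_object_id, go_id, evidence and,
    optionally, qualifier. Returns None if the transfer is not allowed.
    """
    qualifier = source.get("qualifier", "")
    if not can_transfer_by_iss(source["evidence"], qualifier):
        return None
    return {
        "db_object_id": target_accession,
        "go_id": source["go_id"],
        "qualifier": qualifier,
        "evidence": "ISS",
        "db_reference": "GO_REF:0000024",
        # Assumption: 'with' holds the source accession with a DB prefix.
        "with": "UniProtKB:" + source["db_object_id"],
    }
```

An annotation with evidence TAS, or one qualified with NOT, yields None, so it is never propagated further.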

10. Addition of GO assignments from other data sources


The GOA dataset has also been supplemented with the last (2001) public release of manual annotation from Proteome Incorporated. A number of annotations from Proteome Inc. contain the NR evidence code, which is not explicitly related to a journal reference; the replacement of this subset with more up-to-date and detailed GO annotation is one of GOA's priorities.

GOA has integrated annotations from the EBI's IntAct protein-protein interaction database. Only those binary interactions which are of high enough quality to be integrated into the UniProt database have been included (this is decided on experimental method type). All GO terms in these annotations are children of the protein binding term (GO:0005515); the annotations use the 'IPI' evidence code, with the sequence identifier of the protein's binding partner in column 8 ('with').

11. Further information on the PDB association file


The 'gene_association.goa_pdb' gene association file provided by the GOA group contains GO assignments to PDB entries. In this file PDB entries are only assigned GO terms based on matching InterPro domains.

12. Contacts


Please direct any questions to goa@ebi.ac.uk. We welcome any feedback.

13. Copyright Notice


GOA - GO Annotation@EBI
Copyright 2009 (C) The European Bioinformatics Institute. This README and the accompanying databases may be copied and redistributed freely, without advance permission, provided that this copyright statement is reproduced with each copy.