This user manual documents the use of the EBI Protein2GO tool
- The EBI GO/GOA browser QuickGO
If you have any problems with this manual, please contact: email@example.com
Protein2GO is continuously evolving, so not all sections are fully documented.
This manual does not cover annotation guidelines; these can be viewed on the GO Consortium website.
What is Protein2GO?
Protein2GO is a web-based GO annotation tool developed and maintained by the UniProt Gene Ontology Annotation group. It allows the assignment of Gene Ontology terms to UniProtKB accessions. During 2014 we extended Protein2GO to accept entity identifiers other than UniProt protein accessions. Protein2GO now accepts macromolecular complex identifiers from the IntAct Complex Portal, e.g. EBI-6475852, and RNA identifiers from RNAcentral, e.g. URS000024463E_9606. RNAcentral currently only has sequence-specific identifiers, e.g. URS000024463E, so the taxon that the sequence belongs to is indicated by appending an underscore followed by the taxon ID to the sequence-specific ID, e.g. URS000024463E_9606.
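The taxon-specific RNA identifier convention described above can be illustrated with a short Python sketch. This is not part of Protein2GO; the function names are hypothetical, and the assumption that a sequence-specific ID is 'URS' followed by ten hexadecimal characters is taken only from the example URS000024463E.

```python
import re

# Assumption: a sequence-specific RNAcentral ID is "URS" followed by ten
# hexadecimal characters, as in the manual's example URS000024463E.
URS_PATTERN = re.compile(r"URS[0-9A-F]{10}")


def taxon_specific_rna_id(sequence_id, taxon_id):
    """Append an underscore and the taxon ID to a sequence-specific ID."""
    if not URS_PATTERN.fullmatch(sequence_id):
        raise ValueError(f"unexpected RNAcentral ID: {sequence_id!r}")
    return f"{sequence_id}_{int(taxon_id)}"


def split_rna_id(identifier):
    """Split a taxon-specific ID into (sequence ID, taxon ID)."""
    sequence_id, separator, taxon = identifier.partition("_")
    if not separator or not URS_PATTERN.fullmatch(sequence_id) or not taxon.isdigit():
        raise ValueError(f"not a taxon-specific RNAcentral ID: {identifier!r}")
    return sequence_id, int(taxon)
```

For example, taxon_specific_rna_id("URS000024463E", 9606) builds the identifier used in the text above, and split_rna_id reverses the operation.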
How do I gain access to Protein2GO?
Use of Protein2GO is by invitation only. Groups that curate using Protein2GO agree to follow the curation requirements put in place by the UniProt-GOA group and those agreed by the GO Consortium. Any groups wishing to use Protein2GO should contact goa (AT) ebi.ac.uk and we will consider your request.
If use of Protein2GO is agreed, a login will be set up for the new users, who will then receive a 'Welcome' email. Click on the link within the email to go to Protein2GO, where you will already be logged on. You will be asked if you would like to save your login as a cookie on your computer; it is recommended that you do this and that you save the link as a bookmark to ensure easy access to the tool.
If you require help when using Protein2GO, you can access this user manual by clicking the 'Help' link in the top left-hand corner of Protein2GO.
Viewing an entry by entity type (i.e. protein, complex or RNA)
To get started, type a UniProtKB protein accession, a protein name, a Complex Portal ID, a complex name or an RNAcentral ID into the search box. N.B. If you have used Protein2GO previously, a list of identifiers that you have recently edited will appear when you click in the search box.
You can also search by PubMed ID - just type a PMID into the search box and press return. All the annotations made to that paper will be displayed and these can be edited as normal.
When you have finished typing the accession/identifier, press enter (if using an accession) or click on the appropriate entity that appears in the pop-up box (see Fig. 1).
Any annotations the entity has will be loaded; if it has many annotations this may take a couple of seconds.
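The kinds of input accepted by the search box can be told apart by their shape. The Python sketch below is only an illustration of that idea and does not describe how Protein2GO itself interprets a query. The UniProtKB pattern follows the published accession format; the Complex Portal and RNAcentral patterns are assumptions based on the example identifiers in this manual, and anything unrecognised is treated as a protein or complex name.

```python
import re

# Patterns are tried in order; the first full match wins.
QUERY_PATTERNS = [
    ("uniprot_accession", re.compile(
        r"(?:[OPQ][0-9][A-Z0-9]{3}[0-9]"
        r"|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2})(?:-\d+)?")),
    ("complex_id", re.compile(r"EBI-\d+")),               # assumed from EBI-6475852
    ("rnacentral_id", re.compile(r"URS[0-9A-F]{10}(?:_\d+)?")),  # assumed shape
    ("pubmed_id", re.compile(r"\d+")),
]


def classify_query(text):
    """Guess what kind of identifier a search string is."""
    query = text.strip()
    for kind, pattern in QUERY_PATTERNS:
        if pattern.fullmatch(query):
            return kind
    return "name"  # fall back to a protein or complex name search
```

A query such as "EBI-6475852" would be classified as a complex identifier, while free text such as "Tyro3" falls through to a name search.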
The page of annotations is divided into six sections as follows;
- Entity information and curator comments
- Add annotation
- Manual annotation (pre-existing)
- Electronic annotation
- Deleted annotation
Entity information and comments
At the very top left of Protein2GO you will see eight links:
- 'Help', which will take you to this help manual
- 'Options', which will display a pop-up box where you can alter the columns displayed in Protein2GO
- 'Reports', which will open a new window containing some common SQL queries that you can run on the GOA database, such as the number of annotations you have created within a certain time period or a list of the proteins you have annotated
- 'GO Tracker', which will take you to the GO Consortium's GitHub page where you can request new terms or ask questions regarding term usage
- 'Protein2GO Tracker', which will take you to the GitHub page for Protein2GO
- 'Users', where you can search for and edit users of Protein2GO
- 'Talisman', which will open a new window displaying the Talisman tool, which allows editing of annotation blacklists, InterPro2GO mappings, UniProt keyword mappings and Subcellular2GO mappings
- 'GOA Compare', which will open a new window to access tools to compare GOA releases
The entity information displayed at the top of the page includes accession/identifier, species and name - check this information to ensure you are annotating to the correct thing. If you have been editing other accessions in your current session, these will appear as buttons below the current entity information so that you can navigate easily between them (see Fig. 2). These can be removed by clicking the 'Clear' button.
The curator comments box (Fig. 2) is for recording useful information which cannot be captured in a GO annotation.
There are several topics that can be chosen (these are accessed by clicking on the empty field under 'Topic') as well as an optional free text field to add extra information; for example:
Ticket Pending would be used when you have requested a new term from the GO editors in the GitHub tracker - the free text field can be used to add the GitHub request number and the PMID and evidence code of the annotation you would like to make with the new term;
GO Annotation Complete would be used to indicate that you have comprehensively annotated the entity from all the available literature to date.
Examples of some comments are shown in figure 3. It is not mandatory to have text in the free text box and a date stamp is applied to all comments.
Sometimes it may be useful to include the annotation line identifier in your comment. This can be found by opening the 'Options' pop-up in the top left corner and checking the 'Show ID' box. Click on 'OK' and the Protein2GO display will show the annotation line ID on the left of each annotation. Comments can be deleted by clicking on the red cross or edited by changing/adding text to the text box and then clicking on the green tick.
There can be only one instance of each comment topic. If there is more than one comment for a particular topic, the additional comments must be typed after any existing text. For example, if a curator wanted to add another Ticket Pending comment to the one shown in Fig. 3, they would have to type the new information after the text about the oncosis term and then click on the green tick to update the entry.
Add new annotation
Annotations are added a single line at a time in the 'Add' section of Protein2GO (Fig. 4). Pop-up boxes appear when you click in any field instructing you what type of information needs to go in.
When annotating to a protein isoform, just edit the protein accession field to the appropriate isoform accession, e.g. add '-1' to the end of the displayed protein accession in the 'Add' annotation section.
When annotating to a feature identifier, the format for the identifier is UniProt accession:feature identifier, e.g. P62987:PRO_0000396434.
When annotating to a variant identifier, the format is UniProt accession:variant identifier, e.g. Q5S007:VAR_024931. Please note that you should ONLY annotate normal variants, not disease variants.
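The three identifier formats above (isoform, feature and variant) can be summarised in a small Python sketch. It is an illustration only and not Protein2GO's own validation; the function name is hypothetical, and the 'PRO_' and 'VAR_' prefixes are assumed from the examples given in this section.

```python
import re

# accession, optionally followed by "-<isoform number>" or ":<PRO_/VAR_ id>"
TARGET_PATTERN = re.compile(
    r"(?P<accession>[A-Z0-9]+)"
    r"(?:-(?P<isoform>\d+)|:(?P<sub>(?:PRO|VAR)_\d+))?"
)


def parse_annotation_target(identifier):
    """Split an annotation target into accession, kind and detail."""
    match = TARGET_PATTERN.fullmatch(identifier)
    if match is None:
        raise ValueError(f"unrecognised identifier: {identifier!r}")
    accession = match["accession"]
    if match["isoform"]:
        return {"accession": accession, "kind": "isoform", "detail": match["isoform"]}
    if match["sub"]:
        # Assumption: PRO_ marks a feature and VAR_ marks a variant.
        kind = "feature" if match["sub"].startswith("PRO_") else "variant"
        return {"accession": accession, "kind": kind, "detail": match["sub"]}
    return {"accession": accession, "kind": "protein", "detail": None}
```

Parsing "P62987:PRO_0000396434" in this way yields the accession P62987 with a feature identifier, and "Q5S007:VAR_024931" yields a variant.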
When a GO ID is entered into the 'GO' field a pop-up appears showing the GO term name, ID, ontology and definition (Fig. 5) - this is useful to check that you have entered the correct GO ID. The GO term name will be automatically filled in. GO terms can be found using QuickGO or AmiGO. You can also type a GO term into the 'GO' field and choose an appropriate one from the pop-up list. When you start typing information into the add annotation section, certain boxes are coloured pink which indicates that information needs to be entered into these fields before you can add the annotation to the database.
Figure 6 shows the pop-up box which appears when you click in the 'evidence' field. Clicking on the evidence code you want for the annotation will auto-fill the box. More about evidence codes and how to use them can be found in the GO Consortium evidence code guide. Furthermore, Protein2GO now supports the use of ECO codes. ECO codes can be browsed and searched for in QuickGO or on the Ontology Lookup Service (OLS).
Next, fill in the 'Reference' field with the relevant information; when you click in this box a lightbox will appear with two text boxes. When inserting a PubMed ID, just type or paste the identifier into the right-hand box and the reference database field on the left will automatically be filled with 'PMID' (Fig. 7). If using a DOI, you will have to select this from the drop-down list in the left-hand box. Click on 'OK' in the lightbox to enter the reference and return to the main screen.
If appropriate, e.g. when using the IC, IGI or IPI evidence codes, fill in the 'With' field with the appropriate information. Note that the 'With' field will also appear as a lightbox similar to the reference field. After all the relevant information has been added for the GO annotation, an add button will appear to the left of the annotation, as shown in Fig. 8, to show that the annotation has passed certain checks and is ready to be added to the database. To add it, click on the add button. If the annotation line is not complete or has incorrect information in it, a red exclamation point will appear and you will not be able to add the annotation until you have fixed the error. Clicking on the exclamation point will tell you what is wrong with the annotation.
N.B. To enable curators to find their recently added annotations, annotations added within the previous 24 hours will be highlighted in a bright yellow colour and those added within the previous two weeks will be highlighted in a pale yellow colour.
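The highlighting rule can be expressed as a simple age check. The Python sketch below only illustrates the rule as stated; the function name and the returned colour labels are hypothetical, and whether the boundaries are inclusive is an assumption.

```python
from datetime import datetime, timedelta


def highlight_colour(added_at, now):
    """Return the highlight for an annotation based on when it was added."""
    age = now - added_at
    if age <= timedelta(hours=24):
        return "bright yellow"   # added within the previous 24 hours
    if age <= timedelta(weeks=2):
        return "pale yellow"     # added within the previous two weeks
    return None                  # older annotations are not highlighted
```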
A note on previously curated papers
If you have entered a PubMedID that has previously been curated, an alert will be displayed informing you of this (see Fig. 9). This is to prevent curators re-annotating papers. You are not prevented from making new annotations to this paper, but if you would like to improve on any of the existing annotations we strongly recommend that you contact the group that made the original annotation with your concerns. It is acceptable to add new information from a previously curated paper. To view the existing annotations for that paper, click on the link shown. To close the alert click on the cross in the top right-hand corner.
Warning or prevention of specific GO term-entity associations based on the Annotation Blacklist
UniProt-GOA maintain a blacklist of GO term-entity accessions that may be incorrect to associate with one another. The entries in the blacklist are derived from three places;
1) associations that were observed to be incorrect from the review of annotations obtained from an IEA pipeline
2) associations obtained from NOT-qualified annotations
3) associations that are deemed to be incorrect after review of literature or sequence information and captured in the UniProt comment_caution lines.
When an attempt is made to make an annotation to an entity based on 1) in the above list, an error will be shown preventing the curator from making the annotation. An example is shown in Fig. 10 where the association between protein al4 and 'dihydroorotate dehydrogenase activity' is prevented due to information supplied by InterPro.
When an attempt is made to make an annotation to one of the GO term-entity blacklisted associations from 2) or 3) in the above list, a warning will appear in Protein2GO, but the curator will not be prevented from making the annotation. An example of this is shown in Fig. 11, where a NOT annotation exists to 'nucleolus', so an attempt to annotate to 'nucleolus' results in a warning.
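The difference between the blocking and non-blocking blacklist sources can be sketched in Python as follows. This is an illustration of the behaviour described above, not the actual implementation; the class, the function and the dictionary-based blacklist are all hypothetical.

```python
from enum import Enum


class BlacklistSource(Enum):
    IEA_REVIEW = 1        # 1) found incorrect on review of an IEA pipeline
    NOT_QUALIFIED = 2     # 2) derived from a NOT-qualified annotation
    COMMENT_CAUTION = 3   # 3) captured in UniProt comment_caution lines


def check_association(blacklist, entity, go_id):
    """Return 'error', 'warning' or 'ok' for a proposed annotation.

    `blacklist` maps (entity, GO ID) pairs to a BlacklistSource.
    """
    source = blacklist.get((entity, go_id))
    if source is None:
        return "ok"
    if source is BlacklistSource.IEA_REVIEW:
        return "error"    # the annotation is prevented
    return "warning"      # the curator is warned but may proceed
```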
Creation of annotations using inter-ontology links
Within the GO structure, links are being made between certain Molecular Functions and Biological Processes and between certain Biological Processes and Cellular Components. For example, it is known that all proteins with protein kinase activity (Function) are involved in protein phosphorylation (Process), so a part_of relationship now links these two terms in the ontology. Similarly, it is known that intracellular transport (Process) always occurs in the intracellular compartment of a cell (Cellular Component), so an occurs_in relationship links these two terms in the ontology. A pipeline now runs over the UniProt-GOA database to create these missing Process or Component annotations and these 'inferred' annotations are assigned by 'GOC' (GO Consortium). A sanity check has been built into Protein2GO which alerts curators who are annotating to a Molecular Function or Biological Process term that has an inter-ontology link. The alert (an example is shown in Fig. 8e) states that creation of the annotation will result in the creation of a second inferred annotation. The curator can do one of three things when one of these alerts is displayed;
- Ignore the alert (or close the window); the inferred annotation will be created and given the 'GOC' database source.
- Add the annotation; to do this the curator should click on 'Add annotation' in the alert window and the annotation row will be added to the 'Add annotation' section of Protein2GO where the curator can click on the green plus button to add the annotation to the database. The annotation will be assigned to your database source.
- Improve on the granularity of the annotation; to do this the curator should click on 'Add annotation' in the alert window and the annotation row will be added to the 'Add annotation' section of Protein2GO where the curator can provide a more specific GO term which is relevant for the experimental assay they are curating. The annotation will be assigned to your database source.
If you disagree with an inferred annotation, either there is a problem with the original annotation that the inferred annotation is derived from, in which case you can dispute the annotations (see section on "Annotation Disputes"), or a change in the ontology may be required in which case you should open a ticket on the GitHub Ontology Tracker.
N.B. An inferred annotation is not made if the original annotation has a NOT qualifier. Any other qualifier information (i.e. contributes_to, colocalizes_with) is not carried over to the inferred annotation. Similarly, information in the annotation extension field is not carried over to the inferred annotation.
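The rules for inferred annotations can be summarised in a minimal Python sketch. It is not the pipeline itself: the Annotation class, the link table and the 'UniProt' default source are hypothetical, and the two links shown are simply the examples from this section.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Annotation:
    entity: str
    go_id: str
    qualifier: Optional[str] = None
    extension: Optional[str] = None
    assigned_by: str = "UniProt"


# Illustrative inter-ontology links (source term -> linked term).
INTER_ONTOLOGY_LINKS = {
    "GO:0004672": "GO:0006468",  # protein kinase activity part_of protein phosphorylation
    "GO:0046907": "GO:0005622",  # intracellular transport occurs_in intracellular
}


def infer_annotation(annotation):
    """Return the inferred annotation, or None if none should be made."""
    qualifiers = (annotation.qualifier or "").split("|")
    if "NOT" in qualifiers:
        return None  # no inference from NOT-qualified annotations
    linked_term = INTER_ONTOLOGY_LINKS.get(annotation.go_id)
    if linked_term is None:
        return None
    # Qualifier and annotation extension are deliberately not carried over.
    return Annotation(entity=annotation.entity, go_id=linked_term, assigned_by="GOC")
```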
Suggestions for additional annotations
As well as suggesting annotations based on inter-ontology links, Protein2GO also presents the curator with additional suggestions based on the co-occurrence statistics from QuickGO, which displays when two GO terms are often co-annotated. An example of these suggestions is depicted in the bottom section of Fig. 12. Curators can either;
- Ignore the alert (or close the window), no annotations will be added
- Add one (or more) of the suggested annotations; the curator must find the appropriate evidence to support the annotation, this may not necessarily be in the same paper as your original annotation
- Reject the suggestion; if the curator has knowledge that this GO term should not be associated with this entity, the curator can reject the suggestion. A text box will appear in which the curator must supply a reason why the suggestion is not appropriate. The rejected associations and their reasons will be audited to help refine the suggestions.
Adding multiple annotations
Sometimes it is useful to retain information from an annotation for use in subsequent annotations, e.g. evidence code, PubMed ID. In Protein2GO you are able to duplicate a new annotation line before you have added it. This allows you to change the relevant information, e.g. GO ID, before adding both annotations to the database. To duplicate a new line of annotation, just click on the duplicate button to the left of the new annotation line as shown in Figure 13. Remember to change the relevant information before adding the annotation to the database - the tool will not let you add two identical annotations.
Adding reciprocal protein binding annotations
Reciprocal protein-protein interactions can be entered into Protein2GO by clicking on the reciprocal annotation button. Two annotation rows will appear (Fig. 14) which are pre-filled with the current protein accession and the GO term 'protein binding' GO:0005515. If you would like to use a more specific protein binding GO term, e.g. enzyme binding, just type either the GO ID or the GO term name (and choose the appropriate one from the pop-up list) into the GO field of either or both annotation lines. When you enter the protein accession which interacts with the protein you are currently annotating in the 'With' field, Protein2GO will automatically enter the same accession into the empty 'Protein' field in the second annotation line. All you now need to enter is a PubMed ID. If the annotation lines have been filled in correctly, an add button will appear to the left of the annotations to indicate that they have passed the inbuilt sanity checks. Click on the add button to add both annotations.
N.B. Both annotations will be visible on the current annotation page until you refresh the page, at which point the annotation to the protein binding partner will disappear from this view.
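The pair of rows produced for a reciprocal interaction can be pictured with the following Python sketch. It is an illustration under assumptions: the function and field names are hypothetical, and it assumes that the second row carries the current protein in its 'With' field.

```python
def reciprocal_binding_rows(protein, partner, reference, go_id="GO:0005515"):
    """Return the two annotation rows for a reciprocal interaction.

    The default GO term is 'protein binding'; a more specific
    binding term can be passed instead.
    """
    return [
        {"protein": protein, "go_id": go_id, "reference": reference, "with": partner},
        {"protein": partner, "go_id": go_id, "reference": reference, "with": protein},
    ]
```

Each row is the mirror image of the other, which is why only the interacting accession and the reference need to be supplied.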
Suggestions for more granular protein binding terms
Protein2GO will suggest a more granular protein binding term when a reciprocal annotation is made using the term GO:0005515 'protein binding' and an interactor that has a known function is entered into the 'with' field. Suggestions are made based on whether the protein entered in the 'with' field is a) annotated to an enzyme activity-type term with experimental evidence or b) part of an InterPro family that has been mapped to a specific protein binding term. For example, the IPR001421 ATPase family is mapped to GO:0051117 'ATPase binding'; therefore, when a protein that belongs to the IPR001421 family is entered into the 'with' field of a protein binding annotation, Protein2GO will suggest the more granular term 'ATPase binding'. An example of the alert that is given in such a case is shown in Fig. 15.
Entering multiple values in the 'with' field
Under certain circumstances, it may be beneficial to capture more than one value in the 'with' field. For instance, when curating a triple mutant experiment using the IGI evidence code it would be preferable to capture all of the genes involved in the interaction.
An example of this is from PMID:10227296 in which spermatogenesis was determined to be aborted in a triple mouse mutant (tyro/axl/mer). An annotation has been made to Tyro3 using the evidence code IGI with the GO term 'spermatogenesis'. In the 'with' field the UniProtKB accessions for Axl (Q00993) and Mer (Q60805) have been added (Fig. 16 and 17).
Another example would be when making an annotation which has been inferred by the curator (i.e. using the IC code). You may wish to put the GO identifiers for more than one supporting GO annotation in the 'with' field of the IC annotation.
The format for addition of the identifiers is tightly controlled by Protein2GO, so it should not be possible to input the data in an incorrect format.
The restrictions on the use of multiple 'with' field entries are;
- multiple entries can only be made in the 'with' field when using the IGI or IC evidence codes
- a maximum of three identifiers can be entered into the 'with' field
- only UniProtKB accessions can be entered in the 'with' field of an IGI annotation
- only GO identifiers can be entered in the 'with' field of an IC annotation
N.B. The guidelines for with/from field usage are currently under review, therefore these restrictions are subject to change.
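The restrictions listed above can be written down as a set of checks. The Python sketch below is illustrative only, since Protein2GO enforces the format itself. The function name is hypothetical, bare UniProtKB accessions (as in the Axl/Mer example) are assumed, and the checks for IGI and IC are applied to every value, including a single one.

```python
import re

UNIPROT_ACCESSION = re.compile(
    r"[OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2}"
)
GO_ID = re.compile(r"GO:\d{7}")


def check_with_field(evidence, values):
    """Return a list of problems with the 'with' field values."""
    problems = []
    if len(values) > 1 and evidence not in ("IGI", "IC"):
        problems.append("multiple values are only allowed with IGI or IC")
    if len(values) > 3:
        problems.append("a maximum of three identifiers can be entered")
    if evidence == "IGI" and not all(UNIPROT_ACCESSION.fullmatch(v) for v in values):
        problems.append("IGI annotations accept only UniProtKB accessions")
    if evidence == "IC" and not all(GO_ID.fullmatch(v) for v in values):
        problems.append("IC annotations accept only GO identifiers")
    return problems
```

For the triple mutant example, check_with_field("IGI", ["Q00993", "Q60805"]) reports no problems.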
Dual taxon annotations
Curators can create annotations in which a second species is specified, for cases where an entity is found to be involved in a process that involves another species (for further information see: GO annotation conventions).
The 'Interacting Taxon' field should only be used for annotations containing a GO term which is a child of multi-organism process (GO:0051704) or other organism (GO:0044215) - you will receive an error if you try to fill in this column with a term from any other part of the GO.
Curators can indicate the second species by adding a taxon ID into the 'Interacting Taxon' field. You can search for taxon IDs by using the UniProt Taxonomy Browser.
The Qualifier column is used to modify the interpretation of an annotation.
Allowable values are: 'NOT', 'contributes_to', 'colocalizes_with', 'NOT|contributes_to', 'NOT|colocalizes_with' and Null (no value, default).
The 'NOT' qualifier is used to make an explicit note that the entity is not associated with the GO term. This is particularly important in cases where associating a GO term with an entity should be avoided (but might otherwise be made, especially by an automated method). It can also be used to document conflicting claims in the literature.
The 'contributes_to' qualifier is used when annotating an individual entity that is part of a complex. The entity can be annotated to terms that describe the action (function) of the complex. This qualifier allows us to distinguish the individual subunit from complex functions e.g. contributes_to ribosome binding when part of a complex but does not perform this function on its own. All entities annotated using 'contributes_to' must also be annotated to a cellular component term representing the complex that possesses the activity. This qualifier should only be used with the GO Molecular Function Ontology.
The 'colocalizes_with' qualifier is used when annotating entities that are transiently or peripherally associated with an organelle or complex. The entity may be annotated to the relevant cellular component term, using the 'colocalizes_with' qualifier. This qualifier may also be used in cases where the resolution of an assay is not accurate enough to say that the entity is a bona fide component member. This qualifier should only be used with the GO Cellular Component Ontology.
The 'NOT|contributes_to' qualifier may be used when it has been explicitly shown that the entity does NOT contribute to the function of a complex it is part of. All entities annotated using 'NOT|contributes_to' must also be annotated to a cellular component term representing the complex that the entity is part of. This qualifier should only be used with the GO Molecular Function Ontology.
The 'NOT|colocalizes_with' qualifier may be used when it has been explicitly shown that the entity does NOT colocalize with the complex/organelle GO term that has been annotated to. This qualifier should only be used with the GO Cellular Component Ontology.
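The allowable qualifier values and their ontology restrictions can be captured in a short Python sketch. It illustrates the rules above rather than Protein2GO's own checks; the function name is hypothetical, and the single-letter aspect codes ('F' for Molecular Function, 'P' for Biological Process, 'C' for Cellular Component) are an assumed convention.

```python
ALLOWED_QUALIFIERS = {
    None,  # Null, the default
    "NOT",
    "contributes_to",
    "colocalizes_with",
    "NOT|contributes_to",
    "NOT|colocalizes_with",
}

# Qualifier parts that are restricted to a single ontology.
REQUIRED_ASPECT = {"contributes_to": "F", "colocalizes_with": "C"}


def check_qualifier(qualifier, aspect):
    """Return an error message, or None if the qualifier is acceptable."""
    if qualifier not in ALLOWED_QUALIFIERS:
        return f"unknown qualifier: {qualifier}"
    if qualifier is None:
        return None
    for part in qualifier.split("|"):
        required = REQUIRED_ASPECT.get(part)
        if required is not None and aspect != required:
            return f"{part} can only be used with aspect {required}"
    return None
```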
Manual annotation section
This section contains all existing annotations for the entity or reference you are viewing. It contains both internal annotations (i.e. those made using the Protein2GO tool) and external annotations (i.e. those imported from other groups). Annotations in the manual section are divided into Process, Function and Component ontologies for ease of viewing.
The only annotations that can be edited in the Protein2GO tool are those which were created in the tool by, for example, UniProt, BHF-UCL or AgBase curators (Fig. 18), and you can only edit annotations that you have permission to edit, i.e. those originating from your annotation group. Annotations made by external groups are not available for editing. If you have a problem with any of the external annotations, you must contact the group who made the annotation. We have implemented a dispute mechanism to make this simpler (see "Annotation Disputes" section).
There are several actions that can be performed on the annotations in the manual annotation section. These are displayed in the 'Available actions' row along the top of the manual annotation section (Fig. 19). Actions will only be available if they are applicable to the annotation that was selected by checking the box to the left of the annotation row.
The actions that are available are;
- Delete: deletes the annotations (the deleted annotations are displayed in the 'Deleted' section of Protein2GO and can be undeleted if necessary)
- Transfer Annotations: transfers the annotations to other specified UniProt accessions (transferring annotations is explained in the next section)
- Compare GO Terms: opens up the ancestor chart in QuickGO to display the selected GO terms in context within the GO hierarchy
- Compare ECO terms: opens up the ancestor chart in QuickGO to display the selected ECO codes in context within the ECO hierarchy
- Update: commits to the database any changes to the annotation line
- Dispute Annotation(s): opens the annotation dispute dialog box (see "Annotation Disputes" section)
- Email Curator(s): opens a text box for you to send a message to the curator/group who made the annotation, e.g. if you want to clarify something about the annotation, which is not strong enough to dispute the annotation
- Search IntAct: if you select an annotation to protein binding or one of its child terms, you can search in the IntAct database for that interaction
To perform the action required, click on the relevant button.
Any of the fields surrounded by a box can be edited, such as accession/identifier, qualifier, GO ID, evidence code, PubMed ID. After you have made a change, the box to the left of the annotation will become selected indicating the annotation has passed verification and can be added to the database. To add the change to the database, click on the 'Update' button. If a red exclamation point appears then there is something wrong with the annotation which must be corrected before you can update the annotation.
Figure 20 shows an annotation row in which the PubMed ID has been updated. The history displays this change together with the date and time the update occurred and the curator who made the changes. If you would like to revert to a previous annotation, click on the single curved green arrow to the left of the appropriate annotation in the history pop-up (the double green arrow refreshes the contents of the history pop-up).
Annotations may be transferred to other entities for a number of reasons. An annotation may be transferred to an orthologous protein in a species which is unlikely to have any manual annotation, e.g. from a human protein to a macaque protein. Alternatively, a paper may have evidence for an annotation to more than one entity, so you may want to copy the annotation to one or more entities. Only annotations with an experimental evidence code (IDA, IGI, IMP, IPI or IEP) and which do not have the 'NOT' qualifier can be copied.
To transfer an annotation to one or more identifiers, first select the annotations you want to transfer by checking the box to the left of each annotation line, then click on the 'Transfer Annotations' button that has appeared in the 'Available Actions' row.
A lightbox will appear and in the first box you must choose a transfer mode; there are four methods of transferring annotations as shown in Figure 21. You have the option to transfer the annotation with the original GO term, or one of its parents, if you feel the original term is too specific.
N.B. You cannot transfer annotations from one entity type to another, e.g. you cannot transfer a protein annotation to a complex identifier and vice versa.
N.B. You cannot transfer high-throughput annotations to another entity, i.e. any annotations with the ECO code ECO:0006056 (HTP) or one of its descendants.
1. The 'Copy' transfer keeps the evidence code and reference the same but this transfer is 'not linked', i.e. any changes made to the original annotation will not be replicated to the copied annotations. This method is useful when a single paper refers to more than one species/entity or for bulk annotations.
2. The 'ISS_original' transfer is used when a paper states, for example, information for a human protein and also the orthology between the human and e.g. mouse protein(s). The annotation can be transferred from the human to the mouse protein using this method and the resulting mouse annotation will have the evidence code 'ISS' (Inferred from Sequence or structural Similarity) with the original PubMed ID.
3. The 'ISS_curator' transfer is used when, for example, a paper states only human protein information but you have done a sequence similarity check by MPSrch/BLAST yourself (e.g. you have manually checked the human protein and found it is 90% similar to the mouse protein). With this method, the identifier of the entry you are curating will appear in the 'With ID' field of the entity you have transferred the annotation to. When using this method, you can also optionally add justification for the list of entities you are transferring to, e.g. details of BLAST or sequence alignments etc.
4. The 'ISS_other_reference' transfer is used when a second paper states the orthology between e.g. the human and mouse protein(s). The annotation can be transferred from the human to the mouse protein using this method and the resulting mouse annotation will have the evidence code 'ISS' (Inferred from Sequence or structural Similarity) with the second PubMed ID.
Once you have chosen a transfer mode, type the identifier(s) that you wish to transfer the annotations to into the 'Target proteins' box (see Fig. 13) and click on 'OK' to transfer the annotations.
There is the option of transferring the annotations using a less specific GO term, if you feel there is not enough evidence to use the more granular term for the orthologs. To choose a less specific term, click on the "GO ID" box and a list of parent terms of the primary GO term will appear.
N.B. ISS-evidenced annotations that have been transferred from a Protein2GO-created annotation are linked to the original annotation, so any changes made to the original will cause a change in the transferred annotation. However, ISS annotations created by transferring from an externally sourced annotation will not be linked to the original annotation, regardless of the copy method used.
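The transfer rules in this section can be drawn together in a Python sketch. It is an illustration under assumptions and not Protein2GO's implementation: the function names and dictionary keys are hypothetical, high-throughput status is modelled as a simple flag rather than an ECO lookup, 'ISS_curator' is assumed to produce the ISS evidence code, and its reference is left empty because the manual does not specify it.

```python
EXPERIMENTAL_CODES = {"IDA", "IGI", "IMP", "IPI", "IEP"}


def transfer_problem(annotation, target_type):
    """Return the reason a transfer is not allowed, or None if it is."""
    if annotation["evidence"] not in EXPERIMENTAL_CODES:
        return "evidence code is not experimental"
    if "NOT" in (annotation.get("qualifier") or "").split("|"):
        return "NOT-qualified annotations cannot be transferred"
    if annotation.get("high_throughput"):
        return "high-throughput annotations cannot be transferred"
    if annotation["entity_type"] != target_type:
        return "source and target entity types differ"
    return None


def transfer(annotation, mode, target, second_reference=None):
    """Build the annotation that a given transfer mode would create."""
    new = {"entity": target, "go_id": annotation["go_id"], "with": None}
    if mode == "Copy":
        new["evidence"] = annotation["evidence"]
        new["reference"] = annotation["reference"]
    elif mode == "ISS_original":
        new["evidence"] = "ISS"
        new["reference"] = annotation["reference"]
    elif mode == "ISS_curator":
        new["evidence"] = "ISS"              # assumed from the mode name
        new["reference"] = None              # not specified in the manual
        new["with"] = annotation["entity"]   # the entry being curated
    elif mode == "ISS_other_reference":
        new["evidence"] = "ISS"
        new["reference"] = second_reference
    else:
        raise ValueError(f"unknown transfer mode: {mode}")
    # Only ISS transfers from Protein2GO-created annotations stay linked.
    new["linked"] = mode != "Copy" and bool(annotation["created_in_protein2go"])
    return new
```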
Choosing which proteins to transfer annotations to.
Orthologs/homologs are chosen following a protein MPsrch or BLAST, where the aligned sequences show a high degree of similarity over their entire lengths, making it reasonable to infer that the two proteins have a common function. It must be emphasized that curators must check each alignment and use their experience to assess whether the similarity is strong enough to project annotations. While there is no fixed cut-off point in percentage sequence similarity, generally proteins which have greater than 60% identity covering greater than 80% of the length of both proteins are examined further. For mammalian proteins this cut-off tends to be higher, with an average of 80% identity over 90% of the length of both proteins. Additional tools, such as the HCOP orthology tool, are used when possible. Strict orthologs are desirable but not essential. When there is evidence of paralogs, annotations are transferred only to the most similar protein in each species. Further detailed information on this procedure, including how ISS annotations are made to protein isoforms, can be found on the GOA website.
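The rough cut-offs mentioned above could be used as a first-pass screen before manual inspection, as in the Python sketch below. This is only an illustration: the function name is hypothetical, the thresholds are the approximate figures quoted in the text rather than fixed rules, and passing the screen never replaces the curator's own check of the alignment.

```python
def worth_examining(identity, coverage_a, coverage_b, mammalian=False):
    """First-pass screen for a candidate ortholog/homolog pair.

    identity   -- percentage identity of the alignment
    coverage_a -- percentage of the first protein's length covered
    coverage_b -- percentage of the second protein's length covered
    """
    # Approximate guideline values; mammalian pairs tend to use higher ones.
    min_identity, min_coverage = (80.0, 90.0) if mammalian else (60.0, 80.0)
    # The alignment must cover enough of BOTH proteins.
    return identity > min_identity and min(coverage_a, coverage_b) > min_coverage
```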
Existing annotations that have been found to be incorrect can be deleted by selecting the appropriate annotation using the check box to the left of the annotation line and then clicking on the 'Delete' button which will appear in the 'Available actions' row (see Fig. 11a). If you delete an annotation by mistake, the annotation can be re-instated from the Deleted annotation section of Protein2GO. See also the deleted annotation section below.
Comments can be associated with individual annotation lines simply by clicking on the speech bubble at the end of an annotation line (see Fig. 18). A lightbox will appear (Fig. 22) into which you can add your topic and comment. Once a comment has been added to an annotation line, this will be indicated by a filled blue speech bubble (see Fig. 11a). Note that if comments are meant to be temporary, they should be deleted once resolved. A sanity check will be run by the UniProt-GOA group to remind curators of their annotation comments that may need to be reviewed.
Existing information within database entries, including Swiss-Prot keywords (SPKW2GO), Swiss-Prot subcellular locations (SUBC2GO), UniPathway vocabulary (UniPathway2GO), Enzyme Commission numbers (EC2GO) and cross-references to InterPro (InterPro2GO) and HAMAP (HAMAP2GO) are manually mapped to GO terms. Electronically combining these mappings with a table of UniProt entries that contain one or more of these concepts generates a table of GO associations. Associations that are made electronically are labelled as 'inferred from electronic annotation' (IEA).
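The way mappings and entries are combined can be pictured as a simple join, sketched below in Python. This is not the production pipeline; the function name, the data structures and the tuple layout of the result are hypothetical, and any concept or GO identifiers passed in are the caller's own.

```python
def electronic_annotations(mappings, entries):
    """Combine concept-to-GO mappings with the concepts found in entries.

    mappings -- {(method, concept ID): [GO IDs]}, e.g. keyed by ("SPKW", "KW-0796")
    entries  -- {accession: [(method, concept ID), ...]}
    Returns (accession, GO ID, evidence, method, concept ID) rows.
    """
    rows = []
    for accession, concepts in entries.items():
        for method, concept_id in concepts:
            for go_id in mappings.get((method, concept_id), []):
                # Every association made this way is labelled IEA.
                rows.append((accession, go_id, "IEA", method, concept_id))
    return rows
```

An entry that carries a mapped concept therefore receives one IEA association per GO term mapped to that concept, and concepts without a mapping produce nothing.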
Electronic annotations are also created by projecting manual experimental annotations between orthologs (Ensembl Compara, EnsemblFungi and EnsemblPlants/Gramene).
Figure 23 shows an example of an electronic annotation section of Protein2GO. The electronic annotation method used to create the annotation is indicated in the 'with' column and the concept the GO term is mapped to is indicated in the 'with ID' column, for example, SPKW KW-0796 refers to the Swiss-Prot keyword 'tight junction'.
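The way a manual mapping is combined with a table of entries can be sketched as follows. This is not the UniProt-GOA pipeline: the function name, the table layout, the unmapped keyword KW-9999 and the choice of P12345 as the entry are hypothetical, and the GO ID paired with KW-0796 is illustrative and should be checked against the current mapping file.

```python
# Manual mapping: (method, concept ID) -> GO ID. Illustrative content only.
CONCEPT_TO_GO = {
    ("SPKW", "KW-0796"): "GO:0005923",  # Swiss-Prot keyword 'tight junction'
}

# Hypothetical table of entries and the concepts found in each entry.
ENTRY_CONCEPTS = {
    "P12345": [("SPKW", "KW-0796"), ("SPKW", "KW-9999")],
}


def electronic_annotations(entry_concepts, concept_to_go):
    """Join entries with the mapping table to produce IEA associations."""
    rows = []
    for accession, concepts in entry_concepts.items():
        for method, concept_id in concepts:
            go_id = concept_to_go.get((method, concept_id))
            if go_id is None:
                continue  # this concept has no manual mapping to a GO term
            rows.append({
                "entity": accession,
                "go_id": go_id,
                "evidence": "IEA",      # inferred from electronic annotation
                "with": method,         # method that created the annotation
                "with_id": concept_id,  # concept the GO term is mapped to
            })
    return rows


for row in electronic_annotations(ENTRY_CONCEPTS, CONCEPT_TO_GO):
    print(row)
```

Only concepts present in the mapping table produce an association, which is why the unmapped keyword yields no row.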
Electronic annotation is not editable in Protein2GO, so if you spot an error in an annotation please dispute the annotation (see "Annotation Disputes" section).
The deleted annotation section shows all annotations which were once associated with the protein entry but no longer are. Figure 24 shows an example of a deleted annotation section. Most of the annotations have the TAS evidence code and so were probably replaced with annotations with a better evidence code. One of the annotations is from the PINC source. These are legacy annotations from when GOA first started, and they used only one of three evidence codes: E, P and NR. E and P have been converted to the evidence codes TAS and NAS, and the NR code is now obsolete. PINC annotations are slowly being replaced by those with better experimental evidence codes, so if you come across any PINC annotations, please try to find experimental evidence with which to update the annotation.
An annotation in the deleted section can be returned to the 'edit' annotation section by clicking on the green curved arrow to the left of the annotation. Any annotation which has incorrect information, such as the PINC annotation in Fig. 16 which uses an incorrect evidence code, cannot be returned to the edit section, as indicated by the red exclamation point to the left of the annotation. Clicking on this warning icon will tell you why the annotation is incorrect; in this case the message reads "P2GE:2 Evidence code isn't valid".
If you would like to add a comment to a deleted annotation, e.g. to explain why it was necessary to delete it, you must enter the comment on the annotation BEFORE you delete it. Comments cannot be added to deleted annotations.
If you disagree with any annotation you see in Protein2GO and you do not have the permission to update/delete the annotation, you can open an annotation dispute.
This is a simple mechanism for alerting the curator or group who created or last updated the annotation to a problem with the annotation. To dispute an annotation, select the tick box to the left of the annotation line and then click on the "Dispute Annotation(s)" button in the "Available Actions" section. This will open a dialog box containing the details of the annotation, a drop-down list to select the nature of the dispute and a free text box where you can give further details of the problem (see Figure 25). Once you have filled in all of the details, click on "OK" and an email will be sent to the appropriate curator or group and the dispute will be logged in Protein2GO.
N.B. You cannot dispute an annotation from an external source.
The standard operating procedure for disputing annotations is as follows:
1. Anyone with a Protein2GO account can dispute annotations (not GOC inferred ones) that are displayed in Protein2GO.
2. The dispute will be emailed to the curator who made/last updated the annotation or to the group that curator belongs to if they are not Protein2GO users, e.g. Reactome, or the group that provides the predictions in the case of electronic annotations. In the case of annotations made by ex-curators or ex-curation groups, UniProt-GOA is emailed.
3. A reminder email is sent to the disputer after 30 days if the dispute is not resolved. It is the responsibility of the disputer to further chase up the disputee if they do not respond.
4. Once the dispute has been resolved to the satisfaction of the disputer, it is their responsibility to close the dispute.
5. You can view and resolve any disputes you have initiated that are still open by clicking on the "Unresolved disputes" button at the top of Protein2GO.
N.B. If you just have a query about an annotation, you may use the "Email curator(s)" option, which will just send an email to the curator without storing any details of the request.
As well as viewing annotations to an individual entity, you can also view annotations made to a single paper. To load this view, simply enter the PubMed identifier into the search box in the top-left corner of Protein2GO.
All annotations made to this PMID will be displayed. You can still add annotations to individual entities from this view, but you will have to enter the accession/identifier for the entity you want to annotate.
Additionally, it is possible to add a comment to a paper using the comment text boxes at the top of the page, for instance to indicate that no species information was given in the paper and you had to find this from another source, or that the paper has been retracted and so should not be used for annotation.
Fig. 26 shows the paper view of PMID:10793131. Note there is a comment associated with this paper in the top-left comment section. The comment is also indicated by the filled blue speech bubble next to the reference field of each annotation.
Viewing annotation statistics and sanity checks
The 'Reports' section of Protein2GO allows a curator to view the current statistics for their set of annotations as well as query for any annotations that have an error according to the in-built sanity checks.
To access the Reports, click on the 'Reports' button in the top left-hand corner of Protein2GO (Fig. 27).
Two sections should be visible: 'Common', which contains reports that appear the same for each user (certain groups may request their own custom reports and these will be visible only to curators within those groups), and 'Sanity_Checks', which contains common queries that can be run instantly over the database, alerting individual curators to any errors they may have in their annotations.
To view the reports that are available to you, expand the sections by clicking on the grey book icon on the left.
To create a customised report from any of those listed, just click on the 'Run' arrow next to the report (Fig. 28). Optional parameter boxes are provided to query for specific date ranges, groups, sources or evidence codes by using the format shown in the box. If the box is blank ALL possible parameters will be queried, e.g. all groups or all curators. Results will be shown at the bottom of the window.
To run a specified sanity check, click on the 'Run' arrow next to the required check. The results will be shown at the bottom of the window (Fig. 29). There is an optional parameter to include a username; this is populated with your own username by default. If the box is blank, it will query all curators.
Reports can be downloaded by clicking on the downward arrow icon and saving the file to your computer.
If you require additional reports or sanity checks to be included in Protein2GO Reports, please contact goa (AT) ebi.ac.uk
Annotation Extension field
This field was introduced in the GO annotation format in June 2010. It allows curators to add cross-references to other ontologies or sequence identifiers to qualify or enhance the annotation. The cross-reference is prefaced by an appropriate GO relationship and references to multiple ontologies or IDs can be entered into one field.
This field is entirely optional for GO curators and the full details of the format are being finalized by the GO Consortium.
This field will only be displayed for curators who have stated an interest in using it.
If you would like to use this field, please contact GOA@ebi.ac.uk
The annotation extension field is at the bottom of the 'Add annotation' section (Fig. 30).
To add information to this field, click in the box and a lightbox will appear. Choose the relationship and database prefix from the first two drop-down menus, then type or paste in the appropriate database identifier into the third box.
One statement group should contain all the information for one specific observation; this may mean entering one or more relationship/database identifier lines. For example, in Fig. 31 the first statement group contains two lines, meaning the two statements are linked: the process described by the GO term used in the annotation happens during mitochondrial DNA inheritance and within the primary oocyte (described by the relationship occurs in).
The second statement group shown in Fig. 31 is making a separate statement that is unrelated to the first; that the GO annotation has regulation target UniProtKB:P12345.
Linked statements will be displayed in Protein2GO separated by a comma and separate statement groups will be separated by a pipe.
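The comma and pipe convention can be sketched as below. The function name is hypothetical, the relation(DB:ID) rendering and the underscore spelling of the relation names are assumptions based on the form used in GO annotation files, and the GO and CL identifiers are placeholders rather than real term IDs.

```python
def format_extension(statement_groups):
    """Render statement groups as a single annotation extension string.

    statement_groups -- list of groups; each group is a list of
                        (relation, identifier) pairs that belong together.
    """
    return "|".join(
        # Linked statements within one group are joined with commas...
        ",".join(f"{relation}({identifier})" for relation, identifier in group)
        # ...and separate statement groups are joined with pipes.
        for group in statement_groups
    )


groups = [
    # First group: two linked statements (placeholder identifiers).
    [("happens_during", "GO:0000000"), ("occurs_in", "CL:0000000")],
    # Second group: a separate, unrelated statement.
    [("has_regulation_target", "UniProtKB:P12345")],
]
print(format_extension(groups))
# happens_during(GO:0000000),occurs_in(CL:0000000)|has_regulation_target(UniProtKB:P12345)
```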
Protein2GO error messages
Error: Term not known
Reason: The GO term used is not a current valid GO term.
Error: Invalid evidence code: <text>
e.g. Invalid evidence code: IOB
Reason: Evidence code does not exist, choose one from the drop-down list.
Error: Invalid qualifier: <text>
Reason: Qualifier does not exist, choose one from the drop-down list.
Error: Invalid entity (gene product) identifier: <text>
e.g. Invalid entity (gene product) identifier: ABC123
Reason: The protein accession is not in an acceptable format. You must enter a valid UniProtKB accession (e.g. Q4VCS5), UniProtKB isoform accession (e.g. Q4VCS5-1), UniProtKB feature identifier (e.g. P62987:PRO_0000396434) or IPI identifier (e.g. IPI00408378).
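A format check for the four identifier types listed above might look like the following sketch. The accession pattern follows the published UniProtKB accession format; the isoform, feature and IPI patterns are inferred from the examples in this manual and are assumptions, as are the function and variable names. It checks the format only, not whether the identifier exists, and it does not cover complex or RNA identifiers.

```python
import re

# UniProtKB accession format (6 or 10 characters).
_ACC = r"(?:[OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2})"

PATTERNS = [
    ("UniProtKB accession", re.compile(_ACC)),
    # Assumed from the example Q4VCS5-1.
    ("UniProtKB isoform accession", re.compile(_ACC + r"-[0-9]+")),
    # Assumed from the example P62987:PRO_0000396434.
    ("UniProtKB feature identifier", re.compile(_ACC + r":PRO_[0-9]{10}")),
    # Assumed from the example IPI00408378.
    ("IPI identifier", re.compile(r"IPI[0-9]{8}")),
]


def classify_identifier(text):
    """Return the identifier type, or None if no format matches."""
    for label, pattern in PATTERNS:
        if pattern.fullmatch(text):
            return label
    return None


for candidate in ["Q4VCS5", "Q4VCS5-1", "P62987:PRO_0000396434", "IPI00408378", "ABC123"]:
    print(candidate, "->", classify_identifier(candidate))
```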
Error: Unknown or invalid <reference or with> database: <text>
e.g. Unknown With database MGI
e.g. Unknown Reference database NCBI
Reason: The Reference or With database entered is not recognised, choose one from the drop-down list or choose one that is appropriate for the evidence code, e.g. IPI for a ChEBI identifier.
Error: Incomplete <reference or with>
e.g. Incomplete with
Reason: Incomplete 'reference' or 'with' field information. The 'reference' and 'with' fields are checked to ensure that both a recognised database (from the drop-down list) and an identifier have been selected.
Error: Invalid <Reference or With> ID: <id>
e.g. Invalid With ID: Q4VCS55
Reason: The identifier entered in the 'reference' or 'with' field is not in the correct format for the chosen database.
Error: GO ID <GO identifier> requires With to be specified
e.g. GO ID GO:0005515 (protein binding) requires information to be included in the With DB and ID fields
Reason: When using the specified GO term, there must always be a valid protein accession in the 'with' field.
For further information on the restrictions imposed on GO:0005515 usage, please see this GO wiki page
Error: Unable to copy annotation with evidence code: <evidence code>
e.g. Unable to copy annotation with evidence code: NAS
Reason: Annotations made using NAS, TAS, IC, ND evidence codes cannot be transferred, using the ISS evidence code, to another protein entry.
Annotations made using the ND evidence code cannot be transferred using the 'Copy' method. To prevent an ND annotation co-existing with an experimentally coded annotation within the same ontology as the ND annotation, it is recommended that ND annotations are made directly to the protein entry.
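The copy restriction amounts to a membership test on the evidence code of the source annotation. The sketch below uses a hypothetical function name and simply reproduces the message format shown above; it is not the Protein2GO implementation.

```python
# Evidence codes whose annotations cannot be transferred using ISS.
NON_TRANSFERABLE = {"NAS", "TAS", "IC", "ND"}


def copy_error(evidence_code):
    """Return an error message if the annotation cannot be copied, else None."""
    if evidence_code in NON_TRANSFERABLE:
        return f"Unable to copy annotation with evidence code: {evidence_code}"
    return None


print(copy_error("NAS"))  # Unable to copy annotation with evidence code: NAS
print(copy_error("IDA"))  # None
```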
Error: Copy Mode not specified
Reason: A copy mode must be chosen from the drop-down list.
Error: Target entity (gene product) identifier not specified (N.B. This error code is now obsolete)
Reason: When transferring an annotation to another entity, a valid accession(s)/identifier(s) must be placed in the 'Copy Proteins' box.
Error: Target of copy is an invalid protein accession: <text> (N.B. This error code is now obsolete)
e.g. Target of copy is an invalid entity (gene product): GO:0019899
Reason: When transferring an annotation to another entity, a valid accession(s)/identifier(s) must be placed in the 'Copy Proteins' box. Ensure that a valid accession/identifier is entered.
Error: Evidence code <evidence code> not valid for GO ID <GO identifier>
e.g. Evidence code ND not valid for GO ID GO:0016020
Reason: Some evidence codes are restricted to certain GO IDs as follows:
i) The 'ND' (No Data) evidence code can only be used with the three root GO terms: GO:0008150, GO:0003674, GO:0005575.
ii) When using these three root GO terms it is only possible to use the ND evidence code.
iii) The 'IEP' (Inferred from Expression Pattern) evidence code may not be used with GO terms from the Cellular Component ontology.
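The three restrictions above can be written as one small check. In this sketch the function name and the single-letter aspect codes ('P', 'F', 'C') are assumptions; the caller is expected to know which ontology the GO term belongs to.

```python
# The three root GO terms named in the restrictions.
ROOT_TERMS = {"GO:0008150", "GO:0003674", "GO:0005575"}


def evidence_term_error(evidence_code, go_id, aspect):
    """Return an error message if the evidence code is not valid for the term.

    aspect -- 'P' (Biological Process), 'F' (Molecular Function)
              or 'C' (Cellular Component); an assumed encoding.
    """
    message = f"Evidence code {evidence_code} not valid for GO ID {go_id}"
    is_root = go_id in ROOT_TERMS
    # Rules i) and ii): ND and the root terms can only be used together.
    if (evidence_code == "ND") != is_root:
        return message
    # Rule iii): IEP may not be used with Cellular Component terms.
    if evidence_code == "IEP" and aspect == "C":
        return message
    return None


print(evidence_term_error("ND", "GO:0016020", "C"))
# Evidence code ND not valid for GO ID GO:0016020
print(evidence_term_error("ND", "GO:0005575", "C"))
# None
```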
Error: Qualifier not valid for <ontology> term
e.g. Qualifier not valid for Function term
Reason: The 'colocalizes_with' qualifier may only be used with GO terms from the Cellular Component ontology and the 'contributes_to' qualifier may only be used with GO terms from the Molecular Function ontology.
For further information, please see:
Error: Invalid use of NOT qualifier
Reason: The NOT qualifier must not be used with GO term 'protein binding' GO:0005515
Error: Target of copy is the same as the source: <entity identifier> (N.B. This error code is now obsolete)
e.g. Target of copy is the same as the source: Q4VCS5
Reason: It is not possible to make duplicate, redundant annotations to the same entity.
Error: Missing <reference or with>
Reason: There must always be a value in the reference field, choose a database from the 'Ref' drop-down menu and insert an appropriate ID for the reference database chosen.
The 'with' field must be completed for certain evidence codes, e.g. IPI and IC. Choose a database from the 'With' drop-down menu and insert an appropriate ID for the database chosen.
Error: Term is obsolete
Reason: The GO term chosen has been made obsolete in the ontology. Choose a different GO ID. Some obsolete terms have suggested alternative GO terms; these suggestions are displayed in QuickGO on the term page of the obsolete term.
Error: ChEBI IDs can only be used with terms that are descendants of GO:0005488 (binding)
Reason: ChEBI identifiers in the 'with' column can only be used with the 'binding' term GO:0005488 or any of its descendants. They cannot be used with any other GO ID or with GO IDs under 'protein binding' GO:0005515. Choose an appropriate GO ID.
Error: Extra Tax Id can only be used with terms that are descendants of GO:0051704 (multi-organism process) or GO:0044215 (other organism)
Reason: The Extra Tax ID column is used for annotating proteins involved in interactions between organisms. These interactions are described by special GO terms in the Biological Process ontology under GO:0051704 (multi-organism process) or in the Cellular Component ontology under GO:0044215 (other organism). One of these terms must be used when the Extra Tax ID field is completed.
For further information, please see:
Error: Reciprocal annotations must have IPI evidence code (N.B. This error code is now obsolete)
Reason: Reciprocal annotations must only be made when using the 'protein binding' term GO:0005515 or any of its descendants. The correct evidence code to use for protein binding-type annotations is IPI (Inferred from Physical Interaction). Choose 'IPI' from the drop-down menu and ensure you have entered a GO term that is either GO:0005515 or one of its descendants.
Error: IPI annotations can not have GO IDs that are descendants of GO:0003824 (catalytic activity)
Reason: It has been agreed that evidence from a binding assay is not enough to suggest an annotation for a catalytic activity.
Error: Invalid source: <assigned_by>
Reason: The value in the 'Source' (assigned_by) column is not valid. Choose the appropriate source from the drop-down menu.
Error: Multi-component With strings are not valid for this evidence code:
Reason: Certain evidence codes can only be used with one entry in the 'with/from' field.
Error: With strings are not permitted for this evidence code:
Reason: Certain evidence codes, e.g. IDA (Inferred from Direct Assay), are not allowed to have a 'with/from' field entry.
Error: Error in extension: <annotation_extension>
Reason: The annotation extension is not formatted correctly. Either the relation used is not valid for the GO term in the primary annotation, or the entity type used in the extension is not valid for use with the specified relation.
Error: GO_REF:<GO_REF identifier> is not valid for this evidence code
Reason: Certain evidence codes may only be used with certain GO_REFs, e.g. ND annotations must use GO_REF:0000015.
Error: With string must be supplied for this evidence code.
Reason: Certain evidence codes must have an entry in the 'with/from' field, e.g. IC must have a GO identifier in the with/from field.
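The 'with/from' requirements can be sketched as a pair of lookups. Only the evidence codes named as examples in this manual are included (IPI and IC require a value, IDA does not allow one), so the sets are incomplete; the function name and the exact text appended to the messages are assumptions.

```python
# Incomplete example sets, taken from the codes named in this manual.
WITH_REQUIRED = {"IPI", "IC"}
WITH_FORBIDDEN = {"IDA"}


def with_field_error(evidence_code, with_values):
    """Return an error message for the with/from field, or None if acceptable.

    with_values -- list of 'DB:ID' strings entered in the with/from field.
    """
    if evidence_code in WITH_REQUIRED and not with_values:
        return "With string must be supplied for this evidence code."
    if evidence_code in WITH_FORBIDDEN and with_values:
        # Assumption: the evidence code is appended after the colon.
        return f"With strings are not permitted for this evidence code: {evidence_code}"
    return None


print(with_field_error("IC", []))
print(with_field_error("IDA", ["UniProtKB:P12345"]))
print(with_field_error("IPI", ["UniProtKB:P12345"]))
```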
Error: NOT qualifier required for this evidence code.
Reason: Certain evidence codes must use a NOT qualifier, e.g. IKR (Inferred from Key Residues) requires a NOT qualifier, since the annotation indicates that a sequence variation means the entity cannot perform the specified role or be present in the specified location.
Error: No annotations found to <GO identifier> for <entity identifier>.
Reason: This error is shown when the contributes_to qualifier is used for a molecular function annotation and an experimental annotation to a protein complex GO term that performs that function is not found for the same entity. This is required when using the contributes_to qualifier (see the GO Consortium documentation), e.g. "No annotations to GO:0032991 (macromolecular complex) or any of its child terms exist for P12345."
Error: No experimentally-evidenced annotations found to <GO identifier> for <entity identifier>.
Reason: This error is shown when an annotation using the.
Error: Must use Reciprocal Annotation function to create IPI annotations to protein binding terms.
Reason: In order to prevent non-reciprocal annotations when using IPI evidence for a protein binding term, you must use the reciprocal annotation function, i.e. the crossed green arrows icon in the "Add annotation" section.
Error: GO_REF:<GO_REF identifier> is obsolete.
Reason: The GO_REF identifier is no longer in use. Find an appropriate one from the GO Reference collection.
Error: Term is not to be used for manual annotations.
Reason: The term should not be used for direct manual annotation; a more specific term should be found.
Error: Cannot use NOT qualifier if Extension is supplied.
Reason: A NOT qualifier is supposed to refer to the GO term used in the primary annotation, it cannot be used to negate an extension. When a NOT qualifier is used on an annotation that has been extended, it is not clear whether the curator intended the NOT to apply to the primary GO annotation or to the extension.
Error: New GO term must be an ancestor of (i.e., less specific than) <GO identifier>.
Reason: When updating the GO ID of an ISS annotation, the only terms that can be used are ancestors of the term used in the original annotation that the ISS annotation was transferred from.
Error: GO_REF:0000057 can only be used with terms that are descendants of GO:0006915 (apoptotic process).
Reason: GO_REF:0000057 was made specifically for use with apoptotic process and can only be used with the IC evidence code. The definition can be found in the GO Reference collection.
Error: Reciprocal annotations must have GO IDs that are descendants of GO:0005515 (protein binding).
Reason: Annotations made using the reciprocal annotation button (crossed green arrows) are intended to indicate interactions between two entities, e.g. protein binding another protein, protein binding a complex, complex binding a protein, therefore the only suitable GO terms to be used in these cases are child terms of protein binding.
Error: Annotation extension not allowed with this GO ID.
Reason: Annotation extensions may not be added to annotations made to the three root terms; Molecular Function (GO:0003674), Biological Process (GO:0008150) or Cellular Component (GO:0005575).
Error: With string must be the same as entity ID (protein accession) for this GO ID.
Reason: For certain GO terms, the 'with/from' field value should be the same as in the primary annotation, e.g. protein homodimerization (GO:0042803), protein self-association (GO:0043621).
Sources of GO annotation
The groups from which we currently incorporate annotations are listed in our latest UniProt GO annotation file release.