|Ancestor||An ancestor is an organism or sequence from which another is descended. |
|Annotation||The process of attaching additional information to biological entities. Annotation can be structural (i.e. identification of the elements in a sequence, such as protein coding regions or the location of regulatory motifs) or functional (i.e. adding biological information to the identified elements, such as the biological function of a protein domain or an entire protein, or the molecular interactions or regulatory role of a nucleotide sequence). Annotation can either be applied automatically or can be manually added (in a process called 'curation') from various sources, such as the scientific literature. At EMBL-EBI, we use a combination of automatic and manual annotation to enrich our databases. |
|Automatic annotation||Annotation is the addition of extra information (‘metadata’) to a database entry to provide context and crosslinks to related information. Automatic annotation involves using computational tools to automatically extract and add the relevant information, as opposed to manual annotation which is undertaken by humans. |
|Controlled vocabulary||A controlled vocabulary makes a database easier to search by drawing together all of the different words and phrases used to describe a concept under a single word or phrase. Synonyms are also listed and searchable so that you do not need to know the selected word or phrase in advance. |
|Ensembl||Ensembl is a joint project between the EMBL-EBI and the Wellcome Trust Sanger Institute that aims to develop a system that maintains automatic annotation of large eukaryotic genomes. All the software and data are free to access without any constraints. The project is primarily funded by the Wellcome Trust. It is a comprehensive source of stable annotation with confirmed gene predictions that have been integrated from external data sources. Ensembl annotates known genes and predicts new ones, with functional annotation from InterPro, OMIM, SAGE and gene families. |
|Ensembl Genomes||The Ensembl Genomes resource is a collection of five portals for genome-scale data: Ensembl Bacteria, Protists, Fungi, Plants and Metazoa. The resource uses the Ensembl software suite for genome analysis and browsing. |
|FASTA||This tool provides sequence similarity searching against protein databases using the FASTA suite of programs.
You can find out more about FASTA on its Wikipedia page: http://en.wikipedia.org/wiki/FASTA |
|GO evidence code||A short code (usually three letters) used by curators to tell the user what kind of evidence supports the statement that a gene product has a particular function. A list of evidence codes can be found in the GO guide to evidence codes: www.geneontology.org/GO.evidence.shtml |
|GOA||The UniProt Gene Ontology Annotation (GOA) program aims to provide high-quality Gene Ontology (GO) annotations to proteins in the UniProt Knowledgebase (UniProtKB). |
|Gene association file||Tab-delimited files of the associations between gene products, GO terms and associated annotation information. The file format is described on the GO Consortium website: www.geneontology.org/GO.format.annotation.shtml |
|Gene ontology||Gene Ontology (GO) is a controlled vocabulary used to describe the biology of a gene product in any organism. There are three independent sets of vocabularies, or ontologies, that describe: the molecular function of a gene product, the biological process in which the gene product participates, and the cellular component where the gene product can be found (http://www.geneontology.org). |
|InterPro||The EBI’s integrated resource for protein motifs, families and domains. It provides a single, consistent interface of protein signatures contributed by ten different databases, each of which uses a slightly different method for deriving protein signatures. |
|Manual annotation||Annotation involves taking data and presenting it in a way that will allow for the extraction of information by adding further knowledge to the data (meta-data). Manual annotation is undertaken by humans, and although this is more labour intensive than automatic annotation, it is generally considered to be more precise.
In Ensembl: Manual annotation of the human, mouse and zebrafish genomes from the VEGA/Havana group refers to genes that have been individually analysed by a team of experts in order to determine the transcript set at each locus.
In UniProt: Manual annotation (or curation) consists of a critical review of experimental and predicted data for each protein as well as manual verification of each protein sequence. Curation methods applied to UniProtKB/Swiss-Prot include manual extraction and structuring of information from the literature, manual verification of results from computational analyses, mining and integration of large-scale data sets, and continuous updating as new information becomes available.
Examples of manually annotated databases include UniProtKB/Swiss-Prot and IntAct. |
|Ontology||An ontology is a formal representation of knowledge as a set of concepts within a domain, and the relationships between those concepts. |
|Proteome||A proteome is a set of proteins produced in an organism, system, or biological context. We may refer to, for instance, the proteome of a species (for example, Homo sapiens) or an organ (for example, the liver). The proteome is not constant; it differs from cell to cell and changes over time. To some degree, the proteome reflects the underlying transcriptome; however, protein activity is also modulated by many factors in addition to rates of production. |
|Proteomics||Proteomics is the large-scale study of proteomes. A proteome is a set of proteins produced in an organism, system, or biological context. We may refer to, for instance, the proteome of a species (for example, Homo sapiens) or an organ (for example, the liver). |
|REST||Representational state transfer (REST) is a style of software architecture for distributed hypermedia systems such as the World Wide Web.
Definition source: Wikipedia. Full reference: http://en.wikipedia.org/wiki/REST |
|UniProt||UniProt – Universal Protein Resource: The world's most comprehensive catalogue of information on proteins and a central repository of protein sequence and function, created by joining the information contained in UniProtKB/Swiss-Prot, UniProtKB/TrEMBL, and PIR http://www.ebi.ac.uk/uniprot/ |
|UniProt-GOA||The UniProt-Gene Ontology Annotation database provides high-quality manual and electronic GO annotations to proteins within UniProtKB. UniProt-GOA is the largest and most comprehensive open-source contributor of annotations to the GO Consortium annotation effort. |
|UniProtKB||UniProtKB (UniProt Knowledgebase) is the central access point for extensive curated protein information, including function, classification, and cross-reference. |
|Web Services||Web Services (www.w3.org/ws) are software systems designed to support interoperable machine-to-machine interaction over a network. To ensure that software systems from different sources work well together, they are built using open standards such as SOAP. |
|curator||A professional scientist who collects, annotates, and validates information that is disseminated by biological and model organism databases. The role of a biocurator encompasses quality control of primary biological research data intended for publication, extracting and organizing data from original scientific literature, and describing the data with standard annotation protocols and vocabularies that enable powerful queries and biological database inter-operability. Curators communicate with researchers to ensure the accuracy of curated information and to foster data exchanges with research laboratories. |
|taxa||A taxon (plural: taxa) is a taxonomic group of any rank, such as a species, family, or class. |
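
The 'Controlled vocabulary' entry above describes drawing different words for one concept together under a single preferred term, with the synonyms remaining searchable. The following Python sketch shows one way such a lookup could work. The vocabulary contents and the names `VOCABULARY`, `build_index` and `resolve` are hypothetical and chosen only for illustration; they are not taken from any EMBL-EBI resource.

```python
# Hypothetical controlled vocabulary: each preferred term lists its synonyms.
# The entries are illustrative only and do not come from a real database.
VOCABULARY = {
    "neoplasm": ["tumour", "tumor"],
    "myocardial infarction": ["heart attack"],
}


def build_index(vocabulary):
    """Map every preferred term and synonym (lower-cased) to its preferred term."""
    index = {}
    for preferred, synonyms in vocabulary.items():
        for name in [preferred, *synonyms]:
            index[name.lower()] = preferred
    return index


def resolve(query, index):
    """Return the preferred term for a query, or None if the query is unknown."""
    return index.get(query.strip().lower())


INDEX = build_index(VOCABULARY)

# A search by synonym finds the same concept as a search by the preferred term.
print(resolve("Tumour", INDEX))
print(resolve("neoplasm", INDEX))
```

Because every synonym points to one preferred term, a user can search with whichever word they already know, which is the behaviour the glossary entry describes.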