|Annotation||The process of attaching additional information to biological entities. Annotation can be structural (i.e. identification of the elements of a sequence, such as protein-coding regions or the location of regulatory motifs) or functional (i.e. adding biological information to the identified elements, such as the biological function of a protein domain or an entire protein, or the molecular interactions or regulatory role of a nucleotide sequence). Annotation can either be applied automatically or be added manually (in a process called 'curation') from various sources, such as the scientific literature. At EMBL-EBI, we use a combination of automatic and manual annotation to enrich our databases. |
|Aspera||Aspera (http://asperasoft.com/) is a company owned by IBM that has produced software for the transmission of data through its patented Fast And Secure Protocol (FASP; http://asperasoft.com/technology/transport/fasp/). The software is freely available through the Aspera Connect web browser plug-in (http://downloads.asperasoft.com/connect2/), which can be used for manually uploading large datasets to PRIDE or for downloading public dataset files from the PRIDE Archive. For more information, see: http://www.ebi.ac.uk/pride/help/archive/aspera |
|EMBL-Bank||The EBI’s database of nucleotide sequences. A member of the International Sequence Database Collaboration (www.insdc.org), EMBL-Bank exchanges data every 24 hours with the other INSDC databases to ensure that they are all comprehensive and up to date. EMBL-Bank can be accessed from www.ebi.ac.uk/embl/ |
|European Nucleotide Archive||The European Nucleotide Archive (ENA) is a comprehensive databank of primary nucleotide sequence information. ENA provides access to both assembled sequence and unassembled (raw) sequence reads, but places them in separate databases in order to optimise accessibility and analysis. http://www.ebi.ac.uk/ena/ |
|Metadata||A term used to describe data that provides additional information about a particular data set. This information can include: how, when and where the data set was generated and what standards were used. In the proteomics context the addition of metadata such as peptide and protein identifications and quantification of their expression values gives meaning to a simple collection of mass spectra output files. |
|Next generation sequencing||Next generation sequencing (NGS), also called high-throughput sequencing (HTS), refers to technologies that parallelise the sequencing process, producing thousands or millions of sequences at once.
You can find out more about NGS/HTS on the Wikipedia page: http://en.wikipedia.org/wiki/Next-generation_sequencing#High-throughput_sequencing |
|Sequence Read Archive||An archive of raw sequencing data consisting of short, unassembled reads generated using next generation sequencing technology. |
|Web Services||Web Services (www.w3.org/ws) are software systems designed to support interoperable machine-to-machine interaction over a network. To ensure that software systems from different sources work well together, they are built using open standards such as SOAP. |
|accession number||A unique, relatively stable identifier given to a database record, which allows you to track different versions of that record over time in a single data repository.
For example, in the ArrayExpress Archive, experiments and array designs are given unique accession numbers in the format E-XXXX-n for experiments and A-XXXX-n for array designs, where XXXX is a four-letter code indicating the source of the submission and n is a number (e.g. E-MEXP-568). Some experiments also have secondary accession numbers.
In the UniProt database, proteins have unique UniProt accession numbers (e.g. P04637) and UniProt protein IDs (e.g. P53_HUMAN). UniProt accessions are unique to specific protein isoforms in specific species, and are used as the standard method for uniquely referencing a protein in EBI resources. UniProt accessions cross-link the entries in the various UniProt databases. Most often, researchers will find it useful to follow the UniProt accession back to an entry in UniProtKB/Swiss-Prot to view a curated summary of known information about that protein.
There is an 'ID Mapping' tool on the UniProt homepage, which can be useful for converting accession numbers to the corresponding identifiers in other databases. |
|curation||In the context of biological databases, curation is the process of interpreting and representing biological data using standardised annotation, controlled vocabularies and standardised formats, so the data can be stored and made available to the scientific community. |
|gene||A molecular unit of heredity of a living organism. Genes hold the information to build and maintain an organism's cells and pass genetic traits to offspring. All organisms have many genes corresponding to various biological traits, some of which are immediately visible, such as eye color or number of limbs, and some of which are not, such as blood type or increased risk for specific diseases, or the thousands of basic biochemical processes that comprise life. |
|gene expression||The process by which information from a gene is used in the synthesis of a functional product. Gene expression is the most fundamental level at which the genotype gives rise to the phenotype. The genetic code stored in DNA is "interpreted" by gene expression, and the properties of the expression give rise to the organism's phenotype.