|Annotation||The process of attaching additional information to biological entities. Annotation can be structural (i.e. identifying elements within a sequence, such as protein-coding regions or the locations of regulatory motifs) or functional (i.e. adding biological information to the identified elements, such as the biological function of a protein domain or an entire protein, or the molecular interactions or regulatory role of a nucleotide sequence). Annotation can be applied automatically or added manually from various sources, such as the scientific literature, in a process called 'curation'. At EMBL-EBI, we use a combination of automatic and manual annotation to enrich our databases. |
|CDS||Coding DNA sequence - the region of a gene that codes for protein. |
|European Nucleotide Archive||The European Nucleotide Archive (ENA) is a comprehensive databank of primary nucleotide sequence information. ENA provides access to both assembled sequence and unassembled (raw) sequence reads, but places them in separate databases in order to optimise accessibility and analysis. http://www.ebi.ac.uk/ena/ |
|FASTA||A tool that provides sequence similarity searching against protein databases using the FASTA suite of programs.
You can find out more about FASTA on the Wikipedia page: http://en.wikipedia.org/wiki/FASTA |
|FASTQ||A text-based format for storing both a biological sequence (usually a nucleotide sequence) and its corresponding quality scores. Each sequence letter and each quality score is encoded as a single ASCII character. It was originally developed at the Wellcome Trust Sanger Institute to bundle a FASTA sequence and its quality data, but has become the de facto standard for storing the output of high-throughput sequencing instruments. |
|Gene ontology||Gene Ontology (GO) is a controlled vocabulary used to describe the biology of a gene product in any organism. There are three independent sets of vocabularies, or ontologies, that describe: the molecular function of a gene product, the biological process in which the gene product participates, and the cellular component where the gene product can be found (http://www.geneontology.org). |
|InterPro||The EBI’s integrated resource for protein motifs, families and domains. It provides a single, consistent interface to protein signatures contributed by ten different databases, each of which uses a slightly different method for deriving protein signatures. |
|Metagenome||A metagenome consists of all the genetic material extracted from an environmental sample. This presents numerous challenges for bioinformatics, given the diversity of species that may be present in a given sample. Metagenomics is the study of metagenomes. |
|Metatranscriptomics||Analysis of microbial gene expression in environmental samples. |
|Raw data||Data that have not been subjected to processing or any other manipulation. |
|Sequence Read Archive||An archive of raw sequence reads: short, unassembled fragments of sequence generated using next-generation sequencing technology. |
|Standard flowgram format||A binary file format used to encode the results of pyrosequencing from the 454 Life Sciences platform for high-throughput sequencing. SFF files can be viewed, edited and converted to FASTQ format with sffinfo (Newbler tools). |
|rRNA||Ribosomal ribonucleic acid (rRNA) is the RNA component of the ribosome, the enzyme that is the site of protein synthesis in all living cells.
For more information about rRNA, please visit the Wikipedia page: http://en.wikipedia.org/wiki/RRNA |
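The Annotation and CDS entries mention structural annotation, which means locating protein-coding regions within a sequence. The sketch below gives a rough illustration. It scans the forward strand of a DNA sequence for open reading frames, defined here as an ATG start codon followed in the same frame by a stop codon. It then translates each one with the standard genetic code.

This is a minimal sketch under several simplifying assumptions:
- Only the forward strand is scanned.
- Only NCBI translation table 1 is used.
- ATG is treated as the sole start codon.

The function names (`find_orfs`, `translate`) and the `min_codons` threshold are hypothetical. Real gene callers rely on statistical models of coding sequence rather than a plain start-to-stop scan.

```python
# Naive ORF finder and translator (illustrative sketch only).
# Assumptions: forward strand only, NCBI standard genetic code (table 1),
# ATG as the sole start codon. All names here are hypothetical.

BASES = "TCAG"
# Amino acids for all 64 codons in TCAG order (first, second, third base);
# '*' marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    first + second + third: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, first in enumerate(BASES)
    for j, second in enumerate(BASES)
    for k, third in enumerate(BASES)
}


def translate(cds: str) -> str:
    """Translate a coding sequence codon by codon; unknown codons become 'X'."""
    cds = cds.upper()
    return "".join(
        CODON_TABLE.get(cds[i:i + 3], "X") for i in range(0, len(cds) - 2, 3)
    )


def find_orfs(seq: str, min_codons: int = 2) -> list[tuple[int, int]]:
    """Return (start, end) half-open coordinates of ORFs on the forward strand.

    An ORF runs from an ATG to the first in-frame stop codon (stop included).
    ORFs with fewer than `min_codons` codons before the stop are discarded.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for pos in range(frame, len(seq) - 2, 3):
            codon = seq[pos:pos + 3]
            if start is None and codon == "ATG":
                start = pos
            elif start is not None and CODON_TABLE.get(codon) == "*":
                if (pos - start) // 3 >= min_codons:
                    orfs.append((start, pos + 3))
                start = None
    return sorted(orfs)


if __name__ == "__main__":
    dna = "CCATGAAATTTGGGTAACC"
    for start, end in find_orfs(dna):
        print(start, end, translate(dna[start:end]))  # 2 17 MKFG*
```

The slice `dna[start:end]` returned for each ORF is what the CDS entry calls the coding sequence. In this toy example it runs from the ATG at position 2 through the TAA stop codon.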
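The sketch below makes the FASTQ entry concrete. It reads FASTQ records and converts each quality character back to a numeric Phred score.

It assumes the common four-line layout with no line wrapping:
1. A header line starting with `@`.
2. The sequence.
3. A `+` separator line.
4. The quality string.

It also assumes Phred+33 encoding, where a character's score is its ASCII code minus 33. This encoding is used by Sanger and current Illumina pipelines, while some older Illumina files used an offset of 64 instead. The names `parse_fastq`, `FastqRecord` and `error_probability` are illustrative and are not part of any library.

```python
import io
from typing import Iterator, NamedTuple, TextIO


class FastqRecord(NamedTuple):
    name: str             # header line without the leading '@'
    sequence: str         # one letter per base
    qualities: list[int]  # one Phred score per base


def parse_fastq(handle: TextIO, offset: int = 33) -> Iterator[FastqRecord]:
    """Yield records from a four-line FASTQ stream (no wrapped lines).

    `offset` is the ASCII offset of the quality encoding: 33 (Phred+33)
    by default; pass 64 for older Phred+64 files.
    """
    while True:
        header = handle.readline()
        if not header:
            return  # end of file
        header = header.rstrip("\r\n")
        if not header:
            continue  # tolerate blank lines between records
        if not header.startswith("@"):
            raise ValueError(f"expected '@' header line, got {header!r}")
        sequence = handle.readline().rstrip("\r\n")
        separator = handle.readline()
        if not separator.startswith("+"):
            raise ValueError(f"expected '+' separator after {header!r}")
        quality = handle.readline().rstrip("\r\n")
        if len(quality) != len(sequence):
            raise ValueError(f"sequence/quality length mismatch in {header!r}")
        # Each quality character maps to one score: ASCII code minus the offset.
        yield FastqRecord(header[1:], sequence, [ord(c) - offset for c in quality])


def error_probability(q: int) -> float:
    """A Phred score Q corresponds to a base-call error probability of 10^(-Q/10)."""
    return 10 ** (-q / 10)


if __name__ == "__main__":
    data = "@read1\nACGT\n+\nII#5\n"
    for record in parse_fastq(io.StringIO(data)):
        print(record.name, record.sequence, record.qualities)
        # read1 ACGT [40, 40, 2, 20]
```

Because the parser takes any text stream, the same function works on an open file handle as well as on the in-memory string used in the example.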
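The Standard flowgram format entry describes a binary file format. The sketch below illustrates it by decoding the common header at the start of an SFF file with Python's `struct` module.

The header layout is written from our understanding of the SFF specification and should be checked against it before use:
- A big-endian fixed block beginning with the magic number `.sff`.
- The flow characters, followed by the key sequence.
- Zero padding to an 8-byte boundary.

Per-read headers and flowgram values are not decoded here. The names `SffHeader` and `parse_sff_header` are hypothetical.

```python
import struct
from typing import NamedTuple

SFF_MAGIC = 0x2E736666  # the ASCII bytes ".sff" read as a big-endian uint32

# Fixed-size part of the common header (assumed layout, big-endian, 31 bytes):
# magic, version, index_offset, index_length, number_of_reads,
# header_length, key_length, number_of_flows_per_read, flowgram_format_code
_FIXED_FMT = ">I4sQIIHHHB"
_FIXED_SIZE = struct.calcsize(_FIXED_FMT)


class SffHeader(NamedTuple):
    version: bytes
    index_offset: int
    index_length: int
    number_of_reads: int
    header_length: int
    flowgram_format_code: int
    flow_chars: str    # nucleotide flowed at each cycle, e.g. "TACG" repeated
    key_sequence: str  # key bases expected at the start of every read


def parse_sff_header(data: bytes) -> SffHeader:
    """Decode the common header from the first bytes of an SFF file."""
    if len(data) < _FIXED_SIZE:
        raise ValueError("data too short for an SFF common header")
    (magic, version, index_offset, index_length, number_of_reads,
     header_length, key_length, number_of_flows, flowgram_format_code) = struct.unpack(
        _FIXED_FMT, data[:_FIXED_SIZE])
    if magic != SFF_MAGIC:
        raise ValueError("not an SFF file (bad magic number)")
    # Variable-length fields follow the fixed block: flow chars, then key.
    flows_end = _FIXED_SIZE + number_of_flows
    key_end = flows_end + key_length
    if len(data) < key_end:
        raise ValueError("truncated SFF common header")
    return SffHeader(
        version=version,
        index_offset=index_offset,
        index_length=index_length,
        number_of_reads=number_of_reads,
        header_length=header_length,  # includes padding to a multiple of 8
        flowgram_format_code=flowgram_format_code,
        flow_chars=data[_FIXED_SIZE:flows_end].decode("ascii"),
        key_sequence=data[flows_end:key_end].decode("ascii"),
    )
```

To use the parser on a real file, open the file in binary mode and read enough bytes to cover the header, then pass them in. A few kilobytes is typically plenty. The reported `header_length` then gives the offset at which the first read's data begins.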