Finding genes using a sequence
Finding a gene
If you do not have a gene name or an accession number for your sequence of interest, you can use the sequence search facility of the ENA Browser to identify the closest matches in ENA. Sequence searching is also useful for finding potential homologues that are related to your gene of interest.
Imagine that you have isolated a gene associated with an autoimmune disease in humans. You know part of the DNA sequence and that mutations in the sequence are associated with a T cell-mediated autoimmune disease, but you do not know the gene it belongs to. You can use the ENA Browser's sequence search to see the closest matches in the ENA database.
Figure 58. To search ENA, copy/paste your sequence into the 'Sequence Search' box (red arrow) and click search.
Results - closest sequence matches
The search returns matches in both ENA and Ensembl, so you can look at the sequences and their annotation in ENA, then look at the Ensembl results to see the alignment of your sequence to the genome(s).
Note: ENA Browser's sequence search does not search raw sequences in either SRA or the Trace Archive, because the very short, redundant nature of raw sequences makes sequence searching very difficult.
Figure 59. Close-up of the sequence search results page displaying the closest matching sequences in EMBL-Bank. The top hit is AF333072 (red arrow), which shows 100% identity (i.e. all the nucleotides in the query sequence align with the target sequence).
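To make the identity figure concrete, here is a minimal Python sketch of one common way to compute percent identity from a pairwise alignment: identical columns divided by the total number of alignment columns. The function name `percent_identity` and the toy sequences are illustrative assumptions; the search tool behind the ENA Browser may define or round identity differently.

```python
def percent_identity(query_aln: str, target_aln: str) -> float:
    """Return the percentage of alignment columns where query and target agree.

    Both strings must already be aligned (same length, '-' for gaps).
    A column containing a gap counts as a mismatch.
    """
    if len(query_aln) != len(target_aln):
        raise ValueError("aligned sequences must have the same length")
    if not query_aln:
        raise ValueError("alignment is empty")

    matches = 0
    for q, t in zip(query_aln.upper(), target_aln.upper()):
        # Only identical, non-gap bases count as a match.
        if q == t and q != "-":
            matches += 1
    return 100.0 * matches / len(query_aln)


if __name__ == "__main__":
    # Toy sequences, not taken from AF333072.
    print(percent_identity("ACGTACGT", "ACGTACGT"))  # every column matches
    print(percent_identity("ACGTACGT", "ACGTTCGT"))  # one mismatch in eight
```

A hit reported at 100% identity corresponds to the first case: every aligned position in the query is identical to the target.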
Results - analysing the results
Entry AF333072 is described as Homo sapiens HERV-K18, but HERV-K18 is a virus, so what's going on?
Figure 60. Close-up of the results page for EMBL-Bank entry AF333072, showing a graphical overview of the annotation available for this sequence and the source of the sequence.
Although it is a human sequence, the gag, pol and env genes are usually associated with viruses, so why is this sequence not classified as being viral?
Genomes often contain 'foreign' DNA, such as endogenous retroviruses (ERVs) and transposable elements. ERVs are thought to have arisen from ancient viral infections, but over the course of evolution they have remained permanently integrated within their host genome and are passed down to subsequent generations. When you sequence a genome and find these 'foreign elements', how do you classify them? Are they part of the host genome or are they separate entities?
In entry AF333072, the HERV-K18 endogenous retrovirus is inserted into intron 1 of the human CD48 gene (see Note encircled in red in Figure 59). Because the sequence was obtained by sequencing the human genome, it is classified as human (HUM taxonomic division). On average, the human genome contains 25-50 copies of endogenous HERV type K retroviruses.
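The taxonomic division is recorded on the ID line of an entry's EMBL-format flat file, so it can also be read programmatically. The following is a minimal Python sketch, not an official ENA client. The helper names (`entry_url`, `fetch_flat_file`, `parse_id_line`) are hypothetical, and the base URL is an assumption about the ENA Browser API that should be checked against the current ENA documentation. The sample ID line is the generic example used in EMBL flat-file documentation (X56734), not the record for AF333072.

```python
from typing import NamedTuple
from urllib.parse import quote
from urllib.request import urlopen

# Assumed base URL of the ENA Browser API; verify before relying on it.
ENA_API_BASE = "https://www.ebi.ac.uk/ena/browser/api"


class IdLine(NamedTuple):
    accession: str
    sequence_version: int
    topology: str
    molecule_type: str
    data_class: str
    division: str
    length_bp: int


def entry_url(accession: str) -> str:
    """Build the (assumed) URL of an entry's EMBL-format flat file."""
    return f"{ENA_API_BASE}/embl/{quote(accession.strip())}"


def fetch_flat_file(accession: str, timeout: float = 30.0) -> str:
    """Download the flat file as text. Needs network access."""
    with urlopen(entry_url(accession), timeout=timeout) as response:
        return response.read().decode("utf-8")


def parse_id_line(line: str) -> IdLine:
    """Split an EMBL ID line into its seven semicolon-separated fields."""
    if not line.startswith("ID"):
        raise ValueError("not an ID line")
    fields = [f.strip() for f in line[2:].strip().rstrip(".").split(";")]
    if len(fields) != 7:
        raise ValueError(f"expected 7 fields, found {len(fields)}")
    accession, version, topology, molecule, data_class, division, length = fields
    return IdLine(
        accession,
        int(version.split()[-1]),  # "SV 1" -> 1
        topology,
        molecule,
        data_class,
        division,
        int(length.split()[0]),  # "1859 BP" -> 1859
    )


def division_of(flat_file: str) -> str:
    """Return the taxonomic division from the first ID line in a flat file."""
    for line in flat_file.splitlines():
        if line.startswith("ID"):
            return parse_id_line(line).division
    raise ValueError("no ID line found")


if __name__ == "__main__":
    sample = "ID   X56734; SV 1; linear; mRNA; STD; PLN; 1859 BP."
    print(parse_id_line(sample).division)
    print(entry_url("AF333072"))
```

With network access, `division_of(fetch_flat_file("AF333072"))` would be expected to return the division described in the text (HUM), provided the assumed endpoint behaves as sketched here.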