Finding a protein-coding sequence

Search for a gene

Background

Human BRCA1 (breast cancer 1, early onset) is a tumour suppressor gene coding for a protein involved in DNA repair. Mutations in the BRCA1 gene in the germ line can result in the individual developing Hereditary Breast and Ovarian Cancer Syndrome.

Scenario

Imagine you are working on the BRCA1 gene, studying the effects of different genetic mutations on the function of the encoded protein. You need to know the coding sequence of human BRCA1, and of any known variants, so you can make targeted mutations to the gene sequence that will code for alternate amino acids in the protein. You can use ENA to search for the BRCA1 coding sequence.
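To illustrate why the coding sequence is needed for this kind of work, here is a minimal Python sketch of a single-codon substitution and its effect on the encoded amino acid. The function names `translate` and `mutate_codon` are hypothetical helpers written for this example, and the nine-base sequence is a toy placeholder, not BRCA1 sequence. The sketch assumes the standard genetic code and a sequence that starts in frame.

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(cds):
    """Translate an in-frame coding sequence into one-letter amino acids."""
    cds = cds.upper()
    usable_length = len(cds) - len(cds) % 3  # ignore any trailing partial codon
    return "".join(
        CODON_TABLE[cds[i:i + 3]] for i in range(0, usable_length, 3)
    )


def mutate_codon(cds, residue_number, new_codon):
    """Replace the codon for a 1-based residue number with new_codon."""
    if len(new_codon) != 3:
        raise ValueError("new_codon must be exactly three bases long")
    start = (residue_number - 1) * 3
    if start < 0 or start + 3 > len(cds):
        raise ValueError("residue_number is outside the coding sequence")
    return cds[:start] + new_codon.upper() + cds[start + 3:]


# Toy placeholder sequence (NOT real BRCA1 sequence): Met, Ala, stop.
toy_cds = "ATGGCCTGA"
mutant_cds = mutate_codon(toy_cds, 2, "GTC")  # change residue 2 to a Val codon
print(translate(toy_cds), "->", translate(mutant_cds))
```

In practice, the same approach would be applied to the full coding sequence downloaded from ENA in the steps that follow.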


Figure 52. To search ENA, type 'BRCA1 AND Homo sapiens' into the text search box.

Steps

1. Open the ENA Browser in a new window.

2. Type the search term 'BRCA1 AND Homo sapiens' into the text search box (this is a more stringent search than typing 'BRCA1 AND human').

3. Click ‘Search’ to obtain search results.

Results - ENA Browser summary

The ENA Browser provides a summary of the nucleotide data available for the human BRCA1 gene in both the EMBL-Bank (assembled/annotated data) and SRA (raw data) databases (Figure 53). You can now explore the data further by expanding and viewing the EMBL-Bank protein-coding sequences available for this gene.


Figure 53. Results page showing the ENA Browser summary of the assembled (EMBL-Bank database) and raw (SRA database) sequence data available for human BRCA1; the EMBL-Bank database 'Protein-coding Sequence' section has been expanded.

Steps

1. Expand the 'Protein-coding Sequences' [A] results by clicking on '+' [B].

Note: because the 'Protein-coding Sequences' section has already been expanded in Figure 53, it displays a '-' rather than a '+'.

2. You should now have an expanded view [C] of all the coding sequences available for BRCA1.

3. Expand the entry for AAC37594 (circled) by clicking on '+' [D].

 

Warning

Note that many reads in the SRA database are likely to contain sequence from the BRCA1 locus, but will not have been annotated as such and will, therefore, not appear in the results of this search.

 

Results - obtaining BRCA1 coding sequence

Within any of the EMBL-Bank database entries for the BRCA1 gene, you can download the sequence or browse the biological annotation available (Figure 54). Note that the EMBL-Bank database holds several protein-coding sequences for human BRCA1, and a researcher would want to review each of them.


Figure 54. EMBL-Bank entry for AAC37594, which contains the protein-coding sequence for the human BRCA1 gene.

Steps

1. To download the BRCA1 coding sequence, click on 'FASTA' in the top right-hand corner of the entry [A].

2. You should now have a pop-up text file containing the sequence in FASTA format.

3. Delete the text file. Although this file could be saved for later use by a researcher, we were only viewing it as a demonstration.

4. To explore the BRCA1 protein-coding sequence, take a look at the biological annotation available in the graphical display [B].
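The download step above can also be scripted. The following is a minimal Python sketch, not an official client. It assumes that the ENA Browser API serves FASTA records at the URL pattern shown in `ENA_FASTA_URL`; check that pattern against the current ENA documentation before relying on it. The helper names `fasta_url`, `fetch_fasta` and `parse_fasta` are hypothetical and were introduced only for this example.

```python
from urllib.parse import quote
from urllib.request import urlopen

# Assumed ENA Browser API URL pattern; verify against current ENA documentation.
ENA_FASTA_URL = "https://www.ebi.ac.uk/ena/browser/api/fasta/{accession}"


def fasta_url(accession):
    """Build the (assumed) download URL for one accession."""
    return ENA_FASTA_URL.format(accession=quote(accession))


def fetch_fasta(accession):
    """Download the FASTA text for an accession (requires network access)."""
    with urlopen(fasta_url(accession), timeout=30) as response:
        return response.read().decode("utf-8")


def parse_fasta(text):
    """Parse FASTA text into a list of (header, sequence) tuples."""
    records = []
    header = None
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header = line[1:]
            chunks = []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


# Offline demonstration with a toy record (placeholder, not real BRCA1 data).
example_text = ">toy_record placeholder entry\nATGGCC\nTGA\n"
for record_header, sequence in parse_fasta(example_text):
    print(record_header, len(sequence))
```

To try this against the live service, one would call `parse_fasta(fetch_fasta("AAC37594"))`, using the accession from the steps above.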