These exercises demonstrate many possible uses of BioMart as a data-retrieval tool. To get identical answers, you can follow along using the version 70 archive site; to see the answers you would get with the most recent version of Ensembl, use our main BioMart site.
We will now guide you through using BioMart to find mouse proteins with transmembrane domains located on chromosome 9.
As with all BioMart queries, you must select the dataset, set your filters (input) and define your attributes (desired output). For this exercise:
Dataset: Ensembl genes in mouse
Filters: Transmembrane proteins on chromosome 9
Attributes: Ensembl gene and transcript IDs and associated gene names
- Go to the Ensembl homepage (http://www.ensembl.org) and click on BioMart at the top of the page.
- Select Ensembl genes as your database and Mus musculus genes as the dataset.
- Click on Filters on the left of the screen and expand REGION. Change the chromosome to 9.
- Now expand PROTEIN DOMAINS, also under Filters, and select Transmembrane domains and then Only. Clicking on Count should show that you have filtered the dataset down to 422 genes.
- Click on Attributes and expand GENE. Select Associated gene name.
Now click on Results. The first 10 results are displayed by default; to display all results, select ALL from the drop-down menu.
You should see output similar to the screenshot below, which lists the Ensembl gene ID, Ensembl transcript ID and associated gene name of all proteins with a transmembrane domain encoded on chromosome 9. If you prefer, you can also export the table to an Excel file using the Export all results to XLS option.
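The same query can also be sent to BioMart's REST endpoint (martservice) as XML; the XML button at the top of the BioMart page shows the XML for whatever query you have built in the interface. The Python sketch below only assembles that XML and the request URL. The internal names it uses (mmusculus_gene_ensembl, chromosome_name, transmembrane_domain, external_gene_name) are assumptions, so compare them with the XML button's output before relying on them.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Assumed REST endpoint of the main Ensembl BioMart.
MARTSERVICE = "https://www.ensembl.org/biomart/martservice"
XML_PREFIX = '<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE Query>'


def biomart_query_xml(dataset, filters, attributes, formatter="TSV"):
    """Build a BioMart XML query from a dataset, filters and attributes.

    Filter values may be strings, lists (joined with commas) or booleans;
    for a boolean filter True means "Only" and False means "Excluded".
    """
    query = ET.Element("Query", virtualSchemaName="default",
                       formatter=formatter, header="1", uniqueRows="1")
    dataset_el = ET.SubElement(query, "Dataset", name=dataset, interface="default")
    for name, value in filters.items():
        if isinstance(value, bool):
            # Boolean filters use the 'excluded' flag instead of a value.
            ET.SubElement(dataset_el, "Filter", name=name,
                          excluded="0" if value else "1")
        else:
            if isinstance(value, (list, tuple)):
                value = ",".join(value)
            ET.SubElement(dataset_el, "Filter", name=name, value=str(value))
    for name in attributes:
        ET.SubElement(dataset_el, "Attribute", name=name)
    return XML_PREFIX + ET.tostring(query, encoding="unicode")


# Dataset: mouse genes. Filters: chromosome 9, transmembrane domains "Only".
# Attributes: gene ID, transcript ID and gene name. All internal names are
# assumptions; release 70 may call the gene name attribute 'external_gene_id'.
mouse_tm_xml = biomart_query_xml(
    "mmusculus_gene_ensembl",
    {"chromosome_name": "9", "transmembrane_domain": True},
    ["ensembl_gene_id", "ensembl_transcript_id", "external_gene_name"],
)
mouse_tm_url = MARTSERVICE + "?" + urlencode({"query": mouse_tm_xml})
```

To run the query, pass mouse_tm_url to urllib.request.urlopen and decode the response, which should be tab-separated text with a header row.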
BioMart is particularly useful for converting IDs between databases. The following is a list of 29 IDs of human proteins from NCBI's RefSeq database (http://www.ncbi.nlm.nih.gov/projects/RefSeq/):
NP_001218, NP_203125, NP_203124, NP_203126, NP_001007233, NP_150636, NP_150635, NP_001214, NP_150637, NP_150634, NP_150649, NP_001216, NP_116787, NP_001217, NP_127463, NP_001220, NP_004338, NP_004337, NP_116786, NP_036246, NP_116756, NP_116759, NP_001221, NP_203519, NP_001073594, NP_001219, NP_001073593, NP_203520, NP_203522
Generate a list showing the Ensembl gene IDs and HGNC symbols to which these RefSeq IDs correspond.
Hint: these 29 proteins do not correspond to 29 genes. Why is that?
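After running the conversion (Homo sapiens genes dataset, a filter on the RefSeq peptide IDs, and the Ensembl gene ID, HGNC symbol and RefSeq peptide ID attributes), you can group the exported rows by gene to investigate the hint. This is a minimal Python sketch assuming a tab-separated export with a header row; the column names it reads (RefSeq peptide ID, Gene stable ID, HGNC symbol) are hypothetical and differ between Ensembl releases.

```python
import csv
import io
from collections import defaultdict


def refseq_by_gene(tsv_text):
    """Group RefSeq peptide IDs by Ensembl gene ID.

    Expects a tab-separated BioMart export with a header row. The column
    names below are hypothetical; adjust them to match your own export.
    Returns {gene_id: (hgnc_symbol, sorted list of RefSeq IDs)}.
    """
    groups = defaultdict(set)
    symbols = {}
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    for row in reader:
        refseq = row["RefSeq peptide ID"].strip()
        gene = row["Gene stable ID"].strip()
        if not refseq or not gene:
            continue  # skip rows that carry no RefSeq-to-gene mapping
        groups[gene].add(refseq)
        symbols[gene] = row["HGNC symbol"].strip()
    return {gene: (symbols[gene], sorted(ids)) for gene, ids in groups.items()}
```

For example, keeping only the entries whose RefSeq list has more than one element shows which genes account for several of the 29 proteins.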
For the following list of Ciona savignyi Ensembl genes, export the human orthologues:
ENSCSAVG00000000002, ENSCSAVG00000000003, ENSCSAVG00000000006, ENSCSAVG00000000007, ENSCSAVG00000000009, ENSCSAVG00000000011
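Here is a sketch of the same request written directly as BioMart XML, with the gene IDs passed as a single comma-separated filter value. The dataset name csavignyi_gene_ensembl and the filter and attribute names are assumptions; the Homologues attribute page of the web interface and its XML button show the names actually in use.

```python
from urllib.parse import urlencode

# Ciona savignyi gene IDs from the exercise.
GENE_IDS = [
    "ENSCSAVG00000000002", "ENSCSAVG00000000003", "ENSCSAVG00000000006",
    "ENSCSAVG00000000007", "ENSCSAVG00000000009", "ENSCSAVG00000000011",
]

# Dataset, filter and attribute names are assumptions; compare them with
# the XML shown by the BioMart web interface for the same query.
QUERY_XML = f"""<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE Query>
<Query virtualSchemaName="default" formatter="TSV" header="1" uniqueRows="1">
  <Dataset name="csavignyi_gene_ensembl" interface="default">
    <Filter name="ensembl_gene_id" value="{','.join(GENE_IDS)}"/>
    <Attribute name="ensembl_gene_id"/>
    <Attribute name="hsapiens_homolog_ensembl_gene"/>
    <Attribute name="hsapiens_homolog_orthology_type"/>
  </Dataset>
</Query>"""

# Assumed REST endpoint of the main Ensembl BioMart.
URL = "https://www.ensembl.org/biomart/martservice?" + urlencode({"query": QUERY_XML})
```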
You can use BioMart to query variations, not just genes.
a) Export the study accession, source name, chromosome, sequence region start and end (in bp) of human structural variations (SV) on chromosome 1, starting at 130408 and ending at 210597.
b) In a new BioMart query, find the alleles, phenotype descriptions and associated genes for rs1801500 and rs1801368. Can you view the same information in the Ensembl browser?
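Both variation queries can be expressed as XML in the same way as gene queries, only against a variation dataset. In this Python sketch every dataset, filter and attribute name (hsapiens_structvar, chr_name, dgva_study_accession, hsapiens_snp, snp_filter and so on) is an assumption. Because these names are the most likely to differ, the sketch also builds the type=attributes and type=filters listing URLs that martservice offers for looking them up.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Assumed REST endpoint of the main Ensembl BioMart.
MARTSERVICE = "https://www.ensembl.org/biomart/martservice"


def metadata_url(kind, dataset):
    """URL listing the 'filters' or 'attributes' available for a dataset."""
    return MARTSERVICE + "?" + urlencode({"type": kind, "dataset": dataset})


def biomart_query_xml(dataset, filters, attributes):
    """Build a TSV BioMart XML query; list filter values are comma-joined."""
    query = ET.Element("Query", virtualSchemaName="default",
                       formatter="TSV", header="1", uniqueRows="1")
    ds = ET.SubElement(query, "Dataset", name=dataset, interface="default")
    for name, value in filters.items():
        if isinstance(value, (list, tuple)):
            value = ",".join(value)
        ET.SubElement(ds, "Filter", name=name, value=str(value))
    for name in attributes:
        ET.SubElement(ds, "Attribute", name=name)
    return ('<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE Query>'
            + ET.tostring(query, encoding="unicode"))


# a) Human structural variants on chromosome 1 between 130408 and 210597 bp.
# Hypothetical internal names: check them via metadata_url("attributes", ...).
sv_xml = biomart_query_xml(
    "hsapiens_structvar",
    {"chr_name": "1", "start": 130408, "end": 210597},
    ["dgva_study_accession", "source_name", "chr_name",
     "seq_region_start", "seq_region_end"],
)

# b) Alleles, phenotype descriptions and associated genes for two rsIDs.
snp_xml = biomart_query_xml(
    "hsapiens_snp",
    {"snp_filter": ["rs1801500", "rs1801368"]},
    ["refsnp_id", "allele", "phenotype_description", "associated_gene"],
)

sv_attributes_url = metadata_url("attributes", "hsapiens_structvar")
snp_filters_url = metadata_url("filters", "hsapiens_snp")
```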
Forrest et al. performed a microarray analysis of peripheral blood mononuclear cell gene expression in benzene-exposed workers (Environ Health Perspect. 2005 June; 113(6): 801–807). The microarray used was the human Affymetrix U133A/B GeneChip set, whose probe sets are all included on the U133 Plus 2 array. The top 25 up-regulated probe sets were:
207630_s_at, 221840_at, 219228_at, 204924_at, 227613_at, 223454_at, 228962_at, 214696_at, 210732_s_at, 212371_at, 225390_s_at, 227645_at, 226652_at, 221641_s_at, 202055_at, 226743_at, 228393_s_at, 225120_at, 218515_at, 202224_at, 200614_at, 212014_x_at, 223461_at, 209835_x_at, 213315_x_at
(a) For the genes corresponding to these probe sets, retrieve the Ensembl gene and transcript IDs as well as their HGNC symbols and descriptions.
(b) To analyse these genes for possible promoter/enhancer elements, retrieve the 2000 bp upstream of their transcripts.
(c) To study these human genes in mouse, identify their mouse orthologues and retrieve the genomic coordinates of these orthologues.
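A BioMart query can only combine attributes from one attribute page (Features, Homologues or Sequences), so parts (a), (b) and (c) become three separate queries sharing the same probe-set filter. This Python sketch assumes the internal names affy_hg_u133_plus_2, upstream_flank, transcript_flank and the mmusculus_homolog_* attributes; check them against the XML button before use.

```python
import xml.etree.ElementTree as ET

# The 25 up-regulated probe sets from the exercise.
PROBE_SETS = [
    "207630_s_at", "221840_at", "219228_at", "204924_at", "227613_at",
    "223454_at", "228962_at", "214696_at", "210732_s_at", "212371_at",
    "225390_s_at", "227645_at", "226652_at", "221641_s_at", "202055_at",
    "226743_at", "228393_s_at", "225120_at", "218515_at", "202224_at",
    "200614_at", "212014_x_at", "223461_at", "209835_x_at", "213315_x_at",
]


def biomart_query_xml(dataset, filters, attributes, formatter="TSV"):
    """Build a BioMart XML query; list filter values are comma-joined."""
    query = ET.Element("Query", virtualSchemaName="default",
                       formatter=formatter, header="1", uniqueRows="1")
    ds = ET.SubElement(query, "Dataset", name=dataset, interface="default")
    for name, value in filters.items():
        if isinstance(value, (list, tuple)):
            value = ",".join(value)
        ET.SubElement(ds, "Filter", name=name, value=str(value))
    for name in attributes:
        ET.SubElement(ds, "Attribute", name=name)
    return ('<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE Query>'
            + ET.tostring(query, encoding="unicode"))


# Hypothetical internal names throughout; verify them with the XML button.
probe_filter = {"affy_hg_u133_plus_2": PROBE_SETS}

# (a) Gene/transcript IDs, HGNC symbols and descriptions (Features page).
features_xml = biomart_query_xml(
    "hsapiens_gene_ensembl", probe_filter,
    ["affy_hg_u133_plus_2", "ensembl_gene_id", "ensembl_transcript_id",
     "hgnc_symbol", "description"],
)

# (b) 2000 bp upstream of each transcript (Sequences page, FASTA output).
upstream_xml = biomart_query_xml(
    "hsapiens_gene_ensembl", {**probe_filter, "upstream_flank": "2000"},
    ["transcript_flank", "ensembl_transcript_id"],
    formatter="FASTA",
)

# (c) Mouse orthologues and their coordinates (Homologues page).
ortho_xml = biomart_query_xml(
    "hsapiens_gene_ensembl", probe_filter,
    ["ensembl_gene_id", "mmusculus_homolog_ensembl_gene",
     "mmusculus_homolog_chromosome", "mmusculus_homolog_chrom_start",
     "mmusculus_homolog_chrom_end"],
)
```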
You can also use BioMart for the non-vertebrate species hosted at www.ensemblgenomes.org. Export a list of gene IDs (from the PomBase project only) for S. pombe genes that are protein-coding and located on chromosome III.
(Start at http://fungi.ensembl.org)
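The Ensembl Genomes marts accept the same kind of XML query at their own host. In this sketch the endpoint URL, the virtual schema name fungi_mart, the dataset name spombe_eg_gene and the filter names are all assumptions. The PomBase-only restriction is left out because its internal filter name is not known from this exercise; the XML button on the Ensembl Fungi BioMart page shows the full query once you have built it in the interface.

```python
from urllib.parse import urlencode

# Assumed Ensembl Fungi BioMart endpoint and virtual schema name.
FUNGI_MARTSERVICE = "https://fungi.ensembl.org/biomart/martservice"
VIRTUAL_SCHEMA = "fungi_mart"

# Protein-coding S. pombe genes on chromosome III. Dataset, filter and
# attribute names are assumptions; the PomBase-only filter is not included.
QUERY_XML = f"""<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE Query>
<Query virtualSchemaName="{VIRTUAL_SCHEMA}" formatter="TSV" header="1" uniqueRows="1">
  <Dataset name="spombe_eg_gene" interface="default">
    <Filter name="chromosome_name" value="III"/>
    <Filter name="biotype" value="protein_coding"/>
    <Attribute name="ensembl_gene_id"/>
  </Dataset>
</Query>"""

URL = FUNGI_MARTSERVICE + "?" + urlencode({"query": QUERY_XML})
```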
Design your own query