The International Sheep Genomics Consortium recently published the sheep genome in the journal Science. The team produced an unprecedented volume of DNA and gene expression data, which was annotated on the Wellcome Trust Genome Campus and is freely available through the Ensembl genome explorer.
“These data are incredibly rich, and more precise than anything we’ve had before from ruminant species,” says Paul Flicek, head of vertebrate genomics at EMBL-EBI.
The consortium compared the sheep genome with human, cattle, goat and pig genomes and identified several genes that are associated with wool production. Their work also reveals genes that are important in the evolution of the rumen, a chamber in the sheep stomach that breaks down cellulose-rich plants like grass into protein.
Much like the original reference human genome, the sheep data – particularly the gene expression data – are based on more than one individual: the team worked with 82 samples taken from several closely related Texel sheep, including a ram, a ewe and their offspring. The scientists studied gene expression and metabolism features in over 40 different tissue types. These data can be used to pinpoint which genes are involved in specific features, for example lanolin production.
“We worked with about a terabyte of RNAseq data produced by the ISGC experiments,” says Bronwen Aken, coordinator of Ensembl annotation at EMBL-EBI. “That’s more RNAseq data than any species we’ve worked with so far. The data were great because the reads were long and paired-end. And because so many different tissues were represented, we can start to put together a clear, unbiased picture of where things are happening in the sheep genome.”
The sheep genome project has taken around eight years altogether; it took about six months for the bioinformaticians to align the expression data and protein information from UniProt with specific locations on the genome. The assembled genome spans around 2.61 Gb.
“The reads were about 150 bases long, and there were so many of them that we could put together a very precise alignment,” says Thibaut Hourlier of EMBL-EBI, who was deeply involved in the genome annotation. “There are a lot of agricultural researchers out there who are very happy that they now have such good data to work with.”
The world’s sheep stocks number around one billion, so research based on these data could have a massive impact on rural economies.
“We investigated the completed genome to determine which genes are present in a process called gene annotation, which resulted in an advanced understanding of the genes involved in making sheep the unique animals that they are,” says Dr Brian Dalrymple, who led the project work at the Commonwealth Scientific and Industrial Research Organisation (CSIRO) in Australia. “Given the importance of wool production, we focused on which genes were likely to be involved in producing wool. We identified a new pathway for the metabolism of lipid in sheep skin, which may play a role in both the development of wool and in the efficient production of wool grease – lanolin.”
Professor Alan Archibald, Head of Genetics and Genomics at The Roslin Institute, says, “Sheep were one of the first animals to be domesticated for farming and are still an important part of the global agricultural economy. Understanding more about their genetic make-up will help us to breed healthier and more productive flocks.”
This collaborative study, involving 26 research institutions in eight different countries, was led by researchers from the Commonwealth Scientific and Industrial Research Organisation (CSIRO) in Australia; BGI and the Kunming Institute of Zoology in China; Utah State University and Baylor College of Medicine in the US; and The Roslin Institute in Edinburgh.
Jiang Y, Xie M, Chen W, Talbot R, et al. (2014) The sheep genome illuminates biology of the rumen and lipid metabolism. Science 344, 1168-1173.
About the International Sheep Genomics Consortium
The ISGC is a partnership of scientists and funding agencies in Australia, Austria, Brazil, China, Finland, France, Germany, Greece, India, Iran, Israel, Italy, Kenya, New Zealand, Norway, Saudi Arabia, Spain, Switzerland, Turkey, the UK and the US who are developing public genomic resources to help researchers find genes associated with production, quality and disease traits in sheep. The project is entirely in the public domain, with prompt data release. Typically, genomic data will be released in accordance with current NHGRI policy. The intention is that material from this work will be released into the public domain so that it is available for free, unencumbered use by all individuals. www.sheephapmap.org/