EVA issues long-term IDs for non-human variants

9 May 2017 - 15:18

Summary

  • New agreement between the NCBI and EMBL-EBI shares responsibility for managing data from genetic variation experiments worldwide.
  • From September 2017, EMBL-EBI’s European Variation Archive (EVA) will issue locus accession numbers (Reference SNP, rs#) for all non-human species, and will continue to show data from all species.
  • NCBI’s dbSNP database will continue to issue accession numbers for human genetic variation data.
  • More rapid turnaround for accessioning will support scientists studying biodiversity and agriculturally relevant species.

Bethesda, US and Hinxton, UK, 8 May 2017 – In a new agreement, EMBL-EBI and the National Center for Biotechnology Information (NCBI) will support the long-term discoverability of genetic variation data by sharing responsibility for issuing variant locus accessions (called Reference SNP identifiers, rs#) for different species.

As of September 2017, EMBL-EBI will maintain reliable rs# accession numbers for variation data on non-human species through the European Variation Archive (EVA). NCBI’s dbSNP database will continue to maintain rs# identifiers for human genetic variation data. The change will enable more rapid turnaround for data sharing in this burgeoning field.

What is the challenge?

Publicly funded organisations put enormous efforts into detecting, cataloguing and analysing sequence variation, from single base changes (SNPs) to large, structural variants. The clinical function of only a fraction of these variants is known, but access to new data mapped to hard-won existing knowledge leads directly to a deeper understanding of how variants affect function.

One of the biggest challenges is the sheer volume of data being produced. New technologies have made it easier and cheaper than ever to generate new data about genetic variation in humans – knowledge that is crucial for advances in personalised medicine.

Biomedical research is just one part of the story. With concerted international efforts to improve the health of crops and farm animals, the rate of data generation has accelerated even more. This is very good news for plant, animal and pathogen research, because this new flood of information could transform our understanding of how health and disease work in all species.

But this new knowledge is only truly useful if it is maintained and made accessible to scientists everywhere over the long term, through resources such as EVA and dbSNP.

What’s an accession number?

Accession numbers are a bit like social security numbers or national/personal IDs – they are highly specific to one entity, permanently. But instead of representing a person, they refer to pieces of discovered knowledge, for example about genetic variation. If this growing body of knowledge is to be useful to scientists, now and in the future, it must be managed very carefully.

Just as your personal ID stays linked to your bank account when you move house, the accession number for each variant (SNP) helps to keep track of where it is on the reference genome, even when an update to that genome causes the SNP’s position to shift.

Researchers use stable identifiers to reference specific variants and alleles in their publications. That same ID is used to cross-link information that may be held in different databases, and to make sure it always refers to the right part of a genome in successive reference genome builds. The genetic variation research community uses a scheme for stable identifiers called ‘Reference SNP’, or ‘rs IDs’.
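As a rough illustration of the idea, the minimal Python sketch below models a single variant whose accession stays fixed while its coordinates are recorded separately for each reference genome build. The class name, the accession "rs0000001", the build names and all coordinates are hypothetical and chosen only for illustration; this is not how EVA or dbSNP store or remap variants.

```python
from dataclasses import dataclass, field


@dataclass
class VariantRecord:
    """Hypothetical model of one variant locus under a stable accession."""

    accession: str  # stable identifier; never changes once issued
    # Maps a reference build name to (chromosome, position) on that build.
    positions: dict = field(default_factory=dict)

    def remap(self, build: str, chromosome: str, position: int) -> None:
        """Record where the variant sits on a given reference build."""
        self.positions[build] = (chromosome, position)

    def locate(self, build: str) -> tuple:
        """Return (chromosome, position) for the requested build."""
        return self.positions[build]


# All values below are made up for illustration.
record = VariantRecord("rs0000001")
record.remap("build_1", "chr3", 1204517)
record.remap("build_2", "chr3", 1204620)  # position shifts after a genome update

print(record.accession, record.locate("build_1"), record.locate("build_2"))
```

A publication or a second database only needs to store the accession; the position on whichever build is current can then be looked up, which is what keeps cross-links valid when the reference genome is updated.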

Dr James Kijas of Australia’s Commonwealth Scientific and Industrial Research Organisation plays a key role in the economically important International Sheep Genomes consortium, which works in collaboration with EVA. That project uses the EVA to manage variant datasets so they can be put to use for investigating biological processes. “As the number of ruminant species and samples sequenced continues to increase, the demand for the EVA's services will continue unabated,” he stated in a letter of support. “The EVA enables us to centralise the growing collection of sequence variants for comparison that leads directly to increased discovery.”

How users benefit

“This change is a perfect example of how users – including those who use the data for community portals – benefit from our partnership,” explains Thomas Keane, Team Leader for the EVA at EMBL-EBI. “Researchers will now be able to share their experimental data about any non-human species, including model organisms, more rapidly and without worrying about whether it will be accommodated. We’ll use the well-known naming convention from dbSNP - that continuity is important for other services that automatically integrate variant data to serve focused research communities, like wheat or sheep.”

Free access to accessioned variants in a standardised manner is necessary for expanding our understanding of the genomics of key crops, pathogens and model organisms. One of the goals of the renewed agreement between dbSNP and the EVA is to ensure that variants from all species are accessioned properly and quickly. This will greatly enhance the utility of public data resources, and promote the continued analysis of the data well into the future.

Discover more

Read the announcement by the NCBI

About the European Variation Archive (EVA)

The EVA is EMBL-EBI's open-access database of all types of genetic variation data from all species. All users can download data from any study, or submit their own data to the archive. You can also query all variants in the EVA by study, gene, chromosomal location or dbSNP identifier using our Variant Browser.

About EMBL-EBI

The European Bioinformatics Institute (EMBL-EBI) is a global leader in the storage, analysis and dissemination of large biological datasets. We help scientists realise the potential of ‘big data’ by enhancing their ability to exploit complex information to make discoveries that benefit humankind. We are at the forefront of computational biology research, with work spanning sequence analysis methods, multi-dimensional statistical analysis and data-driven biological discovery, from plant biology to mammalian development and disease. We are part of EMBL, an international, innovative and interdisciplinary research organisation funded by over 20 member states and two associate member states. We are located on the Wellcome Genome Campus, one of the world’s largest concentrations of scientific and technical expertise in genomics.

About EMBL

EMBL is Europe’s flagship laboratory for the life sciences. We are an intergovernmental organisation established in 1974 and are supported by over 20 member states. We perform fundamental research in molecular biology, studying the story of life. We offer services to the scientific community, train the next generation of scientists and strive to integrate the life sciences across Europe. We are international, innovative and interdisciplinary, with more than 1600 people from over 80 countries operating across six sites in Barcelona (Spain), Grenoble (France), Hamburg (Germany), Heidelberg (Germany), Hinxton (UK) and Monterotondo (Italy). Our scientists work in independent groups to conduct research and offer services in all areas of molecular biology. Our research drives the development of new technology and methods in the life sciences. We work to transfer this knowledge for the benefit of society. www.embl.org

About the NCBI

The mission of the National Center for Biotechnology Information (NCBI) is to develop new information technologies to aid in the understanding of fundamental molecular and genetic processes that control health and disease. More specifically, the NCBI has been charged with creating automated systems for storing and analysing knowledge about molecular biology, biochemistry, and genetics; facilitating the use of such databases and software by the research and medical community; coordinating efforts to gather biotechnology information both nationally and internationally; and performing research into advanced methods of computer-based information processing for analysing the structure and function of biologically important molecules. www.ncbi.nlm.nih.gov/

Contact the news team

Mary Todd Bergman
Senior Communications Officer
mary@ebi.ac.uk
+44 (0)1223 494 665

Oana Stroe
Communications Officer
stroe@ebi.ac.uk
+44 (0)1223 494 369
