Support for non-human variant data archival and accessioning is transitioning from dbSNP to EVA from September 2017.

Overview

The European Variation Archive is an open-access database of all types of genetic variation data from all species.

All users can download data from any study, or submit their own data to the archive. You can also query all variants in the EVA by study, gene, chromosomal location or dbSNP identifier using our Variant Browser.

We will be adding new features to the EVA on a regular basis, and welcome your comments and feedback.

Statistics

Short genetic variants studies (<50bp)

Structural variants studies (>50bp)

This web application makes intensive use of new web technologies and standards such as HTML5. Please see the FAQs for further browser compatibility notes.

Submit

The EVA accepts genetic variants (SNPs and INDELs) from any species, and provides stable long-term accessions and archival of the data. The EVA works in collaboration with the Database of Genomic Variants archive (DGVa) to accession and archive structural variants. DGVa relies on a template-based submission process that is explained in detail here.

Data submitted to the EVA is brokered to our collaborating databases at NCBI: dbSNP and dbVar. It is therefore unnecessary to submit data to multiple resources.

All data valid for EVA submission shall be made available via the Study Browser and will be browsable using both the Variant Browser and the EVA API. Variant Effect Predictor annotations shall be available for variants mapped to genome assemblies that are known to Ensembl.

Please contact eva-helpdesk@ebi.ac.uk if you would like any further information on this brokering process or collaboration.

Data requirements

EVA accepts submission of genetic variation data based on three criteria:

  1. The genome assembly used is International Nucleotide Sequence Database Collaboration (INSDC) registered, or will be at point of submission
  2. The variation data is described in valid VCF file(s); this can be tested prior to submission using the EVA VCF Validation Suite found here
  3. For all data submitted to the EVA, we require that it be possible to compute allele frequencies for all submitted variants. Therefore, the EVA supports two types of submissions: 1) variation data with sample genotypes 2) summary data with population allele frequencies

Converting genetic variation data to VCF

Submitters that need to convert their data to VCF should first contact eva-helpdesk@ebi.ac.uk to discuss their situation, as we may be able to offer advice and/or tools to aid the conversion process.

We advise submitters to consult the VCF specification guidelines (specifically sections 1.1 and 5) when converting data to VCF in order to ensure that file(s) generated are valid, as this is a requirement of EVA submission.

Should manual conversion to VCF be necessary, we have provided a minimal VCF file template here; this may be useful for non-technical submitters, or submitters with only a very low number of variants to report.
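
As a rough illustration of what a very small summary-data VCF might look like, the sketch below writes a VCF 4.3 header and one record carrying AC and AN values, so that an allele frequency can be computed. It is not the EVA template: the function name `minimal_vcf`, the record values and the header descriptions are assumptions made for this example, and any file produced this way should still be checked with the EVA VCF validation suite.

```python
# Hypothetical sketch: build a tiny summary-data VCF as text.
# This is NOT the EVA minimal template; names and values are illustrative.

HEADER = [
    "##fileformat=VCFv4.3",
    '##INFO=<ID=AC,Number=A,Type=Integer,Description="Alternate allele count">',
    '##INFO=<ID=AN,Number=1,Type=Integer,Description="Total number of alleles">',
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
]


def minimal_vcf(records):
    """Return VCF text for (chrom, pos, ref, alt, ac, an) tuples."""
    lines = list(HEADER)
    for chrom, pos, ref, alt, ac, an in records:
        info = f"AC={ac};AN={an}"
        # ID, QUAL and FILTER are left as missing values (".")
        lines.append("\t".join([chrom, str(pos), ".", ref, alt, ".", ".", info]))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Made-up variant on a made-up chromosome name
    print(minimal_vcf([("1", 12345, "A", "G", 2, 8)]), end="")
```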

Key stages of EVA submissions

Contact

Contact eva-helpdesk@ebi.ac.uk to provide details of your submission. You shall receive a custom private FTP location in which to deposit your files.

Prepare

Submissions to the EVA consist of VCF file(s), any associated data file(s), and metadata that describe sample(s), experiment(s), and analysis that produced the variant and/or genotype call(s). This metadata is described in an Excel template that can be found here. Please also see here for a mocked up version of this template that has been completed for a fictional study.

VCF file(s) submitted to the EVA must be valid against a 4.X version of the VCF file format specification. Files can be validated prior to submission using the EVA VCF validation suite.

Submit

Upload your VCF file(s), associated data file(s) and EVA metadata template to your private EVA FTP location.

Receive

The EVA aims to process submissions within two business days. Accession numbers shall be sent via email to the submitter upon successful archival of the deposited data.

Clinically relevant genetic variation data

The EVA strongly recommends submission of clinically relevant genetic variant data, i.e. data that relates genetic variation(s) with clinical significance values (e.g. pathogenic, benign, etc.), to the ClinVar resource at NCBI.

Submitters unsure of the most relevant resource to archive their genetic variation data are encouraged to first contact eva-helpdesk@ebi.ac.uk to discuss their situation.

Feedback

If you have any questions related to the European Variation Archive resource, please contact us.

Follow us on Twitter using @EBIvariation

API

The general structure of an EVA REST web service URL is:

http://www.ebi.ac.uk/eva/webservices/rest/{version}/{category}/IDs/{resource}?{filters}

Where:

* version: indicates the version of the API; this defines the available filters and the JSON schema to be returned. Currently there is only version 'v1'.
* category: this defines what objects we want to query. Currently there are five different categories: variants, segments, genes, files and studies.
* resource: specifies the resource to be returned, and therefore the JSON data model.
* filters: each specific endpoint allows different filters.

REST web services have been implemented using the HTTP GET method, since only queries are allowed so far. Several IDs can be concatenated using a comma as the separator.
For more detailed information about the API and filters you can visit the project wiki and Swagger documentation.
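
The URL structure above can be sketched as a small Python helper. This is only an illustration of how the template might be filled in: the function name `build_eva_url` is made up, and the resource name `info`, the filter `species` and the IDs used in the example are placeholders, so please check the Swagger documentation for the values each endpoint actually accepts. No request is sent; the helper only builds the string.

```python
from urllib.parse import urlencode

# Base taken from the URL template above
BASE_URL = "http://www.ebi.ac.uk/eva/webservices/rest"


def build_eva_url(category, ids, resource, filters=None, version="v1"):
    """Fill in {version}/{category}/IDs/{resource}?{filters}.

    Several IDs are joined with a comma, as described above.
    """
    id_part = ",".join(ids)
    url = f"{BASE_URL}/{version}/{category}/{id_part}/{resource}"
    if filters:
        url += "?" + urlencode(filters)
    return url


if __name__ == "__main__":
    # Placeholder IDs, resource and filter, for illustration only
    print(build_eva_url("variants", ["rs123", "rs456"], "info",
                        {"species": "hsapiens_grch37"}))
```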

Help

  • What is the European Variation Archive (EVA)?

    The European Variation Archive (EVA) is EMBL-EBI's open-access genetic variation archive. The EVA accepts submission of all types of genetic variants, ranging from single nucleotide polymorphisms to large structural variants, observed in germline or somatic sources, from any eukaryotic organism. The EVA permits access to these data at two distinct levels:

    i) The raw variant data as was submitted to the EVA, via the EVA Study Browser

    ii) The normalised and processed variant data, via the EVA Variant Browser and EVA API

  • What are the EVA normalisation and variant processing steps?
    EVA Variant Level Processing: Submitted data from the EVA Study Browser -> Variants are merged, normalized and annotated for functional consequences and statistical values -> Processed variants are brokered to dbSNP at NCBI and resulting 'ss' and 'rs' accessions are ingested by EVA -> Data are exposed as JSON objects either via the EVA website GUI or API

    Normalisation

    Variants submitted to the EVA have been determined by a number of different algorithms and software packages. As a result, the VCF files generated by these differing methodologies describe variants in a number of different ways. The primary processing step of the EVA is to normalise variant representation following two basic rules:

    1. Each variant is shifted to be left-aligned
    2. The Start and End positions represent exactly the range where the variation occurs (which could, in the case of insertions, result in the reference allele being recorded as 'empty')

    Examples of our variant normalisation process can be seen here
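
The two rules above can be sketched in Python as follows. This is a simplified illustration and not the EVA pipeline code: the function name `normalise`, the 1-based `start` convention (for an insertion, the position before which bases are inserted) and the toy reference sequence are all assumptions made for the example. Shared bases are trimmed so that the alleles cover exactly the changed range, and pure insertions or deletions are then shifted left for as long as the reference allows.

```python
def normalise(reference, start, ref, alt):
    """Return (start, ref, alt) trimmed and left-aligned.

    reference: the reference sequence as a string
    start: 1-based position of the first base of ref
    """
    # Rule 2: trim bases shared at the end of both alleles
    while ref and alt and ref[-1] == alt[-1]:
        ref, alt = ref[:-1], alt[:-1]
    # Rule 2: trim bases shared at the beginning, moving start right
    while ref and alt and ref[0] == alt[0]:
        ref, alt = ref[1:], alt[1:]
        start += 1
    # Rule 1: left-align pure insertions and deletions (one allele empty)
    if bool(ref) != bool(alt):
        allele = ref or alt
        # Shift while the preceding reference base equals the allele's last base
        while start > 1 and reference[start - 2] == allele[-1]:
            allele = allele[-1] + allele[:-1]
            start -= 1
        ref, alt = (allele, "") if ref else ("", allele)
    return start, ref, alt


if __name__ == "__main__":
    toy_reference = "GCACACAT"
    # Deletion of "CA" written VCF-style at position 5 (ACA -> A)
    print(normalise(toy_reference, 5, "ACA", "A"))
```

In this toy case the deletion is reported at position 2 with reference allele "CA" and an empty alternate allele, which is the leftmost equivalent placement within the repeat.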

    Annotation

    Once variants have been normalised, the EVA uses the Variant Effect Predictor (VEP) of Ensembl to annotate variant consequences. The variant consequences are described using Sequence Ontology terms and both the VEP version and Ensembl gene build used are described via the "i" help bubbles on the EVA Variant Browser.

    N.B. Variants that have been mapped to a reference genome sequence that is not supported by Ensembl are not annotated.

    Statistical calculations

    The EVA adopts the classical definition of allele frequency (AF): 'a measure of the relative frequency of an allele at a genetic locus in a given population'. The AF value(s) stored by the EVA for each variant is (are) study specific - i.e. the same variant reported in two distinct studies shall be given two allele frequencies, one for each study. There are two methodologies by which the EVA is able to determine allele frequency values, dependent on the datatype of the study in question:

    Variants associated with genotypes:

    For variants associated with genotypes, the EVA determines the AF values via the calculation:

    AF = (number of alternate allele observations (AC)) / (number of observations (AN))

    The result of this calculation allows the EVA to also store the minor allele frequency (MAF) for each variant (defined as the minimum of the reference or alternative allele frequency) and the MAF allele (the allele associated with the MAF).
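
As a minimal sketch of this calculation for a single biallelic variant, the Python below counts alleles from VCF-style genotype strings and derives AC, AN, AF, the MAF and the MAF allele. The function name `allele_stats`, the example genotypes and the handling of missing calls (skipped) and ties (alternate allele reported) are assumptions made for illustration, not a description of the EVA implementation.

```python
def allele_stats(genotypes, ref="A", alt="G"):
    """Compute AC, AN, AF, MAF and the MAF allele for one biallelic variant.

    genotypes: VCF-style strings such as "0/1", "1|1" or "./."
    """
    ac = 0  # alternate allele observations
    an = 0  # all allele observations
    for genotype in genotypes:
        for call in genotype.replace("|", "/").split("/"):
            if call == ".":
                continue  # missing calls are not counted
            an += 1
            if call == "1":
                ac += 1
    if an == 0:
        return None  # no observations, so no frequency can be computed
    af = ac / an
    ref_af = 1 - af
    maf = min(af, ref_af)
    maf_allele = alt if af <= ref_af else ref
    return {"AC": ac, "AN": an, "AF": af, "MAF": maf, "MAF_allele": maf_allele}


if __name__ == "__main__":
    print(allele_stats(["0/0", "0/1", "0|0", "./.", "0/1"]))
```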

    Variants not associated with genotypes:

    For variants that are not associated with genotypes, the EVA is dependent on the AF value(s) estimated from the primary data and provided in the submitted VCF file(s). AF values that are specifically provided in the submitted aggregated VCF file(s) are directly stored. In cases where no AF is provided, the EVA uses the AC and AN values in the submitted aggregated VCF file(s) to calculate AF value(s) via the calculation:

    AF = AC / AN
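
The fallback order described for aggregated files (use AF when it is given, otherwise derive it from AC and AN) might be sketched like this. The function name `af_from_info` and the example INFO strings are hypothetical, and the parsing is deliberately simplistic compared with a real VCF reader.

```python
def af_from_info(info):
    """Return a list of AF values (one per ALT allele) from a VCF INFO string.

    Prefers a provided AF; otherwise computes AC / AN; returns None if neither
    is available.
    """
    # Keep key=value pairs only; flag-style entries without "=" are ignored
    fields = dict(item.split("=", 1) for item in info.split(";") if "=" in item)
    if "AF" in fields:
        return [float(value) for value in fields["AF"].split(",")]
    if "AC" in fields and "AN" in fields:
        an = int(fields["AN"])
        if an == 0:
            return None
        return [int(value) / an for value in fields["AC"].split(",")]
    return None


if __name__ == "__main__":
    print(af_from_info("AC=2;AN=8"))
    print(af_from_info("AF=0.1;AC=2;AN=8"))
```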

    Population / sample cohort allele frequency values:

    The EVA accepts submission of pedigree files, or structured samples (using "derived_from" and/or "subject" layers), to define populations and cohorts within studies. In cases where such information is associated with variants that have genotypes then the EVA calculates intra-study population/cohort specific AF values via the method described above, with the caveat that the (total number of populations/cohorts):(total number of samples) ratio must be less than 1:10. For studies that do not contain genotypes but instead provide intra-study population/cohort AF values in the submitted aggregated VCF file(s), or AC and AN values, then these are directly stored, or calculated by the EVA using the method described above, again with the caveat that a ratio of 1:10 (total number of populations/cohorts):(total number of samples) must not be exceeded.

    N.B. There is a small number of variants for which the EVA is unable to determine any allele frequency value(s), as the submitted VCF file(s) contain neither genotypes nor AF or AC and AN values. The EVA discourages submission of variants that cannot be associated with an AF.

  • With whom does the EVA collaborate?
    Collaborators: GEUVADIS European Exome Variant Server and Príncipe Felipe Research Centre. Past Collaborators: The Genomics and Bioinformatics Platform of Andalusia

    The EVA & GEUVADIS European Exome Variant Server

    The EVA and the GEUVADIS European Exome Variant Server (GEEVS) work in collaboration to coordinate common data formats for data exchange. As part of this collaboration, we fully endorse the variant calling protocol detailed on the GEEVS website, as adherence to this protocol permits direct comparison and/or aggregation of results from different datasets.

    The EVA & Príncipe Felipe Research Centre

    Some of the technical and analytical features of the EVA were developed in collaboration with the Computational Genomics Department, led by Joaquin Dopazo, at the Príncipe Felipe Research Centre (CIPF).

    Past Collaborations

    EVA & The Genomics and Bioinformatics Platform of Andalusia

    Early development of the EVA was carried out in collaboration with the Bioinformatics Department at the Genomics and Bioinformatics Platform of Andalusia.

  • How can I follow the development of the EVA?
    The following are our GitHub repositories:
    • The EVA VCF validator checks that a file is compliant with the VCF specification. It includes and expands the validations supported by the vcftools suite. It supports versions 4.1, 4.2 and 4.3 of the specification.

    • The EVA pipeline processes VCF files, stores the variation data in a database and post-processes it, in a way that can be later consumed via web services.

    • The EVA web services serve the data generated and stored by the EVA pipeline. They follow the REST paradigm and can be consumed by any external application.

    • The EVA website displays the data served by the EVA REST web services API in a user-friendly way.

    Acknowledgement

    We would like to acknowledge the following software support.

  • How can I consume variant data from the EVA?

    Links are shown on the EVA Study Browser to both the raw submitted files and the 'EVA browsable files'. Not all submitted VCF files for a study are browsable due to overlapping information. For example, we have studies where the submitted VCF files contain the sample and variant data split by population, but also merged together. It would be redundant for the EVA to load all of these data to our variant warehouse. Please note that due to processing time required not all studies have 'EVA browsable files' just yet.

    Additionally, users can download query results from the EVA Variant Browser directly, and all of our loaded data is available programmatically via our API.

  • Which browsers does the EVA website support?

    The EVA website employs HTML5 technologies and standards. Chrome (version 18 or higher), Firefox (version 12 or higher), Safari (version 6 or higher), Opera (version 12 or higher) and Internet Explorer (version 10 or higher) are fully supported; older versions of these browsers may give rise to errors.
    The EVA website also supports mobile versions of web browsers, with limited functionality. Please report all errors to the EVA helpdesk.

  • How can I search EVA with existing dbSNP accessions?

    The EVA offers users the ability to search both the EVA Variant Browser and API using existing dbSNP 'ss' and 'rs' accessions, as well as those that are newly generated by the EVA.

  • Can I download accessioned variants from the EVA?

    Yes. After an rs build the EVA will provide a VCF dump for each species/assembly combination. These files shall be in the same style as the current VCFs that dbSNP generates (i.e. contain the rs ID positional information, but not the genotypic nor annotation data). Furthermore we are developing a 'VCF Dumper' tool which shall allow users to generate a custom VCF file based on filtering options available at the EVA Variant Browser.

  • How can I track variants that have been generated, merged and/or deprecated over time?

    In addition to outputting a VCF file for each rs build the EVA shall also generate a file that will allow users to track changes to variant numbers over time. These files shall include details of all newly generated accessions for each rs build, a list of those variants that have been deprecated as well as information on any variants that have been merged together. The EVA shall follow the same rules as dbSNP for generating, deprecating and merging variant accessions.

  • What data is shown in the EVA Clinical Browser?

    The EVA Clinical Browser displays variant data imported from the NCBI resource ClinVar, where each variant is associated with both a phenotype and a clinical significance assigned using the guidelines from the American College of Medical Genetics and Genomics. Importantly, the variants shown via the EVA Clinical Browser have been annotated against the human GRCh37 genome using both the VEP and GENCODE Basic gene build of Ensembl version 78.

  • Why submit data to the EVA?

    The EVA provides the community with a completely free, secure and permanent solution for data sharing. Each project, VCF file and sample that is submitted to the EVA is assigned a unique identifier that is accessible in perpetuity and can therefore be referenced in publications, for example. The EVA helpdesk provides support to submitters and users to ensure accurate representation and proper integration of the submitted data with other EMBL-EBI resources such as the EGA and Ensembl. A final advantage of submitting to the EVA is that variants are brokered to the National Center for Biotechnology Information (NCBI) on the submitter's behalf, negating the requirement for independent submissions.

  • Is my data suitable for submission to the EVA?

    The most important consideration is that all data archived at the EVA is open access. As such, there are no restrictions as to who can access the data or how such data is reused. It is the submitter's responsibility to ensure that the data archived at the EVA complies with this open access policy.

    Genetic variants submitted to the EVA must be described in the Variant Call Format (VCF). The EVA has developed a custom VCF file validator and accepts submission of only VCF files that pass these validation steps. Furthermore, VCF files submitted to the EVA should provide either genotypes from the individual samples analysed, or aggregated sample summary information, such as allele frequencies.

    Finally, each submission to the EVA is accompanied by a completed metadata template. This metadata template captures the study description, details of the sample(s) analysed and the experimental methodology. Submission of as much metadata as possible is strongly encouraged as this information is extremely useful for downstream analysis and is directly related to the frequency at which datasets archived at EVA are reused.

  • What happens to my data once submitted?
    EVA File Level Processing: Submitter brings VCF file(s) and metadata template to EVA -> Submitted data are validated against VCF specification and reference genome sequence -> Accessions are sent to submitter, for referencing in publication for example -> EVA provides links to download the submitted data via the Study Browser

    Submission validation processes

    VCF Specification:

    All VCF files submitted to EVA are validated for adherence to the format specification using the EVA VCF validation suite which includes all the checks from the vcftools suite, and some more that involve lexical, syntactic and semantic analysis of the VCF input. The EVA VCF validation suite also includes a debugging tool to automatically correct many of the common errors found in files. In order to improve processing time, submitters are encouraged to prevalidate VCF files prior to submission.

    Genome Assembly:

    To improve interoperability of variant data submitted to EVA with other resources at EMBL-EBI, and the wider open-access community, all VCF files submitted are subject to validation against the INSDC accessioned genome assembly that is referenced in the associated EVA metadata template. The EVA can only accept files that match a known assembly at 100%. VCF files that fail this validation step shall be archived at the European Nucleotide Archive only.

    Submission summary

    All VCF files and novel samples that are submitted to EVA are permanently and securely archived at the European Nucleotide Archive and BioSamples, respectively. The EVA provides access to all submitted data via the EVA Study Browser.

    Variants within VCF files submitted to EVA are normalised, annotated and used for statistical calculations (via the methodologies described under "What are the EVA normalisation and variant processing steps?") and these EVA processed data are available via the EVA Variant Browser and EVA API.

    Finally, the EVA brokers all submitted data to the NCBI: to dbSNP for short variants (<50bp), to dbVar for large variants (>50bp), or to ClinVar for variants associated with both a phenotype and a clinical significance.

  • Do EVA and dbSNP accept submission of the same genetic variation file format?

    Yes. EVA, like dbSNP, accepts submission of genetic variation data that is described in Variant Call Format (VCF) files. However, unlike dbSNP, EVA does not have a custom VCF specification. Instead EVA only accepts submission of VCF files that conform to the specification guidelines. Conformity to these guidelines is checked at time of submission, but can also be checked beforehand using the EVA Validation Suite. Further guidelines for submission to EVA can be found here.

  • Do EVA and dbSNP collect metadata in the same way?

    No. Although there are many similarities, the EVA and dbSNP metadata templates, which are submitted along with VCF files, are different. The EVA metadata template asks the submitter to provide information about the project and samples so that these can be archived simultaneously along with the VCF file(s), whereas this is a stepwise process during dbSNP submission. The EVA metadata template can be found here and a mocked up version for a fictional study is here. Detailed information on how to submit to the EVA can be found here.

  • Can my submitted data be held privately at the EVA?

    Yes. Data submitted to the EVA can be held privately for up to one year. The date of publication is set by the submitter using the "Hold Date" field of the EVA metadata template.

    EVA & dbSNP Transition

    As of September 2017, EMBL-EBI will maintain reliable accessions for non-human genetic variation data through the European Variation Archive (EVA). NCBI's dbSNP database will continue to maintain stable identifiers for human genetic variation data only. This change will enable a more rapid turnaround for data sharing in this burgeoning field.

    For more information please see the following press releases, presentation and FAQs. We shall continue to roll out more information about this transition in the run-up to September. Please subscribe to eva-announce@ebi.ac.uk and follow us at @evarchive, where we shall continue to post updates. If anything remains unclear please contact us at eva-helpdesk@ebi.ac.uk.

    Press releases

    FAQs

    • What are the key steps for the dbSNP - EVA transitional process?

      We have outlined the transitional process, including the timeline and more detailed technical information, in this online presentation:

      EVA-presentation
    • What accessions are administered by the EVA?

      The EVA follows the SRA model for accessioning:

        • A submitted project is administered a 'project' accession. These begin "PRJEB" followed by a numerical sequence.
        • Each analysis object within a project is administered an 'analysis' accession. These begin "ERZ" followed by a numerical sequence.

      Both the project and analysis accessions are sent to the submitter once the data has been validated and fully archived at EMBL-EBI. The project and analysis accessions are stable identifiers that are suitable for publication.
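
For scripts that need to tell these two accession types apart, a simple pattern check along the following lines may be enough. It relies only on the prefixes given above ("PRJEB" or "ERZ" followed by digits); the function name `accession_type` and the example accessions are made up for illustration.

```python
import re

# Patterns based on the prefixes described above
PROJECT_PATTERN = re.compile(r"PRJEB\d+")
ANALYSIS_PATTERN = re.compile(r"ERZ\d+")


def accession_type(accession):
    """Return "project", "analysis" or None for an accession string."""
    if PROJECT_PATTERN.fullmatch(accession):
        return "project"
    if ANALYSIS_PATTERN.fullmatch(accession):
        return "analysis"
    return None


if __name__ == "__main__":
    # Made-up accessions, for illustration only
    for example in ["PRJEB12345", "ERZ000001", "rs123"]:
        print(example, accession_type(example))
```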

    • What variant accessions will be administered by the EVA?

      The EVA shall follow the same accessioning principles employed by dbSNP: non-human variants that are submitted to the EVA are issued 'Submitted SNP' (ss) accessions, and these are periodically clustered to form 'Reference SNP' (rs) accessions.

    • Will it be possible to retrieve existing dbSNP accessions from the EVA?

      Yes. The EVA is committed to the continuation of existing dbSNP 'submitted SNP' (ss) and 'reference SNP' (rs) accessions.

      The accessions of those dbSNP variants that satisfy the EVA submission requirements will be retrievable via the EVA variant browser and web services API. If you want to learn more about EVA submission requirements, please click here.

    • Why are some variants on a different strand from the one dbSNP reports?

      The EVA accepts variant submission on the forward strand only, whereas dbSNP is not so restrictive. In addition to this, dbSNP registers multiple orientations:

        • Contig to chromosome
        • SNP to contig
        • SubSNP (ss) to RefSNP (rs)

      As a result, variants with a 'reverse' orientation will be displayed differently despite the underlying data being equivalent.
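
To illustrate why the same variant can look different, the sketch below reverse-complements alleles reported on the reverse strand so that they read on the forward strand. It is a generic example of strand conversion, not the procedure the EVA or dbSNP applies when combining the orientations listed above; the function names and the "forward"/"reverse" labels are assumptions.

```python
# Translation table for complementing DNA bases
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(sequence):
    """Return the reverse complement of a DNA sequence."""
    return sequence.translate(_COMPLEMENT)[::-1]


def to_forward(alleles, orientation):
    """Express alleles on the forward strand.

    orientation: "forward" or "reverse" (hypothetical labels)
    """
    if orientation == "reverse":
        return [reverse_complement(allele) for allele in alleles]
    return list(alleles)


if __name__ == "__main__":
    # An A/G variant on the reverse strand is T/C on the forward strand
    print(to_forward(["A", "G"], "reverse"))
```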

    • Why can't I find some of the dbSNP accessions on the EVA website?

      We will gradually import the dbSNP data, and your species of interest may not yet have been processed. Please check our status report on the transition process.

      If a species has already been imported into the EVA, the most probable reason is that the dbSNP variant did not satisfy the EVA submission requirements. Please click here to learn more about the EVA submission requirements.

      If you know a variant satisfied all of them and it is not displayed in our browser, please communicate the issue via eva-helpdesk@ebi.ac.uk

      dbSNP variants that don't satisfy these requirements will still be searchable via an accession tracking system that will contain all the existing dbSNP accessions (even those of lower quality) plus the new ones to be issued from 2018 onwards. This system will track the full history of an accession (creation, merge, deprecation) as well as the IDs of the studies that reported it. The details of a study submission, reported genotypes and frequencies, etc, can then be queried using the existing EVA website or web services API.

    • Will there be EVA builds or releases like the dbSNP ones?

      Yes. The EVA will create RefSNP dumps in VCF format every 6 months, starting in Q2 2018. These dumps shall contain the most basic information about each RefSNP:

        • Genomic coordinates
        • Reference and alternate alleles in the forward strand
        • Identifiers of the studies that reported it

      Please note that coordinates are mandatory in VCF format, so RefSNPs without them won't be included in the dump.

      The details of a study submission, reported genotypes and frequencies, etc., can then be queried using the existing EVA website and web services API.