DGVa data


The data stored in the DGVa are organised according to the DGVa's data model and are centred around three types of object (also shown in Figure 2):

  • the study
  • the genomic region in which the variation occurs
  • the particular variation observed in an individual sample (call)
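The three-level structure above can be pictured as nested records: a study holds regions, and each region aggregates the calls from which it was asserted. The Python sketch below is illustrative only. The class and field names (`Study`, `Region`, `Call`, `assertion_method`, `all_calls` and so on) are hypothetical and are not the DGVa's own schema or submission format.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Call:
    """Variation observed in one sample (hypothetical fields)."""
    accession: str      # essv... (DGVa) or nssv... (dbVar)
    sample: str         # sample name
    method: str         # experimental procedure, e.g. "sequencing" or "array"
    variant_type: str   # e.g. "deletion", "insertion"
    chrom: str
    start: int
    end: int


@dataclass
class Region:
    """Genomic location asserted to carry structural variation."""
    accession: str      # esv... (DGVa) or nsv... (dbVar)
    chrom: str
    start: int
    end: int
    assertion_method: str                       # how sample calls were merged
    calls: List[Call] = field(default_factory=list)


@dataclass
class Study:
    """Container for all objects and information belonging to one study."""
    accession: str      # estd... (DGVa) or nstd... (dbVar)
    regions: List[Region] = field(default_factory=list)

    def all_calls(self) -> List[Call]:
        # Flatten the study -> region -> call hierarchy into one list.
        return [call for region in self.regions for call in region.calls]
```

Keeping calls inside their region mirrors the text: a region exists only because the authors asserted it from the calls it aggregates.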

As part of the curation process, the DGVa assigns a stable accession number to each of these objects. These accession numbers provide a permanent point of reference that can be cited in publications and used to search other resources (for example, to look up a structural variant region in Ensembl). The DGVa also stores descriptive information about each object, as outlined below.


'Study' is the container for all data objects and related information from a genomic structural variation study. Its accession number has the prefix estd (or nstd if the study has been curated by dbVar). Study-related information includes details of the study authors and their affiliations, the type of study and the publication that describes it.


'Region' denotes the genomic location at which structural variation is asserted to exist. Its accession number has the prefix esv (or nsv if the study has been curated by dbVar). The authors of a study assert the presence of a structural variant region on the basis of individual variation observed in samples. Region-related information includes the assertion method, which describes how the variation observed in samples has been merged to define the region (for example, sample calls that overlap by at least 80%).
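An assertion method such as "sample calls overlap by at least 80%" can be made concrete with an interval-overlap test. The sketch below assumes that "overlap by at least 80%" means *reciprocal* overlap (the shared span covers at least 80% of each call) and that coordinates are 1-based and inclusive; both are assumptions, and the greedy `cluster_calls` helper is a hypothetical illustration, not the procedure any particular study or the DGVa uses.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # (start, end), 1-based inclusive (assumed)


def overlap_length(a: Interval, b: Interval) -> int:
    """Number of positions shared by two inclusive intervals."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]) + 1)


def reciprocal_overlap(a: Interval, b: Interval, threshold: float = 0.8) -> bool:
    """True if the shared span covers at least `threshold` of BOTH intervals."""
    shared = overlap_length(a, b)
    len_a = a[1] - a[0] + 1
    len_b = b[1] - b[0] + 1
    return shared >= threshold * len_a and shared >= threshold * len_b


def cluster_calls(calls: List[Interval], threshold: float = 0.8) -> List[List[Interval]]:
    """Greedy pass over calls sorted by position: each call joins the first
    cluster whose seed (first) call it reciprocally overlaps; otherwise it
    starts a new cluster. Each cluster is a candidate region."""
    clusters: List[List[Interval]] = []
    for call in sorted(calls):
        for cluster in clusters:
            if reciprocal_overlap(cluster[0], call, threshold):
                cluster.append(call)
                break
        else:
            clusters.append([call])
    return clusters
```

Comparing only against a cluster's seed keeps the sketch short; a real pipeline might instead require every pair of calls to overlap, or use a different threshold per variant type.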


'Call' describes the individual variation observed in a sample. Its accession number has the prefix essv (or nssv if the study has been curated by dbVar). Call-related information includes the name of the sample, the experimental procedure that generated the call (e.g. sequencing or array), the type of variation (e.g. deletion or insertion) and its placement (location) in the sample's genome.
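Because each accession prefix encodes both the archive that curated the object (e for DGVa, n for dbVar) and the object type (std for study, sv for region, ssv for call), the object an accession refers to can be read directly from it. The following Python sketch assumes that the prefix is followed only by digits (for example `nstd37` or `essv5`); that numeric suffix format, and the names `parse_accession` and `Accession`, are assumptions for illustration.

```python
import re
from typing import NamedTuple

# Origin letter + object-type code + digits. The digits-only suffix is an
# assumption; "ssv" is listed before "sv" only for readability, since the
# regex engine backtracks through the alternatives either way.
ACCESSION_RE = re.compile(r"^(?P<origin>[en])(?P<kind>std|ssv|sv)(?P<number>\d+)$")

ORIGINS = {"e": "DGVa", "n": "dbVar"}
KINDS = {"std": "study", "sv": "region", "ssv": "call"}


class Accession(NamedTuple):
    archive: str       # "DGVa" or "dbVar"
    object_type: str   # "study", "region" or "call"
    number: int


def parse_accession(accession: str) -> Accession:
    """Split a study/region/call accession into its archive, type and number."""
    match = ACCESSION_RE.match(accession)
    if match is None:
        raise ValueError(f"not a recognised study/region/call accession: {accession!r}")
    return Accession(
        archive=ORIGINS[match["origin"]],
        object_type=KINDS[match["kind"]],
        number=int(match["number"]),
    )
```

For example, `parse_accession("nstd37")` yields an `Accession` with archive `"dbVar"` and object type `"study"`.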

Figure 2. The data model that links accessioned objects. The three types of accessioned objects are prefixed with e if processed by the DGVa and n if processed by dbVar. Variation in individual sample genomes is aggregated into a variant region with respect to a reference genome, by procedures described in the Assertion method. Genomic positions of variant calls (shown in green) do not necessarily overlap completely. Discovery and validation procedures are described in the Experiment attribute of each call. The study is the container for all information relating to the body of work and points to any external resources that provide access to raw data (such as the European Nucleotide Archive or ArrayExpress) or to publications describing the study and data (such as PubMed).
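Since the calls aggregated into a region need not share exactly the same coordinates, the region's boundaries depend on a convention. The caption does not state which one applies, so the Python sketch below simply contrasts two plausible choices under that assumption: the outer bounds (the smallest span containing every call) and the inner bounds (the span common to every call). Coordinates are assumed to be inclusive, and the function names are hypothetical.

```python
from typing import List, Optional, Tuple

Interval = Tuple[int, int]  # (start, end), inclusive; hypothetical representation


def outer_bounds(calls: List[Interval]) -> Interval:
    """Smallest interval that contains every call."""
    if not calls:
        raise ValueError("a region needs at least one call")
    return min(s for s, _ in calls), max(e for _, e in calls)


def inner_bounds(calls: List[Interval]) -> Optional[Interval]:
    """Interval shared by every call, or None if there is no common position."""
    if not calls:
        raise ValueError("a region needs at least one call")
    start = max(s for s, _ in calls)
    end = min(e for _, e in calls)
    return (start, end) if start <= end else None
```

For calls at (100, 199), (105, 204) and (90, 180), the outer bounds are (90, 204) and the inner bounds are (105, 180); the gap between the two reflects the partial overlap shown in green in the figure.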