CNV data requirements
This page describes the columns that should be present in your Copy Number Variant (CNV) summary statistics file for submission to the GWAS Catalog.
If you have any feedback about the data requirements listed below, please contact gwas-info@ebi.ac.uk.
Example table
| chromosome | base_pair_start | base_pair_end | neg_log10_p_value | beta | standard_error | statistical_model_type |
|---|---|---|---|---|---|---|
| 1 | 16600001 | 16605000 | 9.45 | 0.048 | 0.008 | additive |
| X | 86415001 | 86425000 | 13.661 | -0.035 | 0.003 | additive |
Field (column) structure
Column names must appear exactly as shown below. Any differences or typos in your column names will cause validation errors.
- GWAS Catalog submissions are expected to be full genome-wide datasets, not just top hits.
- CNV analyses should contain at least 10,000 pre-QC variants.
- It's OK if quality control steps have reduced the number of rows in your final dataset below the minimum row count, but please ensure you are submitting the full set of variants that were analysed in your study, including data which didn't meet GWAS significance.
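Because column names must match exactly, it can help to check a header for near-misses before submitting. The following is a minimal sketch, not part of the GWAS Catalog tooling: the function name `suggest_column_fixes` and the similarity cutoff are assumptions made for illustration. Extra study-specific columns are allowed, so the result is a list of suggestions rather than errors.

```python
import difflib

# Column names listed on this page.
KNOWN_COLUMNS = [
    "chromosome", "base_pair_start", "base_pair_end",
    "p_value", "neg_log10_p_value",
    "beta", "odds_ratio", "hazard_ratio", "z_score",
    "standard_error", "confidence_interval_lower", "confidence_interval_upper",
    "statistical_model_type", "n",
]


def suggest_column_fixes(header):
    """Map each unrecognised column name to the closest known name, if any.

    Hypothetical helper: the 0.8 cutoff is an arbitrary choice for this sketch.
    """
    suggestions = {}
    for name in header:
        if name in KNOWN_COLUMNS:
            continue  # exact match, nothing to suggest
        close = difflib.get_close_matches(name, KNOWN_COLUMNS, n=1, cutoff=0.8)
        if close:
            suggestions[name] = close[0]
    return suggestions


if __name__ == "__main__":
    print(suggest_column_fixes(["chromosome", "base_pair_strat", "base_pair_end"]))
```

A name that is not flagged is not necessarily valid; it may simply be treated as an extra field.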
Position
Required — all fields required
How is the position of the variant represented?
| Column | Description |
|---|---|
| chromosome | Chromosome where the variant is located |
| base_pair_start | Start position of the variant (0/1 based) |
| base_pair_end | End position of the variant (0/1 based) |
Statistical significance
Required — select one
How are p values stored in your file?
| Column | Description |
|---|---|
| p_value | p value. Smaller p values are more significant. |
| neg_log10_p_value | Negative log₁₀ of the p value. Larger values indicate greater significance. |
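The two representations carry the same information, so either can be derived from the other. The sketch below shows the conversion in both directions; the function names are assumptions chosen for this example and are not part of any GWAS Catalog tool. Note that very small p values may underflow to 0.0 when stored as floating-point numbers, which the negative log₁₀ form avoids.

```python
import math


def to_neg_log10(p_value: float) -> float:
    """Convert a p value in (0, 1] to its negative log10."""
    if not 0.0 < p_value <= 1.0:
        raise ValueError(f"p value must be in (0, 1], got {p_value!r}")
    return -math.log10(p_value)


def from_neg_log10(neg_log10_p: float) -> float:
    """Convert a negative log10 p value back to a p value.

    For large inputs the result may underflow to 0.0.
    """
    return 10.0 ** (-neg_log10_p)


if __name__ == "__main__":
    # 9.45 is the neg_log10_p_value from the first row of the example table.
    print(from_neg_log10(9.45))
    print(to_neg_log10(5e-8))
```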
Effect size
Required — at least one required
At least one effect size must be selected.
| Column | Description |
|---|---|
| beta | Regression coefficient. |
| odds_ratio | Odds ratio estimate. |
| hazard_ratio | Hazard ratio estimate. |
| z_score | Z-score statistic. |
Uncertainty estimate
Conditional
How have you measured the uncertainty of your effect size estimates?
Required when an effect size is provided.
| Column | Description |
|---|---|
| standard_error | Standard error. |
| confidence_interval_lower | Lower bound of the primary effect size confidence interval (typically odds ratio). |
| confidence_interval_upper | Upper bound of the primary effect size confidence interval (typically odds ratio). |
Statistical model information
Required — all fields required
Additional information about the statistical model used in the study.
| Column | Description |
|---|---|
| statistical_model_type | Genetic association model type (e.g. additive, dominant, recessive) |
Other fields
Optional
Please consider including this data to improve the quality of your submission.
| Column | Description |
|---|---|
| n | Sample size per variant. |
You can also include a reasonable number of extra fields relevant to your study in your submission.
Validation rules
These rules are enforced during validation. The same rules apply in both the web tool and the command line interface.
- Beta requires standard error: If beta is selected as an effect size, standard_error must also be included as an uncertainty estimate. Standard error may only be used with beta.
- Confidence intervals require odds ratio or hazard ratio: Confidence interval bounds are only valid when odds_ratio or hazard_ratio is selected as an effect size. If one bound is provided, both must be provided.
- Z-score does not accept uncertainty estimates: Z-score is a standardised statistic. When z_score is the sole or primary effect size, no uncertainty estimate (standard error or confidence interval) should be provided.
- Odds ratio and hazard ratio are mutually exclusive: Odds ratio and hazard ratio cannot both be provided in the same file.
- Primary effect size must be designated: If more than one effect size column is present, one must be designated as the primary measure of effect. The primary measure of effect is placed in a standardised column in the output file.
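The rules above can be sketched as a pre-submission check on a file's column names. This is an illustrative sketch only, not the validator used by the web tool or command line interface: the function name `check_effect_columns` and the `primary` argument (how the primary effect size is designated) are assumptions made for this example. Each rule is applied literally and independently.

```python
EFFECT_SIZES = {"beta", "odds_ratio", "hazard_ratio", "z_score"}
RATIO_EFFECTS = {"odds_ratio", "hazard_ratio"}
CI_BOUNDS = {"confidence_interval_lower", "confidence_interval_upper"}


def check_effect_columns(columns, primary=None):
    """Return a list of rule violations for the given column names.

    `primary` is a hypothetical argument naming the primary effect size.
    """
    cols = set(columns)
    effects = cols & EFFECT_SIZES
    ci_present = cols & CI_BOUNDS
    has_se = "standard_error" in cols
    errors = []

    # Beta requires standard error; standard error is only valid with beta.
    if "beta" in cols and not has_se:
        errors.append("beta requires standard_error")
    if has_se and "beta" not in cols:
        errors.append("standard_error may only be used with beta")

    # Confidence intervals need odds_ratio or hazard_ratio, and both bounds.
    if ci_present and not (effects & RATIO_EFFECTS):
        errors.append("confidence intervals require odds_ratio or hazard_ratio")
    if len(ci_present) == 1:
        errors.append("both confidence interval bounds must be provided")

    # Odds ratio and hazard ratio are mutually exclusive.
    if RATIO_EFFECTS <= cols:
        errors.append("odds_ratio and hazard_ratio cannot both be provided")

    # With several effect sizes, one must be designated as primary.
    if len(effects) > 1 and primary not in effects:
        errors.append("a primary effect size must be designated")

    # A sole or primary z_score takes no uncertainty estimate.
    z_is_main = effects == {"z_score"} or (
        primary == "z_score" and "z_score" in effects
    )
    if z_is_main and (has_se or ci_present):
        errors.append("z_score does not accept an uncertainty estimate")

    return errors


if __name__ == "__main__":
    print(check_effect_columns(["beta", "standard_error"]))
    print(check_effect_columns(["odds_ratio", "confidence_interval_lower"]))
```

An empty list means none of these rules was violated; it does not imply that the file passes full validation.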