This documentation provides detailed guidance on integrating GWAS Catalog data services into your applications (scientific pipelines, scripts, web applications, etc.). It's designed with developers in mind and includes plenty of examples. If you have any questions, please contact our helpdesk at gwas-info@ebi.ac.uk.
The GWAS Catalog API platform provides the tools you need to search for and identify variants linked to your disease of interest. Whether you're building a simple platform to ascertain disease-specific associations, searching across phenotypes for a specific variant, summarising associations within a population group, or identifying studies with full genome-wide summary statistics available for download, the API gives you the resources to do so.
This API provides access to the literature-curated top associations and metadata (the same data that is available
via the GWAS Catalog website; eligibility criteria are described elsewhere).
A second API enabling access to data from the
full genome-wide summary statistics collection is under development.
You can find a detailed reference manual containing endpoints, schemas, parameters, syntax, definitions, and
response codes here:
API Reference
The page includes a visual, interactive interface for exploring and testing the endpoints.
For more documentation about the extensive controlled vocabularies in use in the Catalog, including the API, we refer you to lists available for:
More documentation about population descriptors, including methods of assigning ancestry labels, is available here: https://www.ebi.ac.uk/gwas/population-descriptors
Here are some features of the API that it’s useful to know about before you start:
API responses are paginated to manage performance, scalability, and usability. The default
page size is 20 records. If a query returns more results than fit on one page, follow the next links in the response to retrieve the remaining pages.
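As an illustration of following next links, here is a minimal Python sketch. It assumes a HAL-style response in which records sit under an `_embedded` key and the next page URL sits at `_links.next.href`; these field names are assumptions, not taken from the text above, so please check them against the API Reference. The `fetch` argument is a hypothetical hook that lets you try the loop without network access.

```python
import json
from typing import Callable, Iterator, Optional
from urllib.request import urlopen


def fetch_json(url: str) -> dict:
    """Fetch one page and decode it as JSON (standard library only)."""
    with urlopen(url) as response:
        return json.load(response)


def iter_records(
    first_url: str,
    fetch: Callable[[str], dict] = fetch_json,
    max_pages: Optional[int] = None,
) -> Iterator[dict]:
    """Yield records page by page, following next links until none remain.

    ASSUMPTION: each page looks like
    {"_embedded": {<name>: [...]}, "_links": {"next": {"href": ...}}}.
    """
    url: Optional[str] = first_url
    pages_read = 0
    while url is not None and (max_pages is None or pages_read < max_pages):
        page = fetch(url)
        pages_read += 1
        # Yield every record from every list under the assumed "_embedded" key.
        for records in page.get("_embedded", {}).values():
            yield from records
        # A missing next link means this was the last page.
        url = page.get("_links", {}).get("next", {}).get("href")
```

Writing the loop as a generator means you can stop early (or cap it with `max_pages`) without requesting pages you do not need.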
Users of the API are limited to 15 queries per second to prevent server overload and ensure fair usage. If the limit is exceeded, the API will slow down subsequent calls.
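One way to stay under the limit is to space out requests on the client side. The sketch below is a simple helper, not part of the API; the class name `Throttle` and its parameters are hypothetical. It keeps consecutive calls at least 1/15 of a second apart.

```python
import time
from typing import Callable, Optional


class Throttle:
    """Keep consecutive calls at least 1/max_per_second seconds apart."""

    def __init__(
        self,
        max_per_second: float = 15.0,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.min_interval = 1.0 / max_per_second
        self._clock = clock
        self._sleep = sleep
        self._next_allowed: Optional[float] = None

    def wait(self) -> float:
        """Block until the next call is allowed; return the seconds slept."""
        now = self._clock()
        delay = 0.0
        if self._next_allowed is not None and now < self._next_allowed:
            delay = self._next_allowed - now
            self._sleep(delay)
            # Re-read the clock in case the sleep ran long.
            now = max(self._clock(), self._next_allowed)
        self._next_allowed = now + self.min_interval
        return delay
```

Call `throttle.wait()` immediately before each request. If several processes share the same limit, each one would need a lower `max_per_second`.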
When querying by efo_trait (ontology trait mapping), you have two options:
1. show_child_traits=false.
2. show_child_traits=true. This is the default setting on the GWAS Catalog trait pages.
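The two options can be expressed as a query string. The following sketch only builds the URL. The base URL and the `associations` path are assumptions made for illustration, so please confirm them in the API Reference; `efo_trait` and `show_child_traits` are the parameter names used above.

```python
from urllib.parse import urlencode

# ASSUMPTION: illustrative base URL and path; confirm them in the API Reference.
BASE_URL = "https://www.ebi.ac.uk/gwas/rest/api/v2"


def associations_by_trait_url(efo_trait: str, show_child_traits: bool) -> str:
    """Build an associations query URL for one trait name."""
    params = {
        "efo_trait": efo_trait,
        # The flag is written as lowercase true/false in the text above.
        "show_child_traits": "true" if show_child_traits else "false",
    }
    return f"{BASE_URL}/associations?{urlencode(params)}"


print(associations_by_trait_url("asthma", show_child_traits=False))
print(associations_by_trait_url("asthma", show_child_traits=True))
```

Setting the flag explicitly on every request keeps the behaviour visible in your code rather than relying on a default.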
If you prefer to identify a wider range of relevant traits first and then query for that list, the best place to start
is the efo_trait endpoint. A free-text search here returns every trait whose name includes the search term.
For example, searching for "COVID-19" could return "COVID-19", "COVID-19 symptoms measurement", "long COVID-19", "response to
COVID-19 vaccine", and "time to remission of COVID-19 symptoms". You can then query your desired endpoint using the
list of traits.
Note that trait examples are shown here with their efo_trait name for readability, but it’s often better to use
the efo_id (e.g. MONDO_0004979) for precision. There’s more information about how we annotate traits
in the ontology documentation.
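The list-based workflow described above can be sketched as follows: take the trait records you selected from a search, de-duplicate them by efo_id, and build one query per identifier. The record shape (a dict with an `efo_id` key), the base URL, and the use of `efo_id` as a query parameter on the `associations` path are assumptions for illustration; please check the API Reference for the actual names.

```python
from typing import Iterable, List
from urllib.parse import urlencode

# ASSUMPTION: illustrative base URL and endpoint name; confirm in the API Reference.
BASE_URL = "https://www.ebi.ac.uk/gwas/rest/api/v2"


def urls_for_trait_ids(traits: Iterable[dict], endpoint: str = "associations") -> List[str]:
    """Return one query URL per distinct efo_id, keeping first-seen order.

    ASSUMPTION: each trait record is a dict with an "efo_id" key.
    """
    seen = set()
    urls = []
    for trait in traits:
        efo_id = trait["efo_id"]
        if efo_id in seen:
            continue  # skip identifiers that were already queued
        seen.add(efo_id)
        query = urlencode({"efo_id": efo_id})
        urls.append(f"{BASE_URL}/{endpoint}?{query}")
    return urls


# Placeholder records standing in for a trait search result.
# MONDO_0004979 is the identifier quoted above; EXAMPLE_0000001 is made up.
selected = [
    {"efo_trait": "first trait", "efo_id": "MONDO_0004979"},
    {"efo_trait": "second trait", "efo_id": "EXAMPLE_0000001"},
    {"efo_trait": "first trait", "efo_id": "MONDO_0004979"},  # duplicate
]
for url in urls_for_trait_ids(selected):
    print(url)
```

Keying on the identifier rather than the trait name avoids sending the same query twice when two records share an ID.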
When querying associations or single-nucleotide-polymorphisms by gene, you have two options:
1. extended_geneset=false. This is the same annotation that’s shown in the GWAS Catalog web interface.
2. extended_geneset=true. This is the annotation that was used in the V1 API.
By default, the query uses option 1.
For hands-on, executable examples, please explore our collection of Jupyter Notebooks.
The notebooks work through scientific questions such as "Which variants are associated with a particular disease?", "Which studies of a trait have full summary statistics available?" and "Which SNP has the strongest effect size for a particular disease?", among many others. They are a great place to start if you are new to the REST API.