Using Beacon

Beacon is an API established by the Global Alliance for Genomics and Health (GA4GH) that defines a standard for federated discovery of sensitive genomic and phenotypic data. You can learn more through the introductory webinar “The Beacon: a data discovery solution in genomics and health”, and you can visit the CINECA project blog post for an overview of the GA4GH standards being developed and implemented by CINECA.

Beacon (version 2) aims to facilitate data findability and to support researchers with the secure and efficient reuse of translational genomic data. It is a tool that enables ‘discovery queries’ across cohorts, thus helping researchers to discover relevant samples, patient data, and cohorts of interest for their particular research questions. Beacon allows different querying scenarios where researchers can:

  • Perform genomic variation & region queries
  • List samples related to a phenotype (requires authentication or authorisation)
  • Filter by variables of interest (e.g. gender or age)
  • Consult clinical annotation about the variants found
  • Find relevant cohorts (e.g. a group of patients with a specific disease). Cohort search is a new feature developed in the context of CINECA; more information is available in the blog post “Beacon cohorts: A model for cohort discovery in CINECA and beyond”.

How to use Beacon

A Beacon user (or end-user) interested in querying Beacon instances and networks can do so either through the web interface or programmatically through the Beacon API. Programmatic queries are likely to be more relevant for bioinformaticians, while the user interface (UI) may be the preferred option for other researchers or clinical profiles.

Querying Beacon programmatically:

The Beacon API can be queried with a structured JSON body (‘POST’ request) or, more commonly, with a standard query URL with parameters (‘GET’ request). You can visit the REST API documentation to find more information and examples of Beacon structured queries.
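
As a minimal sketch of the two request styles, the Python below builds a GET URL and a corresponding POST body without sending anything over the network. The base URL is a placeholder, and the `g_variants` path, the parameter names and the body layout are assumptions based on the Beacon v2 specification; the exact value types (for example, whether `start` is a single integer or a list) should be checked against the REST API documentation of the instance you are querying.

```python
import json
from urllib.parse import urlencode

# Placeholder base URL: replace with the address of a real Beacon instance.
BASE_URL = "https://beacon.example.org/api"

# Illustrative variant parameters (names assumed from Beacon v2).
params = {
    "referenceName": "17",
    "start": 7577120,
    "referenceBases": "G",
    "alternateBases": "A",
}


def build_get_url(base_url, endpoint, parameters):
    """Return a query URL with the parameters encoded in the query string."""
    return f"{base_url}/{endpoint}?{urlencode(parameters)}"


def build_post_body(parameters, granularity="boolean"):
    """Return a JSON string for a structured POST query (assumed layout)."""
    body = {
        "meta": {"apiVersion": "2.0"},
        "query": {
            "requestParameters": parameters,
            "requestedGranularity": granularity,
        },
    }
    return json.dumps(body, indent=2)


get_url = build_get_url(BASE_URL, "g_variants", params)
post_body = build_post_body(params)
print(get_url)
print(post_body)
```

The same parameters feed both styles: the GET form is easy to share as a link, while the POST form leaves room for nested elements such as filters.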

One of Beacon’s main strengths is its support for genomic queries, which include:

  • Sequence, a query for the existence of a specific sequence at a given genomic position
  • Range, a query for any variant with at least partial overlap of the specified sequence range
  • GeneId, a variation of the range query in which the sequence range is replaced by the gene symbol
  • Bracket, a variation of the range query that requires sequence ranges for both the start and the end positions of a genomic variation
  • Others, such as: allele query, amino acid change query, parameter interpretation query and parameter change log query.
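
To make the difference between these query types more concrete, the sketch below lists one hypothetical parameter set per type and turns it into a GET query string. The parameter names (`referenceName`, `start`, `end`, `geneId`, `variantType`) and the comma-separated ranges used for the bracket query are assumptions based on the Beacon v2 specification, and the coordinates are illustrative values that have not been checked against any particular assembly.

```python
from urllib.parse import urlencode

# Hypothetical example values; parameter names are assumed from Beacon v2.
QUERY_EXAMPLES = {
    # Exact base change at a single position.
    "sequence": {
        "referenceName": "17",
        "start": "7577120",
        "referenceBases": "G",
        "alternateBases": "A",
    },
    # Any variant overlapping the start..end interval.
    "range": {"referenceName": "17", "start": "7572837", "end": "7578641"},
    # Same idea as a range query, but the interval is implied by the gene symbol.
    "geneId": {"geneId": "TP53"},
    # Start and end are each given as a min,max range.
    "bracket": {
        "referenceName": "17",
        "start": "5000000,7676592",
        "end": "7669607,10000000",
        "variantType": "DEL",
    },
}


def to_query_string(kind):
    """Encode the example parameters for one query type, keeping commas readable."""
    return urlencode(QUERY_EXAMPLES[kind], safe=",")


for kind in QUERY_EXAMPLES:
    print(f"{kind}: ?{to_query_string(kind)}")
```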

You can learn more about the Beacon Genomic queries in the Beacon documentation.

Additionally, different filters can be applied to Beacon queries. These filters can be considered rules for selecting specific information (or field values) from a given dataset. The filtering feature allows querying phenotypes, disease codes or technical parameters associated with genomic variants. Querying Beacon programmatically supports four different types of filters:

  • Ontology terms, standards for biomedical data or metadata
  • Custom terms, for biomedical data or metadata locally defined by a beacon
  • Numerical values
  • Alphanumeric values
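
The four filter types can be sketched as small JSON objects placed in the `filters` list of a query. The object layout (`id`, `operator`, `value`) is an assumption based on the Beacon v2 specification, the field names are borrowed from the Training UI examples, and the custom term `local:batch-7` is hypothetical.

```python
import json

# Illustrative filters; "local:batch-7" is a hypothetical custom term.
filters = [
    {"id": "NCIT:C16352"},                                            # ontology term
    {"id": "local:batch-7"},                                          # custom term defined by one beacon
    {"id": "Weight", "operator": ">", "value": 50},                   # numerical value
    {"id": "geographicOrigin", "operator": "=", "value": "England"},  # alphanumeric value
]


def filter_kind(item):
    """Classify a filter by its shape: terms carry only an id, values carry an operator."""
    if "operator" not in item:
        return "term"
    if isinstance(item["value"], (int, float)):
        return "numerical"
    return "alphanumeric"


# Assumed position of the filters inside a structured query body.
query = {"query": {"filters": filters}}
kinds = [filter_kind(item) for item in filters]
print(json.dumps(query, indent=2))
print(kinds)
```

Note that ontology and custom terms look the same on the wire in this sketch; what distinguishes them is whether the identifier comes from a shared ontology or is defined locally by the beacon.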

In the Beacon Documentation section you will find more information on how to use filters in queries.

Querying Beacon using the user interface:

Beacon can also be queried through a user interface. To learn how to use this interface, you can access the Training UI and watch this demo video.

Using the query examples on the main page, we can test the different kinds of queries. First, we select the type of query we are interested in: variant, region, phenoclinic or cohort. Next, we can use one of the proposed examples or type our own query in the search bar.

NOTE: The final interface is under development. 

Exercise: Let’s try to query Beacon using the Training UI:

  1. Access the Training UI, go to the Phenoclinic section and select the Individuals collection from the drop-down box. 
    1. Note the ‘Log in’ option at the top of the page. In a real-world scenario, you would need to log in to get authorised access to the data. As this is a Training UI, the login is a dummy one.
  2. Check all the provided query examples. Click on the one that says “ethnicity=NCIT:C16352,  geographicOrigin=England, Weight>50, Height-standing>150”. Here we are searching for all the individuals whose ethnicity matches NCIT:C16352 (in this case we used the ontology term, but we could also use the label, as we will see below).
  3. Submit the search and select the granularity of the response that you prefer (boolean, count or record). To see the full response (record), click on the login button at the top right of the page.

Now we want to do the same query but instead of using the ontology term of ethnicity we want to use the label.

  1. Click on “Filtering terms” and look for the filtering term “NCIT:C16352”. You should find the matching label: “Black or Black British”.
  2. Now we are going to repeat the query with that parameter. Type the following in the search bar:
    1. ethnicity=Black or Black British
    2. geographicOrigin=England
    3. Weight>50
    4. Height-standing>150
  3. You should get the same results as before.
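
The search expressions used in this exercise can be read as a list of field, operator and value triples. The sketch below splits such an expression in Python to show that the two queries differ only in the ethnicity value. It is a hypothetical helper written for illustration, not the Training UI's actual parser, and it assumes that conditions are separated by commas and that values contain no commas.

```python
import re

# Two-character operators come first so ">=" is not read as ">" followed by "=".
_CONDITION = re.compile(r"^\s*(.+?)\s*(>=|<=|!=|=|>|<)\s*(.+?)\s*$")


def parse_expression(text):
    """Split 'field<op>value, field<op>value, ...' into (field, operator, value) tuples."""
    conditions = []
    for chunk in text.split(","):
        match = _CONDITION.match(chunk)
        if match is None:
            raise ValueError(f"Cannot parse condition: {chunk!r}")
        conditions.append(match.groups())
    return conditions


by_term = parse_expression(
    "ethnicity=NCIT:C16352,  geographicOrigin=England, Weight>50, Height-standing>150"
)
by_label = parse_expression(
    "ethnicity=Black or Black British, geographicOrigin=England, Weight>50, Height-standing>150"
)

for condition in by_term:
    print(condition)

# Only the first condition (ethnicity) differs between the two queries.
print(by_term[1:] == by_label[1:])
```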

Interpreting Beacon responses

Beacon provides three types of response:

  • Boolean, returns ‘true/false’ responses
  • Counts, specifies the total number of positive results found, only returns aggregated information
  • Detailed responses, returns details for every document

Some response types are only available with the appropriate access permissions, so you might need to log in to see complete results.
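
As a rough illustration of how the three response types differ, the sketch below summarises hand-written example responses. The field names (`meta.returnedGranularity`, `responseSummary.exists`, `responseSummary.numTotalResults`, `response.resultSets`) are assumptions based on the Beacon v2 specification, and the example dictionaries are made up for this sketch rather than taken from a real beacon.

```python
def summarise(response):
    """Return a one-line summary according to the granularity reported by the beacon."""
    granularity = response.get("meta", {}).get("returnedGranularity", "boolean")
    summary = response.get("responseSummary", {})
    if granularity == "record":
        result_sets = response.get("response", {}).get("resultSets", [])
        total = sum(len(rs.get("results", [])) for rs in result_sets)
        return f"record: {total} detailed result(s)"
    if granularity == "count":
        return f"count: {summary.get('numTotalResults', 0)} match(es)"
    return f"boolean: exists={summary.get('exists', False)}"


# Made-up responses, one per granularity.
boolean_example = {
    "meta": {"returnedGranularity": "boolean"},
    "responseSummary": {"exists": True},
}
count_example = {
    "meta": {"returnedGranularity": "count"},
    "responseSummary": {"exists": True, "numTotalResults": 12},
}
record_example = {
    "meta": {"returnedGranularity": "record"},
    "responseSummary": {"exists": True, "numTotalResults": 2},
    "response": {"resultSets": [{"id": "example-dataset", "results": [{"id": "ind-1"}, {"id": "ind-2"}]}]},
}

for example in (boolean_example, count_example, record_example):
    print(summarise(example))
```

A beacon may return a coarser granularity than the one requested when the user is not authorised, which is why the sketch reads the granularity from the response rather than from the request.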

For more information about responses, see our Beacon flavours.