Chart Builder documentation

Summary

  1. Introduction
  2. Fields
    1. Archive
    2. Chart type
    3. Global filter
    4. Data X
      1. Year
      2. Experimental metadata
      3. Custom queries
    5. Data Y
    6. Data series
  3. Attributes

Introduction

The EMDB Chart Builder (https://emdb-empiar.org/statistics/builder) is a web-based application, powered by the EMDB search engine, that lets users create customisable, dynamic charts for analysing the holdings and trends of the EMDB and EMPIAR archives. The underlying data is based on the metadata of all EMDB entries, enriched by the EMICSS resource (https://emdb-empiar.org/emicss). Users define specific search terms to extract the data subsets relevant to their analysis, then organise the extracted data into meaningful categories to generate charts that visualise the results. The Chart Builder can be accessed at https://ebi.ac.uk/emdb/statistics/builder. This page presents a user-friendly interface designed to facilitate the creation of customised charts using both EMDB and EMPIAR data.

  • Multiple data series definition
  • Plot charts from categorical variables
  • Multiple stack modes for the charts
  • Display the data in linear or logarithmic scales

Using the Chart Builder requires some knowledge of the EMDB search engine, so you may find it useful to read the search system documentation and the list of search fields. These resources describe the search functionality and parameters that can be used to refine your data extraction.

The Chart Builder can also open and customise most of the charts available at /emdb/statistics. If a chart shows an edit icon, click it to open that chart in the Chart Builder, where you can modify it to suit your requirements.

Figure 1: Edit button available in EMstats plots.

Fields

Archive

Select whether your chart will use EMDB or EMPIAR data. (Note: the amount of available metadata is currently much greater for EMDB than for EMPIAR.)

Chart type

The Chart Builder currently supports the following chart types:

  • Line: This chart type is best suited for visualising trends and changes over time. It can be used to illustrate how properties of one or more subsets of entries evolve over time.
  • Column: The column chart is ideal for comparing different categories or groups.
  • Bar: The bar chart is ideal for comparing different categories or groups. It provides a visual representation of the distribution and relative magnitudes of specific entries within each category.
    Figure 3: Bar chart example.
  • Pie: The pie chart is a useful tool for illustrating proportions and percentages. It provides a concise visual representation of the distribution of a selected subset of entries among various categories. Pie charts can only have one data series, and the data categories are defined by the Data X parameter.
    Figure 4: Pie chart example.
  • Area: Similar to a line chart, the area chart emphasises the cumulative size of subsets of entries over time.

Global filter

The optional global filter lets you specify a search term that is applied to the selected entries for the entire chart, so that the resulting plot focuses on the data subset defined by that term. For instance, to plot the number of SARS-CoV-2 entries per EM method, you can take one of the previous examples and set the following query in the global filter: `natural_source_ncbi_code:"2697049"`. This query narrows the data down to entries associated with the SARS-CoV-2 NCBI code.
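As a rough illustration of how a global filter narrows every data series, the sketch below joins the filter with an optional per-series query. It assumes Lucene-style syntax in which clauses are combined with AND. The helper name `combine_queries` is hypothetical, and the `structure_determination_method` field and its value are assumptions that should be checked against the list of search fields; only the `natural_source_ncbi_code` query comes from the text above.

```python
def combine_queries(global_filter: str, series_query: str = "") -> str:
    """Join a global filter and a per-series query with AND (assumed semantics).

    Empty or whitespace-only parts are skipped; an empty result means "no filter".
    """
    parts = [q.strip() for q in (global_filter, series_query) if q and q.strip()]
    return " AND ".join(f"({part})" for part in parts)


# Query taken from the documentation text.
SARS_COV_2 = 'natural_source_ncbi_code:"2697049"'

# Assumed field name and value, used here only for illustration.
SINGLE_PARTICLE = 'structure_determination_method:"singleParticle"'

filter_only = combine_queries(SARS_COV_2)
filter_and_series = combine_queries(SARS_COV_2, SINGLE_PARTICLE)
print(filter_and_series)
```

Wrapping each clause in parentheses keeps the two parts independent, so an OR inside a series query cannot escape the global filter.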

Data X

The Data X field determines the data that will be displayed on the X-axis or as categories in a pie chart. Within this field, there are three types of data X definitions:

Year:

The X-axis values in the chart correspond to the release year of the entries. By modifying the Data X settings, you can refine the previously created SARS-CoV-2 chart to display only data from 2019 onwards.

Experimental metadata:

When you select "Experimental metadata" for the Data X field, you can choose one of the enumeration categories. The elements of the chosen category are displayed on the X-axis of the chart. For instance, you can set the experimental metadata category to EM method to show the number of SARS-CoV-2 entries for each type of experiment.

Custom queries:

You can define custom queries to specify the set of criteria displayed on the X-axis of the chart, which lets you tailor the chart further to your requirements. For example, you can modify the previous chart to include only the columns for single-particle, subtomogram averaging, and tomography by using custom queries.

Custom queries are particularly important when creating pie charts, since a pie chart can only have a single data series and its slices are defined by the elements in the custom queries. For example, the previous chart can be displayed as a pie chart, giving a concise visual representation of how the selected subset of entries is distributed among the experiment methods.
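The relationship between custom queries and pie slices can be sketched with a few lines of Python. This is only a toy model under stated assumptions: the entries are invented, each custom query is reduced to a single expected `method` value, and the names `custom_queries` and `pie_slices` are hypothetical. In the real tool the matching is done by the EMDB search engine.

```python
# Invented toy entries; real data would come from the EMDB search engine.
entries = (
    [{"method": "singleParticle"}] * 5
    + [{"method": "subtomogramAveraging"}] * 2
    + [{"method": "tomography"}] * 1
    + [{"method": "helical"}] * 2  # matches no custom query, so it gets no slice
)

# Each label stands for one custom query; here a query is simplified to
# "the method field equals this value" (an assumption for illustration).
custom_queries = {
    "Single-particle": "singleParticle",
    "Subtomogram averaging": "subtomogramAveraging",
    "Tomography": "tomography",
}


def pie_slices(entries, custom_queries):
    """Return {label: (count, percent)} with one slice per custom query."""
    counts = {
        label: sum(1 for entry in entries if entry["method"] == value)
        for label, value in custom_queries.items()
    }
    total = sum(counts.values())
    return {
        label: (count, 100.0 * count / total if total else 0.0)
        for label, count in counts.items()
    }


slices = pie_slices(entries, custom_queries)
for label, (count, percent) in slices.items():
    print(f"{label}: {count} entries ({percent:.1f}%)")
```

Entries that match none of the custom queries are left out of the total, which is why the percentages are relative to the selected slices rather than to the whole archive.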

Data Y

The Data Y field sets the information displayed along the Y-axis of your chart. For example, you can display the number of entries, the number of publications, or the resolution, depending on the nature of your analysis.

Data series

This field defines the data series that will be displayed in your chart. A data series consists of three pieces of information: operator, query, and label. The available operators depend on the selection made in the Data Y field. If Data Y is a numeric variable, the operator can be average, minimum, maximum, or sum. On the other hand, if Data Y is a categorical variable, the operator can be the count of unique values or cumulative values. Additionally, you can apply an optional filter query for each data series. This is useful when comparing multiple classes of information, as demonstrated in the example below.
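Independently of the interface, the operators can be pictured as simple per-year aggregations. The sketch below is a minimal model under stated assumptions: the records are invented, the X-axis is the release year, the numeric variable stands in for resolution, and the function names are hypothetical. It is not the Chart Builder's own implementation.

```python
from collections import defaultdict
from itertools import accumulate
from statistics import mean

# Invented (release year, entry identifier, resolution) records for illustration.
records = [
    (2019, "a", 3.0), (2019, "b", 4.0),
    (2020, "c", 2.5), (2020, "d", 3.5), (2020, "e", 3.0),
    (2021, "f", 2.0),
]

# Operators available when Data Y is a numeric variable.
NUMERIC_OPERATORS = {"average": mean, "minimum": min, "maximum": max, "sum": sum}


def aggregate_numeric(records, operator):
    """Apply a numeric operator to the values of each year."""
    grouped = defaultdict(list)
    for year, _entry_id, value in records:
        grouped[year].append(value)
    func = NUMERIC_OPERATORS[operator]
    return {year: func(grouped[year]) for year in sorted(grouped)}


def count_unique(records, cumulative=False):
    """Count unique identifiers per year, optionally as a running total."""
    grouped = defaultdict(set)
    for year, entry_id, _value in records:
        grouped[year].add(entry_id)
    years = sorted(grouped)
    counts = [len(grouped[year]) for year in years]
    if cumulative:
        counts = list(accumulate(counts))
    return dict(zip(years, counts))


print(aggregate_numeric(records, "average"))
print(count_unique(records, cumulative=True))
```

A per-series filter query would simply restrict `records` before either function is called, which is how several series can be compared on the same axes.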

Attributes

There are seven attributes that can be used to print, download, export and change the chart visualisation:

  1. Change how area and column charts are displayed. There are three options: unstacked, stacked and stacked percent.
  2. Select to display the data series in a logarithmic or linear scale.
  3. Share the current chart. The resulting URL will be copied into the clipboard.
  4. Print the current chart.
  5. Download the chart image or data table.
  6. Full Screen mode.
  7. Extra options.
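To make the three stack modes in item 1 concrete, the sketch below computes where each series segment would start and end on the Y-axis at every X position. It is a simplified model with invented values; the function name `segment_spans` and the mode strings are assumptions chosen to mirror the option names above, not identifiers from the tool.

```python
def segment_spans(series, mode="unstacked"):
    """Return {label: [(bottom, top), ...]} for each X position.

    unstacked:       every segment starts at zero
    stacked:         segments are piled on top of each other
    stacked percent: values are rescaled so each column totals 100
    """
    if mode not in ("unstacked", "stacked", "stacked percent"):
        raise ValueError(f"unknown stack mode: {mode}")
    labels = list(series)
    n_points = len(series[labels[0]]) if labels else 0
    spans = {label: [] for label in labels}
    for i in range(n_points):
        column = [float(series[label][i]) for label in labels]
        if mode == "stacked percent":
            total = sum(column)
            column = [100.0 * v / total if total else 0.0 for v in column]
        base = 0.0
        for label, value in zip(labels, column):
            if mode == "unstacked":
                spans[label].append((0.0, value))
            else:
                spans[label].append((base, base + value))
                base += value
    return spans


# Invented values for two series at two X positions.
example = {"A": [1, 3], "B": [3, 1]}
for mode in ("unstacked", "stacked", "stacked percent"):
    print(mode, segment_spans(example, mode))
```

In stacked percent mode the absolute counts are lost and only the relative share of each series remains, which is useful when the series differ greatly in size.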