Exercise 7: explore null values
Run the query below and answer the question: how many fields will the generated dataset contain, and why?
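from impc_api import solr_request  # provided by the impc-api package

# Request three documents in which mp_term_name has no value.
num_found, df = solr_request(
    core='statistical-result',
    params={
        'q': 'NOT mp_term_name:[* TO *]',
        'fl': 'marker_symbol,effect_size,p_value,mp_term_name',
        'rows': 3
    }
)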
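To see roughly what the parameters above look like as a plain Solr request, the sketch below assembles a select URL using only the Python standard library. The helper `build_select_url` is hypothetical and is not part of the impc-api package, and the base URL is an assumption that should be checked against the IMPC documentation. It is not a description of how `solr_request` works internally.

```python
from urllib.parse import urlencode, parse_qs, urlsplit

# Assumed base URL of the IMPC Solr service; verify before relying on it.
BASE_URL = "https://www.ebi.ac.uk/mi/impc/solr"


def build_select_url(core, params, base_url=BASE_URL):
    """Hypothetical helper: build a Solr select URL for a core and its parameters."""
    # urlencode percent-encodes special characters such as ':' and '['.
    return f"{base_url}/{core}/select?{urlencode(params)}"


url = build_select_url(
    core="statistical-result",
    params={
        "q": "NOT mp_term_name:[* TO *]",
        "fl": "marker_symbol,effect_size,p_value,mp_term_name",
        "rows": 3,
    },
)
print(url)

# Decoding the query string recovers the original parameter values.
decoded = parse_qs(urlsplit(url).query)
print(decoded["q"][0])
```

Printing the URL makes it easy to paste the same request into a browser and compare it with the DataFrame returned by `solr_request`.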
Go to Exercise 7 in the Google Colab. Once you have finished it, return here to continue the tutorial.
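Correct answer:
The range mp_term_name:[* TO *] matches every document in which mp_term_name has a value, and NOT inverts that match, so the query returns only documents where mp_term_name is null. Solr leaves fields with no value out of the documents it returns. Therefore, although fl requests four fields, the generated dataset will contain only three: marker_symbol, effect_size, and p_value.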
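The missing field can be illustrated without contacting the server. The sketch below assumes that a Solr JSON response simply omits fields that have no value, and uses made-up placeholder documents (not real IMPC data) to show which of the requested fields end up in the result. The function `returned_fields` is a hypothetical helper written for this illustration.

```python
# Placeholder documents shaped like a Solr JSON response.
# The values are invented for illustration and are NOT real IMPC data.
# mp_term_name is absent from each document because it has no value.
docs = [
    {"marker_symbol": "GeneA", "effect_size": 0.1, "p_value": 0.5},
    {"marker_symbol": "GeneB", "effect_size": 0.2, "p_value": 0.4},
    {"marker_symbol": "GeneC", "effect_size": 0.3, "p_value": 0.3},
]

# The same field list as the fl parameter in the exercise.
requested = "marker_symbol,effect_size,p_value,mp_term_name".split(",")


def returned_fields(docs, requested):
    """Return the requested fields present in at least one document, in request order."""
    present = set()
    for doc in docs:
        present.update(doc)  # add this document's field names
    return [field for field in requested if field in present]


fields = returned_fields(docs, requested)
print(fields)
print(len(fields))
```

A field that is requested in fl but empty in every returned document never appears as a key, so it cannot become a column when the documents are loaded into a table.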
Once you have finished Exercise 7, continue to the next section, where we will learn how to download large datasets efficiently.