How to complete the exercises: Google Colab or local installation
To complete the exercises during this tutorial, you have two options: use Google Colab or install Jupyter Notebook locally.
1. Google Colab
Google Colab, or “Colaboratory,” allows you to write and execute Python code directly in your browser without any local setup. We recommend using this option. You will find the link for each exercise on the corresponding page. If you’ve never worked with Google Colab before, we recommend reading this page.
2. Local Installation
If you prefer, you can also install the necessary libraries on your local machine. Follow these steps:
- Follow the installation instructions provided in the repository to install the impc_api Python package and Jupyter Notebook.
- Download the Jupyter Notebook with the exercises to your local machine. You can use the wget or curl commands with the provided link, or clone the repository using:
  git clone https://github.com/mpi2/impc-data-api-workshop
- Run the following command to start a Jupyter Notebook session:
  jupyter notebook
- Open the .ipynb file containing the exercises and apply your knowledge from the tutorial to complete them.
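Before opening the exercises, it can help to confirm that the local installation worked. The snippet below is a minimal sketch, not part of the official workshop material: it uses only the Python standard library, and it assumes the import names are `impc_api` (for the IMPC package) and `notebook` (for Jupyter Notebook). The helper `is_installed` is a hypothetical name introduced for illustration.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be found in the current environment."""
    return importlib.util.find_spec(module_name) is not None


# Assumed import names: "impc_api" for the IMPC package, "notebook" for Jupyter Notebook.
for name in ("impc_api", "notebook"):
    status = "found" if is_installed(name) else "NOT found"
    print(f"{name}: {status}")
```

If either module is reported as not found, revisit the installation instructions in the repository before starting the Jupyter Notebook session.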
Continue to the next section, which introduces how to access IMPC data programmatically.