How to download large datasets efficiently: pagination
What if you would like to request all the data from one core? Can the solr_request function be used for this? No: solr_request returns only a fixed number of documents (10 by default) rather than the whole core.
Setting the rows parameter equal to the size of the core is not an option, as such a large single request could overwhelm the system. If the internet connection is unstable, the request could fail partway through, and you would need to start the download again from the beginning.
Instead, use the batch_solr_request function to download large amounts of data. It retrieves the results in several chunks (pagination), which helps avoid the problems mentioned above. Alternatively, you can use non-programmatic methods to access IMPC data. Here are the instructions.
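To make the idea of pagination concrete, here is a minimal sketch in plain Python. It is not the implementation of batch_solr_request; the names paginate, fetch_page, fake_core and fake_fetch are hypothetical and exist only for this illustration. The sketch assumes that each request asks for one chunk by passing an offset and a chunk size, which in Solr correspond to the start and rows parameters.

```python
from typing import Callable, Dict, Iterator, List


def paginate(
    fetch_page: Callable[[int, int], List[Dict]],
    total: int,
    batch_size: int = 5000,
) -> Iterator[List[Dict]]:
    """Yield documents in chunks of at most `batch_size`.

    `fetch_page(start, rows)` is a hypothetical callable that returns the
    documents from offset `start` up to `start + rows`. In a real setting
    it would send a request with the Solr `start` and `rows` parameters.
    """
    for start in range(0, total, batch_size):
        yield fetch_page(start, batch_size)


# Stand-in for a real Solr core (assumption): 12 tiny in-memory documents.
fake_core = [{"id": i} for i in range(12)]


def fake_fetch(start: int, rows: int) -> List[Dict]:
    # Mimics a paged response by slicing the in-memory list.
    return fake_core[start:start + rows]


# Download everything in chunks of 5 documents instead of one big request.
chunks = list(paginate(fake_fetch, total=len(fake_core), batch_size=5))
docs = [doc for chunk in chunks for doc in chunk]

print([len(chunk) for chunk in chunks])
print(len(docs))
```

Because each chunk is a separate, small request, a failed request only affects that chunk, and the server never has to return the whole core at once.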
Let’s move to the next section to learn more about the batch_solr_request function.