How to request a limited number of documents: rows parameter
The rows parameter specifies the maximum number of documents from the complete result set that Solr should return at one time.
The default value is 10, so a request with *:* in the q parameter and no rows value returns only the first 10 documents, not the whole core.
I know what you’re thinking! If the whole core contains 100,000 documents, you could simply put that number in the rows parameter. Yes, it is possible, but such a request is inefficient, and results may be lost if the connection is unstable. Instead, please use batch_solr_request to download the data.
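To see why batching helps, here is a minimal sketch of how a large result set can be split into smaller requests using Solr's start and rows parameters. The helper batch_windows is hypothetical and written only for illustration; it is not part of the IMPC module and does not claim to show how batch_solr_request works internally.

```python
def batch_windows(total, batch_size):
    """Yield (start, rows) pairs that together cover `total` documents.

    Hypothetical helper for illustration only.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, total, batch_size):
        # The last window may be smaller than batch_size.
        yield start, min(batch_size, total - start)


# Split 100,000 documents into requests of at most 25,000 documents each.
windows = list(batch_windows(total=100_000, batch_size=25_000))
print(windows)
# [(0, 25000), (25000, 25000), (50000, 25000), (75000, 25000)]
```

Each pair could be sent as a separate request, so an interrupted connection would cost one batch rather than the whole download.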
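from impc_api import solr_request

# Request at most three documents from the statistical-result core.
num_found, df = solr_request(
    core='statistical-result',
    params={
        'q': '*:*',   # match every document in the core
        'rows': 3     # return no more than three of them
    }
)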
In the example above, three documents from the statistical-result core were requested.
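For comparison, the sketch below shows roughly how the same parameters would look as a plain Solr URL that you could paste into a browser. The base address in BASE_URL is an assumption and may differ from the one used in this course, and build_select_url is a hypothetical helper, not a function of the IMPC module.

```python
from urllib.parse import urlencode

# Assumed base address of the IMPC Solr service; check the course
# material for the address that is currently in use.
BASE_URL = "https://www.ebi.ac.uk/mi/impc/solr"


def build_select_url(core, params):
    """Build a Solr select URL for a core (hypothetical helper)."""
    # urlencode percent-encodes special characters such as * and :
    return f"{BASE_URL}/{core}/select?{urlencode(params)}"


url = build_select_url("statistical-result", {"q": "*:*", "rows": 3})
print(url)
```

The rows value appears in the URL as an ordinary query-string parameter next to q.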