Group and count: faceting query
In Solr, faceting groups search results into categories based on specified criteria, such as the values of a field, and counts the documents in each category. This is useful, for example, when you need to see every value of a field and how often it occurs, so that you can filter out unnecessary values and make later queries more efficient.
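Conceptually, faceting on a field resembles a group-and-count over that field. The sketch below illustrates the idea locally with a handful of hypothetical documents. The gene names and the `facet_counts` helper are made up for illustration, and Solr performs the real computation on the server rather than in Python.

```python
from collections import Counter

# Hypothetical documents, invented for illustration only
docs = [
    {"marker_symbol": "GeneA", "zygosity": "homozygote"},
    {"marker_symbol": "GeneB", "zygosity": "heterozygote"},
    {"marker_symbol": "GeneC", "zygosity": "homozygote"},
    {"marker_symbol": "GeneD", "zygosity": "hemizygote"},
]


def facet_counts(documents, field):
    """Count how many documents carry each value of `field`."""
    return Counter(doc[field] for doc in documents if field in doc)


counts = facet_counts(docs, "zygosity")

# List each value with its count, most frequent first
for value, count in counts.most_common():
    print(value, count)
```

A facet request returns this kind of value-and-count summary without you having to download the documents themselves.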
To perform a facet request, use the solr_request function with the required Solr parameters: facet and facet.field.
- facet: enables facet counts in the query response.
- facet.field: specifies the field to run faceting on.
- facet.limit: controls how many constraints (distinct values) are returned for each facet. The default is 100; set it to 15, for example, if you need a smaller set. If you need the full set of values, make sure the limit is high enough to cover them all (in Solr, a negative value such as -1 removes the limit).
- facet.mincount: specifies the minimum count a facet value needs to be included in the response. We recommend setting it to 1 to exclude values with zero counts.
More faceting query parameters are in the official Solr documentation.
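    # Import from the impc-api package introduced earlier in the course
    from impc_api import solr_request

    num_found, df = solr_request(
        core="statistical-result",
        params={
            "q": "*:*",                 # match all documents
            "rows": 0,                  # only the counts are needed here
            "facet": "on",              # enable faceting
            "facet.field": "zygosity",  # field to group and count by
            "facet.limit": 15,          # return at most 15 values
            "facet.mincount": 1,        # skip values with zero counts
        }
    )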
In this example, we run a faceted search on the statistical-result core, targeting the zygosity field to count how many documents fall into each zygosity category.
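If you query Solr directly rather than through solr_request, the same parameters become an ordinary URL query string. In Solr's default JSON output, the counts for a facet field arrive as a flat list that alternates values and counts. The sketch below shows both steps under stated assumptions: the response fragment and its counts are hypothetical, the `parse_flat_facets` helper is not part of the impc-api package, and solr_request may already handle this conversion for you.

```python
from urllib.parse import urlencode

# The same parameters as in the example above
params = {
    "q": "*:*",
    "rows": 0,
    "facet": "on",
    "facet.field": "zygosity",
    "facet.limit": 15,
    "facet.mincount": 1,
}

# Query string to append after "select?" on a Solr core URL
query_string = urlencode(params)
print(query_string)


def parse_flat_facets(flat):
    """Turn [value, count, value, count, ...] into {value: count}."""
    return dict(zip(flat[0::2], flat[1::2]))


# Hypothetical fragment of a raw Solr JSON response; the counts are made up
raw_response = {
    "facet_counts": {
        "facet_fields": {
            "zygosity": ["homozygote", 3, "heterozygote", 2, "hemizygote", 1]
        }
    }
}

flat = raw_response["facet_counts"]["facet_fields"]["zygosity"]
zygosity_counts = parse_flat_facets(flat)
print(zygosity_counts)
```

Pairing even and odd positions with `zip` keeps each value next to its own count, which is easier to filter or sort than the flat list.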