Iterating over entities: genes, models, and other fields
The batch_solr_request function lets you search for multiple values of the same field in a single call. Pass the list of values (entities) to the field_list parameter, and give the name of the field to iterate over in the field_type parameter.
You can iterate over genes, models, or any other field in the core. In the example below, we iterate over genes.
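```python
# Requires the impc-api package introduced earlier in the course.
from impc_api import batch_solr_request

# Store the genes of interest in a Python list.
genes = ['Prkdc', 'Xrcc5', 'Xrcc4', 'Wrn']

# Request results for every gene in the list, matching each one against marker_symbol.
df = batch_solr_request(
    core='genotype-phenotype',
    params={
        'q': '*:*',
        'fl': 'marker_symbol,mp_term_name,p_value',
        'field_list': genes,
        'field_type': 'marker_symbol'
    },
    download=False
)
```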
Note: The square brackets create a list, a built-in Python data type that holds several values. The list is stored in the genes variable, which is then passed as the value of the field_list parameter.
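To connect field iteration with the Boolean operators covered earlier in the course, the sketch below builds a single Solr query that matches documents whose field equals any value in a Python list. It is an illustration under our own assumptions, not a description of how batch_solr_request works internally, and the helper name build_or_query is hypothetical. It also drops blank and duplicate entries from the list and quotes each value so that multi-word terms, such as phenotype names, match as phrases.

```python
from typing import Iterable


# Hypothetical helper for illustration only; it is not part of the impc-api package.
def build_or_query(field: str, values: Iterable[str]) -> str:
    """Build a Solr query matching documents whose `field` equals any of `values`.

    Blank entries and duplicates are dropped (first occurrence wins), and each
    value is wrapped in double quotes so multi-word terms match as phrases.
    """
    unique_values = []
    for value in values:
        cleaned = value.strip()
        if cleaned and cleaned not in unique_values:
            unique_values.append(cleaned)
    if not unique_values:
        raise ValueError("values must contain at least one non-empty item")

    # Escape backslashes first, then double quotes, so each value stays one phrase.
    quoted = [
        '"' + v.replace("\\", "\\\\").replace('"', '\\"') + '"'
        for v in unique_values
    ]
    return f"{field}:({' OR '.join(quoted)})"


genes = ['Prkdc', 'Xrcc5', 'Xrcc4', 'Wrn']
query = build_or_query('marker_symbol', genes)
print(query)  # marker_symbol:("Prkdc" OR "Xrcc5" OR "Xrcc4" OR "Wrn")
```

For a short list, you could pass a string like this as the 'q' value in params. For long lists a single OR query quickly becomes unwieldy (Solr also limits the number of Boolean clauses in one query), so passing the list to field_list is the more practical route.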