What is the difference: JSON vs CSV
CSV (comma-separated values) is a file format in which the data is structured as a table. The first row is a header with the column names, and each following row holds the values for one record.
JSON is structured differently. Each object (enclosed in curly brackets) contains comma-separated key:value pairs, where the key is the “name of the column” and the value is an individual data point that can be a number (e.g., p_value and effect_size), a string (e.g., marker_symbol), or an array (e.g., top_level_mp_term_name). Square brackets indicate an array type.
Example of JSON:
[
  {
    "effect_size":0.100352112676056,
    "marker_symbol":"Prkdc",
    "p_value":0.041044776119403,
    "top_level_mp_term_name":["hearing/vestibular/ear phenotype"]
  },
  {
    "effect_size":-0.333531393149707,
    "marker_symbol":"Prkdc",
    "p_value":0.243160963130514,
    "top_level_mp_term_name":["hematopoietic system phenotype",
                              "immune system phenotype"]
  }
]
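To see how these JSON value types appear in code, the minimal sketch below parses the first object from the example with Python's standard json module and prints the Python type of each field. The variable names (record_text, record, field_types) are chosen here for illustration and are not part of any IMPC tool.

```python
import json

# The first object from the JSON example above, as a string.
record_text = """
{
  "effect_size": 0.100352112676056,
  "marker_symbol": "Prkdc",
  "p_value": 0.041044776119403,
  "top_level_mp_term_name": ["hearing/vestibular/ear phenotype"]
}
"""

# json.loads turns a JSON object into a Python dict.
record = json.loads(record_text)

# Map each key to the name of the Python type of its value:
# JSON number -> float, JSON string -> str, JSON array -> list.
field_types = {key: type(value).__name__ for key, value in record.items()}

for key, type_name in field_types.items():
    print(f"{key}: {type_name}")
```

Numbers become float, strings become str, and the array in square brackets becomes a list, even when it holds a single item.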
Example of the same data as CSV:
marker_symbol,top_level_mp_term_name,effect_size,p_value
Prkdc,hearing/vestibular/ear phenotype,0.100352112676056,0.041044776119403
Prkdc,"hematopoietic system phenotype,immune system phenotype",-0.333531393149707,0.243160963130514
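The relationship between the two formats can be sketched in Python using only the standard library. The helper docs_to_csv below is a hypothetical function written for this illustration, not part of the impc-api package. It assumes that array values are joined with commas into a single cell, as in the CSV example above; the csv module then adds the double quotes around any cell that contains a comma.

```python
import csv
import io
import json

# The two records from the JSON example, wrapped in a JSON array.
json_text = """
[
  {"effect_size": 0.100352112676056,
   "marker_symbol": "Prkdc",
   "p_value": 0.041044776119403,
   "top_level_mp_term_name": ["hearing/vestibular/ear phenotype"]},
  {"effect_size": -0.333531393149707,
   "marker_symbol": "Prkdc",
   "p_value": 0.243160963130514,
   "top_level_mp_term_name": ["hematopoietic system phenotype",
                              "immune system phenotype"]}
]
"""


def docs_to_csv(docs, fieldnames):
    """Hypothetical helper: write a list of dicts as CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    for doc in docs:
        row = {}
        for name in fieldnames:
            value = doc.get(name, "")
            # Assumption: an array is flattened into one comma-joined cell.
            if isinstance(value, list):
                value = ",".join(str(item) for item in value)
            row[name] = value
        writer.writerow(row)
    return buffer.getvalue()


docs = json.loads(json_text)
columns = ["marker_symbol", "top_level_mp_term_name", "effect_size", "p_value"]
csv_text = docs_to_csv(docs, columns)
print(csv_text)
```

Flattening an array into one cell loses the distinction between a single value that contains a comma and several separate values, which is one reason JSON is often the safer choice for fields that hold arrays.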