Online tutorial
Accessing Mouse Phenotypes and Disease Associations with the IMPC Solr API
A complete Python guide
The International Mouse Phenotyping Consortium (IMPC) aims to identify the function of every protein-coding gene in the mouse genome. This tutorial will walk you through accessing IMPC data programmatically through the IMPC Solr API with Python.
Who is this course for?
This course is designed for anyone interested in learning how to access IMPC data using Python. It will provide you with the skills to navigate and utilise the data effectively. Basic Python programming skills are required, so we recommend completing an introductory Python course first. If you're new to IMPC or want to explore its resources further, we suggest starting with our course The International Mouse Phenotyping Consortium: Finding phenotypes for your gene of interest.
What will I achieve?
By the end of the course you will be able to:
- Explain how IMPC data is organised and what kind of information you can access
- Use helper functions from the impc_api Python package to request IMPC data through the IMPC Solr API
- Customise query parameters to retrieve the data you are interested in
- Download IMPC data efficiently with Python using pagination
What resources do I need?
This tutorial includes exercises that use Python to access the IMPC API. You can complete the exercises in either Google Colab or a Jupyter Notebook. There is further information about these choices on this page in the tutorial.
DOI: 10.6019/TOL.IMPCSolrAPI-t.2024.00001.1
Course contents
- Overview of key IMPC concepts and tools
- Introduction to the Solr API: accessing IMPC data programmatically
- What is Apache Solr?
- Important definitions: query, field, core, document, parameter
- Quiz 2: get yourself familiar with Solr terminology
- What is the difference between an IMPC parameter and a Solr parameter?
- Using simple Solr syntax in your browser
- Output of the simplest request in your browser
- A Python module to access IMPC data: installation and available functions
- Quiz 3: explain Solr request
- Solr query syntax: simplified explanation
- How to use the solr_request function from the impc-api Python package
- How to perform a query: q parameter
- Exercise 1: getting familiar with the core
- How to request a limited number of documents: rows parameter
- Exercise 2: requesting three documents
- How to get specific fields: fl parameter
- Exercise 3: selecting specific fields
- Quiz 4: basic Solr parameters
- Filtering data in Solr: narrowing down your results
- How to query a specific field: filter by value
- Exercise 4: filtering by a single field
- How to filter numbers: range search
- Exercise 5: changing the p-value threshold
- How to combine multiple filters: Boolean operators
- Exercise 6: applying multiple filters
- How to exclude data: NOT operator
- Why parentheses are important: combine multiple Boolean operators
- Quiz 5: Boolean operators
- How to handle null values: exclude empty fields
- Exercise 7: explore null values
- Downloading data: getting large results efficiently
- How to download large datasets efficiently: pagination
- How to download the data: batch_solr_request function
- What formats are available for downloading: wt parameter
- Exercise 8: download the data
- What is the difference: JSON vs CSV
- What you need to keep in mind: query responsibly
- Quiz 6: request only necessary data
- Advanced Solr query techniques: faceting and iterating over entities
- Understanding IMPC data: resources and assistance
- Your feedback
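As a rough preview of the Solr parameters named in the contents above (q, fl, rows, wt) and of pagination, here is a minimal sketch that only builds request URLs with the Python standard library and sends nothing over the network. The base URL, the core name and the field names are assumptions made for illustration, and `build_select_url` and `page_starts` are hypothetical helpers, not functions from the impc-api package. The course itself uses that package's solr_request and batch_solr_request functions instead.

```python
from urllib.parse import urlencode

# Assumed base URL and core name; check the course material for current values.
BASE_URL = "https://www.ebi.ac.uk/mi/impc/solr"
CORE = "genotype-phenotype"


def build_select_url(params, core=CORE):
    """Return a Solr /select URL for the given core and query parameters."""
    return f"{BASE_URL}/{core}/select?{urlencode(params)}"


def page_starts(total, batch_size):
    """Return the `start` offsets needed to page through `total` documents."""
    return list(range(0, total, batch_size))


params = {
    # Field names are illustrative assumptions, not confirmed by this page.
    # q combines a field filter and a numeric range with a Boolean operator.
    "q": 'marker_symbol:"Cib2" AND p_value:[0 TO 0.0001]',
    # fl restricts the response to the listed fields.
    "fl": "marker_symbol,mp_term_name,p_value",
    # rows limits the number of documents returned.
    "rows": 3,
    # wt selects the response format.
    "wt": "json",
}
url = build_select_url(params)
print(url)

# Pagination: request 5000 documents at a time from a result set of 12000.
for start in page_starts(12000, 5000):
    print(build_select_url({**params, "rows": 5000, "start": start}))
```

Requesting only the fields and rows you need, and paging through large result sets in fixed-size batches, keeps each response small and is the pattern the "query responsibly" section of the course refers to.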
How and when to access the course
All our courses are designed with flexibility in mind. You can access them for free at any time by clicking the “Enter Course” button.
It is up to you how you use the course; you can either study the full course or you can focus on sections that are relevant to you. To jump between sections, use the navigation bar on the left or the arrows at the bottom of the page. You can also choose whether to complete the course in one go, or over several visits.
The average time to read through the main body of the course is 3 hours (including exercises). The time may vary depending on your prior knowledge and how you choose to work through the course.
Making the most of the course
Learning something new takes time and practice. We encourage you to:
- Use the activities and quizzes to help you check your learning, recall and apply key concepts.
- Revisit sections as and when you need them. Bookmark relevant pages in your browser or use the navigation panel to jump to the relevant section.
Getting help and providing feedback
If something isn’t working or if you have a question, get in touch by contacting us at trainonline@ebi.ac.uk.
Tell us what you thought about the course (both good and bad!) using the “Feedback and help” button found at the top of each page.
Your feedback helps us ensure we are providing training that is relevant and useful for you.
For help and support on EMBL-EBI resources you can contact the helpdesk directly.
Learn more
You can explore other training on offer from EMBL-EBI on our website. We offer online courses, webinars, face-to-face courses and offsite training.