How to download the data: batch_solr_request function

Before downloading anything, prototype your query with the solr_request function. Once you are confident that a small result set contains everything you need, switch to the batch_solr_request function with the parameter download=True. Other parameters include:

filename: the name of the output file, without the extension. Used when download=True.
batch_size: the number of documents fetched per request. The default is 5000.
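To make the prototyping step concrete, here is a minimal sketch of what a small test query looks like as a plain Solr URL. It does not use solr_request itself and sends no request. The base URL and the helper name build_select_url are assumptions made for illustration, so check them against the course material. rows and wt are standard Solr query parameters.

```python
from urllib.parse import urlencode

# Assumption: base address of the public IMPC Solr service; verify before use.
BASE_URL = "https://www.ebi.ac.uk/mi/impc/solr"


def build_select_url(core, params):
    """Return the Solr select URL for a core and a dict of query parameters.

    Hypothetical helper for illustration; it is not part of any IMPC package.
    """
    return f"{BASE_URL}/{core}/select?{urlencode(params)}"


# A prototype query: same fields as the full download, but only 10 rows.
prototype_url = build_select_url(
    "statistical-result",
    {
        "q": "marker_symbol:Prkdc",
        "fl": "marker_symbol,top_level_mp_term_name,effect_size,p_value",
        "rows": 10,    # keep the prototype small
        "wt": "json",  # ask Solr for a JSON response
    },
)
print(prototype_url)
```

Opening the printed URL in a browser is one quick way to check that the query and field list return what you expect before committing to a full download.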

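# Assumes the impc_api package is installed.
from impc_api import batch_solr_request

# Download every matching document in batches of 100 and save it as 'output'.
df = batch_solr_request(
    core='statistical-result',
    params={
        'q': 'marker_symbol:Prkdc',
        'fl': 'marker_symbol,top_level_mp_term_name,effect_size,p_value'
    },
    download=True,
    batch_size=100,
    filename='output'
)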
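To illustrate what batch_size controls, the sketch below splits a result set into consecutive request windows using Solr's standard start and rows paging parameters. This is only a conceptual model, not the actual implementation of batch_solr_request. The function name batch_windows and the document count of 250 are hypothetical.

```python
def batch_windows(num_found, batch_size=5000):
    """Yield (start, rows) pairs that together cover num_found documents."""
    if batch_size <= 0:
        raise ValueError("batch_size must be a positive integer")
    for start in range(0, num_found, batch_size):
        # The last window may be smaller than batch_size.
        yield start, min(batch_size, num_found - start)


# Hypothetical example: 250 matching documents fetched 100 at a time.
windows = list(batch_windows(250, batch_size=100))
print(windows)  # [(0, 100), (100, 100), (200, 50)]
```

A smaller batch_size means more requests with less data in each one. A larger value means fewer, heavier requests.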

When download is set to False, the batch_solr_request function returns the whole dataset as a pandas DataFrame. This works for relatively small datasets (fewer than 1,000,000 documents). For larger datasets, we recommend saving the data to a file and then loading that file back into Python.

Note: By default, batch_solr_request saves the data in JSON format.
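Loading a saved file back into Python could look like the sketch below, which uses only the standard library. It assumes the download is written as output.json and contains a single JSON array of documents; confirm both against the file you actually get. The record written here is a placeholder so the example runs on its own, not real IMPC data, and load_documents is a hypothetical helper.

```python
import json
import tempfile
from pathlib import Path


def load_documents(path):
    """Read a JSON file that holds a list of Solr documents."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


# Placeholder record standing in for a real download (values are not real results).
placeholder_docs = [
    {
        "marker_symbol": "Prkdc",
        "top_level_mp_term_name": ["placeholder term"],
        "effect_size": 0.0,
        "p_value": 1.0,
    }
]

with tempfile.TemporaryDirectory() as tmp:
    # Assumed file name: filename='output' plus a .json extension.
    path = Path(tmp) / "output.json"
    path.write_text(json.dumps(placeholder_docs), encoding="utf-8")
    docs = load_documents(path)

print(len(docs), docs[0]["marker_symbol"])
```

If pandas is installed, pandas.read_json can read a file of this shape straight into a DataFrame.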

Accessing Mouse Phenotypes and Disease Associations with the IMPC Solr API