
How to download a large dataset efficiently: pagination

What if you want to request all the data from one core? Can the solr_request function be used for this? No. Do NOT use solr_request for this, because it returns only a fixed number of documents (10 by default) instead of the whole core.
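To see why a single request falls short, here is a minimal sketch of how a Solr select request behaves when `rows` is not set: the response reports the total number of matches (`numFound`) but includes only the first page of documents. The in-memory `FAKE_CORE` and the `fake_solr_select` helper are hypothetical stand-ins for a real Solr core, not part of the IMPC API; only the `start`/`rows` parameter names and the default of 10 rows come from Solr itself.

```python
# Hypothetical in-memory stand-in for a Solr core (NOT the IMPC API).
FAKE_CORE = [{"id": i} for i in range(1_000)]


def fake_solr_select(params):
    """Mimic a Solr select request: honour 'start' and 'rows' (rows defaults to 10)."""
    start = int(params.get("start", 0))
    rows = int(params.get("rows", 10))
    docs = FAKE_CORE[start:start + rows]
    return {"response": {"numFound": len(FAKE_CORE), "start": start, "docs": docs}}


# A plain query without 'rows': every document matches, but only 10 come back.
resp = fake_solr_select({"q": "*:*"})
print(resp["response"]["numFound"], len(resp["response"]["docs"]))  # prints "1000 10"
```

The gap between `numFound` and the number of returned documents is exactly what a paginated download has to close.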

Setting the rows parameter equal to the size of the core is not a good option either. A single request that large could overwhelm the system, and if the internet connection is unstable, one error during the transfer means you have to start the download again from the beginning.

Instead, use the batch_solr_request function to download large amounts of data. It retrieves the results in several smaller chunks (pagination), which avoids the problems mentioned above. Alternatively, you can access IMPC data through non-programmatic methods. Here are the instructions.
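The idea behind pagination can be sketched in a few lines. This is a minimal illustration of generic Solr `start`/`rows` paging with per-batch retries, assuming a hypothetical `FlakyConnection` transport that drops one request to mimic an unstable link; it is not the implementation of batch_solr_request, and `paginated_download`, `batch_size`, and `max_retries` are illustrative names only. The key point is that a failure costs one batch, not the whole download.

```python
import time

# Hypothetical in-memory core used to simulate a Solr server (NOT the IMPC API).
FAKE_CORE = [{"id": i} for i in range(1_000)]


class FlakyConnection:
    """Hypothetical transport that fails once at each listed offset."""

    def __init__(self, fail_at):
        self.fail_at = set(fail_at)
        self.calls = 0

    def select(self, params):
        self.calls += 1
        start = int(params.get("start", 0))
        rows = int(params.get("rows", 10))
        if start in self.fail_at:
            self.fail_at.discard(start)  # fail only on the first attempt
            raise ConnectionError(f"connection dropped at start={start}")
        docs = FAKE_CORE[start:start + rows]
        return {"response": {"numFound": len(FAKE_CORE), "docs": docs}}


def paginated_download(conn, query, batch_size=250, max_retries=3, delay=0.0):
    """Fetch every matching document in chunks of `batch_size`, retrying failed chunks."""
    docs = []
    start = 0
    num_found = None
    while num_found is None or start < num_found:
        params = {"q": query, "start": start, "rows": batch_size}
        for attempt in range(1, max_retries + 1):
            try:
                resp = conn.select(params)
                break
            except ConnectionError:
                if attempt == max_retries:
                    raise  # give up only after repeated failures on this chunk
                time.sleep(delay)  # back off, then retry the same chunk only
        num_found = resp["response"]["numFound"]
        batch = resp["response"]["docs"]
        if not batch:
            break  # defensive stop if the server returns an empty page
        docs.extend(batch)
        start += len(batch)  # resume from where the last successful chunk ended
    return docs


conn = FlakyConnection(fail_at=[500])
all_docs = paginated_download(conn, "*:*")
print(len(all_docs), conn.calls)  # prints "1000 5"
```

Here the dropped request at offset 500 is retried on its own, so only five requests are needed for 1,000 documents in chunks of 250. For very deep result sets Solr also offers cursor-based paging (`cursorMark`); which strategy batch_solr_request uses internally is not stated in this section.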

Let’s move to the next section to learn more about the batch_solr_request function.