User requirements

I want to find out whether a sample in BioSamples has been updated without fetching the entire sample content.

Requirements

  • No specific requirements for this recipe

Scenario

Quick solution - Not scalable

Let’s say that you want to monitor a set of samples in BioSamples and, whenever a sample is updated, retrieve the new content into your system.

One way of doing this is to keep a list of accessions and, at a scheduled interval, retrieve the samples from BioSamples and check whether the update date on each sample has changed since the last time you retrieved the data. This method works just fine but is not very efficient, because you need to download the entire sample content every time.
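The polling approach described above could be sketched in Python roughly as follows. This is only an illustration: the function names are hypothetical, and it assumes the sample JSON exposes its update date as a top-level "update" field, which you should check against a real response.

```python
import json
import urllib.request

BASE_URL = "https://www.ebi.ac.uk/biosamples/samples/"


def fetch_sample(accession):
    """Download the full JSON document for one sample."""
    request = urllib.request.Request(
        BASE_URL + accession, headers={"Accept": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def find_updated_naive(last_seen, fetch=fetch_sample):
    """Return {accession: sample} for samples whose update date changed.

    last_seen maps accession -> update date seen on the previous run.
    Every sample is downloaded in full, whether it changed or not.
    """
    updated = {}
    for accession, previous_date in last_seen.items():
        sample = fetch(accession)
        # Assumption: the update date is a top-level "update" field.
        if sample.get("update") != previous_date:
            updated[accession] = sample
    return updated
```

Note that fetch is called once per accession regardless of the outcome, which is exactly the cost this recipe tries to avoid.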

Obviously this is not a big deal if you need to check one sample, but if the number increases to the order of thousands, this could be quite a slow process.

A smarter solution

In the new version of BioSamples we provide ETAG functionality. This feature provides a unique "fingerprint" of the sample that changes as soon as the sample itself changes or a curation is applied to it. Basically, the ETAG is like a hash of the sample.

With the ETAG you can submit a conditional request to BioSamples using an If-None-Match header. See the HTTP documentation for If-None-Match for more details on this header.

If the provided ETAG matches the one BioSamples holds for the sample, the sample has not changed since you last retrieved it, and a 304 - Not Modified status is returned. Otherwise, the new content is returned alongside a new ETAG.
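If you prefer Python to cURL, the conditional request could look something like the sketch below, using only the standard library. The function name fetch_if_changed and the opener parameter are hypothetical, introduced here for illustration. Be aware that urllib reports a 304 as an HTTPError, so the "not modified" case is handled in the except branch.

```python
import urllib.error
import urllib.request

BASE_URL = "https://www.ebi.ac.uk/biosamples/samples/"


def fetch_if_changed(accession, etag, opener=urllib.request.urlopen):
    """Conditionally GET a sample.

    Returns (changed, etag, body). When the server answers 304, changed is
    False, the stored etag is returned unchanged and body is None.
    """
    request = urllib.request.Request(
        BASE_URL + accession,
        headers={"Accept": "application/json", "If-None-Match": etag},
    )
    try:
        with opener(request) as response:
            # 200: the sample changed, so keep the new ETAG and the content.
            return True, response.headers.get("ETag"), response.read()
    except urllib.error.HTTPError as error:
        if error.code == 304:
            # 304: nothing to download, the stored ETAG is still valid.
            return False, etag, None
        raise
```

The etag argument must include its surrounding double quotes, exactly as BioSamples returned it.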

Going back to the original scenario, if you store the ETAGs of all the samples you are interested in locally, you can use them to quickly scan BioSamples and download only the content of the samples that have actually been updated.
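As a rough sketch, the scan over your locally stored ETAGs could be organised like this. Both scan_for_updates and the fetch callable are hypothetical names: fetch stands for whatever code performs the conditional request and reports whether the sample changed.

```python
def scan_for_updates(stored_etags, fetch):
    """Check every monitored sample and collect the ones that changed.

    stored_etags maps accession -> last known ETAG and is updated in place.
    fetch(accession, etag) must return (changed, etag, body).
    """
    updated = {}
    for accession, etag in list(stored_etags.items()):
        changed, new_etag, body = fetch(accession, etag)
        if changed:
            # Remember the new fingerprint for the next scan.
            stored_etags[accession] = new_etag
            updated[accession] = body
    return updated
```

Only the samples that changed end up in the returned dictionary, so the rest of your system never touches unchanged content.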

Steps to reproduce

For this demo I will use cURL for simplicity, but you can use any HTTP client. Also, I’m going to use a real sample, so be aware that the ETAG value you get may differ from the one shown here, which was current at the time of writing.

1. Fetch the Sample and the corresponding ETAG

Here is the request for the sample SAMEA2614688:

curl -H "Accept: application/json" -i https://www.ebi.ac.uk/biosamples/samples/SAMEA2614688

and here are the response headers (we don’t really care about the content here):

HTTP/2 200
cache-control: max-age=60, public
content-type: application/json;charset=UTF-8
strict-transport-security: max-age=0
date: Tue, 16 Oct 2018 16:12:55 GMT
x-application-context: application:8081
x-xss-protection: 1; mode=block
x-content-type-options: nosniff
etag: "06b2bf5fb11041e36ad4c29a77ff3be55"
x-frame-options: DENY
content-length: 1488
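If you capture the output of curl -i in a script, you need to pull the ETAG out of the header text before you can store it. A minimal Python sketch, with the hypothetical helper name extract_etag, might be:

```python
def extract_etag(raw_headers):
    """Return the ETAG value (quotes included) from `curl -i` header text."""
    for line in raw_headers.splitlines():
        name, separator, value = line.partition(":")
        # Header names are case-insensitive, so compare in lower case.
        if separator and name.strip().lower() == "etag":
            return value.strip()
    return None
```

The surrounding double quotes are kept on purpose, because they are part of the value you have to send back in the If-None-Match header.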

2. Submit a new GET request including the ETAG

Let’s submit a new request with the If-None-Match header and the ETAG

curl -H "Accept: application/json" -H 'If-None-Match: "06b2bf5fb11041e36ad4c29a77ff3be55"' -i https://www.ebi.ac.uk/biosamples/samples/SAMEA2614688

Here is the response:

HTTP/2 304
cache-control: max-age=60, public
strict-transport-security: max-age=0
date: Tue, 16 Oct 2018 16:14:57 GMT
x-application-context: application:8081
x-xss-protection: 1; mode=block
x-content-type-options: nosniff
etag: "06b2bf5fb11041e36ad4c29a77ff3be55"
x-frame-options: DENY

3. Submit a GET request with an older ETAG

Let’s pretend that your locally stored ETAG is different, like "07b2dc735675d4f54f0dc3df82c34daa1".

If you use that in the conditional request, you will get a 200 - OK response with the current sample content:

curl -H "Accept: application/json" -H 'If-None-Match: "07b2dc735675d4f54f0dc3df82c34daa1"' -i https://www.ebi.ac.uk/biosamples/samples/SAMEA2614688

and here are the response headers:

HTTP/2 200
cache-control: max-age=60, public
content-type: application/json;charset=UTF-8
strict-transport-security: max-age=0
date: Thu, 18 Oct 2018 15:40:54 GMT
x-application-context: application:8081
x-xss-protection: 1; mode=block
x-content-type-options: nosniff
etag: "06b2bf5fb11041e36ad4c29a77ff3be55"
x-frame-options: DENY
content-length: 1488

Template

Here is a template for the curl request that you can use to try the ETAG functionality:

curl -H "Accept: application/json" -H 'If-None-Match: <sample-etag-with-quotes>' -i https://www.ebi.ac.uk/biosamples/samples/<sample-accession>
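If you want to fill in the template from a script, something like the following Python sketch could work. The helper name build_curl_command is hypothetical; it returns the command as an argument list and adds the double quotes around the ETAG when they are missing.

```python
import shlex

BASE_URL = "https://www.ebi.ac.uk/biosamples/samples/"


def build_curl_command(accession, etag):
    """Fill in the curl template, quoting the ETAG if it has no quotes yet."""
    if '"' not in etag:
        etag = '"' + etag + '"'
    return [
        "curl",
        "-H", "Accept: application/json",
        "-H", "If-None-Match: " + etag,
        "-i", BASE_URL + accession,
    ]


def as_shell_line(command):
    """Render the argument list as a single line you can paste in a shell."""
    return shlex.join(command)
```

The argument list can be passed directly to subprocess.run, while as_shell_line is handy when you just want to print the command.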