Downloading files and datasets

 

1. About Downloading data from the EGA

2. Using the EGA download client (EgaDemoClient)

3. Using the API (API)

1. About Downloading from the EGA

 

EGA data access is primarily provided via a REST API. For added convenience, Java applications are available that wrap this API and provide programmatic or command-line access to EGA data.

The workflow for data download remains the same regardless of the method you choose to download your data.  Individual files or whole datasets may be downloaded by first placing a download request and then downloading the file/s associated with the request.  

All files are automatically encrypted prior to streaming and must be decrypted using the streamer after download is complete. 

The download request or "ticket" is the only information required to start downloading from the EGA. As a security measure, our servers compare the IP address of the request with the IP address of the download when it is initiated, to ensure that data is only downloaded by the original authorised requestor.

Upon successful download, the download ticket is removed. The download ticket is also removed automatically after 7 days if a download has not been initiated for the given ticket.
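The ticket rules described above (matching IP address, removal after a successful download, removal after 7 days) can be summarised in a few lines of Python. This is only an illustrative model of the behaviour described here, not EGA code: the Ticket class, its field names and the example addresses are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Lifetime stated in the text: a ticket is removed after 7 days
# if no download has been initiated.
TICKET_LIFETIME = timedelta(days=7)


@dataclass
class Ticket:
    """Hypothetical model of a download ticket (names are illustrative)."""
    ticket_id: str
    request_ip: str
    created: datetime
    downloaded: bool = False

    def is_usable(self, download_ip: str, now: datetime) -> bool:
        """Return True if a download from download_ip would be accepted at time now."""
        if self.downloaded:
            # The ticket is removed after a successful download.
            return False
        if now - self.created > TICKET_LIFETIME:
            # The ticket is removed automatically after 7 days.
            return False
        # The IP of the download must match the IP of the request.
        return download_ip == self.request_ip


t = Ticket("example-ticket", "192.0.2.10", datetime(2015, 1, 1))
```

Calling t.is_usable("192.0.2.10", datetime(2015, 1, 3)) models a download from the requesting machine two days after the request; changing the IP address or moving the date past the 7-day limit models the two rejection cases.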

The data arrives on the user's system as encrypted .cip suffixed files and should be decrypted using the key specified in the original download request.

Data download can be carried out through direct access to the API or using the API Wrapper Java Class.

For added convenience, we also provide an EgaDemoClient to interact with the service API without the need to write programs.  The Java client should fulfil the requirements of most users, providing the following functionality:

·       Interactive shell to list/request/download/decrypt data

·       Command line for the same functionality (list/request/download/decrypt data)

·       EGA Globus Online Interaction (BETA)

·       FUSE Layer to access downloaded files without having to decrypt them (BETA)

 

2.0 Using the EGA download client (EgaDemoClient)

Some datasets are NOT currently available in the download client: contact ega-helpdesk@ebi.ac.uk to request an Aspera download account for these datasets.

As the name implies, the EgaDemoClient provides a “Demo” or "reference implementation" of what can be done in terms of the interaction with the REST API. Refer to the API or API Wrapper Java Class documentation to explore the full functionality of downloading from the EGA.

The EgaDemoClient application can be used as an interactive shell or direct command line client.  Both methods provide similar functionality, but you may only use the FUSE layer option from the command line.

 

The client application, on request, can create a small local SQLite cache database where some information is cached. The contents of this local database can be viewed using any SQLite viewer.

The purpose of this caching database is to keep track of requested data, specifically for download requests. Because some data sets contain links to files which are still pending (which are not yet archived in the EGA archive, and therefore cannot be properly requested or downloaded yet) a request may not contain all files that are part of the data set. By using the demo client to make requests, it will become possible to track when these pending files will become available, and easily request any additional files.


2.1 REQUIREMENTS

This application requires Java 1.7 or later. Your firewall must allow Java to access the Internet on the standard ports 80 (http) and 443 (https). For UDT, UDP port 80 must also be open.

The client load balancer is on ‘ega.ebi.ac.uk’, which resolves to IP address 193.62.192.14.
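To check the Java requirement from a script, the text printed by "java -version" can be parsed. The sketch below is a minimal example under stated assumptions: the function names are hypothetical, and it only understands the common 'version "x.y..."' line that Java prints (on standard error) in both the legacy 1.x and the newer numbering schemes.

```python
import re


def parse_java_version(version_output: str):
    """Extract (major, minor) from the text printed by `java -version`.

    Handles the legacy scheme ("1.7.0_80" is Java 7) and the newer
    scheme ("11.0.2" is Java 11). Returns None if no version is found.
    """
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        return None
    major = int(match.group(1))
    minor = int(match.group(2) or 0)
    return (major, minor)


def meets_requirement(version_output: str, minimum=(1, 7)) -> bool:
    """True if the reported Java version is at least `minimum` (1.7 by default)."""
    version = parse_java_version(version_output)
    return version is not None and version >= minimum
```

The version text could be obtained with subprocess.run(["java", "-version"], capture_output=True, text=True).stderr and passed to meets_requirement.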

2.1.1 TROUBLESHOOTING

There are several options using the interactive shell to explore how to better use the download system.

EGA > tutorial

EGA > instructions

This command performs a series of short downloads using the TCP and the UDT protocol to provide an indication which protocol is expected to work faster from your location, and what the maximum bandwidth can be expected to be:

EGA > testbestdownload

This command performs a series of medium-sized downloads to determine the maximum combined bandwidth to be expected using different numbers of parallel download streams. The default is to use 3 parallel streams; a different number can be specified as well. This test works using both TCP and UDT settings (command "udt on"/"udt off").

EGA > testbandwidth

EGA > testbandwidth 7

More parallel streams don't always equal higher total throughput! Increasing parallel streams works best if your expected data transfer rate for one individual stream is low. UDT is also not always faster than TCP. Good connections actually tend to perform better using TCP, regardless of distance.

The command line offers a debug command that will check the most common issues arising with using the client. It uses the '-debug' option with user name and password (assume: user name = demo@test.org, password = 123pass):

java -jar EgaDemoClient.jar -debug demo@test.org 123pass

This command will start by creating a simple socket connection to "http://www.google.com" as well as "https://www.google.com" to ensure that Java has access to the Internet on your system (some firewalls prevent this). It then resolves the EGA host name "ega.ebi.ac.uk" to an IP address and tries to ping our servers, to verify that you have access to our API from your system. If that is successful, a login is attempted to verify that your user name and password are correct and active. Finally, a set of short data transfers is performed to verify that you can download data to your system using the TCP and UDT data transfer protocols.

 

2.2 DOWNLOAD THE CLIENT

Version 2.2.2

 

2.3 Using interactive shell

See troubleshooting if you have problems using the client.

The interactive shell is started by running the command "java -jar EgaDemoClient.jar". This opens the EGA shell:

Welcome to the EGA Secure Data Shell Demo.
Type 'help' for help, and 'exit' to quit.
Ega Demo Download Client  Version: 1.0.4
   With Ega Download Agent  Version: 0.5 BETA

EGA >

Typing "help" will display all commands available.

The first step will always be to log in (assume: user name = demo@test.org, password = 123pass):

EGA > login demo@test.org 

Password>123pass

Using Cache DB at: /home/demo/demo_demo@test.org_ega_db.sqlite
Creating timer task to update local Cache DB asynchronously every 30 min.
Updating local Cache DB..... in Progress
Login Success!

The time required to update the local database varies from time to time; it will print "Updating local Cache DB..... Done!" when the update is complete.

Upon receiving the "Login Success!" message you can now use all the commands listed with the "help" command.

 

2.3.1 Making a request to download all files in a dataset

You can list all datasets to which you have access (e.g. EGA > datasets), as well as all files in a dataset (e.g. EGA > files dataset EGAD00010000498). Once you have identified the dataset you wish to download, it is time to request it. A request requires 4 parts:

(1) the type of the request: "dataset"

(2) the ID: the dataset ID

(3) the encryption key for the received data

(4) a label - this label is later used to download the requested data. You should pick a label by which you can identify your request.

For example:

EGA > request dataset EGAD00010000498 abc request_EGAD00010000498
Requesting.... (This may take longer if there are pending files in the request)
Resulting Request:
  request_EGAD00010000498 (19 file requests).

In this request all files in dataset EGAD00010000498 are requested. The data will be encrypted with the key "abc", and the request label is "request_EGAD00010000498". The request resulted in 19 individual file requests: there are now 19 resource URLs available to download this dataset.
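When request commands are generated by a script, it can help to check the four parts before sending them to the shell. The following is a minimal sketch, not part of the client: the function name is hypothetical, the accession patterns (EGAD or EGAF followed by 11 digits) are inferred from the examples in this document, and the rule that key and label contain no spaces is an assumption based on the space-separated command syntax.

```python
import re

# Accession patterns inferred from the examples in this document;
# treat them as an assumption, not as an official specification.
ID_PATTERNS = {
    "dataset": re.compile(r"^EGAD\d{11}$"),
    "file": re.compile(r"^EGAF\d{11}$"),
}


def build_request_command(kind: str, accession: str, key: str, label: str) -> str:
    """Assemble the shell command 'request <type> <ID> <key> <label>'."""
    if kind not in ID_PATTERNS:
        raise ValueError("type must be 'dataset' or 'file'")
    if not ID_PATTERNS[kind].match(accession):
        raise ValueError(f"{accession!r} does not look like a {kind} accession")
    # Assumption: the shell splits its input on spaces, so neither the key
    # nor the label may be empty or contain a space.
    if not key or not label or " " in key or " " in label:
        raise ValueError("key and label must be non-empty and contain no spaces")
    return f"request {kind} {accession} {key} {label}"
```

For the example above, build_request_command("dataset", "EGAD00010000498", "abc", "request_EGAD00010000498") reproduces the command typed at the EGA prompt.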

If the requested dataset contains pending files (see above) then a request may look like this:

EGA > request dataset EGAD00010000650 mypass request_EGAD00010000650
Requesting.... (This may take longer if there are pending files in the request)
This request contains 1216 Pending files!
Resulting Request:
  request_EGAD00010000650 (18 file requests). 

In this request the dataset contains 1234 files, but only 18 are in the EGA archive.
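The relationship between the numbers in this output (18 available + 1216 pending = 1234 files) can be extracted by a script. This sketch assumes the client prints its messages exactly as in the transcript above; the function name and the returned keys are hypothetical.

```python
import re


def summarise_request_output(output: str) -> dict:
    """Pull label, available and pending counts out of the 'request' output."""
    pending_match = re.search(r"This request contains (\d+) Pending files!", output)
    request_match = re.search(r"^\s*(\S+) \((\d+) file requests\)", output, re.MULTILINE)
    if request_match is None:
        raise ValueError("no 'Resulting Request' line found")
    available = int(request_match.group(2))
    # The 'Pending files' line is only printed when pending files exist.
    pending = int(pending_match.group(1)) if pending_match else 0
    return {
        "label": request_match.group(1),
        "available": available,
        "pending": pending,
        "total": available + pending,
    }
```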

 

2.3.2 Making a request to download individual files in a dataset

First, identify the files in your dataset.

EGA > files dataset EGAD00010000498

Files in EGAD00010000498:

  /PROSTATE_SNP6/PD7445a.CEL.gpg 29898719 EGAF00000278296

  /PROSTATE_SNP6/PD7445b.CEL.gpg 30275814 EGAF00000584909

  /PROSTATE_SNP6/PD7445c.CEL.gpg 29571494 EGAF00000584901

  /PROSTATE_SNP6/PD7445d.CEL.gpg 31040185 EGAF00000584899

  /PROSTATE_SNP6/PD7445e.CEL.gpg 30153169 EGAF00000584902

  /PROSTATE_SNP6/PD7445f.CEL.gpg 29735350 EGAF00000584903

  /PROSTATE_SNP6/PD7446a.CEL.gpg 29336337 EGAF00000584905

  /PROSTATE_SNP6/PD7446b.CEL.gpg 28165811 EGAF00000584904

  /PROSTATE_SNP6/PD7446c.CEL.gpg 30383508 EGAF00000584900

  /PROSTATE_SNP6/PD7446d.CEL.gpg 31141416 EGAF00000584910

  /PROSTATE_SNP6/PD7446e.CEL.gpg 29599271 EGAF00000584897

  /PROSTATE_SNP6/PD7446f.CEL.gpg 30756385 EGAF00000584907

  /PROSTATE_SNP6/PD7446g.CEL.gpg 30608708 EGAF00000584908

  /PROSTATE_SNP6/PD7447a.CEL.gpg 29608192 EGAF00000584896

  /PROSTATE_SNP6/PD7447b.CEL.gpg 28497750 EGAF00000584912

  /PROSTATE_SNP6/PD7447c.CEL.gpg 28999141 EGAF00000584911

  /PROSTATE_SNP6/PD7447d.CEL.gpg 28749723 EGAF00000584898

  /PROSTATE_SNP6/PD7447e.CEL.gpg 30863898 EGAF00000584906
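A listing like the one above can be turned into structured records, for example to find file accessions or to estimate the total download size. This is a sketch that assumes each file line has exactly three space-separated columns (path, size in bytes, EGAF accession) and that paths contain no spaces; the names used are hypothetical.

```python
from typing import List, NamedTuple


class ListedFile(NamedTuple):
    path: str
    size: int        # size as printed in the listing (assumed to be bytes)
    accession: str   # EGAF file accession


def parse_file_listing(output: str) -> List[ListedFile]:
    """Parse the output of 'files dataset <ID>' into ListedFile records."""
    files = []
    for line in output.splitlines():
        parts = line.split()
        # Skip the header line and blank lines; keep 'path size EGAF...' rows.
        if len(parts) == 3 and parts[1].isdigit() and parts[2].startswith("EGAF"):
            files.append(ListedFile(parts[0], int(parts[1]), parts[2]))
    return files
```

The total size of a listing is then sum(f.size for f in parse_file_listing(text)).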

 

Then make a request to download the file using the file accession (EGAF).

 EGA > request file EGAF00000278296 abc file_request

Requesting.... (This may take longer if there are pending files in the request)

Resulting Request:

  file_request (1 file requests).

In this request the file EGAF00000278296 is requested.  The file will be encrypted using the encryption key "abc" and the request is given the label "file_request".

 

2.3.3 Displaying current Requests

If you want to know the status of your requests, there are several options: "requests", "allrequests", and "overview":

Using command "requests" lists all current requests that match the IP address you logged in from. It lists the request labels, along with the number of available files for download:

EGA > requests
Current Requests:
  request_EGAD00010000498 19
  request_EGAD00010000650 18

Requests with Pending Files:
  request_EGAD00010000650 1216

Using command "overview" combines 'allrequests' with some general comments, and it also updates the local database to check whether any of the pending files have become available since the request:

EGA > overview
Your login IP is: 55.66.777.88

Current Requests from all Sources, with IP address at time of request:
  11.22.333.444_tst 1
  11.22.333.444_longtest 5889
  22.33.44.55_test 19
  33.44.55.66_testlong1 5617
  55.66.777.88_request_EGAD00010000498 19
  55.66.777.88_request_EGAD00010000650 18

From your current IP you may download these requests:
 request_EGAD00010000498
 request_EGAD00010000650

You must 'localize' these requests before you can download them (see 'help'):
 tst
 longtest
 test
 testlong1

Updating cache, please wait...

The command "localize" can be used to change the IP address of the request to the current login IP, to enable download of that request on the local system.
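The split that "overview" makes between requests you can download now and requests you must 'localize' first follows from the IP prefix of each entry. The sketch below reproduces that split for lines in the "IP_label count" form shown above; it is an illustration with hypothetical names, not the client's own logic.

```python
def split_by_ip(lines, login_ip):
    """Split 'IP_label count' lines into (downloadable, to_localize) label lists."""
    downloadable, to_localize = [], []
    for line in lines:
        parts = line.split()
        if len(parts) != 2:
            continue  # not an 'IP_label count' line
        # An IP address never contains '_', so the first '_' separates the
        # IP from the label (labels themselves may contain '_').
        ip, _, label = parts[0].partition("_")
        if ip == login_ip:
            downloadable.append(label)
        else:
            to_localize.append(label)
    return downloadable, to_localize
```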

 

2.3.4 Downloading a Request

Requests are downloaded by default to the current path. That can be changed by using the command "path" to set a new path. Command "pwd" displays the current path.

The request itself is then downloaded using the "download" command, for example:

 

EGA > download request_EGAD00010000650 

The default is to download using three parallel streams. The number of streams can be adjusted (15 max) by specifying a number, for example:

EGA > download request_EGAD00010000650 7

This will download the request in 7 parallel streams.

 

2.3.5 Downloading dataset metadata

Launching this command initiates the download of a dataset tarball that contains all XML files associated with the dataset, including details of the study, samples, experiments, runs and analysis.

Mapping files are also provided, enabling you to link samples to files.

 

EGA > downloadmetadata <dataset>

 

2.3.6 Decrypt downloaded files

Once data has been successfully downloaded it can be decrypted using the demo client:

EGA > decrypt <filename> <key>

This will decrypt the specified file using <key> as the decryption key. Upon decryption the encrypted file is deleted.  With the ‘decryptkeep’ command the encrypted file is not deleted:

EGA > decryptkeep <filename> <key>

 

2.4 Using direct command mode 


See troubleshooting if you have problems using the client.

All of the interactive shell functions can be accessed using the command line. The command line is run by specifying the parameter '-p' at startup, followed by user name and password. (The order of the actual commands following "-p username password" is not important.) To list the help section for the command line:

java -jar EgaDemoClient.jar -p demo@test.org 123pass -help

(assume: user name = demo@test.org, password = 123pass)

The command line also allows you to specify a file that contains the username and password (1st line username, 2nd line password). To start the client with such a file (e.g. "login.txt"), use parameter '-pf':

java -jar EgaDemoClient.jar -pf  /home/demo/ega/login.txt -help 
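Because the login file holds a password in plain text, it is sensible to create it readable by its owner only. The sketch below writes and reads a file in the two-line layout described above (1st line username, 2nd line password). The function names are hypothetical, and whether the client accepts a trailing newline after the password is an assumption.

```python
import os


def write_login_file(path: str, username: str, password: str) -> None:
    """Create a two-line login file with owner-only permissions (0600)."""
    # O_EXCL makes the call fail rather than overwrite an existing file.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    with os.fdopen(fd, "w") as handle:
        handle.write(f"{username}\n{password}\n")


def read_login_file(path: str):
    """Return (username, password) from a two-line login file."""
    with open(path) as handle:
        lines = handle.read().splitlines()
    if len(lines) < 2:
        raise ValueError("login file needs a username line and a password line")
    return lines[0], lines[1]
```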

Example - Listing files in a dataset:

java -jar EgaDemoClient.jar -p demo@test.org 123pass -lfd EGAD00010000498

 

Example - Requesting a dataset:

java -jar EgaDemoClient.jar -p demo@test.org 123pass -rfd EGAD00010000498 -re abc -label request_EGAD00010000498

 

Example - Requesting a file:

java -jar EgaDemoClient.jar -p demo@test.org 123pass -rf EGAF00000584907 -re abc -label request_EGAF00000584907

 

Example - Listing Requests:

java -jar EgaDemoClient.jar -p demo@test.org 123pass -lr

 

Example - Downloading Request, using the optional parameter '-nt' to specify using 7 parallel streams:

java -jar EgaDemoClient.jar -p demo@test.org 123pass -dr request_EGAD00010000498 -nt 7 
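When the client is driven from a script, building the argument list explicitly avoids quoting problems. The sketch below assembles the download command from the options shown above ('-pf', '-dr', '-nt'). The function name is hypothetical, and applying the 15-stream maximum of the interactive shell to '-nt' is an assumption.

```python
def download_command(login_file: str, label: str, streams: int = 3,
                     jar: str = "EgaDemoClient.jar") -> list:
    """Build the argument list for downloading one request via the command line."""
    # Assumption: the 15-stream limit documented for the shell also applies here.
    if not 1 <= streams <= 15:
        raise ValueError("streams must be between 1 and 15")
    return ["java", "-jar", jar,
            "-pf", login_file,     # credentials file instead of '-p user pass'
            "-dr", label,          # request label to download
            "-nt", str(streams)]   # number of parallel streams
```

The resulting list can be passed to subprocess.run(cmd, check=True). Using '-pf' rather than '-p' keeps the password out of the process list.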

 

2.4.1 Decrypt downloaded files

java -jar EgaDemoClient.jar -p <username> <password> -dc <path to file1> -dck <decryption_key>


e.g. java -jar EgaDemoClient.jar -p name@ebi.ac.uk password -dc /Users/my_downloads/_ega-box-03_Ca9-22.cel.cip -dck test

Multiple files can be listed after the -dc switch.

 

2.5 Using the FUSE Layer 

This function is only available using the command line. The FUSE layer allows a directory of encrypted *.cip files to be mounted in an empty directory, where they can be accessed as unencrypted files. This allows for encrypted files to be used directly, without having to be decrypted first. This function is accessible with the ‘-fuse’ option. 

 

At the moment this requires permission to ‘sudo’ (or to be root) to work. The target directory is then accessible to every user.

 

sudo java -jar EgaDemoClient.jar -fuse <source> <target> <password>

 

This command scans the source directory. Files with the ‘.cip’ extension are wrapped in an access layer that performs on-demand random-access decryption, and the ‘.cip’ extension is removed from the virtual file. All other files are mounted directly. All .cip files are assumed to be encrypted with the same password/key.

 

Example (making the content of /tmp/download/ available in /tmp/mnt/):

 

sudo java -jar EgaDemoClient.jar -fuse /tmp/download/ /tmp/mnt/ dipassword776

 

It is important to supply the terminating “/” when specifying directories. The target directory must be an empty directory. At the moment subdirectories are ignored, and the source directory is scanned only once, upon start-up.
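The constraints above (terminating "/", empty target directory, subdirectories ignored, '.cip' extension removed) can be checked before calling the client with '-fuse'. This is a sketch with hypothetical function names; it mirrors the rules described in this section and does not mount anything itself.

```python
import os


def check_fuse_arguments(source: str, target: str) -> list:
    """Return a list of problems with the '-fuse' directories (empty if none)."""
    problems = []
    for name, path in (("source", source), ("target", target)):
        if not path.endswith("/"):
            problems.append(f"{name} must end with '/'")
        if not os.path.isdir(path):
            problems.append(f"{name} is not a directory")
    if os.path.isdir(target) and os.listdir(target):
        problems.append("target directory must be empty")
    return problems


def virtual_names(source: str) -> list:
    """Names expected in the mounted directory, as described in the text."""
    names = []
    for entry in sorted(os.listdir(source)):
        if os.path.isdir(os.path.join(source, entry)):
            continue  # subdirectories are ignored
        # '.cip' files appear without the extension; other files keep their name.
        names.append(entry[:-4] if entry.endswith(".cip") else entry)
    return names
```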

  

3.0 Using the API


See troubleshooting if you have problems using the client.

Base URLs for the API V2.0 :

 

                  https://ega.ebi.ac.uk/ega/rest/access/v2

                  http://ega.ebi.ac.uk/ega/rest/download/v2

 

There are two base URLs. Access to view data and to make requests is HTTPS secured. Once a request has been made, the encrypted data stream is downloaded via HTTP (unsecured) to avoid multiple and unnecessary encryption overhead.

The EGA Download REST API performs all user interaction on an SSL-secured HTTPS connection (port 443). For better performance, all data transfer operations are performed on a plain HTTP connection (port 80). All data is encrypted before transmission, so sending that data stream on a standard connection does not pose a security risk. According to REST conventions, downloads are provided as a two-step process: step one creates a REST resource (a URL), and step two uses that resource to download the requested data. Upon successful completion of the download (or after 7 days) the download resource URL is removed again.

A request to download data includes providing user credentials so that the system can verify access rights. In this step the user specifies the data to be downloaded as well as an encryption key, which is used to initialize the outgoing encrypted stream. This ensures that the data arrives on the user's system encrypted with a key provided by the user. This information is combined with the user IP address to create a REST resource URL. The resource is identified by a "ticket", which is sent to the user in response to a download request.

The download ticket (the corresponding resource URL) is the only information required to start downloading the requested data. The EGA server matches the IP address of the request and the IP address of the download to ensure that data can only be downloaded by the requestor. The response to accessing the download URL is a binary stream, which is stored on the user's system. Upon successful download, that download resource is removed again. The download resource also is removed automatically after 7 days.

The API returns primarily simple JSON Arrays (containing a list of response items), occasionally JSON Objects (containing key-value pairs) as response. Download URLs return a binary stream as response.

 

3.1 Using the API to log in

Successful login produces a session id/token, which must be provided in subsequent REST calls. The session token times out and becomes invalid after 10 minutes of inactivity. There are three ways to log in via the API:

(1) submitting a named form,

(2) Basic Authentication,

(3) via URL + Parameter.

 

Examples using “testuser@ebi.ac.uk” as username and “testpassword” for that user’s password:

1) Login by submitting a form named "loginrequest" with URLEncoded fields for "username" and "password":

curl -k -X POST -F loginrequest='{"username":"testuser%40ebi.ac.uk","password":"testpassword"}' -H "Accept: application/json" https://ega.ebi.ac.uk/ega/rest/access/v2/users/login

 

2) Login with Basic Authentication (using base 64 encoded credentials “testuser%40ebi.ac.uk:testpassword”):

 curl -k -H "Accept: application/json" -H "Authorization: Basic dGVzdHVzZXIlNDBlYmkuYWMudWs6dGVzdHBhc3N3b3Jk" https://ega.ebi.ac.uk/ega/rest/access/v2/users/login

 

3) Login via URL + Parameter; the username is part of the URL, the password is passed as URL parameter:

curl -k -H "Accept: application/json" "https://ega.ebi.ac.uk/ega/rest/access/v2/users/testuser%40ebi.ac.uk?pass=testpassword"
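The Basic Authentication value used in example 2) can be reproduced as follows: the username is URL-encoded first, joined to the password with ":", and the result is base 64 encoded. The function name is hypothetical; the encoding steps follow the example above.

```python
import base64
from urllib.parse import quote


def basic_auth_header(username: str, password: str) -> str:
    """Build the Authorization header value as used in the login example."""
    # URL-encode the username first ('@' becomes '%40'), as in the example.
    encoded_user = quote(username, safe="")
    raw = f"{encoded_user}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(raw).decode("ascii")


header = basic_auth_header("testuser@ebi.ac.uk", "testpassword")
print(header)  # prints: Basic dGVzdHVzZXIlNDBlYmkuYWMudWs6dGVzdHBhc3N3b3Jk
```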

 

Any of these three calls returns a JSONArray with two elements, in case of success, and with one element in case of failure:

{"header":{"apiVersion":"v2","code":"200","docLink":"http://www.ebi.ac.uk/ega","errorCode":"200","errorStack":"","service":"access","technicalMessage":"","userMessage":"OK"},"response":{"numTotalResults":1,"result":["success","b195b0c5-b574-43f2-9910-37d5853826ba"],"resultType":"us.monoid.json.JSONArray"}}

 

The interesting part is the JSON object "response":

"response":{"numTotalResults":1,"result":["success","b195b0c5-b574-43f2-9910-37d5853826ba"],"resultType":"us.monoid.json.JSONArray"}

It contains the JSON array "result":

"result":["success","b195b0c5-b574-43f2-9910-37d5853826ba"]

The first element is "success"/"false". If it is "success", there is a second element, which contains the session token. In future REST calls this token is added to the REST URL as parameter '?session=b195b0c5-b574-43f2-9910-37d5853826ba'.

In case of failure the array in element "result" contains a single failure error message.
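Extracting the session token from the login response takes only a few lines. The sketch below follows the response layout shown above; the function name and the choice of exception are hypothetical, and the failure body used to exercise it is an assumption, since only its shape (a single message in "result") is documented here.

```python
import json


def extract_session_token(body: str) -> str:
    """Return the session token from a login response, or raise on failure."""
    result = json.loads(body)["response"]["result"]
    # Success: ["success", "<session token>"]; failure: a single message.
    if len(result) == 2 and result[0] == "success":
        return result[1]
    raise RuntimeError(f"login failed: {result}")


def with_session(url: str, token: str) -> str:
    """Append the session token to a REST URL as '?session=<token>'."""
    return f"{url}?session={token}"
```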

 

3.1.1 Logging out

This is not required, as a session times out after 10 minutes anyway. But it is cleaner to explicitly end a session when API interaction has completed. This is done with a REST call to ‘logout’:

curl -k -H "Accept: application/json" https://ega.ebi.ac.uk/ega/rest/access/v2/users/logout?session=b195b0c5-b...

 

This should return:

{"header":{"apiVersion":"v2","code":"200","docLink":"http://www.ebi.ac.uk/ega","errorCode":"200","errorStack":"","service":"access","technicalMessage":"","userMessage":"OK"},"response":{"numTotalResults":1,"result":["logged out"],"resultType":"us.monoid.json.JSONArray"}}

The important part is the message "logged out", indicating a successful logout. Sessions expire after 10 minutes of inactivity.

  

3.2 Listing all authorised Datasets

Authorisation information is available for a valid user session. A call to ‘datasets’ lists all authorized datasets for the user:

curl -k -H "Accept: application/json" https://ega.ebi.ac.uk/ega/rest/access/v2/datasets?session=b195b0c5-b574-...

 

The result in the "result" element is a JSONArray of Strings, containing authorized Dataset IDs.

 

3.2.1 Listing files in an authorized Dataset

A call to ‘files’ requires specification of an authorized dataset_stable_id in the URL (otherwise no results are returned):

curl -k -H "Accept: application/json" https://ega.ebi.ac.uk/ega/rest/access/v2/datasets/{dataset}/files?session=b195b0c5-b574-43f2-9910-37d5853826ba

 

Where {dataset} is the dataset_stable_id of a dataset.

 

The result in the "result" element is a JSONArray of EgaFile objects, containing file information about all available and pending files in the specified authorized dataset.

 

Example:

 

curl -k -H "Accept: application/json" https://ega.ebi.ac.uk/ega/rest/access/v2/datasets/EGAD00010000805/files?... b195b0c5-b574-43f2-9910-37d5853826ba

This produces an ArrayList of EgaFile JSON objects, in this case containing 12 elements (only one is shown in this example):

{"header":{"apiVersion":"v2","code":"200","docLink":"http://www.ebi.ac.uk/ega","errorCode":"200","errorStack":"","service":"access","technicalMessage":"","userMessage":"OK"},

"response":{"numTotalResults":12,"result":[{"fileDataset":"EGAD00010000805","fileID":"EGAF00000867414","fileIndex":"keke.txt","fileMD5":"TODO: MD5","fileName":"/arrays/331-01-3TD.CEL.cip","fileSize":"69084299","fileStatus":"available"}, […]  ],"resultType":"us.monoid.json.JSONArray"}}

 

The EgaFile object contains information about one file:

{

"fileDataset":"EGAD00010000805",

"fileID":"EGAF00000867414",

"fileIndex":"keke.txt",

"fileMD5":"TODO: MD5",

"fileName":"/arrays/331-01-3TD.CEL.cip",

"fileSize":"69084299",

"fileStatus":"available"

}

 

(The fileIndex and fileMD5 fields refer to future functionality).
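A client will usually want only the files that can be requested now, with sizes as numbers rather than strings. The sketch below filters a 'files' response on the "fileStatus" value "available" shown above. The function name is hypothetical, and the status value of pending files is not shown in this document, so anything other than "available" is simply skipped.

```python
import json


def available_files(body: str) -> list:
    """Return (fileID, fileName, size) for every available file in the response."""
    files = json.loads(body)["response"]["result"]
    return [
        (f["fileID"], f["fileName"], int(f["fileSize"]))  # fileSize is sent as a string
        for f in files
        if f["fileStatus"] == "available"
    ]
```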

 

 

3.2.2 Listing Requests (i.e. listing all current request tickets)

A call to ‘requests’ lists all requests containing files that have not been downloaded yet.

curl -k -H "Accept: application/json" https://ega.ebi.ac.uk/ega/rest/access/v2/requests?session=b195b0c5-b574-...

 

This produces an ArrayList of EgaTicket JSON objects, listing all information about currently requested files (this can be a very long list):

{"header":{"apiVersion":"v2","code":"200","docLink":"http://www.ebi.ac.uk/ega","errorCode":"200","errorStack":"","service":"access","technicalMessage":"","userMessage":"OK"},"response":{"numTotalResults":2413,"result":[{"encryptionKey":"","fileID":"EGAF00000098885","fileName":"/WTCCC2_PE/raw/a520532-00-791321-072009-4059643-01970.CEL.gz.gpg ","fileSize":"3693067150","fileType":"EBI","label":"ltest3","ticket":"0ab20948-cba2-48a7-baf7-da871b738665","transferTarget":"","transferType":"","user":"asenf@ebi.ac.uk"}, [...] ],"resultType":"us.monoid.json.JSONObject"}}

 

The EgaTicket object contains information about one requested ticket:

 

{

"encryptionKey":"",

"fileID":"EGAF00000098885",

"fileName":"/WTCCC2_PE/raw/a520532-00-791321-072009-4059643-01970.CEL.gz.gpg ",

"fileSize":"3693067150",

"fileType":"EBI",

"label":"ltest3",

"ticket":"0ab20948-cba2-48a7-baf7-da871b738665",

"transferTarget":"",

"transferType":"",

"user":"testuser@ebi.ac.uk"

}

 

The encryptionKey field is always empty, for security reasons. transferTarget and transferType are also empty and refer to future functionality. The ticket is used to form the download URL.
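Since the list of tickets can be very long, a script will typically group the tickets by request label and turn each ticket into a download URL. The sketch below works on the "result" list of EgaTicket objects. The function names are hypothetical, and the download base URL is passed in as a parameter rather than assumed.

```python
from collections import defaultdict


def tickets_by_label(tickets: list) -> dict:
    """Group ticket UUIDs by the request label they belong to."""
    grouped = defaultdict(list)
    for ticket in tickets:
        grouped[ticket["label"]].append(ticket["ticket"])
    return dict(grouped)


def download_url(base_url: str, ticket: str) -> str:
    """Form the download URL '<base>/downloads/<ticket>' for one ticket."""
    return f"{base_url.rstrip('/')}/downloads/{ticket}"
```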

 

3.2.3 Listing tickets in one Request (which contains information about the requested files)

 

Specifying a request in the URL lists all the individual request tickets (which are part of the download URL) in that request:

 

curl -k -H "Accept: application/json" https://ega.ebi.ac.uk/ega/rest/access/v2/requests/{requestlabel}?session=b195b0c5-b574-43f2-9910-37d5853826ba

 

Where {requestlabel} is the label of one user request.

 

The result has the same format as in the previous section, but only tickets whose ‘label’ matches the provided {requestlabel} are included.

 

 

3.3 Deleting a Request

A call to ‘delete’ removes the specified request from the server:

curl -k -H "Accept: application/json" https://ega.ebi.ac.uk/ega/rest/access/v2/requests/delete/{requestlabel}?session=b195b0c5-b574-43f2-9910-37d5853826ba

 

Where {requestlabel} is the label of one user request.

 

 

3.3.1 Deleting a Request Ticket

If one specific request ticket is specified in the delete call, only that request ticket is deleted from the server:

curl -k -H "Accept: application/json" https://ega.ebi.ac.uk/ega/rest/access/v2/requests/delete/{requestlabel}/{ticket}?session=b195b0c5-b574-43f2-9910-37d5853826ba

 

Where {requestlabel} is the label of a user request, and {ticket} is the uuid of a ticket in that request.

 

3.4 Making a Dataset Request

Before any data can be downloaded that data has to be requested. This creates download links for all specified files (for example, all files in a dataset) and deposits the encryption key to be used for this data on the server. At the moment the “downloadType” must always be “STREAM”:

 

curl -k -X POST -F downloadrequest='{"rekey":"{user_re_encryption_key}","downloadType":"STREAM","descriptor":"{requestlabel}"}' -H "Accept: application/json" https://ega.ebi.ac.uk/ega/rest/access/v2/requests/new/datasets/{datasetid}?session=b195b0c5-b574-43f2-9910-37d5853826ba

 

Where {datasetid} is a dataset_stable_id. The “descriptor” is the label by which the request is listed later, and by which it can be downloaded (later also referred to as ‘request’ or ‘request label’).
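Building the "downloadrequest" form field with a JSON library avoids quoting mistakes when the key or label contains special characters. The sketch below produces the field value and the request URL in the form used by the curl command above; the function names are hypothetical.

```python
import json
from urllib.parse import quote, urlencode

# Access base URL as given in this document.
ACCESS_BASE = "https://ega.ebi.ac.uk/ega/rest/access/v2"


def download_request_field(rekey: str, label: str) -> str:
    """JSON value for the 'downloadrequest' form field."""
    payload = {"rekey": rekey, "downloadType": "STREAM", "descriptor": label}
    return json.dumps(payload, separators=(",", ":"))


def new_request_url(kind: str, stable_id: str, session: str) -> str:
    """URL for a new request; kind is 'datasets' or 'files'."""
    if kind not in ("datasets", "files"):
        raise ValueError("kind must be 'datasets' or 'files'")
    path_id = quote(stable_id, safe="")
    query = urlencode({"session": session})
    return f"{ACCESS_BASE}/requests/new/{kind}/{path_id}?{query}"
```

The same two functions cover file requests: pass "files" and a file stable ID to new_request_url.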

 

3.5 Making a File Request

 

Individual files can be requested by specifying the file stable ID in the request URL:

 

curl -k -X POST -F downloadrequest='{"rekey":"{user_re_encryption_key}","downloadType":"STREAM","descriptor":"{requestlabel}"}' -H "Accept: application/json" https://ega.ebi.ac.uk/ega/rest/access/v2/requests/new/files/{fileid}?session=b195b0c5-b574-43f2-9910-37d5853826ba

 

Where {fileid} is a file_stable_id.

 

3.6 Downloading a Ticket

Each requested file has its own download URL, identified by the request ticket for that file.

Downloading data in a request essentially means to stream each download URL associated with that request. Note that this is plain HTTP:

curl -H "Accept: application/octet-stream" http://ega.ebi.ac.uk/ega/rest/ds/v2/downloads/{downloadticket}

This produces a binary data stream, which is the file specified by the ticket, encrypted using the password specified at the time the request was made.
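Files can be large, so the binary stream should be written to disk in chunks rather than read into memory. The sketch below saves any file-like stream and computes the MD5 and size of the received bytes on the fly. The function name is hypothetical; in practice the stream could be the object returned by urllib.request.urlopen for a download URL.

```python
import hashlib


def save_stream(stream, dest_path: str, chunk_size: int = 1024 * 1024):
    """Write a binary stream to dest_path; return (md5_hex, size_in_bytes)."""
    digest = hashlib.md5()
    size = 0
    with open(dest_path, "wb") as out:
        while True:
            chunk = stream.read(chunk_size)
            if not chunk:
                break  # end of stream
            out.write(chunk)
            digest.update(chunk)  # MD5 of the encrypted bytes as received
            size += len(chunk)
    return digest.hexdigest(), size
```

The returned MD5 refers to the encrypted stream as received, which is the value needed for verifying the download.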

 

3.7 Verifying a Download

Downloads are verified at each stage by MD5 checksums. If a file has been successfully streamed to completion from the download server, the MD5 of the stream that was sent is stored temporarily.

Calling the ‘results’ URL for a download ticket, after the download is complete, retrieves that MD5 (and optionally the local MD5 can be submitted).

This way the MD5 of the received data file can be verified:

curl -H "Accept: application/json" http://ega.ebi.ac.uk/ega/rest/ds/v2/results/{downloadticket}[?md5={local_md5}]

If the local MD5 is provided in the request (optional), then the server will also know if the download was correct.

The result (in the "result" element) is a JSONArray containing the server MD5 and the size of the file that was sent.
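The comparison itself is simple once the "result" array has been retrieved. The sketch below builds the 'results' URL with the optional md5 parameter and compares the server values with local ones. The function names are hypothetical, the base URL is passed in as a parameter, and the order of the array elements (MD5 first, then size) is an assumption based on the description above.

```python
from urllib.parse import urlencode


def results_url(base_url: str, ticket: str, local_md5: str = None) -> str:
    """URL for the download statistics of one ticket, optionally with the local MD5."""
    url = f"{base_url.rstrip('/')}/results/{ticket}"
    if local_md5:
        url += "?" + urlencode({"md5": local_md5})
    return url


def download_matches(result: list, local_md5: str, local_size: int) -> bool:
    """Compare the server's [md5, size] result with the local file's values."""
    # Assumption: the array holds the MD5 first and the size second.
    server_md5, server_size = result[0], int(result[1])
    return server_md5.lower() == local_md5.lower() and server_size == local_size
```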

 

3.8 Full API Overview

 

3.8.1 Access - https://ega.ebi.ac.uk/ega/rest/access/v2

POST /users/login Preferred way to log in

["loginrequest":{"username":"{username}","password":"{password}"}]

GET  /users/{user}?pass=<pass> 'user' and 'pass' URLEncoded

GET  /users/login Log in via Basic Auth header

GET  /users/logout?session={id} Log out of specified session

GET  /datasets?session={id} List all authorized datasets

GET  /datasets/{dataset}/files?session={id} List all available/pending files in an authorized dataset

GET  /files/{fileid}?session={id} List details about one authorized file

GET  /requests?session={id} List all Requests

GET  /requests/{requestlabel}?session={id} List specified Request

GET  /requests/ticket/{ticket}?session={id} List details on specified request Ticket

GET  /requests/ticket/delete/{ticket}?session={id} Delete specified request Ticket

GET  /requests/delete/{requestlabel}?session={id} Delete specified Request

GET  /requests/delete/{requestlabel}/{ticket}?session={id} Delete specified request Ticket

POST /requests/new/datasets/{datasetid}?session={id} New Request: one entire authorized dataset

["downloadrequest":{"rekey":"{user_re_encryption_key}","downloadType":"STREAM","descriptor":"{requestlabel}"}]

POST /requests/new/files/{fileid}?session={id} New Request: one individual authorized file

["downloadrequest":{"rekey":"{user_re_encryption_key}","downloadType":"STREAM","descriptor":"{requestlabel}"}]

 

3.8.2 Download - http://ega.ebi.ac.uk/ega/rest/download/v2

GET  /downloads/{downloadticket}   Start downloading a re-encrypted file

GET  /results/{downloadticket}[?md5={local_md5}] Obtain server statistics (size, md5) of the download after completion

GET  /metadata/{dataset} Obtain metadata .tar.gz packet for selected Dataset (also publicly available on the website)

 

4.0 API Wrapper Class

EGA provides a Java API Wrapper class to make it easy to integrate interaction with this API into Java programs.

The functionality of the API is available as functions of this class. In some cases multiple logical API call sequences are combined into class functions.

It also provides value-added functionality for decrypting data and interaction with the new EGA Globus Transfer API.

The class is EgaAPIWrapper and is part of the EgaAPIWrapper.jar package.

 

4.1 Instantiation

This object is instantiated by providing the REST service URLs to be used. Usually that is just “ega.ebi.ac.uk”.

There is an Information Service URL and a Data Service URL, to allow for the option to use two different URLs. The SSL option should always be set to true.

new EgaDBAPIWrapper("ega.ebi.ac.uk", "ega.ebi.ac.uk", true);

 

Optionally, the EGA Globus API server and base URL can also be specified (the values shown are the defaults).

new EgaDBAPIWrapper("ega.ebi.ac.uk", "ega.ebi.ac.uk", true, "EGA-globus-server.ebi.ac.uk:8112", "/ega/rest/globus/v2");

 

And there’s also a constructor that logs in to EGA directly:

new EgaDBAPIWrapper("ega.ebi.ac.uk", "ega.ebi.ac.uk", "testuser@ebi.ac.uk", "testpassword".toCharArray());

 

Once the object (EgaDBAPIWrapper apiW = new…) is instantiated, one can log in/out:

apiW.login("testuser@ebi.ac.uk", "testpassword".toCharArray());

apiW.logout();

 

4.2 Listing, Requesting, Download

Listing information is fairly self-explanatory:

String[] ds = apiW.listDatasets();

EgaFile[] fs = apiW.listDatasetFiles("{dataset stable ID}");

EgaFile f = apiW.listFileInfo("{file stable ID}");

EgaTicket[] ts = apiW.listRequests();

EgaTicket t = apiW.listTicketDetails("{ticket uuid}");

 

Data can then be requested using this function:

String[] ts = apiW.requestByID("{id}", "{type}", "{key}", "{desc}");

Where:

{id} = file stable id, or dataset stable id

{type} = “file” or “dataset”

{key} = encryption key for this request (chosen by the user)

{desc} = the label for this request (chosen by the user)

 

Requests and individual tickets can be deleted:

String[] dr = apiW.delete_request("{request label}");

String[] dr = apiW.delete_ticket("{label}", "{ticket uuid}");

 

Tickets refer to individual requested files, and can be downloaded:

String[] dl = apiW.download_tcp_url("{ticket}", "{filename}", "");

Where

{ticket} = the ticket uuid to be downloaded

{filename} = the name used to save the downloaded content

 

An alternative download function is used for optional UDT transfers (can also be used for TCP):

String[] dl = apiW.download_netty("{ticket}", "{filename}", "");

 

Downloading is an aggregate function that performs several individual steps: 

• It creates a file with the given name plus the extension “.egastream”. 

• Then the binary data stream is saved in that file. 

• During the transfer the MD5 of that stream is calculated. 

• Upon (error-free) completion of the transfer the MD5 is compared with the MD5 calculated on the download server. 

• If the checksums match, the “.egastream” extension is dropped to indicate that the download was successful. If the server can’t be reached for MD5 comparison, the “.egastream” extension remains (the download is probably correct). 

• If the checksums are different, the local file is deleted.
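
The steps above can be sketched in plain Java as follows. This is an illustration of the described behaviour, not the wrapper's actual implementation: the class and method names are hypothetical, the server-side MD5 is passed in by the caller rather than fetched, and a null value stands for the case where the server could not be reached.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.security.DigestInputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical sketch of the save / checksum / rename-or-delete sequence.
final class VerifiedDownload {
    private VerifiedDownload() {}

    /**
     * Saves the stream as "<target>.egastream" and compares its MD5 with serverMd5.
     * Returns the file left on disk, or null if it was deleted after a mismatch.
     */
    static Path save(InputStream in, Path target, String serverMd5)
            throws IOException, NoSuchAlgorithmException {
        Path temp = target.resolveSibling(target.getFileName() + ".egastream");
        MessageDigest md5 = MessageDigest.getInstance("MD5");
        // The digest is updated while the stream is being written to disk.
        try (DigestInputStream din = new DigestInputStream(in, md5)) {
            Files.copy(din, temp, StandardCopyOption.REPLACE_EXISTING);
        }
        String localMd5 = toHex(md5.digest());
        if (serverMd5 == null) {
            return temp; // no comparison possible: keep the ".egastream" file
        }
        if (localMd5.equalsIgnoreCase(serverMd5)) {
            // Checksums match: drop the ".egastream" extension.
            return Files.move(temp, target, StandardCopyOption.REPLACE_EXISTING);
        }
        Files.delete(temp); // checksums differ: remove the local file
        return null;
    }

    private static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }
}
```

Writing to a temporary name first means a file without the “.egastream” extension is only ever present after its checksum has been confirmed.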

 

There are a few options available for the download:

Boolean v = apiW.setSetPath("{path}");

String path = apiW.getPath();

Where {path} is an absolute path to store the downloaded files.

 

Setting UDT as data transfer choice:

apiW.setUdt({value});

Boolean v = apiW.getUdt();

Where {value} is Boolean true or false.

 

4.3 Decryption

Downloaded files can be decrypted. A user does not have to be logged in to EGA to decrypt files:

apiW.decrypt("{key}", "{dest}", {List<files>}, 128, {delete});

Where

{key} = the encryption key chosen by the user upon request

{dest} = the filename of the decrypted file

{List<files>} = a list of absolute file paths to be decrypted

{delete} = true/false – delete encrypted files after decryption

The path option also works for decryption.
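
One way to prepare the file list for decryption is to collect the encrypted files from the download directory. The sketch below is an assumption-laden illustration: the class and method names are hypothetical, it assumes the list holds absolute path strings, and it relies on downloaded files carrying the .cip suffix.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper: gathers encrypted downloads from one directory.
final class CipFiles {
    private CipFiles() {}

    // Returns the absolute paths of all regular files ending in ".cip", sorted by name.
    static List<String> absolutePaths(Path dir) throws IOException {
        try (Stream<Path> entries = Files.list(dir)) {
            return entries
                    .filter(Files::isRegularFile)
                    .filter(p -> p.getFileName().toString().endsWith(".cip"))
                    .map(p -> p.toAbsolutePath().toString())
                    .sorted()
                    .collect(Collectors.toList());
        }
    }
}
```

Files that still carry the “.egastream” extension are skipped by the suffix check, so only completed downloads end up in the list.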

 

4.4 Globus Online

The Wrapper may also be used to interact with EGA - Globus Online services:

Boolean g = apiW.globusLogin("{g_user}", "{g_pas}".toCharArray());

Where

{g_user}  = your Globus Online username

{g_pas} = your Globus Online password

 

This function calls the Globus Online REST API to authenticate the user directly – user credentials are never sent to EGA.

On successful authentication the Globus Online API will return an OAuth 2.0 token which is passed to EGA to act on the user’s behalf for a data transfer.

To initiate a Globus Online transfer:

String id = apiW.globusStartTransfer("{request}", "{endpoint}");

Where

{request} = a request label

{endpoint} = a GridFTP endpoint where you can authenticate

 

This function then uses your OAuth token to authenticate you at your endpoint and sets up a subdirectory named after the request label.

Files are then transferred from the EGA endpoint (“staging”) to your endpoint, in 500 GB steps.