Read domain REST submission tutorial

This is a step-by-step tutorial for submitting, updating and publishing read domain sequence data in the European Nucleotide Archive (ENA) using the ENA REST submission tool. The tutorial has four parts: creating a new submission, updating a sample, publishing a sample, and programmatic submission using curl.

During training events you will be provided with a piece of paper containing a single word. This word is the tokenName used in this tutorial. Please substitute all occurrences of tokenName with this word.

Part 1. Create a new submission using REST API

In this task you will make a new submission, i.e. you will submit all data and metadata files at the same time. One submission.xml is required for submitting the study.xml, sample.xml, experiment.xml and run.xml files together.

Files needed for this task:

All the files needed for this tutorial can be found in the SRA.zip archive.

Data file:

  • ~/SRA/tokenName/FILES/tokenName.bam

Metadata files:

  • ~/SRA/tokenName/XML/simple/experiment.xml
  • ~/SRA/tokenName/XML/simple/run.xml
  • ~/SRA/tokenName/XML/simple/sample.xml
  • ~/SRA/tokenName/XML/simple/study.xml
  • ~/SRA/tokenName/XML/simple/submission.xml

Step 1: Calculate the MD5 checksum for the BAM file

The integrity of the file transfer is checked using an MD5 checksum of the BAM file, computed before uploading. The MD5 checksum must be stored in a file named tokenName.bam.md5. Please remember to substitute tokenName with the word provided to you on the piece of paper.

Please open a terminal session to calculate the MD5 checksum of the BAM file using the md5sum command and to store the checksum in tokenName.bam.md5 file (see Figure 1). Please do not close the terminal session before proceeding to step 2.

  1. Change the working directory:
    cd  ~/SRA/tokenName/FILES/
  2. Calculate the MD5 checksum:
    md5sum   tokenName.bam > tokenName.bam.md5 

Figure 1. Calculate the MD5 checksum for the BAM file

Step 2: Upload the BAM file and MD5 checksum file using ftp

Please use a terminal session to upload the BAM file and the MD5 checksum file to your private data drop box at ENA. Please use the same terminal session as in step 1.

  1. Start ftp session:
    ftp webin.ebi.ac.uk
  2. Please provide Webin-30 as the username. The password will be provided to you during the training session.
    Name: Webin-30
    Password:
  3. Upload your BAM file and MD5 checksum file:
    mput tokenName.bam*
  4. End your ftp session:
    bye
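If you prefer not to type into an interactive ftp session, the same upload can probably be done with curl, which is already used later in this tutorial. The sketch below is an assumption rather than an ENA-documented method: the function name, the WEBIN_USER and WEBIN_PASSWORD variables and the DRY_RUN switch are all invented for illustration.

```shell
# Hypothetical helper: upload every file given as an argument to the Webin
# drop box with curl instead of an interactive ftp session.
# Assumptions: WEBIN_USER and WEBIN_PASSWORD are environment variables you set
# yourself; DRY_RUN=1 only prints the commands without contacting the server.
upload_to_dropbox() {
    for f in "$@"; do
        if [ "${DRY_RUN:-0}" = "1" ]; then
            echo "curl -T $f ftp://webin.ebi.ac.uk/ --user ${WEBIN_USER}:********"
        else
            # -T uploads one local file; the trailing slash keeps its name.
            curl -T "$f" "ftp://webin.ebi.ac.uk/" \
                --user "${WEBIN_USER}:${WEBIN_PASSWORD}" || return 1
        fi
    done
}

# Example: DRY_RUN=1 WEBIN_USER=Webin-30 upload_to_dropbox tokenName.bam tokenName.bam.md5
```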

Step 3: Submit the metadata and data files using the ENA REST API

  1. In your terminal session, change to the submission directory
    cd ~/SRA/tokenName/XML/simple
  2. List the directory contents to see the 6 XML files available: submission.xml, study.xml, sample.xml, experiment.xml, run.xml and analysis.xml. Note that analysis.xml is not used in any of the steps; it is there as a template.
    ls -la
  3. Run the following curl command in the terminal to validate or submit the metadata and data using the REST API, replacing PASSWORD with the password you have been given.
    curl -k -F "SUBMISSION=@submission.xml" -F "STUDY=@study.xml" -F "SAMPLE=@sample.xml" -F "EXPERIMENT=@experiment.xml" -F "RUN=@run.xml" "https://wwwdev.ebi.ac.uk/ena/submit/drop-box/submit/?auth=ENA%20Webin-30%20PASSWORD"
  4. You will get an XML receipt stating whether the submission succeeded (‘Success: true’) or failed with errors (‘Success: false’). If there are any errors, please correct them and run the previous step again. Because the submission.xml file contains the VALIDATE action, the metadata files are only validated at this point, not yet submitted:
    <ACTIONS>
      <ACTION>
        <VALIDATE source="study.xml" schema="study"/>
      </ACTION>
      <ACTION>
        <VALIDATE source="sample.xml" schema="sample"/>
      </ACTION>
      <ACTION>
        <VALIDATE source="experiment.xml" schema="experiment"/>
      </ACTION>
      <ACTION>
        <VALIDATE source="run.xml" schema="run"/>
      </ACTION>
    </ACTIONS>
  5. After you get ‘Success: true’ change the word VALIDATE to ADD in the submission.xml file to submit the data and metadata files:
    <ACTIONS>
      <ACTION>
        <ADD source="study.xml" schema="study"/>
      </ACTION>
      <ACTION>
        <ADD source="sample.xml" schema="sample"/>
      </ACTION>
      <ACTION>
        <ADD source="experiment.xml" schema="experiment"/>
      </ACTION>
      <ACTION>
        <ADD source="run.xml" schema="run"/>
      </ACTION>
    </ACTIONS>
  6. After making the changes in submission.xml and saving the file, please run the curl command from step 3 again.
  7. If you receive a receipt with success = true, accession numbers have now been assigned to each metadata object.
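Steps 4 to 7 can be scripted once the receipt is saved to a file (curl's -o option writes the response to a file). The following is a minimal sketch with two invented helper names. It assumes that the receipt reports the outcome as an attribute written success="true", which this tutorial does not spell out, so check it against a real receipt first.

```shell
# Hypothetical helpers for steps 4 to 7; the receipt layout is an assumption.

# Print a copy of a submission file with every VALIDATE action turned into ADD.
validate_to_add() {
    sed 's/<VALIDATE /<ADD /g' "$1"
}

# Succeed only if the saved receipt contains success="true".
receipt_ok() {
    grep -q 'success="true"' "$1"
}

# Example (file names chosen here for illustration):
#   validate_to_add submission.xml > submission_add.xml
#   receipt_ok receipt.xml && echo "submission accepted"
```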

Please note down the single assigned sample accession (it starts with the prefix ERS). You will need it in Part 3.
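If the receipt was saved to a file, the sample accession can be pulled out with grep rather than copied by hand. This is a rough sketch: sample_accession is an invented name, and the only thing it relies on is that the accession is the text ERS followed by digits.

```shell
# Hypothetical helper: print the first sample accession (ERS followed by
# digits) found in a saved receipt file.
sample_accession() {
    grep -o 'ERS[0-9][0-9]*' "$1" | head -n 1
}

# Example: sample_accession receipt.xml
```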

Please note the following part in the submission.xml file:

<ACTION>
    <HOLD HoldUntilDate="2015-03-07Z"/>           
</ACTION>

This means that the submission will not be made public until the 7th of March 2015. For instructions on how to publish a submission please see Part 3.
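To check when a submission is due to become public, the hold date can be read back out of submission.xml. A minimal sketch, assuming the attribute is written exactly as in the snippet above; hold_until is an illustrative name.

```shell
# Hypothetical helper: print the HoldUntilDate value (without the trailing Z)
# from a submission file, e.g. 2015-03-07.
hold_until() {
    sed -n 's/.*HoldUntilDate="\([0-9-]*\).*/\1/p' "$1" | head -n 1
}

# Example: compare the result with today's date from `date +%Y-%m-%d`.
```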

Part 2. Update a sample using REST API

In this task you will update (MODIFY) the sample that you submitted in the previous task.

Files needed for this task: (Please remember to change the directory from simple to update)

~/SRA/tokenName/XML/update/sample.xml

~/SRA/tokenName/XML/update/submission.xml

Note: Step 1 and Step 2 have been done for you.

Step 1: Modify the sample.xml file

Add a common name to the sample.xml file in ~/SRA/tokenName/XML/update

<SAMPLE alias="sampleAlias_tokenName" center_name="EBI">
    <TITLE>DNA extracted from a pale-green scale found in the debris of the Roswell UFO crash in 1947.</TITLE>
    <SAMPLE_NAME>
        <TAXON_ID>123456</TAXON_ID>
        <SCIENTIFIC_NAME>Big-eyed green folks</SCIENTIFIC_NAME>
        <COMMON_NAME>Alien</COMMON_NAME>
    </SAMPLE_NAME>
    <DESCRIPTION>DNA extracted from a pale-green scale found in the debris of the Roswell UFO crash in 1947.</DESCRIPTION>
</SAMPLE>

Step 2: Modify the submission.xml file

Edit submission.xml in ~/SRA/tokenName/XML/update/ to have the MODIFY action below:

<ACTION>
 <MODIFY source="sample.xml" schema="sample" />
</ACTION>

The MODIFY action will modify the sample provided in the sample.xml.

Note: Some actions cannot be mixed in the same submission.xml. For example, ADD cannot be used together with MODIFY.
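A quick check before submitting can catch the ADD and MODIFY combination named in the note. This is only a sketch: check_actions is an invented name, and it tests just this one pair, not every combination the service may reject.

```shell
# Hypothetical pre-flight check: fail if a submission file contains both an
# ADD and a MODIFY action.
check_actions() {
    if grep -q '<ADD ' "$1" && grep -q '<MODIFY ' "$1"; then
        echo "error: $1 mixes ADD and MODIFY" >&2
        return 1
    fi
}

# Example: check_actions submission.xml && echo "actions look consistent"
```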

Step 3: Submit the sample.xml and submission.xml files using the ENA REST API

  1. Change to the update directory
    cd ~/SRA/tokenName/XML/update
  2. List the directory contents to see the 2 XML files: submission.xml and sample.xml
    ls -la
  3. Run the following curl command in the terminal to update the sample data using the REST API, replacing PASSWORD with the password you have been given.
    curl -k -F "SUBMISSION=@submission.xml" -F "SAMPLE=@sample.xml" "https://wwwdev.ebi.ac.uk/ena/submit/drop-box/submit/?auth=ENA%20Webin-30%20PASSWORD"
  4. You will get an XML receipt stating whether the submission succeeded (‘Success: true’) or failed with errors (‘Success: false’). If there are any errors, please correct them and run the curl command from step 3 again. A receipt with success = true means that the sample has been updated.

Part 3. Publish a sample using SRA REST API

In this task you will publish the sample that was submitted in Part 1 with a hold until a later date.

Files needed for this task: (Please remember to change the directory from update to publish)

~/SRA/tokenName/XML/publish/submission.xml

Step 1: Modify the submission.xml file

To publish the sample submitted in Part 1 please enter the sample accession number (Starts with ERS) as the value of the target attribute in the submission.xml RELEASE action:

<ACTION>
 <RELEASE target="insert sample accession here"/>
</ACTION>
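The accession can also be filled in from the command line. The sketch below is illustrative only: set_release_target is an invented name, and it assumes the target attribute in your submission.xml is written with straight double quotes.

```shell
# Hypothetical helper: print submission file $1 with the RELEASE target
# replaced by accession $2. Refuses values that do not start with ERS.
set_release_target() {
    case "$2" in
        ERS*) ;;
        *) echo "error: '$2' does not start with ERS" >&2; return 1 ;;
    esac
    sed "s/<RELEASE target=\"[^\"]*\"/<RELEASE target=\"$2\"/" "$1"
}

# Example: write the result to a new file, review it, then move it over the
# original submission.xml.
```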

Step 2: Submit the submission.xml using the REST API

  1. Change to the publish directory
    cd ~/SRA/tokenName/XML/publish
  2. List the directory contents to see the single XML file available: submission.xml
    ls -la
  3. Run the following curl command to publish the sample data using the REST API, replacing PASSWORD with the password you have been given.
    curl -k -F "SUBMISSION=@submission.xml" "https://wwwdev.ebi.ac.uk/ena/submit/drop-box/submit/?auth=ENA%20Webin-30%20PASSWORD"
  4. You should get an XML receipt stating whether the submission succeeded (success=true) or failed with errors (success=false). If there are any errors, please correct them and run the previous curl command again. A receipt with success = true means that you have published the data.

Part 4. Programmatic submission using curl

All REST submission functionality can be accessed programmatically using curl. The ENA REST service used in this tutorial is available at https://wwwdev.ebi.ac.uk/ena/submit/drop-box/submit/

As a username you can use your "Webin-NNN" style account id or your email address. Older accounts also have an "era-drop-NNN" style account id, which will work as well.

This URL can be used with curl to access and authenticate against the Read Domain REST API service.

Replace "Webin-30" with your own username/email address in the curl command given below. Replace "PASSWORD" with your actual password. Older account owners (era-drop-NNN) can use their original ftp password. Again the URL provided in the example is for the Development server.

curl -F "SUBMISSION=@submission.xml" -F "STUDY=@study.xml" -F "SAMPLE=@sample.xml" -F "EXPERIMENT=@experiment.xml" -F "RUN=@run.xml" "https://wwwdev.ebi.ac.uk/ena/submit/drop-box/submit/?auth=ENA%20Webin-30%20PASSWORD"

Each -F option sends one XML file to the Read Domain REST API service for processing. Allowed XML file types are: SUBMISSION (mandatory), SAMPLE, STUDY, EXPERIMENT, RUN, ANALYSIS, DAC (for EGA submissions only), POLICY (for EGA submissions only), DATASET (for EGA submissions only) and PROJECT.
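When the set of XML files differs between submissions, the -F arguments can be generated from whatever is present in the current directory. This is a sketch under assumptions: form_args is an invented name, it covers only the read domain types used in this tutorial, and it assumes each file is named after its type in lower case (study.xml for STUDY, and so on), as in the tutorial directories.

```shell
# Hypothetical helper: print one "-F TYPE=@type.xml" argument per line for
# every tutorial-style XML file that exists in the current directory.
form_args() {
    for type in SUBMISSION STUDY SAMPLE EXPERIMENT RUN ANALYSIS; do
        file="$(echo "$type" | tr 'A-Z' 'a-z').xml"
        if [ -f "$file" ]; then
            printf '%s %s=@%s\n' -F "$type" "$file"
        fi
    done
}

# Example: run form_args in ~/SRA/tokenName/XML/update to list the arguments
# for that directory before building the curl command.
```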

You will receive an XML receipt reporting whether the submission succeeded, along with other information including the assigned accession numbers.

Note that this URL contains "wwwdev", which is our development server. If you use our test server, "www-test", all changes will be overwritten on a 24-hour cycle. The production server URL starts with "www".
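Switching between servers is easier if the URL is built from its parts. The sketch below uses an invented function name and assumes that the test and production hosts follow the same pattern as the development URL shown above, with only the first part of the host name changing; please confirm the exact URLs before relying on it.

```shell
# Hypothetical helper: build the submission URL from a server prefix
# (wwwdev, www-test or www), a username and a password.
# Assumption: all three servers share the path used by the dev server.
# Note: a password containing special characters would need URL-encoding first.
submit_url() {
    printf 'https://%s.ebi.ac.uk/ena/submit/drop-box/submit/?auth=ENA%%20%s%%20%s\n' \
        "$1" "$2" "$3"
}

# Example: curl -F "SUBMISSION=@submission.xml" "$(submit_url wwwdev Webin-30 PASSWORD)"
```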
