Read domain Webin submission tutorial
This is a step-by-step tutorial for submitting read domain sequence data to the European Nucleotide Archive (ENA) using the interactive Webin submission tool. The tutorial has the following parts:
- Part 1. Upload a BAM file using FTP
- Part 2. Upload a BAM file using Aspera
- Part 3. Login to Webin
- Part 4. Upload a BAM file using WebinDataUploader
- Part 5. Submit data and metadata using Webin
You will be given a piece of paper with a single word on it. This word is your tokenName for this tutorial. Please substitute every occurrence of tokenName in this tutorial with this word.
Data files must be uploaded to your private data drop box before they can be submitted to ENA using Webin. There are three ways to upload data files to ENA: using FTP, using Aspera, or using the WebinDataUploader. We recommend that all Webin submitters use the WebinDataUploader.
Part 1. Upload a BAM file using FTP
Files needed for this task:
Files created by this task:
Step 1: Calculate the MD5 checksum for the BAM file
The integrity of the file transfer is checked using an MD5 checksum that is computed from the BAM file before it is uploaded using FTP. The MD5 checksum must be stored in a file named tokenName.bam.md5. Please remember to substitute tokenName with the word provided to you on the piece of paper. When using the WebinDataUploader, the tokenName.bam.md5 file is created for you as part of the file transfer.
Please open a terminal session, calculate the MD5 checksum of the BAM file using the md5sum command, and store the checksum in the tokenName.bam.md5 file (see Figure 1). Please do not close the terminal session before proceeding to Step 2.
- Change the working directory:
- Calculate the MD5 checksum:
md5sum tokenName.bam > tokenName.bam.md5
Figure 1. Calculate the MD5 checksum for the BAM file
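The checksum step can also be checked locally before uploading. The following is a minimal sketch, assuming GNU md5sum is installed; the helper name make_md5 and the throwaway stand-in file are illustrative only and are not part of any ENA tool.

```shell
#!/bin/sh
# make_md5 is a hypothetical helper: it writes the MD5 checksum of
# the file given as $1 into a companion file named <file>.md5.
make_md5() {
    md5sum "$1" > "$1.md5"
}

# Demonstration in a temporary directory, using a small stand-in file
# in place of the real tokenName.bam.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'example data\n' > tokenName.bam
make_md5 tokenName.bam

# md5sum -c re-reads the file and compares it with the stored checksum.
md5sum -c tokenName.bam.md5
```

With the real BAM file, only the make_md5 call and the md5sum -c check are needed; the check reports OK for each file whose content still matches its stored checksum.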
Step 2: Upload the BAM file and MD5 checksum file using ftp
Please use a terminal session to upload the BAM file and the MD5 checksum file to your private data drop box (see Figure 2). Please use the same terminal session as in step 1.
- Start ftp session:
- Please provide era-drop-30 as the username. Please note that the password will be provided to you during the training session.
Name: era-drop-30
Password:
- Upload your BAM file and MD5 checksum file:
- End your ftp session:
Figure 2. Upload the BAM and MD5 checksum file using ftp
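As a rough sketch of what an upload session involves, the helper below prints the sequence of standard ftp client commands for sending a file and its checksum file. The name ftp_upload_commands is hypothetical, and the helper only prints text; it does not open a connection, and the drop box host name is not assumed here.

```shell
#!/bin/sh
# ftp_upload_commands is a hypothetical helper. It prints the ftp client
# commands for uploading the file given as $1 together with its .md5 file.
ftp_upload_commands() {
    printf 'binary\n'            # binary mode, so the BAM file is not altered in transit
    printf 'put %s\n' "$1"       # upload the data file
    printf 'put %s\n' "$1.md5"   # upload the checksum file
    printf 'bye\n'               # end the ftp session
}

ftp_upload_commands tokenName.bam
```

The same four commands can be typed one by one at the ftp> prompt after logging in.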
Part 2. Upload a BAM file using Aspera
Aspera is a commercial file transfer protocol that provides better transfer speeds than ftp over long distances. For short-distance file transfers we recommend the use of ftp.
Please first download the Aspera ascp command line client program from here, selecting the version for your operating system. The ascp command line client is distributed as part of the Aspera Connect high-performance transfer browser plug-in.
Step 1: Calculate the MD5 checksum for the BAM file
Follow the instructions in Part 1 to calculate the MD5 checksum for the BAM file required for integrity checking.
Step 2: Upload the BAM file and the MD5 checksum file using ascp
Please use a terminal session to upload the BAM file and the MD5 checksum file to your private data drop box (see Figure 3). Please use the same terminal session as in step 1.
Your command should look similar to this:
ascp -QT -l300M -L- tokenName.bam* firstname.lastname@example.org:/.
The '-l300M' option sets the maximum transfer rate to 300 Mbit/s (ascp rates are given in bits per second, not bytes). You may wish to lower this value to increase the reliability of the transfer. The '-L-' option prints the transfer log to the terminal during the transfer.
Figure 3. Upload the BAM file and the MD5 checksum file using ascp
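To experiment with a lower rate limit, it can help to assemble the command line before running it. The sketch below only prints an ascp command and does not start a transfer; ascp_command is a hypothetical helper, and ASCP_DEST is a placeholder that must be replaced with the drop box destination given in your training material.

```shell
#!/bin/sh
# ASCP_DEST is a placeholder destination, not a real ENA address.
ASCP_DEST="${ASCP_DEST:-user@host:/.}"

# ascp_command is a hypothetical helper. $1 is the file name prefix and
# $2 is the rate limit passed to -l (for example 100M).
# It prints the command line instead of running it.
ascp_command() {
    printf 'ascp -QT -l%s -L- %s* %s\n' "$2" "$1" "$ASCP_DEST"
}

ascp_command tokenName.bam 100M
```

The trailing * after the file name makes the shell pick up both tokenName.bam and tokenName.bam.md5 when the printed command is eventually run.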
Part 3. Login to Webin
- Use a web browser to go to https://wwwdev.ebi.ac.uk/ena/submit/sra/.
- Log in using era-drop-30 as the user name and the password provided to you during the training session.
Part 4. Upload a BAM file using WebinDataUploader
Once logged into the read domain Webin you will be able to use the WebinDataUploader to upload data files into your private data drop box. Please remember that the data files must be uploaded to the drop box before they can be submitted to ENA using Webin.
- Go to the New Submission tab.
- Click the WebinDataUploader link (near the bottom of the page).
- Accept the download of the WebinDataUploader.jnlp file to your computer.
- Launch the WebinDataUploader.jnlp file.
- If a popup window shows up, please confirm that you want to run the application.
- Enter era-drop-30 as your submission account name and use the password provided to you during the training session (see Figure 4).
- Browse to the directory containing your BAM files (~/SRA/tokenName/FILES/) and click the Open button. Note: the file names are not shown in the directory selection pop-up window; once you have chosen the folder and clicked Open, the files it contains are listed in the WebinDataUploader panel (see Figure 4).
- Select the BAM file to be uploaded: tokenName.bam. You can use the Select All button to select all the files in the directory for upload.
- Click the Upload button. Your BAM file will be uploaded into your private data drop box (era-drop-30) together with an MD5 checksum file generated by the tool.
- Close the WebinDataUploader.
Part 5. Submit data and metadata using Webin
File needed for this task: ~/SRA/tokenName/FILES/run_template.tsv
Step 1: Start submission
- Go to the New Submission tab.
- Select Submit sequence reads and experiments as the type of submission.
- Click the Next >> button.
Step 2: Provide study information
- Create a new study by clicking the Create a new study link or button and fill in the following fields:
- Release date: Fill the Release date with the current date. Please note that by default submitted data is released after two months. Submissions will remain confidential until the release date.
- Short name: please use your tokenName. This will be the name of your study.
- Short descriptive title: please provide a short description of the study akin to a publication title.
- Abstract: please provide a detailed description of the study akin to a publication abstract.
- Sample(s) sequenced as part of the study: please select the first option.
- Describe genome assembly: please select the second option, 'no'.
- Provide functional genome annotation: please select the second option, 'no'.
- PubMed Ids: please leave empty.
- Attributes: please leave empty/blank.
- Click the Next >> button. This will validate the study and move you to the next step. Please correct any validation errors.
Step 3: Provide sample information
Create a new sample by filling in the following fields:
- Checklist: please click the checklist that you would like to use for your sample.
Optional: For this exercise, please select 'ENA default sample checklist' (at the bottom of the list).
Click the Next >> button.
You are now on the checklist information and basic sample information page. In the left side panel you provide the checklist information.
- Select or unselect the attributes of your choice (some of the attributes may already be selected for you).
In the right side panel you provide the basic sample information.
- Unique Name Prefix: please use your tokenName. This will be the name of your sample.
- Title: please provide a short description of the sample (e.g. DNA extracted from a pale-green scale found in the debris of the Roswell UFO incident in 1947).
- Description: please provide a detailed description of the sample. This field is optional. Please leave empty/blank.
In the Organism Details section, you can search by taxon Id, scientific name, or common name.
- Search: please provide an NCBI taxonomic identifier (e.g. 9606), the scientific name of the organism (e.g. Homo sapiens), or the common name of the organism (e.g. human). The remaining fields in the Organism Details section will be filled in automatically.
Checklist Attribute value forms (please scroll down to see more...)
- Please fill in all the checklist values.
Click the Next >> button.
Sample Group Details:
Adding one or many samples (multiple samples can be created here, but for this exercise please create one sample):
- On the left-hand side, please specify how many samples you would like to submit (e.g. 3) and click + Add.
Your basic sample information has been copied to each of the new samples. To edit an individual sample, move between the samples using the 'Previous Sample' and 'Next Sample' buttons at the top of the right panel.
Click the Next >> button.
This will validate the sample and move you to the next step. Please correct any validation errors.
Step 4: Provide experiment/run information in a tsv file
Note: You can skip this step if it has already been done for you (the data is already filled in).
- Open the run_template.tsv file (found in the ~/SRA/tokenName/FILES/ folder) with a text editor. Please note that the column values must be separated by tabs.
- Enter tokenName in the sample_alias column if this has not already been done for you. This is the name of the sample associated with the BAM file.
After editing the run_template.tsv file it may look like this:
sample_alias	instrument_model	library_name	library_source	library_layout	insert_size	file_name	file_md5
tokenName	Illumina HiSeq 2000	Roswell	GENOMIC	SINGLE	76	tokenName.bam	
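Because the columns must be separated by tabs, a quick check that every row has as many tab-separated fields as the header can catch editing mistakes before upload. This is a minimal sketch using awk; check_tsv is a hypothetical helper name, and the demonstration file simply mirrors the example row above with an empty file_md5 field.

```shell
#!/bin/sh
# check_tsv is a hypothetical helper. It reports every row whose number of
# tab-separated fields differs from the header and returns non-zero if any do.
check_tsv() {
    awk -F '\t' '
        NR == 1 { n = NF; next }
        NF != n { printf "line %d: %d columns, expected %d\n", NR, NF, n; bad = 1 }
        END     { exit bad + 0 }
    ' "$1"
}

# Demonstration on a temporary copy of the example table.
tsv=$(mktemp)
printf 'sample_alias\tinstrument_model\tlibrary_name\tlibrary_source\tlibrary_layout\tinsert_size\tfile_name\tfile_md5\n' > "$tsv"
printf 'tokenName\tIllumina HiSeq 2000\tRoswell\tGENOMIC\tSINGLE\t76\ttokenName.bam\t\n' >> "$tsv"

check_tsv "$tsv" && echo "column counts are consistent"
```

For the real file, the call would be check_tsv ~/SRA/tokenName/FILES/run_template.tsv. A row in which spaces were typed instead of tabs shows up as having too few columns.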
Step 5: Upload the experiment/run information tsv file
- Select your data file format. Please make sure you select the 'BAM' option (CRAM is selected by default).
- Click Upload / Save Spreadsheet link.
- This will expand and provide you with three options.
- Because you already have the tsv file, use option (2): upload the run_template.tsv file using Browse... (or Choose...) in the UPLOAD SPREADSHEET section.
This will auto-fill the required fields with the information provided in the tsv file. If the file cannot be processed, an error message is shown instead.
Make sure that all the fields are filled in correctly. In particular, check that the Sample reference is filled in from the Sample reference suggestions and that the correct file name is used. Please note that the MD5 checksum field can be left empty if the MD5 checksum file was uploaded together with the BAM file, for example if you used the WebinDataUploader in Part 4.
Step 6: Complete the submission
Click the Submit button.
This will validate the run and complete the submission. Please correct any validation errors.
You will be presented with a confirmation pop-up window. Please select OK.
On successful submission you will see a receipt with details about your submission.