Uploading data files

Data files must be uploaded before they can be submitted. You can upload them either with the Java web start application available from Webin or by using FTP or Aspera. For FTP or Aspera uploads, please calculate the MD5 checksum of each file that you upload.

Use Webin Java web start application

Data files can be uploaded using a Java web start application downloadable from Webin:

Webin File Upload

  1. Launch Webin
  2. Go to the New Submission Page
  3. Click Launch Uploader to download the Java web start application.
  4. Launch the Java web start application. Mac users should follow the instructions further below.
  5. Enter your password into the Password field.
  6. Browse into the Upload Directory containing the data files you want to upload using the ... button. The list of all the files contained in the selected directory will be displayed.
  7. Choose Override option if you want to replace any existing files.
  8. Choose Upload Tree option if you want to create a directory in your upload area with the same name as the directory containing your data files. Otherwise the files will be uploaded into the root directory of your upload area.
  9. Select the files you want to upload. You can use the Select All button to select all the files for upload.
  10. Click on the Upload button.

Mac users should follow the instructions below to execute the Java web start application.

After selecting the Upload File option, the following dialog box will be displayed: 

image

Select the ‘Save File’ option to save the ‘WebinUploader.jnlp’ file to your default download directory.

If you selected the ‘Open with’ option instead of the ‘Save File’ option then the following dialog box will be displayed:

image

In this case please select ‘OK’. This will save the ‘WebinUploader.jnlp’ file to your default download directory.

In order to run the File Uploader application, open your file explorer and go to the directory where the ‘WebinUploader.jnlp’ file has been saved.

While pressing the ‘ctrl’ button, select the ‘WebinUploader.jnlp’ file then select the ‘open’ option. The following dialog will now be displayed:

image

Now select the ‘Open’ button. The File Uploader application will then be launched.

 

Use Windows 7 Explorer

  1. Right click 'Computer' and select 'Add a network location' from the menu
    image
  2. Click 'Next'
    image
  3. Select 'Choose a custom network location' and click 'Next'
    image
  4. Type ftp://webin.ebi.ac.uk in the 'Internet or network address' field and click 'Next'
    image
  5. Unselect 'Log on anonymously', type your Webin user name in the 'User name' field and click 'Next'
    image
  6. Type a name for the network location to show in Windows Explorer, e.g. 'webin.ebi.ac.uk', and click 'Next'
    image
  7. Click 'Finish'
    image
  8. When using the new folder you will be prompted for your Webin password. Type your password and click 'Log on'
    image

Use FileZilla FTP client

  1. Download and install FileZilla (if you are not administrator of your computer, download the portable version of FileZilla).
  2. Make sure you are using binary mode: Transfer menu -> Transfer Type -> Binary (change from Auto).
  3. Use webin.ebi.ac.uk as the host.
  4. Use the username and password associated with your Webin submission account. The username can be either the name of your submission account (Webin-N) or your email address.
  5. Click Quickconnect.
  6. Search for the file(s) you want to upload using the tree on the left panel.
  7. Create directories in your drop box (if necessary) using the tree on the right panel.
  8. Drag and drop the files you want to upload from the lower left panel to the lower right panel.
  9. Once your transfer is successful, close the application.

Use FTP command line client on Linux/Unix

  1. Open a terminal and type 'ftp webin.ebi.ac.uk'.
  2. Enter the username and password associated with your Webin submission account.
  3. Type bin to use binary mode.
  4. Type 'ls' command to check the content of your drop box.
  5. Type 'prompt' to switch off confirmation for each file uploaded.
  6. Use 'mput' command to upload files.
  7. Use 'bye' command to exit the ftp client.
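The same steps can also be scripted instead of typed interactively. The sketch below is one possible approach, not an ENA-provided tool. The helper name ftp_batch is made up for illustration, and it assumes a classic command line ftp client that accepts the '-n' option (no automatic login) and the '-i' option (no confirmation for each file, the same effect as 'prompt'). The function only prints the FTP commands; nothing is sent until its output is piped into ftp.

```shell
# Print an FTP command script equivalent to the interactive steps above.
# Arguments: Webin user name, password, file pattern for mput.
ftp_batch() {
    printf 'user %s %s\n' "$1" "$2"   # log in (used together with ftp -n)
    printf 'binary\n'                 # same as typing 'bin'
    printf 'ls\n'                     # list the content of the drop box
    printf 'mput %s\n' "$3"           # upload all files matching the pattern
    printf 'bye\n'                    # leave the ftp client
}

# Hypothetical usage (needs network access and a real account):
#   ftp_batch Webin-N 'your-password' '*.fastq.gz' | ftp -n -i webin.ebi.ac.uk
```

Please note that a password given on the command line may be visible in your shell history, so the interactive login is preferable on shared computers.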

Use FTP command line client on Windows

  1. Start the command line interpreter: press Win-R, type cmd, hit enter
  2. Type 'ftp'
  3. Type 'open webin.ebi.ac.uk'
  4. Enter the username and password associated with your Webin submission account.
  5. Type bin to use binary mode.
  6. Type 'ls' command to check the content of your drop box.
  7. Type 'prompt' to switch off confirmation for each file uploaded.
  8. Use 'mput' command to upload files.
  9. Use 'bye' command to exit the ftp client.
  10. Use 'exit' command to exit the command line interpreter.

Use Aspera ascp command line program

Aspera is a commercial file transfer protocol that provides better transfer speeds than FTP over long distances. For short distance file transfers we recommend the use of FTP.

Download the Aspera ascp command line client from here. Please select the correct operating system. The ascp command line client is distributed as part of the Aspera Connect high-performance transfer browser plug-in.

Your command should look similar to this:

    ascp -QT -l300M -L- <file to upload> <Webin-N>@webin.ebi.ac.uk:.

 

The '-l300M' option sets the upload speed limit to 300 Mbit/s (about 37.5 MB/s). You may wish to lower this value to increase the reliability of the transfer.

The '-L-' option is for printing logs out while transferring.

The <file to upload> can be a file mask (e.g. '*.cram') or a list of files.

The <Webin-N> is your Webin submission account.
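To see how the pieces fit together, for example with a lower '-l' value and a list of files, the command can be assembled and inspected before it is run. This is only a sketch: the helper name ascp_command is made up for illustration, and it prints the command rather than executing it. File names containing spaces would need extra quoting.

```shell
# Print (rather than run) an ascp upload command so it can be checked first.
# Arguments: Webin account, rate limit (e.g. 100M), then one or more files.
ascp_command() {
    account=$1
    limit=$2
    shift 2
    # "$*" joins the remaining arguments (the files) with single spaces.
    printf 'ascp -QT -l%s -L- %s %s@webin.ebi.ac.uk:.\n' "$limit" "$*" "$account"
}

# Hypothetical usage:
#   ascp_command Webin-N 100M sample1.cram sample2.cram
```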

Submitters having era-drop-N accounts may also continue to upload files through fasp.sra.ebi.ac.uk. However, this service is planned to be retired in the future.

    ascp -QT -l300M -L- <file to upload> <era-drop-N>@fasp.sra.ebi.ac.uk:.

MD5 checksums and how to apply them

FTP transfers are not always successful, especially with large files. The MD5 hash (or checksum) functions as a compact digital fingerprint of a file. We need to know the MD5 checksum of each file as it is on your computer. If we calculate a different checksum on our side, then the transfer was corrupted or incomplete.

image

EXAMPLE:
File U937_R1.fastq.bz2 (in diagram) has a different checksum after transfer.
The file may not be 100% intact and should not be processed at this stage.
Error notification is emailed to the submitter.

When you upload the files you must also register the md5 checksum of each file so that we can check that each file transfer was properly completed. To find the checksum for a file in Linux or Mac use the ‘md5sum’ tool (on a Mac without md5sum, the built-in ‘md5’ command calculates the same checksum):

home> md5sum Solid0140_20081011_2_Lib_11_2.fastq.gz
b26854779ea34e0bc3f47219e6e079e6  Solid0140_20081011_2_Lib_11_2.fastq.gz

 

The Windows operating system has a different way of calculating md5 checksums. There are also tools to do this available for download.

There are two ways to register the md5sum of each file on your computer so that we can check if it is the same when the file has reached our servers.

1. Save each checksum in its own file (the checksum file) and upload both files (the original and the checksum file) at the same time. The checksum file has the same name as the original file except that it has “.md5” at the end.

home> md5sum Solid0140_20081011_2_Lib_11_2.fastq.gz > Solid0140_20081011_2_Lib_11_2.fastq.gz.md5
home> cat Solid0140_20081011_2_Lib_11_2.fastq.gz.md5
b26854779ea34e0bc3f47219e6e079e6  Solid0140_20081011_2_Lib_11_2.fastq.gz
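When there are many files to upload by FTP or Aspera, the checksum files can be created in a loop. A minimal sketch, assuming the md5sum tool is installed; the helper name write_md5_files is made up for illustration. It changes into each file's directory first so that the checksum file records the bare file name, as in the example above.

```shell
# Create "<name>.md5" next to each file given as an argument.
write_md5_files() {
    for f in "$@"; do
        [ -f "$f" ] || continue   # skip anything that is not a regular file
        # Run in a subshell so the caller's working directory is unchanged.
        ( cd "$(dirname "$f")" &&
          md5sum "$(basename "$f")" > "$(basename "$f").md5" )
    done
}

# Hypothetical usage:
#   write_md5_files /path/to/data/*.fastq.gz
```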

 

The ENA file uploader client (embedded inside Webin tool) will do this automatically.

image

2. The second way to register a checksum for each file is to do it at submission time. When you create your runs and experiments there is a column in the tsv/spreadsheet/web form to add the md5sum for each file. Of course you still need to calculate the checksums yourself using the methods discussed previously.
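For this second method, the file names and checksums can be listed in one go and then copied into the spreadsheet. A minimal sketch, assuming the md5sum tool is installed; the helper name md5_table and the two-column layout are illustrative only and are not the Webin spreadsheet format itself.

```shell
# Print one "file name<TAB>checksum" line for each file given as an argument.
md5_table() {
    for f in "$@"; do
        printf '%s\t%s\n' "$(basename "$f")" "$(md5sum "$f" | cut -d ' ' -f 1)"
    done
}

# Hypothetical usage:
#   md5_table *.fastq.gz > checksums.tsv
```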

image

So what happens when ENA finds a file whose checksum does not match the one that is registered?

image

You will receive a notification by email.

image

Why is there no match? The possibilities are:

1. The wrong checksum is registered. You can check this: simply calculate the checksum of your local version of the file and compare it to the one listed in the email. If the wrong checksum is registered you can upload a new checksum file (or use the upload client to re-upload the file and, by default, a new checksum file).

2. If you know the registered checksum to be correct (after carrying out step 1), then the transfer must have been incomplete. Try uploading the file again. You may need to seek out a more stable internet connection.
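The comparison described in step 1 can be done on the command line. A minimal sketch, assuming the md5sum tool is installed; the helper name checksum_matches is made up for illustration, and the checksum to compare against is the one you copy from the notification email.

```shell
# Succeed (exit status 0) if the MD5 checksum of the local file equals
# the expected value.
# Arguments: file path, expected 32-character hexadecimal checksum.
checksum_matches() {
    actual=$(md5sum "$1" | cut -d ' ' -f 1)
    [ "$actual" = "$2" ]
}

# Hypothetical usage:
#   if checksum_matches myfile.fastq.gz <checksum from email>; then
#       echo "local file matches"
#   else
#       echo "local file differs"
#   fi
```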

There is another scenario where the wrong checksum could be registered for a file. This is when a different preliminary check has failed. Some of the checks carried out include “is the BAM file readable with samtools” or “does the fastq file have 4 lines per read”. To fix these errors you need to upload a fixed file, which will have a different md5sum from the original. As before, you can upload a new checksum file (or use the upload client to re-upload the file and, by default, a new checksum file).

When updating checksums and uploading fixed files please preserve all file names and directory paths so that the system can find your replacement files and checksum files. If the run is already archived, you cannot replace the file contained in the run by uploading one of the same name into your ftp area. These solutions are only for files that have failed preliminary checks (when you receive an error notification email or when Webin displays “not archived”).

It is also possible to edit registered checksums within Webin by clicking ‘edit’ in the Runs tab. This is ideal if you only have a few problem files, but if you have many it is more practical to create checksum files and upload them to your ftp area.

image
