Running AEHTS on the EBI R-Cloud

  1. Find a dataset to analyse or upload your own data
    AEHTS can be used on the EBI R-Cloud to analyse public and private datasets available through ArrayExpress. You can search for a dataset through the ArrayExpress interface, for example by using an author's name as the search query.
    To make your own data available for analysis on the cloud, submit it to the ArrayExpress (AE) Archive, where it will remain password protected. Simple MAGE-TAB templates are provided for submission, and curators will assist with file preparation and validation.

  2. Launch the R-Cloud Workbench, register and create a new project
    The EBI R-Cloud is a service at the EBI that lets R users log in and run distributed computational jobs remotely on a 64-bit Linux cluster. It is accessed through a Java client called the ArrayExpress R/Bioconductor Workbench. Open the following address in a browser and follow the instructions on the page to download or launch the Workbench:

    When connecting for the first time you will be asked to register by choosing a username and password and providing an e-mail address; these let you retrieve long-running projects the next time you log in. Once registered, you can log in and create a new project.

    Find your way around the workbench

  3. Run the whole pipeline with default options
    Running ArrayExpressHTS within R with default options is straightforward: make a single call to the function ArrayExpressHTS, passing the ArrayExpress accession number of the dataset you want to analyse. For example, the publicly available dataset E-GEOD-16190 comes from a study by Chepelev et al. on detecting single nucleotide variations in expressed exons of the human genome. To re-analyse these data with default options, run the following commands in the Workbench console:
    library(ArrayExpressHTS)              # load the package
    e <- ArrayExpressHTS("E-GEOD-16190")  # run the whole pipeline with default options
    When the pipeline finishes, 'e' will contain an ExpressionSet object that can then be used for downstream analysis.

    The run time of the whole pipeline depends on the size of the dataset and on how many computing nodes are allocated to you. Reports and intermediate files become available as soon as they are ready and can be viewed in the File Browsing tab. A directory is created automatically in your workspace for each sample in the dataset. Any file can be copied by dragging it from the File Browser onto, for example, your Desktop. The previous example creates the following hierarchy:
    E-GEOD-16190
    |
    |___ SRR017242
    |    |___ report
    |    |    |___ plotRawReport.html           # report on raw data
    |    |
    |    |___ tophat_out
    |         |___ accepted_hits.bam            # alignment file for this lane
    |         |___ report
    |              |___ plotAlignedReport.html  # report on aligned data
    |
    |___ other lanes...
    |
    |___ compare_report
    |    |___ plotComparedRawReport.html        # report comparing the several lanes of the experiment
    |
    |___ data
    |    |___ SRR017242.fastq                   # raw data
    |    |___ other lanes...
    |    |___ E-GEOD-16190.idf.txt
    |    |___ E-GEOD-16190.sdrf.txt
    |
    |___ esetfpkm.RData                         # ExpressionSet with expression data for the whole experiment

    To run the pipeline with non-default options, see the advanced options page.
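The reports and the saved ExpressionSet from a finished run can also be reached from the R console instead of the File Browser. The following is a minimal sketch, assuming the directory layout shown in step 3 and that the experiment directory (e.g. E-GEOD-16190) is reachable from the current working directory. The helper names `list_reports` and `load_eset` are hypothetical and not part of the ArrayExpressHTS API. Because the name of the object stored inside esetfpkm.RData is not documented here, `load_eset` searches for the file recursively, loads it into a private environment and returns the single object it contains.

```r
# Hypothetical helpers (NOT part of ArrayExpressHTS) for browsing the output
# of a finished run, assuming the directory layout shown in step 3.

# Return the paths of all HTML reports (raw, aligned and comparison reports)
# found anywhere below the experiment directory.
list_reports <- function(experiment_dir) {
  list.files(experiment_dir, pattern = "Report\\.html$",
             recursive = TRUE, full.names = TRUE)
}

# Locate esetfpkm.RData anywhere below the experiment directory and return
# the object stored in it, without needing to know the object's name.
load_eset <- function(experiment_dir, file_name = "esetfpkm.RData") {
  all_files <- list.files(experiment_dir, recursive = TRUE, full.names = TRUE)
  path <- all_files[basename(all_files) == file_name]
  if (length(path) != 1) {
    stop("expected exactly one ", file_name, " under ", experiment_dir,
         ", found ", length(path))
  }
  env <- new.env()
  loaded <- load(path, envir = env)  # load() returns the names of the restored objects
  if (length(loaded) != 1) {
    stop(path, " contains ", length(loaded), " objects; expected one")
  }
  get(loaded, envir = env)
}
```

For example, `list_reports("E-GEOD-16190")` would list the HTML reports produced so far, and `load_eset("E-GEOD-16190")` would return the saved ExpressionSet, which can then be inspected with Biobase accessors such as `exprs()` and `pData()`.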

For help

Please use our mailing list:


This page (revision 30) was last changed on 10-Jul-2014 13:13 by Andrew Tikhonov.