
affyParaEBI

A distributed computing package for RMA pre-processing of Affymetrix microarrays on the ArrayExpress R/Bioconductor Workbench.


affyParaEBI (aPE) is an R/Bioconductor-based pipeline for fast pre-processing of Affymetrix chips. It lets users quickly pre-process large numbers of CEL files into Bioconductor ExpressionSet objects for downstream analysis.

aPE runs remotely on the EBI R-Cloud and can pre-process either your own local datasets or public datasets from the ArrayExpress Archive of Functional Genomics Data at the EBI.

Running aPE on the EBI R-Cloud

  1. Choose a dataset to preprocess or upload your own data
    aPE can pre-process private datasets on the EBI R-Cloud once you upload their CEL files to your R/Bioconductor Workbench account.
    It can also pre-process public datasets available through ArrayExpress; you can search for a dataset through the ArrayExpress Archive interface.

  2. Launch the R-Cloud Workbench, register and create a new project
    The EBI R-Cloud is a new EBI service that lets R users log in and run distributed computational jobs remotely on its 64-bit Linux cluster. It is accessed through a Java client, the ArrayExpress R/Bioconductor Workbench. Open the following address in a browser and follow the instructions on the page to download or launch the Workbench:

    When you connect for the first time you will be asked to register. Choose a username and password and provide an e-mail address; these let you retrieve your long-running projects the next time you log in. Once registered, you can log in and create a new project.


  3. Pre-process your data
    If your CEL files are in a folder 'td', you can run affyParaEBI from within R with the following code:
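        library(affyParaEBI)
        td <- "td"                                       # folder holding the CEL files
        cluster <- makeCluster(10, type = "RCLOUD")      # allocate 10 R-Cloud nodes
        e <- preproParaEBI(path = td, clust = cluster)   # parallel RMA pre-processing
        stopCluster(cluster); rm(cluster)                # release the nodes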

    When the pipeline finishes, 'e' will contain an ExpressionSet object that can then be used for downstream analysis.

    Pre-processing time depends on the size of the dataset and on the number of computing nodes allocated with makeCluster (10 in the example above). As a rough guide, 50 nodes can pre-process 1000 CEL files in about 20 minutes.

    To pre-process one or more experiments from ArrayExpress, you can first download their CEL files to a folder with the ArrayExpress R/Bioconductor package, as in this example for experiment E-MEXP-328:
        library(ArrayExpress)
        library(affyParaEBI)
        # 1) Create a two-node cluster
        cluster <- makeCluster(2, type = "RCLOUD")
        # 2) Retrieve the CEL files of experiment E-MEXP-328;
        #    save = TRUE keeps the downloaded files in td
        td <- tempdir()
        emexp328.raw <- ArrayExpress("E-MEXP-328", path = td, save = TRUE)
        # 3) Replace low-performance nodes
        cluster <- clusterOptimization(cluster, subst = TRUE)
        # 4) Perform parallel pre-processing of the CEL files
        emexp328.proc <- preproParaEBI(path = td, clust = cluster)
        # 5) Clean up the cluster nodes and the downloaded files
        stopCluster(cluster); rm(cluster)
        file.remove(list.files(td, full.names = TRUE))
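Both examples hand preproParaEBI a folder through its path argument, so it can be worth confirming that the folder really holds CEL files before allocating R-Cloud nodes. The base-R sketch below performs that check. The helper name listCelFiles and the assumption that raw files end in .CEL (optionally gzip-compressed as .CEL.gz) are illustrative only; they are not part of affyParaEBI, whose own file-selection rules are not described here.

```r
# Hypothetical helper, not part of affyParaEBI: list the CEL files in a folder.
# Assumes raw files end in ".CEL", optionally gzipped (".CEL.gz"), in any case.
listCelFiles <- function(path) {
  list.files(path, pattern = "\\.cel(\\.gz)?$", ignore.case = TRUE,
             full.names = TRUE)
}

# Demonstration on a scratch folder holding empty placeholder files
demoDir <- file.path(tempdir(), "cel_demo")
dir.create(demoDir, showWarnings = FALSE)
invisible(file.create(file.path(demoDir, c("a.CEL", "b.cel.gz", "notes.txt"))))

cels <- listCelFiles(demoDir)
if (length(cels) == 0) stop("no CEL files found in ", demoDir)
print(basename(cels))  # "a.CEL" "b.cel.gz"; notes.txt is skipped
```

In the first example, a call such as listCelFiles(td) would go before makeCluster, so that an empty or mistyped folder fails fast instead of after nodes have been allocated.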

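Step 3 quotes about 20 minutes for 1000 CEL files on 50 nodes, which works out to roughly one node-minute per CEL file. The sketch below turns that single data point into a back-of-the-envelope estimate for other dataset and cluster sizes. The function estimateMinutes is hypothetical, and it assumes ideal linear scaling, which real runs will not fully achieve.

```r
# Hypothetical helper, not part of affyParaEBI: rough wall-clock estimate
# calibrated on the quoted figure (50 nodes, 1000 CEL files, ~20 minutes).
# Assumes ideal linear scaling: no start-up or merge overhead is modelled.
estimateMinutes <- function(nFiles, nNodes,
                            refFiles = 1000, refNodes = 50, refMinutes = 20) {
  # node-minutes spent per CEL file in the reference run (here 1)
  perFile <- refMinutes * refNodes / refFiles
  nFiles * perFile / nNodes
}

print(estimateMinutes(1000, 10))  # 100: the 10-node cluster from the first example
print(estimateMinutes(200, 50))   # 4
```

Under this assumption the 10-node cluster from the first example would need on the order of 100 minutes for 1000 files. Fixed costs such as cluster start-up do not shrink as nodes are added, so small datasets spread over many nodes will likely take longer than this estimate suggests.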


This version was published on 02-Jun-2011 10:04 by Rodrigo Santamaria.