
affyParaEBI #


affyParaEBI is a distributed computing package for Affymetrix microarray RMA pre-processing for the ArrayExpress R/Bioconductor Workbench.

Purpose#


affyParaEBI (aPE) is an R/Bioconductor-based pipeline for fast pre-processing of Affymetrix chips. It lets users quickly pre-process large numbers of CEL files into Bioconductor ExpressionSet objects that can be used for downstream analysis.

aPE works remotely on the EBI R cloud, and can be used to analyse local datasets or public datasets from the ArrayExpress Archive of Functional Genomics Data at the EBI.

Running aPE on the EBI R cloud#

  1. Choose a dataset to preprocess or upload your own data

aPE can be used on the EBI R-Cloud to pre-process private datasets: upload the corresponding CEL files to your R/Bioconductor Workbench account. You can also pre-process public datasets available through ArrayExpress. You can use this interface to search for a dataset.


  2. Launch the R-Cloud Workbench, register and create a new project

The EBI R-Cloud is a new service at the EBI that allows R users to log in and run distributed computational jobs remotely on its powerful 64-bit Linux cluster. It is available through a Java client called the ArrayExpress R/Bioconductor Workbench. Open the following address in a browser and follow the instructions on the page to download or launch the Workbench: http://www.ebi.ac.uk/tools/rcloud

When connecting for the first time you will be asked to register. You will need to provide a username, password and e-mail address, which let you retrieve your long-running projects the next time you log in. Once registered you can log in and create a new project.

Find your way around the workbench

  3. Run the whole pipeline

If your CEL files are in a folder 'td', running affyParaEBI within R is straightforward with the following simple code:
> library(affyParaEBI)
> cluster <- makeCluster(10,type="RCLOUD")
> e <- preproParaEBI(path=td, clust=cluster)
> stopCluster(cluster); rm(cluster)


When the pipeline finishes, 'e' will contain an ExpressionSet object that can then be used for downstream analysis.
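
Before launching the pipeline, it can be useful to check that 'td' really points to a folder containing CEL files. The following is a minimal base R sketch, not part of affyParaEBI: the helper name find_cel_files is hypothetical, and it assumes that CEL files carry a ".CEL" extension in either upper or lower case.

```r
# Hypothetical helper (not part of affyParaEBI): list the CEL files in a folder.
# Assumption: CEL files end in ".CEL" or ".cel".
find_cel_files <- function(path) {
  if (!dir.exists(path)) {
    stop("Folder does not exist: ", path)
  }
  list.files(path, pattern = "\\.cel$", ignore.case = TRUE, full.names = TRUE)
}

# Demonstration on a throwaway folder with two empty placeholder "CEL" files
# and one unrelated file.
demo_dir <- file.path(tempdir(), "cel_demo")
dir.create(demo_dir, showWarnings = FALSE)
file.create(file.path(demo_dir, c("a.CEL", "b.cel", "notes.txt")))

cel_files <- find_cel_files(demo_dir)
length(cel_files)  # number of CEL files found
```

If the returned vector is empty, the folder path is probably wrong or the upload has not finished.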

The pre-processing will take time depending on the size of the dataset and how many computing nodes were allocated with "makeCluster" (10 in the above example). Typically, 50 nodes can pre-process 1000 CEL files in about 20 minutes.
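
The figure quoted above corresponds to roughly one node-minute per CEL file (50 nodes x 20 minutes / 1000 files). As a rough planning aid, that ratio can be extrapolated to other dataset and cluster sizes. The sketch below assumes perfectly linear scaling, which is an assumption made here for illustration and not a documented property of the pipeline; the helper name estimate_minutes is hypothetical.

```r
# Hypothetical helper: rough runtime estimate extrapolated from the reference
# figure quoted above (50 nodes, 1000 CEL files, about 20 minutes).
# Assumption: runtime scales linearly with the number of files and inversely
# with the number of nodes; real runs will only approximate this.
estimate_minutes <- function(n_files, n_nodes,
                             ref_files = 1000, ref_nodes = 50, ref_minutes = 20) {
  stopifnot(n_files >= 0, n_nodes >= 1)
  node_minutes_per_file <- ref_minutes * ref_nodes / ref_files
  n_files * node_minutes_per_file / n_nodes
}

estimate_minutes(1000, 50)  # the reference case itself
estimate_minutes(200, 10)   # a smaller dataset on a smaller cluster
```

Treat the result as an order-of-magnitude guide only, since node performance and data transfer are not modelled.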

If you want to pre-process one or more experiments available in ArrayExpress, you can download the CEL files to a folder via the ArrayExpress R/Bioconductor package:
> library(ArrayExpress)
> library(affyParaEBI)
> #1) Create a two-node cluster
> cluster<-makeCluster(2,type="RCLOUD")
> #2) Retrieve CEL files from experiment E-MEXP-328
> td<-tempdir()
> emexp328.raw<-ArrayExpress(input = "E-MEXP-328", path=td, save=TRUE)
> #3) Replace low performance nodes
> cluster<-clusterOptimization(cluster, subst=TRUE)
> #4) Perform parallel preprocessing of CEL files
> emexp328.proc<-preproParaEBI(path=td, clust=cluster)
> #5) Clean up cluster nodes and CEL files
> stopCluster(cluster); rm(cluster)
> file.remove(list.files(td, full.names=TRUE))
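
In the listing above, the clean-up in step 5 only runs if the earlier steps succeed; if pre-processing stops with an error, the cluster nodes are not released. One possible way to guard against this is to wrap the work in a function that releases the cluster through on.exit. The sketch below is base R only and is not part of affyParaEBI: with_cluster is a hypothetical helper, and the stand-in functions exist only so the example runs without access to the R cloud.

```r
# Hypothetical helper (not part of affyParaEBI): run `work` on a cluster and
# always release the cluster afterwards, even if `work` raises an error.
with_cluster <- function(make, stop_fun, work) {
  cluster <- make()
  on.exit(stop_fun(cluster), add = TRUE)  # runs on normal exit and on error
  work(cluster)
}

# Stand-in functions so the sketch runs without the R cloud.
released <- FALSE
fake_make <- function() list(nodes = 2)
fake_stop <- function(cl) released <<- TRUE
failing_work <- function(cl) stop("simulated preprocessing failure")

# The error is caught here only to show that the cluster was still released.
outcome <- tryCatch(
  with_cluster(fake_make, fake_stop, failing_work),
  error = function(e) conditionMessage(e)
)
released
```

On the R cloud, the same pattern would presumably be called with the functions shown in the listing, for example with_cluster(function() makeCluster(2, type="RCLOUD"), stopCluster, function(cl) preproParaEBI(path=td, clust=cl)).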



Downloads#

affyParaEBI_1.0.tar.gz

This particular version was published on 02-Jun-2011 09:48 by Rodrigo Santamaria.