%%commentbox
__Would you like to try our new R-Cloud workbench?__
\\ 
Go to the [R Workbench Web Page|http://wwwdev.ebi.ac.uk/Tools/rcloud] and launch the latest version.
%%

!!!affyParaEBI 
\\
;__affyParaEBI__: ''is a distributed computing package for RMA pre-processing of Affymetrix microarray data on the ArrayExpress R/Bioconductor Workbench''

!!Purpose
\\
affyParaEBI (aPE) is an R/Bioconductor-based pipeline for fast pre-processing of Affymetrix chips. It lets users quickly pre-process large numbers of CEL files into Bioconductor ExpressionSet objects for downstream analysis.

aPE runs remotely on the EBI R-Cloud and can pre-process either your own datasets or public datasets from the ArrayExpress Archive of Functional Genomics Data at the EBI.

!!Running aPE on the EBI R-Cloud


# __Choose a dataset to pre-process or upload your own data__ \\ To pre-process a private dataset on the EBI R-Cloud, upload its CEL files to your R/Bioconductor Workbench account. \\ You can also pre-process public datasets available through ArrayExpress; use the [ArrayExpress search interface|http://www.ebi.ac.uk/arrayexpress/] to find one.\\ \\
# __Launch the R-Cloud Workbench, register and create a new project__ \\ The EBI R-Cloud is a service at the EBI that lets R users log in and run distributed computational jobs remotely on its 64-bit Linux cluster. It is accessed through a Java client, the ArrayExpress R/Bioconductor Workbench. Open the following address in a browser and follow the instructions on the page to download or launch the Workbench: [http://www.ebi.ac.uk/tools/rcloud] \\ \\ When connecting for the first time you will be asked to register by choosing a username and password and providing an e-mail address; these let you retrieve your long-running projects the next time you log in. Once registered, log in and create a new project.\\ \\ [Find your way around the workbench|http://www.ebi.ac.uk/Tools/rcloud/quick_start.html] \\ \\
# __Pre-process your data__ \\ If the path to the folder holding your CEL files is stored in the variable 'td', running affyParaEBI within R takes only a few lines of code: \\ {{{
library(affyParaEBI)
td <- "/path/to/CEL/files"  # replace with your own folder
# Allocate 10 computing nodes on the EBI R-Cloud
cluster <- makeCluster(10, type="RCLOUD")
# Run parallel RMA pre-processing; 'e' is an ExpressionSet
e <- preproParaEBI(path=td, clust=cluster)
# Release the cluster nodes
stopCluster(cluster); rm(cluster)
}}} \\ \\ When the pipeline finishes, 'e' contains an ExpressionSet object that can be used for downstream analysis. \\ \\ Pre-processing time depends on the size of the dataset and on the number of computing nodes allocated with "makeCluster" (10 in the example above). Typically, 50 nodes can pre-process 1000 CEL files in about 20 minutes. \\ \\ To pre-process one or more experiments available in ArrayExpress, first download their CEL files to a folder with the ArrayExpress R/Bioconductor package: \\{{{
library(ArrayExpress)
library(affyParaEBI)

#1) Create a two-node cluster
cluster <- makeCluster(2, type="RCLOUD")

#2) Retrieve CEL files from experiment E-MEXP-328 into a dedicated folder
td <- file.path(tempdir(), "E-MEXP-328")
dir.create(td, showWarnings=FALSE)
emexp328.raw <- ArrayExpress(input="E-MEXP-328", path=td, save=TRUE)

#3) Replace low-performance nodes
cluster <- clusterOptimization(cluster, subst=TRUE)

#4) Perform parallel pre-processing of the CEL files
emexp328.proc <- preproParaEBI(path=td, clust=cluster)

#5) Shut down the cluster and delete the downloaded files
stopCluster(cluster); rm(cluster)
unlink(td, recursive=TRUE)
}}} \\
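
Because the cluster in the examples above is allocated before any file is read, it can be worth confirming first that the folder really contains CEL files. The sketch below uses only base R; the helper 'listCelFiles' is hypothetical and not part of affyParaEBI, and it assumes CEL files carry a .cel or .CEL extension.

{{{
# Hypothetical helper (not part of affyParaEBI): return the CEL files in a
# folder, stopping with an error if the folder is missing or has none.
listCelFiles <- function(dir) {
  if (!dir.exists(dir)) stop("Folder does not exist: ", dir)
  files <- list.files(dir, pattern = "\\.cel$", ignore.case = TRUE,
                      full.names = TRUE)
  if (length(files) == 0) stop("No CEL files found in ", dir)
  files
}

# Demo with a throwaway folder holding empty placeholder files
td <- file.path(tempdir(), "cel_demo")
dir.create(td, showWarnings = FALSE)
invisible(file.create(file.path(td, c("sample1.CEL", "sample2.cel", "notes.txt"))))
length(listCelFiles(td))  # 2 (notes.txt is ignored)
}}}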
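
To get a rough idea of how many nodes to request, the benchmark quoted above (50 nodes, 1000 CEL files, about 20 minutes) can be extrapolated. The sketch below assumes runtime grows linearly with the number of CEL files and falls linearly with the number of nodes; this is an idealisation, and start-up and communication overhead will make real runs slower, especially with many nodes. 'estimateMinutes' is a hypothetical helper, not part of affyParaEBI.

{{{
# Hypothetical helper: idealised runtime estimate for preproParaEBI,
# scaled from the quoted benchmark (50 nodes, 1000 CEL files, ~20 minutes).
# ASSUMPTION: perfectly linear scaling in files and nodes (real runs are slower).
estimateMinutes <- function(nFiles, nNodes,
                            refFiles = 1000, refNodes = 50, refMinutes = 20) {
  stopifnot(nFiles > 0, nNodes > 0)
  refMinutes * (nFiles / refFiles) * (refNodes / nNodes)
}

estimateMinutes(1000, 10)  # 100
estimateMinutes(200, 50)   # 4
}}}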

!!Downloads
[affyParaEBI_1.0.0.tar.gz|http://www.ebi.ac.uk/Tools/rcloud/downloads/affyParaEBI_1.0.0.tar.gz]