Organism(s): Homo sapiens
Reference(s):
19567590
21673316


Microarray quality metrics report for E-GEOD-12764 on array design A-AFFY-44
 Section 1: Between array comparison
 Section 2: Array intensity distributions
 Section 3: Variance mean dependence
 Section 4: Individual array quality
 Array metadata and outlier detection overview
array  sampleNames              *1  *2  *3  AssayName                FactorValue           FileName
1      MCF10a_HRas_Vector_rep3  -   -   -   MCF10a_HRas_Vector_rep3  HRas over expression  GSM320251.CEL
2      MCF10a_HRas_Vector_rep2  -   -   -   MCF10a_HRas_Vector_rep2  HRas over expression  GSM320250.CEL
3      MCF10a_HRas_Vector_rep1  x   x   -   MCF10a_HRas_Vector_rep1  HRas over expression  GSM320249.CEL
4      MCF10a_HRas_Vector_rep4  x   x   -   MCF10a_HRas_Vector_rep4  HRas over expression  GSM320252.CEL
5      MCF10a_HRas_Vector_rep5  -   -   -   MCF10a_HRas_Vector_rep5  HRas over expression  GSM320253.CEL
6      MCF10a_Null_Vector_rep5  -   -   -   MCF10a_Null_Vector_rep5  empty vector control  GSM320247.CEL
7      MCF10a_Null_Vector_rep4  -   -   -   MCF10a_Null_Vector_rep4  empty vector control  GSM320246.CEL
8      MCF10a_Null_Vector_rep2  -   -   -   MCF10a_Null_Vector_rep2  empty vector control  GSM320244.CEL
9      MCF10a_Null_Vector_rep3  -   -   -   MCF10a_Null_Vector_rep3  empty vector control  GSM320245.CEL
10     MCF10a_Null_Vector_rep6  x   -   -   MCF10a_Null_Vector_rep6  empty vector control  GSM320248.CEL
11     MCF10a_Null_Vector_rep1  -   -   -   MCF10a_Null_Vector_rep1  empty vector control  GSM320243.CEL
12     MCF10a_MEK1_Vector_rep1  -   -   -   MCF10a_MEK1_Vector_rep1  MEK1 over expression  GSM320254.CEL
13     MCF10a_MEK1_Vector_rep4  -   -   -   MCF10a_MEK1_Vector_rep4  MEK1 over expression  GSM320257.CEL
14     MCF10a_MEK1_Vector_rep3  -   -   -   MCF10a_MEK1_Vector_rep3  MEK1 over expression  GSM320256.CEL
15     MCF10a_MEK1_Vector_rep6  -   -   -   MCF10a_MEK1_Vector_rep6  MEK1 over expression  GSM320259.CEL
16     MCF10a_MEK1_Vector_rep2  -   -   -   MCF10a_MEK1_Vector_rep2  MEK1 over expression  GSM320255.CEL
17     MCF10a_MEK1_Vector_rep5  -   -   -   MCF10a_MEK1_Vector_rep5  MEK1 over expression  GSM320258.CEL
The columns named *1, *2, ... indicate the calls from the different outlier detection methods:
 *1: outlier detection by Distances between arrays
 *2: outlier detection by Boxplots
 *3: outlier detection by MA plots
At the scope covered by this software, outlier detection is a poorly defined question, and there is no 'right' or 'wrong' answer. These are hints which are intended to be followed up manually. If you want to automate outlier detection, you need to limit the scope to a particular platform and experimental design, and then choose and calibrate the metrics used.
Section 1: Between array comparison
 Figure 1: Distances between arrays.

Figure 1 (PDF file) shows a false color heatmap of the distances between arrays. The color scale is chosen to cover the range of distances encountered in the dataset. Patterns in this plot can indicate clustering of the arrays either because of intended biological or unintended experimental factors (batch effects). The distance d_{ab} between two arrays a and b is computed as the mean absolute difference (L_{1}-distance) between the data of the arrays (using the data from all probes without filtering). In formula, d_{ab} = mean | M_{ai} - M_{bi} |, where M_{ai} is the value of the i-th probe on the a-th array. Outlier detection was performed by looking for arrays for which the sum of the distances to all other arrays, S_{a} = Σ_{b} d_{ab}, was exceptionally large. Three such arrays were detected, and they are marked by an asterisk, *.
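The distance and its row sum can be sketched in a few lines of Python. This is only an illustration of the formula d_{ab} = mean | M_{ai} - M_{bi} | and of S_{a}, not the report software's implementation. The toy data are hypothetical, and the cutoff for "exceptionally large" (third quartile plus 1.5 times the interquartile range) is an assumption, since the report does not state the rule it applies.

```python
from statistics import mean, quantiles  # quantiles needs Python 3.8+


def l1_distance(a, b):
    """Mean absolute difference between the probe values of two arrays."""
    return mean(abs(x - y) for x, y in zip(a, b))


def distance_sums(arrays):
    """S_a: sum of the distances from array a to every other array."""
    n = len(arrays)
    return [
        sum(l1_distance(arrays[a], arrays[b]) for b in range(n) if b != a)
        for a in range(n)
    ]


def flag_large(values, k=1.5):
    """Flag values above Q3 + k * IQR (assumed boxplot-style cutoff)."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    cutoff = q3 + k * (q3 - q1)
    return [v > cutoff for v in values]


# Hypothetical toy data: 5 arrays x 4 probes; the last array is shifted.
arrays = [
    [1.0, 2.0, 3.0, 4.0],
    [1.1, 2.1, 2.9, 4.0],
    [0.9, 2.0, 3.1, 4.1],
    [1.0, 1.9, 3.0, 3.9],
    [3.0, 4.0, 5.0, 6.0],
]
sums = distance_sums(arrays)
flags = flag_large(sums)
```

With these toy values only the shifted last array is flagged, because its distance sum is far larger than the others.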
 Figure 3: Principal Component Analysis.

Figure 3 (PDF file) shows a scatterplot of the arrays along the first two principal components. You can use this plot to explore whether the arrays cluster, and whether this is according to an intended experimental factor, or according to unintended causes such as batch effects.
Principal component analysis is a dimension reduction and visualisation technique that is here used to project the multivariate data vector of each array into a two-dimensional plot, such that the spatial arrangement of the points in the plot reflects the overall data (dis)similarity between the arrays.
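As a rough illustration of the projection, the sketch below computes the first two principal component scores of each array in plain Python. It is a minimal sketch under assumptions: it uses power iteration with deflation on the small array-by-array Gram matrix, which is an illustrative choice and not necessarily what the report's software does, and the function name pca_scores and the toy data are hypothetical. The sign of each component is arbitrary.

```python
import math


def pca_scores(arrays, n_components=2, iters=200):
    """Project each array (row) onto its leading principal components."""
    n, p = len(arrays), len(arrays[0])
    # Centre every probe (column) across arrays.
    col_means = [sum(row[j] for row in arrays) / n for j in range(p)]
    x = [[row[j] - col_means[j] for j in range(p)] for row in arrays]
    # Gram matrix G = X X^T; eigenvector * sqrt(eigenvalue) gives the scores.
    g = [[sum(u * v for u, v in zip(x[a], x[b])) for b in range(n)]
         for a in range(n)]
    components = []
    for k in range(n_components):
        v = [math.sin(i + 1 + k) for i in range(n)]  # fixed start vector
        for _ in range(iters):
            w = [sum(g[a][b] * v[b] for b in range(n)) for a in range(n)]
            norm = math.sqrt(sum(t * t for t in w))
            if norm == 0.0:
                break
            v = [t / norm for t in w]
        eigval = sum(
            v[a] * sum(g[a][b] * v[b] for b in range(n)) for a in range(n)
        )
        components.append([math.sqrt(max(eigval, 0.0)) * t for t in v])
        # Deflate: remove the component just found before the next pass.
        g = [[g[a][b] - eigval * v[a] * v[b] for b in range(n)]
             for a in range(n)]
    # One (PC1, PC2, ...) tuple per array.
    return [tuple(c[a] for c in components) for a in range(n)]


# Hypothetical toy data: two groups of two arrays, three probes each.
arrays = [
    [1.0, 2.0, 3.0],
    [1.1, 2.1, 2.9],
    [5.0, 6.0, 7.0],
    [5.1, 5.9, 7.1],
]
scores = pca_scores(arrays)
pc1 = [s[0] for s in scores]
```

In this toy example the two groups fall on opposite sides of the first component, which is the kind of clustering the scatterplot is meant to reveal.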
Section 2: Array intensity distributions
 Figure 4: Boxplots.

Figure 4 (PDF file) shows boxplots representing summaries of the signal intensity distributions of the arrays. Each box corresponds to one array. Typically, one expects the boxes to have similar positions and widths. If the distribution of an array is very different from the others, this may indicate an experimental problem. Outlier detection was performed by computing the Kolmogorov-Smirnov statistic K_{a} between each array's distribution and the distribution of the pooled data.
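The statistic K_{a} can be sketched as the largest gap between an array's empirical distribution function and that of the pooled data. The code below is a minimal standard-library sketch under that reading; the function name ks_vs_pooled and the toy data are hypothetical, and the report does not say which threshold on K_{a} marks an outlier, so none is applied here.

```python
from bisect import bisect_right


def ks_vs_pooled(arrays):
    """K_a = max over x of |F_a(x) - F_pooled(x)| for each array a."""
    pooled = sorted(v for arr in arrays for v in arr)
    stats = []
    for arr in arrays:
        own = sorted(arr)
        # Both empirical CDFs are step functions that jump only at pooled
        # data points, so checking those points finds the largest gap.
        stats.append(max(
            abs(bisect_right(own, x) / len(own)
                - bisect_right(pooled, x) / len(pooled))
            for x in pooled
        ))
    return stats


# Hypothetical toy data: the third array sits far above the other two.
arrays = [
    [1.0, 2.0, 3.0, 4.0],
    [1.5, 2.5, 3.5, 4.5],
    [11.0, 12.0, 13.0, 14.0],
]
k_stats = ks_vs_pooled(arrays)
```

The shifted third array receives the largest K_{a}, matching the intuition that its box would sit apart from the others in the boxplot.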
 Figure 6: Density plots.

Figure 6 (PDF file) shows density estimates (smoothed histograms) of the data. Typically, the distributions of the arrays should have similar shapes and ranges. Arrays whose distributions are very different from the others should be considered for possible problems. Various features of the distributions can be indicative of quality-related phenomena. For instance, high levels of background will shift an array's distribution to the right. Lack of signal diminishes its right tail. A bulge at the upper end of the intensity range often indicates signal saturation.
Section 3: Variance mean dependence
 Figure 7: Standard deviation versus rank of the mean.

Figure 7 (PDF file) shows a density plot of the standard deviation of the intensities across arrays on the y-axis versus the rank of their mean on the x-axis. The red dots, connected by lines, show the running median of the standard deviation. After normalisation and transformation to a logarithm(-like) scale, one typically expects the red line to be approximately horizontal, that is, show no substantial trend. In some cases, a hump on the right-hand side of the x-axis can be observed and is symptomatic of a saturation of the intensities.
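The quantities behind this plot can be sketched as follows: order the probes by their mean across arrays, take each probe's standard deviation, and smooth with a running median. This is an illustrative sketch only; the function name sd_vs_rank, the window width, and the toy data are assumptions, and the report's software may bin and smooth differently.

```python
from statistics import mean, median, stdev


def sd_vs_rank(probes, window=3):
    """probes: one list of intensities (across arrays) per probe.

    Returns (sds, running): per-probe standard deviations ordered by the
    rank of the probe mean, and their running median over `window` ranks.
    """
    ordered = sorted(probes, key=mean)
    sds = [stdev(p) for p in ordered]
    half = window // 2
    running = [
        median(sds[max(0, i - half): i + half + 1]) for i in range(len(sds))
    ]
    return sds, running


# Hypothetical toy data 1: constant spread, so the running median is flat.
flat_probes = [[i, i + 1, i + 2] for i in (5, 1, 3, 2, 4)]
_, flat_running = sd_vs_rank(flat_probes)

# Hypothetical toy data 2: spread grows with the mean, so there is a trend.
trend_probes = [[m - m / 10, m, m + m / 10] for m in (3, 1, 5, 2, 4)]
_, trend_running = sd_vs_rank(trend_probes)
```

The first case corresponds to the approximately horizontal red line one hopes to see; the second shows the kind of variance-mean dependence that suggests the data need a variance-stabilising transformation.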
Section 4: Individual array quality
 Figure 8: MA plots.

Figure 8 (PDF file) shows MA plots. M and A are defined as:
M = log_{2}(I_{1}) - log_{2}(I_{2})
A = 1/2 (log_{2}(I_{1}) + log_{2}(I_{2})),
where I_{1} is the intensity of the array studied, and I_{2} is the intensity of a "pseudo"-array that consists of the median across arrays. Typically, we expect the mass of the distribution in an MA plot to be concentrated along the M = 0 axis, and there should be no trend in M as a function of A. If there is a trend in the lower range of A, this often indicates that the arrays have different background intensities; this may be addressed by background correction. A trend in the upper range of A can indicate saturation of the measurements; in mild cases, this may be addressed by non-linear normalisation (e.g. quantile normalisation).
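The definitions of M and A translate directly into code. The sketch below assumes the inputs are positive raw intensities (not yet log-transformed) and builds the pseudo-array as the probe-wise median across arrays; the function name ma_values and the toy data are hypothetical.

```python
import math
from statistics import median


def ma_values(arrays, index):
    """M and A for one array against the probe-wise median pseudo-array."""
    pseudo = [median(col) for col in zip(*arrays)]
    m_vals, a_vals = [], []
    for i1, i2 in zip(arrays[index], pseudo):
        log1, log2 = math.log2(i1), math.log2(i2)
        m_vals.append(log1 - log2)          # M = log2(I1) - log2(I2)
        a_vals.append(0.5 * (log1 + log2))  # A = 1/2 (log2(I1) + log2(I2))
    return m_vals, a_vals


# Hypothetical toy data: the third array is twice as bright on every probe.
arrays = [
    [100.0, 200.0, 400.0],
    [100.0, 200.0, 400.0],
    [200.0, 400.0, 800.0],
]
m_ref, a_ref = ma_values(arrays, 0)
m_bright, a_bright = ma_values(arrays, 2)
```

Here the first array matches the pseudo-array, so its M values are all 0, while the brighter array has M = 1 on every probe: a constant offset from the M = 0 axis rather than a trend in A.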
Outlier detection was performed by computing Hoeffding's statistic D_{a} on the joint distribution of A and M for each array. Shown are first the 4 arrays with the highest values of D_{a}, then the 4 arrays with the lowest values. The value of D_{a} is shown in the panel headings. No arrays had D_{a} > 0.15, so none were marked as outliers. For more information on Hoeffding's D-statistic, please see the manual page of the function hoeffd in the Hmisc package.
This report has been created with arrayQualityMetrics 3.20.0 under R version 3.1.0 (2014-04-10).
(Page generated on Wed Nov 12 15:13:33 2014 by hwriter)