Organism: Homo sapiens
Microarray quality metrics report for E-GEOD-12764 on array design A-AFFY-44

Array metadata and outlier detection overview
Array  Sample name              *1  *2  *3  Source name              Factor value          Array data file
1      MCF10a_MEK1_Vector_rep2              MCF10a_MEK1_Vector_rep2  MEK1 over expression  GSM320255.CEL
2      MCF10a_MEK1_Vector_rep6              MCF10a_MEK1_Vector_rep6  MEK1 over expression  GSM320259.CEL
3      MCF10a_MEK1_Vector_rep4              MCF10a_MEK1_Vector_rep4  MEK1 over expression  GSM320257.CEL
4      MCF10a_MEK1_Vector_rep3              MCF10a_MEK1_Vector_rep3  MEK1 over expression  GSM320256.CEL
5      MCF10a_MEK1_Vector_rep5              MCF10a_MEK1_Vector_rep5  MEK1 over expression  GSM320258.CEL
6      MCF10a_MEK1_Vector_rep1              MCF10a_MEK1_Vector_rep1  MEK1 over expression  GSM320254.CEL
7      MCF10a_HRas_Vector_rep4  x   x       MCF10a_HRas_Vector_rep4  HRas over expression  GSM320252.CEL
8      MCF10a_HRas_Vector_rep3              MCF10a_HRas_Vector_rep3  HRas over expression  GSM320251.CEL
9      MCF10a_HRas_Vector_rep1  x   x       MCF10a_HRas_Vector_rep1  HRas over expression  GSM320249.CEL
10     MCF10a_HRas_Vector_rep5              MCF10a_HRas_Vector_rep5  HRas over expression  GSM320253.CEL
11     MCF10a_HRas_Vector_rep2              MCF10a_HRas_Vector_rep2  HRas over expression  GSM320250.CEL
12     MCF10a_Null_Vector_rep3              MCF10a_Null_Vector_rep3  empty vector control  GSM320245.CEL
13     MCF10a_Null_Vector_rep6      x       MCF10a_Null_Vector_rep6  empty vector control  GSM320248.CEL
14     MCF10a_Null_Vector_rep5              MCF10a_Null_Vector_rep5  empty vector control  GSM320247.CEL
15     MCF10a_Null_Vector_rep2              MCF10a_Null_Vector_rep2  empty vector control  GSM320244.CEL
16     MCF10a_Null_Vector_rep1              MCF10a_Null_Vector_rep1  empty vector control  GSM320243.CEL
17     MCF10a_Null_Vector_rep4              MCF10a_Null_Vector_rep4  empty vector control  GSM320246.CEL

The columns *1, *2 and *3 show the calls from the different outlier detection methods (x = called as an outlier):
  1. outlier detection by Boxplots
  2. outlier detection by Distances between arrays
  3. outlier detection by MA plots
The outlier detection criteria are explained below in the respective sections. Arrays that were called outliers by at least one criterion are marked in this table and are highlighted as lines or points in some of the plots below.

Within the scope covered by this software, outlier detection is a poorly defined problem, and there is no 'right' or 'wrong' answer. The calls are hints intended to be followed up manually. To automate outlier detection, you need to restrict the scope to a particular platform and experimental design, and then choose and calibrate the metrics used.

Section 1: Array intensity distributions

Figure 1: Boxplots.
Figure 1 shows boxplots representing summaries of the signal intensity distributions of the arrays. Each box corresponds to one array. Typically, one expects the boxes to have similar positions and widths. If the distribution of an array is very different from the others, this may indicate an experimental problem. Outlier detection was performed by computing the Kolmogorov-Smirnov statistic K_a between each array's distribution and the distribution of the pooled data.
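A minimal pure-Python sketch of this statistic, assuming each array is given as a list of intensities. `ks_statistic` computes the two-sample Kolmogorov-Smirnov distance K_a = sup_x |F_a(x) - F_pooled(x)| between one array and the pooled data. The report does not state the cutoff used to call an array an outlier, so the hypothetical helper `upper_whisker_outliers` assumes a boxplot-style rule (K_a above Q3 + 1.5 × IQR of all K_a values); all function names here are illustrative, not the arrayQualityMetrics API.

```python
import statistics


def ks_statistic(sample, reference):
    """Two-sample Kolmogorov-Smirnov statistic: sup_x |F_sample(x) - F_reference(x)|."""
    a, b = sorted(sample), sorted(reference)
    na, nb = len(a), len(b)
    i = j = 0
    d = 0.0
    while i < na and j < nb:
        x = min(a[i], b[j])
        # Step both empirical CDFs past every value equal to x (handles ties).
        while i < na and a[i] <= x:
            i += 1
        while j < nb and b[j] <= x:
            j += 1
        d = max(d, abs(i / na - j / nb))
    return d


def upper_whisker_outliers(values, coef=1.5):
    """Assumed rule: flag values above Q3 + coef * IQR."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    threshold = q3 + coef * (q3 - q1)
    return threshold, [k for k, v in enumerate(values) if v > threshold]


def ks_outliers(arrays, coef=1.5):
    """Compute K_a of each array against the pooled data of all arrays, then flag large values."""
    pooled = [v for arr in arrays for v in arr]
    ka = [ks_statistic(arr, pooled) for arr in arrays]
    threshold, outliers = upper_whisker_outliers(ka, coef)
    return ka, threshold, outliers


if __name__ == "__main__":
    base = list(range(20))
    # Four identical arrays plus one whose whole distribution is shifted upwards.
    ka, threshold, outliers = ks_outliers([base, base, base, base, [v + 100 for v in base]])
    print(outliers)  # [4]
```

Because K_a depends only on the ordering of values, it gives the same result on raw or log-transformed intensities.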

Figure 2: Outlier detection for Boxplots.
Figure 3: Density plots.
[Plot: density estimate for each array; y-axis: Density; legend: HRas over expression, MEK1 over expression, empty vector control]

Figure 3 shows density estimates (smoothed histograms) of the data. Typically, the distributions of the arrays should have similar shapes and ranges. Arrays whose distributions are very different from the others should be examined for possible problems. Various features of the distributions can indicate quality-related phenomena. For instance, high levels of background shift an array's distribution to the right, lack of signal diminishes its right tail, and a bulge at the upper end of the intensity range often indicates signal saturation.
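To make "smoothed histogram" concrete, here is a sketch of a Gaussian kernel density estimate with a Silverman rule-of-thumb bandwidth, modeled on R's default `bw.nrd0`. The function names are hypothetical and this is not claimed to be the exact smoothing used to draw Figure 3. Adding a constant background to every value moves the estimated density, and its peak, to the right by that constant, which is the shift described above.

```python
import math
import statistics


def silverman_bandwidth(x):
    """Rule of thumb: 0.9 * min(sd, IQR / 1.34) * n^(-1/5)."""
    sd = statistics.stdev(x)
    q1, _, q3 = statistics.quantiles(x, n=4, method="inclusive")
    spread = min(sd, (q3 - q1) / 1.34) if q3 > q1 else sd
    return 0.9 * spread * len(x) ** -0.2


def gaussian_kde(x, grid, bw=None):
    """Gaussian kernel density estimate of sample x, evaluated at each grid point."""
    if bw is None:
        bw = silverman_bandwidth(x)
    scale = 1.0 / (len(x) * bw * math.sqrt(2.0 * math.pi))
    return [scale * sum(math.exp(-0.5 * ((g - xi) / bw) ** 2) for xi in x) for g in grid]


if __name__ == "__main__":
    data = [0.0, 1.0, 2.0, 3.0, 4.0]
    grid = [-10 + 0.01 * k for k in range(2401)]
    dens = gaussian_kde(data, grid)
    # Simulate extra background: every value raised by 2 units.
    shifted = gaussian_kde([v + 2 for v in data], grid)
    print(grid[dens.index(max(dens))], grid[shifted.index(max(shifted))])
```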

Section 2: Between array comparison

Figure 4: Distances between arrays.
Figure 4 shows a false color heatmap of the distances between arrays. The color scale is chosen to cover the range of distances encountered in the dataset. Patterns in this plot can indicate clustering of the arrays, caused either by intended biological factors or by unintended experimental factors (batch effects). The distance d_ab between two arrays a and b is computed as the mean absolute difference (L1-distance) between the data of the arrays, using the data from all probes without filtering. In formula, d_ab = mean_i |M_ai - M_bi|, where M_ai is the value of the i-th probe on the a-th array. Outlier detection was performed by looking for arrays for which the sum of the distances to all other arrays, S_a = Σ_b d_ab, was exceptionally large. Three such arrays were detected, and they are marked by an asterisk (*).
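The distance matrix and the sums S_a translate directly into code. This sketch assumes the arrays are equal-length lists of probe values. The report only says that S_a was "exceptionally large", so the Q3 + 1.5 × IQR cutoff used here is an assumption for illustration, and the function names are hypothetical.

```python
import statistics


def l1_distance_matrix(arrays):
    """d_ab = mean_i |M_ai - M_bi| for every pair of arrays (all probes, no filtering)."""
    n = len(arrays)
    d = [[0.0] * n for _ in range(n)]
    for a in range(n):
        for b in range(a + 1, n):
            diffs = [abs(x - y) for x, y in zip(arrays[a], arrays[b])]
            d[a][b] = d[b][a] = sum(diffs) / len(diffs)
    return d


def distance_outliers(arrays, coef=1.5):
    """Return the distance matrix, S_a for each array, the cutoff and the flagged arrays."""
    d = l1_distance_matrix(arrays)
    s = [sum(row) for row in d]  # S_a = sum_b d_ab (the diagonal d_aa is 0)
    q1, _, q3 = statistics.quantiles(s, n=4, method="inclusive")
    threshold = q3 + coef * (q3 - q1)  # assumed meaning of "exceptionally large"
    return d, s, threshold, [a for a, sa in enumerate(s) if sa > threshold]


if __name__ == "__main__":
    arrays = [[0.0, 0.0]] * 4 + [[10.0, 10.0]]
    d, s, threshold, outliers = distance_outliers(arrays)
    print(s, outliers)  # [10.0, 10.0, 10.0, 10.0, 40.0] [4]
```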

Figure 5: Outlier detection for Distances between arrays.
Figure 6: Principal Component Analysis.
[Plot: arrays plotted against PC1 and PC2; legend: HRas over expression, MEK1 over expression, empty vector control]

Figure 6 shows a scatterplot of the arrays along the first two principal components. You can use this plot to explore whether the arrays cluster, and whether the clustering follows an intended experimental factor or unintended causes such as batch effects.
Principal component analysis is a dimension reduction and visualisation technique that is used here to project the multivariate data vector of each array into a two-dimensional plot, such that the spatial arrangement of the points in the plot reflects the overall data (dis)similarity between the arrays.
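A pure-Python sketch of this projection, assuming each array is a list of probe values. It centers every probe across arrays and takes the principal component scores from the leading eigenvectors of the small arrays-by-arrays Gram matrix, found by power iteration with deflation. This is one standard way to compute PCA scores, not necessarily the routine behind Figure 6, and the helper names are hypothetical; in practice a linear-algebra library would do this faster. The sign of each component is arbitrary.

```python
import math
import random


def _top_eigenpair(mat, iters=1000, seed=0):
    """Power iteration for the largest eigenvalue/eigenvector of a symmetric PSD matrix."""
    n = len(mat)
    rng = random.Random(seed)
    v = [rng.random() + 0.1 for _ in range(n)]
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(iters):
        w = [sum(mat[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:  # no variance left to explain
            return 0.0, v
        v = [x / norm for x in w]
    lam = sum(v[i] * sum(mat[i][j] * v[j] for j in range(n)) for i in range(n))
    return lam, v


def pca_scores(arrays, n_components=2):
    """Project each array (a vector of probe values) onto the first principal components."""
    n, p = len(arrays), len(arrays[0])
    means = [sum(arr[k] for arr in arrays) / n for k in range(p)]
    centered = [[arr[k] - means[k] for k in range(p)] for arr in arrays]
    # n x n Gram matrix: much smaller than the p x p covariance when probes >> arrays.
    gram = [[sum(x * y for x, y in zip(ci, cj)) for cj in centered] for ci in centered]
    scores = [[0.0] * n_components for _ in range(n)]
    for c in range(n_components):
        lam, u = _top_eigenpair(gram)
        lam = max(lam, 0.0)
        for i in range(n):
            # Score = singular value * left singular vector entry.
            scores[i][c] = math.sqrt(lam) * u[i]
        # Deflate so the next power iteration finds the next component.
        gram = [[gram[i][j] - lam * u[i] * u[j] for j in range(n)] for i in range(n)]
    return scores


if __name__ == "__main__":
    toy = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 2.0, 0.0], [1.0, 2.0, 0.0]]
    for row in pca_scores(toy):
        print(row)
```

When the data really are two-dimensional, as in the toy example, the PC1/PC2 scores preserve all pairwise distances between arrays; with real arrays the plot shows only the two directions of largest variance.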

Section 3: Individual array quality

Figure 7: MA plots.
Figure 7 shows MA plots. M and A are defined as:
M = log2(I1) - log2(I2)
A = 1/2 (log2(I1)+log2(I2)),
where I1 is the intensity of the array studied, and I2 is the intensity of a "pseudo"-array that consists of the median across arrays. Typically, we expect the mass of the distribution in an MA plot to be concentrated along the M = 0 axis, with no trend in M as a function of A. A trend in the lower range of A often indicates that the arrays have different background intensities; this may be addressed by background correction. A trend in the upper range of A can indicate saturation of the measurements; in mild cases, this may be addressed by non-linear normalisation (e.g. quantile normalisation).
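The definitions of M and A can be checked with a small sketch, assuming raw (positive) intensities stored as equal-length lists and a pseudo-array built from the probe-wise median of the raw values; `ma_values` is a hypothetical name. With an odd number of arrays, as here (17), taking the median before or after the log2 transform gives the same pseudo-array.

```python
import math
import statistics


def ma_values(arrays, index):
    """M and A for arrays[index] against the probe-wise median pseudo-array."""
    n_probes = len(arrays[0])
    # Pseudo-array I2: median of each probe across all arrays.
    pseudo = [statistics.median([arr[k] for arr in arrays]) for k in range(n_probes)]
    m, a = [], []
    for i1, i2 in zip(arrays[index], pseudo):
        l1, l2 = math.log2(i1), math.log2(i2)
        m.append(l1 - l2)          # M = log2(I1) - log2(I2)
        a.append(0.5 * (l1 + l2))  # A = 1/2 (log2(I1) + log2(I2))
    return m, a


if __name__ == "__main__":
    arrays = [[2.0, 4.0], [4.0, 8.0], [8.0, 16.0]]
    print(ma_values(arrays, 0))  # ([-1.0, -1.0], [1.5, 2.5])
```

Array 0 lies uniformly one log2 unit below the pseudo-array, so all of its points sit at M = -1 rather than on the M = 0 axis.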
Outlier detection was performed by computing Hoeffding's statistic D_a on the joint distribution of A and M for each array. Shown are first the 4 arrays with the highest values of D_a, then the 4 arrays with the lowest values; the value of D_a is shown in the panel headings. No arrays had D_a > 0.15, so none were marked as outliers. For more information on Hoeffding's D-statistic, please see the manual page of the function hoeffd in the Hmisc package.
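Hoeffding's D measures general (not only monotone) dependence between two variables, so a trend of M against A increases it. The sketch below uses the textbook rank-based formula, scaled by 30 so that, without ties, values lie between -0.5 and 1; this appears to match the scaling described for `hoeffd`, but that has not been checked against Hmisc's output. The helper names are hypothetical, ties are not handled, and the O(n^2) loop only suits small or subsampled data. The 0.15 cutoff is the one stated in the report.

```python
def _ranks(values):
    """1-based ranks; this sketch assumes there are no ties."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        ranks[idx] = rank
    return ranks


def hoeffding_d(x, y):
    """Hoeffding's D (scaled by 30) for paired samples without ties."""
    n = len(x)
    if n < 5 or len(y) != n:
        raise ValueError("need two paired samples of length >= 5")
    r, s = _ranks(x), _ranks(y)
    # Bivariate rank: 1 + number of points lying strictly below-left of point i.
    q = [1 + sum(1 for j in range(n) if x[j] < x[i] and y[j] < y[i]) for i in range(n)]
    d1 = sum((qi - 1) * (qi - 2) for qi in q)
    d2 = sum((ri - 1) * (ri - 2) * (si - 1) * (si - 2) for ri, si in zip(r, s))
    d3 = sum((ri - 2) * (si - 2) * (qi - 1) for ri, si, qi in zip(r, s, q))
    num = (n - 2) * (n - 3) * d1 + d2 - 2 * (n - 2) * d3
    den = n * (n - 1) * (n - 2) * (n - 3) * (n - 4)
    return 30.0 * num / den


def flag_ma_outlier(m, a, cutoff=0.15):
    """Return D_a for one array's (A, M) pairs and whether it exceeds the cutoff."""
    d = hoeffding_d(a, m)
    return d, d > cutoff


if __name__ == "__main__":
    a = [1.0, 2.0, 3.0, 4.0, 5.0]
    print(flag_ma_outlier([0.1, 0.2, 0.3, 0.4, 0.5], a))  # M rises with A: D_a = 1
```

For perfectly monotone pairs the statistic is 1; for an array whose M values scatter around zero independently of A it stays close to 0.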

Figure 8: Outlier detection for MA plots.

This report has been created with arrayQualityMetrics 3.26.1 under R version 3.2.3 (2015-12-10).

(Page generated on Tue May 17 13:47:50 2016 by hwriter)