Principal Component Analysis in WEKA

To apply PCA to the dataset, follow these steps in the Preprocess tab (Figure 16):

1. In the Filter section, click Choose

2. Under filters -> unsupervised -> attribute, click PrincipalComponents

Figure 16 Select the Principal Components filter in the Preprocess tab.

3. Click the text box showing the filter name to open the filter's settings

4. In the maximumAttributes field, type 2 (Figure 17)

(The maximumAttributes field sets the maximum number of principal components to keep. Setting it to 2 projects the dataset onto two dimensions, which lets us visualise it in a 2D plot.)

5. Click OK

6. Click Apply

Figure 17 Customise the Principal Components settings.
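To make the projection step less of a black box, here is a minimal plain-Java sketch of what "keep the first 2 principal components" computes. It is not WEKA's implementation. It assumes each attribute is standardised first (our understanding of the filter's default behaviour), then finds the top eigenvectors of the resulting correlation matrix with power iteration, a simple method swapped in here for brevity, and projects every row onto them. The class name PcaSketch, its helper methods and the sample data are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class PcaSketch {

    static double dot(double[] a, double[] b) {
        double s = 0.0;
        for (int i = 0; i < a.length; i++) s += a[i] * b[i];
        return s;
    }

    // Assumption: standardise every column to zero mean and unit sample variance,
    // mirroring (as far as we know) the filter's default behaviour.
    static double[][] standardise(double[][] x) {
        int n = x.length, d = x[0].length;
        double[][] z = new double[n][d];
        for (int j = 0; j < d; j++) {
            double mean = 0.0;
            for (double[] row : x) mean += row[j];
            mean /= n;
            double ss = 0.0;
            for (double[] row : x) ss += (row[j] - mean) * (row[j] - mean);
            double sd = Math.sqrt(ss / (n - 1));
            for (int i = 0; i < n; i++) z[i][j] = sd > 0 ? (x[i][j] - mean) / sd : 0.0;
        }
        return z;
    }

    // Sample covariance of standardised data, i.e. the correlation matrix.
    static double[][] correlation(double[][] z) {
        int n = z.length, d = z[0].length;
        double[][] c = new double[d][d];
        for (double[] row : z)
            for (int a = 0; a < d; a++)
                for (int b = 0; b < d; b++)
                    c[a][b] += row[a] * row[b] / (n - 1);
        return c;
    }

    // Remove the parts of v along already-found components, then normalise v in place.
    // Returns the length of v before normalisation.
    static double orthonormalise(double[] v, List<double[]> previous) {
        for (double[] u : previous) {
            double p = dot(v, u);
            for (int a = 0; a < v.length; a++) v[a] -= p * u[a];
        }
        double norm = Math.sqrt(dot(v, v));
        if (norm > 0) for (int a = 0; a < v.length; a++) v[a] /= norm;
        return norm;
    }

    // Power iteration restricted to the subspace orthogonal to the previous components.
    static double[] nextEigenvector(double[][] c, List<double[]> previous) {
        int d = c.length;
        double[] v = new double[d];
        for (int a = 0; a < d; a++) v[a] = a + 1.0; // arbitrary deterministic start
        orthonormalise(v, previous);
        for (int iter = 0; iter < 1000; iter++) {
            double[] w = new double[d];
            for (int a = 0; a < d; a++) w[a] = dot(c[a], v); // w = C v
            if (orthonormalise(w, previous) < 1e-12) break;   // no variance left in this subspace
            v = w;
        }
        return v;
    }

    // Project every row of x onto the first k principal components.
    static double[][] project(double[][] x, int k) {
        double[][] z = standardise(x);
        double[][] c = correlation(z);
        List<double[]> components = new ArrayList<>();
        for (int comp = 0; comp < k; comp++) components.add(nextEigenvector(c, components));
        double[][] out = new double[z.length][k];
        for (int i = 0; i < z.length; i++)
            for (int comp = 0; comp < k; comp++) out[i][comp] = dot(z[i], components.get(comp));
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical 3-attribute dataset reduced to 2 columns, like maximumAttributes = 2.
        double[][] data = {{2.5, 2.4, 0.5}, {0.5, 0.7, 1.9}, {2.2, 2.9, 0.4}, {1.9, 2.2, 0.8}, {3.1, 3.0, 0.1}};
        for (double[] row : project(data, 2))
            System.out.printf("%.3f %.3f%n", row[0], row[1]);
    }
}
```

Each output row is one instance described by two new attributes, which is why the Visualize tab can later plot the whole dataset on two axes. The sign of each component is arbitrary, so another implementation may produce mirrored axes for the same data.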

Next, delete the Class column (which we created in step 2), since it should not be used during clustering (Figure 18):

  1. Tick the checkbox next to Class
  2. Click Remove

Figure 18 Delete the Class column.

To visualise the dataset, open the Visualize tab and double-click the upper-left plot (Figure 19).

The dataset is now shown in 2D, with one principal component on the x-axis and the other on the y-axis.

Figure 19 Visualise the dataset.

Now that the dataset is ready, we can cluster it.