Principal Component Analysis in WEKA

To apply PCA to the dataset, follow these steps in the Preprocess tab (Figure 16):

1. In the Filters section, click Choose

2. Under filters -> unsupervised -> attribute, click PrincipalComponents

**Figure 16** Select the Principal Components filter in the Preprocess tab.

3. Left-click the text box next to the Choose button (it now shows the filter name) to open the filter settings

4. In the maximumAttributes field, type: 2 (Figure 17)

(The maximumAttributes field sets the maximum number of principal components to retain. Setting it to 2 projects the dataset onto 2 dimensions, which makes it possible to visualise the dataset in a 2D plot.)

5. Click OK

6. Click Apply

**Figure 17** Customise the Principal Components settings.
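To make it clearer what the filter does with the data, the projection onto 2 principal components can be sketched outside WEKA. The following is a minimal, illustrative sketch in plain Python and not WEKA's own implementation: it standardises each attribute (assumed here to mirror the filter's default behaviour), finds the two leading directions of the resulting correlation matrix by power iteration, and projects every row onto them. The function names and the small example dataset are hypothetical.

```python
import math

def standardize(rows):
    """Rescale every column to zero mean and unit sample variance.
    Assumes no column is constant (a constant column would divide by zero)."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    stds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / (n - 1))
            for j in range(d)]
    return [[(r[j] - means[j]) / stds[j] for j in range(d)] for r in rows]

def covariance(centred):
    """Sample covariance matrix of rows that are already centred."""
    n, d = len(centred), len(centred[0])
    return [[sum(r[i] * r[j] for r in centred) / (n - 1) for j in range(d)]
            for i in range(d)]

def leading_eigenpair(matrix, iterations=200):
    """Power iteration: dominant eigenvalue and unit eigenvector
    of a symmetric positive semi-definite matrix."""
    d = len(matrix)
    v = [float(j + 1) for j in range(d)]  # arbitrary non-uniform start vector
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    mv = [sum(matrix[i][j] * v[j] for j in range(d)) for i in range(d)]
    eigenvalue = sum(v[i] * mv[i] for i in range(d))
    return eigenvalue, v

def pca_project(rows, n_components=2):
    """Project rows onto their first n_components principal components."""
    z = standardize(rows)
    m = covariance(z)  # covariance of standardised data = correlation matrix
    d = len(m)
    components = []
    for _ in range(n_components):
        lam, v = leading_eigenpair(m)
        components.append(v)
        # Deflation: remove the direction just found so that the next
        # power iteration converges to the following component.
        m = [[m[i][j] - lam * v[i] * v[j] for j in range(d)] for i in range(d)]
    return [[sum(r[j] * v[j] for j in range(d)) for v in components] for r in z]

# Hypothetical dataset: 6 instances, 3 numeric attributes.
example = [[1, 2, 3], [2, 1, 3], [3, 4, 7], [4, 3, 7], [5, 6, 11], [6, 5, 11]]
for point in pca_project(example, n_components=2):
    print(point)
```

Each printed pair corresponds to one instance in the new 2-dimensional space, which is the role the two attributes produced by the filter play in the Visualize tab. The signs of the components are arbitrary, so a plot may appear mirrored compared with another tool's output.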

Now delete the Class column (which we created in step 2), since it is not needed during the clustering step (Figure 18).

1. Check the box next to Class

2. Click Remove
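Outside the GUI, removing an attribute amounts to dropping one column from the table before handing it to the clustering algorithm. A minimal sketch, assuming the data is held as a header list plus a list of rows; the column names and values below are hypothetical and do not come from the dataset used in this tutorial:

```python
def drop_column(header, rows, name):
    """Return a copy of the table without the column called `name`."""
    idx = header.index(name)  # raises ValueError if the column is missing
    new_header = header[:idx] + header[idx + 1:]
    new_rows = [row[:idx] + row[idx + 1:] for row in rows]
    return new_header, new_rows

# Hypothetical table after PCA: two components plus the class label.
header = ["PC1", "PC2", "Class"]
rows = [[0.5, -1.2, "active"], [-0.3, 0.8, "inactive"]]

header, rows = drop_column(header, rows, "Class")
print(header)
print(rows)
```

Clustering is unsupervised, so leaving the label in would let the algorithm group instances by the very answer it is supposed to discover.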

Now, to visualise the dataset, click the Visualize tab, then double-click the upper-left box (Figure 19).

You can now see the dataset visualised in 2D, with one principal component on the x-axis and the other on the y-axis.

Machine learning in drug discovery

Congratulations!