
Preprocessing data

1. Delete the Gene column (we don’t want the Gene ID to be used during clustering):

Check the box next to Gene then click Remove (Figure 12).
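To make the effect of this step concrete, here is a minimal sketch in plain Python of dropping an identifier column from a table. It does not use WEKA's API. The table, its column names other than Gene, and the helper `remove_column` are hypothetical and chosen only for illustration.

```python
# Hypothetical toy table: names and values are made up for illustration.
header = ["Gene", "Sample1", "Sample2"]
rows = [
    ["geneA", 0.5, 1.2],
    ["geneB", 2.1, 0.3],
]


def remove_column(header, rows, name):
    """Return a copy of the table without the column called `name`."""
    idx = header.index(name)
    new_header = header[:idx] + header[idx + 1:]
    new_rows = [row[:idx] + row[idx + 1:] for row in rows]
    return new_header, new_rows


# Drop the identifier so only numeric features remain for clustering.
features_header, features_rows = remove_column(header, rows, "Gene")
print(features_header)  # ['Sample1', 'Sample2']
```

The original table is left unchanged, so the gene IDs can still be matched back to the rows after clustering.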

2. WEKA automatically treats the last column as the Class rather than as a feature, so the last column would not be used during clustering. To avoid losing a real feature this way, we will create a dummy column and assign it as the Class. To create the new column (Figure 13):

In the Filters section, click Choose
Under filters -> unsupervised -> attribute, click Add
Left-click the text box next to Choose to open the filter options
In the attributeName field, type: Class
Click OK
Click Apply

A new column called Class now appears as the last column in the dataset. It is populated with NaN values, which indicates that it is empty.
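The result of adding the dummy column can be pictured with a short plain-Python sketch. This is not WEKA code. The toy table and the helper `add_empty_column` are hypothetical, and `math.nan` stands in for the empty values described above.

```python
import math

# Hypothetical feature table (the Gene column has already been removed).
header = ["Sample1", "Sample2"]
rows = [
    [0.5, 1.2],
    [2.1, 0.3],
]


def add_empty_column(header, rows, name):
    """Append a column called `name`, filled with NaN, as the last column."""
    new_header = header + [name]
    new_rows = [row + [math.nan] for row in rows]
    return new_header, new_rows


new_header, new_rows = add_empty_column(header, rows, "Class")
print(new_header)  # ['Sample1', 'Sample2', 'Class']
```

Because the new column sits last, every real feature keeps its position and remains available for clustering.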

3. Now we need to assign this column as the class column (Figure 14).

In the Filters section, click Choose
Under filters -> unsupervised -> attribute, click ClassAssigner (ClassAssigner is a filter that indicates which column is the class column)
By default, ClassAssigner selects the last column as the class
Click Apply

**Figure 14** Assigning a column as the class column.
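As a rough illustration of what marking a class column means for the later clustering step, the plain-Python sketch below separates the class column from the feature columns. It is not WEKA's implementation. The helper `split_class` and the column names Sample1 and Sample2 are hypothetical, and the default of using the last column mirrors the default described above.

```python
# Hypothetical header after the dummy Class column has been added.
header = ["Sample1", "Sample2", "Class"]


def split_class(header, class_index=-1):
    """Return (feature_names, class_name) for the given class position.

    The default of -1 means "last column".
    """
    class_index = class_index % len(header)  # turn -1 into a real position
    features = [name for i, name in enumerate(header) if i != class_index]
    return features, header[class_index]


features, class_name = split_class(header)
print(features)    # ['Sample1', 'Sample2']
print(class_name)  # Class
```

With the dummy column taking the class role, all of the original measurement columns remain in the feature list.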

Now that the dataset is preprocessed, we can visualise it.
