Preprocessing data

1. Delete the Gene column (we don’t want the Gene ID to be used during clustering):

  • Check the box next to Gene then click Remove (Figure 12).
Figure 12 Remove the ‘Gene’ column.
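
For readers who want to see what this step does to the data, here is a minimal sketch in plain Python rather than WEKA. It only illustrates dropping an identifier column before clustering. The sample values and the Sample1/Sample2 column names are hypothetical (the text names only the Gene column), and drop_column is an illustrative helper, not part of WEKA.

```python
import csv
import io

# Hypothetical sample data: only the Gene column is named in the text.
SAMPLE_CSV = """Gene,Sample1,Sample2
g1,0.5,1.2
g2,2.0,0.1
"""


def drop_column(csv_text, name):
    """Return (header, rows) with the column called `name` removed."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    idx = header.index(name)  # raises ValueError if the column is absent
    new_header = header[:idx] + header[idx + 1:]
    rows = [row[:idx] + row[idx + 1:] for row in reader]
    return new_header, rows


header, rows = drop_column(SAMPLE_CSV, "Gene")
print(header)  # remaining column names
print(rows)    # remaining values, still as strings
```

Only the measurement columns remain, so the gene identifier cannot influence any distance computed during clustering.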

2. WEKA automatically assumes that the last column is the class rather than a feature, so the last column would not be used during clustering. To avoid this problem, we will create a dummy column and assign it as the class. To create the new column (Figure 13):

  • In the Filters section, click Choose
  • Under filters -> unsupervised -> attribute, click Add
  • Left-click the text box next to the Choose button (it now shows the filter name) to open the filter options
  • In the attributeName field, type: Class
  • Click OK
  • Click Apply

You should now see a new column called Class added as the last column in the dataset. The column contains only missing values (displayed as NaN), which indicates that it is empty.

Figure 13 Add a ‘Class’ column.
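
The effect of adding the dummy column can be sketched in plain Python as follows. This is an illustration under assumptions, not WEKA code: add_empty_column is a hypothetical helper, the data values are made up, and None stands in for a missing value.

```python
# None is a stand-in for a missing value (an assumption of this sketch).
MISSING = None


def add_empty_column(header, rows, name="Class"):
    """Append a column called `name` whose every value is missing."""
    new_header = header + [name]
    new_rows = [row + [MISSING] for row in rows]
    return new_header, new_rows


# Hypothetical data, as it might look after the Gene column was removed.
header = ["Sample1", "Sample2"]
rows = [[0.5, 1.2], [2.0, 0.1]]

header2, rows2 = add_empty_column(header, rows)
print(header2)
for row in rows2:
    print(row)
```

The new column carries no information; it exists only so that the last position in the table is no longer occupied by a real feature.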

3. Now we need to assign this column as the class column (Figure 14).

  • In the Filters section, click Choose
  • Under filters -> unsupervised -> attribute, click ClassAssigner (ClassAssigner is a filter that indicates which column is the class column)
  • By default, ClassAssigner chooses the last column as the class
  • Click Apply
Figure 14 Assigning a column as the class column.
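
To see why the dummy class column matters, the following plain-Python sketch mimics the behaviour described above, where the class column is set aside and only the remaining columns are used as features. It is a simplified model, not WEKA's implementation: split_features and the data values are hypothetical.

```python
def split_features(rows, class_index=-1):
    """Separate feature values from the class column.

    class_index follows Python indexing, so -1 means the last column.
    """
    n_cols = len(rows[0])
    ci = class_index % n_cols  # normalise negative indices
    features = [[v for i, v in enumerate(row) if i != ci] for row in rows]
    classes = [row[ci] for row in rows]
    return features, classes


# Without a dummy column, the last real feature is treated as the class.
without_dummy = [[0.5, 1.2], [2.0, 0.1]]
features_a, _ = split_features(without_dummy)

# With the dummy Class column (None = missing), every feature is kept.
with_dummy = [[0.5, 1.2, None], [2.0, 0.1, None]]
features_b, classes_b = split_features(with_dummy)

print(features_a)  # one feature per row: the second column was set aside
print(features_b)  # both features per row
```

In the first case one measurement column is lost to the class slot; in the second, the empty Class column takes that slot and all measurements remain available.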

Now that the dataset is preprocessed, we can visualise it.