Dataset preprocessing

  1. Run WEKA as explained in the previous section.
  2. Click on “Explorer”.
  3. Click “Open file...”.
  4. Select the file gene_expression_data.arff.

Figure 24 The majority of the samples are classified as BRCA (i.e., breast carcinoma), while the least common cancer is COAD (i.e., colon adenocarcinoma).
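
The class counts summarised in Figure 24 can also be checked outside the WEKA interface. The following is only a minimal sketch, not part of WEKA: it assumes the file is a dense (non-sparse) ARFF file, that the class label is the last attribute on each data row, and that no value contains a quoted comma. The function name class_distribution is hypothetical and chosen for illustration.

```python
from collections import Counter


def class_distribution(arff_lines):
    """Count class labels in a dense ARFF file.

    Assumption: the class label is the last comma-separated value
    on every data row (this is not guaranteed by the ARFF format).
    """
    in_data = False
    counts = Counter()
    for raw in arff_lines:
        line = raw.strip()
        # Skip blank lines and ARFF comments (lines starting with %).
        if not line or line.startswith("%"):
            continue
        if not in_data:
            # Everything before the @data marker is header
            # (@relation and @attribute declarations).
            if line.lower().startswith("@data"):
                in_data = True
            continue
        # Naive split: assumes no quoted values containing commas.
        label = line.rsplit(",", 1)[-1].strip().strip("'\"")
        counts[label] += 1
    return counts
```

To use it, open gene_expression_data.arff with Python's built-in open() and pass the file object to class_distribution; calling most_common() on the result lists the labels from most to least frequent, which can be compared against the histogram WEKA displays.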

Click on the “Visualize” tab to see a scatter-plot matrix of the data, with one plot for each pair of features (Figure 25).

Figure 25 Use the “Visualize” tab to see an overview of the data.