Model development

We will use a decision tree classifier for the following reasons:

  • Easy to interpret, so we can use it to identify genes that could be potential targets in cancer
  • Has a built-in feature selection mechanism, which makes it suitable for this high-dimensional dataset
  • Easy to visualise the model
  • Can learn non-linear relationships between the features and labels
  • Works well for multi-class classification problems
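
To make the "built-in feature selection" point concrete, the sketch below scores two features by information gain and picks the more informative one, which is the kind of ranking a decision tree performs at every split. It is an illustration only: J48 implements C4.5, which ranks splits by gain ratio rather than the plain information gain used here, and the gene names, expression levels and labels are hypothetical toy values, not taken from the dataset in this exercise.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(values, labels):
    """Entropy reduction obtained by splitting `labels` on a discrete feature."""
    total = len(labels)
    remainder = 0.0
    for v in set(values):
        subset = [lab for x, lab in zip(values, labels) if x == v]
        remainder += len(subset) / total * entropy(subset)
    return entropy(labels) - remainder


# Hypothetical toy data: two discretised expression features and a class label.
labels = ["tumour", "tumour", "normal", "normal"]
features = {
    "gene_A": ["high", "high", "low", "low"],  # separates the classes perfectly
    "gene_B": ["high", "low", "high", "low"],  # carries no class information
}

# Score every feature and keep the one with the largest gain.
gains = {name: information_gain(vals, labels) for name, vals in features.items()}
best_feature = max(gains, key=gains.get)
print(gains, best_feature)
```

A feature that never wins a split never appears in the tree, which is why the genes that do appear in the final model are candidates worth inspecting.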

However, decision trees may overfit the data if they grow too deep. If this happens, apply a pruning mechanism and tune the parameters that limit the depth of the tree.

  1. Click on the Classify tab.
  2. In the Classifier section, choose Trees and then the J48 decision tree algorithm.
  3. To modify the model's parameters, click on the model name. In this exercise, we will run the algorithm with the default parameters (Figure 26).
  4. The Test Options section offers several testing approaches. In this exercise, we will split the data into a 70% training set and a 30% testing set.
  5. Click Start to begin model training and testing.
Figure 26 Modify the model parameters.
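
The 70%/30% split chosen in step 4 can be sketched in a few lines of code. This is not Weka's implementation: the function name `percentage_split`, the seed value and the sample identifiers are hypothetical, and Weka's own shuffling and rounding may differ. The sketch only shows the idea of shuffling the samples once and holding out the last 30% for testing.

```python
import random


def percentage_split(rows, train_fraction=0.7, seed=1):
    """Shuffle `rows` reproducibly and split them into train and test lists."""
    shuffled = list(rows)                   # copy so the caller's list is untouched
    random.Random(seed).shuffle(shuffled)   # fixed seed gives a repeatable split
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Hypothetical sample identifiers standing in for the rows of the dataset.
samples = [f"sample_{i}" for i in range(10)]
train, test = percentage_split(samples)
print(len(train), len(test))
```

The model is trained only on `train` and evaluated only on `test`, so the reported accuracy reflects samples the tree has not seen during training.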