Model development
We will choose a decision tree classifier for the following reasons:
- Easy to interpret, so we can use it to identify genes that could be potential targets for cancers
- Has a built-in feature selection mechanism, which makes it suitable for this high-dimensional dataset
- Easy to visualise the model
- Can learn non-linear relationships between the features and labels
- Works well for multi-class classification problems
However, decision trees may overfit the data if the trees are too deep. If this happens, apply a pruning mechanism and tune the parameters that limit the depth of the tree.
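To illustrate the built-in feature selection mentioned above, the sketch below scores two features by how well a single split on each separates the classes. It is a simplified illustration, not the J48 implementation: J48 is Weka's version of C4.5, which uses the gain ratio, whereas this sketch uses plain information gain. The gene names, expression values, labels, and thresholds are all hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(values, labels, threshold):
    """Entropy reduction from splitting the samples at `threshold`."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    if not left or not right:
        return 0.0  # the split does not separate anything
    n = len(labels)
    weighted = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(labels) - weighted


# Hypothetical toy data: expression of two genes in six samples.
gene_a = [0.1, 0.2, 0.3, 2.1, 2.5, 2.8]
gene_b = [1.0, 2.0, 1.1, 2.1, 0.9, 1.9]
labels = ["normal", "normal", "normal", "tumour", "tumour", "tumour"]

gain_a = information_gain(gene_a, labels, threshold=1.0)
gain_b = information_gain(gene_b, labels, threshold=1.5)
print(f"gene_a gain: {gain_a:.3f}, gene_b gain: {gain_b:.3f}")
```

In this toy data, gene_a separates the two classes far better than gene_b, so a tree would split on it first. Features that never yield a useful split do not appear in the tree, which is why the fitted tree acts as a feature selector.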
- Click on the Classify tab.
- In the Classifier section, choose Trees and then the J48 decision tree algorithm.
- To modify the model's parameters, click on the model. In this exercise, we will apply the algorithm with the default parameters (Figure 26).
- The Test Options section offers several testing approaches. In this exercise, we will split the data into a 70% training set and a 30% testing set (the Percentage split option).
- Click Start to begin model training and testing.
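The 70%/30% split chosen in the steps above can be sketched conceptually as follows. This is a minimal illustration of a hold-out split and of the accuracy measured on the held-out part, not Weka's own code; the function names, the seed, and the sample names are assumptions made for the example.

```python
import random


def percentage_split(instances, train_fraction=0.7, seed=1):
    """Shuffle the instances and split them into training and testing sets."""
    shuffled = list(instances)              # copy so the input is left untouched
    random.Random(seed).shuffle(shuffled)   # fixed seed keeps the split repeatable
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def accuracy(predicted, actual):
    """Fraction of test instances whose predicted label matches the true label."""
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


# Hypothetical sample identifiers standing in for the rows of the dataset.
samples = [f"sample_{i}" for i in range(10)]
train, test = percentage_split(samples)
print(len(train), "training instances,", len(test), "testing instances")
```

The model is trained only on the training set and evaluated only on the testing set, so the reported accuracy reflects performance on samples the tree has not seen.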
