Model development

We will choose a decision tree classifier for the following reasons:

Easy to interpret, so we can use it to identify genes that could be potential targets for cancer treatment
Has a built-in feature selection mechanism, which makes it suitable for this high-dimensional dataset
Easy to visualise the model
Can learn non-linear relationships between the features and labels
Works well for multi-class classification problems
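
The built-in feature selection mentioned above comes from how a tree picks its splits: at each node it scores every feature by how well it separates the classes and keeps the best one. The following is a minimal sketch of one common score, information gain, in plain Python. The gene names, expression levels and labels are hypothetical toy values, not taken from the course dataset, and C4.5-style learners such as J48 use a related criterion (gain ratio) rather than this exact formula.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())

def information_gain(samples, labels, feature):
    """Entropy reduction obtained by splitting the samples on one feature."""
    groups = {}
    for sample, label in zip(samples, labels):
        groups.setdefault(sample[feature], []).append(label)
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

# Hypothetical toy data: two made-up genes with discretised expression levels.
samples = [
    {"GENE_A": "high", "GENE_B": "low"},
    {"GENE_A": "high", "GENE_B": "high"},
    {"GENE_A": "low", "GENE_B": "low"},
    {"GENE_A": "low", "GENE_B": "high"},
]
labels = ["tumour", "tumour", "normal", "normal"]

# Rank the features by information gain, most informative first.
gains = {f: information_gain(samples, labels, f) for f in ("GENE_A", "GENE_B")}
ranking = sorted(gains, key=gains.get, reverse=True)
print(ranking, gains)
```

In this toy example GENE_A separates the two classes perfectly and GENE_B carries no information, so a tree would split on GENE_A and never use GENE_B. The same ranking is what makes a trained tree readable as a shortlist of candidate genes.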

However, a decision tree may overfit the training data if it grows too deep. If this happens, apply pruning and tune the parameters that limit the depth of the tree.
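
To make the link between depth and overfitting concrete, here is a minimal sketch of a depth-capped decision tree in plain Python. It is a generic illustration, not Weka's J48 implementation, and `max_depth` is a parameter of this sketch only. The eight samples are hypothetical: three made-up binary features, with two labels deliberately flipped to act as noise.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())

def majority(labels):
    """Most frequent class label; used as the prediction at a leaf."""
    return Counter(labels).most_common(1)[0][0]

def best_feature(rows, labels, features):
    """Feature index whose binary split gives the lowest weighted entropy."""
    def weighted_entropy(f):
        score = 0.0
        for value in (0, 1):
            subset = [y for x, y in zip(rows, labels) if x[f] == value]
            if subset:
                score += len(subset) / len(labels) * entropy(subset)
        return score
    return min(features, key=weighted_entropy)

def build_tree(rows, labels, features, max_depth):
    """Grow a tree on binary features, stopping once max_depth levels are used."""
    if len(set(labels)) == 1 or not features or max_depth == 0:
        return majority(labels)  # leaf node
    f = best_feature(rows, labels, features)
    remaining = [g for g in features if g != f]
    node = {"feature": f, "children": {}}
    for value in (0, 1):
        picked = [(x, y) for x, y in zip(rows, labels) if x[f] == value]
        if picked:
            xs, ys = zip(*picked)
            node["children"][value] = build_tree(xs, ys, remaining, max_depth - 1)
        else:
            node["children"][value] = majority(labels)
    return node

def predict(tree, row):
    """Follow the splits until a leaf (a plain label) is reached."""
    while isinstance(tree, dict):
        tree = tree["children"][row[tree["feature"]]]
    return tree

def depth(tree):
    """Number of split levels between the root and the deepest leaf."""
    if not isinstance(tree, dict):
        return 0
    return 1 + max(depth(child) for child in tree["children"].values())

def accuracy(tree, rows, labels):
    return sum(predict(tree, x) == y for x, y in zip(rows, labels)) / len(labels)

# Hypothetical toy data: feature 0 mostly determines the class; two labels are noise.
rows = [(1, 0, 0), (1, 0, 1), (1, 1, 0), (1, 1, 1),
        (0, 0, 0), (0, 0, 1), (0, 1, 0), (0, 1, 1)]
labels = ["tumour", "tumour", "tumour", "normal",
          "normal", "normal", "tumour", "normal"]

shallow = build_tree(rows, labels, [0, 1, 2], max_depth=1)
deep = build_tree(rows, labels, [0, 1, 2], max_depth=3)
print(depth(shallow), accuracy(shallow, rows, labels))
print(depth(deep), accuracy(deep, rows, labels))
```

On this toy data the depth-1 tree gets 6 of the 8 training samples right, while the depth-3 tree fits all 8, including the two noisy labels. A perfect training score obtained by memorising noise is the overfitting pattern that pruning and depth limits are meant to prevent.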

Click on the Classify tab.
In the Classifier section, choose Trees and then the J48 decision tree algorithm.
To modify the model's parameters, click on the model. In this exercise, we will apply the algorithm with the default parameters (Figure 26).
The Test Options section offers several testing approaches. In this exercise, we will choose Percentage split and set it to 70%, so that 70% of the data is used for training and the remaining 30% for testing.
Click Start to begin model training and testing.
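
The 70%/30% split chosen in the steps above can be sketched in a few lines of plain Python. This is only an illustration of the idea, not Weka's own code: the function name `percentage_split`, the seed value and the sample identifiers are all hypothetical.

```python
import random

def percentage_split(instances, train_fraction=0.7, seed=1):
    """Shuffle the instances reproducibly, then cut them into train and test sets."""
    shuffled = list(instances)              # copy, so the input is left untouched
    random.Random(seed).shuffle(shuffled)   # fixed seed gives a repeatable split
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

# Hypothetical sample identifiers standing in for rows of the dataset.
samples = [f"sample_{i}" for i in range(10)]
train, test = percentage_split(samples)
print(len(train), len(test))
```

The model is trained only on the first part and evaluated only on the second, so the reported accuracy reflects samples the tree has not seen. Shuffling before the cut matters when the file is ordered by class, since an unshuffled split could leave a whole class out of the training set.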
