Decision trees are among the simplest and most interpretable machine learning models. They can be used both as classifiers and as regressors. Each internal node of a decision tree corresponds to a decision about a particular attribute of the input sample, and its branches correspond to the possible values of that attribute. The leaves of the tree correspond to the outcomes: the outputs of the decision tree. For example, in the tree below we decide whether to have an ice cream (outcome Y) or not (outcome N) based on the quality of the ice cream and of the cone.

Decision tree: an illustrative example.
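To make the structure concrete, here is a minimal sketch of such a tree in Python, with internal nodes represented as (attribute, branches) pairs and leaves as outcomes. The specific attribute values used ("bad", "good", "great") are assumptions for illustration; the figure above defines the actual tree.

```python
# A minimal sketch of the ice cream decision tree as nested tuples/dicts.
# Internal nodes are (attribute, {value: subtree}) pairs; leaves are outcomes.
# The attribute values here are illustrative assumptions, not taken from the figure.
tree = (
    "ice_cream_quality",
    {
        "bad": "N",                      # bad ice cream: no, regardless of the cone
        "great": "Y",                    # great ice cream: yes, regardless of the cone
        "good": (                        # good ice cream: the cone decides
            "cone_quality",
            {"bad": "N", "good": "Y"},
        ),
    },
)

def predict(node, sample):
    """Walk the tree until a leaf (an outcome string) is reached."""
    while isinstance(node, tuple):
        attribute, branches = node
        node = branches[sample[attribute]]
    return node

print(predict(tree, {"ice_cream_quality": "good", "cone_quality": "bad"}))  # -> "N"
```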

Given a tabular dataset, it is not difficult to create a branch or a leaf. It is less clear, however, which attribute should be tested first and which later. The usual approach is to pick the attribute that separates the outcomes most cleanly. This can be illustrated with the dataset below. If we partition the dataset by cone quality, we end up with three partitions, each containing a mixture of positive and negative outcomes. If we instead partition by ice cream quality, two of the partitions are clean, containing purely positive or purely negative outcomes, and only one partition, the one for "good" ice cream quality, remains mixed. So we test ice cream quality first and then cone quality, but only when the ice cream quality is "good". This way we typically end up with more compact trees.

Two possible partitionings of a dataset.
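Below is a small sketch in Python of how this comparison can be made: each candidate attribute is scored by the weighted impurity of the partitions it produces, and the attribute with the lowest score is chosen. The toy dataset, its attribute values, and the use of Gini impurity (rather than, say, entropy) are assumptions for illustration, not the table from the figure.

```python
from collections import Counter, defaultdict

# A hypothetical stand-in for the dataset in the figure.
# Each row: (ice_cream_quality, cone_quality, outcome)
rows = [
    ("bad",   "bad",  "N"),
    ("bad",   "ok",   "N"),
    ("bad",   "good", "N"),
    ("good",  "bad",  "N"),
    ("good",  "good", "Y"),
    ("great", "bad",  "Y"),
    ("great", "ok",   "Y"),
    ("great", "good", "Y"),
]

def gini(outcomes):
    """Gini impurity: 0 for a pure partition, larger for a mixed one."""
    counts = Counter(outcomes)
    total = len(outcomes)
    return 1.0 - sum((c / total) ** 2 for c in counts.values())

def split_impurity(rows, attribute_index):
    """Impurity of the partitions created by splitting on one attribute,
    averaged with weights proportional to partition size."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[row[attribute_index]].append(row[-1])
    n = len(rows)
    return sum(len(p) / n * gini(p) for p in partitions.values())

# Lower is better: splitting on ice cream quality (index 0) leaves only the
# "good" partition mixed, so it scores lower than splitting on cone quality.
print(split_impurity(rows, 0))  # ice cream quality
print(split_impurity(rows, 1))  # cone quality
```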