Decision trees are a popular, versatile supervised machine learning algorithm used for both classification and regression tasks. A tree reaches a prediction through a sequence of simple tests on attribute values, which loosely mirrors how people work through a decision step by step.
A decision tree consists of a tree-like structure where:
Decision trees are constructed using a top-down, recursive partitioning approach called "Top-Down Induction of Decision Trees (TDIDT)". At each node, the algorithm selects the attribute that best splits the dataset into subsets according to a criterion such as "Information Gain" or "Gini impurity" (this application uses "Information Gain"). The process repeats on each subset until all of its instances belong to the same class or no attributes remain to split on. At that point a leaf node is created: in the former case it is assigned the class label shared by all instances in the subset, and in the latter case it is assigned the most common class label in the subset.
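The recursive procedure described above can be sketched in Python. This is a minimal illustration rather than this application's actual implementation: it assumes categorical attributes stored as dictionaries, and the names `build_tree`, `split`, and `gain`, the nested-dictionary tree format, and the toy data are all hypothetical. Ties between attributes are broken by list order.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy of a list of class labels, in bits."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())

def split(rows, labels, attribute):
    """Partition rows and labels by the value each row has for one attribute."""
    parts = {}
    for row, label in zip(rows, labels):
        sub_rows, sub_labels = parts.setdefault(row[attribute], ([], []))
        sub_rows.append(row)
        sub_labels.append(label)
    return parts

def gain(rows, labels, attribute):
    """Information Gain of splitting on one attribute."""
    parts = split(rows, labels, attribute)
    remainder = sum(len(ls) / len(labels) * entropy(ls) for _, ls in parts.values())
    return entropy(labels) - remainder

def build_tree(rows, labels, attributes):
    # Stopping case 1: all instances share one class, so emit that class.
    if len(set(labels)) == 1:
        return labels[0]
    # Stopping case 2: no attributes left, so emit the most common class.
    if not attributes:
        return Counter(labels).most_common(1)[0][0]
    # Otherwise split on the attribute with the highest Information Gain.
    best = max(attributes, key=lambda a: gain(rows, labels, a))
    remaining = [a for a in attributes if a != best]
    return {
        "attribute": best,
        "children": {
            value: build_tree(sub_rows, sub_labels, remaining)
            for value, (sub_rows, sub_labels) in split(rows, labels, best).items()
        },
    }

# Hypothetical toy dataset, invented for illustration only.
rows = [
    {"outlook": "sunny", "windy": "no"},
    {"outlook": "sunny", "windy": "yes"},
    {"outlook": "rainy", "windy": "no"},
    {"outlook": "rainy", "windy": "yes"},
    {"outlook": "sunny", "windy": "yes"},
]
labels = ["play", "play", "play", "stay", "play"]
tree = build_tree(rows, labels, ["outlook", "windy"])
```

On this toy data the root splits on `outlook`, because it yields the larger gain. The `sunny` branch is already pure and becomes a leaf, while the `rainy` branch is split once more on `windy`.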
Input:
Output:
Algorithm:
To calculate the Information Gain, the following formula is used:
\(IG(S, A) = H(S) - H(S \mid A)\)
Where:
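In the standard reading of this formula, \(H(S)\) is the entropy of the class labels in the dataset \(S\), and \(H(S \mid A)\) is the entropy that remains after splitting \(S\) on attribute \(A\), computed as the average of each subset's entropy weighted by its share of \(S\). The sketch below computes both terms in Python; it assumes categorical attributes and base-2 logarithms, and the function names and toy data are hypothetical.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """H(S): Shannon entropy of a list of class labels, in bits."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())

def information_gain(rows, labels, attribute):
    """IG(S, A) = H(S) - H(S | A) for a categorical attribute A."""
    # Group the class labels by the value each row takes for the attribute.
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attribute], []).append(label)
    # H(S | A): entropy of each subset, weighted by its share of S.
    conditional = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - conditional

# Hypothetical toy dataset, invented for illustration only.
rows = [
    {"outlook": "sunny", "windy": "no"},
    {"outlook": "sunny", "windy": "yes"},
    {"outlook": "rainy", "windy": "no"},
    {"outlook": "rainy", "windy": "yes"},
]
labels = ["play", "play", "stay", "stay"]

gain_outlook = information_gain(rows, labels, "outlook")
gain_windy = information_gain(rows, labels, "windy")
```

Here `outlook` separates the two classes perfectly, so its gain equals the full entropy of the labels (1 bit), while `windy` leaves each subset as mixed as the original set and has a gain of 0.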