
What is the Decision Tree?

Updated: Mar 30, 2023

Introduction:



The decision tree is one of the most popular algorithms in data mining. As its name suggests, the algorithm has a tree-like structure, but it grows upside down, with its root at the top.

Making the right choice has always been a challenging step in solving any problem, and it becomes even harder when there are many options to choose from. One traditional approach is to consider all possible options and outcomes and pick the best scenario. The decision tree algorithm follows this approach, weighing each possible outcome against the others to help an individual reach the best result. In effect, the algorithm draws a map that predicts the best option based on mathematics.

In machine learning terms, the decision tree algorithm is a supervised learning algorithm commonly used for classification problems. Supervised learning, or supervised machine learning, is a subcategory of machine learning in which the algorithm is trained to classify data using labelled datasets. The training data includes both the input features and the correct outputs, which is what allows the model to learn.



Decision Tree Algorithm:

The main reason for using a decision tree is to make the best possible prediction. The tree predicts by applying decisions learned from previously seen data.

There are two types of decision trees, distinguished by the type of target variable: the categorical variable decision tree and the continuous variable decision tree.
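As a quick illustration of the two types, the sketch below fits a classifier for a categorical target and a regressor for a continuous one. It assumes scikit-learn, which the article does not name, and uses made-up toy data.

```python
# Minimal sketch, assuming scikit-learn; the toy data below is invented for illustration.
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor

X = [[25, 40000], [35, 60000], [45, 80000], [20, 20000]]  # toy rows: age, income

# Categorical variable decision tree: the target is a class label.
y_class = ["no", "yes", "yes", "no"]
clf = DecisionTreeClassifier().fit(X, y_class)

# Continuous variable decision tree: the target is a number.
y_value = [120.0, 250.0, 310.0, 90.0]
reg = DecisionTreeRegressor().fit(X, y_value)

print(clf.predict([[30, 50000]]))  # predicted class
print(reg.predict([[30, 50000]]))  # predicted value
```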

How the Decision Tree Works:

After gathering the data, we have to clean it. Datasets often contain null values or missing data, so they need to be sorted and cleaned. Then we split the data into a training set and a testing set.

Typically, around 30 percent of the data is held out for testing, and the remaining data is used to train the algorithm.
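A minimal sketch of that split, assuming scikit-learn's train_test_split (any splitting utility would work) and hypothetical data:

```python
from sklearn.model_selection import train_test_split

X = [[i] for i in range(100)]    # hypothetical feature rows
y = [i % 2 for i in range(100)]  # hypothetical labels

# Hold out 30 percent for testing; the remaining 70 percent trains the tree.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
print(len(X_train), len(X_test))  # 70 30
```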

The algorithm itself consists of the following steps (a small code sketch of the procedure follows the list).


1- First, using the selection measures explained further below, select the most suitable attribute to split on.

2- Turn that attribute into the parent node and create its child nodes.

3- Split the child nodes into smaller subsets, repeating until one of the conditions below is met:

- All the tuples belong to the same attribute value.

- There are no more remaining attributes.

- There are no more instances.
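The sketch below illustrates this recursive procedure in plain Python. It is not the article's implementation; the best_attribute argument stands in for whichever selection measure is used (these measures are discussed in a later section).

```python
# Illustrative sketch of the recursive splitting described above (not a production implementation).
from collections import Counter

def build_tree(rows, target, attributes, best_attribute):
    labels = [row[target] for row in rows]
    # Stopping conditions: no instances left, all tuples share the same target
    # value, or no remaining attributes to split on.
    if not rows or len(set(labels)) == 1 or not attributes:
        return Counter(labels).most_common(1)[0][0] if labels else None  # leaf node

    attr = best_attribute(rows, attributes, target)    # step 1: pick the attribute to split on
    node = {attr: {}}                                   # step 2: the attribute becomes the parent node
    for value in set(row[attr] for row in rows):        # step 3: split into smaller subsets
        subset = [row for row in rows if row[attr] == value]
        remaining = [a for a in attributes if a != attr]
        node[attr][value] = build_tree(subset, target, remaining, best_attribute)
    return node
```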


Here is some essential terminology related to decision trees.

Root node: the first node, which represents the entire sample and gets divided into further nodes.

Splitting: the division of a single node into two or more sub-nodes.


Decision node: a sub-node that splits into further sub-nodes.

Leaf or terminal node: a node that does not split any further.

Pruning: the opposite of splitting, in which sub-nodes are removed.

Branch or sub-tree: a subsection of the entire tree.

Parent and child node: when a node is split into sub-nodes, the original node is called the parent node and the sub-nodes are called child nodes.

Attribute Selection Measures:

One of the more complicated steps in building a decision tree is deciding which attribute to choose as the root. Random selection is one approach that can be used in some cases, but it gives no assurance of high accuracy. To solve this problem, attribute selection measures use criteria such as entropy, information gain, the Gini index, gain ratio, reduction in variance and chi-square.

The attribute selection strategy calculates these criteria for every attribute and ranks the values. Attributes are then placed into the tree in that order, with the attribute with the highest information gain placed at the root.

Information gain:

Information gain, or IG, checks which attribute should be used for a split. Information gain is the decrease in entropy, so by calculating the entropy of each attribute before and after a split, we can calculate the information gain. Based on this calculation, we can build the decision tree.
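As a concrete illustration, the entropy of a set of labels is −Σ pᵢ·log₂(pᵢ), and the information gain of a split is the parent's entropy minus the weighted entropy of the children. The small sketch below uses made-up labels:

```python
from math import log2

def entropy(labels):
    total = len(labels)
    return -sum((labels.count(c) / total) * log2(labels.count(c) / total)
                for c in set(labels))

def information_gain(parent, children):
    total = len(parent)
    weighted = sum(len(child) / total * entropy(child) for child in children)
    return entropy(parent) - weighted  # the decrease in entropy caused by the split

parent = ["yes"] * 9 + ["no"] * 5  # made-up labels before the split
left, right = ["yes"] * 6 + ["no"] * 2, ["yes"] * 3 + ["no"] * 3
print(round(information_gain(parent, [left, right]), 3))  # ~0.048
```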




Gini Index:

The Gini index, also known as Gini impurity, is a metric that measures how often a randomly chosen element would be incorrectly classified. It works with a categorical target variable such as "success" or "failure".

The Gini index is calculated as 1 − (p² + q²), where p is the probability of success and q is the probability of failure. An attribute with lower Gini impurity should be preferred.
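A small sketch of that formula with made-up labels, showing why the purer node gets the lower (preferred) Gini value:

```python
def gini(labels):
    total = len(labels)
    p = labels.count("success") / total  # probability of success
    q = labels.count("failure") / total  # probability of failure
    return 1 - (p ** 2 + q ** 2)

node_a = ["success"] * 8 + ["failure"] * 2  # fairly pure node
node_b = ["success"] * 5 + ["failure"] * 5  # maximally mixed node
print(round(gini(node_a), 2), gini(node_b))  # 0.32 vs 0.5 -> prefer the lower impurity
```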


Optimising Decision Tree Performance:

There is no debate that the decision tree has become one of the most popular algorithms today. However, that does not mean it performs well in every situation; the algorithm has several limitations and restrictions.

A common problem is overfitting. In theory, the maximum depth of a decision tree is one less than the number of samples, and a tree grown to that depth almost always overfits the training data. Growing such a deep tree on a complex dataset is also laborious and time-consuming. Constraining or optimising the decision tree is therefore a way to avoid both overfitting and wasted computation.
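A minimal sketch of that idea, assuming scikit-learn and one of its bundled datasets: limiting max_depth keeps the tree far shallower than the unrestricted version while often generalising as well or better.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

deep = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)  # grown until the leaves are pure
shallow = DecisionTreeClassifier(max_depth=4, random_state=0).fit(X_train, y_train)

print("unrestricted depth:", deep.get_depth(), "test accuracy:", deep.score(X_test, y_test))
print("max_depth=4 test accuracy:", shallow.score(X_test, y_test))
```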


Today, many studies have looked at improving the performance of decision trees, but pruning decision trees and random forests are two of the most common approaches, as sketched below.
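For illustration, here is a sketch of both approaches using scikit-learn (an assumption; the article does not name a library): cost-complexity pruning via the ccp_alpha parameter, and a RandomForestClassifier ensemble.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Pruning: cost-complexity pruning removes sub-trees that add little value.
pruned = DecisionTreeClassifier(ccp_alpha=0.01, random_state=0).fit(X_train, y_train)

# Random forest: an ensemble of many trees trained on random subsets of the data.
forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)

print("pruned tree accuracy:", pruned.score(X_test, y_test))
print("random forest accuracy:", forest.score(X_test, y_test))
```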



Visualise the Decision Tree:

One reason decision trees are so popular is that the algorithm can be used for both regression and classification; it does not require feature scaling, and it is easy to interpret because the fitted tree can be visualised. Consequently, it helps us understand how the model works without needing much mathematical knowledge.
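A minimal sketch of such a visualisation, assuming scikit-learn and matplotlib with the bundled iris dataset:

```python
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, plot_tree

iris = load_iris()
tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(iris.data, iris.target)

# Draw the fitted tree: each box shows the split condition, sample counts and class counts.
plt.figure(figsize=(10, 6))
plot_tree(tree, feature_names=iris.feature_names,
          class_names=list(iris.target_names), filled=True)
plt.show()
```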




