Learn how decision trees predict numbers by splitting data.
Imagine playing a game of "Guess the Number." You ask questions like "Is it bigger than 50?" or "Is it even?" to narrow down the possibilities. Decision Trees in machine learning work similarly!
They are flowchart-like structures that ask questions about your data's features to guide you to a final prediction. Decision trees can predict categories (like "Will this customer buy?" - Classification Tree) or continuous numbers (like "How many hours will someone play tennis?" - Regression Tree). Today, we'll focus on understanding Regression Trees.
A decision tree learns by splitting the dataset into smaller and smaller subsets. At each step, it asks a question about one of the input features (e.g., "Is the Outlook Sunny?"). Based on the answer, the data goes down a specific branch.
The main goal when building a tree is to make the resulting groups (at the end of the branches) as "pure" or homogeneous as possible regarding the value we want to predict.
How does the tree decide *which* question to ask at each step? For regression trees, a common method is to choose the split that results in the biggest reduction in the spread or variation of the target variable (the number we're trying to predict).
We often measure this spread using the Standard Deviation (SD). A low SD means the numbers in a group are very similar; a high SD means they are spread out. The tree wants to create groups (leaves) with the lowest possible SD.
Standard Deviation (s) ≈ Average distance of data points from their mean
(Formula: s = √[ Σ(value - mean)² / count ])
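To make the formula concrete, here is a minimal Python sketch (the function name `standard_deviation` is our own; Python's built-in `statistics.pstdev` computes the same quantity):

```python
import math

def standard_deviation(values):
    """Population SD: square root of the mean squared distance from the mean."""
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    return math.sqrt(variance)

print(standard_deviation([5, 7, 9]))  # ≈ 1.63
```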
The method used is called Standard Deviation Reduction (SDR).
At any decision node, the algorithm considers all possible splits across all features:
1. Calculate the Standard Deviation (SD_parent) of the target variable for all data points currently in this node.
2. For each candidate feature, split the data by that feature's values and calculate the SD of the target within each subset (e.g., for 'Outlook': SD_sunny, SD_overcast, SD_rainy).
3. Compute the weighted average SD of the children, weighting each child by its share of the data: Weighted_SD = (Fraction_in_Child1 * SD_Child1) + (Fraction_in_Child2 * SD_Child2) + ...
4. Compute the reduction: SDR = SD_parent - Weighted_SD
5. Choose the split with the maximum SDR, as sketched in the code below.
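Here is a minimal Python sketch of one round of this split search, assuming categorical features stored as dictionaries (all function and variable names are our own illustration, not a particular library's API):

```python
import math
from collections import defaultdict

def sd(values):
    """Population standard deviation of a list of numbers."""
    mean = sum(values) / len(values)
    return math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))

def sdr(rows, feature, target):
    """Standard Deviation Reduction achieved by splitting `rows` on `feature`."""
    parent_sd = sd([r[target] for r in rows])
    groups = defaultdict(list)
    for r in rows:
        groups[r[feature]].append(r[target])  # one child per feature value
    weighted_sd = sum(len(g) / len(rows) * sd(g) for g in groups.values())
    return parent_sd - weighted_sd

def best_split(rows, features, target):
    """Choose the feature whose split yields the maximum SDR."""
    return max(features, key=lambda f: sdr(rows, f, target))
```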
The tree stops growing a branch (creating a leaf node) when a stopping criterion is met, for example: the spread of the target in a node is already small (e.g., its Coefficient of Variation falls below a threshold), the node contains too few data points to split further, or a maximum depth is reached.
Once a data point reaches a leaf node, what's the prediction? For a regression tree, it's simple: the prediction is the average (mean) of the target variable for all the *training* data points that ended up in that leaf.
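Putting the pieces together, here is a minimal recursive builder, reusing `sd`, `best_split`, and `defaultdict` from the sketch above. The stopping thresholds (`min_samples`, `cv_threshold`) are illustrative values of our choosing, not prescriptions from the text:

```python
def mean(values):
    return sum(values) / len(values)

def build_tree(rows, features, target, min_samples=3, cv_threshold=0.1):
    targets = [r[target] for r in rows]
    m = mean(targets)
    cv = sd(targets) / m if m else 0.0  # Coefficient of Variation as a homogeneity check
    # Stopping criteria: node too small, targets already similar, or no features left
    if len(rows) < min_samples or cv < cv_threshold or not features:
        return {"leaf": True, "prediction": m}  # a leaf predicts the mean target
    feature = best_split(rows, features, target)
    groups = defaultdict(list)
    for r in rows:
        groups[r[feature]].append(r)
    remaining = [f for f in features if f != feature]
    children = {v: build_tree(subset, remaining, target, min_samples, cv_threshold)
                for v, subset in groups.items()}
    return {"leaf": False, "feature": feature, "children": children}

def predict(tree, row):
    # Walk from the root to a leaf; assumes the row's feature values were seen in training
    while not tree["leaf"]:
        tree = tree["children"][row[tree["feature"]]]
    return tree["prediction"]
```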
Let's revisit the tennis example: predicting 'Hours Played' based on 'Outlook', 'Temperature', 'Humidity', 'Windy'.
Suppose at the Root Node (all 14 data points), the Standard Deviation of 'Hours Played' is SD_parent = 9.32.
Now, let's test splitting by 'Outlook'. The 14 points split into Sunny (5 points, SD = 10.87), Overcast (4 points, SD = 3.49), and Rainy (5 points, SD = 7.78):
Weighted_SD_Children = (5/14 * 10.87) + (4/14 * 3.49) + (5/14 * 7.78)
≈ 3.88 + 1.00 + 2.78 = 7.66
SDR = SD_parent - Weighted_SD_Children
= 9.32 - 7.66 = 1.66
If we calculate SDR for splitting by Temperature, Humidity, and Windy, and find that 1.66 is the highest SDR, then 'Outlook' is chosen as the first split at the Root Node.
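As a sanity check, the numbers above can be reproduced in a few lines (the group sizes and SDs are taken straight from the example; no other data is assumed):

```python
# (count, SD of 'Hours Played') for each Outlook value
groups = {"Sunny": (5, 10.87), "Overcast": (4, 3.49), "Rainy": (5, 7.78)}
total = sum(n for n, _ in groups.values())  # 14 data points

weighted_sd = sum(n / total * s for n, s in groups.values())
sdr = 9.32 - weighted_sd  # SD_parent = 9.32

print(round(weighted_sd, 2), round(sdr, 2))  # 7.66 1.66
```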
(Figure: Example Tree Growth using SDR)
Basic Tree Terminology
Term | Definition |
---|---|
Decision Node | Where data splits based on a feature's condition. |
Root Node | The very first split/decision node at the top. |
Leaf Node | End node with the final prediction (average value for regression). |
Subtree | A branch and its subsequent nodes/leaves. |
Standard Deviation (SD) | Measures the spread or variation of numerical data. |
Coefficient of Variation (CV) | Relative spread (SD / Mean). Used as a stopping criterion. |
Standard Deviation Reduction (SDR) | The decrease in SD achieved by a split. Used to choose the best split. |
Problem | Solution Approach | Key Concept |
---|---|---|
Calculate SD for: [20, 25, 22, 28, 25]. | Find mean, find squared differences from mean, average them, take square root. | Calculating SD. |
Parent node (30 points, SD=12). Split A (10 points, SD=5). Split B (20 points, SD=8). Calculate weighted child SD. | Weighted SD = (10/30 * 5) + (20/30 * 8) ≈ 1.67 + 5.33 = 7.0 | Weighted average SD calculation. |
SDR for Feature A split = 4.2. SDR for Feature B split = 3.8. Which feature is chosen? | Feature A. | Select split with Maximum SDR. |
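The two numeric exercises can be double-checked with Python's standard library (`statistics.pstdev` computes the population standard deviation used throughout this section):

```python
from statistics import pstdev

# Problem 1: SD of [20, 25, 22, 28, 25] (mean = 24)
print(round(pstdev([20, 25, 22, 28, 25]), 2))  # ≈ 2.76

# Problem 2: weighted child SD for a 10-point and a 20-point split
print(round((10 / 30) * 5 + (20 / 30) * 8, 2))  # 7.0
```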
Key Formulas/Concepts:
SD (s) ≈ Measure of Spread
SDR = SD_Before - Weighted_Avg_SD_After
Interview Question
Question 1: What is the main difference between a Regression Tree and a Classification Tree in terms of their purpose and output?
A Regression Tree is used to predict a continuous numerical value (like price, temperature, hours). Its leaf nodes typically output the average of the target values in that leaf. A Classification Tree is used to predict a discrete category or class label (like 'Yes'/'No', 'Spam'/'Not Spam'). Its leaf nodes typically output the most common class in that leaf.
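If scikit-learn is available, the contrast is easy to demonstrate on a toy dataset (this example is ours, not part of the original text):

```python
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor

X = [[0], [1], [2], [3]]

# Regression tree: the leaf outputs the mean of its training targets
reg = DecisionTreeRegressor(max_depth=1).fit(X, [10.0, 12.0, 30.0, 34.0])
print(reg.predict([[3]]))  # [32.], the mean of the right-hand leaf (30 and 34)

# Classification tree: the leaf outputs its most common class
clf = DecisionTreeClassifier(max_depth=1).fit(X, ["no", "no", "yes", "yes"])
print(clf.predict([[3]]))  # ['yes']
```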
Question 2: Explain the goal of using Standard Deviation Reduction (SDR) when deciding where to split a node in a Regression Tree.
The goal of SDR is to find the split (based on a feature and value) that makes the resulting child nodes as homogeneous as possible in terms of the target variable. Homogeneous means the values are very similar, which corresponds to a low Standard Deviation. SDR quantifies how much the standard deviation decreases after a split compared to before. By choosing the split with the highest SDR, the algorithm picks the split that best separates the data into groups with less internal variation, leading towards more precise predictions at the leaves.
Interview Question
Question 3: If a leaf node in a Regression Tree is reached, how is the final prediction for a new data point falling into that leaf determined?
The final prediction is typically the average (mean) of the target variable values for all the *training* data points that ended up in that specific leaf node during the tree's construction.
Question 4: Why is it generally necessary to have stopping criteria when building a decision tree? What could happen if you didn't stop splitting?
Stopping criteria (like minimum samples per leaf, maximum depth, minimum SD reduction) are necessary to prevent overfitting. If the tree splits indefinitely until each leaf contains only one data point, it would perfectly memorize the training data (including noise) but would likely perform very poorly on new, unseen data because it hasn't learned the general underlying pattern.
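A quick experiment (ours, assuming scikit-learn and NumPy are installed) makes this visible: the unrestricted tree scores a perfect R² on the training data it memorized, while the depth-limited tree typically scores better on held-out data:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

rng = np.random.RandomState(0)
X = rng.uniform(0, 10, size=(200, 1))
y = np.sin(X.ravel()) + rng.normal(scale=0.4, size=200)  # noisy sine target

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

full = DecisionTreeRegressor(random_state=0).fit(X_tr, y_tr)                  # no stopping criteria
capped = DecisionTreeRegressor(max_depth=3, random_state=0).fit(X_tr, y_tr)   # depth-limited

print("unrestricted: train R² =", full.score(X_tr, y_tr), "test R² =", full.score(X_te, y_te))
print("max_depth=3:  train R² =", capped.score(X_tr, y_tr), "test R² =", capped.score(X_te, y_te))
```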
Interview Question
Question 5: What do the Root Node and Decision Nodes represent in the overall structure and decision-making process of the tree?
The Root Node is the starting point, representing the entire dataset and the first question (split) asked based on the feature providing the highest SDR initially. Decision Nodes are subsequent points in the tree where further questions are asked about features to progressively partition the data down specific branches based on the answers, guiding a data point towards a final leaf node prediction.