When working with tree-based models like Decision Trees and Random Forests in Python, there are a few common mistakes that people make. This thread covers some of them, with example code for better understanding.
1. Overfitting the model:
- Overfitting occurs when the model learns the training data too well and performs poorly on unseen data.
- One common mistake is not setting proper constraints on the tree’s depth, including the depth of the individual trees in a Random Forest.
- For example, compare two DecisionTreeClassifier models: one has no max_depth constraint, which can lead to overfitting, while the other is limited to a depth of 3 (max_depth=3).
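The comparison described above can be sketched roughly as follows. This is a minimal illustration, not a benchmark: the synthetic dataset from make_classification, the 70/30 split, and the random_state values are all assumptions chosen only to make the snippet self-contained.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic data with some label noise (assumed here purely for illustration).
X, y = make_classification(
    n_samples=500, n_features=20, n_informative=5, flip_y=0.1, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# No depth limit: the tree keeps splitting until its leaves are pure.
unconstrained = DecisionTreeClassifier(random_state=0)
unconstrained.fit(X_train, y_train)

# Depth limited to 3: a much simpler tree.
constrained = DecisionTreeClassifier(max_depth=3, random_state=0)
constrained.fit(X_train, y_train)

# Compare training and test accuracy for both models.
for name, model in [("unconstrained", unconstrained), ("max_depth=3", constrained)]:
    print(
        f"{name}: depth={model.get_depth()}, "
        f"train={model.score(X_train, y_train):.3f}, "
        f"test={model.score(X_test, y_test):.3f}"
    )
```

A large gap between training and test accuracy is the usual sign of overfitting. A depth of 3 is not a universally good value; it is only the value used in this example, and a suitable limit depends on the dataset.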
2. Using default hyperparameters without tuning:
- Tree-based models have several hyperparameters that can significantly affect their performance.
- One common mistake is not tuning these hyperparameters and using the default values, which may not be optimal for the given dataset.
- For example, a RandomForestRegressor can first be fit with its default hyperparameters; the better approach is to tune the hyperparameters, for instance with a cross-validated search, and compare the result on held-out data.
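One way to do this tuning is a cross-validated grid search, sketched below. The synthetic regression data, the particular parameter grid, and the use of GridSearchCV are assumptions made for illustration; the original example may have used different values or a different search method.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic regression data (assumed here purely for illustration).
X, y = make_regression(n_samples=300, n_features=10, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# Baseline: default hyperparameters.
default_model = RandomForestRegressor(random_state=0)
default_model.fit(X_train, y_train)
default_score = default_model.score(X_test, y_test)  # R^2 on held-out data

# Tuned: search a small, hypothetical grid using 3-fold cross-validation.
param_grid = {
    "n_estimators": [50, 100],
    "max_depth": [None, 5, 10],
    "min_samples_leaf": [1, 5],
}
search = GridSearchCV(RandomForestRegressor(random_state=0), param_grid, cv=3)
search.fit(X_train, y_train)
tuned_score = search.best_estimator_.score(X_test, y_test)

print("default R^2:", round(default_score, 3))
print("tuned R^2:  ", round(tuned_score, 3))
print("best params:", search.best_params_)
```

Tuning does not guarantee a better test score than the defaults, especially with a small grid like this one. The point is that the choice is made from cross-validation on the training data rather than left to chance.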