When working with tree-based models in scikit-learn, it can be beneficial to use OrdinalEncoder instead of OneHotEncoder for encoding categorical features.
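The OrdinalEncoder is a transformer that encodes each categorical feature as a single column of integer codes. This suits tree-based models because a tree can separate categories with threshold splits on that one column, so the feature space stays compact. Note that by default the codes follow the sorted order of the category values, not any meaningful ranking; to preserve a natural ordering (for example low < medium < high), pass it explicitly through the categories parameter.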
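To make the integer assignment concrete, here is a small sketch using a made-up feature with the values low, medium, and high (these values are illustrative assumptions, not taken from a real dataset). It compares the default codes with codes produced from an explicitly supplied order:

```python
from sklearn.preprocessing import OrdinalEncoder

# Hypothetical single-column feature, for illustration only.
X = [["low"], ["medium"], ["high"]]

# Default behaviour: categories are sorted, so the codes follow
# alphabetical order (high=0, low=1, medium=2).
default_codes = OrdinalEncoder().fit_transform(X)

# Explicit order: the codes follow the list passed to `categories`
# (low=0, medium=1, high=2).
ordered_encoder = OrdinalEncoder(categories=[["low", "medium", "high"]])
ordered_codes = ordered_encoder.fit_transform(X)

print(default_codes.ravel())  # [1. 2. 0.]
print(ordered_codes.ravel())  # [0. 1. 2.]
```

For features with no inherent order, the default codes are arbitrary, which trees can usually tolerate because they only need thresholds that separate groups of categories.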
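In contrast, OneHotEncoder creates a binary feature for each category, resulting in a larger feature space. For high-cardinality features this can be problematic for tree-based models: each binary column isolates a single category, so the tree may need many more splits (and more depth) to use the feature, which increases training cost and can encourage overfitting.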
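The difference in feature-space size can be seen on a tiny example. The sketch below assumes two made-up categorical columns (a colour and a size) with three categories each, and compares the shape of the encoded output:

```python
from sklearn.preprocessing import OneHotEncoder, OrdinalEncoder

# Hypothetical rows of (colour, size), for illustration only.
X = [["red", "S"], ["green", "M"], ["blue", "L"], ["green", "S"]]

# One integer column per original feature.
ordinal_shape = OrdinalEncoder().fit_transform(X).shape

# One binary column per category: 3 colours + 3 sizes.
onehot_shape = OneHotEncoder().fit_transform(X).shape

print(ordinal_shape)  # (4, 2)
print(onehot_shape)   # (4, 6)
```

With real data the gap grows with the number of distinct categories, since one-hot encoding adds a column for every one of them.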
Here’s an example of using OrdinalEncoder instead of OneHotEncoder:
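A minimal sketch of this setup, assuming a small made-up dataset (the colour and size values and the labels are invented for illustration), could look like the following. It puts OrdinalEncoder in a pipeline with a DecisionTreeClassifier and scores the model on both the training split and a held-out split:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OrdinalEncoder
from sklearn.tree import DecisionTreeClassifier

# Toy data, made up for illustration: columns are (colour, size).
X = np.array(
    [
        ["red", "S"], ["green", "M"], ["blue", "L"], ["green", "M"],
        ["red", "S"], ["blue", "L"], ["red", "L"], ["green", "S"],
    ],
    dtype=object,
)
y = np.array([0, 1, 1, 1, 0, 1, 1, 0])

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Categories unseen during fit are mapped to -1 instead of raising an error.
encoder = OrdinalEncoder(handle_unknown="use_encoded_value", unknown_value=-1)

# Encoding inside the pipeline means the encoder is fitted on training data only.
model = make_pipeline(encoder, DecisionTreeClassifier(random_state=0))
model.fit(X_train, y_train)

train_acc = model.score(X_train, y_train)
test_acc = model.score(X_test, y_test)
print(f"train accuracy: {train_acc:.2f}")
print(f"test accuracy:  {test_acc:.2f}")
```

Swapping the encoder for OneHotEncoder(handle_unknown="ignore") in the same pipeline is one way to compare the two encodings on your own data.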
Note that evaluating the tree on the training data may not be a good indicator of the model’s performance on new, unseen data. It’s important to evaluate the model on a separate testing set to get a more accurate estimate of its performance.