In machine learning, an overfitted model fits training set very well but cannot generalize to new instances. I evaluated my model using cross-validation and my accuracy drops when setting the maximum number of splits of my decision tree beyond a certain number. Can this be associated with overfitting?
Would appreciate any help!
What is the purpose of the Validation Set?
The significance of the validation set is that it is an unknown set of data that is used to assess the model, as opposed to using the same data.
This prevents the dataset from learning unnecessary patters. For example, If a model is asked to learn to identify cars. The training set coincidentally has more most pictures of cars in a road.
If this model is asked to identify a car in a showroom, it will say it is not a car, because a coincidental pattern is learnt that a car is on the road.
To prevent this we validate the model on a validation set so that chance of learning coincidental patterns is minimized.
For me to answer you specific question regarding the decision tree, I will need more specific details.