Choosing a classification algorithm in supervised machine learning is often guided by the bias-variance tradeoff.
- Bias is the error from erroneous assumptions in the learning algorithm. High bias can cause an algorithm to miss relevant relations between features and target outputs (underfitting).
- Variance is the error from sensitivity to small fluctuations in the training set. High variance can cause an algorithm to model the random noise in the training data (overfitting).
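The two definitions above can be made concrete with a small simulation: fit models of low and high capacity to many noisy resamples of the same underlying function, then measure bias² (how far the average prediction is from the truth) and variance (how much predictions fluctuate between resamples). This is a minimal NumPy sketch; the sine target, noise level, and polynomial degrees are illustrative assumptions, with degree 1 standing in for a high-bias model and degree 9 for a high-variance one.

```python
import numpy as np

rng = np.random.default_rng(0)
true_f = lambda x: np.sin(2 * np.pi * x)  # assumed ground-truth function
x_test = np.linspace(0, 1, 50)

def fit_predict(degree, n_trials=200, n_points=30, noise=0.3):
    """Fit polynomials of the given degree to many noisy samples drawn
    from true_f; return the predictions on x_test for each trial."""
    preds = []
    for _ in range(n_trials):
        x = rng.uniform(0, 1, n_points)
        y = true_f(x) + rng.normal(0, noise, n_points)
        coef = np.polyfit(x, y, degree)
        preds.append(np.polyval(coef, x_test))
    return np.array(preds)

for degree in (1, 9):  # degree 1: high bias; degree 9: high variance
    preds = fit_predict(degree)
    # bias^2: squared gap between the average fit and the true function
    bias2 = np.mean((preds.mean(axis=0) - true_f(x_test)) ** 2)
    # variance: average spread of the fits across resampled datasets
    variance = np.mean(preds.var(axis=0))
    print(f"degree={degree}  bias^2={bias2:.3f}  variance={variance:.3f}")
```

The degree-1 fit shows large bias² and small variance (underfitting), while the degree-9 fit shows the reverse (overfitting).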
So, under comparable conditions, the expected behavior of some common classification algorithms is:
| Algorithm | Bias | Variance |
|---|---|---|
| Naive Bayes | High | Low |
| Logistic Regression | Low | High |
| Decision Tree | Low | High |
| Bagging | Low | High, but lower than a single decision tree |
| Random Forest | Low | High, but lower than a decision tree or bagging |
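The variance column of the table can be checked empirically: retrain each model on many bootstrap resamples of the same training set and measure how often its test-set predictions flip between resamples, a rough proxy for variance. This is a sketch using scikit-learn with default hyperparameters on a synthetic dataset, both illustrative assumptions rather than a benchmark.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.naive_bayes import GaussianNB
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import BaggingClassifier, RandomForestClassifier

X, y = make_classification(n_samples=600, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = X[:400], X[400:], y[:400], y[400:]

models = {
    "Naive Bayes": GaussianNB(),
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Bagging": BaggingClassifier(random_state=0),
    "Random Forest": RandomForestClassifier(random_state=0),
}

rng = np.random.default_rng(0)
results = {}
for name, model in models.items():
    preds = []
    for _ in range(20):  # retrain on bootstrap resamples of the training set
        idx = rng.integers(0, len(X_train), len(X_train))
        preds.append(model.fit(X_train[idx], y_train[idx]).predict(X_test))
    preds = np.array(preds)
    # fraction of test points whose predicted class changes across resamples
    results[name] = np.mean(preds.std(axis=0) > 0)
    print(f"{name:20s} instability={results[name]:.2f}")
```

A single decision tree typically shows the highest instability, with bagging and random forests progressively damping it, consistent with the table.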