A common way to reduce overfitting is to regularize the model. Three common regularization techniques in machine learning are Ridge Regression, Lasso Regression, and Elastic Net.
Ridge Regression, also known as Tikhonov regularization, is a regularized (constrained) version of Linear Regression: a regularization term is added to the cost function. This forces the learning algorithm to not only fit the data but also keep the model's weights as small as possible. The weights should not be pushed all the way to 0, though, because the result would be a flat line going through the data's mean.
Below is a code snippet showing how to implement Ridge Regression using Stochastic Gradient Descent:
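    from sklearn.linear_model import SGDRegressor

    # X and y are assumed to hold the training features and targets;
    # y.ravel() flattens the targets to the 1-D shape that fit() expects
    sgd_reg = SGDRegressor(penalty="l2")
    sgd_reg.fit(X, y.ravel())
    sgd_reg.predict([[1.5]])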
The penalty hyperparameter sets the type of regularization term to use. Specifying "l2" adds a penalty based on the squared l2 norm of the weight vector, which is the regularization term that Ridge Regression uses.
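To see the effect of the regularization term in isolation, here is a minimal sketch (not scikit-learn's implementation) of Ridge Regression for a single feature. It assumes the cost is the sum of squared errors plus alpha times the squared weight, with an unpenalized intercept. The helper ridge_1d and the toy data are hypothetical, introduced only for illustration.

```python
def ridge_1d(xs, ys, alpha):
    """Closed-form Ridge fit for one feature; the intercept is not penalized."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    s_xy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    s_xx = sum((x - x_mean) ** 2 for x in xs)
    w = s_xy / (s_xx + alpha)  # alpha in the denominator shrinks the slope
    b = y_mean - w * x_mean    # the line always passes through (x_mean, y_mean)
    return w, b


# Hypothetical toy data lying exactly on y = 2x + 1
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]

for alpha in (0.0, 10.0, 1e6):
    w, b = ridge_1d(xs, ys, alpha)
    print(f"alpha={alpha:g}: slope={w:.4f}, intercept={b:.4f}")
```

On this data the slope falls from 2 at alpha = 0 to 1 at alpha = 10. With alpha = 1e6 the slope is almost 0 and the intercept approaches 5, the mean of ys, which is the flat line through the data's mean.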
Least Absolute Shrinkage and Selection Operator Regression (Lasso Regression for short) is the second regularized version of Linear Regression on our list. Just like Ridge Regression, it adds a regularization term to the cost function, but it uses the l1 norm of the weight vector instead of the squared l2 norm.
An important characteristic of Lasso Regression is that it tends to eliminate the weights of the least important features (i.e., set them to zero).
Below is a code snippet to implement Lasso Regression:
    from sklearn.linear_model import Lasso

    # alpha sets the strength of the l1 penalty
    lasso_reg = Lasso(alpha=0.1)
    lasso_reg.fit(X, y)
    lasso_reg.predict([[1.5]])
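Why the l1 penalty produces weights of exactly zero can be illustrated with a single feature. The following is a rough sketch, not scikit-learn's solver, assuming the cost is (1 / (2n)) times the sum of squared errors plus alpha times the absolute weight. Under that assumption the optimal weight comes from a soft-thresholding step. The helpers soft_threshold and lasso_1d and the toy data are hypothetical.

```python
def soft_threshold(rho, alpha):
    """Shrink rho toward zero by alpha; values inside [-alpha, alpha] become 0.0."""
    if rho > alpha:
        return rho - alpha
    if rho < -alpha:
        return rho + alpha
    return 0.0


def lasso_1d(xs, ys, alpha):
    """Lasso slope for one feature under (1 / (2n)) * SSE + alpha * |w|."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # rho measures how strongly the centered feature co-varies with the target
    rho = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys)) / n
    z = sum((x - x_mean) ** 2 for x in xs) / n
    return soft_threshold(rho, alpha) / z


xs = [0.0, 1.0, 2.0, 3.0, 4.0]
strong_ys = [1.0, 3.0, 5.0, 7.0, 9.0]  # depends strongly on x (y = 2x + 1)
weak_ys = [1.0, 1.1, 0.9, 1.1, 1.1]    # barely depends on x

print("strong feature weight:", lasso_1d(xs, strong_ys, alpha=0.1))
print("weak feature weight:", lasso_1d(xs, weak_ys, alpha=0.1))
```

The strongly related target keeps a slope of about 1.95, slightly shrunk from the least-squares value of 2. The weakly related one, whose least-squares slope is only 0.02, gets a weight of exactly 0.0.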
Elastic Net is a middle ground between Ridge Regression and Lasso Regression. Its regularization term is a simple mix of the Ridge and Lasso regularization terms, and you can control the mix ratio, r. When r = 0, Elastic Net is equivalent to Ridge Regression, and when r = 1, it is equivalent to Lasso Regression.
Below is a code snippet to implement Elastic Net (l1_ratio corresponds to the mix ratio, r):
    from sklearn.linear_model import ElasticNet

    # l1_ratio=0.5 weights the l1 and l2 penalties equally
    elastic_net = ElasticNet(alpha=0.1, l1_ratio=0.5)
    elastic_net.fit(X, y)
    elastic_net.predict([[1.5]])
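The role of the mix ratio can be made concrete by computing the regularization term directly. This is an illustrative sketch that assumes the penalty has the form alpha * (r * l1 norm + (1 - r) / 2 * squared l2 norm). The function elastic_net_penalty and the weight vector are hypothetical.

```python
def elastic_net_penalty(weights, alpha, r):
    """Mix of the l1 penalty and half the squared l2 penalty, weighted by r."""
    l1 = sum(abs(w) for w in weights)
    l2_squared = sum(w * w for w in weights)
    return alpha * (r * l1 + (1 - r) / 2 * l2_squared)


weights = [3.0, -4.0]  # hypothetical weight vector: l1 norm 7, squared l2 norm 25

ridge_only = elastic_net_penalty(weights, alpha=0.1, r=0.0)  # only the l2 part
lasso_only = elastic_net_penalty(weights, alpha=0.1, r=1.0)  # only the l1 part
mixed = elastic_net_penalty(weights, alpha=0.1, r=0.5)       # equal mix

print(ridge_only, lasso_only, mixed)
```

With r = 0 only the squared l2 part remains (a Ridge-style penalty), with r = 1 only the l1 part remains (the Lasso penalty), and r = 0.5 gives a value made up of half of each.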
It is almost always preferable to have at least a little bit of regularization. So generally, you should avoid plain Linear Regression. Ridge is a good default, but if you suspect only a few features in your dataset are useful, you should prefer Lasso or Elastic Net because they tend to reduce the useless features’ weights down to zero.
In general, Elastic Net is preferred over Lasso because Lasso may behave erratically when the number of features is greater than the number of training instances or when several features are strongly correlated.
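The difference between shrinking weights and eliminating them can be checked on synthetic data. The sketch below assumes a made-up dataset in which only the first two of ten features influence the target. The data, the seed, and the alpha values are illustrative choices, not recommendations.

```python
import numpy as np
from sklearn.linear_model import ElasticNet, Lasso, Ridge

rng = np.random.default_rng(42)

# Hypothetical data: 200 instances, 10 features, only the first two matter
X = rng.normal(size=(200, 10))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.1, size=200)

models = {
    "Ridge": Ridge(alpha=1.0),
    "Lasso": Lasso(alpha=0.1),
    "ElasticNet": ElasticNet(alpha=0.1, l1_ratio=0.5),
}

zero_counts = {}
for name, model in models.items():
    model.fit(X, y)
    # Count the weights that were driven exactly to zero
    zero_counts[name] = int(np.sum(model.coef_ == 0.0))
    print(f"{name}: {zero_counts[name]} of {X.shape[1]} weights are exactly zero")
```

On data like this, Lasso should zero out the eight uninformative weights, while Ridge only shrinks them to small non-zero values. In practice, the regularization strength is usually tuned, for example with cross-validation, rather than fixed by hand.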