Stepwise regression using p-values to drop variables with non-significant p-values

I want to perform a stepwise linear regression using p-values as the selection criterion, i.e. at each step dropping the variable with the highest (most insignificant) p-value, and stopping when all remaining p-values are significant at some threshold alpha.

Can somebody guide me on how to do this in R?

From what I understand of stepwise regression, you are talking about the backward elimination version.

  1. Start with all variables as explanatory variables
  2. Fit a model on the current explanatory variables
  3. For each variable in turn, remove it and refit the model
  4. Delete the feature whose removal results in the least performance degradation
  5. Repeat from 2 until you are satisfied
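
As a rough illustration of the steps above, R's built-in `step()` function from the `stats` package performs backward elimination. Note that it scores each candidate removal by AIC rather than by p-values, so it is not exactly what the question asks for. The `mtcars` data set and the choice of `mpg` as the response are only assumptions made for the example.

```r
# Full model: all other columns of mtcars as explanatory variables
# (mtcars and the response mpg are illustrative choices)
full <- lm(mpg ~ ., data = mtcars)

# Backward elimination: at each step, drop the term whose removal
# gives the lowest AIC, and stop when no removal improves AIC
reduced <- step(full, direction = "backward", trace = 0)

# Inspect which variables survived
print(formula(reduced))
```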

The implementation should be very simple using a loop and the built-in t-test p-values that `summary()` reports for each coefficient of an `lm` fit (since you wanted a p-value as the selection criterion).
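
A minimal sketch of such a loop follows. The function name `backward_eliminate` and its arguments are hypothetical, and it assumes all predictors are numeric columns with syntactic names (with factors, coefficient names no longer match column names, so this would need adapting). The `mtcars` data set is used only as an example.

```r
# Hypothetical helper: backward elimination driven by coefficient p-values.
# Assumes numeric predictors only; `response` is the response column name.
backward_eliminate <- function(data, response, alpha = 0.05) {
  predictors <- setdiff(names(data), response)
  while (length(predictors) > 0) {
    fit <- lm(reformulate(predictors, response = response), data = data)
    coefs <- summary(fit)$coefficients
    # t-test p-values for every coefficient except the intercept
    pvals <- coefs[-1, "Pr(>|t|)"]
    names(pvals) <- rownames(coefs)[-1]
    # Stop once every remaining predictor is significant at level alpha
    if (max(pvals) <= alpha) return(fit)
    # Otherwise drop the predictor with the largest p-value and refit
    predictors <- setdiff(predictors, names(pvals)[which.max(pvals)])
  }
  # Nothing was significant: fall back to the intercept-only model
  lm(reformulate("1", response = response), data = data)
}

fit <- backward_eliminate(mtcars, "mpg", alpha = 0.05)
print(summary(fit)$coefficients)
```

Only one variable is dropped per iteration, because removing a variable changes the p-values of all the others.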

However, I would discourage you from using this feature selection approach. There are some numerical issues (which are explored more thoroughly in this article) but the most important reason not to use stepwise regression is that it is not a replacement for domain expertise and logical reasons for feature selection.

Even if you have a lot of features, you can do better with semi-automated methods such as dimensionality reduction, exploring the correlations (Pearson's r or chi-squared), or even using a tree-based model for feature pruning.
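
As one example of exploring correlations, the sketch below lists pairs of predictors whose absolute Pearson correlation exceeds a cutoff. The cutoff of 0.8 and the use of `mtcars` with `mpg` as the response are arbitrary assumptions for illustration; which variable of a correlated pair to keep is still a judgment call.

```r
# Predictors only (mpg is treated as the response in this example)
predictors <- mtcars[, setdiff(names(mtcars), "mpg")]

# Pairwise Pearson correlations (the default method of cor())
cors <- cor(predictors)

# Keep each pair once by blanking the upper triangle and the diagonal
cors[upper.tri(cors, diag = TRUE)] <- NA

# Locate pairs above the (arbitrary) cutoff of 0.8
high <- which(abs(cors) > 0.8, arr.ind = TRUE)
pairs <- data.frame(
  var1 = rownames(cors)[high[, "row"]],
  var2 = colnames(cors)[high[, "col"]],
  r    = cors[high]
)
print(pairs)
```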