
Stepwise Regression
Stepwise regression is the step-by-step, iterative construction of a regression model that involves selecting the independent variables to be used in a final model. It can proceed forward (adding significant variables one at a time), backward (starting with all candidate variables and removing those that are not significant), or in both directions. The method iteratively examines the statistical significance of each independent variable in a linear regression model.

What Is Stepwise Regression?
Stepwise regression is the step-by-step iterative construction of a regression model that involves the selection of independent variables to be used in a final model. It involves adding or removing potential explanatory variables in succession and testing for statistical significance after each iteration.
The availability of statistical software packages makes stepwise regression possible, even in models with hundreds of variables.




Types of Stepwise Regression
The underlying goal of stepwise regression is to find, through a series of tests (e.g., F-tests and t-tests), a set of independent variables that significantly influence the dependent variable. This is done with computers through iteration, which is the process of arriving at results or decisions by going through repeated rounds or cycles of analysis. Conducting the tests automatically with help from statistical software packages has the advantage of saving time and limiting mistakes.
Stepwise regression can be achieved either by trying out one independent variable at a time and including it in the regression model if it is statistically significant or by including all potential independent variables in the model and eliminating those that are not statistically significant. Some use a combination of both methods and therefore there are three approaches to stepwise regression:
- Forward selection begins with no variables in the model, tests each variable as it is added, then keeps those deemed most statistically significant, repeating the process until the results are optimal.
- Backward elimination starts with a set of independent variables, deleting them one at a time and testing after each deletion whether the removed variable was statistically significant.
- Bidirectional elimination is a combination of the first two methods that tests at each step which variables should be included or excluded.
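Under the hood, each approach is a greedy search over the candidate variables. A minimal sketch of forward selection in plain Python follows; as a simplifying assumption, a variable is kept when adding it reduces the residual sum of squares (RSS) by more than a fixed threshold, standing in for the formal F- or t-test a real implementation would run, and the data and `min_improvement` threshold are illustrative:

```python
# Forward-selection sketch. Assumption: instead of an F-test or t-test,
# a variable is kept if adding it reduces the residual sum of squares
# (RSS) by more than a fixed threshold; real implementations test
# statistical significance at each step.

def ols_rss(X, y):
    """RSS of an ordinary-least-squares fit of y on X (intercept added),
    solved via the normal equations with Gaussian elimination."""
    n, A = len(y), [[1.0] + [float(v) for v in row] for row in X]
    k = len(A[0])
    M = [[sum(A[i][p] * A[i][q] for i in range(n)) for q in range(k)]
         for p in range(k)]
    w = [sum(A[i][p] * y[i] for i in range(n)) for p in range(k)]
    for col in range(k):                      # forward elimination
        piv = max(range(col, k), key=lambda r: abs(M[r][col]))
        M[col], M[piv], w[col], w[piv] = M[piv], M[col], w[piv], w[col]
        for r in range(col + 1, k):
            f = M[r][col] / M[col][col]
            for c in range(col, k):
                M[r][c] -= f * M[col][c]
            w[r] -= f * w[col]
    b = [0.0] * k
    for r in range(k - 1, -1, -1):            # back-substitution
        b[r] = (w[r] - sum(M[r][c] * b[c] for c in range(r + 1, k))) / M[r][r]
    return sum((y[i] - sum(A[i][c] * b[c] for c in range(k))) ** 2
               for i in range(n))

def forward_select(columns, y, min_improvement=1.0):
    """Greedily add the column that most reduces RSS, stopping when the
    best remaining addition improves RSS by less than min_improvement."""
    chosen, rows = [], len(y)
    best_rss = sum((yi - sum(y) / rows) ** 2 for yi in y)  # intercept only
    while True:
        candidates = [c for c in columns if c not in chosen]
        if not candidates:
            break
        scores = {c: ols_rss([[columns[name][i] for name in chosen + [c]]
                              for i in range(rows)], y) for c in candidates}
        best = min(scores, key=scores.get)
        if best_rss - scores[best] < min_improvement:
            break
        chosen.append(best)
        best_rss = scores[best]
    return chosen

# Toy data: y is roughly 2 * x1, while x2 is unrelated noise.
x1 = [1, 2, 3, 4, 5, 6]
x2 = [2, 1, 2, 1, 2, 1]
y = [2.1, 3.9, 6.2, 8.0, 9.9, 12.1]
print(forward_select({"x1": x1, "x2": x2}, y))  # ['x1']
```

Because `x1` explains almost all of the variation in `y`, it is added in the first round, and `x2` is rejected because adding it barely improves the fit.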
An example of a stepwise regression using the backward elimination method would be an attempt to understand energy usage at a factory using variables such as equipment run time, equipment age, staff size, outside temperatures, and time of year. The model includes all of the variables; then each is removed, one at a time, to determine which is least statistically significant. In the end, the model might show that time of year and temperatures are most significant, possibly suggesting that the factory's energy consumption peaks when air-conditioner usage is highest.
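The factory example can be sketched the same way in reverse. In this toy backward-elimination pass, a variable is dropped when removing it increases the residual sum of squares (RSS) by less than a fixed threshold, which is a simplifying assumption standing in for a formal significance test; the variable names and data are likewise illustrative:

```python
# Backward-elimination sketch. Assumption: a variable is dropped if
# removing it increases the residual sum of squares (RSS) by less than
# a fixed threshold; real implementations use significance tests.

def ols_rss(X, y):
    """RSS of an ordinary-least-squares fit of y on X (intercept added),
    solved via the normal equations with Gaussian elimination."""
    n, A = len(y), [[1.0] + [float(v) for v in row] for row in X]
    k = len(A[0])
    M = [[sum(A[i][p] * A[i][q] for i in range(n)) for q in range(k)]
         for p in range(k)]
    w = [sum(A[i][p] * y[i] for i in range(n)) for p in range(k)]
    for col in range(k):                      # forward elimination
        piv = max(range(col, k), key=lambda r: abs(M[r][col]))
        M[col], M[piv], w[col], w[piv] = M[piv], M[col], w[piv], w[col]
        for r in range(col + 1, k):
            f = M[r][col] / M[col][col]
            for c in range(col, k):
                M[r][c] -= f * M[col][c]
            w[r] -= f * w[col]
    b = [0.0] * k
    for r in range(k - 1, -1, -1):            # back-substitution
        b[r] = (w[r] - sum(M[r][c] * b[c] for c in range(r + 1, k))) / M[r][r]
    return sum((y[i] - sum(A[i][c] * b[c] for c in range(k))) ** 2
               for i in range(n))

def backward_eliminate(columns, y, max_increase=1.0):
    """Start with every column; repeatedly drop the one whose removal
    increases RSS the least, while that increase stays under max_increase."""
    chosen, rows = list(columns), len(y)

    def rss_for(names):
        if not names:  # intercept-only model
            return sum((yi - sum(y) / rows) ** 2 for yi in y)
        return ols_rss([[columns[n][i] for n in names]
                        for i in range(rows)], y)

    current = rss_for(chosen)
    while chosen:
        trials = {c: rss_for([n for n in chosen if n != c]) for c in chosen}
        weakest = min(trials, key=trials.get)
        if trials[weakest] - current > max_increase:
            break  # every remaining variable matters
        chosen.remove(weakest)
        current = trials[weakest]
    return chosen

# Toy factory data: energy tracks outside temperature; staff size is noise.
temp = [60, 65, 70, 75, 80, 85]
staff = [10, 12, 11, 10, 12, 11]
energy = [30.2, 32.4, 35.1, 37.6, 39.8, 42.5]
print(backward_eliminate({"temp": temp, "staff": staff}, energy))  # ['temp']
```

Dropping `staff` barely changes the fit, so it is eliminated first; dropping `temp` would make the fit far worse, so the process stops there.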
Limitations of Stepwise Regression
Regression analysis, both simple linear and multiple, is widely used in the economics and investment world today. The idea is often to find patterns that existed in the past that might also recur in the future. A simple linear regression, for example, might look at price-to-earnings ratios and stock returns over many years to determine whether stocks with low P/E ratios (independent variable) offer higher returns (dependent variable). The problem with this approach is that market conditions often change, and relationships that held in the past do not necessarily hold in the present or future.
Meanwhile, the stepwise regression process has many critics and there are even calls to stop using the method altogether. Statisticians note several drawbacks to the approach, including incorrect results, an inherent bias in the process itself, and the necessity for significant computing power to develop complex regression models through iteration.
Related terms:
Analysis of Variance (ANOVA)
Analysis of variance (ANOVA) is a statistical analysis tool that separates the total variability found within a data set into two components: random and systematic factors.
Autoregressive Integrated Moving Average (ARIMA)
An autoregressive integrated moving average (ARIMA) is a statistical analysis model that leverages time series data to forecast future trends.
Econometrics
Econometrics is the application of statistical and mathematical models to economic data for the purpose of testing theories, hypotheses, and future trends.
Least Squares Method
The least squares method is a statistical technique for determining the line of best fit for a model by minimizing the sum of squared differences between the model and the observed data.
Multiple Linear Regression (MLR)
Multiple linear regression (MLR) is a statistical technique that uses several explanatory variables to predict the outcome of a response variable.
Price-to-Earnings (P/E) Ratio
The price-to-earnings (P/E) ratio is a ratio for valuing a company that measures its current share price relative to its per-share earnings.
Regression
Regression is a statistical measurement that attempts to determine the strength of the relationship between one dependent variable (usually denoted by Y) and a series of other changing variables (known as independent variables).
Statistical Significance
Statistical significance is a determination that a relationship between two or more variables is caused by something other than chance.
T-Test
A t-test is a type of inferential statistic used to determine if there is a significant difference between the means of two groups, which may be related in certain features.
Variance Inflation Factor (VIF)
Variance inflation factor (VIF) is a measure of the amount of multicollinearity in a set of multiple regression variables.