Variance Inflation Factor (VIF)

What Is a Variance Inflation Factor (VIF)?

Variance inflation factor (VIF) is a measure of the amount of multicollinearity in a set of multiple regression variables. Mathematically, the VIF for a regression coefficient is equal to the ratio of that coefficient's variance in the full model to its variance in a model that includes only that single independent variable. This ratio is calculated for each independent variable. A high VIF indicates that the associated independent variable is highly collinear with the other variables in the model.
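Equivalently, the VIF for the i-th variable is 1 / (1 - R_i^2), where R_i^2 is the coefficient of determination from regressing that variable on all of the other independent variables. A minimal sketch of this auxiliary-regression computation, using only NumPy (the function name and any test data are invented for illustration):

```python
import numpy as np

def vif(X):
    """Variance inflation factor for each column of X.

    For each column j, regress it (with an intercept) on the remaining
    columns, take the R^2 of that auxiliary regression, and return
    1 / (1 - R^2).
    """
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    out = []
    for j in range(k):
        y = X[:, j]
        others = np.delete(X, j, axis=1)
        A = np.column_stack([np.ones(n), others])     # add an intercept
        beta, *_ = np.linalg.lstsq(A, y, rcond=None)  # auxiliary OLS fit
        resid = y - A @ beta
        r2 = 1.0 - (resid @ resid) / ((y - y.mean()) ** 2).sum()
        out.append(1.0 / (1.0 - r2))
    return np.array(out)
```

Feeding this function two near-duplicate columns and one unrelated column yields very large VIFs for the duplicates and a VIF near 1 for the unrelated column.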

A variance inflation factor (VIF) provides a measure of multicollinearity among the independent variables in a multiple regression model.
Detecting multicollinearity is important because while multicollinearity does not reduce the explanatory power of the model, it does reduce the statistical significance of the independent variables.
A large variance inflation factor (VIF) on an independent variable indicates a highly collinear relationship to the other variables that should be considered or adjusted for in the structure of the model and selection of independent variables.

Understanding a Variance Inflation Factor (VIF)

A variance inflation factor is a tool to help identify the degree of multicollinearity. A multiple regression is used when a person wants to test the effect of multiple variables on a particular outcome. The dependent variable is the outcome that is being acted upon by the independent variables — the inputs into the model. Multicollinearity exists when there is a linear relationship, or correlation, among two or more of the independent variables or inputs.

Multicollinearity creates a problem in the multiple regression because the inputs are all influencing each other. Therefore, they are not actually independent, and it is difficult to test how much each individual independent variable affects the dependent variable, or outcome, within the regression model.

In statistical terms, a multiple regression model where there is high multicollinearity will make it more difficult to estimate the relationship between each of the independent variables and the dependent variable. Small changes in the data used or in the structure of the model equation can produce large and erratic changes in the estimated coefficients on the independent variables.

To ensure the model is properly specified and functioning correctly, there are tests that can be run for multicollinearity. Variance inflation factor is one such measuring tool. Using variance inflation factors helps to identify the severity of any multicollinearity issues so that the model can be adjusted. Variance inflation factor measures how much the behavior (variance) of an independent variable is influenced, or inflated, by its interaction/correlation with the other independent variables.

Variance inflation factors allow a quick measure of how much a variable is contributing to the standard error in the regression. When significant multicollinearity issues exist, the variance inflation factor will be very large for the variables involved. After these variables are identified, several approaches can be used to eliminate or combine collinear variables, resolving the multicollinearity issue.
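As a sketch of the "combine collinear variables" remedy, the example below merges two nearly redundant inputs into their average and recomputes the VIFs. It uses the standard identity that the VIFs are the diagonal of the inverse of the predictors' correlation matrix; the data are simulated purely for illustration:

```python
import numpy as np

def vifs(X):
    # The VIFs equal the diagonal of the inverse of the predictors'
    # correlation matrix.
    corr = np.corrcoef(X, rowvar=False)
    return np.diag(np.linalg.inv(corr))

rng = np.random.default_rng(1)
x1 = rng.normal(size=500)
x2 = x1 + 0.05 * rng.normal(size=500)  # nearly a copy of x1
x3 = rng.normal(size=500)              # unrelated input

before = vifs(np.column_stack([x1, x2, x3]))

# Remedy: combine the two collinear inputs into one averaged variable.
combined = (x1 + x2) / 2
after = vifs(np.column_stack([combined, x3]))

print("before:", before.round(1))  # x1 and x2 show very large VIFs
print("after: ", after.round(1))   # all VIFs fall back near 1
```

After the two near-duplicates are consolidated, the inflated VIFs drop back toward 1, indicating the multicollinearity has been resolved.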

Multicollinearity

While multicollinearity does not reduce a model's overall predictive power, it can produce estimates of the regression coefficients that are not statistically significant. In a sense, it can be thought of as a kind of double-counting in the model.

When two or more independent variables are closely related or measure almost the same thing, the underlying effect that they measure is being accounted for twice (or more) across the variables. It becomes difficult or impossible to say which variable is really influencing the dependent variable. This is a problem because the goal of many econometric models is to test exactly this sort of statistical relationship between the independent variables and the dependent variable.

For example, suppose that an economist wants to test whether there is a statistically significant relationship between the unemployment rate (independent variable) and the inflation rate (dependent variable). Including additional independent variables that are related to the unemployment rate, such as new initial jobless claims, would be likely to introduce multicollinearity into the model.

The overall model might show strong, statistically significant explanatory power, yet be unable to identify whether the effect is mostly due to the unemployment rate or to the new initial jobless claims. This is what the VIF would detect, and it would suggest possibly dropping one of the variables out of the model or finding some way to consolidate them to capture their joint effect, depending on what specific hypothesis the researcher is interested in testing.
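A simulated version of this scenario (all series below are invented, not real economic data) shows the symptom directly: with two nearly collinear inputs, the individual coefficient estimates can swing depending on the sample used, while their sum, the joint effect, stays close to the true value:

```python
import numpy as np

rng = np.random.default_rng(42)
n = 300

# Hypothetical, simulated series -- not real economic data.
unemployment = rng.normal(6.0, 1.0, size=n)
jobless_claims = unemployment + rng.normal(0.0, 0.05, size=n)  # nearly collinear
inflation = 2.0 - 0.3 * unemployment + rng.normal(0.0, 0.5, size=n)

def slopes(X, y):
    # OLS via least squares; return the slope coefficients (skip intercept).
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return beta[1:]

X = np.column_stack([unemployment, jobless_claims])

b_full = slopes(X, inflation)                       # fit on the full sample
b_half = slopes(X[: n // 2], inflation[: n // 2])   # fit on half the sample

# The split between the two collinear inputs is unstable across samples,
# but their sum -- the joint effect -- stays near the true -0.3.
print("full sample:", b_full.round(2), "sum:", round(b_full.sum(), 2))
print("half sample:", b_half.round(2), "sum:", round(b_half.sum(), 2))
```

The model can pin down the combined effect of the two collinear inputs but not how that effect divides between them, which is exactly the ambiguity a large VIF warns about.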

Related terms:

Econometrics

Econometrics is the application of statistical and mathematical models to economic data in order to test theories and hypotheses and to forecast future trends.

Economics

Economics is a branch of social science focused on the production, distribution, and consumption of goods and services.

Error Term

An error term is a variable in a statistical model that accounts for the difference between the model's predictions and the actual relationship between the independent and dependent variables.

Inflation

Inflation is a decrease in the purchasing power of money, reflected in a general increase in the prices of goods and services in an economy.

Jobless Claims

Jobless claims are a statistic reported weekly by the U.S. Department of Labor that counts people filing to receive unemployment insurance benefits.

Multiple Linear Regression (MLR)

Multiple linear regression (MLR) is a statistical technique that uses several explanatory variables to predict the outcome of a response variable.

Multicollinearity

Multicollinearity appears when there is strong correlation between two or more independent variables in a multiple regression model.

Regression

Regression is a statistical measurement that attempts to determine the strength of the relationship between one dependent variable (usually denoted by Y) and a series of other changing variables (known as independent variables).

Residual Sum of Squares (RSS)

The residual sum of squares (RSS) is a statistical technique used to measure the variance in a data set that is not explained by the regression model.

Standard Error

The standard error is the standard deviation of a statistic's sampling distribution. It measures the accuracy with which a sample represents a population.