Spurious Correlation


Spurious correlation, or spuriousness, occurs when two factors appear causally related to one another but are not.

What Is Spurious Correlation?

In statistics, a spurious correlation (or spuriousness) refers to a connection between two variables that appears to be causal but is not. With spurious correlation, any observed dependence between the variables is merely due to chance, or arises because both variables are related to some unseen confounder.

The appearance of a causal relationship is often due to similar movement on a chart that turns out to be coincidental or caused by a third "confounding" factor.
Spurious correlation can be caused by small sample sizes or arbitrary endpoints.
Statisticians and scientists use careful statistical analysis to identify spurious relationships.
Confirming a causal relationship requires a study that controls for the other variables that could affect the outcome.

Understanding Spurious Correlation

Spurious relationships will initially appear to show that one variable directly affects another, but that is not the case. This misleading correlation is often caused by a third factor that is not apparent at the time of examination, sometimes called a confounding factor.

When two random variables track each other closely on a graph, it is easy to suspect a causal link in which a change in one variable causes a change in the other. Even setting aside the question of causation, the chart can lead the reader to believe that the movement of variable A is linked to the movement of variable B, or vice versa.

However, closer statistical examination may show that the aligned movements are coincidental or caused by a third factor that affects the two variables. This is a spurious correlation. Research conducted with small sample sizes or arbitrary endpoints is particularly susceptible to spuriousness.
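The point about small sample sizes can be illustrated with a short simulation. The sketch below is an illustration only, not part of the original discussion: it draws pairs of completely independent random variables and counts how often their correlation coefficient exceeds 0.5 in absolute value. The sample sizes (5 and 500), the 0.5 threshold, and the helper names `pearson` and `share_of_strong_correlations` are all assumptions chosen for the example.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def share_of_strong_correlations(n, trials=1000, threshold=0.5, seed=0):
    """Fraction of trials in which two independent samples of size n
    show a correlation stronger than the threshold (hypothetical helper)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # xs and ys are generated independently, so any correlation is chance.
        xs = [rng.gauss(0, 1) for _ in range(n)]
        ys = [rng.gauss(0, 1) for _ in range(n)]
        if abs(pearson(xs, ys)) > threshold:
            hits += 1
    return hits / trials


frac_small = share_of_strong_correlations(n=5)
frac_large = share_of_strong_correlations(n=500)
print(f"n=5:   share with |r| > 0.5 = {frac_small:.2f}")
print(f"n=500: share with |r| > 0.5 = {frac_large:.2f}")
```

With only a handful of observations, strong-looking correlations between unrelated variables should turn up in a sizable share of trials, while with hundreds of observations they should all but disappear.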

Spotting Spuriousness

The most obvious way to spot a spurious relationship in research findings is to use common sense. Just because two things occur together and appear to be linked does not mean that no other factors are at work. To know for sure, however, the research methods must be critically examined.

In studies, all variables that might affect the findings should be included in the statistical model to control for their impact on the dependent variable.
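As a rough sketch of what controlling for a variable can look like, the simulation below builds two variables that are both driven by a hidden third factor and have no direct effect on each other. It then compares the raw correlation with the correlation that remains after the hidden factor is regressed out of each variable (a partial correlation). The data are synthetic, and the names `hidden`, `var_a`, `var_b`, and `residuals` are hypothetical choices made for this example.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def residuals(ys, zs):
    """What is left of ys after a simple least-squares fit on zs."""
    n = len(ys)
    mz, my = sum(zs) / n, sum(ys) / n
    slope = (sum((z - mz) * (y - my) for z, y in zip(zs, ys))
             / sum((z - mz) ** 2 for z in zs))
    intercept = my - slope * mz
    return [y - (intercept + slope * z) for y, z in zip(ys, zs)]


rng = random.Random(42)
n = 5000

# The hidden confounder drives both variables; neither affects the other.
hidden = [rng.gauss(0, 1) for _ in range(n)]
var_a = [h + rng.gauss(0, 0.5) for h in hidden]
var_b = [h + rng.gauss(0, 0.5) for h in hidden]

raw_r = pearson(var_a, var_b)
partial_r = pearson(residuals(var_a, hidden), residuals(var_b, hidden))
print(f"raw correlation:               {raw_r:.2f}")
print(f"after controlling for hidden:  {partial_r:.2f}")
```

In this setup the raw correlation is strong, but it should shrink to roughly zero once the confounder is accounted for. In real studies the difficulty is that the confounder is often unmeasured or unknown.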

Many spurious relationships can be identified by using common sense. If a correlation is found, there is usually more than one variable at play, and the variables are often not immediately obvious.

Spurious Correlation Examples

Interesting correlations are easy to find, but many will turn out to be spurious. Three examples are the skirt length theory, the Super Bowl indicator, and a suggested correlation between race and college completion rates.

  1. Skirt Length Theory: Originating in the 1920s, the skirt length theory holds that skirt lengths and stock market direction are correlated. If skirt lengths are long, the stock market is supposedly bearish. If skirt lengths are short, the market is supposedly bullish.
  2. Super Bowl Indicator: In late January, there is often chatter about the so-called Super Bowl indicator, which suggests that a win by the American Football Conference team likely means that the stock market will go down in the coming year, whereas a victory by the National Football Conference team portends a rise in the market. Since the beginning of the Super Bowl era, the indicator has been accurate around 74% of the time, or in 40 of 54 years, according to OpenMarkets. It is a fun conversation piece but probably not something a serious financial advisor would recommend as an investment strategy for clients.
  3. Educational Attainment and Race: Social scientists have focused on identifying which variables impact educational attainment. According to EducationData.org, in 2019, white 25- to 29-year-olds were 55% more likely than their black counterparts to have completed college. The implication is that race has a causal effect on college completion rates. However, it may not be race itself that impacts educational attainment. The results may instead be due to the effects of racism in society, which could be the third "hidden" variable. Racism impacts people of color, placing them at a disadvantage educationally and economically. For example, schools in non-white communities face greater challenges and receive less funding, parents in non-white populations have lower-paying jobs and fewer resources to devote to their children's education, and many families live in food deserts and suffer from malnutrition. Racism, rather than race, might be viewed as a causal variable that impacts educational attainment.

How to Spot Spurious Correlation?

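Statisticians and other scientists who analyze data must be on the lookout for spurious relationships all the time. The methods they use to identify them include obtaining an adequate sample size, being wary of arbitrary endpoints, and controlling for as many outside variables as possible.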

What Is an Example of Correlation but not Causation?

An example of a correlation is that more sleep is associated with better performance during the day. Although there is a correlation, there is not necessarily causation. More sleep may not be the reason an individual performs better; for example, they might be using a new software tool that is increasing their productivity. To establish causation, there must be evidence from a study that shows a causal relationship between sleep and performance.

What Is Spurious Regression?

Spurious regression occurs when a regression model shows misleading statistical evidence of a linear relationship between independent, non-stationary variables; in other words, it reports a spurious correlation between series that are actually unrelated.
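A small simulation can make this concrete. The sketch below, which is an illustration under stated assumptions and not a definitive test, generates pairs of independent random walks (a standard example of non-stationary series), regresses one on the other, and counts how often the slope looks statistically significant. For comparison, it repeats the regression on the period-to-period changes, which are stationary. The series length of 100, the cutoff of |t| > 2 (roughly the 5% level), and the function names are assumptions made for the example.

```python
import math
import random


def slope_t_stat(xs, ys):
    """t-statistic of the slope in a simple least-squares fit of ys on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    ssr = sum((y - intercept - slope * x) ** 2 for x, y in zip(xs, ys))
    std_err = math.sqrt(ssr / (n - 2) / sxx)
    return slope / std_err


def random_walk(steps):
    """Cumulative sum of the steps, i.e. a non-stationary series."""
    total, path = 0.0, []
    for step in steps:
        total += step
        path.append(total)
    return path


def share_significant(use_levels, trials=500, n=100, seed=1):
    """Fraction of regressions on independent series with |t| > 2."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        dx = [rng.gauss(0, 1) for _ in range(n)]
        dy = [rng.gauss(0, 1) for _ in range(n)]
        if use_levels:
            xs, ys = random_walk(dx), random_walk(dy)  # non-stationary levels
        else:
            xs, ys = dx, dy                            # stationary changes
        if abs(slope_t_stat(xs, ys)) > 2.0:
            hits += 1
    return hits / trials


share_levels = share_significant(use_levels=True)
share_diffs = share_significant(use_levels=False)
print(f"random-walk levels:  {share_levels:.2f} of regressions look significant")
print(f"period changes:      {share_diffs:.2f} of regressions look significant")
```

Although the series are unrelated by construction, the regressions on the random-walk levels should appear significant far more often than the nominal 5%, while the regressions on the changes should stay close to that rate.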

What Is False Causality?

False causality refers to the assumption made that one thing causes something else because of a relationship between them. For example, we may assume that Harry has been training hard to become a faster runner because his race times have improved. However, the reality might be that Harry's race times have improved because he has new running shoes made with the latest technology. The initial assumption was a false causality.
