**Correlation and linear regression are not the same**

**What is the goal?**

Correlation quantifies the degree to which two variables are related. Correlation does not fit a line through the data points. You are simply computing a correlation coefficient (r) that tells you how much one variable tends to change when the other one does. When r is 0.0, there is no linear relationship. When r is positive, there is a trend that one variable goes up as the other one goes up. When r is negative, there is a trend that one variable goes up as the other one goes down.
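As a rough illustration of how r behaves, here is a minimal Python sketch that computes the Pearson coefficient by hand. The function name `pearson_r` and all the data values are made up for this example. Note the last case: r comes out as 0 even though Y is completely determined by X, because the relationship is curved rather than a straight-line trend.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of products and squares of deviations from the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data, invented for illustration only
x = [1.0, 2.0, 3.0, 4.0, 5.0]
rising = [2.1, 3.9, 6.2, 7.8, 10.1]   # goes up as x goes up
falling = [9.8, 8.1, 6.0, 3.9, 2.2]   # goes down as x goes up

r_up = pearson_r(x, rising)
r_down = pearson_r(x, falling)

# A symmetric curve: y depends entirely on x, yet there is no linear trend
x_sym = [-2.0, -1.0, 0.0, 1.0, 2.0]
curve = [v * v for v in x_sym]
r_curve = pearson_r(x_sym, curve)

print(f"rising: {r_up:.3f}, falling: {r_down:.3f}, curve: {r_curve:.3f}")
```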

Linear regression finds the best line that predicts Y from X.

**What kind of data?**

Correlation is almost always used when you measure both variables. It is rarely appropriate when one variable is something you experimentally manipulate.

Linear regression is usually used when X is a variable you manipulate (time, concentration, etc.).

**Does it matter which variable is X and which is Y?**

With correlation, you don’t have to think about cause and effect. It doesn’t matter which of the two variables you call “X” and which you call “Y”. You’ll get the same correlation coefficient if you swap the two.

The decision of which variable you call “X” and which you call “Y” matters in regression, as you’ll get a different best-fit line if you swap the two. The line that best predicts Y from X is not the same as the line that best predicts X from Y. Both lines, however, have the same value for R².
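The effect of swapping X and Y can be checked numerically. The sketch below is only an illustration: the helper names `fit_line` and `r_squared` and the data are invented for this example. It fits Y from X, then X from Y, and compares the two lines on the same axes.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for predicting ys from xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def r_squared(xs, ys, slope, intercept):
    """Fraction of the variance in ys explained by the fitted line."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot

# Hypothetical data, invented for illustration only
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.0, 4.5, 4.0, 7.5, 8.0, 11.0]

slope_yx, icpt_yx = fit_line(x, y)   # line predicting Y from X
slope_xy, icpt_xy = fit_line(y, x)   # line predicting X from Y

r2_yx = r_squared(x, y, slope_yx, icpt_yx)
r2_xy = r_squared(y, x, slope_xy, icpt_xy)

# To compare on the same axes, rewrite the X-from-Y line as Y versus X:
# x = a + b*y  is the same line as  y = (x - a) / b, so its slope is 1/b.
print(f"Y from X: slope {slope_yx:.3f}")
print(f"X from Y, drawn as Y vs X: slope {1 / slope_xy:.3f}")
print(f"R^2 both ways: {r2_yx:.4f} and {r2_xy:.4f}")
```

The two slopes only coincide when the points fall exactly on a straight line, in which case r is +1 or -1.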

**The best way to appreciate this difference is by example.**

Take, for instance, samples of the leg length and skull size from a population of elephants. It would be reasonable to suggest that these two variables are associated in some way, as elephants with short legs tend to have small heads and elephants with long legs tend to have big heads. We may, therefore, formally demonstrate that an association exists by performing a correlation analysis. However, would regression be an appropriate tool to describe a **relationship** between head size and leg length? Does an increase in skull size **cause** an increase in leg length? Does a decrease in leg length cause the skull to shrink? As you can see, it is meaningless to apply a causal regression analysis to these variables: neither one is wholly dependent on the other, and more likely both depend on some other factor (e.g. food supply, genetic makeup).

Consider two variables: crop yield and temperature. These are measured independently, one by the weather station thermometer and the other by Farmer Giles’ scales. While correlation analysis would show a high degree of association between these two variables, regression analysis would be able to demonstrate the dependence of crop yield on temperature. However, careless use of regression analysis could also demonstrate that temperature is dependent on crop yield: this would suggest that if you grow really big crops you’ll be guaranteed a hot summer!

**Extra info for math freaks!!**

**Assumptions**

The correlation coefficient itself is simply a way to describe how two variables vary together, so it can be computed and interpreted for any two variables. Further inferences, however, require an additional assumption — that both X and Y are measured, and both are sampled from Gaussian distributions. This is called a bivariate Gaussian distribution. If those assumptions are true, then you can interpret the confidence interval of r and the P value testing the null hypothesis that there really is no correlation between the two variables (and any correlation you observed is a consequence of random sampling).
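One common way to get an approximate confidence interval for r under the bivariate Gaussian assumption is the Fisher z transformation. The sketch below shows that approach as an assumption about method, not necessarily what any particular statistics package does. The function name `fisher_ci` and the example values (r = 0.9 from 10 pairs) are hypothetical.

```python
import math
from statistics import NormalDist

def fisher_ci(r, n, level=0.95):
    """Approximate confidence interval for a Pearson r from n pairs (n > 3)."""
    z = math.atanh(r)                              # Fisher z transform of r
    se = 1.0 / math.sqrt(n - 3)                    # approximate standard error of z
    crit = NormalDist().inv_cdf(0.5 + level / 2)   # about 1.96 for a 95% interval
    # Build the interval on the z scale, then transform back to the r scale
    return math.tanh(z - crit * se), math.tanh(z + crit * se)

# Hypothetical example values
r, n = 0.9, 10
low, high = fisher_ci(r, n)
print(f"r = {r}, approximate 95% CI: {low:.2f} to {high:.2f}")
```

The interval is not symmetric around r, and it widens quickly as the number of pairs shrinks.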

With linear regression, the X values can be measured or can be a variable controlled by the experimenter. The X values are not assumed to be sampled from a Gaussian distribution. The vertical distances of the points from the best-fit line (the residuals) are assumed to follow a Gaussian distribution, with the SD of the scatter not related to the X or Y values.
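The residuals mentioned above are easy to compute once a line has been fitted. The following is a crude sketch with invented data and a hypothetical helper `fit_line`. It computes the vertical distances from the line, then compares their scatter in the lower and upper half of the X range as a very rough look at whether the SD depends on X. It is not a formal test of the assumptions.

```python
import statistics

def fit_line(xs, ys):
    """Least-squares slope and intercept for predicting ys from xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Hypothetical data: X is controlled by the experimenter (evenly spaced, sorted)
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y = [2.3, 3.8, 6.4, 7.7, 10.2, 12.1, 13.6, 16.3]

slope, intercept = fit_line(x, y)

# Residual = observed Y minus the Y predicted by the line (vertical distance)
residuals = [yi - (slope * xi + intercept) for xi, yi in zip(x, y)]

# Rough check of scatter: SD of residuals at low X versus high X
half = len(x) // 2
sd_low = statistics.stdev(residuals[:half])
sd_high = statistics.stdev(residuals[half:])

print(f"sum of residuals: {sum(residuals):.2e}")
print(f"SD of residuals, low X: {sd_low:.3f}, high X: {sd_high:.3f}")
```

With so few points the two SDs will differ by chance alone, so in practice a residual plot is usually more informative than a single number.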

**Relationship between results**

Correlation computes the value of the Pearson correlation coefficient, r. Its value ranges from -1 to +1.

Linear regression quantifies goodness of fit with r², sometimes shown in uppercase as R². If you run a correlation analysis on the same data (which is rarely appropriate; see above), the square of r from correlation will equal r² from regression.
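For readers who want to see why the two numbers agree, here is a short derivation. The notation is an assumption introduced for this sketch: S_xy is the sum of products of deviations from the means, S_xx and S_yy are the sums of squared deviations of X and Y, b is the least-squares slope, and SS_res is the sum of squared residuals.

$$
r = \frac{S_{xy}}{\sqrt{S_{xx}\,S_{yy}}}, \qquad b = \frac{S_{xy}}{S_{xx}}, \qquad SS_{res} = S_{yy} - b\,S_{xy}
$$

$$
R^2 = 1 - \frac{SS_{res}}{S_{yy}} = \frac{b\,S_{xy}}{S_{yy}} = \frac{S_{xy}^2}{S_{xx}\,S_{yy}} = r^2
$$

The final expression is symmetric in X and Y, which is also why swapping the two variables leaves R² unchanged.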

Sources:

http://www.graphpad.com/support/faqid/1141/

http://www.le.ac.uk/bl/gat/virtualfc/Stats/regression/regrcorr.html
