In a recent Slashdot discussion, some posters argued that the mantra "Correlation is not Causation" is overused on Slashdot. Having a long-standing dislike of correlation studies, I took exception to this in the thread, whereupon another poster rebutted, saying that
"Epidemiology and observational science have given up a lot without the need for experimentation... A->B, B->A or C->A and C->B, these are the ONLY explanations for statistically significant correlation"
This poster also followed this up by requesting that I back up my argument with something more solid:
"I'd like to see you try to refute this, particularly if you can think of a way for variables to be correlated without some form of causal relationship as I've described."
I don't normally go to such lengths, but considering my own particular disdain for the premise that correlation implies causation, I've decided to respond to this argument with some solid data showing just how absurd this idea can be.
Perhaps it could be said that my example is deliberately spurious. But now we come to the point: how much less spurious than this example are most correlation studies? If I hypothesise that one effect causes another, and a high correlation coefficient is taken as good evidence for my hypothesis, then why shouldn't this data be taken as evidence that the position of Saturn in the night sky between 2003 and 2006 caused the rise of the stock market between those dates? What is the essential difference that separates my example from all the hundreds, if not thousands, of studies done every year on gods know what? Correlation does not, and never will, imply causation. In short, if my r=0.88 is not good enough here, then why should r's, sometimes less than 0.5, be taken as "good evidence" for causation or relationship? The answer is, of course, that they shouldn't. Correlation proves nothing. It doesn't even suggest anything. Correlation studies are, at best, only of use when performed on variables which we already know to be related.
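As a rough illustration of how easily a large r can turn up between quantities that have nothing to do with each other, here is a minimal Python sketch. It does not use the Saturn or S&P data: `cycle` and `index` are hypothetical stand-ins (a slow periodic quantity seen over a quarter of its cycle, and a steadily climbing index), and `pearson_r` is just the textbook formula written out by hand.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


n = 1000

# Hypothetical quantity A: a slow cycle observed over only a quarter of its
# period, so it happens to rise throughout the observation window.
cycle = [math.sin(0.5 * math.pi * i / (n - 1)) for i in range(n)]

# Hypothetical quantity B: an index that climbs steadily over the same window.
# It is not computed from `cycle` in any way.
index = [1000.0 + 0.3 * i for i in range(n)]

r = pearson_r(cycle, index)
print(f"r = {r:.2f}")
```

Neither series is derived from the other, yet r comes out well above 0.9, simply because both happen to rise over the chosen window.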
In case anyone is interested (or skeptical) here is the relevant data file. The S&P data was obtained from Yahoo Finance, and the Saturn data was generated by an Ephemeris Generator at this site. The data file was created via some grepping and other bash scripting, and is submitted for your approval.
P.S. This page was created in not a lot of time, so please excuse its unleavened state. If anyone wants to complain about the content or form, or simply to discuss things further, you can email me at obsessivemathsfreak at gmail.com.
I will (eventually) get around to cleaning up this page some more. In particular, I'd like to add some material on Hill's criteria, which the Slashdot poster I responded to offered as the proper framework in which to judge the quality of correlation studies.
In addition, in a much later discussion on correlation, another poster showed that, by using linear detrending, this correlation can be removed completely, giving a coefficient of 0.037. There are in fact many forms of detrending which can be used, and which would probably invalidate more studies than Hill's criteria would.
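For anyone unfamiliar with the idea, linear detrending can be sketched as follows: fit a least-squares straight line to each series against time, subtract it, and compute the correlation of what is left. The code below is only an illustration under stated assumptions. It uses synthetic series (`series_a` and `series_b`, independent noise on two upward trends, with an arbitrary seed), not the Saturn or S&P data, and it does not attempt to reproduce the 0.037 figure; the helper names are my own.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def linear_detrend(values):
    """Subtract the least-squares line fitted against the sample index."""
    n = len(values)
    mean_t = (n - 1) / 2
    mean_v = sum(values) / n
    slope = (sum((t - mean_t) * (v - mean_v) for t, v in enumerate(values))
             / sum((t - mean_t) ** 2 for t in range(n)))
    intercept = mean_v - slope * mean_t
    return [v - (intercept + slope * t) for t, v in enumerate(values)]


# Synthetic stand-ins (assumption, NOT the real data): two series that share
# nothing except the fact that both drift upwards over time.
rng = random.Random(0)
n = 500
series_a = [1.0 * t + rng.gauss(0, 30) for t in range(n)]
series_b = [0.5 * t + rng.gauss(0, 30) for t in range(n)]

r_raw = pearson_r(series_a, series_b)
r_detrended = pearson_r(linear_detrend(series_a), linear_detrend(series_b))

print(f"raw r       = {r_raw:.3f}")
print(f"detrended r = {r_detrended:.3f}")
```

The raw coefficient is large because both series trend upwards; once the fitted lines are subtracted, only the independent noise remains and the coefficient falls to near zero. If SciPy is available, `scipy.signal.detrend` with `type='linear'` performs the same subtraction.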