Correlation is NOT Causation

Saturn and the Stock Markets

In a recent Slashdot discussion, some posters argued that the mantra "Correlation is not Causation" is overused on Slashdot. Having a long-standing dislike of correlation studies, I took exception to this in the thread, whereupon another poster rebutted, saying:

"Epidemiology and observational science have given up a lot without the need for experimentation...
A->B, B->A or C->A and C->B, these are the ONLY explanations for statistically significant correlation"
The poster followed this up by asking me to back up my argument with something more solid:
"I'd like to see you try to refute this, particularly if you can think of a way for variables to be correlated without some form of causal relationship as I've described."
I don't normally go to such lengths, but considering my own particular disdain for the premise that correlation implies causation, I've decided to respond to this argument with some solid data showing just how absurd this idea can be.

Saturn Affects the Stock Markets: r=0.88

In short, the Right Ascension of the planet Saturn in the night sky is correlated with the S&P 500 stock market index. The correlation coefficient between the two is r=0.88, which is apparently pretty high. Therefore, to those who accept the validity of correlation studies, I say that you must accept that there is strong evidence that the planet Saturn caused the S&P 500 to rise in the years 2003-2006. Here is a graph of the two series between Jan 1, 2003 and Jan 1, 2006:
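For anyone who wants to check a figure like r=0.88 against the data file, here is a minimal sketch of the Pearson product-moment correlation coefficient. It assumes (the page doesn't say) that Pearson's r is the coefficient that was computed. The function name pearson_r is illustrative, and the toy inputs below are made up; they are not the Saturn or S&P 500 data.

```python
import math

def pearson_r(xs, ys):
    """Pearson product-moment correlation coefficient of two equal-length sequences.

    r = sum((x - mean_x) * (y - mean_y)) / sqrt(sum((x - mean_x)**2) * sum((y - mean_y)**2))
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length, at least 2 points each")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Co-deviation of the pair, and the squared deviations of each series.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when either series is constant")
    return sxy / math.sqrt(sxx * syy)

print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))  # 1.0
print(pearson_r([1, 2, 3, 4], [1, 3, 2, 4]))  # 0.8
```

Python 3.10 and later also ship statistics.correlation, which computes the same coefficient. Note that r only measures how well the points fit a straight line; nothing in the formula refers to cause.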

A plot of the Right Ascension of Saturn in the night sky and the opening price of the S&P 500 between 2003 and 2006

Perhaps it could be said that my example is deliberately spurious. But now we come to the point: how much less spurious than this example are most correlation studies? If I hypothesise that one effect causes another, and a high correlation coefficient is taken as good evidence for my hypothesis, then why shouldn't this data be taken as evidence that the position of Saturn in the night sky between 2003 and 2006 caused the rise of the stock market between those dates? What essential difference separates my example from the hundreds, if not thousands, of studies done every year on gods know what? Correlation does not, and never will, imply causation. In short, if my r=0.88 is not good enough here, then why should coefficients, sometimes less than 0.5, be taken as "good evidence" for causation or a relationship? The answer, of course, is that they shouldn't. Correlation proves nothing. It doesn't even suggest anything. Correlation studies are, at best, only of use when performed on variables which we already know to be related.
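The challenge quoted earlier was to think of a way for variables to be correlated without any causal relationship. A quick simulation suggests one: two random walks generated completely independently of each other (each just drifting over time, like a price or a planet's position) will often show a large |r| purely by chance, while independent white noise of the same length almost never does. This is a sketch on synthetic data with arbitrary, assumed parameters (200 trials, 500 points, seed 0); the helper names are hypothetical.

```python
import random

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

def random_walk(n, rng):
    """Running sum of independent standard-normal steps: drifts, but nothing drives it."""
    level, path = 0.0, []
    for _ in range(n):
        level += rng.gauss(0.0, 1.0)
        path.append(level)
    return path

def white_noise(n, rng):
    """Independent standard-normal values with no trend."""
    return [rng.gauss(0.0, 1.0) for _ in range(n)]

def mean_abs_r(make_series, trials=200, n=500, seed=0):
    """Average |r| between pairs of series generated independently of each other."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        a = make_series(n, rng)
        b = make_series(n, rng)  # drawn separately: no causal link to a
        total += abs(pearson_r(a, b))
    return total / trials

walks = mean_abs_r(random_walk)
noise = mean_abs_r(white_noise)
print(f"mean |r|, independent random walks: {walks:.2f}")
print(f"mean |r|, independent white noise:  {noise:.2f}")
```

The exact numbers printed depend on the seed, but the gap between the two averages is the point: series that merely trend over time correlate with each other far more often than independent noise does, with no A->B, B->A or common cause C involved.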

In case anyone is interested (or skeptical) here is the relevant data file. The S&P data was obtained from Yahoo Finance, and the Saturn data was generated by an Ephemeris Generator at this site. The data file was created via some grepping and other bash scripting, and is submitted for your approval.

Science is the Experiment

I'll repeat what was said in my Slashdot post. We need to listen to Zombie Feynman's wisdom: "Ideas are tested by experiment. Everything else is bookkeeping." Correlation is, at the very best, bookkeeping. Most of the time, it's not even that. Every crackpot, every junk study, every racist and sexist on the planet will use dubious statistics to "prove" their point. This is not science. At least, not on its own. I find myself linking over and over to Feynman's Cargo Cult Science speech. He spotted this type of nonsense masquerading as real science almost 40 years ago, but no one listened. Now we are being overloaded with cargo cult scientists, who put on a good show of playing a scientific game but produce nothing of worth. Worse, they in fact do great harm to science as a whole by undermining its legitimacy. At a future date, I may expound on this page further, but for now, I'll leave it at that.

P.S. This page was created in not a lot of time, so please excuse its unleavened state. If anyone wants to complain about the content or form, or simply to discuss things further, you can email me at obsessivemathsfreak at gmail.com.

UPDATE:

I will eventually get around to cleaning up this page some more. In particular, I'd like to add some material on Hill's criteria, which the Slashdot poster I responded to offered as the proper framework in which to judge the quality of correlation studies.

In addition, in a much later discussion on correlation, another poster showed that linear detrending removes this correlation almost entirely, giving a coefficient of 0.037. There are in fact many forms of detrending that can be used, and they would probably invalidate more studies than Hill's criteria would.
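One common form of linear detrending fits a least-squares straight line against time to each series, subtracts it, and correlates what is left. The sketch below assumes that form (the page doesn't say exactly which procedure the other poster used) and runs it on made-up data: two series with independent noise that have nothing in common except that both trend upward. It does not reproduce the 0.037 figure, which came from the real data file; the helper names are hypothetical.

```python
import random

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

def detrend_linear(ys):
    """Subtract the ordinary least-squares line fitted against the time index 0..n-1."""
    n = len(ys)
    mt = (n - 1) / 2          # mean of 0, 1, ..., n-1
    my = sum(ys) / n
    slope = (sum((t - mt) * (y - my) for t, y in enumerate(ys))
             / sum((t - mt) ** 2 for t in range(n)))
    intercept = my - slope * mt
    return [y - (intercept + slope * t) for t, y in enumerate(ys)]

# Synthetic data: two upward trends with independent noise (no link between them).
rng = random.Random(42)
n = 1000
a = [0.5 * t + rng.gauss(0.0, 5.0) for t in range(n)]
b = [2.0 * t + rng.gauss(0.0, 20.0) for t in range(n)]

raw = pearson_r(a, b)
detrended = pearson_r(detrend_linear(a), detrend_linear(b))
print(f"raw r = {raw:.3f}, detrended r = {detrended:.3f}")
```

On data like this the raw r comes out close to 1 because both series rise with time, while the detrended r is near zero because the leftover noise in each series is unrelated. A straight-line fit is only one choice; differencing or fitting higher-order trends are other forms of detrending.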


As an afterthought, I should add that all the content on this page (especially the diagram) is (and was always intended to be) copylefted, which means you can use it wherever you like with no restrictions whatsoever, provided you extend the same courtesy to others. This doesn't affect your existing fair use rights, which include telling me to get stuffed and using excerpts where and how you please.