
As Atlantic Business contributor Derek Lowe, from whom I stole that graph, notes,
I've seen a lot shakier plots used to justify some sweeping conclusions, and if those were justified, well, then I'm forced to conclude that Mexican lemons have improved highway safety a great deal. The vitamin C, maybe? The fragrance? Bioflavonoids?
This is particularly tricky when you bring time into it, because things trend--as we get richer, we buy safer cars, get better emergency rooms, etc. We also import more lemons to make our chi-chi cocktails and lemon meringue pies. Overlay the two, and you've got a hell of a causal relationship.
But I expect that four years from now, we'll still be having the same conversations with proponents of "cancer clusters" and Democrats convinced that they can scientifically prove that Democrats are better for GDP by doing ham-fisted regressions of Democratic presidencies with a few tightly correlated economic variables. What's the mechanism? What makes electric power lines cause cancer, but not the earth's vastly more powerful magnetic field? What policies did Harry Truman and Bill Clinton have in common (but not with Richard Nixon) that caused this marvelous confluence? Well, maybe we don't know the mechanism exactly, but never you mind: just look at that bee-yoo-ti-ful correlation!
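The trending problem described above can be sketched with a small simulation. This is only an illustration, not the data behind the graph: the series names, trend slopes, and noise levels are invented assumptions. Two series that share nothing but the passage of time correlate strongly in levels, and most of that relationship should disappear once you compare year-over-year changes instead.

```python
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def differences(xs):
    """Year-over-year changes, which strip out a linear trend."""
    return [b - a for a, b in zip(xs, xs[1:])]


rng = random.Random(0)
years = range(30)

# Hypothetical series: each has its own linear trend plus independent noise.
# Neither series is computed from the other.
lemons = [100 + 12 * t + rng.gauss(0, 20) for t in years]        # rising
fatalities = [16 - 0.2 * t + rng.gauss(0, 0.5) for t in years]   # falling

r_levels = pearson_r(lemons, fatalities)
r_changes = pearson_r(differences(lemons), differences(fatalities))

print(f"correlation of levels:  {r_levels:.2f}")
print(f"correlation of changes: {r_changes:.2f}")
```

Differencing is just one common way to remove a shared trend; it does not tell you what the common driver is, only that the raw overlay was mostly measuring time.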






For some reason this popped to mind:
http://stopdesign.com/archive/2009/03/20/goodbye-google.html
this is only exceeded in doubleplus-ungood by plotting two datasets/lines on one graph and varying the two y-axes' scales so as to lead weak minds to the conclusion that there IS relatedness/causality
"Correlation does not equal causation!"
But it damn sure implies it.
Now in the above example there may be no causation. Or no direct causation.
Sure, it may be the case that everyone who gets cancer has consumed DiHydrogen Monoxide, but that doesn't mean that DHMO is carcinogenic.
OTOH, there is a strong correlation between consuming a few mg of PuO and getting cancer. And a causal relationship.
there is a strong correlation between consuming a few mg of PuO and getting cancer. And a causal relationship.
It depends on how you consume it. PuO is not extremely harmful if you eat it, but is a strong carcinogen if you breathe it in. Edward Teller would offer to eat a teaspoonful of PuO if antinuclear activists would eat a teaspoonful of cyanide.
When Ralph Nader described plutonium as "the most toxic substance known to mankind", Bernard Cohen offered to consume on camera as much plutonium oxide as Nader could consume of caffeine, the stimulant found in coffee and other beverages, which in its pure form has an oral LD50 of 192 milligrams per kilogram in rats.
I really wanted to see the graphs showing the correlation with:
- teenage twitter sex and congressional ethics investigations;
- corporate lobbyist employment rates and cupcake consumption;
- blowfish deaths in japan and japanese beetle populations in my back yard
Can you put those up for me, too?
And fyi: one of the best natural controls for Japanese beetles is brother skunk, who eats the grubs. Nice choice there.
When Mexico gives you lemons, you make lemonade.
While your graph is impressive, I like this one even more. That image is embedded in the Open Letter to Kansas School Board introducing He of the Noodly Appendage.
"Global Average Temperature" vs "Number of Pirates" as a data set appears provocatively strongly correlated.
But where is the causality?...
Given the apparent correlation, it's my working hypothesis that it could reflect the growing importance of talking like a Pirate.
Conjecture: all the excitement of talking like a pirate generates so much HOT AIR, not to mention sweat walking the plank, it in turn proportionally warms the globe.
To put this to the test, we could submit a funding proposal to NSF to collect data on September 19th (International Talk Like a Pirate Day, popularized by Dave Barry) to see if a spike manifests, not unlike what physicists did when they hiked off to Africa to observe solar eclipses to test Einstein's theories of relativity. Don't just take the appearance of correlation and causality at face value. Avast, Mateys! Go and Apply the hard light of day!
It's 2:30 am, but I'm still pretty sure they were testing the Photoelectric Effect. I may very well be wrong, but some random memory from childhood (when physics was fun) indicates I am not. The rest of your comment though, nice.
Just think of the uproar if the slope were positive.
There would only be an uproar if we were letting Mexican truck drivers deliver produce past the border. As most readers of this blog will know, we're not.
In any case, I suspect airbags are to thank for this nice little graph.
I would be remiss in not posting this:
http://www.xkcd.com/552/
If you like unequivocal causation, you shouldn't have picked economics as the theme for your blog. Economics is a bunch of correlations plus vague non-mechanistic models. Such is life.
So what made you post the most repeated lesson of all time in statistics and scientifically oriented courses? What was that all about? Where does the outrage about Democrats and economic variables come from?
Of course the Republican shills here have this completely backwards.
For decades, the conventional wisdom was that Democrats didn't understand economics (or didn't care) and were thus a disaster for us, economically.
The charts of "presidents versus GDP", while not remotely sufficient to prove "Dems = good economy", is sufficient to disprove the Republican claim that "Dems = bad economy".
Oh, my goodness, how many ways is this typical of liberals? Narcissistic, self-congratulatory nick denoting intelligence ("voice of reason",) gratuitous and inaccurate insult ("shills",) premise that's simply imaginary ("conventional wisdom that Democrats didn't understand economics",) ... and then, the coup de grace, a claim that's simply and completely wrong. The charts of Presidents vs. GDP are not remotely sufficient to prove anything but the cluelessness of the individual creating the chart; the data points are too gross, and too thoroughly tainted by literally thousands of intervening variables, to establish anything worthwhile.
Gosh, what a shock - a complete lack of reading comprehension.
The chart of presidents vs. GDP is not sufficient to prove anything. It is, however, sufficient to DISPROVE the claim often made by Republicans that Democratic administrations are disastrous for the economy. That's all.
"It is, however, sufficient to DISPROVE the claim often made by Republicans that Democratic administrations are disastrous for the economy."
Only if one has never heard of the terms lags or omitted-variable bias.
It's always funny to see posts like these that come completely out of left field. You know Megan was arguing with someone about something and then decided to finish the conversation here. Meanwhile, most of us have no idea where it's coming from.
Also, what's the deal with arguing against cancer clusters? They are recognized by the CDC and while many initial reports are probably erroneous, it's not like they don't exist.
What makes electric power lines cause cancer, but not the earth's vastly more powerful magnetic field?
If the electromagnetic field isn't dangerous then why does lightning start forest fires?
I mean, we can all ask stupid questions if we engage in reductio ad absurdum.
I'm betting she just saw the graph and wanted to run with it.
If the electromagnetic field isn't dangerous then why does lightning start forest fires?
Thor just hates trees.
Because they weren't delivered via Mexican trucks.
I think that if you wanted to prove something about political leadership and the economy, you would be closer to the truth using leadership of Congress. Still, correlation between party and policies can be weak enough that you'd still have a lot of problems.
Democrats convinced that they can scientifically prove that Democrats are better for GDP by doing ham-fisted regressions of Democratic presidencies with a few tightly correlated economic variables.
I suspect that the number of "analysts" offering such correlations will subside now that a Democratic President happens to be presiding over an economic collapse. All of a sudden, they'll remember that a President doesn't have a single knob that he can turn, "Economy Up or Down," and that what happens now is heavily affected by developments that occurred 5 or 10 years ago.
Right. Correlation does not imply causality, but certainly causality implies correlation. The contrapositive is thus also trivially true: that lack of causality implies lack of causation. Though actually, that's not the way I heard the claim, which was that Republicans were better on the economy than Democrats.
It Just Ain't So.:
This was part of a long series of posts by Cactus, so you can look up parts 2 through 5 there, as well as other (lots of other) ancillary information on the topic.
Indeed. The problem is, not only are the various statistical measures bad for Republicans, no one seems to be able to tweak them so that somehow this is really all the Democrats' fault . . . though many over at Angry Bear have tried. I could see this as something Megan might take issue with and also that she could quite easily 'continue the argument here'.
That should be that if causality => correlation, then lack of correlation => lack of causality. Sorry.
"Right. Correlation does not imply causality, but certainly causality implies correlation."
Statistics fail.
Causality + a large number of independent random samples implies correlation (with high probability).
There is one implication I'm more confident in: drawing conclusions (positive or negative) from 6 non-random non-independent samples implies stupidity.
We also import more lemons to make our chi-chi cocktails
Which should have led to more traffic crashes.
Of course, that implies you can find your car in those circumstances.
Don't forget the correlation between sunspots and Republican Senators...
http://www.realclimate.org/index.php/archives/2007/05/fun-with-correlations/
I love that XKCD comic.
Correlation does not imply causality in a logical sense but it does indicate that causality is a possibility. I would argue that a strong correlation in a controlled data set does suggest some sort of link, though. Likewise, a lack of correlation in a controlled data set implies a lack of a link (and, therefore, causality).
I don't believe power lines cause cancer.
That said, it makes no sense whatsoever to raise doubts based on "the earth's vastly more powerful magnetic field."
The earth's magnetic field is constant, whereas power lines create 60 Hz (or 50 Hz) alternating magnetic fields. Alternating magnetic fields induce voltages, constant fields do not.
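The distinction can be written out with Faraday's law. A rough sketch, assuming a stationary loop of area A perpendicular to a uniform field; the symbols A, B_0 and f are illustrative and not taken from the comment above:

```latex
\mathcal{E} = -\frac{d\Phi}{dt}, \qquad \Phi(t) = A\,B(t)

% Alternating field at frequency f (60 Hz or 50 Hz for power lines):
B(t) = B_0 \sin(2\pi f t)
\;\Longrightarrow\;
\mathcal{E}(t) = -2\pi f\,A\,B_0 \cos(2\pi f t)

% Constant field:
B(t) = B_0
\;\Longrightarrow\;
\mathcal{E}(t) = 0
```

Under these assumptions the induced voltage scales with the frequency as well as the field strength, so a static field of any magnitude induces nothing in a loop that is not moving.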
Dare I bring AGW into this?
Matt Yglesias responds to this post here.
He has some good points, but I'm not entirely convinced. Here's his conclusion:
The correlation
Yglesias' reasoning
It's worth reading the whole post. I think there are some potential problems with this, that I may post later.
The real issue is that if you don't look at policies themselves you really aren't doing much better than Megan's lemon graph. Yglesias mentions that policies do vary significantly (Nixon courting labor and Clinton being a pro-business Democrat), but then he doesn't really address the issue. How do you control for the fact that Clinton presided over more deregulation than Bush, JFK decreased top marginal tax rates, and George W. Bush spent like a drunken sailor on shore leave? In fact, the only policy that I would consider strongly Democratic that Clinton put into effect would be his slight increase in top marginal tax rates. Policies matter; what letter is behind the name of the guy implementing them doesn't. If you want to prove that your policies are better, try to break it down by policies. Otherwise, IMHO you're just a partisan wasting your time and only convincing your fellow partisans of the rightness of their thinking. The people who you actually want to persuade won't listen.
Blink. You've just received a failing grade in my class. Also quite probably in every class where statistics is taught.
Probably just an attention-getting device to get me to respond to him, but on the outside chance . . .
I'm going to guess you taught a class with a title like "Statistics for the Social Sciences".
Fun fact: the "N" and "i.i.d." floating around in nearly every damn theorem in statistics are important. You can't just ignore them.
For small N and non-i.i.d. samples, there is absolutely no reason to believe that population causality implies sample correlation.
Another fun fact: even on the level of populations your statement is false (just in case you want to backtrack and claim you aren't talking about sample correlations). Take the causal relationship y=x^2 with x distributed according to any even probability distribution.
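The counterexample is easy to check numerically. A minimal sketch, assuming a standard normal for x (any distribution symmetric about zero with finite moments should behave the same way); the sample size and seed are arbitrary choices:

```python
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


rng = random.Random(1)
xs = [rng.gauss(0, 1) for _ in range(100_000)]
ys = [x * x for x in xs]  # y is completely determined by x

# Linear correlation between x and y: close to zero despite the dependence.
r_xy = pearson_r(xs, ys)

# The dependence shows up once you look at the magnitude of x instead.
r_absx_y = pearson_r([abs(x) for x in xs], ys)

print(f"corr(x, x^2)   = {r_xy:.3f}")
print(f"corr(|x|, x^2) = {r_absx_y:.3f}")
```

The point of the second number is that "no linear correlation" is a statement about one particular summary statistic, not about whether y depends on x.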
Yep. Just trying to get attention, as I suspected. Note the switch from 'correlation' to 'sample correlation', as well as the cross-definitions.
Er, no. That's as idiotic as saying that the probability of rolling a '3' on a six-sided die is zero since you rolled it once and it came up '4'. That Venus doesn't influence tides because you took six measurements that were accurate to within a half-inch. Iow, just because you don't see 'em doesn't mean they don't exist. If there is causation, a sufficiently powerful test will pick up the correlations over a series of runs. If there is no causation, there is no test, no matter how powerful, that will register correlations above chance over a series of runs. This isn't complicated, or obscure.
Next time, pick up a book.
You've exhausted my patience. Again.
Sov:"Note the switch from 'correlation' to 'sample correlation', as well as the cross-definitions."
Try to pay attention. The original conversation was about a sample correlation. Specifically, this claim was made:
"The charts of "presidents versus GDP", while not remotely sufficient to prove "Dems = good economy", is sufficient to disprove the Republican claim that "Dems = bad economy".
In response, you said this: "Right. Correlation does not imply causality, but certainly causality implies correlation."
The previous poster was talking about a (very small) sample with negative correlation proving the absence of positive causality. You agreed with his incorrect statement.
Maybe your second sentence is a completely unrelated tangent. If so, I'd suggest using paragraphs to separate different ideas.
"If there is causation, a sufficiently powerful test will pick up the correlations over a series of runs. "
False. Causation does not imply correlation. I'll repeat the counterexample: y = x^2, x distributed according to an even probability distribution (e.g. a Gaussian of mean 0).
Your original statement about sample correlations is false. Your current statement about population correlations is false.
A guess: rather than just doing the integral and realizing you are wrong, you are instead going to respond with some condescending remark. Or perhaps you'll try to change the subject?
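For reference, the integral in question is short. A sketch, assuming x has a density p that is even, p(-x) = p(x), with a finite third moment:

```latex
\operatorname{Cov}(x, x^2) = E[x^3] - E[x]\,E[x^2]

% For odd k the integrand x^k p(x) is an odd function, so
E[x^k] = \int_{-\infty}^{\infty} x^k\, p(x)\, dx = 0 \qquad (k = 1, 3)

\operatorname{Cov}(x, x^2) = 0 - 0 \cdot E[x^2] = 0
\;\Longrightarrow\;
\rho(x, x^2) = 0
```

So the population correlation is exactly zero even though y = x^2 is a deterministic function of x.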
A pure mathematician's (very) rough take: I feel like it should be possible to do an even better job of producing some sort of quantitative measurement of how meaningful such a correlation is. This may just be a demonstration of my ignorance of statistics, but here's what I mean:
First of all, as an aside, the R^2 value used above may not reflect how small the number of data points being compared is, which is an obvious weakness of the above correlation. However, I'm sure there are different versions of the R^2 statistic to take this into account (and the above might be using such a thing for all I know).
My main point, however, is that it also doesn't take into account the context of the measurement. What you should try to estimate is how many theoretically independent indicators you could have chosen from, and then how likely it is to pick two that correlate by chance. So when you see this chart, comparing highway fatality rates and lemon exports, the question is how many comparable statistics are out there that you were choosing from when you cherry-picked those two. Then you ought to be able to calculate, based on that estimate, as well as the specifics of the measurements, how "meaningful" the comparison is. With enough indicators, it's almost certain that some will correlate by chance, but moreover you ought to be able to put a number on how unlikely such a thing is in context.
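The "how many indicators were you choosing from" question can be sketched with a simulation. This is a toy under loud assumptions: the indicators are independent Gaussian noise with no trend, each has only 10 data points, and the function name and parameters are made up for illustration. It reports the strongest correlation found among all pairs as the pool of indicators grows.

```python
import itertools
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def strongest_chance_correlation(num_indicators, num_points, seed=0):
    """Largest |r| over all pairs of unrelated random indicators."""
    rng = random.Random(seed)
    indicators = [
        [rng.gauss(0, 1) for _ in range(num_points)]
        for _ in range(num_indicators)
    ]
    return max(
        abs(pearson_r(a, b))
        for a, b in itertools.combinations(indicators, 2)
    )


# With a fixed seed, each larger pool contains the smaller pools,
# so the best cherry-picked pair can only get better as the pool grows.
for k in (2, 10, 100):
    best = strongest_chance_correlation(k, num_points=10)
    print(f"{k:>3} indicators: strongest chance |r| = {best:.2f}")
```

Real indicators are neither independent nor trendless, which is exactly the complication the comment goes on to raise, so treat this as a lower bound on how easy cherry-picking is rather than as a calibrated test.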
Put differently, say you take all measurable data about economics, health, etc. over a fixed time period and plot it against time in a hugely gigantic dimensional space. Some of it is obviously correlated, e.g. average vehicle mpg and average vehicle gpm, so it will tend to cluster along a subset of lower dimension to reflect this correlation. The mpg vs. gpm correlation is just because they're reciprocals, but things which are correlated for less obvious reasons will also lead to data clustering along a set of smaller dimension. So if you have a rough measurement of what exactly that dimension is, that's how many "independent" indicators are possible. Thus it becomes possible to talk about how likely you are to pick, by chance, two indicators which are actually unrelated, which correlate like the above.
I don't really know how one would do this (or if anyone has tried), but it would be interesting to try to come up with a measurement of the dimensionality of, say, all of economics. It might be surprisingly small. Obviously some subjective choices would have to be made in the process, but to even get an answer to within an order of magnitude would be cool.
I commented (way too late) on Dan Drezner's blog a while back about the same idea in regard to politics. He was posting about some sort of data purporting to show that GDP growth predicts the Presidential election pretty well. One could apply the same kind of analysis to such a situation. I guess it would be more useful for establishing that some correlation actually is meaningful than the opposite.
Seriously Megan, I feel you are walking around in your bathrobe and yelling at clouds here.
It sounds like what you are talking about is none other than factor analysis:
It's a good tool. But in the wrong hands . . . it's kind of like the idea that if all you have is a hammer, all problems will be found to be fixable with just a hammer. In fact, this is what happened with the infamous 'G factor', the mysterious quantity that supposedly tied together all the subcategories on what is now the modern IQ test. I had a really good article on the subject bookmarked, but it seems that one of the two girls in my life has deleted it. I'll try to track it down, as it had an excellent randomly generated sample of data that had a very strong 'G factor'.
I find this confusing. I assumed you were trying to say that the apparent correlation between imported lemons and lower highway deaths was clearly a statistical accident. But that's not what you're saying; you're saying both are an artifact of rising wealth, so the correlation is meaningful. The analogy in the case of presidents would be that some third thing is occurring which both makes the US economy more prosperous and makes Americans vote for Democrats for President. Whereas at other times, something is occurring which both makes America less prosperous and makes Americans vote for Republicans.
I don't think any of this is really relevant. The point of the contention that the economy does better under Democrats is that it rebuts the default-value assumption that Republican government is good for the economy. The normal political presumption has always been that you vote for a Republican when you're concerned about overall economic growth and for a Democrat when you think we're rich enough but we need to worry about fairness. The fact that the economy actually performs better under Democratic presidents rebuts that default-value assumption.
Careful. According to PlumpBob, you've just referred to the same 'imaginary premise' that I did. Amazing how many people imagine they've heard the same imaginary premise from so many political analysts their whole life, isn't it?
Not without looking at overall economic trends it doesn't. If Dems get elected during the rising economic times, and Republicans get elected,it would actually be evidence for the second part of your assumption. The truth is, all you could probably prove is the uncontroversial assumption that people tend to vote for the opposite party when elections happen in bad economic times.
Not really sure where you get your assumptions from anyway, I thought the assumption was that wealthier people voted Republican, so that good economic times created more Republicans that worried about stuff like tax cuts.
The second part of that sentence was Republicans get elected during falling economic times.
I know it's not really all that important, since the points she makes are still valid, but the earth's magnetic field... powerful? A refrigerator magnet's field is more powerful, at least if you're anywhere in its vicinity. I think a better point to make in this case would be that MRI magnets haven't so far given anyone cancer, and they truly are VASTLY more powerful than the earth's magnetic field or that of any standard power line.
"Democrats convinced that they can scientifically prove that Democrats are better for GDP by doing ham-fisted regressions of Democratic presidencies with a few tightly correlated economic variables"
This is, of course, carefully ignoring perhaps 80 years of carefully researched and documented economic research explaining exactly why Democratic "anti-business" policies produce better business returns, higher living standards and so on. I'm sure there is older stuff, but you could start with the 1930s. For a popular treatment, dig up some old issues of Fortune, of all things, which was surprisingly pro-New Deal despite acknowledging that their readers considered FDR anti-business.
For something more recent, jump to the 1950s and 1960s. Perhaps a visit to archive.org might even turn up some of the classic business-organized labor-government propaganda explaining the Democratic economic viewpoint on the economy.
In fact, there is also a great deal of literature, both professional and general, that supports the Democratic argument that tighter regulation, higher taxes, stronger labor and so on lead to a better business climate and higher returns on investment. If you are familiar with this neglected material, you will not puzzle over apparent contradictions, e.g. that anti-business states are wealthier than pro-business states, e.g. that India far outstripped Pakistan economically after independence. They will be simply explained, much as evolution resolves problems in biology, quantum mechanics resolves problems in materials science and relativity resolves problems in cosmology.
Don't let your ignorance and biases show so blatantly. Put some pants on.