Data mining: tonal languages and genes edition

Bob Ladd and Dan Dediu found a correlation between the geographical distribution of certain genes and the geographical distribution of tonal languages.

There’s a great discussion of their method at Language Log, and Mr. Ladd responds there.

The bottom line is that Bob and Dan found a correlation, not the causation the headlines are suggesting, and a correlation can be spurious. As Bob says in his follow-up, data mining is good for “hypothesis-generating rather than hypothesis-testing.” Their next step is to go into the lab and do experiments.

I wish we had that option in Economics (or I wish we used that option more often).
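
To see why a mined correlation is a hypothesis and not a test, here is a minimal sketch in Python. It is not Bob and Dan's method and uses none of their data: everything is simulated noise, and the names (`pearson`, `strongest_chance_correlation`) and sizes (30 regions, 1,000 candidate variables) are made up for illustration. It draws one random "target" variable, compares it against many unrelated random variables, and keeps the strongest correlation it finds.

```python
import random
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def strongest_chance_correlation(n_regions=30, n_candidates=1000, seed=0):
    """Search pure noise for the candidate most correlated with the target.

    All values are hypothetical: the target and every candidate are
    independent random draws, so any correlation found is spurious.
    """
    rng = random.Random(seed)
    target = [rng.gauss(0, 1) for _ in range(n_regions)]
    best = 0.0
    for _ in range(n_candidates):
        candidate = [rng.gauss(0, 1) for _ in range(n_regions)]
        r = pearson(target, candidate)
        if abs(r) > abs(best):
            best = r
    return best


if __name__ == "__main__":
    r = strongest_chance_correlation()
    print(f"Strongest correlation found in pure noise: {r:.2f}")
```

With only 30 observations and 1,000 variables to pick from, the winner usually looks impressive even though nothing is related to anything. That is the case for treating a mined correlation as something to take into the lab, where a fresh experiment does not get to choose the best of a thousand tries.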

  1. How do you experiment on something like that?

    “We’ve placed several hundred Taiwanese and Spanish children in locked cages all across this archipelago. At a signal, all of the cages will simultaneously open, and we’ll monitor their development using remote microphones affixed to dirigibles. We’ve seeded the ocean with sharks to discourage cross-pollination.”

