Data mining: tonal languages and genes edition

Bob Ladd and Dan Dediu found a correlation between the geographical distribution of certain genes and the geographical distribution of tonal languages.

There’s a great discussion of their method at Language Log, and Mr. Ladd responds there.

The bottom line is that Bob and Dan found a correlation, not the causation the headlines are suggesting, and a correlation can be spurious. As Bob says in his follow-up, data mining is good for “hypothesis-generating rather than hypothesis-testing.” Their next step is to go into the lab and run experiments.

I wish we had that option in economics (or rather, I wish we used it more often).

2 thoughts on “Data mining: tonal languages and genes edition”

  1. How do you experiment on something like that?

    “We’ve placed several hundred Taiwanese and Spanish children in locked cages all across this archipelago. At a signal, all of the cages will simultaneously open, and we’ll monitor their development using remote microphones affixed to dirigibles. We’ve seeded the ocean with sharks to discourage cross-pollination.”

