Too much data, not enough theory

A professor forwarded this New Yorker article about the Large Hadron Collider in Europe.

One section of the article talks about tension between theorists and empiricists. The empiricists feel like they’ve been slaving away for the last decade building the collider, and now the theorists are swooping in to claim credit for any of their discoveries. The theorists, of course, theorized the existence of any particles the empiricists might find.

Whatever the empirical physicists feel about the theorists, it’s clear that the LHC wouldn’t have been built, or even conceived of, if it weren’t for theory. In physics, they have stories, and then they run experiments to check those stories.

Economics has a different problem, I think. There’s too much data, not enough experimental data, and not enough theory. Econometrics allows a hundred ways to parse signal from noise. Given that even with random data about five of those hundred ‘signals’ will come back statistically significant, we end up making up stories to explain the data even when the data is just noise.
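
To make that concrete, here is a minimal simulation of the problem (a sketch in Python using numpy and scipy; the sample size and number of regressors are arbitrary choices): regress a pure-noise outcome on a hundred independent pure-noise regressors, one at a time, and count how many clear the 5% bar.

    # Multiple comparisons on pure noise -- a sketch, not any real data set.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    n_obs, n_tests = 200, 100

    y = rng.standard_normal(n_obs)            # the "outcome": pure noise
    false_positives = 0
    for _ in range(n_tests):
        x = rng.standard_normal(n_obs)        # a "regressor": unrelated noise
        _, p_value = stats.pearsonr(x, y)     # same t-test as a bivariate regression
        if p_value < 0.05:
            false_positives += 1

    print(f"{false_positives} of {n_tests} 'signals' significant at the 5% level")
    # Expect about 5: statistically significant stories about nothing.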

In economics, we have it backwards: we find data sources, we dig around until we get statistical significance, and then we write a paragraph in the last section of the paper outlining the various stories that could be told to explain the data.

15 thoughts on “Too much data, not enough theory”

  1. Hey, I don’t see the connection between econ and the LHC. In econ, we can’t conduct the experiments or data collection we want, considering the impact they might have on national welfare (here, I’m speaking about the US, and NOT about Macro).

    In physics, they can design their experiments from the ground up. Economists need to make do with data that was never meant for the kinds of things we might really want to hypothesize about.

  2. Yep, that gives us license to data mine…

    As far as I can tell, there is no institutional barrier in economics to just fiddling with regressions until you get significant results. Monkeys throwing darts would find significance 5% of the time. That means if you throw 100 monkeys in a room with a dartboard, five of them will get published. I would hope we’re better than monkeys.

    For there to be a semblance of professional honesty, I’d expect to see, let’s say, 50% of journal articles reporting negative results (e.g., the theory predicted x, but the data show y). How many negative results do you see? Zilch.

  3. You expect 50% of (good) empirical tests to show evidence against theoretical predictions? Wow, you have even less faith in the theorists than you do in the empiricists!

    Anywho, papers that find results contrary to theory can generally be published pretty well because of their “surprising result.” I think the biggest concern is the (potentially important and overlooked) papers that find no significant results and are lost to the netherspace of academic research.

  4. By the way, I completely agree that there aren’t enough experiments. While a lot of the experiments that we want to run aren’t reasonable, we can look for good substitutes. And don’t underestimate the value of “natural experiments.”

    Experiments are totally underused. I think this is a product of laziness more than a lack of funding. It’s easier to “mine the data” and/or make a case for a “natural experiment” than to actually do one. I sort of have an experiment planned (I’m waiting to find out about funding) but am pretty overwhelmed by how much work it will be if it happens.

  5. Negative results: Right. Not rejecting the null hypothesis is a negative result. (How’s that triple, or is it quadruple, negative for you?)

    Rejecting theory: yeah, I’d say a big chunk of theories are nice, in theory, but don’t correspond to reality. This is OK. A bad theory that, when proven wrong, points us in the right direction is a good theory (or something).

    Experiments: Do you suppose you’re overwhelmed by the experiment because you haven’t been trained to run them?

  6. I don’t think I’m necessarily overwhelmed because I haven’t been trained. I’d say it’s mostly because my experiment entails something that might be considered fraud (which isn’t necessarily an easy thing to do): I want to send fake resumes to employers and analyze the sensitivity of the response to the names that are used (which will signal race and, hence, test for discrimination). Besides having to compile a stock of reasonable resumes for a variety of positions and send off resumes for long enough to get a good sample, I have to make sure the employers can’t find out they’re fake. This will entail masking where they are faxed/emailed from, having multiple phone lines for them to respond to, etc. I have an idea of how this might be done, and it’s been done by economists before, but I’m not sure how they dealt with these issues. If the money comes through, I’ll just ask them.
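
    For what it’s worth, once the responses come in, the analysis itself is simple. Here is a sketch with entirely invented counts, testing the callback gap with a standard two-proportion z-test:

        # Hypothetical audit-study analysis -- every number below is made up.
        import math

        # (resumes sent, callbacks received) for each name group
        sent_a, called_a = 2400, 230     # names signaling one race
        sent_b, called_b = 2400, 160     # names signaling another

        p_a, p_b = called_a / sent_a, called_b / sent_b
        p_pool = (called_a + called_b) / (sent_a + sent_b)

        # two-proportion z-test: is the callback gap bigger than chance allows?
        se = math.sqrt(p_pool * (1 - p_pool) * (1 / sent_a + 1 / sent_b))
        z = (p_a - p_b) / se

        print(f"callback rates {p_a:.1%} vs {p_b:.1%}, z = {z:.2f}")
        # |z| > 1.96 rejects "no difference in callbacks" at the 5% level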

  7. Re: negative results
    I feel a force pulling us towards a discussion of falsifiability…head pressure increasing…disengage :)

    (I guess it would depend on how you set up the null hypothesis as well)

  8. You guys probably go over this every quarter, but how do you reproduce results from natural experiments? How do you run controls? Or do you just watch for similar experiments and slap the data around with statistics until it fits?

    50% seems like a low rate for successful predictions. I predict that your prediction has a 50% accuracy rate, with a 49% margin of error.

    Do you guys work with simulations much? It seems as though if we can simulate stellar formation, global weather patterns, nuclear explosions, and folding proteins, surely we can simulate a (relatively small) economy with a million or so agents.

  9. I think agent-based simulations would be cool… but the top journals don’t…

    This despite a recent Nobel Laureate who had a famous result based on agent simulation (albeit he did it in the ’60s and his agents were checkers on a checkerboard). A minimal sketch of that model is tacked on at the end of this comment.

    Jason, have you heard anything about Rabin’s course down in Berkeley?
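
    Here is that checkerboard idea in miniature, a Schelling-style segregation sketch. The grid size, tolerance, and number of rounds are arbitrary choices, not Schelling’s originals:

        # A minimal Schelling-style checkerboard: two kinds of agents move when
        # too few of their neighbors match them. All parameters are arbitrary.
        import random

        SIZE, SIMILAR_WANTED, ROUNDS = 20, 0.3, 50
        random.seed(1)

        cells = [+1, -1] * 180 + [0] * 40      # 360 agents, 40 empty squares (0)
        random.shuffle(cells)
        grid = [cells[i * SIZE:(i + 1) * SIZE] for i in range(SIZE)]

        def unhappy(r, c):
            """True if under SIMILAR_WANTED of this agent's neighbors match it."""
            me, same, total = grid[r][c], 0, 0
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    if dr == dc == 0:
                        continue
                    v = grid[(r + dr) % SIZE][(c + dc) % SIZE]  # wrap at edges
                    if v != 0:
                        total += 1
                        same += v == me
            return total > 0 and same / total < SIMILAR_WANTED

        for _ in range(ROUNDS):
            movers = [(r, c) for r in range(SIZE) for c in range(SIZE)
                      if grid[r][c] != 0 and unhappy(r, c)]
            empties = [(r, c) for r in range(SIZE) for c in range(SIZE)
                       if grid[r][c] == 0]
            random.shuffle(empties)
            for (r, c), (er, ec) in zip(movers, empties):
                grid[er][ec], grid[r][c] = grid[r][c], 0  # move to an empty square

        for row in grid:                       # crude map of the final layout
            print("".join(".Xo"[v] for v in row))

    Even though every agent is content in a mixed neighborhood (30% same-type neighbors is enough), the final map typically comes out visibly clustered, which was the point of the checkerboard result.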

  10. I haven’t heard anything about it. The syllabus is a doozy though.

    Regarding natural experiments, the idea is that some treatment randomly affects some individuals and not others, so you do have controls (a toy illustration follows below). Now, reproducing the results is where things get tricky, and this is part of the “art of economics.” You simply can’t reproduce such things. The art is in assembling a large enough body of evidence that consistently supports your hypothesis while refuting competing hypotheses. The body might consist of theory, other instances of natural experiments, similar tests with alternative data sets, true experiments, etc. A truly masterful work will do this. At the same time, many attempt to do this through their body of work so they can get published more often (and cite their own work in the meantime).
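
    To make the treatment/control logic concrete, here is a toy difference-in-differences calculation; every number is invented for illustration:

        # Toy difference-in-differences -- all numbers invented for illustration.
        # Some shock hits the "treated" group but not the "control" group.

        treated_before, treated_after = 10.0, 14.0   # mean outcome, treated group
        control_before, control_after = 10.5, 11.5   # mean outcome, control group

        # the control group's change estimates the trend that would have
        # happened anyway; subtracting it isolates the treatment effect,
        # assuming both groups would otherwise have moved in parallel
        trend = control_after - control_before
        effect = (treated_after - treated_before) - trend
        print(f"estimated treatment effect: {effect:+.1f}")  # (14-10) - 1.0 = +3.0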

  11. Jason says “art” because such experiments don’t falsify theory as in the physicaller sciences… We’re OK if 12 out of 15 studies show minimum wage reduces employment, unless of course the study’s author is named Card, and then 1 out of 200 is all you need. :-)

    *pop* crap, there goes Lindo’s head.

  12. It sounds like globalization and improved communication will make it harder and harder to find natural experiments in the future. Refutation will come in the form of critics shouting “BUT THE INTERNET!!” at you over and over.

    I guess the agent AI in a simulation would be the tricky part. If you’re trying to model real people, you can’t just have them all minimax every trade decision. You can’t just throw extra hardware at that part of the problem either. I wonder if studying MMOs (massively multiplayer games) has any value here.
