Non-experimental evidence

Angus Deaton recently said that all the attention that natural (and actual) experiments are getting is overblown. He claims experimental data has no special status in a hierarchy of evidence. I agree to the extent that I don’t think we should favor one form of evidence to the exclusion of other types ((A tenured member of the Cult of Identification once told me that she wouldn’t write a paper about a topic unless there was a clear source of exogenous variation. She proudly told me that she hadn’t used an instrumental variable in years.)). Evidence is evidence.

A readily available form of evidence about the relationship between native employment opportunities and immigration is cross-section data ((In applied micro seminars, you often hear Cult members hiss something to the effect of, “But those estimates are from cross-sectional data”, with grimaces around the table at the mention of the taint.)). These data describe various geographical regions or worker skill groups. For each region or skill group, the analyst assigns an average wage (or other employment outcome) and the percentage of the group that is immigrants. Then the analyst checks whether there’s a correlation across the groups between wages and the share of immigrants.

As you can imagine, there’s a lot for the interested analyst to play with. Every country has its own data sources. You can change the definition of the skill groups. You can look at larger geographic regions like states or smaller ones like cities. And, as always, you can choose from the palette of statistical techniques to calculate your estimated correlation and effect size. Longhi, Nijkamp and Poot did a meta-analysis of 18 papers that together reported 348 estimates of this correlation.

As a quick demonstration of what these papers look like, I’ve downloaded some Census 2000 data from IPUMS USA. For each state, I calculated the percent of workers that are foreign born and the average wage for native workers. Here’s the plot:
I’ve drawn the regression line. Surprisingly, the line has an upward slope, suggesting a positive correlation. The slope of the line is about 1.5.
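The calculation behind that plot can be sketched in a few lines. The numbers below are invented stand-ins for the IPUMS extract (I haven’t reproduced the actual state shares and wages here), so only the mechanics carry over:

```python
import numpy as np

# Hypothetical state-level data standing in for the Census 2000 extract:
# one entry per state, with the foreign-born share of workers and the
# average native wage. The values are invented for illustration; the
# 1.5 slope in the text comes from the real IPUMS data.
rng = np.random.default_rng(0)
n_states = 51  # 50 states plus DC
immigrant_share = rng.uniform(0.02, 0.30, n_states)
native_wage = 10 + 1.5 * immigrant_share + rng.normal(0, 0.1, n_states)

# OLS regression line of native wages on the immigrant share;
# polyfit with degree 1 returns (slope, intercept).
slope, intercept = np.polyfit(immigrant_share, native_wage, 1)
print(f"estimated slope: {slope:.2f}")
```

With real data you’d compute the state averages from the microdata first, but the regression step is exactly this.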

One thing that’s wrong with this plot, besides the fact that I haven’t controlled for a bunch of obvious things, is that this simple correlation conflates the impact of immigration on native wages with the shared economic incentives of natives and immigrants to move to states that have positive wage growth. Both immigrants and natives will want to move to states that have good wage prospects; they select themselves, to use the jargon. We really only care about the first thing, the impact of immigrants on natives, and so we’d like to wash this correlation to get the stain of “selection” out.

A neat regularity among immigrants is that they tend to move to regions that previous immigrants have already made home. We’ll leave it to the sociologists to tell us why this might be the case and for the moment just exploit this fact for our statistical purposes. We can predict the percentage of immigrants in a state in the year 2000 by looking at the percentage of immigrants in that state forty years before. Here’s a plot:

The red line is the regression line and the black line is the 45-degree line. As you can see, the percentage of immigrants has uniformly increased over those 40 years, but the red line is positively sloped and the dots cluster pretty tightly around it. The immigrant ratios in 2000 are predicted pretty well by the ratios in 1960(!).
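That persistence can be checked with a one-line regression. Here’s a sketch with invented shares (the real ones come from the 1960 and 2000 Censuses), built so that the 2000 shares track the 1960 shares the way the plot shows:

```python
import numpy as np

# Invented immigrant shares by state for 1960; the 2000 shares are
# constructed to be persistently related to them, mimicking the
# clustering pattern in the real data.
rng = np.random.default_rng(1)
n_states = 51
share_1960 = rng.uniform(0.01, 0.15, n_states)
share_2000 = 0.03 + 1.6 * share_1960 + rng.normal(0, 0.01, n_states)

# First-stage regression: predict the 2000 share from the 1960 share.
slope, intercept = np.polyfit(share_1960, share_2000, 1)
predicted_2000 = intercept + slope * share_1960
corr = np.corrcoef(share_1960, share_2000)[0, 1]
print(f"first-stage slope: {slope:.2f}, correlation: {corr:.2f}")
```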

So what? Well, suppose the percentage of immigrants in a state does not have an impact on the relative wage prospects in that state 40 years later. The prediction of the year 2000 immigrant ratios using the red line, then, should be unrelated to the wage prospects for immigrants (and natives) in that year. This prediction is just the detergent we needed to get rid of the stain of selection. Basically, we’re taking out the variation in immigrant ratios that is due to selection and keeping only the variation due to immigrant clustering. Here’s a plot of native wages versus predicted year 2000 immigrant ratios:
The slope of the regression line is 1.8. That this slope is close to the one I estimated without correcting for selection suggests that selection isn’t that big of a deal.
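The two-step logic can be illustrated with a small simulation. Everything below is made up: a “demand shock” pulls both wages and immigrants into a region (the selection problem), while historical clustering moves shares independently of it. Regressing wages on the shares predicted from the 1960 shares recovers something close to the true effect, while naive OLS is contaminated. (I use a few hundred simulated regions rather than 51 states just to keep the sampling noise down.)

```python
import numpy as np

rng = np.random.default_rng(2)
n = 500  # simulated regions, to reduce sampling noise
true_effect = 1.5  # chosen by fiat for the simulation

# Historical (1960) shares, plus a demand shock that both attracts
# immigrants and raises wages -- the source of selection bias.
share_1960 = rng.uniform(0.01, 0.15, n)
demand_shock = rng.normal(0, 1, n)
share_2000 = 0.03 + 1.6 * share_1960 + 0.05 * demand_shock + rng.normal(0, 0.005, n)
native_wage = 10 + true_effect * share_2000 + 0.3 * demand_shock + rng.normal(0, 0.1, n)

# Step 1: predict the 2000 share from the 1960 share alone, which
# strips out the demand-shock (selection) component.
b1, a1 = np.polyfit(share_1960, share_2000, 1)
predicted_share = a1 + b1 * share_1960

# Step 2: regress wages on the predicted share.
iv_slope, _ = np.polyfit(predicted_share, native_wage, 1)

# Naive OLS on the actual share, for comparison: biased upward
# because the demand shock moves both the share and wages.
ols_slope, _ = np.polyfit(share_2000, native_wage, 1)
print(f"two-step slope: {iv_slope:.2f}, naive OLS slope: {ols_slope:.2f}")
```

In the simulation the two-step estimate lands near the true effect while the naive slope is pushed well above it; the real-data gap between 1.8 and 1.5 being small is what suggests selection wasn’t doing much work here.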

While its magnitude is a bit large, which makes me think I did something wrong, the sign of the slope I’ve estimated isn’t surprising. Longhi, Nijkamp and Poot found that almost as many estimates of the effect of immigration on native wages are positive as negative. Here’s their figure 1, which shows the distribution of estimates across analyses:
The estimates seem to cluster around zero. My estimates are 1.5 standard deviations away from the mean; not too bad for a quick and dirty analysis!

So even the non-experimental evidence suggests immigrants have little impact on native wages.

8 thoughts on “Non-experimental evidence”

  1. I’m not sure I agree about evidence from experiments and cross-section data being on the same level. It all comes down to the problem of causal inference. My memory of the philosophy and statistics here is vague, but I seem to recall that it’s a shorter inferential leap to causality given a well designed experiment.

    In any case, my operating hypothesis would be that the weight of evidence should be measured based on its contribution to causal inference. Certainly, a poorly designed experiment could have less inferential weight than a well performed cross-sectional analysis. But my guess is that, on average, experiments do better at establishing causality.

  2. On the topic of identification, the latest issue of J. Econ Perspectives has some amusing counterpoint between Angrist & Pischke and various critics.

  3. Kevin, I didn’t say they were on the same level. I just pointed out that we shouldn’t ignore evidence just because it’s harder to apply the “experimental” label to it.

    There is a Cult of Identification that doesn’t just ignore evidence but whole topics of study because there’s no identifiable natural experiment.

  4. Ahh, I misunderstood. As a card carrying Bayesian, I agree we shouldn’t ignore any evidence. So I guess my disagreement is with Angus about the position of experimental evidence in the evidence hierarchy. My prior is that experimental evidence is more likely to provide better causal inference.

  5. My recent on-going dabbling in instrumental variable analysis in medical science is what brought me to Austin Frakt’s blog (nice work!) which led me to this blog. Given my background in epidemiology, I find the above conversation re. study designs particularly interesting. I happen to agree that (any) evidence is evidence, not so much from a ‘let’s start basing treatment decisions on these’ point-of-view but more from a ‘let’s see how these stack up against other studies, given the inherent limitations associated with each study design’ one. In health service research, experimental evidence (talking clinical trials) is the gold standard when it comes to causal inference but, for the most part, has limited generalizability to populations outside of these studies. Nevertheless, the enthusiasm for drawing causal inference from non-randomized, observational studies is high, and Kevin’s remark serves as a good reminder.

  6. I’m happy to hear an epidemiologist looking at IV stuff and worrying about identification in general. My significant other is an “on the bench” medical researcher and in her PhD program had to take a few epidemiology classes… from what I could tell, they didn’t cover IVs in any of her classes.

  7. Confession time :) ….. I was introduced to this concept not during my general epi years but when I transitioned to health outcomes research and worked with a mentor who directs/leads cost-effectiveness projects. I find it interesting that I didn’t hear about this when training in a field whose core tenets revolve around bias and confounding. It calls for more interaction between these various disciplines.
