A Good Instrument Is Hard to Find
No matter what academic discipline you’re in, estimating causal effects is hard. And one key characteristic of HPE research is that the data is observational, and so we as researchers can’t control the treatment assignment. We weren’t there! That time has passed!
So today, I’m going to talk about instruments. For those who do causal inference in HPE, instrumental variable (IV) analysis is one potential identification strategy (see here or here for a crash course). TL;DR: if our independent variable of interest (our supposed treatment) is endogenous, then we try to find a third variable that is “as-if random” and that helps determine selection into the treatment condition. This third variable is an instrument. When done successfully, these often make for really cool papers.
Sounds great, except IV is easily maligned. Partially because we have to pin our identification claims on an ‘exclusion restriction’ for which there is no statistical test (and so lots of room for fudging), and also because naive (confused?) scholars think IV analysis is easily done (it’s not). It’s also sometimes complicated to find historical instruments.
But fear not! Forewarned is forearmed….
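To make the TL;DR above concrete, here is a minimal simulated sketch (not taken from any of the papers discussed in this post). All variable names, coefficients, and the sample size are assumptions chosen purely for illustration: an unobserved confounder u drives both the treatment d and the outcome y, while the instrument z affects y only through d.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
true_effect = 2.0  # assumed causal effect of the treatment on the outcome

u = rng.normal(size=n)                               # unobserved confounder
z = rng.normal(size=n)                               # instrument: as-if random, unrelated to u
d = 0.8 * z + u + rng.normal(size=n)                 # endogenous treatment
y = true_effect * d + 1.5 * u + rng.normal(size=n)   # z enters y only through d


def slope(x, y):
    """OLS slope from a regression of y on x (with an intercept)."""
    c = np.cov(x, y)
    return c[0, 1] / c[0, 0]


ols_estimate = slope(d, y)                 # contaminated by the confounder u
first_stage = slope(z, d)                  # instrument -> treatment
reduced_form = slope(z, y)                 # instrument -> outcome
iv_estimate = reduced_form / first_stage   # Wald ratio (just-identified 2SLS)

print(f"OLS: {ols_estimate:.3f}")
print(f"IV:  {iv_estimate:.3f}")
```

Under these assumed parameters the naive OLS slope overshoots the true effect because d and u move together, while the ratio of the reduced form to the first stage lands close to 2. The whole result hinges on z having no path to y other than through d, which is exactly the assumption discussed next.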
Exclusion Restriction for HPE
Let’s start with the hard stuff.
There are a number of identifying assumptions in this research design, but the most salient is that the instrument affects the dependent variable only via its relationship with the independent variable. This is the infamous “exclusion restriction,” and this is where IV papers sink or swim. I won’t reinvent the wheel, except to note that historical research poses a few additional challenges on this dimension.
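A quick way to see why papers sink or swim here is to simulate a violation. The sketch below is illustrative only: the linear data-generating process and the parameters beta, pi, and gamma are my assumptions. It lets the instrument z affect the outcome directly with strength gamma, in which case the simple IV ratio drifts toward beta + gamma / pi rather than beta.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000
beta = 1.0   # assumed true effect of treatment d on outcome y
pi = 0.5     # assumed first-stage effect of instrument z on d


def iv_estimate(gamma):
    """Wald IV estimate when z also affects y directly with strength gamma."""
    u = rng.normal(size=n)                              # unobserved confounder
    z = rng.normal(size=n)                              # instrument
    d = pi * z + u + rng.normal(size=n)                 # treatment
    y = beta * d + gamma * z + u + rng.normal(size=n)   # gamma != 0 breaks exclusion
    return np.cov(z, y)[0, 1] / np.cov(z, d)[0, 1]


valid = iv_estimate(gamma=0.0)      # exclusion restriction holds
violated = iv_estimate(gamma=0.2)   # small direct effect of z on y

print(f"exclusion holds:    {valid:.3f}")
print(f"exclusion violated: {violated:.3f}")
```

Note that the bias is the direct effect divided by the first stage, so even a modest violation is magnified when the instrument only weakly moves the treatment. Nothing in the data flags the problem, which is why the defense has to come from case knowledge.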
One thing to remember is that the exclusion restriction is sometimes harder to justify over a long period of time. If the potential instrument is associated with exposure at multiple time points, then there are multiple paths to the outcome (and potentially more exclusion restriction violations). Or if a lagged value of a covariate is used as an instrument, and that covariate affects outcomes that are durable and/or persistent, then that again presents a problem for identification. Those studying historical immigration, ethnolinguistic fractionalization, or religion in particular should check out the “Broken Instruments” working paper by Gallen and Raymond.
Generally, it’s also hard to map out confounders and causal pathways over decades, and we often underestimate the downstream effects interventions can have.
Readers should be aware of “post-instrument bias”, and read the working paper by Glynn and Rueda. They note that researchers often include post-instrument covariates to help justify the exclusion restriction, but argue this can actually undo all the benefits of a natural experiment.
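Here is a sketch of one way this kind of bias can arise. It is my own toy setup, not the analysis in the Glynn and Rueda paper. I assume a covariate m that is measured after the instrument and is caused both by the instrument z and by the unobserved confounder u. The exclusion restriction holds by construction, yet controlling for m makes the leftover variation in z correlated with u.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200_000
beta = 2.0  # assumed true effect of treatment d on outcome y

u = rng.normal(size=n)                  # unobserved confounder
z = rng.normal(size=n)                  # instrument, as-if random
d = z + u + rng.normal(size=n)          # treatment
y = beta * d + u + rng.normal(size=n)   # outcome; z affects y only through d
m = z + u + rng.normal(size=n)          # post-instrument covariate (hypothetical)


def residualize(v, control):
    """Remove the linear projection (with intercept) of v on control."""
    c = np.cov(control, v)
    return (v - v.mean()) - (c[0, 1] / c[0, 0]) * (control - control.mean())


def wald(z, d, y):
    """Just-identified IV estimate: reduced form over first stage."""
    return np.cov(z, y)[0, 1] / np.cov(z, d)[0, 1]


iv_plain = wald(z, d, y)
# Partialling m out of every variable is equivalent to 2SLS with m as a control.
iv_with_m = wald(residualize(z, m), residualize(d, m), residualize(y, m))

print(f"IV without the post-instrument control: {iv_plain:.3f}")
print(f"IV controlling for m:                   {iv_with_m:.3f}")
```

In this assumed setup the uncontrolled estimate sits near the true value of 2, while adding m pulls the estimate down to roughly 1. The control that was meant to shore up the design is what breaks it.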
Any exclusion restriction must be justified with case-specific knowledge, and the best defenses usually involve a combination of historical primary source evidence, creative descriptive data, and citations from other fields (hello, historians!). Luckily, this type of research is typically where HPE scholars shine. (Though missing data and lack of historical records might also make defending the exclusion restriction more difficult.)
Finally, it’s worth remembering that for popular instruments, the very fact that they have been used to predict many different things indicates there could be violations of the exclusion restriction (though this might be less of an issue for historical instruments, since they are harder to find). Rainfall, for example, is a plausibly exogenous instrument that is also very popular. This problem is demonstrated in Jon Mellon’s brilliantly titled paper “Rain, Rain, Go Away…”. He reviews 185 social science studies, finding 137 distinct variables linked to weather (and this is a conservative estimate). Mellon also gives the reader steps for systematically reviewing the existing literature on a popular instrument to find potential exclusion restriction violations.
How does one find a good instrument for historical research?
The best advice is to be very, very familiar with your case in question (and read Thad Dunning’s textbook on natural experiments, so your subconscious knows what to look for). The defense of the exclusion restriction will require detailed knowledge anyway, but sometimes the best instruments are found in the process of doing background research. That’s how my coauthor and I found a lottery-based procedure which we used as an instrument to estimate the causal effect of committee service on careers — because I was reading a 19th century French book in the archives, and stumbled across it.
Another great piece of advice comes directly from Scott Cunningham’s Mixtape. He writes “A necessary but not a sufficient condition for having an instrument that can satisfy the exclusion restriction is if people are confused when you tell them about the instrument’s relationship to the outcome. . . Instruments are jarring… because these two things (Zi and Yi ) don’t seem to go together. If they did, it would likely mean that the exclusion restriction was violated. But if they don’t, then the person is confused, and that is at minimum a possible candidate for a good instrument.”
Or, you can read a bunch of historical + IV papers and get inspiration! I’ve listed some favorites here:
Nunn (2008): Looks at the impact of the slave trade on Africa’s economic development; uses distance from major slave ports as an instrument for the intensity of the slave trade
Dube and Harish (2020): Looks at whether queens were more likely to go to war in 15th-20th century Europe; uses gender of the first born and presence of a female sibling among previous monarchs as instruments for queenly rule
Acharya, Blackwell, and Sen (2018): Looks at how slavery in 1860 correlates with present-day political attitudes and party affiliation; uses cotton suitability as an instrument for slavery prevalence
Biavaschi, Giulietti, and Siddique (2017): Looks at how migrants “Americanized” their names to improve their career prospects; uses index of linguistic complexity based on Scrabble points as an instrumental variable that predicts name Americanization
Cirone and van Coppenolle (2018): Looks at how budget committee service affects long term political careers; uses lottery-based procedure as an instrument for committee selection
Kern and Hainmueller (2009): Looks at whether exposure to West German TV made East German citizens less supportive of the communist regime; uses district-level access to West German television as an instrument
Gihleb and Giuntella (2017): Looks at the effect of Catholic school attendance on student outcomes; uses abrupt decline in female vocations (from reforms made at the Second Vatican Council) as an instrument for Catholic schooling
So You Want to Use An Instrument
IV is a perfectly plausible identification strategy for historical work, but here are some do’s and don’ts to remember:
DO put a separate section in your paper discussing identification assumptions. While you can’t ‘prove’ the exclusion restriction, you can provide descriptive data or historical justifications (cite other fields!) that support your interpretation of the causal model. The Glynn and Rueda paper mentioned above collected data on IV papers in the top three political science journals (APSR, AJPS, and JOP) and found that out of 155 papers using IV, only 116 explicitly discussed the exclusion restriction. What were the others doing?? Who knows.
DON’T add an IV lightly. This is not an easy identification strategy to defend, and should not be treated like a robustness check. A lonely paragraph and a regression table in the back won’t get you past referees at top journals (and is more likely to signal to reviewers that you don’t know the method).
DO consider alternative estimation strategies, in the same paper. Discussing or including ‘naive’ specifications like OLS can be important to understanding the bias in IV and potential exclusion restriction violations; there also can be fruitful discussion of the difference between the models.
DON’T use it because you want to be able to use the word ‘causal’ in your abstract. There are other identification strategies out there for observational data — difference-in-differences, regression discontinuity designs, matching, synthetic control — and a poorly done IV is not causal.
DO learn to love Directed Acyclic Graphs (DAGs) — these can help you anticipate exclusion restriction violations, and better help you understand your own research (check out the Mixtape to learn more).
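As a small illustration of the DAG advice in the list above, a causal graph can be written down as a plain dictionary and searched for paths that would violate the exclusion restriction. This is only a sketch: the graph, the node names (Z for the instrument, D for the treatment, Y for the outcome, U for an unobserved confounder, M for a side channel), and both helper functions are hypothetical, and the check covers directed paths only.

```python
# Hypothetical DAG, written as parent -> list of children.
dag = {
    "Z": ["D", "M"],   # instrument affects the treatment and a side channel M
    "U": ["D", "Y"],   # unobserved confounder of treatment and outcome
    "D": ["Y"],        # treatment affects the outcome
    "M": ["Y"],        # side channel also affects the outcome
    "Y": [],
}


def directed_paths(graph, start, end, path=None):
    """Return every directed path from start to end as a list of nodes."""
    path = (path or []) + [start]
    if start == end:
        return [path]
    found = []
    for child in graph.get(start, []):
        found.extend(directed_paths(graph, child, end, path))
    return found


def exclusion_violations(graph, instrument, treatment, outcome):
    """Directed paths from instrument to outcome that bypass the treatment."""
    return [p for p in directed_paths(graph, instrument, outcome)
            if treatment not in p]


violations = exclusion_violations(dag, "Z", "D", "Y")
print(violations)  # [['Z', 'M', 'Y']]
```

The code can only find violations that you were willing to draw in the first place, so the real work is still the historical knowledge that goes into the graph. It also says nothing about whether the instrument itself is confounded, which is a separate assumption.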
Finally, an evergreen PSA for graduate students reading this post: when a faculty member retweets some plausibly exogenous and/or unexpected event in the world with the words “instrumental variable,” there’s an 80% chance they are being sarcastic. You’ve been warned.
 We’re looking for suggestions or volunteers to write a blog post on this issue; if you or someone you know are interested, let us know!