A lot of us have been eagerly following the vaccine rollout, looking forward to a time when we can travel and visit family and friends again. It is striking to see how differently this process is playing out across American states. A healthy 55-year-old in Connecticut can get a vaccine right now, while one living just across the border in Massachusetts cannot. Meanwhile, a 30-year-old working in a grocery store in Connecticut would be eligible to receive a vaccine in neighboring New York but can’t yet sign up for an appointment at home.
This is something that we’ve seen all year. Countries, states, and localities have taken very different approaches to regulating behavior during the pandemic. This has led to strange situations where neighboring jurisdictions (towns on either side of a state line, for example) have been subject to very different policies. A recent ProPublica piece by Alec MacGillis illustrates this well. Teenagers living in Hobbs, New Mexico have been mostly on lockdown, studying remotely and staying home, while those living just a few miles away across the border in Texas have had a largely normal school year with in-person education and football games with fans in attendance. As MacGillis writes, “Everything looks the same on either side of the Texas-New Mexico border…[but] the two sides of the state line might well have been in different hemispheres” since the crisis began.
This setup probably looks familiar to many Broadstreet readers. A lot of empirical HPE research has focused on spatial “discontinuities” or geographic boundaries as a way of addressing typical challenges with causal inference. Some prominent examples include Melissa Dell’s work on the long-run impact of the Mita in Peru and Stelios Michalopoulos and Elias Papaioannou’s series of articles on historical persistence in African political and economic development. This discussion paper by Felipe Valencia Caicedo has a great summary of recent work in economic history using regression discontinuity designs and similar strategies.
From this work, you can see why these sorts of designs can be a powerful tool. It is typically hard to estimate the effect of political, cultural, or institutional variables. These things are seldom randomly assigned and often depend on other messy cultural, historical, and geographic factors that are hard to observe and may themselves have important long-term consequences. If these messy, unobservable factors vary smoothly at the discrete political or institutional boundary, researchers can address this threat to inference by comparing units on either side of the border. These bordering areas should be the same in virtually every way except that they “happen” to be on opposite sides of a boundary.
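The logic in the paragraph above can be written compactly. The notation here is my own shorthand rather than anything taken from the papers cited: d_i is a unit's signed distance to the boundary (positive on the "treated" side), and Y_i(1) and Y_i(0) are its potential outcomes with and without the institution in question.

```latex
% Identifying assumption: average potential outcomes vary smoothly at the boundary.
\mathbb{E}[Y_i(0) \mid d_i = d] \ \text{and}\ \mathbb{E}[Y_i(1) \mid d_i = d]
\ \text{are continuous in } d \text{ at } d = 0.

% Under that assumption, the jump in observed outcomes at the boundary
% equals the average effect for units located at the boundary.
\tau
= \lim_{d \downarrow 0} \mathbb{E}[Y_i \mid d_i = d]
- \lim_{d \uparrow 0} \mathbb{E}[Y_i \mid d_i = d]
= \mathbb{E}[Y_i(1) - Y_i(0) \mid d_i = 0]
```

Everything rests on the first line. If anything else that matters for the outcome also jumps at the boundary, the difference in limits no longer isolates the effect of the institution.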
Though attractive, there are reasons to be very cautious about designs like these, especially in historical work. Truly exogenous and persistent spatial discontinuities are very rare. Most borders are not randomly assigned, but rather depend on preexisting historical or geographic divides or political wrangling. Moreover, these borders (or at least their enforcement) often shift and evolve over time in ways that are clearly not exogenous. Most historical “shocks” also generate spillovers across space—through migration, trade, or even political activity or cultural diffusion—greatly complicating empirical analysis.
This post discusses some of the big-picture questions that we should ask ourselves when writing, reading, or reviewing HPE work using spatial regression discontinuity (RDD) or similar research designs (e.g., matching methods across a boundary, designs based on exploiting distances to boundaries, etc.). There are a number of technical issues that are worth thinking about as well, such as spatial correlation, appropriate “bandwidth” selection, and assorted issues with misspecification. For these, it is worth reviewing the standard RDD reference guides by Imbens and Lemieux (2008) and Lee and Lemieux (2010), as well as the discussion paper by Valencia Caicedo referenced above. I also set aside some of the issues with substantive and theoretical interpretation of Local Average Treatment Effects (LATEs) that are raised in Sean Gailmard’s earlier Broadstreet post and the broader critique by Deaton (2010), though I return to these issues at the end. Finally, I want to emphasize that the point of this post is not to review or critique specific papers but rather to think broadly about some of the difficulties of applying these techniques in practice.
Why does this border or boundary matter?
This is a pretty obvious point, but one worth thinking about. Virtually every paper that employs this sort of design includes a narrative about why a given boundary or border should be significant, but this is not always accompanied by a thorough list of other boundaries or borders that might be significant as well as, or instead of, the boundary under consideration. It’s hard to think of these things in total abstraction, so I’ll draw on examples from a context that I know pretty well: colonial Mexico (New Spain). The colonial Spanish state was divided into several different political, legal, and religious jurisdictions. Some work on other parts of the Empire has exploited spatial boundaries between these sub-jurisdictions to analyze how distance to the colonial state influenced contemporary development.
It is tricky to do this with colonial jurisdictional boundaries in the area that is now Mexico for several reasons. One is that there were multiple jurisdictional boundaries operating at any given time (political, legal, religious, military, and economic), and these did not always map neatly onto one another. The figure above represents the civil and ecclesiastical divisions in central Oaxaca during the colonial period, from Gerhard’s comprehensive reference, A Guide to the Historical Geography of New Spain (map 13, p. 21). The map illustrates how messy these overlapping jurisdictional boundaries could be. A given civil jurisdiction could be split between several dioceses (themselves an integral part of the Spanish colonial government). Neither type of jurisdiction was always contiguous, with strange enclaves and exclaves carved out for different political and social reasons. These boundaries also changed over time, both as geographic borders were redrawn for different reasons and as the colonial bureaucracy evolved in different ways.
It is also worth considering the extent to which these political boundaries were actually enforced on the ground. The picture on the top of this post is John Thomson’s 1814 map of Spanish North America. The map traces the northern “border” of Spanish America, running a jagged and curved line through what is now Texas and Louisiana. The actual territorial boundary was far from clear. These frontier areas were not under solid control of any particular government or state. They were generally sparsely populated and occupied by a mix of indigenous peoples, traders, explorers, and settlers from different places. Ambiguity and conflict over the border persisted well into the 19th century, as those familiar with American, Mexican, and Texas history will know. Many “interior” boundaries were similarly contested and vague. When you read Gerhard’s descriptions of the 1786 administrative regions of New Spain, it is clear that there are discrepancies in historical sources over the exact boundaries of these districts and that there was considerable change in these boundaries across the colonial period.
Changing, vague, or overlapping borders can make it difficult to interpret what any LATE estimated around a single boundary actually means. Another issue is something that Ali Cirone brought up in her post on pre-analysis plans in HPE research: p-hacking and the file-drawer problem. If one wanted to examine the effect of colonial political institutions on post-colonial development in Mexico, for example, there are many “political” boundaries that could be chosen. Should we use 16th century political boundaries or those from the late 18th century? Should we look at civil or ecclesiastical jurisdictions? In some cases, there may be strong theoretical reasons to use one or the other. In others, we should ask how results may change using an alternative set of equally plausible borders (or perhaps why researchers chose to look at only one of several alternatives).
Why did these effects persist? How?
Many empirical papers using RDDs or similar strategies are interested in how historical institutions or events affected contemporary development. Even when there is a clear and well-defined boundary between political or cultural jurisdictions to be explored, and even when those boundaries are plausibly exogenous, examining causal effects over a long period of time requires even more careful thought. How did subsequent political, economic, or cultural events/institutions alter or change the impact of the shock? The idea that the impact of historical events can change over time is something that Vicky Fouka wrote about in an earlier post and something that Jennifer Alix-Garcia and I explore in a couple of papers.
As Vicky highlights in her post, historical dependence is not linear, and the consequences of historical events or institutions may subsequently change in unpredictable ways. More generally, it is worth asking what has enabled long-run differences between neighboring jurisdictions to persist, especially after those borders are erased. If one side of an obsolete border is persistently underdeveloped relative to the other, why don’t people migrate across the border, or why can’t firms take advantage of these differences by altering trade or investment patterns? Perhaps these adjustments are too difficult. Internal migration in the United States, for example, has declined over time despite large and persistent differences in wages and development between regions. Especially when looking at neighboring jurisdictions over a long period of time, however, the fact that households can migrate or make other behavioral adjustments should make us wonder why long-run differences in development persist.
As Adam Slez highlights in several of his Broadstreet posts, political jurisdictions also tend to change over time for reasons that are not random. This further complicates analysis. Subsequent historical shocks or institutions can be layered unevenly over the border or boundary under study. The figure above shows the overlap between the rough boundaries of 1786 colonial administrative regions (green), present-day states (red), and present-day municipios (grey) around the Colotlán region of northern Jalisco in central-west Mexico. This region has a distinctive history as a frontier zone during the first century of colonial rule and the subject of numerous subsequent jurisdictional disputes. For our purposes, what is interesting is how little agreement there is between the 1786 boundaries and subsequent political divisions. Importantly, as this book by María del Carmen Velázquez documents, these political divisions were redrawn several times because of prior historical conflicts. (See also Gerhard’s North Frontier of New Spain.)
From one perspective, the lack of correspondence between colonial and contemporary boundaries might be thought of as an advantage, allowing us to rule out that later political shocks could explain any observed discontinuities in outcomes at the defunct colonial boundaries. It is worth remembering, however, that these 1786 administrative regions were only one of several different territorial divisions during the colonial period, that these borders were often contested, and that settlements were often shifted across jurisdictions over time.
Where did this border come from? What else could be going on?
The discussion of shifting boundaries in the Mexican context brings up another challenge. Boundaries between political units or policy zones are seldom exogenously determined. In some cases, researchers may be able to make an argument that boundaries were drawn as if randomly, as in the infamous case of European powers haphazardly dividing up Africa in Michalopoulos and Papaioannou’s work. In most others, however, these boundaries are clearly not random but rather depend on major geographic features (e.g., mountains, rivers, or jungle) or political history (think of Adam’s example of the creation of counties in the Dakotas). Other observable and unobservable factors might differ on either side of a significant geographic boundary like a mountain range (e.g., agricultural productivity, culture, history). Similarly, when the boundaries were drawn as a result of explicit political deals and debates, we should ask why a given village or individual ended up on one side of the border or the other.
The challenge of showing that estimated differences around a boundary can be attributed to the “treatment” is common to all regression discontinuity and related designs. As the user’s guides to RDD estimation by Imbens and Lemieux (2008) and Lee and Lemieux (2010) illustrate, the choices of bandwidth around the discontinuity, whether and how to incorporate covariates, how to model the running or forcing variable (local linear? polynomial?), and other functional form issues are challenging even in the standard case when a single variable determines whether a unit ends up in the treatment or control group. This is one of the reasons why most RDD papers use large-scale administrative data. Researchers need enough data within a reasonable neighborhood of the discontinuity to be able to precisely estimate any effects.
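To make the bandwidth trade-off concrete, here is a minimal sketch of a local linear RD estimate in the standard single-running-variable case. Everything in it is an assumption for illustration: the data are simulated, the function name `local_linear_rd` is hypothetical, and the triangular kernel is just one common choice. It is not the procedure from any of the papers discussed here.

```python
import numpy as np

def local_linear_rd(distance, outcome, bandwidth):
    """Local linear RD estimate of the jump at distance = 0.

    distance: signed distance to the boundary (>= 0 is the treated side).
    Returns (estimated jump, number of units inside the bandwidth).
    """
    distance = np.asarray(distance, dtype=float)
    outcome = np.asarray(outcome, dtype=float)
    keep = np.abs(distance) < bandwidth
    d, y = distance[keep], outcome[keep]
    treated = (d >= 0).astype(float)
    # Triangular kernel: weights fall linearly to zero at the bandwidth edge.
    w = 1.0 - np.abs(d) / bandwidth
    # Separate intercept and slope on each side of the boundary.
    X = np.column_stack([np.ones_like(d), treated, d, treated * d])
    sw = np.sqrt(w)
    # Weighted least squares via rescaling rows by the square root of the weights.
    beta, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    return beta[1], int(keep.sum())

# Simulated data (illustrative only): the true jump at the boundary is 2.0.
rng = np.random.default_rng(0)
dist = rng.uniform(-50, 50, size=2000)
y = 1.0 + 0.03 * dist + 2.0 * (dist >= 0) + rng.normal(0, 1, size=2000)

for h in (5, 10, 25, 50):
    est, n = local_linear_rd(dist, y, h)
    print(f"bandwidth={h:>2}  n={n:>4}  estimated jump={est:.2f}")
```

Narrow bandwidths keep the comparison local but leave few observations, so the estimate is noisy. Wide bandwidths use more data but lean harder on the functional form away from the boundary.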
As Dell (2010) points out in her paper, it is even more challenging when the discontinuity is geographic because institutional boundaries vary along more than one dimension. This requires thinking even more carefully about how to flexibly control for directional trends across space in multiple directions. Dell uses several different approaches: a cubic polynomial in latitude and longitude, a single-dimension cubic polynomial in distance to Potosí (perhaps the largest city in the Western Hemisphere, for a time), and a cubic polynomial in distance to the mita boundary. The first approach especially puts a lot of pressure on the data. Some of Dell’s datasets simply aren’t large or dense enough to give her the statistical power to precisely estimate the more complex multidimensional polynomial (see discussion on p. 1876 and subsequent tables). The single-dimensional approaches, however, don’t exactly control for local directional trends or gradients in development due to other factors. In his working paper on historical persistence, Morgan Kelly finds that introducing a control for latitude greatly reduces the estimated effects of mita boundaries in Dell’s paper (see Figures 2 and 3).
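One way to see why the two-dimensional polynomial is so demanding is simply to count the control terms it adds. The sketch below is my own illustration, not Dell's code. The helper `polynomial_terms` is a hypothetical name, and the coordinates are simulated.

```python
import numpy as np
from itertools import combinations_with_replacement

def polynomial_terms(coords, degree=3):
    """All monomials of total degree 1..degree in the columns of coords."""
    coords = np.asarray(coords, dtype=float)
    if coords.ndim == 1:
        coords = coords[:, None]
    n_vars = coords.shape[1]
    columns = []
    for deg in range(1, degree + 1):
        # Each combination picks which variables are multiplied together,
        # e.g. (0, 0, 1) is x * x * y.
        for combo in combinations_with_replacement(range(n_vars), deg):
            columns.append(np.prod(coords[:, list(combo)], axis=1))
    return np.column_stack(columns)

# Simulated locations (illustrative only).
rng = np.random.default_rng(1)
lat_lon = rng.uniform(0, 1, size=(200, 2))
dist_to_boundary = rng.uniform(0, 1, size=200)

print(polynomial_terms(lat_lon).shape[1])           # cubic in latitude and longitude
print(polynomial_terms(dist_to_boundary).shape[1])  # cubic in a single distance
```

A cubic in two coordinates adds nine control terms (x, y, x², xy, y², x³, x²y, xy², y³), while a cubic in a single distance adds only three. With a few hundred observations clustered near a boundary, those extra terms can absorb much of the variation that identifies the effect.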
This discussion highlights why historical datasets (which are often small, have a sparse distribution of observations across space, and are measured with error) may be poorly suited to regression discontinuity or similar research designs. There may not be enough units within a reasonable bandwidth of the border to say anything precisely about causal effects. A related challenge in these designs, also highlighted in Kelly’s working paper, is accounting for spatial correlation between observations when estimating standard errors. This is an especially important consideration when tracing nearby units over a long period of time as is often done in historical persistence papers. There are several potential ways to address the problem of spatial dependence (the standard one is based on the procedure in Conley (1999) with some choice of spatial kernel; see Kelly’s paper for a discussion), but any of these will put further pressure on a limited historical dataset. Kelly finds that estimated standard errors in the articles that he examines generally increase significantly after modeling spatial dependence. He further finds that “credible identification strategies tend to perform no better than naïve regressions” in his robustness tests (p. 1).
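For readers who have not seen a spatial standard error correction written out, here is a rough sketch in the spirit of Conley (1999). It assumes planar coordinates and uses a product of Bartlett (linearly declining) kernels in each coordinate. The function name, the cutoff, and the simulated data are all my own illustrative choices, and this is not the implementation used by Kelly or by any of the papers above.

```python
import numpy as np

def ols_with_conley_se(X, y, coords, cutoff):
    """OLS with heteroskedasticity-robust (HC0) and Conley-style standard errors.

    coords: (n, 2) planar coordinates in the same units as cutoff.
    Pairs of units farther apart than cutoff in either coordinate get zero weight.
    Returns (coefficients, HC0 standard errors, spatial standard errors).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    coords = np.asarray(coords, dtype=float)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    # Product of one-dimensional Bartlett kernels in each coordinate.
    dx = np.abs(coords[:, None, 0] - coords[None, :, 0])
    dy = np.abs(coords[:, None, 1] - coords[None, :, 1])
    kernel = np.clip(1 - dx / cutoff, 0, None) * np.clip(1 - dy / cutoff, 0, None)
    scores = X * resid[:, None]
    bread = np.linalg.inv(X.T @ X)
    # HC0 uses only each unit's own score; the spatial version adds
    # kernel-weighted cross-products between nearby units.
    v_hc0 = bread @ (scores.T @ scores) @ bread
    v_spatial = bread @ (scores.T @ kernel @ scores) @ bread
    return beta, np.sqrt(np.diag(v_hc0)), np.sqrt(np.diag(v_spatial))

# Simulated data (illustrative only): regressor and error both vary smoothly in space.
rng = np.random.default_rng(2)
n = 300
coords = rng.uniform(0, 100, size=(n, 2))
x = np.sin(coords[:, 0] / 20) + np.cos(coords[:, 1] / 20) + rng.normal(0, 0.5, n)
e = np.sin(coords[:, 0] / 25 + 1) + np.cos(coords[:, 1] / 30) + rng.normal(0, 0.5, n)
y = 1.0 + 0.5 * x + e
X = np.column_stack([np.ones(n), x])

beta, se_hc0, se_spatial = ols_with_conley_se(X, y, coords, cutoff=20.0)
print("HC0 standard errors:    ", se_hc0)
print("spatial standard errors:", se_spatial)
```

The choice of cutoff is itself a judgment call. With a very small cutoff the spatial standard errors collapse to the usual robust ones, and larger cutoffs allow dependence between more distant units at the cost of estimating many more cross-products from the same limited data.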
There is a reason why so many researchers have relied on regression discontinuity designs and similar methods for causal inference. It is very difficult to identify the long- or short-run impact of historical political, cultural, or institutional factors. By focusing on units just on either side of a plausibly exogenous border, researchers can avoid many of the standard empirical challenges that make it difficult to determine why outcomes differ across units.
The problem is that it is really hard to apply these methods in practice in HPE research. Many of the boundaries that we might hope to exploit are endogenously determined, poorly defined, and/or evolving over time. Even if conditions are near ideal (there is a clear and exogenous spatial discontinuity to be exploited and an obvious theoretical reason why differences across this boundary should persist over time), flexibly addressing spatial correlation and controlling for directional trends in a compelling way requires a lot of statistical power that can be difficult to find in historical datasets. A broader point, raised in Sean Gailmard’s post, is what can be or ought to be learned from even a perfectly executed regression discontinuity design. Political and institutional boundaries are multifaceted, and it is not clear how any locally estimated causal effects will generalize across even similar cases.
These problems are not unique to historical work. Returning to the example of neighboring towns across the Texas-New Mexico border during the pandemic, this might look at first glance like a nice opportunity to “test” whether more restrictive COVID policies were effective at curbing the spread of disease. There is no obvious discontinuous jump in demographic, economic, or cultural factors at this boundary. The only difference is that policies varied sharply at the boundary. Teenagers in Texas were largely in school and playing sports while those in New Mexico just a few miles away were studying from home and subject to strict lockdown policies. Despite this stark divide, the counties on either side of the state line ended up with roughly similar per capita COVID cases and deaths. Does this mean that the restrictive approach was ineffective? Looking more closely, it becomes clear why it is hard to learn a lot by comparing outcomes across these towns. The COVID policies adopted on one side of the border clearly had impacts on the other side. People could travel freely across the border and easily observe what was going on in the neighboring town, which undoubtedly shaped their private behavior. The area around the Texas-New Mexico border also has several distinctive characteristics (rurality, for instance) that make it hard to generalize to the country as a whole.
While there has been a trend toward using regression discontinuity and similar designs in economic history, there are a lot of implementation issues that need to be considered when writing or reading this work. In addition to some of the more technical considerations, it is helpful to step back and think about big-picture substantive issues. Does it make sense to think of this boundary as a spatial “discontinuity”? Does the mechanism through which effects are thought to persist make sense? What other borders might matter, and what else could be going on? How robust are the results to other estimation strategies or when accounting for spatial dependence?