What are we doing in Historical Political Economy?

The Historical Political Economy (HPE) field has attracted increasing scholarly interest in recent years, with influential papers in top disciplinary journals and a new journal devoted exclusively to it launched in 2021. One important aspect of HPE’s development is that it has occurred alongside the “credibility revolution” in empirical social science. As a result, much empirical research in HPE is inflected by the approach and priorities of credibly identified social science.

In this post I consider the effect that this has on the type of knowledge that the HPE field can generate. My overall point is that while credible identification has obvious and important benefits for empirical HPE, making it the centerpiece of any research program in HPE is going to distort the kinds of historical and theoretical understanding we create. In particular, centering the demands of causal identification will slant case selection without any corresponding benefits for generalized historical knowledge.

Most of the post is abridged from a paper I wrote, “Theory, History, and Political Economy,” that is forthcoming in the Journal of Historical Political Economy. The paper deals more broadly with the primacy of theory in HPE research.

Theory and Credibility in HPE

For many empirical scholars in HPE, a primary objective will be credible identification of a treatment effect. The crux of this is a credible exclusion restriction in empirical analysis, such as randomization or discontinuities in assigning factors to cases, which identify a causal effect of those factors in that case. This translates into an operational goal of finding cases that satisfy the stringent demands of identification.

Credible research designs are particularly valuable in causal explanation because they restrict the set of reasonable mechanisms more than most research designs. As Scott Ashworth, Chris Berry, and Ethan Bueno de Mesquita put it in their forthcoming book Theory and Credibility, credible designs do this by creating a tighter connection between the all-else-equal requirement of causal theories and the empirical findings.

Thus “positive” findings of a treatment effect tell us that mechanisms of reverse causation and confounding are not good candidates to explain some given data; “negative” findings tell us that a mechanism of causation is not an especially compelling explanation. By contrast, when a correlation does not credibly identify any treatment effect, mechanisms of causation, reverse causation, and confounding are all potentially reasonable candidates to explain the observed data. Thus, more credible designs are more informative about the operative mechanism than less credible designs.
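The contrast between a credible and a non-credible design can be made concrete with a small simulation. This is only an illustrative sketch: the data-generating process, the confounder `u`, and the function name `simulate` are all assumptions made for the example, not anything drawn from a particular HPE study.

```python
import random
import statistics

def simulate(n, randomized, true_effect=1.0, seed=0):
    """Difference in mean outcomes between treated and untreated units.

    Hypothetical data-generating process: an unobserved confounder u
    raises the outcome y and, unless treatment is randomized, also
    makes treatment more likely.
    """
    rng = random.Random(seed)
    treated, untreated = [], []
    for _ in range(n):
        u = rng.gauss(0, 1)  # unobserved confounder
        if randomized:
            x = rng.random() < 0.5  # assignment ignores u
        else:
            x = u + rng.gauss(0, 1) > 0  # assignment depends on u
        y = true_effect * x + 2.0 * u + rng.gauss(0, 1)
        (treated if x else untreated).append(y)
    return statistics.mean(treated) - statistics.mean(untreated)

# Same causal effect in both runs; only the assignment process differs.
naive = simulate(20000, randomized=False)    # mixes causation and confounding
credible = simulate(20000, randomized=True)  # should sit close to true_effect
print(f"confounded comparison: {naive:.2f}")
print(f"randomized comparison: {credible:.2f}")
```

In the confounded run, the raw comparison cannot separate the effect of treatment from the effect of `u`, so causation and confounding both remain live explanations. In the randomized run, confounding is ruled out by design, which is the sense in which the credible design is more informative about the mechanism.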

Credibility and Causal Generalization

Social scientists usually want more than a candidate explanation for specific cases. We want causal generalization to a broad range of cases. (The issue in causal generalization is the same one we face in external validity, but the phrase “causal generalization” seems a better fit for what I mean, so I use it.)

The issue in causal generalization is whether or when we can transfer knowledge of a causal mechanism empirically identified in one case to another case without executing the same empirical analysis of the second case. The caveat is important. We can of course replicate empirical analysis across a variety of cases and examine how well a causal mechanism accounts for a variety of cases. This produces inductive knowledge of causal mechanisms. The causal generalization question that I address is how to know that a causal finding in a case transfers to others, simply by virtue of its identification in the original case.

For instance, in a well-known paper Dell and Olken (2020) identified a relationship between economic structures implemented by Dutch colonizers in Java and long run productivity in Java. On what basis could we apply that finding to other cases? We do so when we see a similarity to the original cases in a way that is relevant under some theory, so we suspect that a similar mechanism operates. If the mechanism delivers a specific relationship between colonial economic structures and long run productivity in the original study, we could reason that it implies a similar relationship in similar cases.

Somewhat more formally, consider the causal generalization argument (CG): if (i) a causal mechanism m is credibly identified in case A, and if (ii) cases A and B have similar factors X_A and X_B with causal capacities under m, then (iii) m can be inferred to operate in case B. Thus, one can make credible causal statements in case B without a direct empirical study of that case, on the basis of credible identification in case A.

The CG argument is the linchpin of any non-inductive generalizability of empirical findings. The key question is how to move from step (ii) to step (iii). Without actually executing an empirical study in case B, this movement works only if we assume that similar factors across cases will lead to similar outcomes across cases. This means that we know how to translate a relationship between X_A and Y_A into a relationship between X_B and Y_B, despite not actually observing this relationship. That is, we can understand the unobserved relationship between X_B and Y_B, and so identify a candidate causal mechanism in case B, simply because we know the relationship between X_A and X_B (Pearl and Bareinboim 2014).

This translation is a special case of a research design we know by another name, selection on observables. Selection on observables means that, if we know an outcome Y and treatment status X in one case, we can infer a counterfactual outcome Y’ under an alternative treatment X’. This is possible if and only if we know the structural model that translates X into Y, which per se entails the causal effect of X. Moving from X_A to X_B across two cases presents the exact same problem, and solving it requires the same assumption.[1]
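A toy simulation may help show why the translation from case A to case B rests on the structural model rather than on the data from case A. Everything here is hypothetical: the function names, the `context` variable that distinguishes the two cases, and the numerical effects are assumptions chosen only to illustrate the logic.

```python
import random
import statistics

def experiment(effect, n=20000, seed=1):
    """Randomized experiment in one case; returns the estimated treatment effect."""
    rng = random.Random(seed)
    treated, untreated = [], []
    for _ in range(n):
        x = rng.random() < 0.5  # randomized treatment
        y = effect * x + rng.gauss(0, 1)
        (treated if x else untreated).append(y)
    return statistics.mean(treated) - statistics.mean(untreated)

def structural_effect(context, moderates):
    """Hypothetical structural model: the effect of X varies with the
    case's context only if the theory says context moderates it."""
    return 2.0 - 3.0 * context if moderates else 2.0

CONTEXT_A, CONTEXT_B = 0.0, 1.0  # assumed difference between the two cases

# Two rival theories imply exactly the same experimental evidence in case A...
est_a_theory1 = experiment(structural_effect(CONTEXT_A, moderates=False))
est_a_theory2 = experiment(structural_effect(CONTEXT_A, moderates=True))

# ...but they imply different effects in the unstudied case B.
truth_b_theory1 = structural_effect(CONTEXT_B, moderates=False)
truth_b_theory2 = structural_effect(CONTEXT_B, moderates=True)

print("estimate in A under each theory:", est_a_theory1, est_a_theory2)
print("implied effect in B under each theory:", truth_b_theory1, truth_b_theory2)
```

The experiment in case A returns the same estimate under both theories, so it cannot tell us which one to carry over to case B. Whatever we conclude about B comes from the assumption about whether context moderates the effect, not from the identification achieved in A.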

We get a lot of mileage in generalized causal explanation from selection on observables, provided we believe it. The next question is, if selection on observables is valid between steps (ii) and (iii) of the CG argument, why isn’t it valid at step (i)?

If it is, then no specialized research design such as experiments, RD, and the like is necessary to establish credible causal statements at all. If it is not (and as credibility revolutionaries most of us probably agree it usually is not), then credible causal identification in case A tells us nothing about case B that we would not already know from a theory that operates in case B. In the real-world conditions that I suspect are almost always relevant for HPE, empirical causal generalization is self-abnegating: it assumes the exact knowledge that we are seeking to establish.

Put somewhat differently, causal mechanism generalization from case A to case B requires a compelling mechanism to account for an effect in case A, and a theory of case B that says it is similar to case A in all respects relevant to the mechanism. But if we believe the theory required for the second step, we already know that the mechanism operates in case B. Causal identification in case A adds nothing to understanding of causal mechanisms in case B. On the other hand, if we do not believe the theory required for the second step, then we do not know how to translate the causal effect in A to an effect in B. Causal identification in A again adds nothing to understanding of B.

It is at this point that many scholars discover their inner Bayesian. After all, if we learn that a mechanism operates somewhere, surely we should increase our confidence that it operates somewhere else, all the more so if the cases are similar in obvious ways. This argument is fine as far as it goes, but it sits uncomfortably with our demands for within-case analysis. One cannot be Bayesian for causal generalization but an identificationist for within-case causal explanation, because they are the same problem.[2] If we accept the credibility revolution’s standards for causal statements, the best we can hope for in causal generalization is that explaining a case leads to a qualitatively new causal mechanism in our library, which might help explain other cases.

In light of all this, consider again the analysis of Dell and Olken (2020) on the positive effect of colonial economic structures on long run productivity. The causal effects are carefully identified from Dutch colonization of Java in the 19th century. The paper notes two key channels plausibly responsible for the effect — development of manufacturing and colonial infrastructure investment (roads and rail) — that could be developed into a theory. All of its arguments are appropriately localized to this case. Beyond the intrinsic import of Javanese development, we may ask: what do we learn in general from this analysis? Empirically, nothing. It provides no basis to believe (and does not claim) that other places where colonizers developed manufacturing and infrastructure have also experienced long run productivity gains, because we have no way to know, absent an assumption or a separate study, whether those other cases are similar enough to Dutch colonization in Java or postcolonial governance there. In highlighting candidate mechanisms, the paper suggests factors scholars might consider in other cases. From this we might inductively develop general causal understanding. But the findings of this paper cannot extend beyond its case.

Causal Localism and Case Selection in HPE Research

The upshot of this argument is that, in HPE at least, all empirical causal knowledge is local. Even the most rigorous demonstration of a causal mechanism in one case cannot tell us that it operates in another case. This is true even if the second case is “similar” to the first in some sense relevant to some theory: the judgment of “similarity” assumes the exact theoretical knowledge we are trying to establish. But armed with that knowledge, the empirical finding from the first case is irrelevant; we would already know what we need to know about the second.

If all empirical causal knowledge is local, then the only question in selecting cases for empirical study is whether we want to know something about them. We should want to know something about cases either when they are historically important, or when they seem to require explanation with new causal mechanisms that scholars have not already identified.

If we study these cases, we will learn something about history. Namely, we will learn a candidate explanation for an important development. We might also learn something about theory, namely, a new entrant in our “library of theoretical mechanisms” that we can deploy to understand and categorize other cases in future research. (A good example of this is Garfias and Sellars (2021) on state centralization in colonial Mexico. They argue that disease-induced demographic collapse allows an imperial state to replace indirect rule through potentates with direct rule through the state’s agents.)

But it is no use studying a case simply because it leads to clean identification of something. Whatever mechanisms operate there cannot by virtue of that identification be said to operate elsewhere. Whatever is learned empirically is local to that case. It is also no use declining to study an interesting case simply because various identification problems are challenging. As we constitute this field, we should study important cases and flesh out the (possibly large) range of mechanisms that can explain them, not select our cases on the basis of what can be understood in a particular way.

I do not intend this argument to denigrate the generalized causal understanding obtainable in HPE (or for that matter in empirical social science, because the same argument applies to most of it). A large library of mechanisms is useful. It gives us a powerful toolkit for identifying factors to watch for in new cases (provided we guard appropriately against confirmation bias). It gives us a supple language for the grouping of like cases and the distinguishing of unalike ones following empirical analysis of those cases. This is valuable, and moreover, it is the only generalized causal knowledge we can expect to develop in HPE. So we should embrace it.


[1] My argument is not that the CG argument is vacuous. It holds, for example, when cases are randomly sampled in a purposive, controlled sampling design from a larger universe of cases. In this instance, causal findings from a sample obviously generalize to the population of un-analyzed cases. In HPE, we typically will not sample cases or control a sampling process in this way.

[2] See Little and Pepinsky (2021) on the implications of Bayesian reasoning for the hard-core identificationist position.


  • Professor Gailmard studies how principal-agent problems in politics affect the structure and development of political institutions. From this perspective, his research attempts to understand the strategic foundations of American political institutions; English imperial governance in the American colonies; expertise and political responsiveness in the bureaucracy; historical development of the American executive branch; congressional control of bureaucratic discretion; the internal organization of the U.S. Congress; and electoral accountability. He has also studied models of collective decision making in laboratory experiments.

    Professor Gailmard is the author of Learning While Governing: Expertise and Accountability in the Executive Branch (2012, University of Chicago Press, with John W. Patty), which won the 2013 William H. Riker Prize (best book in political economy) and 2017 Herbert A. Simon Prize (most significant contribution to public administration 5 or more years old), as well as Statistical Modeling and Inference for Social Science (2014, Cambridge University Press), a Ph.D.-level textbook. He has published research in leading social science journals, including the American Political Science Review, American Journal of Political Science, and Journal of Politics.
