The Endogeneity of Historical Data
I am a historical sociologist by training. While contemporary historical sociology is undoubtedly inspired by the work of classical sociologists including Karl Marx, Max Weber, and W. E. B. Du Bois, what we know as historical sociology began to emerge in the 1960s, eventually becoming a recognized subfield in the 1980s. By most accounts, the push for historical sociology in the United States was in part a reaction to the growing dominance of quantitative research and the style of ahistorical empiricism that accompanied it. The irony is that the transition from intellectual movement to domesticated subfield was driven by appeals to quantitative logic by way of macro-causal comparison. This is perhaps best evidenced by the early work of Theda Skocpol, who helped to popularize the use of Mill’s methods of agreement and difference, arguing that these approaches were analogous to multiple regression, a claim that did not sit particularly well with historically oriented researchers. Charles Ragin and David Zaret, for example, were quick to distinguish between the case-oriented logic of macro-causal comparison and the variable-oriented logic of regression. The more radical critique came from scholars such as Andrew Abbott and William Sewell, who argued that the overwhelming emphasis on conventional methodological reasoning undermined the revolutionary potential of the historical sociology movement by shifting attention away from the possibility of narrative sociology as an alternative explanatory paradigm.
This is a long way of saying that internal dynamics within the field of sociology contributed to the creation of a subfield that drew on the language of quantitative methods for legitimation while remaining overwhelmingly qualitative in practice. This is not to say that there was no such thing as quantitative history in sociology. Before introducing qualitative comparative analysis (QCA), Ragin’s early work used standard regression techniques to examine topics such as peasant rebellion in Romania and ethnic mobilization in Britain and Wales. He was hardly alone in these efforts. Much of this work appeared around the same time that we begin seeing programmatic calls for the use of macro-comparative analysis, which may explain why regression-based approaches often get overlooked when telling the story of historical sociology. When it comes to the use of quantitative methods, the more recognizable break with the comparative tradition came in the 1990s, when scholars began using network analysis to describe the relational foundations of political action, focusing in particular on the role of collective identities in the rise and fall of the modern state. The latter line of work was an outgrowth of the Harvard School of network analysis, which began to emerge in the 1970s under the leadership of Harrison White. After relocating to Columbia in 1988, White became the center of what Ann Mische has dubbed the “‘New York School’ of relational sociology.” Since its takeoff in the 1990s, the New York School has continued to produce new waves of scholarship. Recent examples include Kinga Makovi’s work on the abolition of the British slave trade and Mark Hoffman’s work on ideology and partisanship in the wake of the American Revolution.
When I came into sociology, I was instantly attracted to network analysis, having been exposed to the work of economic sociologists such as Mark Granovetter and Wayne Baker. It was only later that I realized that people were doing the same thing in historical sociology. One of the big things that stands out when looking at the use of network analysis in the context of historical research is the novelty of the data employed. In some cases, it seems almost miraculous that the data exist at all. One of my favorite examples of this is Roger Gould’s paper on the Whiskey Rebellion. Gould shows that elite opposition to the expansion of centralized authority in the form of the whiskey tax was a function of their position in local patronage networks. More specifically, Gould shows that, all else being equal, elites (defined as individuals who held at least two public offices at any level between 1781 and 1794) were more likely to lead the insurgency if they were either (a) being crowded out of the local patronage system by federal officeholders or (b) excluded from the local patronage system altogether. To approximate the structure of the underlying patron-client network, Gould capitalizes on the fact that a number of local offices required officeholders to provide the name of someone who would be willing to post a surety bond on their behalf. The result was a detailed record of political sponsorship. This is similar to the strategy employed in his work on the transformation of political contention in Paris between 1848 and 1871, where he used information on marriage witnesses to measure differences in the prevalence of intra-class ties across neighborhoods.
This brings me to what I really want to talk about: the endogeneity of historical data. We can think about this idea in terms of the process through which historical data are actually produced. There is a fundamental tension here, in the sense that the records that historically oriented researchers use to generate quantitative data were not created for that purpose, yet their existence is often bound up with the stories that the researchers in question are trying to tell. The thing to think about is why a given set of records exists in the first place. In whose interest is it to keep those records? The answer is relatively straightforward when looking at, say, a private business ledger. For me, the more interesting example is public records, which are, by definition, imbued with political meaning, in the sense that they are a reflection of the interests of the state. In the case of the Whiskey Rebellion paper, for example, the fact that certain officeholders were required to secure a surety bond of up to £1,000 prior to holding office was an indication of the scale of economic activity being undertaken by the Pennsylvania government at the time. We can see very clearly how what became historical network data in Gould’s hands emerged as a byproduct of political development.
I started thinking about this phenomenon when working with licensing data as part of my forthcoming book, The Making of the Populist Movement, which is slated to drop next Tuesday. The central argument of the book is that the rise of electoral Populism in South Dakota was a response to the expansion of state and market in the western United States during the late nineteenth century. I use information on the licensing of grain elevators in conjunction with information on the location of rail lines to recreate the market network defined by the tripartite relationship between rail lines, towns, and elevator companies. These licensing data were included as part of an annual report filed by the South Dakota Board of Railroad Commissioners in 1890. Yet neither the Board nor the data could be taken as given. The South Dakota Board of Railroad Commissioners was a successor to the Dakota Territory Board of Railroad Commissioners, which was created on March 4, 1885, just a little over four-and-a-half years prior to the transition to statehood on November 2, 1889. Modeled after the Illinois Railroad and Warehouse Commission, the Dakota Board emerged as part of a broader wave of regulatory innovation spawned by the Granger movement of the 1870s. What these boards could actually do, however, was a point of considerable legal debate throughout the late nineteenth century, particularly when it came to issues such as rate-setting.
While the power of the South Dakota Board was regularly brought into question by the railroad corporations and elevator companies it sought to regulate, its members made a concerted effort to expand its admittedly limited reach where possible. Toward this end, the Board successfully lobbied the state legislature to pass legislation defining a public warehouse as any elevator or warehouse where grain is purchased, received, or handled. Under the previous law, the definition of a public warehouse was limited to elevators and warehouses where grain was stored for compensation, thus giving elevator and warehouse operators a way out of public licensing requirements. As regulatory power goes, this was a relatively minor victory. The interesting thing about all of this is that what look like neutral quantitative data depicting the distribution of economic activity were anything but. The same is true of the company reports that railroad corporations were compelled to submit to the Board by law. In looking at these examples, we can begin to see the political dimensions of political economy simply by telling the origin story of the data.
I will discuss the endogeneity of historical data more in future posts. This idea pops up not only when looking at administrative records, but also when working with spatial data. It is easy to take the existence of entities such as states and counties for granted, but the organization of political space in the American West was actually a highly contentious process. Historical actors had an incentive not only to continually divide space into smaller units, but to divide that space in particular ways. The ability to influence the organization of political space was a resource that could be used to shape constituencies and build political alliances. In addition to talking about the manipulation of political boundaries as a substantive process, I will also go on to discuss the downstream effects faced by quantitative researchers. The fundamental problem is that when the boundaries of an observation change, it becomes difficult to compare values over time. As I will show, there are a number of ways of dealing with this problem, one of which is to use network analysis to identify constant geographies.
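One common way to read "constant geographies" in network terms can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not necessarily the procedure the later posts will use: each county is treated as a node, any two counties that exchanged territory during the study period are linked, and each connected component of the resulting graph is taken as a cluster whose combined outer boundary stays fixed. The county names, the transfer list, and the counts below are all hypothetical, and the functions `constant_geographies` and `aggregate` are made up for this example.

```python
from collections import defaultdict


def constant_geographies(units, transfers):
    """Group units into connected components of the boundary-change graph.

    units: iterable of unit names (hypothetical counties).
    transfers: pairs (a, b) meaning territory moved between a and b
               at some point during the study period.
    """
    parent = {u: u for u in units}

    def find(u):
        # Follow parent links to the root, shortening the path as we go.
        while parent[u] != u:
            parent[u] = parent[parent[u]]
            u = parent[u]
        return u

    for a, b in transfers:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two components

    groups = defaultdict(set)
    for u in units:
        groups[find(u)].add(u)
    return sorted(sorted(g) for g in groups.values())


def aggregate(values, clusters):
    """Sum a count variable within each cluster; absent units count as 0."""
    return [sum(values.get(u, 0) for u in cluster) for cluster in clusters]


# Hypothetical data: Baker was carved out of Adams, then Clark out of Baker.
units = ["Adams", "Baker", "Clark", "Dunn", "Ellis"]
transfers = [("Adams", "Baker"), ("Baker", "Clark")]
clusters = constant_geographies(units, transfers)

values_1880 = {"Adams": 100, "Dunn": 40, "Ellis": 10}
values_1890 = {"Adams": 60, "Baker": 50, "Clark": 30, "Dunn": 45, "Ellis": 12}

print(clusters)
print(aggregate(values_1880, clusters))
print(aggregate(values_1890, clusters))
```

The trade-off in this approach is resolution: counties that never changed remain their own unit, while counties linked by any chain of boundary changes are merged into a single, larger observation. Summing only makes sense for counts; rates or averages would need to be recomputed from their underlying numerators and denominators.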