As I discuss in my book, The Making of the Populist Movement, the rise of the Populist movement in the American West during the late nineteenth century is commonly described as a response to economic hardship at the hands of railroads and grain elevator companies that worked together to dictate the terms on which crops such as wheat made their way to market. Until recently, efforts to quantify the relationship between market position and Populist mobilization have been hindered by a lack of data. This is true in two respects. First, in the absence of information on the distribution of market infrastructure such as rail lines and grain elevators, scholars have been forced to measure market position in terms of crop choice. While certain crops may have fetched a higher or more stable price, this approach misses the fact that Populist grievances were less about price per se than about structural inequities in a marketing system that allowed various intermediaries to capture the benefits of farmers’ labor. Second, in contrast to voting data, which are systematically recorded as part of the election process, comprehensive information on the size and location of key movement organizations such as the Farmers’ Alliance has been hard, though not impossible, to come by. Consequently, quantitative accounts of the Populist movement have focused overwhelmingly on its electoral phase, which began as an outgrowth of Alliance activities before taking on a life of its own following the emergence of the People’s Party in 1891.
In a previous post, I showed how the first problem can be resolved by using network analysis to examine information gleaned from the Rand McNally Business Atlas and Shipping Guide and the annual reports of the South Dakota Board of Railroad Commissioners. The resulting analysis allows us to see which towns had rail lines, as well as the larger pattern of connections that emerged from the tripartite relationship among railroads, towns, and grain elevator companies. In this post, I turn to the second problem. As I will show, finding information on Alliance organizations and their membership is not enough. To make these data usable, we need to be able to link information on the location of organizations to information on the distribution of the population. This raises the question of what it means to talk about the location of an organization versus the location of its members. To anticipate the discussion below, the solution is to incorporate geographic uncertainty by randomizing the location of organizations and then combining results across a large number of simulated datasets. The result is a form of multiple geographic imputation in which the effects of locational uncertainty are reflected in the standard errors attached to point estimates of the probability of Alliance formation, the expected size of the resulting organizations, and the expected number of people mobilized overall (i.e., the expected size of the resulting organizations multiplied by the probability of Alliance formation).
The Search for Membership Data
Searching for historical data can be a bit like trying to find a needle in a haystack, with the important caveat that there is no guarantee that there is actually a needle to be found. As highlighted by Ali Cirone in a series of posts (here, here, and here), things have gotten much easier in the digital age, with new data becoming available online every day. Such is the case with data on Populist organizing. The South Dakota Historical Society now maintains a publicly available database with information on more than seventeen thousand people who were associated with the South Dakota Farmers’ Alliance between 1890 and 1894. The database includes information on members’ names, the post offices that they used, the local organization with which they were affiliated, and whether or not they paid dues in a given year. This can be seen in the screenshot below, which depicts a subset of Alliance members from the area around the city of Miller, which served, and continues to serve, as the county seat for Hand County.
In the book, I use the number of individuals who paid dues in 1890 to estimate the number of people mobilized by the Alliance in the East River region of South Dakota on the eve of the electoral turn. By this measure, there were nearly five hundred local Alliance organizations (also known as suballiances) in operation in the East River region as of 1890, with the average suballiance including somewhere between fifteen and sixteen members. What we really want to know, however, is whether the number of people mobilized by the Alliance varied systematically as a function of various covariates, including market position. I’ll discuss these connections at length in a future post. For now, I want to focus on the challenges associated with trying to capture the spatial distribution of Alliance activity relative to the distribution of the population, as indicated by the 1890 census, which provides population figures for more than eight hundred minor civil divisions across the East River region. The goal is to combine data on local Alliance activity with data on the size of the local population by using information on the location of Alliance members to match Alliance organizations to the civil townships used to define minor civil divisions in South Dakota during this period.
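To make the matching step concrete, here is a minimal Python sketch that assigns organizations to townships and totals their members. It is an illustration under stated assumptions, not the procedure used for the book: it assumes locations have been projected into planar coordinates measured in miles, and that townships form a regular grid of six-mile squares, consistent with a standard civil township whose center lies three miles from its nearest border. Real township boundaries were irregular. The function names, the grid origin, and the sample organizations are all hypothetical.

```python
import math
from collections import Counter

# Assumption: an idealized grid of six-mile-square townships.
TOWNSHIP_SIDE_MILES = 6.0


def township_of(x_miles, y_miles, origin=(0.0, 0.0), side=TOWNSHIP_SIDE_MILES):
    """Return the (column, row) index of the grid township containing a point."""
    col = math.floor((x_miles - origin[0]) / side)
    row = math.floor((y_miles - origin[1]) / side)
    return (col, row)


def members_by_township(orgs, side=TOWNSHIP_SIDE_MILES):
    """Sum member counts for the organizations falling in each township.

    orgs: iterable of (x_miles, y_miles, n_members) tuples (hypothetical format).
    Townships with no organization do not appear in the result; they would
    need to be added as zeros from the census list of minor civil divisions.
    """
    totals = Counter()
    for x, y, n in orgs:
        totals[township_of(x, y, side=side)] += n
    return totals


# Hypothetical organizations: two in the same township, one in its neighbor.
orgs = [(2.0, 3.0, 15), (4.5, 1.0, 16), (7.0, 2.0, 12)]
print(members_by_township(orgs))  # Counter({(0, 0): 31, (1, 0): 12})
```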
What’s in a Location?
When it comes to figuring out the exact location of the individuals mobilized by the South Dakota Farmers’ Alliance, we have two pieces of information to go on: the name of the post office that they used and the name of the organization to which they belonged. It is tempting to imagine that the name of the post office a member used corresponded to the name of the town where that member lived. The search options associated with the Alliance database would seem to suggest as much, in the sense that selecting values for the town field on the database homepage returns results based on which post office individuals were said to have used. This can be seen in the search results shown above. Note, however, that while the individuals listed all use the Miller post office, they do not belong to the Miller Alliance, but instead belong to Alliance organizations in the surrounding communities of Alpha and Rockdale. Looking at the full set of results reveals that the Miller post office served as the primary postal address for six different Alliance organizations, each of which appeared to be affiliated with a different township.
This is all to say that an organization’s location is likely to be a better indication of the location of its members than the post office addresses that they happened to use, though the two were closely related. Examining the relationship between the location implied by an organization’s name and the postal addresses used by its members, we find that in the majority of cases, an organization’s location is fairly close to what we get when we take the weighted average of the locations of the post offices to which its members were tied, with weights determined by the share of members using each address. When I first began working with this data, I had the idea of using the weighted average of postal locations to help assign organizations to townships in cases where an organization’s name does not refer to a readily identifiable location in the county with which it is associated. This led to a meta-question about what it means to talk about the location of an organization when the exact location of its members is unknown.
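The weighted average of postal locations described above can be sketched in a few lines of Python. This is a minimal illustration, not the code used for the book: it assumes post-office coordinates are already available in a planar system measured in miles, and the post-office names and coordinates in the example are hypothetical.

```python
from collections import Counter


def weighted_postal_center(member_post_offices, post_office_coords):
    """Return the member-weighted mean location of an organization's post offices.

    member_post_offices: one post-office name per member (hypothetical input format).
    post_office_coords: maps each post-office name to (x, y) in miles (assumed projection).
    """
    counts = Counter(member_post_offices)  # number of members using each post office
    total = sum(counts.values())
    # Each office's weight is its share of the organization's members.
    x = sum(post_office_coords[po][0] * n for po, n in counts.items()) / total
    y = sum(post_office_coords[po][1] * n for po, n in counts.items()) / total
    return (x, y)


# Hypothetical example: three members use office "A", one uses office "B".
coords = {"A": (0.0, 0.0), "B": (12.0, 0.0)}
members = ["A", "A", "A", "B"]
print(weighted_postal_center(members, coords))  # (3.0, 0.0)
```

The resulting point lies a quarter of the way from office A to office B, reflecting that a quarter of the members used B.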
To make things more concrete, imagine that we knew the exact meeting place of each of the Alliance organizations that was in operation in the East River region of South Dakota in 1890. On its face, this would suggest a world in which the location of each organization was known exactly, allowing us to assign organizations to townships in a straightforward manner. The problem is that organizations do not exist independently of their members, who are arrayed across some wider catchment area. When we say that the location of a local Alliance organization is known, what we really mean is that we have a decent guess about the locus of organizing activity, but even that is uncertain due to the absence of exact information on the location of individuals. To put it another way, we are uncertain about the locus of organizing activity regardless of whether an organization’s name can be matched to a known location. The difference between matched and unmatched organizations is simply the degree of uncertainty that we attach to our initial guess about the center of organizing activity for any given suballiance.
Rather than sticking with our initial guesses regarding the locus of organizing activity and treating these locations as if they (a) were known exactly and (b) represented the location of individual members, we can treat the location of each organization as a random draw from a set of plausible locations and see how much the results of the analysis vary as the set of randomly selected locations changes from one simulated dataset to the next. In the analysis presented in the book, the set of plausible locations attached to any given organization is allowed to vary depending on whether the name of the organization in question could be matched to a known location. For matched organizations, the set of plausible locations is defined by a circular sampling region centered on the coordinates of the location with which the organization is matched. In this case, the radius of the sampling region is set to three miles, which is equal to the distance from the center of a standard civil township to its nearest border. For unmatched organizations, on the other hand, the sampling region is centered on the weighted average of postal locations, while the radius is expanded to nine miles to reflect our greater uncertainty about the locus of organizing activity. With the larger radius, a greater share of the plausible locations falls in townships other than the one containing the center of the sampling region.
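The sampling step can be sketched as follows. This minimal Python illustration draws a location uniformly from a disc around an organization's initial center, using the three-mile and nine-mile radii from the text, and then estimates how often a draw leaves a six-mile-square township centered on that point. The square-township geometry, the function names, and the seed are assumptions for illustration only; actual township boundaries were irregular, and the sampling center of an unmatched organization need not sit at a township's center.

```python
import math
import random

MATCHED_RADIUS_MILES = 3.0    # from the text: township center to nearest border
UNMATCHED_RADIUS_MILES = 9.0  # from the text: wider region for unmatched organizations


def sample_location(center, radius, rng):
    """Draw one point uniformly over the area of a disc around center."""
    # Taking the square root of the uniform draw spreads points evenly over
    # the disc's area instead of bunching them near the center.
    r = radius * math.sqrt(rng.random())
    theta = rng.uniform(0.0, 2.0 * math.pi)
    return (center[0] + r * math.cos(theta), center[1] + r * math.sin(theta))


def share_outside_township(radius, draws=100_000, seed=1, half_side=3.0):
    """Share of draws landing outside a square township centered on the sampling center.

    Assumes a six-mile-square township (half_side = 3 miles), for illustration.
    """
    rng = random.Random(seed)
    outside = 0
    for _ in range(draws):
        x, y = sample_location((0.0, 0.0), radius, rng)
        if abs(x) > half_side or abs(y) > half_side:
            outside += 1
    return outside / draws


print(share_outside_township(MATCHED_RADIUS_MILES))    # 0.0
print(share_outside_township(UNMATCHED_RADIUS_MILES))  # close to 1 - 36 / (81 * pi), about 0.86
```

With a three-mile radius every draw stays inside the township, because the disc is inscribed in the square. With a nine-mile radius, only 36 / (81π), or roughly 14 percent, of the disc's area lies inside the central township, so most draws land in neighboring townships.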
The figure below depicts the distribution of Alliance membership across the East River region of South Dakota based on the average location across fifty simulated datasets. Historical township boundaries were estimated using contemporary geographic data. Townships without an Alliance are marked by an “x,” while townships with an Alliance are shaded according to the log of the number of people mobilized. As this visualization suggests, we can think about Alliance mobilization as a hurdle process in which the expected number of people mobilized in a given township can be decomposed into (a) the probability of Alliance formation and (b) the expected number of individuals mobilized in a township, conditional on having at least one Alliance organization present. While the average location of Alliance organizations is useful for visualization, it is less useful for the purposes of analysis because it does not factor uncertainty about the exact location of the organizations into the estimates. So rather than combining simulated datasets and then running a single model, I estimated separate negative binomial hurdle models for each dataset and then combined the results after the fact. As noted above, I will come back to these results another time. For now, I want to close by talking about how the results were actually combined as a way of highlighting the underappreciated connection between unweighted model averaging and multiple imputation.
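The hurdle decomposition can be illustrated on a single simulated dataset. The minimal sketch below, using hypothetical township counts, splits the mean number of people mobilized per township into the share of townships with at least one Alliance and the mean count among those townships; the product of the two equals the overall mean. It computes these shares directly from the counts rather than fitting the negative binomial hurdle model used in the book, and the function name is hypothetical.

```python
def hurdle_decomposition(counts):
    """Decompose mean mobilization per township into hurdle components.

    counts: people mobilized in each township, with 0 for townships that have
    no Alliance (every township in the census list must appear).
    Returns (share with any Alliance, mean size given any, their product).
    """
    positive = [c for c in counts if c > 0]
    p_any = len(positive) / len(counts)
    mean_if_any = sum(positive) / len(positive) if positive else 0.0
    return p_any, mean_if_any, p_any * mean_if_any


# Hypothetical counts for six townships in one simulated dataset.
counts = [0, 0, 12, 0, 20, 16]
p_any, mean_if_any, expected = hurdle_decomposition(counts)
print(p_any, mean_if_any, expected)  # 0.5 16.0 8.0
print(sum(counts) / len(counts))     # 8.0, the same overall mean
```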
I am on record as being opposed to the use of unweighted model averaging when combining results across models estimated using a single set of data. My objection is that some models are better than others and need to be weighted accordingly if the goal is to make inferences about the world, as opposed to simply summarizing the characteristics of the data in front of us. This reasoning no longer holds in the same way once we begin to think about averaging results across simulated datasets, which, unlike models, are self-weighting by virtue of how they are sampled. With this in mind, I compared the formulas used to combine results in the context of unweighted model averaging to Rubin’s rules for combining results in the context of multiple imputation for missing data and discovered that they are virtually the same! The only difference is that Rubin’s rules include a correction factor to account for the number of simulated datasets on which the combined estimates are based. In both cases, the chief advantage is the ability to produce an average point estimate accompanied by a standard error that incorporates information on both within- and between-sample variance.
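The comparison can be sketched in a few lines of Python. `rubin_combine` follows Rubin's rules: the pooled estimate is the mean across the m datasets, and the total variance is the average within-dataset variance W plus (1 + 1/m) times the between-dataset variance B. `equal_weight_average` assumes one common form of the model-averaged variance, the mean of each model's variance plus its squared deviation from the averaged estimate; whether this matches the exact formula used in the book is an assumption. With equal weights, the second formula works out to W + ((m - 1)/m)B, so the two differ only in the factor applied to B. Both function names and the example inputs are hypothetical.

```python
import math
from statistics import mean, variance


def rubin_combine(estimates, variances):
    """Rubin's rules: pooled estimate and total variance across m datasets."""
    m = len(estimates)
    q_bar = mean(estimates)
    within = mean(variances)        # W: average within-dataset variance
    between = variance(estimates)   # B: between-dataset variance (m - 1 denominator)
    total = within + (1 + 1 / m) * between
    return q_bar, total


def equal_weight_average(estimates, variances):
    """Equal-weight averaging: mean of (within variance + squared deviation)."""
    q_bar = mean(estimates)
    total = mean(v + (q - q_bar) ** 2 for q, v in zip(estimates, variances))
    return q_bar, total


# Hypothetical per-dataset estimates and squared (e.g., cluster-robust) standard errors.
estimates = [1.0, 2.0, 3.0]
variances = [0.5, 0.5, 0.5]
q_bar, total = rubin_combine(estimates, variances)
print(q_bar, math.sqrt(total))  # pooled estimate and its combined standard error
print(equal_weight_average(estimates, variances))
```

In both cases the standard error is the square root of the total variance, which blends within-dataset and between-dataset variability.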
Using Rubin’s rules in conjunction with cluster-robust standard errors to account for the nesting of townships in counties, I find that the expected number of people mobilized by the Alliance in any given township was around 9, with a 95 percent confidence interval ranging from roughly 7 to 11. This figure is the product of the estimated probability of having an Alliance organization (0.493) and the expected number of people mobilized, conditional on playing host to at least one Alliance (18.3). These estimates come from a simple intercept-only model, the model implied by the map above. In future posts, we will look at more complicated specifications that speak to the relationship between market position and Populist mobilization while accounting for factors such as economic hardship, ethnic composition, and, perhaps most notably, population density.
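As a quick arithmetic check, the overall point estimate is the product of the two hurdle components reported above. The confidence interval cannot be recovered from these two numbers alone, since it also depends on how the components vary across simulated datasets.

```python
p_alliance = 0.493      # estimated probability of Alliance formation (from the text)
size_if_present = 18.3  # expected members given at least one Alliance (from the text)

expected_mobilized = p_alliance * size_if_present
print(round(expected_mobilized, 2))  # 9.02
```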