The COVID-19 pandemic has shown how important it is for states to have information about the societies they govern. In a scramble to contain a public health crisis, governments across the world – from the United States to India – are influencing how often we wash our hands and how many friends we meet. Public and private organizations are collecting sensitive information about our health and contacts. Social distancing, virus testing, and contact tracing are essential for containing the spread of COVID-19, but also raise issues around the power of governments to monitor and regulate our lives.
Gathering data on individuals has long been a feature of states, in crisis and normal times. Indeed, reshaping societies and nature in order to measure, count, and order society – and ultimately control it – was how states developed. James Scott’s seminal book Seeing Like a State (1998) traces this process and proposes the concept of legibility to describe its desired outcome. Early states sought to make their populations legible to collect more revenue, but legibility also advanced social and economic development, improving many people’s lives.
In this post, I discuss how recent quantitative studies have operationalized legibility. I then illustrate the usefulness of one proxy for legibility, age heaping, using the 1897 census from Imperial Russia. I show that religious demography curtailed the reach of the state in some parts of the empire. Predominantly Russian Orthodox, state officials faced particular difficulties when interacting with minority populations, a problem that is also common in modern states.
Scholars have operationalized legibility in various ways. One approach is simply to track whether a state has conducted a census or a land survey. In a 2020 CPS article, Thomas Brambor and collaborators measure “information capacity” for 85 states from 1789 to the present, using an index that combines five indicators: regular census implementation, the release of statistical yearbooks, the introduction of civil registers, the introduction of population registers, and the establishment of a national statistical agency. Cadastral records have been used as a proxy for legibility in settings as diverse as Napoleonic France, by Anne Degrave, and 20th-century Colombia, by Mariano Sánchez-Talanquer.
In a JOP article, Melissa Lee and Nan Zhang operationalize legibility using age heaping, or the tendency to report ages that end in a focal digit (typically 0 or 5). They calculate the Myers Index of age heaping, developed by demographers, for more than 120 countries in 1960–2012 based on some 370 censuses. This approach enables us to compare legibility across time periods, countries, and subnational units and is particularly useful for historical research, since most countries collect and publish data on ages in their censuses.
The basic idea is that true ages are distributed as a smooth curve, whereas reported ages deviate from this natural distribution because of “heaping” on specific digits. In the figure below, the heaping is visible to the naked eye, with frequencies of ages that end in 0 and 5 unusually high, particularly for ages between 20 and 60. Age heaping may occur due to people’s unawareness of their true age (i.e. lack of numeracy) or due to their refusal to share accurate information with state representatives (i.e. illegibility). In economics, age heaping has been primarily used to assess numeracy and human capital more generally.
As Lee and Zhang demonstrate, age heaping correlates with the quality of data on other indicators and predicts more effective tax collection and greater provision of public goods. Studies have also concluded that age heaping predicts long-run economic growth, a finding that is also consistent with the interpretation of age heaping as a legibility issue. And as I show below, age heaping in Imperial Russia reveals not only low education levels, but also the state’s problematic relationship with some population groups.
Russia was a relative latecomer to statistical thinking. It started conducting censuses (revizii) in 1719-21, with the goal of assessing the poll tax and gathering conscripts, but these efforts were very limited. The empire’s first (and last) modern census took place in 1897, considerably later than in other states. By then, obtaining accurate data was seen as essential for maintaining Russia’s position in the world.
David W. Darrow provides a fascinating account of how the 1897 Russian census was planned and implemented. It was an ambitious undertaking, covering one-sixth of the earth’s surface and, as it turned out, some 125 million people. The census collected a broad range of information, from age, sex, and religion to class, literacy, language, and occupation. It revealed that just 69% of Russia’s population was Russian Orthodox and only 24% of the population could read and write.
In European Russia, identical questionnaires were administered to the heads of household regardless of their social status, religion, or wealth. Everyone was counted, including the emperor Nicholas II, whose primary occupation was recorded as the Master of Russian Land (“Khoziain zemli Russkoi”). The government used electromechanical tabulating machines invented by Herman Hollerith, who later founded IBM. Despite this technology, it took the Russian government eight years to process the data.
Not surprisingly, the census contained multiple inaccuracies. The publication of questionable census figures was already perceived as “dangerous” at the time, as Darrow notes. Some bureaucrats feared that inaccuracies in the data would erode the public’s trust not only in government figures, but also in the government itself. These concerns now appear overblown. If anything, widespread distrust of the imperial state undercut the state’s ability to collect accurate data about its citizens.
Fortunately, the very inaccuracies in the census can tell us something about state-society relations in Russia. I illustrate this using the data on ages for the fifty provinces of European Russia. I calculate the Myers Index of age heaping, which in theory can range from 0 (no heaping on any digit) to 90 (extreme age heaping).
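To make the index concrete, here is a minimal Python sketch of one common formulation of Myers' blended method. It is an illustration under stated assumptions, not necessarily the exact computation behind the figures in this post. The function name `myers_index`, its parameters, and the default window (ten staggered five-decade windows starting at ages 15 through 24, so ages 15 to 73 are used) are all hypothetical choices made for the example. The index is taken as half the sum of absolute deviations of each terminal digit's blended share from 10%, which gives the 0 to 90 range.

```python
from collections.abc import Mapping


def myers_index(counts: Mapping[int, float], start: int = 15, decades: int = 5) -> float:
    """Myers' blended index of terminal-digit preference (0 = none, 90 = maximal).

    counts  -- population by single year of reported age, e.g. {15: 1200, 16: 1100, ...}
    start   -- youngest age of the first window (illustrative default, an assumption)
    decades -- number of full decades in each window (illustrative default, an assumption)
    """
    if decades < 1:
        raise ValueError("decades must be at least 1")

    # Blend ten staggered windows. Each window spans whole decades, so every
    # terminal digit appears equally often, and each digit is the youngest
    # digit in exactly one window. This offsets the natural decline of
    # population with age, which would otherwise favour the leading digit.
    blended = [0.0] * 10
    for offset in range(10):
        first_age = start + offset
        for age in range(first_age, first_age + 10 * decades):
            blended[age % 10] += counts.get(age, 0)

    total = sum(blended)
    if total == 0:
        raise ValueError("no population in the selected age range")

    # Without heaping, each terminal digit holds 10% of the blended total.
    shares = [100 * b / total for b in blended]
    return 0.5 * sum(abs(share - 10) for share in shares)


# Two limiting cases on made-up data:
uniform = {age: 100 for age in range(0, 100)}       # same count at every age
print(myers_index(uniform))                         # 0.0

all_on_zero = {age: 100 for age in range(0, 100, 10)}  # everyone reports a round decade
print(myers_index(all_on_zero))                        # 90.0
```

Real census tables fall between these two extremes. Applying the function to each district's single-year age table would give one index value per district.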
The mean of the Myers Index in the data is 15.1. On this indicator, 19th-century Russia resembles Mexico in 1960 or Haiti in 1971 in Lee and Zhang’s dataset. The 1897 Russian census also counted the population with “unknown ages,” a tiny category, equivalent to an average of 0.03% of the population in a district. The share of unknown ages decreases with age heaping, suggesting that misreporting one’s age was an alternative to reporting unknown age. Age heaping is higher for females and decreases with literacy.
We can learn a great deal about the legibility of Russia’s population by mapping the Myers Index for each district and analyzing its predictors. In European Russia, the index ranges from 3.9 (comparable to Singapore in 1970) to 27.0 (think Nepal in 1981). As shown on the map, age heaping is less prevalent in capital cities and increases as one moves inward, away from the coasts and borders. It’s highest in Russia’s western and southern provinces.
The top predictors of age heaping at the district level are the past incidence of serfdom, literacy rates, distance to Moscow, and the presence of religious minorities. Schooling and remoteness are self-explanatory. The importance of coercive labor institutions is also consistent with other research. Pavithra Suryanarayan and Steven White find that in the US age heaping (among whites) was more prevalent in places with greater numbers of formerly enslaved black Southerners in 1880. They argue that whites feared that the information gathered from them would be used for taxation (see more on hollowing out of the state in the US South in this post). In Russia, the reason is probably the general level of poverty and underdevelopment among the economically exploited population.
Lower legibility in districts dominated by religious minorities – predominantly Western Christians in the west and Muslims in the southeast – at first seems puzzling, since minorities were more literate than the Orthodox population. As Brendan McElroy and I explain in a working paper that focuses on minorities in the Volga basin, the Russian state historically governed non-Orthodox confessions via indigenous elites, who served as intermediaries. This model of mediated governance was first applied to Muslims, who in 1897 comprised the largest minority (11% of the population). Despite the shortage of labor, Russia’s Muslims were spared from serfdom and conscription and were even allowed to employ Orthodox peasants. Islamic clerics and scholars received privileged treatment in return for securing the loyalty of the Muslim peasants. In the late 18th century, Catherine the Great institutionalized the arrangement by creating religious assemblies in Ufa and Crimea. The assemblies were headed by muftis and performed many administrative functions, such as record keeping.
Starting in the 1870s, the state began to standardize approaches to governance across Orthodox and non-Orthodox groups. Conscription obligations were expanded to Muslims, European colonists, and other religious groups. The religious assemblies were retained, but Orthodox officials began to intervene in the selection of muftis and mullahs, now requiring knowledge of Russian for their selection.
These interventions undercut state officials’ legitimacy with the minority communities. Protests against the reforms brought together hundreds of Muslim villages and often ended in physical violence. The reforms also contributed to the spread of anti-mufti movements and to refusals by lower-level Muslim clerics to share parish statistics with the state. Zemstvo officials often encountered a hostile reception in Muslim villages as the communities feared that they would be baptized.
The minorities’ resistance to being counted goes a long way toward explaining the prevalence of age heaping in the 1897 census. Census enumerators encountered a more hostile reception in Muslim communes. Religious intermediaries, tasked with conducting explanatory work ahead of the census and later recruited as enumerators, stopped cooperating after receiving threats from their congregations. In Samara, Kazan’, Perm’, Viatka, and Ufa provinces, Muslim resistance to the census was overcome only with the use of the military.
The prevalence of age heaping in minority districts is thus indicative not only of low numeracy, but also of the limited reach of the central government in communities that were historically ruled via intermediaries. This observation is consistent with the APSR article by Paul Dower, Evgeny Finkel, Scott Gehlbach and Steven Nafziger, who show that districts with sizable religious minorities experienced greater peasant unrest at the time of the Great Reforms.
Illegibility in the eyes of the state may have insulated minority communities from state meddling in good times, but it was costly in times of bad harvests and epidemics. Brendan McElroy and I find that state relief during the 1891-92 famine was considerably lower in districts with higher shares of minorities and a higher incidence of age heaping. As Darrow notes, Russian officials themselves realized that better data on the population and its ages would have facilitated the targeting of famine relief.
Modern states have much greater capacity to gather information about their populations than Imperial Russia. Yet they struggle with legitimacy among ethnic and religious minorities in the same way, with terrible consequences for the quality of life and health of millions of people.
The census collected data on ages up to 109, but I focus on a narrower window, 15 to 74, in part to deal with the dwindling sample size for older ages at the district level.