The Creation and Survival of Unreliable Data: Mexico’s 1921 Census
Those who have been following the ongoing political developments over the 2020 Census in the United States (not to mention recent posts on Broadstreet) will need no reminder that the process of counting the national population is complicated and political. Beyond the consequences for apportionment and federal transfers, the official census figures that emerge from the 2020 count will become a standard benchmark for measuring U.S. population for years to come. If you asked a random person to find the population of the country in 1900 or 1950, chances are that person would consult the census or a source that relied on it. The small and not-so-small decisions that go into calculating these figures—who should be counted, how the race or ethnicity of respondents should be measured and defined, or even whether response deadlines should be extended during a global pandemic—are usually glossed over by the time the official tallies are recorded in textbooks, government documents, or Wikipedia.
This post examines the case of Mexico’s 1921 General Census of Inhabitants, the first census after the Revolution, which was carried out under challenging conditions at a critical moment for measuring Mexico’s population. How many people died or emigrated during the Revolution? How many others were lost to the 1918 influenza pandemic, which hit the country hard? What did Mexico look like demographically as the post-Revolution political order began to take shape, a process that culminated in the eventual consolidation of one-party rule?
Unfortunately, it is very hard to answer any of these questions. The design and implementation of Mexico’s 1921 census were fatally flawed, so much so that it is not clear whether the figures should be used at all. Limited capacity was a problem, as was outright fraud. The census almost certainly undercounted the national population by a significant margin, but the size of this undercount is unclear and varies from place to place. Some areas, including entire states, seem to have been overcounted, exhibiting strange increases in population that mysteriously disappear in the more reliable 1930 count. In other places, numerous towns appear to have been ignored entirely. Authorities were so slow to process the raw data that the census totals were not published until 1928, and several of the standard population tables were never finished or included. According to Robert McCaa, a prominent historical demographer of Latin America, this was “without a doubt the worst Mexican census of the twentieth century,” which is really saying something given the highly questionable 1980 count.
The problems with the 1921 census have been known for a long time among specialists. However, the degree to which the 1921 data are unreliable is not widely known or discussed outside of this small group of scholars. The Mexican government publishes and distributes the 1921 figures alongside those of the other censuses without any special disclaimer or warning. The data continue to be cited and used frequently, even by careful and serious researchers. These remain the figures that journalists and the general public consult when trying to figure out how quickly Mexico grew over the 20th century or how many people died during the Revolution.
How do we know that this census is flawed? How have scholars dealt with these problems over the last century? Why do so many people and institutions, including the Mexican government, continue to promote census figures that are widely known to be misleading and, in some cases, outright falsified? What are the lessons of this case for HPE scholars? I discuss these questions below.
Why was the 1921 census flawed?
The 1921 General Census of Inhabitants was marred by trouble from the start, and the name itself reveals the first problem. Mexico, like the United States, typically conducts its decennial census in years ending in “0”: 1900, 1910, and so on. This census, originally scheduled for October 1920, had to be postponed after the political crisis and revolt that led to the removal and assassination of then-president Carranza in the spring of that year. Planning restarted quickly under the new leadership, first under interim president de la Huerta and then under Obregón, but officials were forced to delay the count to July and then to November 1921. Under pressure, authorities conducted the census on the basis of 1910 political divisions (the localities and municipios/counties as they existed before the Revolution), which had surely changed in the interim.
Some of the problems with the 1921 census can be explained by a lack of bureaucratic capacity. Many of the issues that Volha highlighted in her post on legibility in Russia’s 1897 census are present in this case as well. After taking over, Obregón replaced many of the officials in the National Statistical Office. As historian Moisés González Navarro[1] wrote, the implementation issues were compounded by “the frequent change of [federal] authorities, the lack of communication with many populations, the insecurity of the roads, the lack of cooperation of some local officials,” and other factors (p. 32). More than half of Mexico’s states failed to process and compile their census returns for federal officials (the government only centralized the processing of census returns in 1930). In many other areas, officials were unable to verify local counts, or the tally sheets were not shared with federal authorities. This included many Mayan communities in Quintana Roo, but also numerous municipios in the north (Sonora, Tamaulipas, Durango), the center-west (Michoacán, Nayarit), and the east (Veracruz) (see González Navarro, pp. 32–37).
Limited capacity or legibility was not the whole story. Some of the worst implementation problems affected the Federal District (Mexico City), the seat of government and a place where the population should have been highly legible. There, local authorities apparently chose not to cooperate with federal officials in processing the census. González Navarro attributes this mostly to “apathy,” but he also points to ongoing political tensions with higher-level officials (p. 32). In León, Guanajuato, a major urban center, the initial count was so obviously marred by fraud that authorities demanded that the entire process be repeated in July 1922.
In the end, the census was completed (or nearly so) and published. Some authorities defended the count; the director of the National Statistical Office said that he was not merely “satisfied, but proud” of the result (qtd. in González Navarro, p. 34). Outside a small number of self-interested officials, however, most of those involved acknowledged that there had been major problems in implementation.[2]
How bad are the data?
For better or worse, the 1921 census figures represent the only systematic attempt to measure Mexico’s population right after the Revolution and just before a pivotal period of political conflict and state building in the 1920s. It is natural to ask whether the data are really so unreliable that they can’t be used. How much measurement error was there really? Are there adjustments that can be made to salvage at least some of the information?
Experts differ on the latter question, but there is clear evidence of significant error in 1921. The most obvious problem, highlighted by nearly every study on this issue, is a large undercount of the population. The 1921 census shows a decline of over 800,000 people relative to the 1910 count, more than five percent of the 1910 population. Taken at face value, this would imply an enormous loss of life in the Revolution, or perhaps massive emigration during the war, especially when the prior trajectory of population growth is considered. Estimates of the Revolution’s death toll and of the extent of emigration vary, but virtually no researcher believes that Mexico’s population declined by this much.
There is debate, however, over how far off the tally is. One widely cited figure, from Gilberto Loyo, puts the undercount at around 500,000 people, though even this estimate is contested. Some scholars argue that the apparent drop was driven partly by an overcount in 1910, a claim that is likewise disputed. What is clear is that the error varies across the country. González Navarro finds evidence of undercounts reaching up to 10 percent in the states of Guanajuato and Campeche (p. 33), while other places seem to have been overcounted.
An especially egregious case of miscounting and probable fraud, discussed both in Greer’s 1966 master’s thesis (cited in footnote 2) and by González Navarro, is the population count for the state of Colima. The figure above presents the recorded population change in the census (relative to 1900) in Colima, its neighboring states (Jalisco and Michoacán), and the nation as a whole. In contrast to its neighbors and to the rest of the country, Colima’s recorded population rose dramatically in 1921 (up 18% from 1910 and over 40% from 1900), only to fall again by around a third by the 1930 count. This fluctuation almost certainly did not occur: there is no corroborating evidence that Colima’s population skyrocketed and then collapsed over the 1920s. González Navarro and others believe that the most likely explanation is that officials greatly inflated the 1921 figures for electoral reasons (p. 34).
Historical evidence suggests that the census was carried out accurately and efficiently in some parts of the country. The problem is that it is difficult, and perhaps impossible, to determine where the figures are reliable and where they are not. Unlike with the more reliable 1930 count, researchers cannot consult the individual census sheets to investigate evidence of fraud (missing microdata is a problem for Mexico’s 1980 census as well). Scholars attempting to estimate population growth, emigration, or war casualties at the national level have tried to salvage the 1921 data in various ways, usually by making ad hoc adjustments based on estimated undercounting. This entails making assumptions, often strong ones, about the size of measurement error in different places based on information from other sources, such as earlier and later censuses, vital statistics, or counts of the Mexican immigrant population in the United States.
It is very hard to know whether any of these adjustments are valid in this context. The Revolution devastated parts of the country and led to large-scale emigration. Other disruptive conflicts erupted in the 1920s, notably the Cristero War, and these had enormous impacts in certain areas. This makes it especially hard to know how much of the unusual population change between censuses is due to error as opposed to war casualties, emigration, or internal displacement. Given the well-documented implementation issues with the 1921 census, many scholars are skeptical of using these data at all. In his assessment of the death toll of the Revolution, McCaa writes that “the 1921 numbers are so fundamentally flawed that ignoring them entirely…seems a more prudent strategy” (p. 377).
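To make the logic of these benchmark adjustments concrete, here is a minimal Python sketch of one simple version. It assumes (our illustrative assumption, not the method of any particular scholar cited here) that a region’s population grew at a constant annual rate between two trusted counts, interpolates an “expected” 1921 figure from them, and reports how far the recorded figure deviates. The function names and the example numbers are hypothetical.

```python
import math


def interpolate_geometric(pop_a: float, year_a: float,
                          pop_b: float, year_b: float,
                          year: float) -> float:
    """Expected population at `year`, assuming constant exponential growth
    between two benchmark counts (a hypothetical simplification)."""
    if pop_a <= 0 or pop_b <= 0:
        raise ValueError("benchmark populations must be positive")
    if year_a == year_b:
        raise ValueError("benchmark years must differ")
    # Constant annual growth rate implied by the two benchmark counts.
    rate = math.log(pop_b / pop_a) / (year_b - year_a)
    return pop_a * math.exp(rate * (year - year_a))


def relative_deviation(recorded: float, expected: float) -> float:
    """Share by which a recorded count exceeds (+) or falls short of (-)
    the benchmark expectation."""
    if expected <= 0:
        raise ValueError("expected population must be positive")
    return (recorded - expected) / expected


# Hypothetical region: 100,000 people in 1910 and 121,000 in 1930,
# with a recorded 1921 count of 130,000.
expected_1921 = interpolate_geometric(100_000, 1910, 121_000, 1930, 1921)
deviation = relative_deviation(130_000, expected_1921)
print(f"expected 1921 population: {expected_1921:,.0f}")
print(f"recorded count deviates by {deviation:+.1%}")
```

In this made-up example the recorded 1921 figure sits roughly 17 percent above the interpolated benchmark, the kind of gap that flags a possible overcount like Colima’s. The catch, as the next paragraph stresses, is that the constant-growth assumption is exactly what the Revolution, the influenza pandemic, and emigration violate, so a deviation of this kind cannot by itself separate measurement error from real demographic shocks.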
Why do people still use the 1921 census?
Based on the above discussion, one might wonder why researchers, officials, journalists, and the general public still use and cite the 1921 census. An obvious reason is that the flaws with the census are not widely known or publicized. Outside of a small group of scholars, few have written about or discussed these issues. It is easy to consult and download digitized tables from the 1921 census without necessarily knowing that there is anything wrong with the data.
On its website (screenshot above), Mexico’s National Institute of Statistics and Geography (INEGI) presents the 1921 census alongside the other censuses from 1895 to 2020. If you read the fine print under “Momento histórico,” you’ll see a disclaimer about some of the implementation issues—lack of funding, ongoing warfare, political conflict, and so on—but the commentary ends on a positive note, highlighting the “indisputable” importance of the results. If you skip that tab and simply scroll down to the data section (as many of us would), there are dozens of tables from the 1921 census available for download in Excel format without any asterisk or warning. The 1921 data are also presented as fact in other sources: later censuses, the Historical Archive of Localities (which is problematic for other reasons, as we discuss in the online appendix of this paper), and many other secondary sources.
Even specialists may have little idea of how problematic the figures are, let alone generalists or the general public. As the “official” count, the 1921 census remains the natural source to consult for basic demographic information about this time period.
Lessons for HPE scholars
This is a sad story in many ways. We would love to know what the population of Mexico really looked like in 1921. There are important historical debates about the impacts of the Revolution that are far more difficult to resolve given the unreliability of these data. This was the only census to ask about respondents’ perceptions of racial/indigenous heritage, which is interesting for many reasons, and it is one of the only sources of subnational data on education and employment at a critical time in Mexican history. Many people invested a great deal of time and effort in trying to implement and then publish the 1921 census under extremely difficult circumstances. It is heartbreaking to think that these efforts may have been wasted.
For Mexico specialists, one obvious lesson is that 1921 census data should be interpreted with a great deal of caution. This is not to say that nothing can be learned from this census, but there is clear evidence of severe problems that differ across states and regions. It is certainly worth doing some investigation before relying on any of these figures in empirical work.
More generally, this case illustrates how poor-quality information from a century ago can continue to find its way into academic work and public discussion. The 1921 census looks like a reliable source. It is promoted by the government and used by respected scholars. The digitized information is incredibly comprehensive and easy to download. Most of us only learned of the problems over time and through informal discussion with other specialists. While we are trained to think about potential sources of error when we put together our own datasets, it is tempting to take official sources like this one at face value, especially when the data are cited so frequently. These errors can distort our assessment of important substantive questions, such as how the decade of Revolution affected Mexico’s economy and society.
Finally, Mexico’s 1921 census provides one more reminder that if something looks too good to be true, perhaps it is. When I first started studying Mexico in graduate school, I was shocked at how easy it was to find detailed historical data on this period. How is it possible that someone can simply download a locality-level dataset on population in the immediate aftermath of the Revolution, a time when the central government had only tenuous political control and violent conflict continued in much of the country? The answer is that it isn’t possible, not really. There are many obvious errors and omissions if you look closer, some of which are severe. The danger is that these problems are not always clear at the outset.
[1] Unless otherwise noted, references to González Navarro refer to the first volume of his 1974 Población y Sociedad en México (1900–1970). México, DF: UNAM.
[2] The above and many other issues are described in detail in Robert G. Greer’s unpublished M.A. thesis, “The Demographic Impact of the Mexican Revolution, 1910–1921,” University of Texas at Austin, 1966.