As someone who does historical research, I think the most common question I get is “Where can I find historical data on X?”
Now, this is actually a very intelligent question (although unfortunately I don’t always know the answer). While some locations of historical data are obvious, especially if it’s a matter of public record, many times ye olde data remains elusive.
In any case, often the best advice I can give is to find a scholar who works on that particular historical topic, and ask them what they’ve seen — during their research, in their archives, or as part of their networks. This is the quickest way to effectively jumpstart your historical data search.
That being said, today I’d like to share some advice for finding historical data online. In this post I’ll focus specifically on key databases and search engines. While not exhaustive, these resources can provide a quick start for an eager HPE scholar.
At Broadstreet, we’ll also be hosting various data resources; stay tuned. And I’ll return to national libraries, as well as data collection and digitization, in future posts. Until then, happy hunting!
1. Dataset Repositories
One place to start your historical data search is with the powerhouse data repositories in the social sciences:
- Harvard Dataverse: https://dataverse.harvard.edu
- ICPSR (Inter-university Consortium for Political and Social Research): https://www.icpsr.umich.edu
There are many advantages to using pre-existing data repositories. The empirical datasets found here have been digitized and cleaned for use in academic research, and they often include extensive documentation of the variables and samples within (as well as replication code for published work). Data repositories often include all types of data, too: don’t forget images and maps (sometimes with geocoded locations), as well as qualitative or public records data that have been digitized and put online. While some datasets are restricted to academic institutions or members, a growing number are “open access” (viewable to the general public).
Take advantage of advanced filtering by keyword and time period to find the data you need. Search terms such as “historical data” or “witchcraft trials” can be used to zero in on specific sources, though sometimes you’ll need to be creative with keywords. Repository records are also linked to existing studies that use the data. This makes it easy to see not only what research has already been done, but also the nature of the data and any assumptions or biases the authors had to address (typically discussed in the published piece). You might also find similar papers on your selected topics, which might lead you to the data you need.
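If you'd rather script a keyword search than click through the web interface, here is a minimal sketch in Python. It assumes the Harvard Dataverse exposes the standard Dataverse Search API at `/api/search`, and that each result carries `name` and `global_id` fields; the helper names `build_search_url` and `summarize` are my own, so check the API guide before relying on any of this.

```python
from urllib.parse import urlencode

# Assumption: the public Search API endpoint of the Harvard Dataverse installation.
SEARCH_ENDPOINT = "https://dataverse.harvard.edu/api/search"


def build_search_url(keywords, per_page=10):
    """Build a Search API URL that looks for datasets matching a keyword phrase."""
    params = {
        "q": f'"{keywords}"',   # quote the phrase so it is searched as a unit
        "type": "dataset",      # return datasets, not individual files or collections
        "per_page": per_page,   # number of results per page
    }
    return f"{SEARCH_ENDPOINT}?{urlencode(params)}"


def summarize(payload):
    """Pull (title, persistent identifier) pairs out of a decoded JSON response.

    Assumption: results live under payload["data"]["items"].
    """
    items = payload.get("data", {}).get("items", [])
    return [(item.get("name"), item.get("global_id")) for item in items]


if __name__ == "__main__":
    # Paste the printed URL into a browser, or fetch it with your HTTP client of choice.
    print(build_search_url("witchcraft trials"))
```

The sketch deliberately stops short of making the network request, so you can inspect the URL first and decide how to fetch and store the results.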
Using these repositories, you can also search across studies by specific variables. For example, say you wanted to know which ICPSR datasets had variables that referenced “clergy” (maybe recording whether the respondent was part of the Church, or data on numbers of church staff, etc.). If you search this term in the ICPSR within a specific time period (i.e., before 1900), you’ll find clergy variables appearing in “Height of Students of the Ecole Polytechnique, 1794-1887,” “Japanese-American Research Project (JARP): a Three-Generation Study, 1890-1966,” “Pennsylvania Abolition Society: Census of 1838,” and “Executions in the United States, 1608-2002” (no comment on that last one).
What if your ideal data is interdisciplinary? The Harvard Dataverse demonstrates the utility of using repositories to search for data across disciplines, and even provides links to other thematic databases. For example, a quick search for historical data for Africa will take you to the World-Historical Dataverse (a separate resource provided by CHIA at the University of Pittsburgh). Via the Harvard search engine, you’ll easily find data on African Population Estimates, 1850-1960, Religion in Punjab, India 1901-1931, or even a handy list of all the datasets in the World-Historical Dataverse.
But what about older data? While the further back you go, the harder it is to find publicly available data, there are still resources for those wishing to study antiquity. Check out the Digital Atlas of Roman and Medieval Civilization (DARMC), which has a Geodatabase of Ancient Ports and Harbours, the Roman Road Network, and even a source for Carolingian Coin Hoards: AD 751-987.
Another pattern I’ve noticed is that folks are generally surprised by the sheer variety of historical data that can be found in ICPSR and Harvard Dataverse. Yes, much of the data leans towards more recent centuries and established democracies. However, don’t underestimate the range. To prove my point:
- Opium export data for New York Chamber of Commerce, 1870-1912
- Slave Routes Datasets, 1650s – 1860s
- Marriage Strategy Among the European Nobility, 1500-1800
- Operation Barbarossa and Soviet Counterattacks, 1941
- United States Historical Election Returns, 1788-1823 (ICPSR 79)
- Basemaps of Intendencias in Colonial Spanish America, 1775-1808
- United Nations Roll Call Data, 1946-1985 (ICPSR 5512)
- Children’s Wages in Britain, 1280-1860
- Homicides in New York City, 1797-1999 [And Various Historical Comparison Sites] (ICPSR 3226)
2. Google Dataset Search: https://datasetsearch.research.google.com
Google Dataset Search is a relatively new resource, now out of beta testing and extremely easy to use. It allows for simple keyword searches across multiple languages, but more importantly, it is able to trawl through the entire internet to look for data.
This engine also searches existing databases, like Harvard Dataverse, so some results might be duplicative. But it also includes search results from a wide variety of sources, which might increase the chance you find the historical data you want. This is because, particularly across disciplines, there’s wide variation in the extent to which scholars submit data to a centralized database (as opposed to hosting it at their university, personal website, or grant funding institution).
For example, if you use Google Dataset Search to search for “witchcraft trials,” you’ll find among the results a research project called “Survey of Scottish Witchcraft, 1563 – 1736” that created a dataset of all individuals accused of witchcraft in Scotland. Google brings up a link to another search engine for data, called DataCite; keep clicking through and you’ll be brought to the actual data, which is hosted by the University of Edinburgh. In a few painless clicks, you’ve found a dataset you may not have found using other methods.
The downside of such a wide reach is that results are sometimes…questionable. A search for “pirates 1800” will return a website for male steampunk costumes, for example, which might not be the research angle you are going for. Still, it’s a promising new resource, and worth checking out.
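Since Dataset Search also indexes repositories like Harvard Dataverse, you may end up with the same dataset twice if you pool hits from both. Here is a rough sketch of one way to collapse those duplicates by DOI. It assumes you've gathered your hits into a list of dictionaries with an optional `doi` key (a layout I've made up for illustration), and the function names are hypothetical. The one fact it leans on is that DOIs are case-insensitive and are commonly written with a `doi:` or `https://doi.org/` prefix.

```python
# Common ways a DOI gets written; stripping these leaves the bare identifier.
DOI_PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
    "doi:",
)


def normalize_doi(raw):
    """Reduce a DOI to a bare, lowercase form so that variants compare equal."""
    doi = raw.strip().lower()
    for prefix in DOI_PREFIXES:
        if doi.startswith(prefix):
            return doi[len(prefix):]
    return doi


def dedupe_by_doi(records):
    """Keep the first record seen for each DOI; records without a DOI are always kept."""
    seen = set()
    unique = []
    for record in records:
        raw = record.get("doi")
        if not raw:
            unique.append(record)   # nothing to compare on, so keep it
            continue
        key = normalize_doi(raw)
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique


if __name__ == "__main__":
    # Hypothetical hits: the DOIs below are placeholders, not real datasets.
    hits = [
        {"title": "Example dataset", "doi": "https://doi.org/10.1234/ABC"},
        {"title": "Example dataset (mirror)", "doi": "doi:10.1234/abc"},
        {"title": "Dataset without a DOI"},
    ]
    for record in dedupe_by_doi(hits):
        print(record["title"])
```

Records without a DOI are kept rather than guessed at, because matching on titles alone risks merging datasets that are genuinely different.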
This ends Finding Historical Data I; to be continued…