In today’s post, I’m going to talk about the curious case of Ancestry.com**, which turns out to be a surprising resource for academics looking for historical data.
Ancestry.com (hereafter referred to as Ancestry) has long been a resource for amateur genealogists to track down distant ancestors, but I always thought of it as a place for people to primarily chat about family trees. Recently, however, coauthors and I were working on a new HPE paper that needs early 19th century data from Georgia, and we unexpectedly found some leads at Ancestry.
Upon further investigation, it seems that increasingly one can find systematic historical data on the site — either digitized primary source documents or searchable databases (for example, the US census). Academics have access to Ancestry through university subscriptions, and many public libraries offer free access (and of course people can subscribe to the service for a fee).
It’s not for downloading ready-made datasets, but in a post-pandemic era it’s an unexpected resource for investigating historical cases and developing research projects. It also brings up some interesting questions about public/private storage of historical data (which I’ll get to below). There are also some clear caveats, however, so researchers be warned!
Finding Historical Data
Ancestry is increasingly a good place to access primary source documents for research: digitized records and photographs of census entries, deeds, draft cards, marriage licenses, peerage lists, and even voter rolls (like the one from Australia in 1928, in the header image of today’s post). It covers the US and Europe, and also has collections from elsewhere around the globe (baptismal data from Peru or Reformed Dutch Church Records from Sri Lanka, for example).
In a world with no pandemic, and infinite amounts of funding for archival research, we would always go straight to the archives. But with safety, time, and money in short supply, Ancestry is a handy way to access digitized materials, particularly in early phases of a project.
For example, say you want to look at county-level land titling in the 19th century. There is no readily downloadable dataset for this information, so you plan to take a trip and eventually digitize data you find in county archives. But say you aren’t sure if the title records will even have the data you need, and even though you’ve read as many books by historians as you can find, you have very little idea what these legal documents even look like.
Here’s where Ancestry comes in — you can actually access assorted digital scans; the photo below comes from Records of Land Titling in Savannah, Georgia, in 1853.
Importantly, this is just one example of this type of document — we don’t know if it’s representative or why it was uploaded in the first place, and can’t really use this for any type of inference. But it gives us an idea of what a document could look like, the data it might contain, whether it’s worth a trip to the archives, and even how much money we need to set aside in our grant for digitization costs!
As always, there are some caveats. Digital photos of primary source data are well and good, but sometimes they are missing the necessary source or attribution data (meaning we shouldn’t use them without verification). Selection bias could affect what data is uploaded to the site. Academics should also refrain from using data that is user-sourced or self-compiled with no primary source document to verify it, because errors and biases abound.
Digital primary source materials on Ancestry.com can’t take the place of proper data sampling frames in your research design or trips to the archives, but they can provide insight into the historical case and a better sense of how to execute your project.
Another interesting facet of Ancestry is that it partners with external organizations to create dedicated and stable digital archives. The US census is a good example of this, in that the National Archives worked with Ancestry to put census records online. Its search feature will bring up all demographic data associated with an individual entry, show a digitized scan of the entry itself, and cross-check with related household members. You don’t have to search for specific names, either; you can narrow to geographic locations and years, and then poke around.
If you need to download complete datasets of historical census data for the US, your best bet is still a site like ICPSR or IPUMS. (Though Ancestry collaborated with IPUMS on most of these datasets!) But if you want to explore the data with easier functionality, or have specific people or places you want to research, Ancestry.com is a surprisingly useful resource.
In another example, Ancestry partnered with the USC Shoah Foundation to make over 19 million Holocaust records accessible (free) and searchable to the general public. This includes digitized records from the Arolsen Archives in Germany. This was not without controversy: Lerner (2021) offers a useful discussion of making such sensitive data public, and of the fact that while Holocaust materials are free to consult, other databases are behind a paywall. But this collection is massive, and has unique resources, including thousands of video histories, that would otherwise be out of reach for many scholars.
Curiouser and Curiouser
As data is increasingly monetized, this brings up a larger issue for historical data. Ancestry.com is a for-profit site, and its collaborations are public-private archival partnerships. While helping to expand public access to archival records is a noble endeavor, there are naturally some concerns, which are best highlighted in a study by Kriesburg (2017). One is that monetization drives which archives are digitized (and who has first access to them). Another is that the process of digitization and curation from the archives takes many forms, and so might suffer from a lack of expertise if outsourced. Finally, Kriesburg notes that third party platforms may “sever” the link between users and the original archival institutions: users may lose track of where these records originated, and lose access to the expertise of the sites that permanently house them.
This is a larger point for HPE scholars to consider, because Ancestry.com seems to be effectively becoming a broad-ranging repository for all sorts of historical data. We might return to this in future posts!
**AncestryDNA is a different kettle of fish, and DNA privacy is outside my expertise!