Sarah Boye

Distant Reading: Text-Mining and Topic-Modeling Tools in the Age of Big Data

Macroanalysis helps historians discover patterns that may be invisible to traditional methods of analysis. One such method is distant reading, which aggregates massive amounts of source data (also known as big data), in quantities far greater than any historian could conceivably analyze individually, to visualize long-term trends and patterns. Topic modeling is a related approach, but it relies more specifically on a “probabilistic, statistical method that can uncover themes and categories” (Nelson 2011). These methods can be used to create “textual abstractions or data visualizations in lieu of direct images” (Graham, Milligan, and Weingart 2016, 1). Data and text mining, according to E. Thomas Ewing et al., “offers historians a set of novel methods for identifying patterns, recognizing shifts, and gauging tone in a corpus of primary sources” (Ewing et al. 2014). However, historians must also establish frameworks for conceptualizing this macro-data to mitigate the risk of assuming an all-knowing grasp of the “dynamics of continuity and change” (Grossman 2012).

The benefits of turning a macroscope on the information overload created by the mass digitization of sources over the last few decades are obvious. Not only does distant reading allow us to engage with an overwhelming corpus of source material, but it also lets us use the information gleaned from these methods in new ways that were impossible for our predecessors in the field. An excellent example can be found in Edward Ayers’s big data analysis of the Richmond secession debates. In this study, Ayers addressed a modern debate over the causes of the Civil War by analyzing patterns in the original Richmond secession debate transcripts to show that “the vocabularies of slavery and rights, entangled and intertwined from the very beginning of the United States, became one and the same in the secession crisis” (Ayers 2011).

To explore these exciting methodological tools, I turned to the Google Books Ngram Viewer to search for patterns related to our Hungerford School research. In looking at the deed transfer of the property to the Orange County Public School Board in 1952, I noticed a few names that warranted further attention. One of them was the American Missionary Association (AMA), which was founded by Northern abolitionists before the Civil War. During Reconstruction, the AMA funded and established schools and churches for freed slaves in the South. Given the known association of Eatonville and the Hungerford School with Northern philanthropists, this seemed like a reasonable avenue to investigate.

The image above shows the Ngram Viewer chart for the search term “American Missionary Association.” The frequency of this term peaked in 1889 (two years after the founding of Eatonville), fell until about 1912, rose slightly and then fell again until 1923, remained fairly stable until the late 1940s, and finally fell drastically in 1960.

I next searched “Florida Normal School,” which showed a similar peak around the turn of the 20th century.

“Florida Education,” meanwhile, shows peaks in the period leading up to the sale of the Hungerford School property to Orange County Public Schools (OCPS).

Additionally, the Ngram for “Eatonville” (discounting the rise in the 1990s owing to the rediscovery of Zora Neale Hurston) shows a consistent pattern of frequency in the heyday of the Hungerford School, with a drop right around 1960.


When viewed within the wider context of Reconstruction, the incorporation of Eatonville in 1887, the building of the Hungerford Industrial and Normal School in 1897, the segregation and integration of public schools, and the resistance and response to these shifts, patterns emerge that are consistent with regional trends relating to these topics.

Above is a compilation of the aforementioned Ngrams, overlaid with transparency to illustrate the numerous points where the Ngram Viewer shows similar frequency patterns for the American Missionary Association, Florida Education, Florida Normal School, and Eatonville. Granted, these are only temporal “hits” and not comparisons of actual frequency. The AMA's national status clearly raises its frequency, making the other topics appear insignificant by comparison (see below), but the topics are connected nonetheless, and patterns can be discerned when they are viewed temporally.

The view of these topics that the Ngram Viewer generates from their weighted frequency is shown above, but I find the temporal intersections of these topics intriguing.
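One way to compare the timing of terms whose raw frequencies differ greatly is to rescale each series to a common 0 to 1 range before overlaying them. The sketch below is only an illustration of that idea, not the method used to produce the images in this post. The function names and the sample values are hypothetical placeholders and are not actual Ngram Viewer output.

```python
def min_max_scale(series):
    """Rescale a {year: frequency} mapping so its values span 0.0 to 1.0."""
    lo = min(series.values())
    hi = max(series.values())
    if hi == lo:
        # A flat series has no shape to compare; map every year to 0.0.
        return {year: 0.0 for year in series}
    return {year: (value - lo) / (hi - lo) for year, value in series.items()}


def peak_year(series):
    """Return the year with the highest value in a {year: value} mapping."""
    return max(series, key=series.get)


# Placeholder values invented for illustration; NOT Ngram Viewer output.
national_term = {1880: 40.0, 1890: 100.0, 1900: 70.0, 1910: 50.0}
local_term = {1880: 0.2, 1890: 0.5, 1900: 0.4, 1910: 0.1}

scaled_national = min_max_scale(national_term)
scaled_local = min_max_scale(local_term)

# After scaling, both series peak at 1.0, so their shapes can be overlaid
# even though the raw values differ by orders of magnitude.
print(peak_year(scaled_national), peak_year(scaled_local))
```

Rescaling discards information about how common each term actually is, so a scaled overlay should be read alongside the raw frequency chart rather than in place of it.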

Based purely on frequency, there are similarly interesting overlapping patterns between the Ngrams for Florida Education and Eatonville specifically that might warrant further study. The Ngram Viewer can clearly be a useful tool for analysis, one that illuminates unexpected patterns and may lead to new research avenues.
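If the yearly frequencies for two terms were exported, one rough way to put a number on how closely the two series move together would be a Pearson correlation over the years they share. The sketch below assumes each series is available as a simple year-to-frequency mapping; the function names are hypothetical, and no real Ngram data is included. A high value would only flag a pattern worth investigating in the sources, not demonstrate a historical connection.

```python
from math import sqrt


def aligned_values(series_a, series_b):
    """Return the values of two {year: value} mappings for their shared years."""
    years = sorted(set(series_a) & set(series_b))
    return [series_a[y] for y in years], [series_b[y] for y in years]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists of numbers."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


def series_correlation(series_a, series_b):
    """Correlate two {year: value} mappings over the years they have in common."""
    xs, ys = aligned_values(series_a, series_b)
    return pearson(xs, ys)
```

A result near 1 means the two terms rise and fall together across the shared years, a result near -1 means they move in opposite directions, and a result near 0 means there is little linear relationship.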



Bibliography


Ayers, Edward L. “The Causes of the Civil War, 2.0.” New York Times, April 28, 2011.

Cohen, Dan. “Initial Thoughts on the Google Books Ngram Viewer and Datasets.” Dan Cohen, December 2010. https://dancohen.org/2010/12/19/initial-thoughts-on-the-google-books-ngram-viewer-and-datasets/

Ewing, E. Thomas, et al. “Mining Coverage of the Flu: Big Data’s Insights into an Epidemic.” Perspectives on History, January 2014. http://www.historians.org/publications-and-directories/perspectives-on-history/january-2014/mining-coverage-of-the-flu-big-data’s-insights-into-an-epidemic

Graham, Shawn, Ian Milligan, and Scott Weingart. “Ch. 1 The Joys of Big Data for Historians.” In The Historian's Macroscope: Big Historical Data. London: Imperial College Press, 2016.

Grossman, James. “‘Big Data’: An Opportunity for Historians?” Perspectives on History, March 2012. http://www.historians.org/publications-and-directories/perspectives-on-history/march-2012/big-data-an-opportunity-for-historians

Nelson, Robert K. “Of Monsters, Men -- and Topic Modeling.” New York Times, May 29, 2011. http://opinionator.blogs.nytimes.com/2011/05/29/of-monsters-men-and-topic-modeling/
