Collin Jennings and I have an article in Literary and Linguistic Computing: ‘Visibility and meaning in topic models and 18th-century subject indexes’.
This article addresses the ‘meaning problem’ of unsupervised topic modeling algorithms using a tool called the Networked Corpus, which offers a way to visualize topic models alongside the texts themselves. We argue that the relationship between quantitative methods and qualitative interpretation can be reframed by investigating the long history of machine learning procedures and their antecedents. The new method of visualization presented by the Networked Corpus enables users to compare the results of topic models with earlier methods of topical representation such as the 18th-century subject index. Although the article provides a brief description of the tool, its primary focus is to argue for this kind of comparative analysis between topic models and older genres that perform similar tasks. Such comparative analysis provides a new method for developing conceptual histories of the categories of meaning on which the topic model and the index depend. These devices are linked by a shared attempt to represent what a text is ‘about’, but the concept of ‘aboutness’ has evolved over time. The Networked Corpus enables researchers to discover congruities and contradictions in how topic models and indexes represent texts in order to examine what kinds of information each historically situated device prioritizes.