New article in Debates in the Digital Humanities 2016

Now available online, accessible to all: Alien Reading: Text Mining, Language Standardization, and the Humanities. Preview: For all the talk about how computers allow for new levels of scale in humanities research, new debates over institutional structures, and new claims to scientific rigor, it is easy to lose sight of the radical difference between the way … Read more

New article in ELH

Collin Jennings and I have a new article in ELH: “A Scientifical View of the Whole”: Adam Smith, Indexing, and Technologies of Abstraction. Abstract: While numerous literary scholars have raised concerns about the capacity of computational methods to reveal unrecognized features of literary form and content, few have explored the approach of interpreting these methods in relation … Read more

The Depoeticizer

UPDATE 2020: I have since made a much more advanced program along these lines using neural networks. The Depoeticizer changes a text to bring it more in line with the expectations created by a statistical language model. Specifically, it uses an ngram model—the same kind that underlies both autocorrect and Markov-chain nonsense generators like the … Read more
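To make the general idea concrete, here is a minimal sketch of an n-gram "normalizer." It is not the Depoeticizer's actual code: the bigram order, the `threshold` parameter, the function names `train_bigrams` and `depoeticize`, and the toy corpus are all assumptions chosen for illustration. The sketch replaces any word that the model finds improbable after the preceding word with that word's most frequent successor.

```python
from collections import Counter, defaultdict

def train_bigrams(tokens):
    """Count how often each word follows each preceding word."""
    counts = defaultdict(Counter)
    for prev, word in zip(tokens, tokens[1:]):
        counts[prev][word] += 1
    return counts

def depoeticize(tokens, counts, threshold=0.2):
    """Replace words whose bigram probability falls below `threshold`
    with the most frequent successor of the preceding (output) word.
    Hypothetical helper; the threshold value is an arbitrary assumption."""
    if not tokens:
        return []
    out = [tokens[0]]
    for word in tokens[1:]:
        followers = counts.get(out[-1])
        if followers:  # leave the word alone if the context was never seen
            total = sum(followers.values())
            if followers[word] / total < threshold:
                word = followers.most_common(1)[0][0]
        out.append(word)
    return out

# Toy training corpus, invented for this example.
corpus = "the cat sat on the mat and the cat sat on the rug".split()
model = train_bigrams(corpus)
result = depoeticize("the cat danced on the mat".split(), model)
print(" ".join(result))  # the cat sat on the mat
```

Note that each replacement conditions on the already-rewritten previous word, so one substitution can influence the next. A real system would need smoothing and a far larger corpus.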

Now online: The Distance Machine

My latest project is The Distance Machine, a new tool for visualizing language change. The Distance Machine uses historical data about word usage to highlight words in a text that were unusual at a selected point in time. The idea behind The Distance Machine is to raise a question that we do not often ask when reading … Read more
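The highlighting step described above might be sketched roughly as follows. This is an illustration under stated assumptions, not The Distance Machine's implementation: the `freq_by_year` table, its frequency values, the `threshold` cutoff, and the function name `unusual_words` are all invented for the example.

```python
def unusual_words(text, freq_by_year, year, threshold=1e-6):
    """Return the words in `text` whose relative frequency in `year`
    is below `threshold`. Words missing from the table count as 0."""
    table = freq_by_year.get(year, {})
    flagged = []
    for raw in text.lower().split():
        word = raw.strip(".,;:!?\"'()")  # crude punctuation stripping
        if word and table.get(word, 0.0) < threshold:
            flagged.append(word)
    return flagged

# Made-up relative frequencies, for illustration only.
freq_by_year = {
    1800: {"the": 0.06, "carriage": 2e-5, "arrived": 3e-5},
    2000: {"the": 0.05, "carriage": 2e-6, "arrived": 2e-5, "email": 1e-5},
}

flagged_1800 = unusual_words("The email arrived.", freq_by_year, 1800)
flagged_2000 = unusual_words("The email arrived.", freq_by_year, 2000)
print(flagged_1800)  # ['email']
print(flagged_2000)  # []
```

Changing the selected year changes which words stand out, which is the effect the tool is meant to make visible.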

Coleridge Bot

Update 2024: The Twitter bot was shut down last year due to API policy changes. In her 1855 autobiography, Harriet Martineau recounts a meeting with Samuel Taylor Coleridge near the end of his life: I am glad to have seen his weird face, and heard his dreamy voice; and my notion of possession, prophecy,—of involuntary … Read more

New article in Literary and Linguistic Computing

Collin Jennings and I have an article in Literary and Linguistic Computing: Visibility and meaning in topic models and 18th-century subject indexes. Abstract: This article addresses the ‘meaning problem’ of unsupervised topic modeling algorithms using a tool called the Networked Corpus, which offers a way to visualize topic models alongside the texts themselves. We argue that … Read more