The Distance Machine shows how the word “underlie” emerged from obscurity in the mid-nineteenth century.

My two most important digital humanities projects are the Distance Machine, a web application that identifies places where a historical text uses a word that had not yet come into common use, and the Networked Corpus, which visualizes statistical topic models of texts. My articles on these projects can be found on my publications page.

I am the developer and maintainer of the “bursts” package for the R statistical computing environment. This package implements Jon Kleinberg’s burst detection algorithm, which can be used to detect bursts of activity in streams of temporal data.
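Kleinberg's model treats the gaps between successive events as exponentially distributed and lets a hidden state move up to levels with higher event rates, paying a penalty for each step up. The cheapest sequence of levels is then found with the Viterbi algorithm. The following is a minimal Python sketch of that idea, not the code of the bursts package. The function name `kleinberg_bursts`, the defaults `s = 2` and `gamma = 1`, and the fixed number of levels `k` are choices made only for this illustration.

```python
import math


def kleinberg_bursts(offsets, s=2.0, gamma=1.0, k=2):
    """Assign a burst level (0 = baseline) to each gap between successive events.

    Sketch of Kleinberg's k-level automaton: level i emits gaps from an
    exponential distribution with rate (n / T) * s**i, moving up one level
    costs gamma * ln(n), and moving down is free.
    """
    offsets = sorted(offsets)
    gaps = [b - a for a, b in zip(offsets, offsets[1:])]
    n = len(gaps)
    if n == 0:
        return []
    total = sum(gaps)
    if total <= 0:
        raise ValueError("events must span a positive length of time")
    # Level 0 uses the average event rate; each higher level multiplies it by s.
    rates = [(n / total) * s**i for i in range(k)]

    def transition_cost(i, j):
        return (j - i) * gamma * math.log(n) if j > i else 0.0

    def gap_cost(i, x):
        # Negative log-likelihood of gap x under an exponential with rate rates[i].
        return rates[i] * x - math.log(rates[i])

    # Viterbi: cost[j] is the cheapest explanation of the gaps so far ending at level j.
    cost = [0.0] + [math.inf] * (k - 1)  # start at the baseline level
    backpointers = []
    for x in gaps:
        new_cost, pointers = [], []
        for j in range(k):
            best = min(range(k), key=lambda i: cost[i] + transition_cost(i, j))
            new_cost.append(cost[best] + transition_cost(best, j) + gap_cost(j, x))
            pointers.append(best)
        cost = new_cost
        backpointers.append(pointers)

    # Trace the cheapest path backwards from the final gap.
    level = min(range(k), key=lambda j: cost[j])
    levels = []
    for pointers in reversed(backpointers):
        levels.append(level)
        level = pointers[level]
    return levels[::-1]
```

For events spaced ten units apart with a run of ten one-unit gaps in the middle, this sketch places only the ten short gaps at level 1. Raising `gamma` makes the detector more reluctant to declare a burst.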

I have done a number of experiments with algorithmically generated text. The Coleridge Bot is an algorithmic poetry generator that creates nonsense verse in rhymed meter. The Depoeticizer alters a text to make it less poetic—that is, more in line with the expectations created by a statistical language model.

I have also participated twice in Darius Kazemi’s National Novel Generating Month, which challenges people to write a computer program that produces a 50,000-word novel. My 2014 entry, The Story of John, compiles all the sentences mentioning the name “John” in a corpus of American fiction to create a single novel about a character called John. For the 2015 competition, I wrote a program called the Synonymizer that replaces every word in a text with a random synonym. I used this program to turn Henry James’s novel The Portrait of a Lady into The Portrayal of a Ma’am.
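The core step of the Synonymizer, swapping words for randomly chosen synonyms, can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the program's actual code. The tiny `SYNONYMS` table is a hypothetical stand-in for a real thesaurus, and the function name `synonymize` is invented here. In this sketch, words with no entry are left unchanged, and an initial capital is carried over to the replacement.

```python
import random
import re

# Hypothetical toy thesaurus; a real program would need a full lexical resource.
SYNONYMS = {
    "portrait": ["portrayal", "depiction", "likeness"],
    "lady": ["ma'am", "woman", "dame"],
    "story": ["tale", "account", "narrative"],
}

WORD = re.compile(r"[A-Za-z']+")


def synonymize(text, thesaurus=SYNONYMS, rng=None):
    """Replace each word that has an entry in `thesaurus` with a random synonym."""
    rng = rng or random.Random()

    def swap(match):
        word = match.group(0)
        choices = thesaurus.get(word.lower())
        if not choices:
            return word  # no synonym known: keep the original word
        replacement = rng.choice(choices)
        # Preserve an initial capital so titles and sentence starts still look right.
        if word[0].isupper():
            replacement = replacement[0].upper() + replacement[1:]
        return replacement

    return WORD.sub(swap, text)
```

Passing a seeded `random.Random` as `rng` makes the output reproducible from run to run.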

Back in 2003, I created Homespring, an absurdist programming language based on the metaphor of data as fish swimming in rivers. Since then, a few other enthusiasts of esoteric programming languages have created their own versions. I have gathered links related to Homespring here.

The source code for most of these projects is available on my GitHub page.