The Depoeticizer

The Depoeticizer changes a text to bring it more in line with the expectations created by a statistical language model. Specifically, it uses an n-gram model, the same kind that underlies both autocorrect and Markov-chain nonsense generators like the Coleridge Bot. But instead of using this model to generate new text, the Depoeticizer uses it to “correct” an existing text, altering words by one or two characters so as to bring the text closer to the norm. It is, in effect, a spell-checker gone berserk, which can be especially useful if you want to make a poem look more like the language of AP news feeds and financial newspapers.
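To make the idea concrete, here is a minimal sketch of the two ingredients described above: an n-gram model (a bigram model, for brevity) and a way to find words that differ from a given word by a small edit. This is not the Depoeticizer's actual code. The toy corpus, the add-one smoothing, the restriction to a single-character edit, and the function names `train_bigrams`, `bigram_prob`, and `one_edit_away` are all assumptions made for illustration.

```python
from collections import Counter

def train_bigrams(tokens):
    """Count unigrams and bigrams in a list of tokens."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    return unigrams, bigrams

def bigram_prob(prev, word, unigrams, bigrams):
    """Estimate P(word | prev) with add-one smoothing over the seen vocabulary."""
    vocab_size = len(unigrams)
    return (bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size)

def one_edit_away(word, vocab):
    """Vocabulary words reachable by one insertion, deletion, or substitution."""
    letters = "abcdefghijklmnopqrstuvwxyz"
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = {a + b[1:] for a, b in splits if b}
    subs = {a + c + b[1:] for a, b in splits if b for c in letters}
    inserts = {a + c + b for a, b in splits for c in letters}
    return ((deletes | subs | inserts) & set(vocab)) - {word}

# Toy stand-in for a news-style training corpus (an assumption, not real data).
corpus = "the market rose and the market fell and the dollar rose".split()
unigrams, bigrams = train_bigrams(corpus)

print(one_edit_away("rise", unigrams))                # {'rose'}
print(bigram_prob("the", "market", unigrams, bigrams))
```

In this toy setting, a word such as “rise” has one neighbor in the vocabulary (“rose”), and the bigram model can then say how expected each option is after the preceding word. A real model would be trained on far more text and would typically use longer n-grams and better smoothing.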

This program includes four of the Continuous Speech Recognition-III models that are available here. It works by assuming that the text was generated by such a model, but that it has a certain chance of containing “typos.” Based on this assumption, it (approximately) reconstructs what the text most likely looked like without the “typos.” The higher the typo probability, the more likely the program is to change something. A probability of 0.5 means that the program has no preference between keeping a word as it is and changing it; a probability above 0.5 means that it prefers changing words to keeping them the same.
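The role of the typo probability can be illustrated with a simplified, word-by-word sketch. It is an assumption-laden illustration rather than the program's actual algorithm: the function name `choose_word`, the made-up probabilities in `lm_prob`, and the choice to weight every alternative by the full typo probability are all hypothetical, and the real program reconstructs the whole text approximately rather than deciding one word at a time.

```python
def choose_word(observed, candidates, lm_prob, typo_prob):
    """Pick the most probable intended word under a simple typo model.

    observed   -- the word as it appears in the text
    candidates -- nearby words that a "typo" could have started from
    lm_prob    -- dict: word -> language-model probability in this context
    typo_prob  -- assumed chance that the observed word is a typo
    """
    # Keeping the word is weighted by the chance that no typo occurred.
    best_word = observed
    best_score = (1.0 - typo_prob) * lm_prob.get(observed, 0.0)
    # Changing the word is weighted by the chance that a typo occurred.
    for cand in candidates:
        score = typo_prob * lm_prob.get(cand, 0.0)
        if score > best_score:
            best_word, best_score = cand, score
    return best_word

# Hypothetical in-context probabilities; not taken from any real model.
lm_prob = {"wood": 0.02, "word": 0.05}

print(choose_word("wood", ["word"], lm_prob, typo_prob=0.1))  # wood
print(choose_word("wood", ["word"], lm_prob, typo_prob=0.5))  # word
```

At a typo probability of 0.5 the two weights are equal, so the choice depends only on which word the language model finds more likely. Below 0.5 the original word gets a head start; above 0.5 the alternatives do.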

The source code for this program is on GitHub.