Tuesday, October 21, 2014

The Most Decade-Specific Words of the Past Two Centuries

Click to see a zoomable standalone image:

This is from an analysis of Brigham Young University's Corpus of Historical American English, sort of a way-better-curated and easier-to-search version of Google Ngram Viewer. It covers a selected corpus of English from different genres and sources from 1810 to 2009.

Of course, the analysis is biased towards words at the beginning or end of the date range. We haven't stopped using the top word, 'soviet' (and we probably never will); as the decades pass, its frequency-per-decade metric will decline and decline, barring an unexpected return of the USSR. 'Soviet' also gets a boost because it's both a common and a proper noun, and I only used words that appeared in the Moby Scrabble list, which excludes proper nouns. I decided to leave this word in, since it's totally different in usage from the top proper nouns that were excluded.

Almost all of these words are modern ones, suggesting that the English vocabulary has been more in flux in modern times (the results are normalized per decade, so the terms do indeed take up a higher percentage of all words in the corpus from that decade). Only five words were not used in the first decade of the 21st century, and some of them are common-and-proper like 'soviet'. They are also the only words used in six or fewer decades.

Words that were used in 16 decades or more were omitted; they were mostly uninteresting words like articles, prepositions, etc. that would have been removed by a common stoplist anyway.
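For readers who want a feel for how such a ranking could be computed, here is a minimal sketch in Python. It is not the code from the notebook, and the scoring rule is an assumption: each word's count is divided by the decade's total word count, and the word is scored by the share of its summed normalized frequency that falls in its peak decade. The names `decade_specificity`, `counts`, `totals`, and `allowed_words` are hypothetical, with `allowed_words` standing in for the Moby Scrabble list.

```python
def decade_specificity(counts, totals, allowed_words, max_decades=15):
    """Score how concentrated each word is in its peak decade.

    counts:        {word: {decade: raw count}}  (hypothetical input shape)
    totals:        {decade: total words in the corpus for that decade}
    allowed_words: set of words to keep (stand-in for the Scrabble list)
    max_decades:   words used in more decades than this are omitted
    Returns {word: (peak_decade, share_of_normalized_frequency_in_peak)}.
    """
    results = {}
    for word, by_decade in counts.items():
        if word not in allowed_words:
            continue  # skip proper nouns and anything else not on the list
        used = {d: c for d, c in by_decade.items() if c > 0}
        if not used or len(used) > max_decades:
            continue  # drop words spread over 16 or more decades
        # Normalize per decade so large decades don't dominate.
        freqs = {d: c / totals[d] for d, c in used.items()}
        peak = max(freqs, key=freqs.get)
        results[word] = (peak, freqs[peak] / sum(freqs.values()))
    return results


if __name__ == "__main__":
    # Toy numbers, invented purely for illustration.
    decades = list(range(1810, 2010, 10))
    totals = {d: 1_000_000 for d in decades}
    counts = {
        "soviet": {1950: 500, 1960: 300, 2000: 200},
        "the": {d: 60_000 for d in decades},
    }
    scores = decade_specificity(counts, totals, {"soviet", "the"})
    for word, (peak, share) in sorted(scores.items(), key=lambda kv: -kv[1][1]):
        print(word, peak, round(share, 3))
```

With the toy data, 'the' is dropped because it appears in all 20 decades, while 'soviet' is kept and assigned to its peak decade.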

Link to GitHub Repo

Link to IPython Notebook


  1. "Video" was common in the 1820s and there is no mention in the article.

    I'm curious.

    1. Most of the early uses of 'video' were either quotes in Latin (it means 'I see') or the archaic spelling of the capital of Uruguay, Monte Video.

  2. As far as I know, all meanings of "nazi" are derived from the National Socialist German Workers Party (aka Nazi Party), which was founded in 1920. How do you account for the use of that term before that date?

    1. They're OCR (optical character recognition) errors; the OCR engine tries to find a word in its dictionary, and it isn't smart enough to take dates into account. For example, an entry from 1901 has the following: "...and the conduct of the Vs.'. we have lately heard little. There Nazi been no news from South Africa to speak of all the week. Whenever that..." I would guess the OCR mistook "has" for "Nazi"!


