Tuesday, October 21, 2014

The Most Decade-Specific Words of the Past Two Centuries

Click to see a zoomable standalone image:

This is from an analysis of Brigham Young University's Corpus of Historical American English, sort of a way-better-curated and easier-to-search version of Google Ngram Viewer. It covers a selected corpus of English from different genres and sources from 1810 to 2009.

Of course, the analysis is biased towards words at the beginning or end of the date range. We haven't stopped using the top word, 'soviet' (and we probably never will); as the decades pass, its frequency-per-decade metric will keep declining, barring an unexpected return of the USSR. 'Soviet' also gets a boost because it's both a common and a proper noun, and I only used words that appeared in the Moby Scrabble list, which excludes proper nouns. I decided to leave this word in, since it's totally different in usage from the top proper nouns that were excluded.

Almost all of these words are modern ones, showing that the English vocabulary has been more in flux in modern times (the results are normalized per decade, so the terms do indeed take up a higher percentage of all words in the corpus from that decade). There are only five words that were not used in the first decade of the 21st century, and some of them are common-and-proper like 'soviet'. They are also the only words used in six or fewer decades.

Words that were used in 16 decades or more were omitted; they were mostly uninteresting words like articles, prepositions, etc. that would have been removed by a common stoplist anyway.
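For readers who want a feel for the filtering and normalization described above, here is a rough sketch in Python. It is not the code from the notebook: the function name, the input layout, and especially the scoring rule (the share of a word's normalized frequency that falls in its peak decade) are assumptions made for illustration. Only the per-decade normalization, the word-list filter, and the 16-decade cutoff come from the post.

```python
from collections import defaultdict


def decade_specificity(counts, decade_totals, allowed_words, max_decades=15):
    """Score how concentrated each word is in a single decade.

    counts        -- {(word, decade): raw count} (hypothetical input layout)
    decade_totals -- {decade: total number of tokens in that decade}
    allowed_words -- set of words to keep (e.g. a Scrabble word list)
    max_decades   -- drop words that appear in more decades than this
    """
    # Normalize each raw count by the size of that decade's corpus,
    # keeping only words found in the allowed list.
    rel = defaultdict(dict)
    for (word, decade), n in counts.items():
        if word in allowed_words and n > 0:
            rel[word][decade] = n / decade_totals[decade]

    scores = {}
    for word, by_decade in rel.items():
        # Words used in 16 or more decades are omitted, as in the post.
        if len(by_decade) > max_decades:
            continue
        peak = max(by_decade, key=by_decade.get)
        # Assumed score: fraction of normalized frequency in the peak decade.
        scores[word] = (peak, by_decade[peak] / sum(by_decade.values()))
    return scores
```

A score near 1 would mean the word lives almost entirely in one decade; the real analysis may well rank words differently.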

Link to GitHub Repo

Link to IPython Notebook

