The trendiest words in American English for each decade of 19th & 20th c. (determined by a chemistry/astronomy technique)

While “trend” has a clear mathematical definition, “trendiness” does not; I’ve chosen a method that is equally sensitive to the nerdy sense of the word (rapidity of rise/fall) AND the common meaning (“trendy” = “popular”). More explanation later; here’s the chart:

The calculations were done on the Corpus of Historical American English (COHA) from Brigham Young University. Its content list shows that the 19th-century portion is heavy on books, with newspapers gradually taking up more and more of the corpus after that. The overall amount of fiction is relatively stable, but this trend analysis is quite sensitive to corpus composition in each decade.

Until the 1920s, every popular word comes from books, and is usually a character name. For example, in the 1870s, there were at least five major books by different authors with a major character (whose name, therefore, got repeated a lot) named Elsie. It appears (at first glance, anyway) that character names had a bandwagon effect, much like baby names would 100 years later.

Also present are deliberately misspelled words like “uv” for “of” and “ter” for “to” (as in “Ah oughts ter uv dun somethin”). This was a style of satirical writing at the time; not all of it was racist, but some of it certainly was.

In the 20th century, presidents’ names dominate, except for “planes” during World War II and, surprisingly to me, EPA (the Environmental Protection Agency) in the 1990s. EPA beat out “Clinton” because his name kept being used throughout the next decade, which widens the peak, and it beat out “Bush” because that name has a common meaning as well.

I’ve used a chromatography peak technique (popular in analytical chemistry and astronomy) to analyze non-hard-science data before. Here’s a quick visual of how it works:

Here is a list of the 100 trendiest words overall:

word | trendiness | peak year | height (% popularity at peak year) | width at 50% height (years)
reagan 0.00338 1985 0.0338 10
nixon 0.00283 1975 0.0283 10
uv 0.00277 1865 0.0277 10
kennedy 0.00275 1965 0.0275 10
eisenhower 0.00224 1955 0.0224 10
ter 0.00169 1885 0.0169 10
communist 0.00166 1955 0.0249 15
planes 0.00122 1945 0.0122 10
jimmie 0.00118 1915 0.0118 10
coolidge 0.00111 1925 0.0111 10
elsie 0.00107 1875 0.0107 10
bradshaw 0.00107 1835 0.0107 10
korea 0.00106 1955 0.0106 10
rollo 0.00104 1855 0.0104 10
vietnam 0.00103 1965 0.0154 15
roosevelt 0.00103 1935 0.0205 20
katy 0.00098 1865 0.0098 10
graeme 0.00094 1865 0.0094 10
eleanor 0.00093 1925 0.0093 10
winthrop 0.00093 1855 0.0093 10
jeff 0.00091 1955 0.0091 10
madeleine 0.00089 1865 0.0089 10
dave 0.00088 1915 0.0088 10
communists 0.00088 1955 0.0132 15
lanny 0.00086 1945 0.0086 10
dulles 0.00084 1955 0.0084 10
pa 0.00082 1885 0.0082 10
amy 0.00081 1865 0.0081 10
jimbo 0.00080 1975 0.0080 10
isabella 0.00078 1835 0.0078 10
kissinger 0.00078 1975 0.0078 10
soviet 0.00077 1955 0.0307 40
redwood 0.00076 1825 0.0076 10
dewey 0.00076 1945 0.0076 10
stitch 0.00075 1875 0.0075 10
gypsy 0.00074 1865 0.0074 10
hev 0.00073 1865 0.0073 10
hitler 0.00072 1945 0.0108 15
elvira 0.00071 1825 0.0071 10
mcs 0.00071 1955 0.0071 10
atomic 0.00071 1955 0.0106 15
cuba 0.00069 1965 0.0069 10
alessandro 0.00068 1885 0.0068 10
wilford 0.00068 1865 0.0068 10
truman 0.00068 1955 0.0102 15
malone 0.00067 1965 0.0067 10
magdalen 0.00067 1875 0.0067 10
korean 0.00066 1955 0.0066 10
rowland 0.00066 1875 0.0066 10
stevenson 0.00065 1955 0.0065 10
mabel 0.00065 1855 0.0097 15
beulah 0.00065 1885 0.0065 10
goldwater 0.00063 1965 0.0063 10
tommy 0.00063 1935 0.0063 10
gaulle 0.00062 1965 0.0062 10
jessie 0.00061 1945 0.0061 10
ramona 0.00061 1885 0.0061 10
vasco 0.00061 1835 0.0061 10
bunny 0.00060 1925 0.0060 10
newt 0.00059 1865 0.0059 10
gubb 0.00058 1915 0.0058 10
epa 0.00058 1995 0.0058 10
ivan 0.00058 1905 0.0058 10
christie 0.00057 1875 0.0057 10
madonna 0.00057 1895 0.0057 10
banneker 0.00057 1925 0.0057 10
hammond 0.00056 1825 0.0056 10
viet 0.00056 1965 0.0084 15
hed 0.00056 1865 0.0056 10
harding 0.00055 1925 0.0055 10
dorothy 0.00055 1905 0.0055 10
subcommittee 0.00054 1955 0.0054 10
elnora 0.00054 1905 0.0054 10
teddy 0.00054 1895 0.0054 10
id 0.00052 1995 0.0052 10
seor 0.00052 1835 0.0052 10
lulu 0.00052 1885 0.0052 10
downing 0.00052 1835 0.0052 10
lucia 0.00051 1825 0.0051 10
montague 0.00051 1905 0.0051 10
lemuel 0.00051 1875 0.0051 10
wich 0.00051 1865 0.0051 10
christy 0.00051 1895 0.0076 15
israeli 0.00051 1975 0.0076 15
bertha 0.00051 1865 0.0051 10
nazi 0.00051 1945 0.0076 15
heyward 0.00050 1825 0.0050 10
watergate 0.00050 1975 0.0050 10
ms 0.00050 1955 0.0050 10
puffer 0.00050 1845 0.0050 10
didn 0.00049 1985 0.0049 10
purl 0.00049 1875 0.0049 10
maroney 0.00049 1875 0.0049 10
nunez 0.00049 1835 0.0049 10
trina 0.00049 1925 0.0049 10
ronald 0.00048 1985 0.0048 10
randy 0.00048 1905 0.0048 10
georgie 0.00048 1875 0.0048 10
castro 0.00048 1965 0.0048 10
lottie 0.00047 1875 0.0047 10

A few observations: most of the words have a peak width of 10 years (the minimum, since COHA’s resolution is at the decade level). Notable exceptions are Roosevelt (FDR was president during two decades, one of them wartime) and Soviet (a 40-year peak, so its peak height had to be quite high for it to make the list). Some words of note: failed presidential candidates Dewey, Stevenson and Goldwater; Newt (but not Gingrich); Hitler and Nazi; Watergate; Ronald (the only presidential first name on the list).
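The Soviet point can be checked with the table's own numbers. The short Python sketch below takes the trendiness of the 100th word in the table (lottie, 0.00047) as the cutoff and works out the minimum peak height a word needs at each peak width to stay on the list. The function name `min_height_for_width` is made up for this illustration.

```python
# Trendiness of the 100th (last) word in the table above: lottie.
CUTOFF = 0.00047


def min_height_for_width(width_years, cutoff=CUTOFF):
    """Smallest peak height (% of corpus) that still reaches the cutoff,
    given that trendiness = peak height / peak width."""
    return cutoff * width_years


for width in (10, 15, 20, 40):
    print(f"{width}-year peak needs a height of at least "
          f"{min_height_for_width(width):.4f}%")

# Soviet's row from the table: height 0.0307%, width 40 years.
soviet_height, soviet_width = 0.0307, 40
print(f"soviet trendiness: {soviet_height / soviet_width:.5f}")
```

A word with a 40-year peak needs four times the peak height of a 10-year word to earn the same score, which is why so few wide peaks appear in the list.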

The code used is on my GitHub, but here’s the gist of it (no pun intended):

  1. The COHA 1-gram corpus is restricted, but I have an academic licence. Thanks to BYU for that. On my GitHub, I have summary data, but not the dataset itself.
  2. COHA is arranged in decades; I assigned each word the year in the middle of the decade (e.g. 1970s, which covers 1970-1979, was assigned 1975).
  3. For each word, I interpolated by simple mean the popularity for years ending in “0”. For example, if a word was at 0.0024% in 1975 and 0.0026% in 1985, I assigned 0.0025% in 1980. This was so peak widths for words that appeared only in one decade could be calculated (otherwise they would have a peak width of zero, and have infinite ‘trendiness’).
  4. Instead of interpolating further to calculate peak widths (which would be entering overfitting territory), I used a simple Boolean test to calculate the start and end of each peak. The first time a point at a five-year interval exceeded 50% maximum peak height, the counter started, and then the first time it sank below 50%, it stopped. This means if a word was bimodal (two peaks in different years) with a point below 50% of maximum between the two peaks, only the larger peak was counted. This was not a common occurrence, and it ensured words only ever appeared once each.
  5. “Trendiness” was calculated as the peak height (in % of corpus during that year) divided by the peak width (in years, always a multiple of five for reasons explained in the previous step).
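
Steps 2 to 5 can be sketched in a few lines of Python. This is not the code from the GitHub repository, just one possible reading of the description above. The function names, the input format (a dict from decade start year to % of corpus), the zero-padding around the first and last decades, the outward scan from the highest point, and the strict comparisons against the 50% level are all assumptions, so the widths it produces will not necessarily match the table.

```python
def five_year_series(decade_pct):
    """Build a 5-year grid from per-decade popularity.

    decade_pct: {decade_start_year: percent_of_corpus}, e.g. {1970: 0.0024}.
    Missing decades are treated as 0 (an assumption).
    """
    # Step 2: each decade is assigned its middle year (1970s -> 1975).
    mid = {decade + 5: pct for decade, pct in decade_pct.items()}
    first, last = min(mid), max(mid)
    series = {}
    # Pad one zero-valued decade on each side so every peak can close.
    for year in range(first - 10, last + 11, 5):
        if year % 10 == 5:
            series[year] = mid.get(year, 0.0)
        else:
            # Step 3: years ending in 0 get the mean of their two neighbours.
            series[year] = (mid.get(year - 5, 0.0) + mid.get(year + 5, 0.0)) / 2
    return series


def trendiness(decade_pct):
    """Return peak year, height, width at 50% height, and height / width."""
    series = five_year_series(decade_pct)
    years = list(series)
    values = [series[y] for y in years]
    height = max(values)
    if height <= 0:
        raise ValueError("word never appears in the corpus")
    peak = values.index(height)
    half = height / 2
    # Step 4: Boolean test only, no further interpolation. Scanning outward
    # from the highest point means only the larger peak of a bimodal word
    # is measured (assumption about how the counter was applied).
    start = peak
    while start > 0 and values[start - 1] > half:
        start -= 1
    stop = peak
    while stop < len(values) - 1 and values[stop] >= half:
        stop += 1
    width = years[stop] - years[start]
    # Step 5: trendiness = peak height / peak width.
    return {"peak_year": years[peak], "height": height,
            "width": width, "trendiness": height / width}


# A hypothetical word that appears in the 1970s only.
print(trendiness({1970: 0.01}))
```

With these choices, a word that appears in a single decade gets the minimum width of 10 years, since the interpolated points on either side sit exactly at 50% of the peak and do not count as exceeding it.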

 
