Tuesday, September 1, 2015

The Domination of "Dominatrix" among feminine -trix endings in American English

The idea for this analysis came from the wonderful Lexicon Valley podcast. In last year's episode "Sex Workers", U. Mich. professor Anne Curzan discusses the rise and fall of English words with feminine endings like -ess, -ette and -trix. Curzan points out that in recent decades the -trix suffix has become indelibly associated with the word dominatrix, at the expense of all other -trix words except for a few legal terms.

This sounded quite plausible, therefore I was suspicious and had to check -- and it turns out Prof. Curzan was right:


As you can see from this stream graph, the red area representing dominatrix starts to expand in the 1970s, and by the mid-2000s represents the majority of feminine-ending -trix words.

The data comes from the Corpus of Historical American English (COHA), a curated set of texts from books, magazines and newspapers from 1810 to 2010.

The purplish areas on top represent Latin words used in English texts, e.g. victrix, the feminine of victor. There is no hard and fast rule to differentiate these words from English words; I classified them thusly if the feminine in English is much rarer than the feminine in Latin. The bottom, green, areas are words mostly used in legal contexts, often in probate law, to signify the feminine of executor, mediator and administrator.

Which leaves the orange area, aviatrix, which interestingly peaks in the 1930s, the time of the mysterious disappearance of indubitably the most famous person to be given that title, Amelia Earhart. There appears to be a dip in the 1950s followed by a rise in the 1980s, but given the very low frequency of the word (all of the -trix words in the 1930s are about equal in frequency to the words extirpate or peregrinations), the exaggerated dip may be due to sampling error.

A quick word about methodology: I removed words ending in -trix that are not feminine endings, such as matrix (even in Latin, it's a derivative of the already feminine mater; there is no word mator for it to be the feminine of). COHA is compiled on a per-decade basis, so I assigned the middle year of the decade to each data point, interpolated every 2.5 years and smoothed the result with a 10-year Hamming window (to smooth out the sampling error somewhat and get a sense of the signal behind the noise).
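For readers who want to try something similar, the steps just described could be sketched roughly as follows. This is not the code behind the graph, only a minimal pure-Python illustration under some assumptions: the interpolation is taken to be linear, the 10-year window is taken to mean 5 points at 2.5-year spacing, the edges are handled by renormalising the window, and the frequencies at the bottom are invented toy numbers, not COHA values. All function names are hypothetical.

```python
import math

# Words ending in -trix that are not feminine forms (the post names "matrix").
NOT_FEMININE = {"matrix"}

def feminine_trix_words(words):
    """Keep -trix words, dropping those that are not feminine endings."""
    return [w for w in words if w.endswith("trix") and w not in NOT_FEMININE]

def interpolate(xs, ys, step):
    """Linearly interpolate (xs, ys) onto a regular grid (assumed linear)."""
    grid = []
    x = xs[0]
    while x <= xs[-1] + 1e-9:
        grid.append(x)
        x += step
    out = []
    j = 0
    for g in grid:
        # Advance to the segment [xs[j], xs[j + 1]] that contains g.
        while j < len(xs) - 2 and g > xs[j + 1]:
            j += 1
        t = (g - xs[j]) / (xs[j + 1] - xs[j])
        out.append(ys[j] + t * (ys[j + 1] - ys[j]))
    return grid, out

def hamming(n):
    """Standard Hamming window coefficients for n points."""
    if n == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]

def smooth(values, window_points):
    """Weighted moving average with a Hamming window.

    Near the edges the window is truncated and the weights renormalised
    (an assumption; other edge treatments are possible).
    """
    w = hamming(window_points)
    half = window_points // 2
    out = []
    for i in range(len(values)):
        num = den = 0.0
        for k, wk in enumerate(w):
            j = i + k - half
            if 0 <= j < len(values):
                num += wk * values[j]
                den += wk
        out.append(num / den)
    return out

# Toy example: decades are mapped to their middle year (1930s -> 1935).
decades = [1920, 1930, 1940, 1950]
midpoints = [d + 5 for d in decades]
toy_freq = [1.0, 3.0, 2.0, 0.5]  # invented per-decade frequencies

years, dense = interpolate(midpoints, toy_freq, 2.5)
smoothed = smooth(dense, 5)      # 5 points x 2.5 years spans 10 years
for year, value in zip(years, smoothed):
    print(year, round(value, 3))
```

A 5-point window is only one reading of "10 years"; a 4-point window would also span roughly a decade but would not be centred on each data point.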

Had I used a corpus larger than COHA, that would have helped with the sampling error, but I don't have access to any good candidates. The much-vaunted Google Ngrams corpus is, as in many applications, particularly misleading for this analysis -- as an uncurated corpus, it suffers greatly from availability bias. The Google Books files it is based on are heavily weighted towards the books found in libraries, especially university libraries, where there will be many different editions of highly technical books and only a few representatives of anything else (for example, fiction and news). There is a supposedly fiction-only version of the corpus, but it actually gives very similar word frequencies, indicating that its classification algorithm is problematic.

Here's what the above graph looks like from the Google Ngrams corpus:

You can see that the purple (Latin) and green (legal) areas are huge, which is to be expected when university books make up the bulk of the corpus. There is in addition a new, blue area representing advanced mathematics texts, where the feminine versions of director, tractor and motor have specific meanings in that domain. The conclusion that directrix is almost as popular as dominatrix doesn't pass the smell test. You can see that the phenomenon of the recent surge for dominatrix is still visible, although aviatrix does not have a surge in the 1930s (expected, given the paucity of news sources in this book corpus).

