I had planned to take a break from blogging during the holidays, but today I saw this post on Reddit, in the dataisbeautiful subreddit, about the use of the f-word in movies, and I was inspired. The top movie on the list that I had seen was Eddie Murphy: Delirious; I was 13 when it came out, but nobody I knew had HBO, so my best friend and I had to wait till it showed up in the Betamax tape rental place. We made a lo-fi audio recording (a microphone held up to the TV speaker), and soon had it memorized and spent several years quoting it in all sorts of inappropriate situations.
So, let’s break down the use of the f-word during the movie. (I admit, I’m being a total wuss: Google hosts this blog and I’d rather not deal with any automated fallout from using profanity, so I’m going to asterisk out all the naughty words.) Some simple poor man’s calculus (for each use of the word at time x, y equals the inverse of the average of the time gaps to the previous and next use) shows the clustering of swearing during different parts of the film:
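For the curious, here is a minimal sketch of that calculation in Python. It assumes "the average of the times" means the average of the two gaps (to the previous and to the next use), and that the timestamps are in seconds and sorted. The function name `swear_density` and the sample timestamps are made up for illustration; they are not the real data from the movie.

```python
def swear_density(times):
    """Return (x, y) points, where x is the time of a use and y is the
    inverse of the average gap to the previous and next use.

    Assumes `times` is sorted in ascending order. The first and last uses
    are skipped because they have only one neighbor.
    """
    points = []
    for i in range(1, len(times) - 1):
        gap_before = times[i] - times[i - 1]
        gap_after = times[i + 1] - times[i]
        mean_gap = (gap_before + gap_after) / 2
        if mean_gap <= 0:
            # Identical timestamps would divide by zero; skip them.
            continue
        points.append((times[i], 1 / mean_gap))
    return points


# Hypothetical timestamps (seconds): a tight cluster, then a lull.
sample_times = [10, 12, 14, 60, 120]
sample_points = swear_density(sample_times)
for x, y in sample_points:
    print(f"t={x:>4}s  density={y:.3f}")
```

Short gaps give large y values, so bursts of swearing show up as peaks and quiet stretches as valleys.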
It would be great to know what parts of the movie those clusters correspond to: if you go to the bottom of the post, there’s a reversed version of the graph that allows you to see the dialogue (lightly Bowdlerized, again, I’m sorry) line by line.
I’ve been learning how to do Natural Language Processing in Python, and while I didn’t bring out the big guns, I thought it would be interesting to look at some of the simple patterns in word use in the movie:
Normally I would use a stop list to remove common words like “the” and “and”, and a corpus to compare word frequencies, but I think the raw data is the most informative perspective, showing how the profanity rivals the most common syntactic words in Delirious. Here are the top N-grams (words that appear side-by-side):
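Counting raw words and N-grams like this needs nothing beyond the standard library. The sketch below is one way to do it, not necessarily how the charts in this post were made: the helper names `tokenize` and `top_ngrams` are hypothetical, and the sample line is an invented stand-in for the transcript. No stop list is applied, matching the raw counts described above.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into words.

    Asterisks are kept inside tokens so that censored words survive.
    """
    return re.findall(r"[a-z0-9'*]+", text.lower())


def top_ngrams(tokens, n, k):
    """Return the k most common runs of n side-by-side words."""
    # Zipping n shifted copies of the token list yields every n-word window.
    windows = zip(*(tokens[i:] for i in range(n)))
    return Counter(" ".join(w) for w in windows).most_common(k)


# Invented sample line, not a quote from the movie.
sample_text = "I said get the f*** out. Get the f*** out of my house."
sample_tokens = tokenize(sample_text)

word_counts = Counter(sample_tokens)
print(word_counts.most_common(5))
print(top_ngrams(sample_tokens, 2, 3))
```

Passing `n=3` gives trigrams, and so on; with a real transcript you would read the text from a file instead of a string literal.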
I’m a contributor to the FullMovieGifs subreddit, so I couldn’t resist the temptation to make one of Delirious. Hopefully Google doesn’t OCR these things; if you want to see it larger, click on it.
Finally, here’s a big, vertical version of the first graph in this post, which you can mouse over to read the lines of dialogue (is it still called dialogue when only one person’s talking?) to your heart’s content. If you can’t see a really huge graph right underneath this sentence, click here to see it.
I think I’ll be hearing from my mom about this post.
Update Jan. 1, 2014: Whaddaya know, my mom was fine with it.