When I first experienced Roy Lichtenstein's pop art masterpiece M-Maybe, I instantly understood the visual shorthand. The subject is stuttering over that initial "m" because she is emotionally overwrought, probably with the added effect of cognitive dissonance: she can't really believe her beau was a no-show because he suddenly caught an illness. Lichtenstein drew this motif from romance comics, which were huge after WWII but could never really cope with women's lib and faded out by the late '70s. If you flip through some of these vintage comics, you're struck not only by the incessant pathos of these people's first-world problems, but by the same devices being used over and over: the single tear, throwing oneself on one's bed, holding one's hand to face or temple, the downcast ten-yard stare -- and, of course, the stutter.

Men stuttered sometimes too, of course, but usually when they were flummoxed by the emotional behavior of women, to wit:

Ah, Robin. So chaste, so innocent, so totally not at all homoerotic in those green scaly hotpants* living with a man named Bruce.

If you like vintage comics (especially taken out of context), comicallyvintage's tumblr has over a thousand of them! (I should know, I looked at most of them in search of young ladies with temporary speech impediments.) The old-fashioned uses of the words "dick", "gay" and especially "boner" are a chuckle riot.

* Actually, the tights or bare legs, domino masks and capes were a visual shorthand that was totally understandable to the audiences of the '30s and '40s: the circus strongman, the ultimate expression of butch masculinity. It did not stand the test of time.
Because Entropy.
I became unexpectedly unemployed yesterday, and since I don't believe in long mourning periods (or poverty) I started my job search right away, and came across this infographic. Let's be fair: there are far worse infographics out there. But my version of human nature somehow gets more perturbed by almost-competence than by abject failure; I suppose, knowing nothing about the creator, that in my head I'm blaming them for not trying hard enough. Well, if the creator happens to come across this, I totally don't want to hurt your feelings (much), you just need a little more practice, as do we all. 

I like to test data tools and data sets with "edge cases", a fancy term for using them in ways they were not designed to be used (which is, by the way, the definition of hacking). It's informative to see how far things will bend before they break -- and the good thing with data is it's easy to un-break.

Rare occurrences make good edge cases; so do recursive cases, i.e. running a data tool on itself. We looked briefly at the Google Ngram Viewer a couple of weeks ago; what happens if we look up the Google Ngrams of the words "Google" and "Ngram"? (By the way, I like to call this kind of approach 'selfremetacursironiferentiality'. I'm sure it will catch on one day so I look like less of a dork when I say it.)

Of course, the frequency of the word "google" after the company was incorporated in September 1998 is predictable: it becomes a very common word (and is even adopted into that hallowed club, The Verb, where Xerox briefly rested and from which Kleenex was inexplicably barred). The only interesting thing about its rise from 2001 to 2008 (where the data set ends) is that it's pretty linear; I would have intuited either positive or negative curvature, but don't forget this is the word's appearance in published, printed matter, not in conversation.

Let's have a look at "google" and "ngram" (both case-insensitive) from 1880 to 2000, before the rise of Google and with a vertical axis about fifty-fold lower so we can see the edge cases (in my experience, the more jagged a line is*, the more interesting it is.**)

That's a lot of use of the word "google" before the company we all know and... well, know... existed. Using Google Books, the mystery is easy to solve: there was a newspaper comic strip character named Barney Google, and a lot of anthologies were published over the years. Not unusually, the technical term "ngram" lags far behind a term used in pop culture; however, it is surprising that around the dawn of the 20th century a term used in computational linguistics would turn up. Again, Google Books solves the mystery: this is an artifact of a lot of directories of names from around this time being poorly scanned; the name "Ingram" is being recorded as "I, ngram" (which sounds like a terrible book title).
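For the curious, here is a minimal sketch of how one mangled scan turns into a data point. Everything in it is an assumption for illustration: the `PAGES` snippets are invented (not real Ngram data), `relative_frequency` is a hypothetical helper, and the real Ngram pipeline is certainly more elaborate than a regex and a counter.

```python
import re
from collections import Counter

# Invented snippets standing in for scanned pages; NOT real Ngram data.
PAGES = {
    1902: ["Ingram, John, grocer", "I, ngram, Mary, teacher"],  # second entry is the bad scan
    1923: ["Barney Google and his horse Spark Plug"],
}

def relative_frequency(word, pages_by_year):
    """Case-insensitive share of each year's tokens that equal `word`."""
    target = word.lower()
    result = {}
    for year, pages in pages_by_year.items():
        # Split each page into alphabetic tokens and lowercase them.
        tokens = [t.lower() for page in pages for t in re.findall(r"[A-Za-z]+", page)]
        counts = Counter(tokens)
        result[year] = counts[target] / len(tokens) if tokens else 0.0
    return result

ngram_by_year = relative_frequency("ngram", PAGES)
google_by_year = relative_frequency("Google", PAGES)
```

In this toy corpus the correctly scanned "Ingram" contributes nothing to "ngram", while the mis-scanned "I, ngram" puts a nonzero value in 1902, decades before anyone was doing computational linguistics.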

The moral of this story, as with all data sets too huge to be curated by humans (and, coincidentally, every other Aesop's fable): things are not always what they seem, so be sure to dig a little before drawing conclusions, especially in edge cases. The next time someone brings up at the water cooler how ngrams were being studied in 1902, you can nod to yourself knowingly.

* of course, sometimes that means it's just noise, but I find noise interesting too
** that's what she said.

I had a different webcomic planned for this week, but the shiny orange "Publish" button is more tempting to press when I'm half-asleep than the "Save" button, so this one suddenly got to the head of the line!

For those who aren't up on their 1972 celebrity semi-scandals, here's what the comic is referring to. Edmund J. Mittlebaum is entirely invented; the identity of Carly's spurned love has more incompatible theories than the Kennedy assassination. Taylor Swift claims she knows who it is, which makes sense, because if there's one thing Taylor is tight-lipped about, it's ex-boyfriends.

Copyright © 2012