Trendiest baby names in the Social Security database (determined with an analytical chemistry technique)

A few attempts have been made to determine the trendiest baby names in the U.S. Social Security Administration database. FlowingData, for example, looked at the quickest rises and falls and determined that Catina was the most flash-in-the-pan name; at its peak, however, it accounted for only 0.0097% of girls' names. This is a perfectly legitimate analysis, but it's been in the back of my mind that, to measure an admittedly ill-defined quality like "trendiness", overall popularity should perhaps count as well as steepness of rise and fall.

Therefore, I turned to a technique I've used in chemometrics to analyze peaks for both size and sharpness. (I knew it would come in handy one day; it's been years since I've touched a gas chromatograph.) First, the results:



For comparison, here are the much sharper peaks for the two names that had the quickest rise and fall regardless of overall popularity, Catina and Deneen*, and then those peaks in comparison to the trendiest name according to this technique, Linda:




Here's how the "trendiness" score for this analysis was determined: peak height divided by peak width, with the width measured at 10% of the peak height. Peak height divided by peak width (which can be measured in various ways) is a pretty standard metric in chemistry:


With chromatographic peaks we normally measure the width at 50% of peak height, but those peaks have a more predictable shape. The 10% figure I chose is entirely arbitrary, but it seems to strike a good balance between allowing and disallowing names due to weird shapes and baseline noise. The results don't change if you go down to 5% or up to 20%.
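To make the metric concrete, here is a minimal Python sketch of a height-over-width score. It is not the code from the notebook. It assumes the width is the number of consecutive years around the peak in which the name stays at or above the threshold fraction of its peak value, and the function name `trendiness` and the sample numbers are made up for illustration.

```python
def trendiness(years, share, threshold=0.10):
    """Peak height divided by peak width, with the width taken at
    `threshold` times the peak height.

    Assumption (not from the original analysis): the width is the count of
    consecutive years around the peak whose value is >= threshold * height.
    `years` must be consecutive, with one value in `share` per year.
    """
    # Locate the highest point of the curve.
    peak_idx = max(range(len(share)), key=share.__getitem__)
    height = share[peak_idx]
    if height <= 0:
        return 0.0

    cutoff = threshold * height

    # Walk outward from the peak until the curve drops below the cutoff.
    left = peak_idx
    while left > 0 and share[left - 1] >= cutoff:
        left -= 1
    right = peak_idx
    while right < len(share) - 1 and share[right + 1] >= cutoff:
        right += 1

    width = years[right] - years[left] + 1  # width in years, inclusive
    return height / width


# Hypothetical data: a name's share of births (in percent) by year.
example_years = list(range(1950, 1961))
example_share = [0.0, 0.05, 0.2, 0.6, 1.0, 0.6, 0.2, 0.05, 0.0, 0.0, 0.0]
example_score = trendiness(example_years, example_share)
```

In this made-up example the peak is 1.0 and the curve stays at or above 10% of that for five years (1952 to 1956), so the score is 1.0 / 5. Walking outward from the peak, rather than counting every year above the cutoff, keeps a small secondary bump elsewhere in the record from inflating the width.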

The beauty of this approach is that it is almost equally sensitive to changes in peak height and peak width, i.e., to the popularity of the name and the length of time it was popular.

As many analysts have remarked, girls' names tend to rise higher, and to rise and fall more quickly, than boys' names; this analysis bears that out. There are really only two boys' names that one would consider to have a sharp peak; the other three are presidents' names or, in the case of Dewey, the name of the hero of the Spanish-American War.

As always, be wary of numbers from this dataset before 1936, when Social Security numbers were first assigned.

I've put the top 100 trendy boys' and girls' names on my other, nerdier blog, prooffreaderplus.com.

Finally, here are links to my Baby Name GitHub Repo, and to an IPython notebook for this analysis.

* The names Catina and Deneen come from a soap opera and a musical act, respectively. Thanks to a reader who pointed out that missing values (nobody was named Catina before 1949) had shifted the peaks down to around the year 1900; whoops, that was quite careless of me. The graph is now correct.
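For anyone curious how missing years can move a peak, here is a small Python sketch of one way to guard against it. This is an assumption about the kind of fix involved, not the code behind the corrected graph: the helper `align_to_years` and the sample numbers are hypothetical, and the idea is simply to give every year in the range an explicit value (zero when the name does not appear) before plotting or measuring.

```python
def align_to_years(share_by_year, first_year, last_year):
    """Return (years, share) with one value per year in the range,
    using 0.0 for years in which the name has no entry."""
    years = list(range(first_year, last_year + 1))
    share = [share_by_year.get(year, 0.0) for year in years]
    return years, share


# Hypothetical sparse record: no entries before 1949, and none in 1951.
sparse = {1949: 0.001, 1950: 0.004, 1952: 0.002}

# Careless version: pair the recorded values with the first years of the
# range. The peak lands in the wrong year because the gaps were dropped.
values = list(sparse.values())
naive_peak_year = 1947 + values.index(max(values))

# Aligned version: every year gets a slot, so the peak stays where it belongs.
years, share = align_to_years(sparse, 1947, 1953)
peak_year = years[share.index(max(share))]
```

With these made-up numbers the careless version puts the peak in 1948, while the aligned version puts it in 1950, where the data says it is.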



Copyright © 2012 prooffreader.com