Graphing the problematic aspects of the U.S. Baby Names dataset
Every time I post about the popular U.S. Social Security Administration baby names dataset, I try to acknowledge that there are some serious problems with it. By “problems”, I mean things the average person unfamiliar with the dataset will assume are true but which actually aren’t, especially for the years prior to World War II. I’ve covered all of these to one degree or another in my previous baby names posts here and here and here and here and here and here, but there are always a few questions from readers, so I thought it would be nice to be able to link to something that explained all the major concerns clearly and concisely:
Tableau Public’s new Story View feature is well-suited to this kind of presentation, and I’ll add panels if and when I come across more problematic aspects of sufficient magnitude.
I’d like to reiterate one thing: the problem isn’t in the data, it’s in how it’s often presented and understood. The Social Security Administration does not make any false claims whatsoever (although IMHO they could make their disclaimers more prominent). And some of the baby names blogs and websites make a decent effort to address these issues, or at least not to draw unsupportable conclusions from the data.