Wednesday, June 4, 2014

Hurricanes with male names are at least as deadly as those with female names

Lots of news services have been passing on the startling conclusions of a recent academic paper in the Proceedings of the National Academy of Sciences, a quite high-impact journal. To quote directly from the paper's abstract:
feminine-named hurricanes cause significantly more deaths than do masculine-named hurricanes. Laboratory experiments indicate that this is because hurricane names lead to gender-based expectations about severity and this, in turn, guides respondents’ preparedness to take protective action.
I'll just show you this graph I made with the data published in the PNAS paper, then explain further below:

The blue (male) line is almost always above the orange (female) line. What gives?

I was certainly not the only person to be skeptical of the paper's conclusions; Tyler Vigen used his usual satirical approach to good effect, showing how remarkable spurious correlations can be. But the thing is, this idea that people wouldn't take female-named hurricanes seriously as a threat may sound dubious, but it also sounds plausible, and I imagine it pushes some buttons.

I am not at all qualified to comment on the soundness of the social science in this paper, but I do think the data analysis is quite flawed.

First of all: it's kind of a rule that one should compare apples to apples. Before 1979, all hurricanes had female names. While this by itself does not invalidate the authors' hypothesis that, all other things being equal, a feminine-named hurricane will result in more deaths, all other things were not equal before and after 1979. Off the top of my head, back when hurricanes had only female names, meteorologists were not as good at predicting the severity and paths of hurricanes as they later became with the help of experience and, especially, computers, and communications technology was not as good at relaying information and evacuation orders. Neither of these factors was addressed in the paper; isn't it worth looking into whether other factors besides name gender contributed to deaths?

So if we limit our analysis to post-1979, when we can directly compare male and female names of hurricanes, the masculine hurricanes caused more deaths up until 2012's Sandy. At present, the feminine names are barely ahead, 459 to 413. This is at least counter-evidence to the paper's claim that "changing a severe hurricane’s name from Charley to Eloise could nearly triple its death toll." (To be fair, they didn't just look at whether a name was male or female -- I know one male named Sandy -- but how "masculine" or "feminine" a study group considered the names. This doesn't change the fact that they ignored much more plausible reasons for deaths prior to 1979.)
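The post-1979 comparison boils down to filtering by year and summing deaths per name gender. Here is a minimal Python sketch of that tally. The records and their layout are made-up placeholders of mine, not the PNAS dataset, so the totals it prints mean nothing beyond showing the mechanics.

```python
from collections import defaultdict

# Hypothetical placeholder records: (year, name_gender, deaths).
# These are NOT values from the PNAS dataset.
STORMS = [
    (1965, "female", 75),
    (1972, "female", 120),
    (1985, "male", 8),
    (1989, "male", 56),
    (1992, "male", 62),
    (1999, "female", 5),
    (2005, "female", 3),
]


def deaths_by_gender(storms, first_year=1979):
    """Sum deaths per name gender for storms in first_year or later."""
    totals = defaultdict(int)
    for year, gender, deaths in storms:
        if year >= first_year:
            totals[gender] += deaths
    return dict(totals)


print(deaths_by_gender(STORMS))
```

Lowering `first_year` pulls the all-female era back in, which is exactly the apples-to-oranges mix I'm complaining about.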

I find it curious that the researchers limited their analysis to American deaths; hurricanes kill a lot more people before they ever reach the United States. Of course, a greater proportion of non-Americans are too poor to shelter or evacuate, but this strikes me as a combination of partial cherry-picking, circular reasoning and insufficient research: they limited their data to people affluent enough to protect themselves against a hurricane, and then claimed they died because they didn't protect themselves against a hurricane, without actually looking at whether or not they protected themselves against a hurricane.

For example, the second-most deadly hurricane on their list, Diane, killed 200 people in 1955 despite being only a Category One (Five is the strongest) for which evacuation orders are rarely if ever given. The reason it was so deadly is that Hurricane Connie passed through the same areas in Pennsylvania and Connecticut a few days before, saturating the ground so that Diane caused massive floods.

At least they left Katrina off the list; I think most people would agree it's probable that there were some social, economic and political factors that contributed more to its 1,833 deaths than its name. But their reason for considering Katrina an outlier was that it "leads to a poor model fit due to over-dispersion." There's kind of another rule in data analysis: you don't choose your data to fit your model, you choose your model to fit your data.
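For anyone wondering what "over-dispersion" means in practice: a Poisson model expects the variance of the counts to equal their mean, so a variance-to-mean ratio far above 1 signals trouble. The sketch below is only an illustration. The `typical` list is a hypothetical placeholder, not the paper's data; only the 1,833 figure comes from the text above.

```python
from statistics import mean, pvariance


def dispersion_index(counts):
    """Variance-to-mean ratio; close to 1 for Poisson-like counts."""
    return pvariance(counts) / mean(counts)


# Hypothetical placeholder death counts, not the paper's data.
typical = [0, 1, 2, 3, 5, 8, 15, 21]
# Same list plus Katrina's toll as quoted above.
with_katrina = typical + [1833]

print(dispersion_index(typical))
print(dispersion_index(with_katrina))
```

A single extreme storm inflates the ratio enormously. That is usually taken as a cue to look at count models that allow extra variance, rather than to trim the data until the model behaves.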

I will admit that the researchers' laboratory studies succeeded in convincing me that the people they studied (including Amazon Mechanical Turk users, not exactly a representative cross-section of people who might ignore a hurricane) answered questions in such a way that they appeared to assign lower risk to hypothetical hurricanes with more feminine names. It's just rather a stretch to claim that:

(a) This laboratory result is truly an indicator that in a real-world scenario these people would actively ignore the risk of dying in a hurricane; and

(b) That there is any risk-ignoring behavior correlated with hurricane deaths at all. (There very well might be. But the researchers didn't even attempt to find out. There was no historical data, no text mining of contemporary news sources, just a bare minimum of meteorological data, damage and death assessment.)

PNAS is a good journal (and always a barrel of laughs when you say the acronym out loud). I'm sure they'll get it better next time.

UPDATE Randal Olson, who is definitely an expert in such matters, pointed out that a more convincing graph would be one that showed deaths from hurricanes were more frequent in general before 1979 when they started giving them male names. So I whipped one up quick in Excel. Katrina of course incredibly skews the aggregate data, but you can see it was more common for any individual hurricane to have over a handful of deaths before 1979.

I didn't control for storm severity as Randal suggested, but I'm reasonably confident it will change nothing: the six storms that caused more than 100 deaths were categories 1, 1, 2, 4, 5 and 5.
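Since one storm can swamp any total, a per-storm measure is a fairer way to compare the two eras: what share of storms in each era killed more than a handful of people? A rough sketch follows, with hypothetical placeholder tolls rather than the study's data, and a threshold of 5 that is my own arbitrary choice.

```python
def share_over(deaths, threshold):
    """Fraction of storms whose death toll exceeds threshold."""
    return sum(d > threshold for d in deaths) / len(deaths)


# Hypothetical placeholder tolls, not the study's data.
before_1979 = [0, 2, 6, 9, 15, 22, 40, 200]
from_1979 = [0, 0, 1, 1, 3, 4, 12, 1833]

for label, tolls in (("before 1979", before_1979), ("1979 on", from_1979)):
    print(label, share_over(tolls, threshold=5))
```

Unlike a sum, this share barely moves when one outlier is added or dropped.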


  1. For this kind of data (a few extremely high points, with most of the others low), it is helpful to take logarithms and then compare them.

  2. It also helps to get the data right. I, too, took a quick look at the data used by the study. The 2008 Atlantic hurricane season statistics report that 3 days after the beginning of Hurricane Gustav (which caused 112 direct deaths), Hurricane Hanna came along and caused 500 direct deaths. Where is that on the graph above?

  4. I think that's because the study only considered American deaths; I have no idea why they made that choice.

    1. That is what I thought, too, but there were only 41 total direct US deaths attributed to the 2008 hurricanes, yet Gustav + Ike add up to over 100 on this graph. Or am I reading it wrong?

    2. The study looked at both direct and indirect deaths as tabulated by the NOAA monthly reports.

  5. I haven't seen this addressed at all:

    After 1979, we started giving hurricanes male names, yes? Okay, that is all and well, but the process by which that happens is not random. We alternate between male and female names.

    I don't know for sure, but imagine that we started with a female name when this process began. This results in the data we have today. Okay, but imagine that we started with a male name. All of the deadly hurricanes, like Katrina and Sandy, would have male names, and we wouldn't be having this conversation right now.

    The non-randomness of the selection process introduces a potential bias. The authors are trying to make causal or quasi-causal claims, but there was no random selection in which hurricanes got what names. Add to this that their masculine-feminine rating scale was not drawn from the general public and you get a very flawed presentation of the study's data.

  6. If I were doing this, I'd adjust for severity and population at area of landfall: build out a full, plausible model. I haven't looked at the distribution, but just off the cuff, since it's deaths being counted, that sounds like a Poisson process, so I'd probably end up fitting a Poisson model. But you could also think of these as risk factors, try to cast them in terms of raw probability, and build up a Bayesian model.

    Another thing to look at is *all* the storms, not only hurricanes. These things get named or not named early, so is there a tendency to name marginal storms just prior to obvious big storms? And is there a tendency to give a marginal one a male name *so that* a likely hurricane can get a female name? (A selection bias worth checking out.)

  7. When I first read the paper and saw they did not analyze anything but the most basic information about the hurricanes (category, deaths and atmospheric pressure), my first instinct was to look at hurricanes' speed, direction and point of landfall to determine whether they were confounding factors in the analysis. Sadly, I didn't even need to go that far.

  8. I am fairly certain they used American deaths as opposed to American and, say, Honduran deaths for a variety of reasons. For starters, the "built-in" bias would be for lower deaths among Americans than Hondurans because we have better infrastructure and a 'healthier' system with more actively prepared individuals. That is, your base chance of survival would be higher in America than it would be outside America. But more importantly, it taps into ethnography and culture. You can't make a generalization about US culture using data from non-Americans.

  9. To be fair, the authors have posted responses to a number of issues raised by many, including why they kept data from pre-1979 as well as a little bit more about how they modeled the data (and other variables they reviewed besides gender-ness of name):

