Tuesday, October 27, 2015

Composition of Facebook photos by gender of account holder

I downloaded 2400 profile pics by scraping http://www.facebookrandomusers.com, and then manually sorted them into the various categories. I came up with the categories to reflect the most salient features, e.g. whether the photo had a child in it, regardless of whether it had adults in it too.

The results are about what I would have predicted. Men are more likely to just show themselves, or some image that is not of a human being, or the default photo (presumably from an abandoned account), while women are most likely to show themselves with one other adult (who may or may not be a romantic partner), or a child.

Tuesday, October 20, 2015

Hexmaps of the 2015 Canadian federal election

My apologies, these maps don't display well on mobile devices; I tried to fix it, but then they didn't display well on wide desktop screens. I need JavaScript lessons.

Riding-by-riding hexmap

Last night as of this writing, October 19, 2015, Canadians once again proved how difficult polling is in this country, and tossed the ruling Stephen Harper Conservative government out, to be replaced by the Liberals, led by Justin Trudeau, son of former PM Pierre Elliott Trudeau. The left-leaning New Democratic Party (NDP), elected in 2011 to be the official opposition for the first time in what was called the 'Orange Wave', suffered an 'Orange Crush'.

It's hard to make maps of Canadian elections, because each riding has approximately the same number of electors, yet the ridings are of vastly different sizes due to the particular geography of Canada. The smallest riding, at 6 square kilometers, is Toronto Centre; the largest, over 300,000 TIMES LARGER, is Nunavut (1.8 million sq. km.).

Therefore I made a hex grid of Canadian ridings (since the ridings are of equal population, it's technically a cartogram), which was a challenge. I used Spatialite to calculate shared borders and distances between riding centers, and Gephi to create network graphs. I tried to preserve, as much as possible, the local features (so that ridings which have neighboring borders and/or are close together will be in nearby hexes) and global features (the overall shape of Canada). It was hard, and I had to make lots of judgment calls; I leave it to others to judge the results.
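To give a sense of what "preserving local features" means, here is a minimal Python sketch (an illustration only, not the Spatialite/Gephi workflow described above). It assumes each riding is placed on a hex with axial (q, r) coordinates, and it scores a layout by the fraction of real shared borders that end up as touching hexes. The riding names, coordinates, and function names are all hypothetical.

```python
# The six axial-coordinate offsets from a hex to its neighbours.
HEX_NEIGHBOURS = {(1, 0), (-1, 0), (0, 1), (0, -1), (1, -1), (-1, 1)}

def hex_adjacent(a, b):
    """True if hexes a and b (axial (q, r) tuples) share an edge."""
    return (b[0] - a[0], b[1] - a[1]) in HEX_NEIGHBOURS

def adjacency_preserved(layout, borders):
    """Fraction of real riding borders that are also hex-neighbours."""
    kept = sum(1 for r1, r2 in borders if hex_adjacent(layout[r1], layout[r2]))
    return kept / len(borders)

# Hypothetical toy data: four ridings and the borders they really share.
layout = {"A": (0, 0), "B": (1, 0), "C": (0, 1), "D": (2, 0)}
borders = [("A", "B"), ("A", "C"), ("B", "C"), ("A", "D")]

print(adjacency_preserved(layout, borders))  # 3 of 4 borders kept -> 0.75
```

A score like this only captures the local side of the trade-off; the overall shape of the country still comes down to judgment calls.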

The most striking part of this election, as a spectator, was on the right side of the map: the Liberals won every single one of the 33 seats in the Atlantic and Newfoundland time zones. These results were being released while the polls were still open in the West, an odd feature of Canadian politics. I can't help but think that has some effect.

Overall election results

I know, they're pie charts, which professionals hate, but they have two overwhelming virtues: everyone knows what they are when they look at them, and they (mostly) unambiguously show parts of a whole. The Liberals outperformed the Conservatives by a factor of 1.24 in terms of votes, but 1.86 in terms of seats, and with a majority government end up with 100% of the power, which is why there are those who call for an end to first-past-the-post elections and the institution of some sort of more proportional representation scheme.
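As a rough check on those two ratios, here is a minimal Python sketch. The vote shares and seat counts below are assumptions on my part (approximate final figures as commonly reported, not numbers taken from the data scraped for this post), so treat them as illustrative.

```python
# Assumed approximate final 2015 results (not from this post's scraped data).
liberal = {"vote_share": 39.47, "seats": 184}
conservative = {"vote_share": 31.89, "seats": 99}

# How many times more votes and seats the Liberals got than the Conservatives.
vote_ratio = liberal["vote_share"] / conservative["vote_share"]
seat_ratio = liberal["seats"] / conservative["seats"]

print(f"votes: {vote_ratio:.2f}x, seats: {seat_ratio:.2f}x")
# prints "votes: 1.24x, seats: 1.86x"
```

The gap between the two numbers is the first-past-the-post effect: a modest lead in votes turns into a much larger lead in seats.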

Traditional election map, for comparison

Here's a traditional map that preserves riding sizes. The smaller ridings are, of course, invisible, so it's hard to get an accurate picture. This is a screenshot from one of Canada's national newspapers, The Globe and Mail, and of course it's zoomable on their website; plus, I suppose I shouldn't criticize them too much since theirs was the only site nice enough to present election data in a form that I could scrape at 4:00 a.m. the next day.

Hexmap of 2011 election (redistributed to 2015 ridings)

Canada was redistricted to have 338 ridings two years ago instead of the previous 308 (to the consternation of Canadian political statistics website www.threehundredeight.com, I'm sure), and the official government body Elections Canada was nice enough to show what the results of that election would have been with polling stations redistributed to the ridings they fall in now. You can see the Orange Wave in Quebec, and the Conservative majority government led by Ontario and the West, and the non-unanimously red East.

Second-place finishes

You can see most of the two-way battles were between Liberal and Conservative in Ontario and Alberta, NDP and Conservative in the rest of the prairies, Liberal-NDP or Bloc-NDP in most of Quebec, and Con-NDP or Con-Lib in the Atlantic. BC, as usual, has a bit of everything. The lone 'Other' second-place finish in Newfoundland is a former Liberal MP who was kicked out of the caucus amid sexual harassment allegations and ran as an independent.

Margin of victory

Most of the landslides were in the Atlantic and Prairies, and most of the close finishes were in Ontario and Quebec.

Winners shaded by margin of victory

This allows us to see that most of the landslides were Conservative in the Prairies, and Liberal elsewhere.

Voter turnout

Turnout was 68-69% (depending on whose numbers you use), far higher than the 61% in the 2011 election. The clusters of high turnout are in Ottawa (a political town), Vancouver Island (lots of old folks who like to vote), the Prairies (lots of dedicated Conservative voters) and the Atlantic provinces (who, it appears, were pissed off enough at the Conservatives to turn out in droves). The lower vote turnouts seem to happen most often in rural ridings, where it's a longer drive to the polling stations.

Sources: Elections Canada (riding geography and statistics), The Globe and Mail (election results)
Tools: Python, Chorogrid, JavaScript, Raphael.js, Gephi, Spatialite, Excel

Monday, September 28, 2015

Pope Francis's speech to congress was more similar to the founding fathers' inaugural addresses than to those of Republicans and Democrats

Within minutes of Pope Francis's Sept. 2015 address to the U.S. Congress, both Republicans and Democrats were claiming the speech vindicated their worldviews. This is not surprising, as modern Catholic values don't map to the American binary: helping the poor (Dem) and immigrants (Dem), against abortion (Rep) and gay marriage (Rep), in favour of action against climate change (Dem), etc.

Surely there must be a way to quantitatively determine who wins this tug of war? Turns out there is: natural language processing.

By comparing the word choice and frequency in the Pope's speech to those of all presidential inaugural addresses since 1789, we can see which speeches are most similar. I chose to use the inaugural addresses because I thought they were more in the same spirit as the Pope's address, outlining hopes and dreams for the nation, as opposed to, say, the more pragmatic State of the Union addresses.

Compare the dots closest to the Pope's in the center; mouseover to see the three most characteristic words each inaugural speech had in common with the Pope, compared to all other speeches.

Comparison of Pope Francis's address to congress with presidential inaugural addresses throughout history
(mouseover to see names)

This analysis (and of course others might differ) seems to show that Democrats have a slight lead: 42% of Republican speeches are of above-average similarity to the Pope's, while the same is true of 50% of Democratic speeches. But get this: a whopping 100% of 'Other' speeches (by early presidents, including the founding fathers, before the modern two-party system started) are of above-average similarity to the Pope's.

There are a few reasons this could happen. As we'll see below, the most common correspondences for the top Pope terms are Republicans and Democrats, so my hypothesis is that these same Republicans and Democrats also use a lot of terms the Pope didn't use, while the 'Others' generally stuck to a more restricted vocabulary in common with the Pope, talking in generalities about the human condition rather than specifics about partisan issues. (This is borne out by the fact that the Jaccard similarity results very closely match the cosine similarity results; read more here if you want.)
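To illustrate the hypothesis, here is a minimal Python sketch of Jaccard similarity on word sets (the size of the overlap divided by the size of the combined vocabulary). The word lists are made up for illustration and are not taken from any actual speech.

```python
def jaccard(words_a, words_b):
    """Shared vocabulary divided by combined vocabulary."""
    a, b = set(words_a), set(words_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

# Hypothetical word lists: both presidents share the same two words with
# the Pope, but the second one uses many words the Pope never did.
pope = "dialogue solidarity people family".split()
general_speech = "people family duty".split()
partisan_speech = "people family tariffs treasury navy revenue commerce".split()

print(jaccard(pope, general_speech))   # 2 shared / 5 total = 0.4
print(jaccard(pope, partisan_speech))  # 2 shared / 9 total, about 0.22
```

Both toy speeches overlap with the Pope on exactly the same words; the one with the larger, more specific vocabulary simply scores lower.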


The code I used is in this gist, and a more detailed description of my methodology is in my other, nerdier blog. Briefly, I built a TF-IDF* matrix 58 rows down (one for each of the 57 presidential inaugural addresses, and one for the pope) and 8896 columns across (one for each unique word used at least once in any address, minus extremely common words like 'the' and 'and'). I then calculated the similarities of each pair of rows in the matrix, and projected them onto two dimensions using the t-SNE** algorithm. I hacked the algorithm a bit so that it would reflect, in general, the distances to the pope more faithfully at the expense of the distances between presidents. I used a hacked version of Dunning log-likelihood to determine the three most characteristic words in common.

* Term Frequency-Inverse Document Frequency
** t-Distributed Stochastic Neighbor Embedding
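
For readers who don't want to dig through the gist, the first two steps (the TF-IDF matrix and the pairwise similarities) can be sketched in plain Python roughly as follows. This is a simplified illustration under my own assumptions, not the actual code: the stop-word list is tiny, the IDF formula is one common smoothed variant, the three "speeches" are made-up stand-ins, and the t-SNE projection and the log-likelihood step are left out entirely.

```python
import math
from collections import Counter

# Tiny illustrative stop-word list; a real one would be much longer.
STOP_WORDS = {"the", "and", "of", "to", "a", "in"}

def tokenize(text):
    """Lowercase, split on whitespace, and drop stop words."""
    return [w for w in text.lower().split() if w not in STOP_WORDS]

def tfidf_matrix(docs):
    """Return (vocabulary, rows): one TF-IDF row per document."""
    tokenized = [tokenize(d) for d in docs]
    vocab = sorted({w for doc in tokenized for w in doc})
    n_docs = len(tokenized)
    # Document frequency: how many documents contain each word.
    doc_freq = Counter(w for doc in tokenized for w in set(doc))
    rows = []
    for doc in tokenized:
        counts = Counter(doc)
        total = len(doc) or 1  # avoid dividing by zero for empty documents
        rows.append([
            (counts[w] / total)
            * (math.log((1 + n_docs) / (1 + doc_freq[w])) + 1)  # smoothed IDF
            for w in vocab
        ])
    return vocab, rows

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

speeches = [
    "dialogue and solidarity among the people",  # stand-in for the Pope
    "the people and the common good",            # hypothetical address 1
    "tariffs and the treasury",                  # hypothetical address 2
]
vocab, rows = tfidf_matrix(speeches)
for i in (1, 2):
    print(i, round(cosine_similarity(rows[0], rows[i]), 3))
```

In the real matrix there are 58 rows and 8896 columns instead of 3 rows and 8 columns, but the idea is the same: speeches that share distinctive words with the Pope's row end up with a higher cosine similarity.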

Addendum: Top Pope words and most similar presidents for each word

A few things to note about words that were never used in presidential inaugural addresses: 'ibid' ranks so highly because the copy of the Pope's address that was published had inline references; 'merton' is a monk, Thomas Merton; and 'dorothy' is Dorothy Day, founder of the Catholic Worker Movement.

0.32  dialogue   
0.21  ibid       
0.14  people     Cleveland[1893][D], Cleveland[1885][D], Adams[1797][O] & 52 more
0.13  merton     
0.11  dorothy    
0.11  solidarity 
0.1   like       Pierce[1853][D], Bush[1989][R], F.D.Roosevelt[1941][D] & 24 more
0.1   family     Reagan[1985][R], Buchanan[1857][D], Polk[1845][D] & 15 more
0.1   luther     Clinton[1997][D]
0.09  social     Grant[1873][R], Harding[1921][R], Harrison[1889][R] & 19 more
0.09  good       F.D.Roosevelt[1937][D], Bush[1989][R], Jefferson[1801][O] & 45 more
0.09  martin     Reagan[1981][R], Clinton[1997][D]
0.09  human      Reagan[1985][R], Carter[1977][D], F.D.Roosevelt[1941][D] & 32 more
0.09  world      Clinton[1993][D], Truman[1949][D], Harding[1921][R] & 49 more
0.09  common     Obama[2009][D], Bush[2001][R], Eisenhower[1953][R] & 34 more
0.08  thomas     Bush[2001][R], Clinton[1993][D], Reagan[1981][R]
0.08  king       Obama[2013][D], Clinton[1997][D], Garfield[1881][R]
0.08  women      Wilson[1913][D], Obama[2009][D], Bush[1989][R] & 11 more
0.08  especially Monroe[1821][O], Eisenhower[1953][R], Taylor[1849][O] & 12 more
0.08  building   Hoover[1929][R], Eisenhower[1957][R], Nixon[1973][R] & 6 more
0.08  spirit     F.D.Roosevelt[1941][D], Carter[1977][D], Harrison[1841][R] & 38 more
0.08  moses      
0.08  lincoln    Reagan[1981][R], T.Roosevelt[1905][R], F.D.Roosevelt[1941][D] & 1 more
0.08  life       F.D.Roosevelt[1941][D], T.Roosevelt[1905][R], Wilson[1913][D] & 44 more
0.08  dream      Carter[1977][D], Clinton[1997][D], Reagan[1985][R] & 7 more
0.08  dignity    Reagan[1985][R], Bush[2005][R], Eisenhower[1957][R] & 14 more
0.08  god        Lincoln[1865][R], Reagan[1985][R], Nixon[1969][R] & 34 more
0.08  activity   Cleveland[1893][D], McKinley[1901][R], Truman[1949][D] & 2 more
0.08  want       Eisenhower[1957][R], Harding[1921][R], Coolidge[1925][R] & 15 more
0.07  culture    Hoover[1929][R], Obama[2009][D], Reagan[1981][R] & 3 more
