Monday, September 28, 2015

Pope Francis's speech to Congress was more similar to the founding fathers' inaugural addresses than to those of Republicans and Democrats

Within minutes of Pope Francis's Sept. 2015 address to the U.S. Congress, both Republicans and Democrats were claiming the speech vindicated their worldviews. This is not surprising, as modern Catholic values don't map onto the American binary: helping the poor (Dem) and immigrants (Dem), against abortion (Rep) and gay marriage (Rep), in favour of action against climate change (Dem), etc.

Surely there must be a way to quantitatively determine who wins this tug of war? Turns out there is: natural language processing.

By comparing the word choice and frequency in the Pope's speech to those of all presidential inaugural addresses since 1789, we can see which speeches are most similar. I chose the inaugural addresses because I thought they were closer in spirit to the Pope's address, outlining hopes and dreams for the nation, than the more pragmatic State of the Union addresses, say.

Compare the dots closest to the Pope's in the center; mouseover to see the three most characteristic words each inaugural speech had in common with the Pope, compared to all other speeches.

Comparison of Pope Francis's address to Congress with presidential inaugural addresses throughout history
(mouseover to see names)

This analysis (and of course others might differ) seems to show that Democrats have a slight lead. 42% of Republican speeches are of above-average similarity to the Pope's, while the same is true of 50% of Democratic speeches. But get this: a whopping 100% of 'Other' speeches (by early presidents, including the founding fathers, before the modern two-party system started) are of above-average similarity to the Pope's.

There are a few reasons this could happen. Since, as we'll see below, the presidents who most often match the top Pope terms are Republicans and Democrats, my hypothesis is that these same Republicans and Democrats also use a lot of terms the Pope didn't use, while the 'Others' generally stuck to a more restricted vocabulary that they share with the Pope, talking in generalities about the human condition rather than specifics about partisan issues. (This is borne out by the fact that the Jaccard similarity results very closely match the cosine similarity results; read more here if you want.)
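To make the two measures concrete, here is a minimal sketch of Jaccard and cosine similarity on word lists. This is not the code from my gist; the function names and the three toy "speeches" are invented purely for illustration. The point is that a speech with lots of extra vocabulary gets penalised by both measures, even when it shares exactly the same words with the Pope as a shorter, more general speech.

```python
import math
from collections import Counter


def jaccard(a_tokens, b_tokens):
    """Share of distinct words that the two texts have in common."""
    a, b = set(a_tokens), set(b_tokens)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def cosine(a_tokens, b_tokens):
    """Cosine similarity between the raw word-count vectors of two texts."""
    a, b = Counter(a_tokens), Counter(b_tokens)
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Toy texts, made up for illustration only (not real speech data).
pope = "dialogue people family good common".split()
narrow = "people good common spirit".split()
broad = "people good common tariff treasury navy railroad currency".split()

for name, speech in [("narrow", narrow), ("broad", broad)]:
    print(name, round(jaccard(pope, speech), 2), round(cosine(pope, speech), 2))
```

Both toy speeches share the same three words with the Pope, but the "broad" one scores lower on both measures because of everything else it talks about.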


The code I used is in this gist, and a more detailed description of my methodology is in my other, nerdier blog. Briefly, I built a TF-IDF* matrix 58 rows down (one for each of the 57 presidential inaugural addresses, and one for the Pope) and 8896 columns across (one for each unique word used at least once in any address, minus extremely common words like 'the' and 'and'). I then calculated the similarity between each pair of rows in the matrix, and projected the results onto two dimensions using the t-SNE** algorithm. I hacked the algorithm a bit so that it would reflect, in general, the distances to the Pope more faithfully at the expense of the distances between presidents. I used a hacked version of Dunning log-likelihood to determine the three most characteristic words in common.

* Term Frequency-Inverse Document Frequency
** t-Distributed Stochastic Neighbor Embedding
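
For the curious, the steps above (minus the t-SNE projection) can be sketched in plain Python as follows. Treat this as a rough illustration under stated assumptions, not the code in the gist: the stopword list, the smoothed IDF formula, the toy documents and all function names are my own choices here, and dunning_g2 is the standard two-corpus log-likelihood statistic, not the hacked version I actually used.

```python
import math
from collections import Counter

# Tiny illustrative stopword list (an assumption; a real list is much longer).
STOPWORDS = {"the", "and", "of", "to", "a"}


def tfidf_matrix(docs):
    """Return (vocab, rows): one TF-IDF row per document, one column per word."""
    tokenized = [[w for w in d.lower().split() if w not in STOPWORDS] for d in docs]
    vocab = sorted({w for toks in tokenized for w in toks})
    n = len(docs)
    # Document frequency: in how many documents does each word appear?
    df = Counter(w for toks in tokenized for w in set(toks))
    rows = []
    for toks in tokenized:
        counts = Counter(toks)
        total = len(toks) or 1
        # Smoothed IDF (one common variant, assumed here).
        rows.append([(counts[w] / total) * (math.log((1 + n) / (1 + df[w])) + 1)
                     for w in vocab])
    return vocab, rows


def cosine(u, v):
    """Cosine similarity between two equal-length numeric rows."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def dunning_g2(a, b, total_a, total_b):
    """Standard Dunning log-likelihood for a word seen a times in a corpus of
    total_a tokens and b times in a corpus of total_b tokens."""
    expected_a = total_a * (a + b) / (total_a + total_b)
    expected_b = total_b * (a + b) / (total_a + total_b)
    g2 = 0.0
    if a:
        g2 += a * math.log(a / expected_a)
    if b:
        g2 += b * math.log(b / expected_b)
    return 2 * g2


# Hypothetical stand-ins for the 58 real texts.
docs = {
    "pope": "dialogue and solidarity for the common good of the people",
    "speech_a": "the common good and the spirit of the people",
    "speech_b": "tariff and treasury and the navy",
}
names = list(docs)
vocab, rows = tfidf_matrix(list(docs.values()))
pope_row = rows[names.index("pope")]
similarity = {name: cosine(pope_row, row)
              for name, row in zip(names, rows) if name != "pope"}
print(similarity)
```

With the real data, rows would be the 58 x 8896 matrix described above, and similarity would hold each president's closeness to the Pope. A word with a high dunning_g2 score is one whose frequency differs sharply between the two sets of text being compared.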

Addendum: Top Pope words and most similar presidents for each word

A few things to note about words that were never used in presidential inaugural addresses: 'ibid' ranks so highly because the published copy of the Pope's address had inline references; 'merton' is the monk Thomas Merton; and 'dorothy' is Dorothy Day, founder of the Catholic Worker Movement.

0.32  dialogue   
0.21  ibid       
0.14  people     Cleveland[1893][D], Cleveland[1885][D], Adams[1797][O] & 52 more
0.13  merton     
0.11  dorothy    
0.11  solidarity 
0.1   like       Pierce[1853][D], Bush[1989][R], F.D.Roosevelt[1941][D] & 24 more
0.1   family     Reagan[1985][R], Buchanan[1857][D], Polk[1845][D] & 15 more
0.1   luther     Clinton[1997][D]
0.09  social     Grant[1873][R], Harding[1921][R], Harrison[1889][R] & 19 more
0.09  good       F.D.Roosevelt[1937][D], Bush[1989][R], Jefferson[1801][O] & 45 more
0.09  martin     Reagan[1981][R], Clinton[1997][D]
0.09  human      Reagan[1985][R], Carter[1977][D], F.D.Roosevelt[1941][D] & 32 more
0.09  world      Clinton[1993][D], Truman[1949][D], Harding[1921][R] & 49 more
0.09  common     Obama[2009][D], Bush[2001][R], Eisenhower[1953][R] & 34 more
0.08  thomas     Bush[2001][R], Clinton[1993][D], Reagan[1981][R]
0.08  king       Obama[2013][D], Clinton[1997][D], Garfield[1881][R]
0.08  women      Wilson[1913][D], Obama[2009][D], Bush[1989][R] & 11 more
0.08  especially Monroe[1821][O], Eisenhower[1953][R], Taylor[1849][O] & 12 more
0.08  building   Hoover[1929][R], Eisenhower[1957][R], Nixon[1973][R] & 6 more
0.08  spirit     F.D.Roosevelt[1941][D], Carter[1977][D], Harrison[1841][R] & 38 more
0.08  moses      
0.08  lincoln    Reagan[1981][R], T.Roosevelt[1905][R], F.D.Roosevelt[1941][D] & 1 more
0.08  life       F.D.Roosevelt[1941][D], T.Roosevelt[1905][R], Wilson[1913][D] & 44 more
0.08  dream      Carter[1977][D], Clinton[1997][D], Reagan[1985][R] & 7 more
0.08  dignity    Reagan[1985][R], Bush[2005][R], Eisenhower[1957][R] & 14 more
0.08  god        Lincoln[1865][R], Reagan[1985][R], Nixon[1969][R] & 34 more
0.08  activity   Cleveland[1893][D], McKinley[1901][R], Truman[1949][D] & 2 more
0.08  want       Eisenhower[1957][R], Harding[1921][R], Coolidge[1925][R] & 15 more
0.07  culture    Hoover[1929][R], Obama[2009][D], Reagan[1981][R] & 3 more

1 comment:

  1. While this provides useful insight into how similar the wording is, it gives no indication as to whether the intent is similar. You can use the exact same word for completely different purposes. Thus you would need to go beyond simple word comparison and do natural language processing that delves into the actual context of the sentences.



