Update: here’s my final edit of the chart; I think the city labels are much less misleading now. I’ve come across a much more fine-grained data set, albeit from 1995; you can see it in my Nov. 27, 2013 blog post.
Here’s the original, which seemed to imply that the bars were made up only of population from the indicated cities, whereas each bar shows the population of the entire country at the same latitude as those cities:
A co-worker and friend happened to mention that Vancouver was farther north than Montreal; I sort of knew that, but I was surprised to find out it was 400 km farther north. So I was curious, and tried to find a histogram of Canadian population by latitude; maybe my Google-fu was lacking, but I couldn’t find one, so I decided to make one myself.
Little did I know what I would discover: that data is not easy to obtain. There is lots of population data available for download from the Statistics Canada website, but it does not contain geographical coordinates, and StatsCan uses its own defined areas called census subdivisions. Geographical boundary files are available for download, but using them would have required an amount of computation rather disproportionate to the task of simply determining latitudes.
Luckily, StatsCan also makes the population available by Forward Sortation Area (FSA), the first three characters of the six-character Canadian postal code; e.g. the FSA of the Canadian Parliament at postal code K1A 0A9 is K1A. So now it was just a matter of finding out the latitudes of FSAs or postal codes. Simple, right?
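To make the FSA idea concrete, here is a minimal Python sketch of pulling the FSA out of a postal code. The function name `fsa_of` is my own invention for illustration, and it only checks the length of the code, not the full letter-digit pattern.

```python
def fsa_of(postal_code: str) -> str:
    """Return the Forward Sortation Area: the first three characters.

    Hypothetical helper for illustration; it normalizes spacing and case
    but does not validate the letter-digit-letter pattern.
    """
    compact = postal_code.replace(" ", "").upper()
    if len(compact) != 6:
        raise ValueError(f"expected a six-character postal code, got {postal_code!r}")
    return compact[:3]


# The example from the text: the postal code K1A 0A9 has the FSA K1A.
print(fsa_of("K1A 0A9"))
```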
Wrong. Canada Post considers its postal codes intellectual property subject to copyright; a license to use and analyze them costs $892 a year for StatsCan’s info, and over $5000 for many business products. They are suing a website for providing information on postal code geography. Universities used to be able to access Canada Post’s geographical data, but no longer. Fortunately, I work for a university, and the reference library has someone who was able to take the publicly available ArcGIS files and determine the centroids using the expensive proprietary software for which the university has a license.
So: the population data is divided into 1600 FSAs, which is pretty decent resolution. The centroid (geographical center) of most FSAs fits reasonably well within the 0.5 degree latitude (about 55 km) resolution of the graph, except of course for the very large FSAs the farther north you go. But in any case, these areas would have had to be aggregated somehow to even be visible on the scale (for example, if the northernmost FSA, X0A, were spread out among its 14 degrees of latitude), so I think this is a reasonable compromise.
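For anyone who wants to try something similar, the binning described above can be sketched roughly like this in Python. This is not the exact process I used, just the idea: each FSA’s whole population is assigned to the 0.5 degree bar containing its centroid. The function names are made up, and the sample rows are invented placeholders, not real census figures.

```python
import math
from collections import defaultdict

BIN_WIDTH = 0.5  # degrees of latitude per bar (about 55 km)


def bin_floor(latitude: float, width: float = BIN_WIDTH) -> float:
    """Return the southern edge of the bar that contains this latitude."""
    return math.floor(latitude / width) * width


def population_by_latitude(rows, width: float = BIN_WIDTH) -> dict:
    """Sum population per latitude bar.

    rows: iterable of (fsa, centroid_latitude, population) tuples.
    The entire population of an FSA lands in the bar of its centroid.
    """
    totals = defaultdict(int)
    for _fsa, latitude, population in rows:
        totals[bin_floor(latitude, width)] += population
    return dict(sorted(totals.items()))


# Invented placeholder rows, NOT real data.
sample = [
    ("AAA", 45.2, 100),
    ("BBB", 45.4, 50),
    ("CCC", 45.5, 10),
    ("DDD", 53.49, 7),
]
for edge, total in population_by_latitude(sample).items():
    print(f"{edge:.1f} to {edge + BIN_WIDTH:.1f}: {total}")
```

Note that a centroid sitting exactly on a boundary (like the 45.5 row) goes into the bar to its north, which is the kind of edge case that matters for a city that straddles two bars.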
A note on the city labels: I tried to give the largest municipalities that contributed to the population in each bar of the histogram as an aid to understanding, not as a systematic data set. This became difficult for some of the larger FSAs, where it was hard to match the latitude of a town with the latitude of the centroid of its FSA. So in some cases, I may have used a town with a population of 2,000 when there was a town with 3,000 people at the extreme north or south of the FSA. And a note about Edmonton: it straddles two bars because the center of the city is almost exactly on the demarcation, 53.5 degrees north. Edmonton is a bit smaller than Calgary, but each bar also includes population from places other than the city named, so do not draw the wrong conclusion from the size of the bars.
You can peruse the data I used in this Google Doc.
Comments are welcome, even, nay especially, critical ones.
EDIT 2013-10-16 14:49 GMT: Montreal straddles the 45.5 degree line of latitude, and by marking the 45.5-46.0 bar as “Laval”, the graph appeared to be indicating that Laval had a larger population than Montreal. I’ve explained how the labels are generated, but it’s an obvious conclusion to draw from a glance at the graph without reading the methodology (and the methodology had to be tweaked for Edmonton and Montreal, which straddle the boundaries between bars, and the centroids of the FSAs are problematic to begin with). Clarity is the most important thing, so I’ve updated the bar to read “Laval & Montréal”. Thank you to the commenters in Reddit’s dataisbeautiful forum for pointing this out.
EDIT 2013-10-16 15:33 GMT: When you’re wrong, you’re wrong, and I was wrong. My labels were utterly misleading. Now I have put the major contributor AND every Canadian city with over 100,000 population on the graph. I had intended the labels just as a geographical reference, but I definitely did not think through what fresh eyes coming to the graph would think.
EDIT 2013-10-16 21:53 GMT: These labels are really getting me in trouble. I produced the graph first without them, but I envisaged a torrent of “You should have indicated where these people live!” I’ve removed the most northerly ones, because again, they’re misleading. Lesson learned: less is more.
EDIT 2013-10-16 22:41 GMT: Added hi-res version without labels. I think that’s enough editing today. Enjoy! And thanks for all the feedback! The vast majority of it was very constructive, and it’s appreciated.