It’s been a year since my last official MakeoverMonday entry. I’m finally realizing that most of the action is on Sunday, so maybe I’ll do more this year. Week 1 for 2020 looks simple, but it’s already confusing me. The task is to make over this Vox chart from 2014:
The chart shows Gallup poll responses for favorite sports to watch over 70+ years for the three most popular sports (US only). However, the Makeover Monday data covers responses for 19 sports but only seven polls spanning 14 years. So perhaps I’m already breaking the rules, but I’m going to use the full data since it’s available on the same source page at Gallup. That includes only seven sports, but the others are tiny and can be ignored for this makeover.
Review of the original
I like the main design decisions of the original:
- Showing trends over time
- Dropping less popular sports to focus on the main sports
- Smoothing the trend lines
- Labeling the lines directly instead of with a separate legend
- Trying to use semantic colors — I didn’t realize it until I tried to pick semantic colors myself: football fields are green; basketballs are orange; baseball bats are yellow.
- Abbreviating the years so the x axis is not so crowded.
Oddities of the original:
- Uneven amount of trend line smoothness
- Y axis labels and gridlines are at 10%, except the top one, which is at 13%.
- Label colors don’t match the lines, and the labels are not quite aligned with the ends of the lines.
- Putting the y axis labels above the ticks/gridlines instead of inline with them is not that uncommon, but it still takes me longer to parse the positions.
The uneven smoothness was the most prominent feature for me. At first, I read it as saying the change had been steady for decades before starting to fluctuate in the internet era. However, I realized it was more likely that the poll was conducted less frequently in the past, which is indeed the case.
Continuing that thought, let’s look at all the data values for those sports. Here’s a remake using the same technique as the original, a connected line with smooth connections, but also showing the data points.
This matches pretty well, except for 1972 and 1994, when the poll was conducted twice in the year. It looks like the Vox author ignored one of the polls in each of those years. Also, the data I retrieved has an additional year of data (2017) from after the Vox article came out in 2014.
Beyond the granularity, the shared data includes seven sports instead of three, adding ice hockey, soccer, auto racing and figure skating. Of those, only ice hockey and soccer had more than 2% of responses.
The dates are given as month/year values, which distinguish the multiple polls taken in some years. One year, 1997, the poll question was “What is your favorite sport to follow?” instead of “to watch.” The results weren’t that different, but I can imagine quite different interpretations.
Though I didn’t use the official 19-sport data set, which only goes back to 2004, I noticed it also tallied responses such as “other” (about 5%) and “none” (about 13%). I can dream that ultimate frisbee accounts for a decent chunk of other, but that’s unlikely. I’m sure it would be up there for a question on “your favorite sport to play.”
I do think the long-term trends for the main sports over time are a good message, so I sought to show that while minimizing the recognized oddities. The most straightforward thing to do is a scatterplot and a real smoother (in this case a spline regression):
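For anyone who wants to try a similar smoother in code, here is a rough Python sketch of one kind of spline regression: a cubic regression spline fit by least squares on a truncated power basis. This is an illustration only, not necessarily the smoother behind the chart. The function names, the knot positions, and the data are all made up for the example (the points are synthetic, not the Gallup responses), and it assumes NumPy is installed.

```python
import numpy as np

def spline_basis(x, knots):
    """Cubic truncated power basis: 1, x, x^2, x^3, plus max(x - k, 0)^3 per knot."""
    x = np.asarray(x, dtype=float)
    cols = [np.ones_like(x), x, x**2, x**3]
    cols += [np.clip(x - k, 0.0, None) ** 3 for k in knots]
    return np.column_stack(cols)

def fit_spline(x, y, knots):
    """Least-squares coefficients for the cubic regression spline."""
    design = spline_basis(x, knots)
    coef, *_ = np.linalg.lstsq(design, np.asarray(y, dtype=float), rcond=None)
    return coef

def predict_spline(coef, x, knots):
    """Evaluate the fitted spline at new x values."""
    return spline_basis(x, knots) @ coef

# Synthetic, irregularly spaced "polls" (NOT the Gallup data).
# x is measured in decades since a hypothetical first poll.
rng = np.random.default_rng(0)
decades = np.sort(rng.uniform(0.0, 8.0, size=40))
share = 20 + 3 * decades + rng.normal(0, 2, size=decades.size)

knots = [2.0, 4.0, 6.0]           # assumed knot positions
coef = fit_spline(decades, share, knots)
grid = np.linspace(0.0, 8.0, 50)  # evenly spaced points for drawing the curve
smooth = predict_spline(coef, grid, knots)
```

Unlike a connect-the-dots line with rounded corners, this curve doesn't have to pass through every poll, so sparse early polling and dense recent polling get the same amount of smoothing. Fewer knots give a stiffer curve.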
The data marks help communicate the irregular polling and variation but also add a bit of visual noise. I didn’t try abbreviating the years, and I didn’t put a lot of effort into lining up the line labels. One downside of attaching my labels to data points in the graph is that I had to expand the graph which means 2020 and beyond is now visible on the date axis. Not terrible but seems like a negative.
I added the next two sports for a fuller story and since soccer seems to be really gaining this past decade. And for some pedantic reason, I dropped the 1997 responses when the poll question had a slightly different wording. Didn’t want to have to add an asterisk to the chart title.
Another way to show the irregular polling would be to show vertical lines on the polling dates.
Not bad — I hadn’t really noticed how the frequency had dropped off in the last 10 years. We’ve lost any indication of the variation in the responses, though. We can get an estimate by adding a bootstrap confidence interval to the spline regression.
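One common way to build that kind of band is a percentile bootstrap: resample the polls with replacement, refit the curve each time, and take quantiles of the refitted curves at each point along the axis. The sketch below shows the idea in Python with NumPy. It is not necessarily how any particular charting tool computes its interval. The names `basis` and `bootstrap_band`, the knots, the number of resamples, and the data are assumptions for illustration (synthetic points, not the Gallup responses).

```python
import numpy as np

def basis(x, knots):
    """Cubic truncated power basis for a regression spline."""
    x = np.asarray(x, dtype=float)
    cols = [np.ones_like(x), x, x**2, x**3]
    cols += [np.clip(x - k, 0.0, None) ** 3 for k in knots]
    return np.column_stack(cols)

def bootstrap_band(x, y, knots, grid, n_boot=500, level=0.95, seed=0):
    """Percentile bootstrap band for the spline fit, evaluated on grid."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    rng = np.random.default_rng(seed)
    grid_basis = basis(grid, knots)
    curves = np.empty((n_boot, len(grid)))
    for b in range(n_boot):
        # Resample whole polls (x, y pairs) with replacement, then refit.
        idx = rng.integers(0, len(x), size=len(x))
        coef, *_ = np.linalg.lstsq(basis(x[idx], knots), y[idx], rcond=None)
        curves[b] = grid_basis @ coef
    alpha = (1.0 - level) / 2.0
    lower, upper = np.quantile(curves, [alpha, 1.0 - alpha], axis=0)
    return lower, upper

# Synthetic example data (NOT the Gallup data); x in decades.
rng = np.random.default_rng(1)
x = np.sort(rng.uniform(0.0, 8.0, size=40))
y = 20 + 3 * x + rng.normal(0, 2, size=x.size)
grid = np.linspace(0.5, 7.5, 25)
lower, upper = bootstrap_band(x, y, [2.0, 4.0, 6.0], grid)
```

The band widens where polls are sparse, because the resampled fits disagree more there. That is the information the vertical date lines alone can't show.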
There’s some argument for only showing the confidence band.
Not sure I like that, but maybe I’m just not used to it. I’ll compromise and go with a thinner trend line.
For this last graph, I put a little more effort into lining up the line labels without extending the axis.
More data exploration
Though my chart has a small text summary in the subtitle, I don’t speculate on the why of the trends. The Vox article suggests the creation of the Super Bowl and modern NFL were catalysts for the shift from baseball to football. I imagine the rise of TV viewing was a factor, where football may be more accessible or more fun to watch on TV. And the recent rise of soccer in the US could be related to the rising Hispanic population, the success of the women’s national team, or just more internationalization in general.
The Gallup data also includes the month of the poll, which I showed in my charts as the 15th of the given month. One might also wonder if the popularity of a sport depends on whether the sport is in season or not during the poll. Unfortunately, there’s not enough data and month variation to read too much into it. Most of the recent polls have been done in December in the thick of football season. I did try a linear model with month as a separate factor, and a few month-sport interactions had p-values less than 0.001. For instance, the effect on football of polling around March is about negative three percentage points.
I’m not even sure my work qualifies since I didn’t use the official data set, but I think I’ll submit the last chart above as my entry.
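Placing each poll on the 15th of its month is easy to do in code. Here is a small Python sketch using only the standard library. The `"12/2017"` label format and the `poll_date` helper are assumptions for the example; the labels in the actual Gallup table may be written differently, in which case the `fmt` string would need to change.

```python
from datetime import date, datetime

def poll_date(label, fmt="%m/%Y", day=15):
    """Turn a month/year label into a date on the given day of that month."""
    parsed = datetime.strptime(label, fmt)
    return date(parsed.year, parsed.month, day)

# Hypothetical labels in the assumed "month/year" format.
labels = ["03/1994", "09/1994", "12/2017"]
dates = [poll_date(label) for label in labels]
months = [d.month for d in dates]  # handy as a factor for an in-season check
```

Using mid-month dates keeps two polls from the same year apart on the axis, and the extracted month number is what a month-as-factor model would group on.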