Charity Solicitations Visualized

Way back in 2006, I tallied my charity solicitations, and since then the situation has gotten comically worse. So over the past year, I’ve tried to keep all the solicitations I received from various charities, some of which I haven’t contributed to in years. This time, instead of a simple table, I’ve made a couple kitchen floor charts showing the actual pieces of mail received. I probably missed a few, and phone calls are not represented.

Here are my “charts” of the charities sending the most and least mail in 2014.

[Figure: charities sending the most mail]

[Figure: charities sending the least mail]

Care is easily the most annoying (I even found another piece after taking the pic). I didn’t even know I had donated to Care, but they handled a donation for typhoon relief.

It’s good to see that Public Citizen and Southern Poverty Law Center are doing better. They were the top offenders in 2006, but they’re now mostly honoring my request for one solicitation per year.

Perhaps the annoyingness is exacerbated by my giving pattern, which is to give toward the end of the year. Unfortunately for me, common practice is to accept the December gift and then send an “annual renewal” just a few weeks later in January.

The worst offenders have either been dropped or switched to reduced anonymous giving, but I expect the junk mail to continue for years. And anonymous giving is expensive as far as I can tell: Network for Good adds a 5% fee and JustGive adds 4.5%. Hopefully that includes the credit card fees, but I’m not sure. It shouldn’t be so expensive just to move money. Fidelity Charitable, with a flat fee of $100 per year, may be another option, especially if the credit card fee is separate for the other options.

My truly least annoying charity (annoyance == 0) is one that sent me no mailings or online annoyances: the Online Encyclopedia of Integer Sequences. Thanks to Neil Sloane and many volunteers for that excellent resource. Wikimedia was also in the no-mailings camp, but they made Wikipedia pretty annoying to use for most of December, so I’m not sure where to rank them.

Remaking ProPublica’s Blocked News Graph

Earlier this month, investigative newsroom ProPublica published an interactive graph showing the availability of various news web sites within China.

[Figure: ProPublica’s original blocked-news graph]

My initial reaction was that the missing (gray) and inconclusive (yellow) data are dominating the view without providing much information. Of course, the graph is precisely reflecting the data available, but I’d prefer a view that helps me see larger patterns. Which news sites were likely blocked all year? The Huffington Post stands out with a lot of red, but it’s not so obvious what other news sites were also never measured as available throughout the year.

Here’s a close-up of three sites with lots of inconclusive measurements that dominate the coloring. They have three different mixes of conclusive measurements, but that difference doesn’t stand out until you focus on it.

[Figure: close-up of three sites in the original graph]

So my main “improvement” is to interpolate the absent/inconclusive data. For instance, if a site is measured as blocked, then unmeasured, then measured as blocked again, relabel the middle no data region as presumed blocked. Same for open, and for mixed endpoints relabel no data as in transition. Inconclusive is a really tricky category; I leave it as different from no data but similarly tint inconclusive measurements according to the surrounding measurements.
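The relabeling rule above can be sketched as a small function. This is my own minimal sketch, not the code behind the remake; the labels (“presumed blocked”, “presumed open”, “in transition”) are the ones described above, and `None` stands in for a no-data measurement.

```python
def fill_gaps(statuses):
    """Relabel runs of missing data based on the surrounding measurements.

    statuses: list of 'blocked', 'open', or None (no data).
    Interior gaps get 'presumed blocked', 'presumed open', or
    'in transition'; gaps at the edges stay as no data.
    """
    result = list(statuses)
    n = len(result)
    i = 0
    while i < n:
        if result[i] is None:
            # Find the end of this run of missing data.
            j = i
            while j < n and result[j] is None:
                j += 1
            left = result[i - 1] if i > 0 else None
            right = result[j] if j < n else None
            if left == right == 'blocked':
                fill = 'presumed blocked'
            elif left == right == 'open':
                fill = 'presumed open'
            elif left is not None and right is not None:
                fill = 'in transition'  # mixed endpoints
            else:
                fill = None  # gap touches an edge: leave it alone
            for k in range(i, j):
                result[k] = fill
            i = j
        else:
            i += 1
    return result
```

Inconclusive measurements would need an extra status value tinted by the same neighbor logic, which is why I kept them as a separate category in the remake.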

My second improvement is to avoid the red/green colorblindness issue. I switched the open color from green to blue. The stoplight colors have strong connotations which help with the interpretation of the original graph, and so it’s a trade-off to balance a small benefit to the 97% with normal vision versus a large benefit to the few. Still, here’s what the close-up above looks like with the most common red/green colorblindness (using Color Oracle):

[Figure: the close-up as seen with red/green colorblindness]

My third improvement is to reduce the white space between rows. I didn’t realize it until I tried it, but I think the strong white banding is distracting.

Finally, here’s my view (data through December 19):

[Figure: my remake, data through December 19]

Is my improvement better? In its current form it looks a bit busy, but I think if a designer picked better colors, it would be quite functional, assuming you buy into the interpolation idea.

With my remake, it’s easier to see the three sites that were never seen as blocked (CNN, ProPublica, and Washington Post), the six news sites that were never seen as open, and especially the transitions, which might be the most interesting parts. Presumably, the late September transitions are related to the Hong Kong protests.

I haven’t tried to reproduce the nice axis labeling, reference lines or the interactivity of the ProPublica version, all of which I like.

Graph Makeover Contest — Household spending

I tried Naomi Robbins’ Graph Makeover Contest recently. After trying a few such competitions, I find I learn more after seeing the other entries if I first make the effort of producing an entry myself. Robbins presented the following table from the Bureau of Labor Statistics report on prices and spending [pdf] and asked for a graphic version.

The first step was to get the data and understand it. Ignoring the percent change, which is computed, you can think of the data as one continuous variable, annual spending, by three categorical variables: year, housing tenure and expenditure. Expenditure is tricky though. It’s really expenditure and sub-expenditure. The food expenditure is completely broken down into sub-expenditures, but transportation and healthcare are only partially subdivided. And double-checking the data, it seems some expenditures are missing altogether, since the listed items don’t add up to the stated total.

The second step is to decide what message the graph should communicate, and that will lead into the third step of selecting an appropriate visualization for the data and the message. The table presentation, for example, shows all the data with good accuracy, but shows some messages better than others. It’s not too hard to see that the total expenditures stayed about the same from 1986 to 2010 or that health insurance has by far the biggest percent increase. However, it’s harder to rank expenditures or to see that renters pay relatively more on housing than homeowners.

If I were writing the report, I would probably include several graphs to make several points about the data. Here’s a collection I experimented with.

This area chart does a good job of showing that total expenditures stayed about the same and how the totals compare across housing tenure levels. Less effectively, it shows how each expenditure changed over time and their ranking (larger expenses are at the bottom).

Note that I added in an “Other” category for the missing expenditures and replaced the umbrella expenditures with their sub-expenditures so summing would work out. That meant, for instance, replacing “Transportation” with the new sub-expenditure “Transportation (non-fuel)”.
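The bookkeeping above amounts to making the parts sum to the whole. Here’s a minimal sketch with made-up numbers (not the BLS figures) showing the idea:

```python
# Hypothetical annual spending for one year and tenure level,
# after replacing umbrella expenditures with their sub-expenditures.
total_spending = 20000
expenditures = {
    'Food at home': 2400,
    'Food away from home': 1300,
    'Housing': 7200,
    'Transportation (non-fuel)': 3100,
    'Gasoline': 1000,
    'Health insurance': 400,
}

# Whatever is unaccounted for becomes the "Other" category,
# so the stacked areas sum to the stated total.
expenditures['Other'] = total_spending - sum(expenditures.values())
assert sum(expenditures.values()) == total_spending
```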

The area chart muddles the changes of the middle items though. This slope chart works better for seeing each expenditure’s change.

Coloring is another challenge with this many categories, and I didn’t find a great solution. Maybe only the interesting lines should have color…

We’ve lost the total, of course, and though we can see absolute amounts and changes, we still don’t see the giant relative increase in health insurance. For that I tried a log scale.

A log scale may not be appropriate for a general audience, but it does show relative change well because equal vertical distances represent equal multipliers.

If you want to concentrate only on relative change, this “spoke” chart I made by accident makes the point even better about the health insurance cost change.

It took me a while to decipher it, but it’s sort of like all the lines from the previous chart centered on the same location.

My actual entry used the log scaled slope lines, except I changed the vertical axis to relative values and added the total spending numbers at the bottom, which adds the message about stable spending and provides a basis for the percentages. Using relative spending amounts helps see that renters spend relatively more on housing, which was one point raised in the prose of the original report.

I’m still not happy with having so many colors. With more time, perhaps I could have found better colors, mixed in different line styles, tried putting the labels on the lines, combined similar categories, …

Worse Bar Chart Labels

I recently ran into this bar chart as part of a credit card summary. It took me a few moments to realize each run of superscripted digits was part of a single number, as opposed to indicating exponentiation or footnotes. I’ve never seen such bad labels before. I hope the chart was custom-made and not a feature of some charting software.

At least the bad labeling overshadows the other minor faults of the graph. Sort of like when having a bad toothache prevents you from noticing your sore knee.

Lots of Dots Quilt

At a family gathering last month, Bonnie and I received an unexpected gift from my cousin the Polka Dot Debutante. She made this amazing quilt for us titled “Lots of Dots”:

[Figure: the “Lots of Dots” quilt]

Being in the data visualization business, I can’t help but see lots of familiar and new plots:

Chapel Hill Election Clustering Revised

I’ve updated the cluster analysis based on comments received. Thanks to Ed Harrison, I have included data from the Durham County precincts. And since other commenters explained away the apparent under-voting in some precincts, I recalculated the percentages to be based on the number of people voting in that race instead of the total ballots cast for the precinct. For town council, I approximated 4 votes per person which is necessarily on the high side, but makes the town council percentages comparable to the mayor percentages.
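The 4-votes-per-person adjustment can be written out explicitly. A minimal sketch with hypothetical numbers (not the actual precinct tallies):

```python
def race_pct(candidate_votes, race_votes, votes_per_person=1):
    """Candidate's share of the estimated people voting in a race.

    race_votes is the total votes cast in that race at the precinct.
    For the town council race, dividing by votes_per_person=4
    (approximated, necessarily on the high side) estimates how many
    people voted, making council percentages comparable to the
    single-vote mayor race.
    """
    people_voting = race_votes / votes_per_person
    return 100 * candidate_votes / people_voting

# Hypothetical precinct: 4,000 council votes cast in total,
# 500 of them for one candidate.
council_share = race_pct(500, 4000, votes_per_person=4)
```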

Two-way cluster of precincts and candidates

I also figured out how to color the clusters by absolute values rather than relative values, which helps to differentiate the candidates. They still fall into two large groups, but now it’s easier to see subgroups. Mayor-council alignments are highly sensitive to the council multiplication factor (4 here), so ignore Kleinschmidt and Czajkowski for candidate clustering.

For the record, low scoring candidates have been eliminated (otherwise they make all precincts look more similar), and absentee and provisional votes have been combined with One Stop precincts.

The precincts present a similar clustering as before, except the under-voters are now distributed into other clusters. The yellow group is fairly neutral. The green group is left leaning. The purple group is left-leaning with a focus on Harrison/Rich/Easthom. The blue group is left-leaning with a focus on Merritt/Kleinschmidt. The red group is right-leaning and includes two of the Durham precincts.

As a bonus, I thought this visual was attractive. It shows a smoothed trend line of the vote percentages (times four for town council candidates) by precinct, where the precincts are ordered by support for Kleinschmidt, the winner of the race for mayor. (Click graph for a larger version.)

[Figure: smoothed vote-percentage trends by precinct, ordered by Kleinschmidt support]

The “left-leaning” candidates generally rise with Kleinschmidt while the “right-leaning” candidates (dotted lines) fall. Merritt’s strong showing at Lincoln and Northside is also evident. Unfortunately for him, those precincts had very low turnout.

Chapel Hill Election Clustering

Damon Seils provided some great maps of the precinct results from this month’s local elections. I played around with the data, and found the results of a two-way cluster analysis to be interesting. The ballots don’t include party affiliation, but candidates fell into two clusters anyway, and the precincts fit several different profiles in support of those two candidate groups.

Two-way clustering of candidates and precincts

I’ll agree the diagram looks a bit complicated, but if you put in a little work, there’s a few gems to be found. Precincts are listed down the left side, and candidates across the bottom. The square at each precinct-candidate intersection is colored according to the candidate’s relative support at that precinct, red being strong, gray medium and blue weak. That part’s called a cell plot or heat map.

The tree-like parts are dendrograms, which show the results of the hierarchical cluster analyses. Similar items (precincts or candidates) are grouped together in the tree.
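For anyone who wants to reproduce the idea, here’s a minimal two-way clustering sketch in Python with SciPy. The vote shares are made up with a random generator; this isn’t the data or the tool I actually used.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Hypothetical vote-share matrix: rows = precincts, columns = candidates.
rng = np.random.default_rng(0)
shares = rng.random((8, 10))

# Cluster both ways: precincts by their row profiles,
# candidates by their column profiles. Each linkage result is the
# hierarchy a dendrogram would draw.
precinct_tree = linkage(shares, method='ward')
candidate_tree = linkage(shares.T, method='ward')

# Cut the precinct tree into (at most) five groups,
# like the five colored precinct clusters in the diagram.
groups = fcluster(precinct_tree, t=5, criterion='maxclust')
```

The heat map itself is just the `shares` matrix with rows and columns reordered to match the two dendrograms.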

For the candidates, along the bottom, there’s a clear pair of clusters, which I’ll call left-leaning and right-leaning candidates. Coincidentally, the left-leaning are on the left and the right-leaning are on the right.

The precincts are more interesting, though I have even less knowledge of their actual political orientations. I’ve colored the precincts into five groups. The first (red) and to a greater extent the second (yellow) cluster generally voted in favor of the right-leaning candidates. That is, the left six columns of the heat map are bluish and the right four columns are reddish. The opposite is true for the green and purple clusters; they’re more left-leaning, especially the purple precincts.

What puzzles me is the middle (blue) cluster. Those precincts don’t seem to like anyone. The numbers I used for clustering were percent of ballots cast, and apparently there were more voters in those precincts with incomplete ballots, voting in some but not all races. For instance, the two major mayoral candidates, Kleinschmidt and Czajkowski, only received votes on 28% and 21%, respectively, of the ballots at the Kings Mill precinct.

That leads to looking at votes per ballot for each race by precinct. Here’s a bar chart with the precincts ordered by town council votes per ballot.

[Figure: votes per ballot for each race by precinct]

Most precincts had near 100% participation in the mayoral race (exactly 100% for Booker Creek and Coker Hills), and most precincts averaged over three (of four available) votes in the town council race. So only the already-identified cluster of three (plus Dogwood Acres to a lesser degree) stands out regarding participation.

I imagine the One Stop (early voting for all precincts) totals reflect a lot of Carrboro voters. What makes the others different? Were people there to vote for a different race, like the school board? Or just voting for a favorite son/daughter candidate?