CHANCE magazine is running a contest to create the best illustration for a data set of the effectiveness of three antibiotics on sixteen strains of bacteria. Designer Will Burtin used this data set for a 1950s visualization.
With only five variables and sixteen observations, my first question is, “What’s wrong with just using a table?” The table in the contest description is even nicely laid out.
My second question is, “Best for whom?” Which illustration is best depends on the audience, which in this case might be doctors, researchers, statisticians or the general public among others.
The data shows the Minimum Inhibitory Concentration (MIC, presumably in µg/ml) for each combination of antibiotic and bacterium. Lower is better, indicating that less antibiotic is needed to inhibit the bacteria. The MIC values vary widely, from 0.001 to about 1000, so I applied a logarithmic transform for analysis, either to the data or to the graph axes. Besides nicely spreading out the data values, the log transformation may have a physical interpretation. If an antibiotic culture grows exponentially, then the log of the concentration is proportional to the time needed to grow it.
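As a minimal sketch of what the log transform does to values spread over this kind of range, here is a small Python example. The numbers in `mic_values` are illustrative placeholders that merely span several orders of magnitude; they are not rows from the contest table, and the helper name `log10_mic` is my own.

```python
import math

# Illustrative MIC values (µg/ml) spanning several orders of magnitude.
# These are placeholders, NOT rows from the contest table.
mic_values = [0.001, 0.03, 1.0, 10.0, 850.0]

def log10_mic(values):
    """Return the base-10 logarithm of each MIC value."""
    return [math.log10(v) for v in values]

logged = log10_mic(mic_values)

# Print the raw and transformed values side by side.
for raw, lg in zip(mic_values, logged):
    print(f"{raw:>8} µg/ml -> log10 = {lg:6.2f}")
```

On the raw scale the largest value dwarfs everything else, while on the log scale the values sit within a span of about six units, which is what makes them comparable on one axis.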
Exploring the data a little bit, the simplest visualization is a heat map, where every number is represented by a swatch of color. I don’t see much advantage over the table of numbers, except to quickly find extreme values or certain other patterns that the colors help with.
Next, we might think from a researcher/statistician perspective and try to cluster the bacteria that react similarly to the antibiotics. Here’s a heat map and dendrogram resulting from a cluster analysis. The rows are colored by gram staining. It’s like the heat map above, but similar bacteria are grouped together (and the color scale is slightly different). The bacteria that are clustered close together might suggest a commonality for future research.
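For readers who want to see the idea behind the dendrogram, here is a rough sketch of agglomerative clustering in plain Python. It assumes single linkage and Euclidean distance on log10(MIC) values, which may not match the settings behind the figure, and the strain names and profiles are invented for illustration only.

```python
import math
from itertools import combinations

# Hypothetical log10(MIC) profiles: (penicillin, streptomycin, neomycin).
# Invented for illustration; not taken from the contest data.
profiles = {
    "strain_a": (-3.0, -2.0, -2.0),
    "strain_b": (-2.5, -1.5, -2.0),
    "strain_c": (2.9, 0.0, 0.2),
    "strain_d": (2.5, 0.3, -0.5),
    "strain_e": (0.0, 1.0, 1.0),
}

def cluster_distance(c1, c2):
    """Single linkage: the smallest distance between any two members."""
    return min(math.dist(profiles[a], profiles[b]) for a in c1 for b in c2)

def agglomerate(names, n_clusters):
    """Repeatedly merge the two closest clusters until n_clusters remain."""
    clusters = [frozenset([n]) for n in names]
    merges = []  # record of each merge, in order, like reading a dendrogram bottom-up
    while len(clusters) > n_clusters:
        c1, c2 = min(combinations(clusters, 2),
                     key=lambda pair: cluster_distance(*pair))
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 | c2]
        merges.append((sorted(c1), sorted(c2)))
    return clusters, merges

clusters, merges = agglomerate(sorted(profiles), 3)
for group in clusters:
    print(sorted(group))
```

The order of the recorded merges is the information a dendrogram draws: strains that merge early have similar profiles across all three antibiotics.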
Since there are only three antibiotics, we can view the data as a 3D scatter plot. Here, the data markers correspond to the clusters.
3D doesn’t work too well in static 2D media like this one since you need to be able to rotate it to see the structure. If you do rotate it, you can see that three of the clusters appear roughly in a straight line, so maybe there are really two different kinds instead of four. Here’s a view looking straight down the line.
A scatter plot matrix shows all the 2D relationships better and is better suited to static presentation. It can’t show the 3D alignment of the three clusters, but you can get a hint of it in the neomycin versus penicillin panel.
For my contest entry, I decided to go with the perspective of a 1950s doctor, with the idea that a doctor treating a patient doesn’t usually know which bacterium is causing the infection and may or may not have the results of a gram staining. With that in mind, my visualization shows the MIC for each antibiotic with the best dose for each scenario called out.
The graph shows that penicillin is best for gram-positive bacteria, since all the purple circles are below 1 µg/ml for penicillin. Similarly, neomycin is best for gram-negative bacteria, and streptomycin is best if the gram staining is unknown. A drawback of this graph is that the points are not labeled or connected. I tried a few ways to do that with labels and lines, but the graph just became too messy. If you need that much detail, you probably need the table of numbers.
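One way to make "best for each scenario" concrete is to pick the antibiotic whose worst-case (largest) MIC across the relevant bacteria is smallest. That is only one plausible reading of the graph, not necessarily the rule behind it, and the rows below are made-up values chosen purely to exercise the function; `best_by_worst_case` is a hypothetical helper.

```python
# Hypothetical rows: (gram stain, {antibiotic: MIC in µg/ml}).
# The values are invented for illustration, not taken from the contest table.
rows = [
    ("positive", {"penicillin": 0.005, "streptomycin": 10.0, "neomycin": 8.0}),
    ("positive", {"penicillin": 0.5, "streptomycin": 0.1, "neomycin": 12.0}),
    ("negative", {"penicillin": 800.0, "streptomycin": 2.0, "neomycin": 0.1}),
    ("negative", {"penicillin": 100.0, "streptomycin": 5.0, "neomycin": 1.5}),
]

def best_by_worst_case(rows, stain=None):
    """Return (antibiotic, worst MIC) minimizing the largest MIC in the group.

    stain is "positive", "negative", or None when the gram stain is unknown,
    in which case every row is considered.
    """
    selected = [mics for gram, mics in rows if stain is None or gram == stain]
    antibiotics = selected[0].keys()
    # Worst case per antibiotic: the highest MIC any selected bacterium needs.
    worst = {a: max(m[a] for m in selected) for a in antibiotics}
    best = min(worst, key=worst.get)
    return best, worst[best]

for scenario in ("positive", "negative", None):
    label = scenario if scenario is not None else "unknown"
    print(label, best_by_worst_case(rows, scenario))
```

A worst-case rule suits the doctor's situation because the dose has to work whichever bacterium turns out to be responsible; an average would hide the one strain that needs far more.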
After doing all that, I found Burtin’s original visualization via a NY Times article.
I hope this isn’t what CHANCE is looking for. It has little communication value except to say “Look how cool I am!” At least all the data is present, so a meticulous reader can get what information he needs. The audience for this must be a hospital administrator who needs to feel like he’s getting his money’s worth with fancy visualizations. I think it is more a work of art than of communication.
Jon Peltier has a write-up of his contest entry. It has all the data of Burtin’s original in a much better rectangular structure.