Archive for January, 2009

How big is Nymphenburg Park?

Sunday, January 18th, 2009

Nymphenburg Palace

Nymphenburg Palace in Munich was the featured picture at Wikipedia recently, but what caught my eye was the area values given for the palace park in the caption: “200-hectare (494-acre) park.” That looks like a case where a unit conversion introduces false precision. A round number like 200 suggests low precision, so 200 hectares might mean anything from 150 to 250 hectares, but 494 suggests higher precision, between 493.5 and 494.5 acres in this case.

It looks like someone took a round number (200) and multiplied it by the hectare-to-acre conversion factor (about 2.47 acres per hectare) to get a precise-looking number (494). It would be better to go back to the original, precise number of hectares, convert that to acres, and then round to the desired level of detail.
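A minimal sketch of the difference, assuming "200" carries only one significant figure (roughly 150 to 250 hectares). The helper name `round_sig` is hypothetical, introduced here for illustration:

```python
import math

ACRES_PER_HECTARE = 2.47105  # 1 ha = 10,000 m², 1 acre ≈ 4,046.86 m²


def round_sig(x: float, sig: int) -> float:
    """Round x to `sig` significant figures."""
    if x == 0:
        return 0.0
    digits = sig - int(math.floor(math.log10(abs(x)))) - 1
    return round(x, digits)


hectares = 200  # assumed to carry one significant figure
acres = hectares * ACRES_PER_HECTARE

naive = round(acres)          # implies precision to the nearest acre
honest = round_sig(acres, 1)  # keeps the precision of the original "200"
print(naive, honest)
```

The naive conversion prints 494, while rounding to the same one significant figure as the input gives 500, which is all that "200 hectares" can honestly support.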

Finding the actual size of the park was more difficult than I expected. After I found a few places that listed the size as 200 acres, it was evident that a different kind of error was also occurring, but it wasn't clear whether hectares or acres was correct. Google hit count comparisons didn't help: searching for Nymphenburg Palace park "200 acre" gave 125 hits, while Nymphenburg Palace park "200 hectare" gave 101 hits.

Just to be sure, I found the park on Google Maps and measured it myself with an online planimeter tool. The area of my rough polygon was 225 hectares, while 200 acres would be only about 81 hectares, so that settles the 200 acre versus 200 hectare issue for me.

Planimeter tool on Nymphenburg Park

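How the online planimeter computes its result isn't stated. As a hypothetical sketch of the general idea, a small polygon drawn on a map can be measured by projecting its latitude/longitude vertices onto a flat local plane and applying the shoelace formula. The function name, the spherical Earth radius, and the example square are assumptions for illustration, not the polygon measured above:

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius; assumes a spherical Earth


def polygon_area_hectares(latlon):
    """Approximate area of a small polygon given as (lat, lon) pairs in degrees.

    Vertices are projected onto a local plane (equirectangular, scaled by the
    cosine of the mean latitude), then the shoelace formula gives the area.
    Reasonable for a park-sized polygon, not for regions spanning many degrees.
    """
    lat0 = math.radians(sum(lat for lat, _ in latlon) / len(latlon))
    pts = [
        (EARTH_RADIUS_M * math.radians(lon) * math.cos(lat0),
         EARTH_RADIUS_M * math.radians(lat))
        for lat, lon in latlon
    ]
    twice_area = 0.0
    for (x1, y1), (x2, y2) in zip(pts, pts[1:] + pts[:1]):
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2 / 10_000  # m² to hectares


# Hypothetical 0.01° by 0.01° square near Munich's latitude.
square = [(48.00, 11.00), (48.00, 11.01), (48.01, 11.01), (48.01, 11.00)]
print(f"{polygon_area_hectares(square):.1f} ha")
```

Taking the absolute value makes the result independent of whether the vertices were clicked clockwise or counterclockwise.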

Eventually, I found the German-language Wikipedia page for Nymphenburg Park, and it provided two areas, 180 and 229 hectares, with apparent authority. Translation:

The park inside the garden wall has a size of 180 hectares, the area of the entire facility is 229 hectares.

200 hectares could represent either 180 or 229. Exactly 180 hectares is 445 acres and 229 hectares is 566 acres, so you need to know where the 200 came from in order to know how to represent it in acres. It could be correct to say 400, 500, or 600 acres.
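A quick check of those conversions, assuming the German figures are precise to the hectare (the helper name `to_acres` is hypothetical):

```python
ACRES_PER_HECTARE = 2.47105


def to_acres(hectares: float) -> int:
    """Convert hectares to acres, rounded to the nearest acre."""
    return round(hectares * ACRES_PER_HECTARE)


# The two figures from the German Wikipedia article.
for label, ha in [("inside the garden wall", 180), ("entire facility", 229)]:
    acres = to_acres(ha)
    print(f"{label}: {ha} ha = {acres} acres "
          f"(about {round(acres, -2)} to one significant figure)")
```

Rounded to one significant figure, the two readings land on 400 and 600 acres, with the 494 from the caption falling in between at 500.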

Burtin Antibiotic Illustrations

Sunday, January 11th, 2009

CHANCE magazine is running a contest to create the best illustration for a data set of the effectiveness of three antibiotics on sixteen strains of bacteria. Designer Will Burtin used this data set for a 1950s visualization.

With only five variables and sixteen observations, my first question is, “What’s wrong with just using a table?” The table in the contest description is even nicely laid out.
Burtin data table

My second question is, “Best for whom?” Which illustration is best depends on the audience, which in this case might be doctors, researchers, statisticians or the general public among others.

The data shows the Minimum Inhibitory Concentration (MIC, presumably in µg/ml) for each combination of antibiotic and bacterium. Lower is better, indicating that less antibiotic is needed to treat the bacteria. The MIC values vary widely, from 0.001 to 1000, so I applied a logarithm transform for analysis, either on the data or on the graph's axes. Besides nicely spreading out the data values, the log transform may have a physical interpretation: if a bacterial culture grows exponentially, then the log of its concentration is proportional to the time it takes to grow.
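One way to make the growth remark concrete, assuming a culture grows at a constant rate r from an initial concentration N_0 (both symbols introduced here for illustration, not taken from the contest data):

```latex
N(t) = N_0\, e^{r t}
\quad\Longrightarrow\quad
t = \frac{\ln N(t) - \ln N_0}{r}
```

Elapsed time is a linear function of ln N, so equal steps on a log axis correspond to equal growth times, whatever base the logarithm uses.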

When exploring the data, the simplest visualization is a heat map, where every number is represented by a swatch of color. I don't see much advantage over the table of numbers, except for quickly finding extreme values or other patterns that the colors make visible.

Heat map of the Burtin data
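The core of a heat map is mapping each number to a swatch on a log scale. A minimal text-only sketch, assuming a log10 scale over the 0.001 to 1000 range mentioned above, with characters standing in for colors. The MIC values are hypothetical, not Burtin's data:

```python
import math

# Hypothetical MIC values in µg/ml; illustrative only, not Burtin's data.
DRUGS = ("penicillin", "streptomycin", "neomycin")
MIC = {
    "strain A": (0.005, 2.0, 20.0),
    "strain B": (850.0, 1.0, 0.5),
}
SHADES = " .:-=+*#%@"  # light = low MIC (effective), dark = high MIC


def shade(value: float, lo: float = 1e-3, hi: float = 1e3) -> str:
    """Map an MIC to a character by its position between lo and hi on a log10 scale."""
    frac = (math.log10(value) - math.log10(lo)) / (math.log10(hi) - math.log10(lo))
    frac = min(max(frac, 0.0), 1.0)  # clamp values outside [lo, hi]
    return SHADES[min(int(frac * len(SHADES)), len(SHADES) - 1)]


print(" " * 11 + " ".join(d[:3] for d in DRUGS))
for name, values in MIC.items():
    print(f"{name:10s} " + " ".join(shade(v) * 3 for v in values))
```

A plotting library would replace the character ramp with a color map, but the log scaling step is the same.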

Next, we might think from a researcher/statistician perspective and try to cluster the bacteria that react similarly to the antibiotics. Here’s a heat map and dendrogram resulting from a cluster analysis. The rows are colored by gram staining. It’s like the heat map above, but similar bacteria are grouped together (and the color scale is slightly different). The bacteria that are clustered close together might suggest a commonality for future research.

Clustered heat map with dendrogram
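The clustering method behind the dendrogram isn't stated. As one plausible choice, this sketch assumes average-linkage agglomerative clustering on Euclidean distances between log10 MIC profiles; the four strains and their values are hypothetical:

```python
import math
from itertools import combinations

# Hypothetical MICs (µg/ml) for penicillin, streptomycin, neomycin; NOT Burtin's data.
RAW = {
    "strain A": (0.001, 0.01, 0.1),
    "strain B": (0.002, 0.02, 0.1),
    "strain C": (500.0, 1.0, 0.5),
    "strain D": (800.0, 2.0, 1.0),
}
# Cluster on log10(MIC) so a tenfold difference counts the same everywhere.
PROFILES = {name: [math.log10(m) for m in mics] for name, mics in RAW.items()}


def average_linkage(profiles):
    """Naive agglomerative clustering (average linkage, Euclidean distance).

    Returns the merges in order as (cluster, cluster, distance) tuples;
    this merge order is what a dendrogram draws.
    """
    clusters = [frozenset([name]) for name in profiles]

    def linkage(c1, c2):
        # Mean distance over all cross-cluster pairs of strains.
        total = sum(math.dist(profiles[a], profiles[b]) for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    merges = []
    while len(clusters) > 1:
        c1, c2 = min(combinations(clusters, 2), key=lambda pair: linkage(*pair))
        merges.append((c1, c2, linkage(c1, c2)))
        clusters.remove(c1)
        clusters.remove(c2)
        clusters.append(c1 | c2)
    return merges


for c1, c2, d in average_linkage(PROFILES):
    print(sorted(c1), "+", sorted(c2), f"at distance {d:.2f}")
```

This brute-force version is fine for sixteen strains; a library implementation would be the usual choice for larger data.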

Since there are only three antibiotics, we can view the data as a 3D scatter plot. Here, the data markers correspond to the clusters.

3D scatter plot of the Burtin data

3D doesn’t work too well in static 2D media like this one since you need to be able to rotate it to see the structure. If you do rotate it, you can see that three of the clusters appear roughly in a straight line, so maybe there are really two different kinds instead of four. Here’s a view looking straight down the line.

3D scatter plot viewed along the line of three clusters
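A view "straight down the line" can be computed rather than found by rotating. This hypothetical sketch assumes the line is approximated by the principal axis of the 3D points (for example, log10 MICs of the strains in the three aligned clusters), found by power iteration on the covariance matrix, and then projects each point onto the plane perpendicular to it:

```python
import math


def sub(a, b): return [x - y for x, y in zip(a, b)]
def dot(a, b): return sum(x * y for x, y in zip(a, b))
def scale(a, s): return [x * s for x in a]


def normalize(a):
    n = math.sqrt(dot(a, a))
    return [x / n for x in a]


def principal_axis(points, iterations=100):
    """Mean and dominant direction of a 3D point cloud (power iteration on the covariance)."""
    n = len(points)
    mean = [sum(p[i] for p in points) / n for i in range(3)]
    centered = [sub(p, mean) for p in points]
    cov = [[sum(c[i] * c[j] for c in centered) / n for j in range(3)] for i in range(3)]
    v = [1.0, 1.0, 1.0]  # assumed not orthogonal to the dominant direction
    for _ in range(iterations):
        v = normalize([dot(row, v) for row in cov])
    return mean, v


def view_down_axis(points, mean, axis):
    """2D coordinates of each point as seen looking along `axis`."""
    # Any vector not parallel to the axis, made perpendicular by Gram-Schmidt.
    helper = [1.0, 0.0, 0.0] if abs(axis[0]) < 0.9 else [0.0, 1.0, 0.0]
    u = normalize(sub(helper, scale(axis, dot(helper, axis))))
    w = [axis[1] * u[2] - axis[2] * u[1],  # cross product axis x u completes the basis
         axis[2] * u[0] - axis[0] * u[2],
         axis[0] * u[1] - axis[1] * u[0]]
    return [(dot(sub(p, mean), u), dot(sub(p, mean), w)) for p in points]
```

Points lying exactly on the line collapse to the origin in this view, so any spread that remains is structure perpendicular to the line.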

A scatter plot matrix shows all the 2D relationships and works better for static presentation. It can't show the 3D alignment of the three clusters, but you can get a hint of it in the neomycin versus penicillin panel.

Scatter plot matrix of the Burtin data

For my contest entry, I decided to take the perspective of a 1950s doctor, with the idea that a doctor treating a patient usually doesn't know which bacterium is causing the infection and may or may not have the results of a gram stain. With that in mind, my visualization shows the MIC for each antibiotic, with the best dose for each scenario called out.

MIC for each antibiotic by gram stain

The graph shows that penicillin is best for gram positive bacteria, since all purple circles are below 1 µg/ml for penicillin. Similarly, neomycin is best for gram negative bacteria, and streptomycin is best if the gram stain is unknown. A drawback of this graph is that the points are not labeled or connected. I tried a few ways to add labels and lines, but the graph just became too messy. If you need that much detail, you probably need the table of numbers.
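One rule consistent with that reading, assumed here rather than taken from the entry itself: for each scenario, pick the antibiotic whose worst-case MIC across the possible bacteria is lowest, since that dose inhibits every candidate. The data is hypothetical, constructed only to show the rule:

```python
# Hypothetical MICs (µg/ml) and gram stains; NOT Burtin's data.
DRUGS = ("penicillin", "streptomycin", "neomycin")
DATA = [
    # (name, gram, penicillin, streptomycin, neomycin)
    ("strain A", "+", 0.005, 2.0, 20.0),
    ("strain B", "+", 0.1, 1.0, 5.0),
    ("strain C", "-", 100.0, 2.0, 0.1),
    ("strain D", "-", 850.0, 1.0, 0.5),
]


def best_drug(rows):
    """Drug with the lowest worst-case MIC over rows, plus each drug's worst case."""
    worst = {drug: max(row[2 + i] for row in rows) for i, drug in enumerate(DRUGS)}
    return min(worst, key=worst.get), worst


for label, stain in [("gram positive", "+"), ("gram negative", "-"), ("unknown", None)]:
    rows = [r for r in DATA if stain is None or r[1] == stain]
    drug, worst = best_drug(rows)
    print(f"{label}: {drug}, a dose of {worst[drug]} µg/ml inhibits every strain")
```

The unknown-stain case pools all strains, so the winner can differ from both gram-specific winners, as it does here.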

After doing all that, I found Burtin’s original visualization via a NY Times article.

I hope this isn’t what CHANCE is looking for. It has little communication value except to say “Look how cool I am!” At least all the data is present, so a meticulous reader can get what information he needs. The audience for this must be a hospital administrator who needs to feel like he’s getting his money’s worth with fancy visualizations. I think it is more a work of art than of communication.

Jon Peltier has a write-up of his contest entry. It has all the data of Burtin’s original in a much better rectangular structure.