Numerals are Visualizations, too

August 9th, 2009

I like looking at annual reports as a good source of data visualizations. Much of the typical report is just feel-good decoration, and the graphs usually fall into that category, with lots of shine but little content. However, what caught my eye in the Public Citizen 2008 annual report [PDF] was a table of numbers (the graphs aren’t too great either).

Misaligned figures

See anything odd about the numbers? The columns are not aligned vertically because of different digit widths; in particular, the “1” digit is very narrow. As a result, the Publications and Subscriptions value seems smaller than the Grants value at first glance, since the latter number is wider.

I thought it was a cardinal rule of font design that all digits were the same width. Unicode even has a “Figure Dash” character, which is a dash with the same width as the digit characters.

I set out to find the font in question. First I sampled what I had on my Mac. I didn’t find the font, but I did find several fonts with digits of unequal width. Most of them were artful fonts like Comic Sans, but Georgia was also in that category.

Next I tried Identifont, a clever idea for identifying a font by asking a series of questions about the characters, such as what kind of bar the “G” has. It returned a few fonts that matched my answers, but none that looked like the report text. The “1” and the “t” are particularly distinctive.

Finally I realized that with the PDF available I could just examine the file in a text editor. Searching for the word “Font” a few times, I kept seeing the word “Knockout” nearby. Checking the characters of the Knockout font family on the foundry site shows a perfect match for the font called “No. 32 Junior Cruiserweight”.
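The text-editor search can be roughly automated. This is only a sketch: it assumes the font dictionaries are stored uncompressed in the file (many PDFs compress them, in which case nothing will match), and the sample bytes and font names below are made up for illustration.

```python
import re

# Matches font-name entries such as "/BaseFont /ABCDEF+SomeFont-Bold".
FONT_NAME = re.compile(rb"/(?:BaseFont|FontName)\s*/([^\s/<>\[\]()]+)")

def font_names(pdf_bytes):
    """Return the distinct font names found in uncompressed PDF dictionaries."""
    names = set()
    for match in FONT_NAME.finditer(pdf_bytes):
        name = match.group(1).decode("latin-1")
        # Embedded subsets carry a six-letter tag and a plus sign; drop it.
        names.add(re.sub(r"^[A-Z]{6}\+", "", name))
    return sorted(names)

# Hypothetical fragment standing in for the contents of a real file.
sample = (b"<< /Type /Font /BaseFont /ABCDEF+ExampleSans-Bold >> "
          b"<< /FontName /ExampleSerif >>")
print(font_names(sample))  # ['ExampleSans-Bold', 'ExampleSerif']
```

For a real report, the bytes would come from reading the PDF in binary mode and passing them to `font_names`.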

So my theory about fonts was wrong, but I still hold that tables of numbers should never contain variable-width digits.

Dual-Scaled Graph Examples

June 12th, 2009

Visualization experts say it’s generally a bad idea to put two different vertical axes on a single graph (see Dual-Scaled Axes in Graphs — Are They Ever the Best Solution? [PDF]) since it invites comparison of data on different scales. However, the treatment is still popular because of the two-in-one information density, and the distortion can be overcome with careful reading.

In the worst case, though, two completely different scales are carefully transformed to almost line up and suggest correlation. An insidious example from a few years ago was the presidential popularity versus price of gas graphs, which one writer believes show that there’s “clearly a correlation.”

It does look like a correlation, but that’s only because the scales have been transformed to follow the same long-term path, which could be done to any two generally linear data series. There are a couple related spikes for the September 11 attacks and the Iraq War start which our eyes quickly pick up on, but otherwise the local ups and downs don’t match too well. These graphs disappeared shortly afterwards when the two trends obviously diverged (gas prices got better but Bush ratings didn’t).
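To see how easily two roughly linear series can be made to overlap, here is a minimal sketch that picks the scale and offset for a second axis by least squares. The function name and both series are invented for illustration; they are not the real approval or gas price data.

```python
def fit_second_axis(left, right):
    """Least-squares a, b so that a * right[i] + b tracks left[i]."""
    n = len(left)
    mean_l = sum(left) / n
    mean_r = sum(right) / n
    cov = sum((r - mean_r) * (l - mean_l) for l, r in zip(left, right))
    var = sum((r - mean_r) ** 2 for r in right)
    a = cov / var
    b = mean_l - a * mean_r
    return a, b

# Made-up illustrative series, not the real data.
approval = [60, 58, 55, 52, 50, 47, 45, 42]            # drifts down
gas_price = [1.5, 1.6, 1.8, 1.9, 2.1, 2.2, 2.4, 2.6]   # drifts up

a, b = fit_second_axis(approval, gas_price)
overlay = [a * g + b for g in gas_price]
worst_gap = max(abs(o - p) for o, p in zip(overlay, approval))
print(f"scale={a:.2f}, offset={b:.2f}, worst gap={worst_gap:.2f}")
```

A negative scale just means the second axis is drawn upside down. Either way, the two lines end up close together on the page, whether or not the underlying quantities have anything to do with each other.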

What I really don’t understand, though, is this next example on employment data from My Budget 360.

leisure-vs-manufacturing1

I’ve never seen a dual-scaled graph where both scales were the same units and approximate range. What’s the point? It shifted the intersection point a little, but not enough to affect the thesis of the article. It does exaggerate the climb of the blue line (leisure employment). I can’t tell if this one is intentional distortion or just carelessness.

Small Town News

April 1st, 2009

It’s not a prank, but I do have a little comedy for April Fool’s Day. “Small Town News” is probably my favorite segment on Letterman these days, and last week I submitted this item to the show from the Chapel Hill News Police Blotter section.
stolenbranches

It’s not A material, but maybe it will make it on the show.

My favorite Small Town News piece involved a newspaper photo of an empty field and a fence. The caption read something like “Charles Smith reported two falcons on his fence Tuesday, but they had flown away before our photographer arrived.”

Brown Tie-Dye

March 10th, 2009

With my west coast cousin and fellow blogger and fiber artist coming this way for a big birthday celebration, I decided to make her a tie-dye T-shirt. Lee’s a quilter and her favorite color is brown, so that set the theme. I didn’t have any brown dye, which made for an extra challenge. I found two strategies on the web: one started with boiling walnut husks, and the other was to mix all the primaries together in some unspecified proportion. I also knew from my color science research for graph colors that brown was dark orange.
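The "brown is dark orange" idea can be checked with a few lines of code. This sketch assumes a hue of 30 degrees for orange and uses the standard-library HSV conversion; the helper names are mine, and screen colors are only a loose guide to how dyes behave.

```python
import colorsys

ORANGE_HUE = 30 / 360  # assumed hue for orange, as a fraction of the color wheel

def orange_at(value):
    """RGB triple (0..1) for fully saturated orange at the given brightness."""
    return colorsys.hsv_to_rgb(ORANGE_HUE, 1.0, value)

def to_hex(rgb):
    """Format an RGB triple as a web-style hex color."""
    return "#" + "".join(f"{round(c * 255):02x}" for c in rgb)

# Same hue and saturation, progressively darker: orange fades into brown.
for value in (1.0, 0.7, 0.4):
    print(value, to_hex(orange_at(value)))
```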

After trying a few test patches, I found two combinations that worked well and used them both. One was equal parts orange and black. The other was one part cyan, two parts magenta, and three parts yellow.

My first effort was to simulate a quilt with rectangular patches. This shirt uses both browns.

quilt tie-dye

Next I tried to capture the eight-pointed star pattern I’ve seen in quilts. The eight wedges didn’t quite fill out into touching diamonds as planned, but it still made a nice flower. The blue is really a mix of a dark blue and a light blue, which is what produced the glow effect.

flower tie-dye

I still had some dye left over, so I made a couple more shirts for myself. The first takes advantage of the way the cyan and yellow bleed out of the CMY brown to produce a green halo. This shirt also employs six-fold symmetry, which was a little tricky.

six fold tie-dye

Finally, I went with a basic horizontal stripe pattern using the orange brown.

stripe tie-dye

Carmine’s of Chapel Hill

March 1st, 2009

It looks like my (mild) promotion of the term gluten freendly hasn’t caught on, but I have discovered another gluten freendly local restaurant. Carmine’s of Chapel Hill is a new Italian restaurant in the space that used to be Sal’s Pizza. I haven’t eaten in an Italian place since my Celiac diagnosis, but Carmine’s carries gluten-free pasta and even a gluten-free beer.

Both times I’ve been there, the chef-owner came out to assure me the gluten-free noodles would be cooked in fresh water and that my entire entree was gluten-free. I had the Veal Marsala, which was delicious, with a wine sauce made with real cream (no flour thickener!).

How big is Nymphenburg Park?

January 18th, 2009

Nymphenburg Palace

Nymphenburg Palace in Munich was the featured picture at Wikipedia recently, but what caught my eye was the area values given for the palace park in the caption: “200-hectare (494-acre) park.” That looks like a case where a unit conversion introduces false precision. A round number like 200 suggests low precision, so 200 hectares might mean anything from 150 to 250 hectares, but 494 suggests higher precision, between 493.5 and 494.5 acres in this case.

It looks like someone took a round number (200) and multiplied it by the hectare-to-acre conversion factor (2.47) to get a precise number (494). It would be better to go back to the original precise number of hectares, convert that to acres, and then round to the desired level of detail.
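A small sketch of the difference between converting blindly and converting with the input's precision in mind. The `round_sig` helper is my own illustration, and treating 200 as having one significant digit is an assumption.

```python
from math import floor, log10

ACRES_PER_HECTARE = 2.47105  # 1 ha = 10,000 m^2; 1 acre is about 4,046.86 m^2

def round_sig(x, digits):
    """Round x to the given number of significant digits."""
    if x == 0:
        return 0.0
    return round(x, digits - 1 - floor(log10(abs(x))))

hectares = 200  # round figure from the caption
acres = hectares * ACRES_PER_HECTARE

print(round(acres))         # 494: three digits from a one-digit input
print(round_sig(acres, 1))  # 500.0: same precision as the input
```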

Trying to find the actual size of the park was more difficult than I expected. After finding a few places that listed the size as 200 acres, it was evident a different kind of error was also occurring, but it wasn’t clear whether hectares or acres was correct. Google hit count comparisons didn’t help. Searching for Nymphenburg Palace park “200 acre” gave 125 hits while Nymphenburg Palace park “200 hectare” gave 101 hits.

Just to be sure, I found the park on Google Maps and measured it myself with an online planimeter tool. The area of my rough polygon was 225 hectares, so that settles the 200 acre versus 200 hectare issue for me.
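I don't know how the planimeter tool computes its answer, but the usual way to get the area of a polygon from its corners is the shoelace formula. This sketch assumes the corners are already in flat coordinates measured in meters (latitude and longitude would need to be projected first), and the square outline is a stand-in chosen to give a similar area, not the real park boundary.

```python
def polygon_area_m2(vertices):
    """Shoelace formula: area of a simple polygon from (x, y) vertices in meters."""
    total = 0.0
    # Pair each vertex with the next one, wrapping around to the first.
    for (x1, y1), (x2, y2) in zip(vertices, vertices[1:] + vertices[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2

# Hypothetical 1500 m x 1500 m outline, not the real park boundary.
outline = [(0, 0), (1500, 0), (1500, 1500), (0, 1500)]
hectares = polygon_area_m2(outline) / 10_000  # 1 ha = 10,000 m^2
print(hectares)  # 225.0
```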

Planimeter tool on Nymphenburg Park

Eventually, I found the German language Wikipedia page for the Nymphenburg park, and it provided two areas, 180 and 229 hectares, with apparent authority. Translation:

The park inside the garden wall has a size of 180 hectares, the area of the entire facility is 229 hectares.

200 hectares could represent either 180 or 229. Exactly 180 hectares is 445 acres and exactly 229 hectares is 566 acres, so you need to know where the 200 came from in order to know how to represent it in acres. It could be correct to say 400, 500, or 600 acres.
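The three possible answers fall out of a few lines of arithmetic. This is a sketch with a helper name of my own choosing; it converts each candidate hectare figure and rounds to the nearest acre and to the nearest hundred acres.

```python
ACRES_PER_HECTARE = 2.47105

def to_acres(hectares, nearest=1):
    """Convert hectares to acres, rounded to the nearest multiple of `nearest`."""
    return round(hectares * ACRES_PER_HECTARE / nearest) * nearest

for ha in (180, 200, 229):
    print(ha, to_acres(ha), to_acres(ha, nearest=100))
# 180 445 400
# 200 494 500
# 229 566 600
```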

Burtin Antibiotic Illustrations

January 11th, 2009

CHANCE magazine is running a contest to create the best illustration for a data set of the effectiveness of three antibiotics on sixteen strains of bacteria. Designer Will Burtin used this data set for a 1950s visualization.

With only five variables and sixteen observations, my first question is, “What’s wrong with just using a table?” The table in the contest description is even nicely laid out.
burtin-data

My second question is, “Best for whom?” Which illustration is best depends on the audience, which in this case might be doctors, researchers, statisticians or the general public among others.

The data shows Minimum Inhibitory Concentration (MIC, presumably in µg/ml) for each antibiotic and bacteria combination. Lower is better, indicating less antibiotic is needed to treat the bacteria. The MIC values vary widely from 0.001 to 1000, and I applied a logarithm transform for analysis, either on the data or on the graph. Besides nicely spreading out the data values, the log transformation may have a physical interpretation. If an antibiotic culture grows exponentially, then the log of the concentration is the time to grow it.
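To show what the log transform does to that range, here is a tiny sketch. The two endpoint values come from the range quoted above; the two in-between values are chosen only for illustration and are not taken from the data set.

```python
from math import log10

# Endpoints of the quoted MIC range, plus two illustrative in-between values.
mic_values = [0.001, 0.1, 10, 1000]
log_mics = [log10(m) for m in mic_values]

for mic, log_mic in zip(mic_values, log_mics):
    print(f"{mic:>8g} -> log10 = {log_mic:+.0f}")
```

On the raw scale the first three values would be squashed against zero next to 1000; on the log scale the four values are evenly spaced from -3 to +3.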

Exploring the data a little bit, the simplest visualization is a heat map, where every number is represented by a swatch of color. I don’t see much advantage over the table of numbers, except to quickly find extreme values or certain other patterns that the colors help with.

burtin-heat

Next, we might think from a researcher/statistician perspective and try to cluster the bacteria that react similarly to the antibiotics. Here’s a heat map and dendrogram resulting from a cluster analysis. The rows are colored by gram staining. It’s like the heat map above, but similar bacteria are grouped together (and the color scale is slightly different). The bacteria that are clustered close together might suggest a commonality for future research.

burtin-cluster

Since there are only three antibiotics, we can view the data as a 3D scatter plot. Here, the data markers correspond to the clusters.

burtin-3d-1

3D doesn’t work too well in static 2D media like this one since you need to be able to rotate it to see the structure. If you do rotate it, you can see that three of the clusters appear roughly in a straight line, so maybe there are really two different kinds instead of four. Here’s a view looking straight down the line.

burtin-3d-2

A scatter plot matrix shows all the 2D relationships and works better for static presentation. It can’t show the 3D alignment of the three clusters, but you can get a hint of it in the neomycin versus penicillin panel.

burtin-scm

For my contest entry, I decided to go with the perspective of a 1950s doctor, with the idea that a doctor treating a patient doesn’t usually know what bacteria is causing the infection and may or may not have the results of a gram staining. With that in mind, my visualization shows the MIC for each antibiotic with the best dose for each scenario called out.

burtin-gb

The graph shows that penicillin is best for gram positive bacteria since all purple circles are below 1 µg/ml for penicillin. Similarly, neomycin is best for gram negative bacteria and streptomycin is best if gram staining is unknown. A drawback of this graph is that the points are not labeled or connected. I tried a few ways to do that with labels and lines, but the graph just became too messy. If you need that much detail, you probably need the table of numbers.

After doing all that, I found Burtin’s original visualization via a NY Times article.

I hope this isn’t what CHANCE is looking for. It has little communication value except to say “Look how cool I am!” At least all the data is present, so a meticulous reader can get what information he needs. The audience for this must be a hospital administrator who needs to feel like he’s getting his money’s worth with fancy visualizations. I think it is more a work of art than of communication.

Jon Peltier has a write-up of his contest entry. It has all the data of Burtin’s original in a much better rectangular structure.