Archive for the ‘Graphs’ Category

Automobile Maker Market Share Chart

Monday, November 13th, 2006

A week or so ago, Junk Charts featured a discussion (Rip Tide) of a New York Times chart of how auto-maker market share distribution in the US is becoming more like it is in Europe. The original chart showed lots of information in a pleasant way, but as usual folks want to do better — either to look better or to make the point better.

I scraped the data (csv) from the chart (thanks GraphClick), and provided a rough alternative to the graph.

auto market share distributions

My graph aims to show only enough information to support the text of the original chart. I chart ordered market share histograms for three different years so one can get a sense of how the US and European market share distributions are changing. I’m not sure how well the data supports the thesis, though: it looks like both distributions are becoming more like the other, rather than just the US becoming more like Europe.

I just found out today that Junk Charts actually made use of the data I posted and provided yet another alternate view (Calming the Rip Tide). Interesting, but I don’t think the boxplots work since they don’t show a trend.

From Hayseed to Ubergeek

Tuesday, September 12th, 2006

What a journey! From being labeled a “hay seed” [sic] by an anonymous blog commenter to being recognized as an “ubergeek” in print by the Raleigh News & Observer. The G.D. Gearino column in today’s paper traces his steps to track down my gender-neutral first name analysis that a fellow reporter somehow got whiff of at a bar or party.

I don’t know who leaked the exercise, but now it’s out there. I used the Wake County registered voter database to analyze gender distributions of various first names to see which one was the most gender neutral. Of course, there are lots of ways to measure neutral, but I used the statistical definition of independence, looking for the name whose female/male ratio was most similar to that of the population (53% female) with the smallest confidence interval. Casey was the top name followed by Carey.
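For the curious, here is a minimal sketch of how such a ranking could be computed. It is not my original analysis code. It assumes a Wilson score interval for each name's female share and scores a name by the worst-case distance between that interval and the 53% population share, which is only one way to combine "close to the population ratio" with "small confidence interval". The function names and the sample counts are made up for illustration.

```python
import math

POPULATION_FEMALE_SHARE = 0.53  # from the post: 53% of registered voters are female


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (about 95% for z=1.96)."""
    if n <= 0:
        raise ValueError("need at least one observation")
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half


def neutrality_score(female, male, target=POPULATION_FEMALE_SHARE):
    """Worst-case distance from the interval to the target share.

    Lower is more neutral. This scoring rule is an assumption of the sketch:
    it penalizes both a share far from the target and a wide interval.
    """
    lo, hi = wilson_interval(female, female + male)
    return max(abs(lo - target), abs(hi - target))


def rank_names(counts, target=POPULATION_FEMALE_SHARE):
    """Sort names from most to least gender neutral.

    `counts` maps a name to a (female_count, male_count) pair.
    """
    return sorted(
        counts,
        key=lambda name: neutrality_score(counts[name][0], counts[name][1], target),
    )


# Hypothetical counts, NOT taken from the Wake County voter database.
sample_counts = {
    "NameA": (530, 470),  # on target, many voters
    "NameB": (53, 47),    # on target, few voters, so a wider interval
    "NameC": (900, 100),  # common but heavily female
}
print(rank_names(sample_counts))
```

A rare name with a perfect 53/47 split loses to a common name with the same split because its interval is wider, which is the behavior I wanted from the "smallest confidence interval" criterion.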

I explored the time component, but didn’t factor it into my analysis. Just as names go in and out of favor they also change genders over time. For instance, Morgan was more male, but these days it’s more female. That is, an older voter named Morgan is likely to be male, and a young voter named Morgan is likely to be female. Most neutralish names move toward female. The only names that I remember going from female to male over time were Frankie and Robbie.

Orange County doesn’t seem to have voter names on-line — just summary statistics by precinct.

Data Visualization Winner

Thursday, August 10th, 2006

Data Visualization Winner Badge

My weekends and evenings spent staring at pixels paid off with a Comprehensive Winner designation in Business Intelligence Network’s Data Visualization Competition. My entry included visualizations for all five scenarios, and I won the checking account scenario and tied for first in the freestyle scenario, in which I revised the old OWASA water graph. The analysis of the winners and other entries will be released later, and I look forward to reading it, unless one of my entries for the other scenarios is highlighted as an example of what not to do.

The checking account scenario was so simple I almost didn’t enter it. It involved a checking account statement with only 7 or 8 transactions for a given month. I thought it would have been a better challenge to visualize a statement with dozens of transactions, some occurring on the same day. I did the simple visualization in a way that was scalable to the more complex case, which may have helped my entry.

I found problems with all of my entries soon after submitting them, but I thought the budget summary scenario was my best entry (PDF). Below is the updated OWASA graph (my JMP version, original OWASA version). Getting my contour colors from Color Brewer probably helped.

Visual Pain Scale

Sunday, July 23rd, 2006

Pain Scale


I never like it when doctors ask you to rate your pain level on a scale of 1 to 10, and I really don’t see how it helps to visualize the pain scale. Even stranger, this scale is from a poster at the vet with a graphic of a dog skeleton.

Is the dog supposed to point at the bone that hurts and at the appropriate tick on the pain scale?

In case you can’t read the text, this scale goes from 0 = Pain Free to 10 = Worst Possible Pain.

Data Visualization 2006 Competition Entry

Sunday, July 16th, 2006

I’ve posted my entries for the 2006 Data Visualization competition sponsored by Business Intelligence Network and Stephen Few. I used JMP for most of the data manipulation and prototyping and then traced and edited the graphs with OmniGraffle, a nice Mac OS X graphics application. My process ended up being a lot of work so I didn’t get things as polished as I would like, but I think the experience will at least help me learn more when I see the winning entries.

Excerpt from Budget Visualization

There were four scenarios, each with a problem description and the raw data to use to create the visualization. Above is part of my budget visualization solution. A fifth scenario was to do anything you like, and I updated the OWASA reservoir status graphs I did a few months ago.

3-D Pie Chart at NCSSM

Tuesday, April 25th, 2006

Information visualization specialists like Stephen Few and Howard Wainer often call out the 3-D Pie Chart as a graphical pariah. The curved areas of pie charts are already difficult to compare, and they become worse when a 3-D perspective is added. Though authors like to rail on it, I suspected the form didn’t really occur in serious or even semi-serious data presentations. However, below is such a graph I found in the wild recently.

NCSSM Alumni Giving

It’s from a North Carolina School of Science and Mathematics summary of alumni giving, and the graph breaks it down by class year. There seems to be only one piece of information to be gleaned from the graph: some class gave a lot more than any of the others. You can’t tell which one, because the classes of 1984 and 1993 are assigned the same color wedge. The graph may fall into the so-bad-it’s-good category because it tells you that there is something interesting in the data, forcing you to read the provided table of numbers to figure out just what it is. Unfortunately, the report has no explanation for the 1993 spike, but I’m guessing it was a single .com jackpot winner.

NCSSM Alumni Giving Scatterplot

Here’s a quick scatterplot of the same data with 1993 removed (so it wouldn’t throw off the scale) and a Loess smoother added. It’s not surprising to see that older alumni give more money. The 1985 and 1987 mini-spikes are partially explained by considering class sizes. As I recall, the class size went from about 150 students to about 250 with the class of 1985, and the school’s two grades alternated between big and small sizes for a few years until they either evened out or the school grew again and expanded the even year class sizes to catch up.
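If you want to try the same kind of smoothing without a stats package, a Loess-style curve can be sketched in a few lines of plain Python. This is a simplified version (one pass of locally weighted linear regression with tricube weights, no robustness iterations), so it will not match my plot's smoother exactly. The `frac` parameter and the giving amounts below are placeholders I made up for illustration, not the NCSSM numbers.

```python
import math


def tricube(d, h):
    """Tricube weight for a point at distance d, given bandwidth h."""
    if h == 0:
        return 1.0 if d == 0 else 0.0
    u = d / h
    return (1 - u ** 3) ** 3 if u < 1 else 0.0


def loess(xs, ys, frac=0.5):
    """Smooth ys against xs with locally weighted linear fits.

    For each x0, the nearest `frac` share of the points (at least 3) sets
    the bandwidth, and a weighted straight line is fitted around x0.
    """
    n = len(xs)
    k = min(n, max(3, math.ceil(frac * n)))
    smoothed = []
    for x0 in xs:
        dists = [abs(x - x0) for x in xs]
        h = sorted(dists)[k - 1]  # distance to the k-th nearest point
        sw = swx = swy = swxx = swxy = 0.0
        for x, y, d in zip(xs, ys, dists):
            w = tricube(d, h)
            dx = x - x0  # center on x0 so the intercept is the fitted value
            sw += w
            swx += w * dx
            swy += w * y
            swxx += w * dx * dx
            swxy += w * dx * y
        denom = sw * swxx - swx * swx
        if abs(denom) < 1e-12:
            smoothed.append(swy / sw)  # degenerate fit: use the weighted mean
        else:
            slope = (sw * swxy - swx * swy) / denom
            smoothed.append((swy - slope * swx) / sw)
    return smoothed


# Placeholder amounts by class year, invented for this example.
giving = {1983: 900, 1984: 850, 1985: 1100, 1986: 700, 1987: 950,
          1988: 600, 1989: 550, 1990: 500, 1993: 20000, 1994: 300}

# Drop the 1993 outlier before smoothing, as in the scatterplot above.
years = [year for year in sorted(giving) if year != 1993]
amounts = [giving[year] for year in years]
trend = loess(years, amounts, frac=0.6)
```

A smaller `frac` follows the bumps more closely, and a larger one flattens them into the overall trend.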

OWASA Water Graphs

Saturday, April 8th, 2006

It’s easy to find critiques of bad graphs (just ask Google), so I’d like to comment on a good graph. The Orange Water and Sewer Authority does a great job of posting graphs of water supply and demand. The graphs are not especially pretty, but they do a good job of showing the data and have helpful legends.

The third “graph” is the real gem. (It’s not really a graph — it’s a decorated spreadsheet.) It shows a lot of information in a fairly approachable format. The colors represent water conservation stages associated with a given reservoir level and month. It’s easy to see that a 60% full reservoir is something to worry about in June but not a real danger in January. And for comparison, the graph also shows the levels during the 2002 drought, last year, and this year.

OWASA was kind enough to share the data with me, and here is my attempt at a prettier version of the graph.


OWASA Reservoir Risk Levels

The original graph has its share of graphic shortcomings, but it still makes for an insightful look at the data. Shortcomings:

  • some cells on the border between green and striped are miscolored, according to the legend
  • there’s too much detail (don’t need both counts and percents, for example)
  • it’s too small to read all of the numbers
  • the stripe pattern doesn’t work well over text
  • the overall presentation is too chunky (due to spreadsheet limitations) for the continuous data

My version addresses those shortcomings but has some of its own, mostly due to my app’s limitations:

  • x axis uses numbers instead of month names
  • y axis uses fractional numbers instead of percents
  • no title or legends
  • stray horizontal line across the top
  • missing some data, such as average monthly water usage