Relationship chord diagram study

I saw this “relationship chart” in 10 Visualizations Every Data Scientist Should Know by Jorge Castañón and was intrigued. I’ve never found these diagrams very valuable, but I was eager to learn where they are useful. Maybe I just needed the right data. In this case, the data consists of patient attributes in a drug study.

Each node is an attribute value and each curved line between two nodes represents patients having both attributes, with line thickness corresponding to the number of such patients. The article listed three insights:

  1. All patients with high blood pressure were prescribed Drug A.
  2. All patients with low blood pressure and high cholesterol level were prescribed Drug C.
  3. None of the patients prescribed Drug X showed high blood pressure.

How does the diagram support these statements? Not very well. It turns out some of these “insights” are not even true, let alone easy to discern. Claim #1 is false because there is a line from high blood pressure to Drug Y. Claim #2 describes a three-way relationship, which this kind of chart does not generally represent, since each line connects only two attribute values. I downloaded the helpfully provided raw data and found that the claim was actually false. Claim #3 is true because there is no line connecting Drug X and high blood pressure.
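Checking claims like these against raw rows amounts to filtering and counting. Here is a minimal sketch in Python. The rows are made up for illustration (they are not the study's data), and the column names BP, Cholesterol and Drug, along with the helper drugs_given, are assumptions rather than anything taken from the article.

```python
from collections import Counter

# Made-up rows for illustration only; NOT the study's data.
# Column names (BP, Cholesterol, Drug) are assumed.
patients = [
    {"BP": "HIGH", "Cholesterol": "HIGH", "Drug": "A"},
    {"BP": "HIGH", "Cholesterol": "NORMAL", "Drug": "Y"},
    {"BP": "LOW", "Cholesterol": "HIGH", "Drug": "C"},
    {"BP": "LOW", "Cholesterol": "HIGH", "Drug": "Y"},
    {"BP": "NORMAL", "Cholesterol": "HIGH", "Drug": "X"},
    {"BP": "LOW", "Cholesterol": "NORMAL", "Drug": "X"},
]


def drugs_given(rows, **conditions):
    """Count drugs among the rows that match every column=value condition."""
    return Counter(
        row["Drug"]
        for row in rows
        if all(row[col] == val for col, val in conditions.items())
    )


high_bp = drugs_given(patients, BP="HIGH")
low_bp_high_chol = drugs_given(patients, BP="LOW", Cholesterol="HIGH")

# Claim 1: everyone with high blood pressure got Drug A.
claim1 = set(high_bp) == {"A"}
# Claim 2: everyone with low blood pressure AND high cholesterol got Drug C.
claim2 = set(low_bp_high_chol) == {"C"}
# Claim 3: nobody on Drug X had high blood pressure (a missing key counts as 0).
claim3 = high_bp["X"] == 0

print(claim1, claim2, claim3)
```

Claim #2 needs two conditions at once, which is exactly what a pairwise chart cannot show directly.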

Likely the errors were editing mistakes or draft mix-ups, but the fact that an advocate for the usefulness of the chart didn’t notice the errors suggests the charts aren’t that insightful after all.

When I pointed out the errors on Twitter, the author immediately corrected them in the article, which was great to see. Now the claims read:

  1. All patients with high blood pressure were about equally prescribed Drug A and Y.
  2. Drug C is only prescribed to low blood pressure patients.
  3. None of the patients prescribed Drug X showed high blood pressure.

(I just realized that insight #3 is redundant given insight #1.)

Even if the chart type is not that effective, it could be that all other options are worse. That is, maybe seeing relationships between eleven attributes is too much for quick graphical understanding. So let’s try some alternatives.

The most obvious alternative is a graphical adjacency matrix, since the original is a node-link graph and any node-link graph can be represented as an adjacency matrix. Here the size of each square represents the number of patients who have both the X-axis value and the Y-axis value.

The missing squares certainly pop out better than the missing lines of the original for claim #3. To test claim #1, find BP/HIGH on the Y axis and scan across for the drug values. Drug A and Drug Y have about equal sized rectangles.
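The counts behind such a matrix are pairwise co-occurrences of attribute values. As a rough sketch in Python (again with made-up rows, assumed column names, and hypothetical helpers cooccurrence and cell), each patient contributes one count to every pair of values they hold:

```python
from collections import Counter
from itertools import combinations

# Made-up rows for illustration only; NOT the study's data.
patients = [
    {"BP": "HIGH", "Cholesterol": "HIGH", "Drug": "A"},
    {"BP": "HIGH", "Cholesterol": "NORMAL", "Drug": "Y"},
    {"BP": "LOW", "Cholesterol": "HIGH", "Drug": "C"},
    {"BP": "LOW", "Cholesterol": "HIGH", "Drug": "Y"},
    {"BP": "NORMAL", "Cholesterol": "HIGH", "Drug": "X"},
    {"BP": "LOW", "Cholesterol": "NORMAL", "Drug": "X"},
]


def cooccurrence(rows):
    """Count patients for each unordered pair of attribute values (nodes)."""
    counts = Counter()
    for row in rows:
        # Label nodes as "Variable/VALUE", e.g. "BP/HIGH".
        nodes = sorted(f"{col}/{val}" for col, val in row.items())
        for a, b in combinations(nodes, 2):
            counts[(a, b)] += 1
    return counts


counts = cooccurrence(patients)


def cell(a, b):
    """Look up one matrix cell; the matrix is symmetric, so order is ignored."""
    return counts[tuple(sorted((a, b)))]


# An empty cell corresponds to a missing line in the node-link chart.
print(cell("Drug/X", "BP/HIGH"))
# Scanning a row: compare the drugs given to high blood pressure patients.
print(cell("BP/HIGH", "Drug/A"), cell("BP/HIGH", "Drug/Y"))
```

Two values of the same variable never co-occur, because each patient has exactly one value per variable, which is why those cells are always empty.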

Since the data size is relatively small, we can replace the rectangles with grids containing one dot per patient.

The take-aways are generally the same but with a little extra precision since you can count dots if you like.

These eleven attributes are not all independent — they represent four variables with two to four values each. I’ve taken advantage of that in the axis layouts above, and can go further by using a different chart type, parallel sets, a generalization of parallel coordinates for categorical data.

Is that useful? Less so, I think. The lines do help support the connection concept, but the usefulness depends on the arrangement. Relationships between adjacent axes can be discerned but others can’t. However, it can be useful if you interact with it.
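The dependence on arrangement can be seen in a simplified model of the chart, sketched here in Python. It assumes ribbons are drawn only between neighboring axes and ignores the further subdivision that some parallel sets implementations may apply. The rows are made up, and the column names and the ribbons helper are hypothetical.

```python
from collections import Counter

# Made-up rows for illustration only; NOT the study's data.
patients = [
    {"BP": "HIGH", "Cholesterol": "HIGH", "Drug": "A"},
    {"BP": "HIGH", "Cholesterol": "NORMAL", "Drug": "Y"},
    {"BP": "LOW", "Cholesterol": "HIGH", "Drug": "C"},
    {"BP": "LOW", "Cholesterol": "HIGH", "Drug": "Y"},
    {"BP": "NORMAL", "Cholesterol": "HIGH", "Drug": "X"},
    {"BP": "LOW", "Cholesterol": "NORMAL", "Drug": "X"},
]


def ribbons(rows, axis_order):
    """Count patients per (left value, right value) for each adjacent axis pair."""
    return {
        (left, right): Counter((row[left], row[right]) for row in rows)
        for left, right in zip(axis_order, axis_order[1:])
    }


# With Cholesterol between BP and Drug, no ribbon links BP to Drug directly.
order_a = ribbons(patients, ["BP", "Cholesterol", "Drug"])
print(("BP", "Drug") in order_a)

# Reordering the axes makes BP and Drug neighbors, so claims 1 and 3 become readable.
order_b = ribbons(patients, ["Cholesterol", "BP", "Drug"])
print(order_b[("BP", "Drug")][("HIGH", "X")])
```

In this model, a static view answers only questions about neighboring axes, so the axis order decides which claims can be checked without interaction.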

To test claim #1 we can click on the BP/HIGH value and see that those patients got both Drug A and Drug Y.

To test claim #2, select the combination of BP/LOW and Cholesterol/HIGH to see that both Drug C and Drug Y were included.

To test claim #3, select Drug X and see that none of the BP/HIGH group is highlighted.

I’m still not sold on radial relationship charts and prefer the matrix as a static view, perhaps adding marginal indications of the size of each group which would correspond to the circle sizes in the original. But the radial charts are so popular I feel like I must be missing something and will keep studying.