Reconstituted XML Schema Graphs

The paper “Analysis of XML schema usage” provides a glimpse of some interesting data on a group of schemas that the authors analyzed. Unfortunately, it’s only a glimpse: the data is not provided, and the summarizing graphs are generally lacking. My correspondence with one of the authors indicates that none of the data is available.


[Figure: Original graph of XML Schema Sizes]

The first graph, shown here in reduced form, is especially inappropriate. The authors use a scatterplot to show the distribution of schema sizes. To read it, you have to count dots within each horizontal division, as described in the notes for the plot.

[Figure: Schema Size Histogram]

Not to be deterred, I reverse-engineered the data from the graph and regraphed it as a histogram, a boxplot and a smoothed density curve, which are all better than a scatterplot for analyzing a distribution of one variable. Unfortunately, JMP doesn’t handle log axes for histograms, so I had to graph the log of the size instead of the size. The graphs in the paper were obviously made in Excel, and maybe it has the same deficiency. The paper uses the original graph to conclude that the bulk of the schemas have sizes in the range of 10KB to 10MB, or 10^1 KB to 10^4 KB, though the histogram helps tighten that to the range 10^1.5 KB to 10^3.5 KB, for what it’s worth.
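For anyone who wants to try the same log-then-bin workaround outside JMP, here is a minimal Python sketch. It is not how the plots above were made; the function name, the half-decade bin width, and the sample sizes are all assumptions chosen for illustration, and the numbers are not the reconstituted data.

```python
import math
from collections import Counter

def log10_histogram(sizes_kb, bin_width=0.5):
    """Count values per bin of log10(size); each bin is bin_width decades wide."""
    counts = Counter()
    for size in sizes_kb:
        if size <= 0:
            raise ValueError("sizes must be positive to take a logarithm")
        # floor() puts log10 values in [k * w, (k + 1) * w) into bin k
        bin_index = math.floor(math.log10(size) / bin_width)
        counts[bin_index] += 1
    # Return (lower bin edge in log10 units, count), sorted by edge
    return [(k * bin_width, counts[k]) for k in sorted(counts)]

# Hypothetical sizes in KB, for illustration only
example_sizes_kb = [2, 40, 55, 120, 300, 900, 2500, 20000]

for lower_edge, count in log10_histogram(example_sizes_kb):
    upper_edge = lower_edge + 0.5
    print(f"10^{lower_edge:.1f} KB to 10^{upper_edge:.1f} KB: {'#' * count}")
```

Binning the logarithm gives equal-width bars on what is effectively a log axis, which is the same trick as graphing the log of the size.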

[Figure: Schema Size by LOC]

The paper next shows a similar scatterplot (not shown here) of LOC and argues that the similarity of the plots verifies the high correlation between KB and LOC. Not that the conclusion is bad, but why not plot them against each other to show a correlation? The graph at right does just that, showing the fitted line on a log-log scale. Once again, it’s from the reconstituted data.
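A fitted line on a log-log scale amounts to an ordinary least-squares fit on the logs of both variables. The sketch below shows that calculation in plain Python; it is an illustration rather than the fit JMP produced, and the function name and sample values are made up, not taken from the reconstituted data.

```python
import math

def fit_log_log(xs, ys):
    """Least-squares fit of log10(y) = intercept + slope * log10(x).

    Returns (slope, intercept, r), where r is the correlation
    coefficient between log10(x) and log10(y).
    """
    lx = [math.log10(x) for x in xs]
    ly = [math.log10(y) for y in ys]
    n = len(lx)
    mean_x = sum(lx) / n
    mean_y = sum(ly) / n
    sxx = sum((a - mean_x) ** 2 for a in lx)
    syy = sum((b - mean_y) ** 2 for b in ly)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(lx, ly))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)
    return slope, intercept, r

# Hypothetical (LOC, KB) pairs, for illustration only
example_loc = [120, 950, 4000, 30000]
example_kb = [5, 41, 150, 1300]

slope, intercept, r = fit_log_log(example_loc, example_kb)
print(f"log10(KB) = {intercept:.3f} + {slope:.3f} * log10(LOC), r = {r:.3f}")
```

A slope near 1 would mean size in KB grows roughly in proportion to LOC, and r close to 1 would back up the correlation directly instead of inferring it from two similar-looking plots.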


Oh yeah, I guess I’d better provide the data to back up my plots; it’s in xsd_reconstituted.csv.



This is not the first time I could have used a graph scraper — is there such a beast? That is, a program that scans a graph and outputs a table of data that could have produced the graph.
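The core of such a tool would presumably be a calibration step: given two pixel positions on an axis whose data values are known, convert any other pixel position to a data value. Here is a minimal Python sketch of just that step, with log-axis support. The function name and the pixel numbers are hypothetical, and finding the dots in the image is left out entirely.

```python
import math

def make_axis_mapper(pixel_a, value_a, pixel_b, value_b, log_scale=False):
    """Return a function mapping a pixel coordinate to a data value.

    (pixel_a, value_a) and (pixel_b, value_b) are two reference points
    read off the axis, for example two labeled tick marks.
    """
    if log_scale:
        # On a log axis, pixels are linear in log10(value)
        va, vb = math.log10(value_a), math.log10(value_b)
    else:
        va, vb = value_a, value_b
    scale = (vb - va) / (pixel_b - pixel_a)

    def to_value(pixel):
        v = va + (pixel - pixel_a) * scale
        return 10 ** v if log_scale else v

    return to_value

# Hypothetical calibration: the 1 KB tick sits at pixel row 400 and the
# 10000 KB tick at pixel row 100 (image rows count downward from the top)
y_to_kb = make_axis_mapper(400, 1, 100, 10000, log_scale=True)

# Hypothetical dot positions, in pixel rows
for row in [350, 250, 130]:
    print(f"row {row}: about {y_to_kb(row):.0f} KB")
```

Building one mapper per axis and applying both to each dot’s pixel position would yield the table of data.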

2 Responses to “Reconstituted XML Schema Graphs”

  1. xan says:

    Absolutely! GraphClick is awesome. Thanks!

    I tried it on the XSD Schema graphs, and it worked great. My eyeballing was pretty close, so I haven’t updated the data for the article.
