XML Schema Type Tables and Substitution Groups
Sunday, February 10th, 2008
XML Schema 1.1 was already running behind when I left the Working Group in 2004, and it’s still a work in progress. Though I no longer write XML tools, I try to keep up with the group’s activities and offer (hopefully useful) comments on the public working drafts. However, knowing the WG is so far behind schedule, I’m hesitant to make too many official comments, since each comment must be addressed by the group, adding to the delay.
Many of my comments have been resolved recently in a relative flurry of activity. (The comment archive shows more activity last month than in any previous three-month period.) When a comment is resolved, the original poster can (silently) accept the resolution or appeal to the W3C director. I disagreed with the resolution of my comment on type tables and substitution groups, but I just registered my dissent and closed it anyway rather than appeal; I trust the working group’s expertise over my casual interest.
Substitution groups have always been questionable in my mind. I’d prefer typed wildcards or, at least, an opt-in mechanism rather than opt-out for substitution groups to limit their unintended use.
Type tables are a big feature added late in the game, and they don’t seem to interact well with substitution groups. Type tables allow alternative types to apply to an element based on its context, such as an attribute value. I thought such context-based constraints should be in a separate layer, as is done with Schematron, but it seems like half the schema-dev questions are about how to impose such constraints within XML Schema, so I can understand why the Working Group would want to add them.
The problem, as I see it, is that type alternatives live as element declaration properties rather than within the type hierarchy. Substitution group members must have types in proper derivation relationships, but that only applies to the declared types, not the alternative types. So combining type tables with substitution groups can break the spirit of the derivation hierarchy, if not the letter of it.
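A rough sketch of how this can play out, using made-up names (`Shape`, `Circle`, `Polygon`, `shape`, `polygon`, and the `kind` attribute are all hypothetical) and assuming the `xs:alternative` syntax from the XML Schema 1.1 drafts. It is meant only to illustrate the kind of interaction described above, not to reproduce the case from the original comment:

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <!-- Hypothetical hierarchy: Circle and Polygon both derive from Shape. -->
  <xs:complexType name="Shape">
    <xs:attribute name="kind" type="xs:string"/>
  </xs:complexType>

  <xs:complexType name="Circle">
    <xs:complexContent>
      <xs:extension base="Shape">
        <xs:attribute name="radius" type="xs:decimal" use="required"/>
      </xs:extension>
    </xs:complexContent>
  </xs:complexType>

  <xs:complexType name="Polygon">
    <xs:complexContent>
      <xs:extension base="Shape">
        <xs:attribute name="sides" type="xs:positiveInteger" use="required"/>
      </xs:extension>
    </xs:complexContent>
  </xs:complexType>

  <!-- Head element: declared type Shape, plus a type table keyed on @kind. -->
  <xs:element name="shape" type="Shape">
    <xs:alternative test="@kind = 'circle'" type="Circle"/>
  </xs:element>

  <!-- Member element: Polygon derives from Shape, so the declared types are
       in a proper derivation relationship. The type table belongs to the
       head's declaration, though, and (by assumption here) is not consulted
       when this member appears in the head's place. -->
  <xs:element name="polygon" type="Polygon" substitutionGroup="shape"/>

</xs:schema>
```

Under that assumption, `<polygon kind="circle" sides="3"/>` would be accepted as a `Polygon` wherever `shape` is allowed, even though the head's own table would have selected `Circle` for the same attribute value. The declared types satisfy the letter of the derivation rule, while the alternative never enters the comparison.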

Not to be deterred, I reverse-engineered the data from the graph and regraphed it as a histogram, a boxplot and a smoothed density curve, which are all better than a scatterplot for analyzing a distribution of one variable. Unfortunately,
The paper next shows a similar scatterplot (not shown here) of LOC and argues that the similarity of the plots verifies the high correlation between KB and LOC. Not that the conclusion is bad, but why not plot them against each other to show a correlation? The graph at right does just that, showing the fitted line on a log-log scale. Once again, it’s from the reconstituted data.