Report to the UBL Organizing Committee

UBL Schema subcommittee (Eduardo Gutentag, chair)
12 August 2001

The following are the committee's recommendations, as tasked:

Schema Recommendation

This committee recommends using W3C's XML Schema.

This recommendation is not based entirely on technological considerations; it also reflects the desire to move forward and get on with the job.

This recommendation is not pre-emptive.

QC Function

This subcommittee also recommends that the Organizing Committee (OC) establish a schema quality-control (QC) function, in whatever framework the OC decides (as part of a QC subcommittee, as a separate subcommittee, etc.).

Part of this QC function should be to continue the preliminary risk analysis of W3C XML Schema features that the UBL Schema subcommittee has begun.

Risk Analysis of XML Schema Features

Features and Risks

The Risk column takes one of the values none, low, high, unacceptable, or uncertain. We did not assess the importance of each feature, only the risk of using it. Risk can stem from several factors: interoperability issues, extensibility issues, tool deployment, etc.

Feature | Risk | Comments
Target namespaces | high | Huge interoperability and comprehensibility problems; the risks are hard to mitigate.
Wildcards | high | Useful for publishing flexibility in catalog applications, but foreign-namespace material could act as a Trojan horse and, e.g., disable a base semantic; we may want to use wildcards advisedly and ensure that only specific namespaces are admitted.
Globally defined elements | none | Necessary and appropriate.
Locally defined elements | high |
Occurrence (n,m) | none | Essential for business documents.
Mixed content | high | Can confuse application designers, so we should guide them not to use it except where "free text" is needed (typically in publishing applications) and, in those cases, to be aware of considerations such as whitespace handling.
Attributes | none |
Global attributes | low | They seem okay, but people need to be aware of the prefixing requirements (in a schema with a target namespace, globally declared attributes must carry a namespace prefix in instances).
Defaulted and fixed attribute values | uncertain | Different processing scenarios (e.g., a large multipurpose validation suite vs. a small single-purpose tool) seem to favor different choices here. Relying on documentation for essential business information is a concern, but so is the fact that a document parsed without its schema is interpreted differently from one parsed with it. Note that RELAX NG does not have this feature, but XSLT could replace it.
Simple types | low | We need to keep an eye on the few ambiguities and define a profile (e.g., either always use UTC or always specify a time zone) and/or define types that replace some of the built-in types (e.g., dates and times), though the latter adds to the risk because such types will not be widely implemented.
Anonymous complex types | low | Use only when not intended for reuse.
Named complex types | low | Use with caution.
Complex type abstractness | low | Critical for xsi:type, but we're concerned about usage parameters.
Complex type extension | low |
Complex type restriction | high |
Substitution groups | uncertain | One way to allow "all elements of the same 'class'" at a given location in a content model; abstract complex types with xsi:type in the instance are another. It's unclear which is safer. Model groups can also be redefined to accomplish approximately the same thing.
Attribute groups | low | They're just a macro feature and thus should be avoided when reuse of types is desired.
Model groups | low | Same as attribute groups.
Keys in general | high | The simple type "ID" is risky because its values must be XML names (NCNames in XML Schema), and references to keys might as well be URI references, because the references often come from outside the document.
XPointer (used in key references done as URI references) | high | Not well supported; we may have to define a profile.
Scoped keys | high | Ditto.
Multipart keys | high | Ditto; in addition, they cannot be transformed into other schema languages.
Uniqueness constraint | uncertain | Highly desirable for business documents, but we're uncertain about its deployment in tools.
Notations | unacceptable |
Annotations | low | We need to define a profile for how to use them, so that arbitrary application info isn't added.
Application info | unacceptable | Designed to add a layer of semantics that could interfere with our intended semantics.
Processing instructions in schemas | high | Ditto.
Processing instructions in documents | uncertain | PIs could carry Trojan horses (especially if they include programming code), but do we need to provide some kind of escape hatch to account for real life? In any case, we cannot control through XML parsers whether people use them; we could say that processors that handle UBL documents may/must ignore PIs.
xml:lang | uncertain | Its valid values cannot be enumerated; if we use it rather than create our own attribute, we would probably want to restrict its values somehow. However, this is a schema design issue rather than a risk assessment issue.
xml:space | uncertain |
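Several rows of the table suggest defining a profile on top of the schema: admit wildcard content only from specific namespaces, require an explicit time zone on dates and times, restrict xml:lang values, and ignore processing instructions. The following Python sketch shows what an instance-level check for such a profile might look like, using only the standard library's xml.etree.ElementTree. The namespace URIs, the language allowlist, and the IssueDateTime element name are hypothetical placeholders, not UBL definitions, and a check like this would complement schema validation rather than replace it.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical namespace URIs: placeholders, not real UBL identifiers.
UBL_NS = "urn:example:ubl:order"
ALLOWED_FOREIGN_NS = {"urn:example:catalog-extension"}

# Hypothetical profile rule: xml:lang restricted to a short allowlist.
ALLOWED_LANGS = {"en", "en-US", "de", "fr"}

# ElementTree exposes xml:lang under the reserved XML namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# A dateTime that ends in an explicit zone: "Z" or +hh:mm / -hh:mm.
DATETIME_WITH_TZ = re.compile(
    r"^-?\d{4,}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(\.\d+)?(Z|[+-]\d{2}:\d{2})$"
)


def namespace_of(tag):
    """Return the namespace URI of an ElementTree tag '{uri}local', or ''."""
    return tag[1:].split("}", 1)[0] if tag.startswith("{") else ""


def check_profile(xml_text, datetime_tags):
    """Return a list of profile violations found in xml_text.

    datetime_tags: qualified tag names whose text must be a dateTime
    carrying an explicit time zone (a hypothetical profile rule).
    """
    problems = []
    # ElementTree's default parser drops processing instructions (and
    # comments), so any PIs in the instance are ignored, not acted on.
    root = ET.fromstring(xml_text)
    for elem in root.iter():
        ns = namespace_of(elem.tag)
        # Wildcard profile: only the base namespace or allowlisted ones.
        if ns != UBL_NS and ns not in ALLOWED_FOREIGN_NS:
            problems.append(f"foreign namespace not allowed: {ns or '(none)'}")
        # xml:lang profile: values must come from the allowlist.
        lang = elem.get(XML_LANG)
        if lang is not None and lang not in ALLOWED_LANGS:
            problems.append(f"xml:lang value not in profile: {lang}")
        # Simple-type profile: dates and times must state a time zone.
        if elem.tag in datetime_tags:
            value = (elem.text or "").strip()
            if not DATETIME_WITH_TZ.match(value):
                problems.append(f"dateTime without explicit time zone: {value}")
    return problems


DATETIME_TAGS = {"{urn:example:ubl:order}IssueDateTime"}

SAMPLE = """<Order xmlns="urn:example:ubl:order" xmlns:x="urn:example:untrusted">
  <?app-hook run-something?>
  <IssueDateTime>2001-08-12T10:00:00</IssueDateTime>
  <Note xml:lang="la">Salve</Note>
  <x:Override/>
</Order>"""

if __name__ == "__main__":
    for problem in check_profile(SAMPLE, DATETIME_TAGS):
        print(problem)
```

On SAMPLE the check reports a dateTime without a zone, an xml:lang value outside the profile, and an element from a namespace that is not on the allowlist; the processing instruction never reaches the application because the parser discards it. Whether such rules belong in the schema itself, in a separate validation layer, or in a transformation step is left open here, as it is in the table.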