Naming and Design Rules (NDR) SC Report

1 November 2001 [corrected in accordance with TC instructions]

This subcommittee has the following members:

Doug Bunting (attending Wednesday morning and Thursday morning)
Bill Burcham (attending Tuesday afternoon, Wednesday morning, and Thursday morning)
Dave Carlson (attending Tuesday afternoon, Wednesday morning, and Thursday morning)
Mark Crawford (attending Tuesday afternoon, Wednesday morning, and Thursday morning)
Matt Gertner (attending Wednesday morning and Thursday morning)
Arofan Gregory (attending Tuesday afternoon, Wednesday morning, and Thursday morning)
Eduardo Gutentag (attending Tuesday afternoon, Wednesday morning, and Thursday morning)
Eve Maler (chair) (attending Tuesday afternoon, Wednesday morning, and Thursday morning)
Dale McKay (attending Tuesday afternoon, Wednesday morning, and Thursday morning)
Sue Probert
Mike Rawlins (attending Tuesday afternoon, Wednesday morning, and Thursday morning)
Kelly Schwarzhoff (attending Wednesday morning)
Gunther Stuhec (attending Tuesday afternoon, Wednesday morning, and Thursday morning)

Also attending as an observer on Tuesday afternoon and Wednesday morning was Phillip Engel from KPMG and XBRL. Bob Glushko also sat in on Wednesday afternoon and Thursday morning.

Charter

"Recommend to the TC rules and guidelines for normative-form schema design, instance design, and markup naming, and write and maintain documentation of these rules and guidelines."

We wanted to ask the TC: Is the Library Content SC going to pick up the quality review function, or do we need to?

Also, some high-level design principles (as presented by the Planning SC) need more work. We would like to own this. Anyone got a problem with that? [Accepted by the TC.]

We intend for the documentation to be in testable form (to be applied by the quality review function, wherever it resides), to the extent possible.

Other SCs will decide, e.g., "What is the correct content of a purchase order?" We are in charge of deciding the correct schema expression of the answer to that question (though we may occasionally provide feedback to the other SCs that will make them change their answer). We answer the technology questions; the others answer the business questions.

This SC will need to defer to the CDA SC for requirements on context-driven assembly, but we would document the design aspects meeting those requirements.

We anticipate covering aspects of design such as namespaces, modularity, versioning, customization methodologies, enumerated type handling, and design decisions having to do with the relationship of the document formats to business processes. There will probably be emerging design rules coming up all the time.

We intend to prepare our deliverable, and then stop meeting regularly but provide support as necessary to designers throughout Phase 1.

SC Logistics

Eve Maler has offered to chair this SC, if the TC agrees. [Accepted by the TC.]

[Marion Royal was accepted as a new SC voting member by the TC.]

We agreed (as the most inclusive schema-related SC) that we should keep the three subcommittees separate for now. Mark Crawford offered to edit the NDR report.

Bob suggested that we hold a BOF or other session at XML 2001 presenting the UBL NDR recommendations. (Jon is presenting on UBL in a regular presentation slot on December ~12, in which he can hopefully present our work.) There are five work-weeks between now and then. We will meet for a two-hour teleconference (8-10am PT, 11am-1pm ET, 5-7pm Europe) on Wednesdays for the next five weeks.

ACTION: Eve to set up teleconference service and send mail about it, get a ubl-ndr mailing list set up, and get a UBL TC subpage for our SC set up.

We plan to produce position papers on each of these areas, with a champion driving that process. Our deliverable will consist of the collection of these position papers. We assume there will be an SC email list, and the discussion of each position should be labeled with a distinctive subject line.

Position papers should be provided in either HTML or PDF form and should contain:

Executive summary
Description of the parameters of the technical issue
Description of the options available and their pros and cons
Recommendation by the champion (to be replaced by a subcommittee decision eventually)

ACTION: The champions identified so far (Bill, Gunther, Dave) to attempt to produce drafts of their position papers before our telecon next Wednesday.

ACTION: The NDR report editor (Mark) to work on a draft NDR report outline that takes into account the BL Schema SC and X12 work to date.

The SC, by unanimous consent, appointed Mark Crawford as the vice-chair.

Relationship to X12 NDR Work

A year ago, a special task group under the X12 steering committee was tasked with providing guidance and recommendations on XML. One of the recommendations was to come up with a consistent set of design rules and naming conventions that could be used by the various subcommittees. This is similar to the need for (and existence of) rules for creating EDI transactions.

The development of X12 rules and conventions languished for a year, partly because of lack of clarity around their scope and perceived dependencies on ebXML. The work started up in early October 2001 as a result of renewed interest in X12's role in XML development, resulting from the X12-sponsored cross-industry XML summit in August. The group plans to have an initial deliverable ready for public review by December 1, with two pilot projects doing invoices scheduled to start no later than that date.

The X12 committee has produced an outline similar to our list of questions, and they are intending to flesh it out much as we intend to produce a report. One thing not appearing in the outline yet is the matter of "philosophical design rules", for example, optionality at a business level that might be different from the schema level.

We agreed that true alignment doesn't seem to make sense, since we're all at approximately the same place. The NDR SC might have more XML syntax expertise, and the X12 group may have more depth of knowledge about implementation constraints. We shouldn't build in any dependencies on their schedule because they may take a long time. Also, X12 may have a problem with contributing any material to UBL's work because UBL is chartered to make all of its work available unencumbered; thus, collaborating with X12 may be problematic. Finally, the two groups may not share overall design principles.

The main argument for collaboration has to do with promoting eventual convergence. We agreed that informal collaboration is a good idea, in both directions: sharing our work for feedback and reviewing X12 material. As long as there are people who overlap the two groups, we'll naturally get a sharing of information.

Analysis Inputs

X12 white paper: Mike Rawlins chairs the X12 C/TG3 group, which is working on this. They are working on a white paper which will probably become an X12 technical report.
xCBL rules: xCBL has a set of design rules and naming conventions.
ACTION: Arofan to send a copy of the latest documentation of these conventions and rules.
BL Schema SC report: The old BL Schema subcommittee analyzed the features of W3C XML Schema with an eye towards which should be used or not.
Gunther rules: Gunther wrote a document about design rules to the Mapping and Planning committees.
xCBL XSD mapping: xCBL has a set of mapping rules to XSD, which may prove useful.
ebXML naming conventions: ebXML produced a set of naming conventions. It's up to at least V1.7.
ISO 11179: The approved version could be helpful, but doesn't contain anything specific about XML. There is a draft of Part 5 that makes some XML-related recommendations.
ACTION: Gunther to send the approved version to the SC.
ACTION: Mark to send the draft of Part 5 to the SC.
eCo spec style guidelines: The "semantics" portion of the eCo specification has a set of style and naming guidelines for XML.
ACTION: Arofan to send to the SC.

Recommendations

The list of issues needs to be greatly expanded and organized.

How should the schemas be modularized and versioned, and how should namespaces be applied? Champion: Bill Burcham
When should elements vs. attributes be used? Champion: Gunther Stuhec
When should local vs. global elements be defined? What is the relationship of this to namespaces? Champion: Dave Carlson
What sort of customization/refinement should be supported (e.g., extension, restriction, or both), and how? How does this relate to context-driven assembly goals? Champion: TBD
How should markup be named? Champion: TBD
How should enumerated types (such as code lists) be handled? Champion: TBD
Should markup naming accommodate aliasing into other natural languages, and how? Should the ebXML UID concept be supported? Champion: TBD
How should value defaults be handled? Champion: TBD
When should named vs. anonymous complex types be used? Champion: TBD
What constraints should be adhered to because of a lack of tools support? Champion: TBD
Should all design rules be deterministic? Champion: TBD

Local vs. Global Elements Discussion

When should local vs. global elements be defined? What is the relationship of this to namespaces?

Option	Pro	Con
Option 1: all global. All elements are global within the namespace.	All element declarations are reusable. Also, the exercise forces you to figure out whether elements are really "the same". This option is the simplest, and doesn't preclude any real functionality.	A complex type gets fragmented because an element reference can be far from its declaration. (This can be mitigated with tools support.) Also, it's somewhat more awkward to do data binding because you don't have encapsulation of any elements.
Option 2: global + local non-unique. Some elements are global and some are local, with multiple local elements with the same name allowed.	It has harmony with OO languages, because attributes are encapsulated within their class. Also, the only good way to do restriction (if you want to do it at all) is to do it on local elements. Also, you can have many subelements with the same name that have tiny differences according to their ancestry.	We're doubtful about tools support for validating this feature. Also, this puts more of a burden on the schema creation process and requires the development of guidelines for when to make an element local.
Option 2a: global + local non-unique unqualified. Some elements are global and some are local, with multiple local elements with the same name allowed. elementForm[Default] is set to unqualified.	Unqualified local elements can be reused in schemas not our own, and be "aliased" as natively being in the foreign namespace. Such elements always look nice and "simple", and XSLT and other applications don't need to worry about namespace-qualifying these elements.	Unqualified local elements can't be versioned using any namespace-related mechanism because they're not permanently in any one namespace.
Option 2b: global + local non-unique qualified. Some elements are global and some are local, with multiple local elements with the same name allowed. elementForm[Default] is set to qualified.	Qualified local elements can be versioned using a namespace-related mechanism because they're permanently in one namespace.	Qualified local elements can't be reused in schemas not our own, to be "aliased" as natively being in the foreign namespace. Also, in a schema-merging scenario (e.g. merging back changes into a development tree), qualified local elements can't be handled properly; unqualified ones and global elements can.
Option 3: global + local unique. Some elements are global and some are local, with all local element names required to be unique within the namespace.	It has harmony with OO languages, because attributes are encapsulated within their class; generating schemas from UML models would be easier (e.g., because name clashes aren't a problem). Also, the only good way to do restriction (if you want to do it at all) is to do it on local elements. Also, the schema is more human-readable because everything is defined "locally".	Same as 2. Also, you still have to come up with different names for things, rather than perhaps using the single most intuitive name (e.g., Title for both book and chapter titles, if they have different content models).
Option 3a: global + local unique unqualified. Some elements are global and some are local, with all local element names required to be unique within the namespace. elementForm[Default] is set to unqualified.	Same as 2a.	Same as 2a.
Option 3b: global + local unique qualified. Some elements are global and some are local, with all local element names required to be unique within the namespace. elementForm[Default] is set to qualified.	Same as 2b.	Same as 2b.

With the caveat that we need to ensure that local elements can be validated, we support Option #2. This means we are on the hook to develop conventions and rules for deciding when to make elements local. It has been noted that since local elements can't be referenced (they are perfectly "hidden" and can only be reused if you literally copy and paste them), only elements you actively don't want to be referenced should ever be local.

ACTION: Someone (Kelly?) to find out the current level of support for validating local elements. In order to do this, they will need to develop a small worked example with which to test the validators.
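
A small worked example of the kind this action calls for might look like the following sketch. It is not an agreed SC test case: the namespace name urn:example:ndr-sketch, the Book/Chapter/Title names (borrowed from the Title example in the table above), and the choice of elementFormDefault="qualified" are all assumptions made for illustration. The Python wrapper uses only the standard library, so it checks well-formedness and lists the declarations; actually validating INSTANCE against SCHEMA would require whichever XSD validators are being evaluated.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"

# Hypothetical Option 2 schema: one global element (Book) and two local
# elements that share the name Title but have different types.
SCHEMA = """\
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:ex="urn:example:ndr-sketch"
           targetNamespace="urn:example:ndr-sketch"
           elementFormDefault="qualified">
  <xs:element name="Book" type="ex:BookType"/>
  <xs:complexType name="BookType">
    <xs:sequence>
      <xs:element name="Title" type="xs:string"/>
      <xs:element name="Chapter" type="ex:ChapterType" maxOccurs="unbounded"/>
    </xs:sequence>
  </xs:complexType>
  <xs:complexType name="ChapterType">
    <xs:sequence>
      <xs:element name="Title" type="ex:NumberedTitleType"/>
    </xs:sequence>
  </xs:complexType>
  <xs:complexType name="NumberedTitleType">
    <xs:simpleContent>
      <xs:extension base="xs:string">
        <xs:attribute name="number" type="xs:positiveInteger"/>
      </xs:extension>
    </xs:simpleContent>
  </xs:complexType>
</xs:schema>
"""

# Hypothetical instance to feed to a validator together with SCHEMA.
INSTANCE = """\
<ex:Book xmlns:ex="urn:example:ndr-sketch">
  <ex:Title>Sample Book</ex:Title>
  <ex:Chapter>
    <ex:Title number="1">Sample Chapter</ex:Title>
  </ex:Chapter>
</ex:Book>
"""


def element_declarations(schema_text):
    """Return (global element names, {complex type name: [(name, type), ...]})."""
    root = ET.fromstring(schema_text)
    # Global declarations are direct children of xs:schema.
    global_names = [e.get("name") for e in root.findall(f"{{{XS}}}element")]
    # Local declarations are nested inside a named complex type.
    local_decls = {}
    for ctype in root.findall(f"{{{XS}}}complexType"):
        decls = [(e.get("name"), e.get("type")) for e in ctype.iter(f"{{{XS}}}element")]
        if decls:
            local_decls[ctype.get("name")] = decls
    return global_names, local_decls


GLOBAL_NAMES, LOCAL_DECLS = element_declarations(SCHEMA)
```

A validator that handles local elements properly should accept INSTANCE, and should reject a variant in which the book-level Title carries a number attribute, since only the chapter-level Title allows one.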

Regarding unqualified vs. qualified: We can't decide this until we decide how UBL will use namespaces.
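
To make the unqualified vs. qualified choice concrete, the sketch below shows how the same document would look to an application under each setting. The namespace name and element names are hypothetical, and the code only parses instances with the Python standard library; it does not consult a schema.

```python
import xml.etree.ElementTree as ET

NS = "urn:example:ndr-sketch"  # hypothetical namespace name

# Local elements unqualified (as in Option 2a): only the global root
# element is in the namespace; the local Title is in no namespace.
UNQUALIFIED = f'<ex:Book xmlns:ex="{NS}"><Title>Example</Title></ex:Book>'

# Local elements qualified (as in Option 2b): the local Title is in the
# same namespace as the root.
QUALIFIED = f'<ex:Book xmlns:ex="{NS}"><ex:Title>Example</ex:Title></ex:Book>'


def child_tags(instance_text):
    """Return the expanded names of the root's children, as the parser sees them."""
    return [child.tag for child in ET.fromstring(instance_text)]


def find_title(instance_text, qualified):
    """Look up Title using the name form an application would have to use."""
    path = f"{{{NS}}}Title" if qualified else "Title"
    return ET.fromstring(instance_text).find(path)
```

Here child_tags reports a bare Title for the unqualified instance and a namespace-expanded name for the qualified one, which is why a namespace-based versioning mechanism can reach qualified local elements but not unqualified ones.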

XBRL and SAML use only global elements.

Namespaces, Modularity, and Versioning Discussion

Terminology

We may add to this as we go.

schema module: A "schema document" (as defined by the XSD spec) that is intended to be taken in combination with other such schema documents to be used.
root schema: A schema document corresponding to a single namespace, which is likely to pull in (by including or importing) schema modules. Issue: Should a root schema always pull in the "meat" of the definitions for that namespace, regardless of how small it is?
schema: Never use this term unqualified!
instance root/doctype: This is still mushy. The transitive closure of all the declarations imported from whatever namespaces are necessary. A doctype may have several namespaces used within it.
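
As an illustration of the root schema and schema module terms, the following sketch shows a root schema that includes two modules from its own namespace and imports a module from another namespace. All file names and namespace names are hypothetical, and the helper only reads the xs:include and xs:import elements; it does not fetch or resolve anything.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"

# Hypothetical root schema for a single namespace. It carries no
# definitions of its own and only pulls in schema modules.
ROOT_SCHEMA = """\
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           targetNamespace="urn:example:order">
  <xs:include schemaLocation="order-header.xsd"/>
  <xs:include schemaLocation="order-lines.xsd"/>
  <xs:import namespace="urn:example:core" schemaLocation="core.xsd"/>
</xs:schema>
"""


def pulled_in(schema_text):
    """Return (target namespace, included locations, imported (namespace, location) pairs)."""
    root = ET.fromstring(schema_text)
    includes = [e.get("schemaLocation") for e in root.findall(f"{{{XS}}}include")]
    imports = [
        (e.get("namespace"), e.get("schemaLocation"))
        for e in root.findall(f"{{{XS}}}import")
    ]
    return root.get("targetNamespace"), includes, imports


TARGET, INCLUDES, IMPORTS = pulled_in(ROOT_SCHEMA)
```

Included modules share the root schema's target namespace, while imported ones belong to a different namespace; the open issue above is whether a root schema should ever be this empty or should always hold the main definitions itself.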

Number of Namespaces

It's not a good idea to have one huge namespace for everything.

Current thinking: There will be a Core namespace (modularized? multiple Core namespaces?) and some smallish number of functional namespaces for each of the document type categories (approximately six to start, according to the Planning SC recommendations). Customizations would create their own namespaces for any new elements they invent.

The simpler the hierarchy, the easier the management issues, such as versioning, become (Eduardo does not agree). The functionally related and core component sets are an integrated union. If you modularize the core as well, the frequency of versioning is also modularized, with the center being almost frozen. Issue: Should namespace names contain version information, or should versions be indicated in some other way?
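
The two sides of the versioning issue can be sketched as follows. Both conventions shown are hypothetical and neither has been adopted by the SC: convention A puts a version number at the end of the namespace name, and convention B keeps the namespace name stable and carries the version in an attribute invented here for illustration.

```python
import re
import xml.etree.ElementTree as ET

# Convention A (hypothetical): the version is the last segment of the namespace name.
INSTANCE_A = '<o:Order xmlns:o="urn:example:order:1.0"/>'
INSTANCE_A_NEXT = '<o:Order xmlns:o="urn:example:order:1.1"/>'

# Convention B (hypothetical): stable namespace name, version in an attribute.
INSTANCE_B = '<o:Order xmlns:o="urn:example:order" version="1.0"/>'
INSTANCE_B_NEXT = '<o:Order xmlns:o="urn:example:order" version="1.1"/>'


def namespace_of(elem):
    """Extract the namespace name from an ElementTree '{ns}local' tag."""
    return elem.tag[1:].split("}", 1)[0] if elem.tag.startswith("{") else ""


def version_from_namespace(instance_text):
    """Convention A: read a trailing dotted number from the namespace name."""
    ns = namespace_of(ET.fromstring(instance_text))
    match = re.search(r":(\d+(?:\.\d+)*)$", ns)
    return ns, (match.group(1) if match else None)


def version_from_attribute(instance_text):
    """Convention B: read the version from an attribute on the root element."""
    root = ET.fromstring(instance_text)
    return namespace_of(root), root.get("version")


def same_element_name(text_1, text_2):
    """True if both roots have the same namespace-expanded element name."""
    return ET.fromstring(text_1).tag == ET.fromstring(text_2).tag
```

Under convention A, every version change produces different element names as far as namespace-aware software is concerned, so stylesheets and bindings written for 1.0 do not match 1.1 documents. Under convention B the names stay the same, and detecting a version change is left to the application.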

We also need to consider that we are designing schemas for people who will resolve them at runtime.

Number of Instance Roots

In some cases, various actions in the protocol (create vs. delete) will have totally different document structure requirements. But in some cases (create vs. update), the content might be identical. However, we still think we should design in favor of more document types rather than fewer, e.g. one for each transmission (a la RosettaNet). Having a separate document type for each thing avoids confusion on the part of developers. We might then decide to optimize some of them by merging them together.

xCBL and cXML History

CBL 2.0 had the intention of being divided into several namespaces, with different document types divided into functional areas. The namespace level of granularity was supposed to be versioned. What ended up happening was that it became one big namespace with one version, partly because of tool constraints.

CBL 3.0 was constrained to taking the same approach, because transformation engines in the larger EDI trading partners couldn't handle namespaces. Different partners would use different software, and the level and type of namespace support differed widely.

The plan moving forward was as follows: The core namespace is the truly horizontal stuff; its purpose in being a namespace is to indicate commonality. Order management and invoicing (e.g.) would be two namespaces that use the core namespace; the purposes of these namespaces would be business process indicators. In the third ring, extension namespaces (e.g., MyPO) would include the lower levels; the purposes of these namespaces would be ownership (distinct from xCBL's ownership). Each namespace might consist of one or multiple "included" modules.

Having only one namespace for the core can introduce performance problems, especially as the core grows. Also, large module size can present problems in tools.

They considered giving each element or small group of elements its own namespace, but there are traditional component reuse problems with this (e.g., early vs. late binding questions on each version of each element).

In xCBL, there were subsets used by specific implementers, the equivalent of an implementation guide (IG) in EDI. Schematron is used at the IG level for validation, and it works well with XSD: it supports expressing your IGs as a subset of the core schema. xCBL allows for extending the core to meet IG needs; you then need to subset the extended whole. The extensions apply only to the trading partners that need them, not to the original core. There are risks associated with allowing too much extension. We should focus on customization by extension and then restriction.

Ariba first used one big namespace. They also followed the CommerceOne extension approach, but it was not quite as automated. They did not have the same level of versioning problems because they were careful. Customers who didn't care about the added data just dropped it at the parser, so versioning problems were not as prevalent. They used a URI for both the schema location and the namespace, and changed both simultaneously. Internally, they had similar modularity, but at build time they concatenated the modules. They chose this approach in part because of network performance: performance is significantly hampered if the modules are combined at parse time.
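
Build-time concatenation of schema modules can be sketched as below. This is not Ariba's actual build process, which the discussion did not describe in detail; it is a minimal illustration that replaces each xs:include with the contents of the module it names. The module names, namespace name, and element names are hypothetical, and the modules are held in a dictionary rather than read from files.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"
ET.register_namespace("xs", XS)  # keep the xs prefix when serializing

# Hypothetical schema modules, all in the same target namespace.
MODULES = {
    "root.xsd": """\
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" targetNamespace="urn:example:core">
  <xs:include schemaLocation="party.xsd"/>
  <xs:include schemaLocation="address.xsd"/>
</xs:schema>""",
    "party.xsd": """\
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" targetNamespace="urn:example:core">
  <xs:element name="Party" type="xs:string"/>
</xs:schema>""",
    "address.xsd": """\
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" targetNamespace="urn:example:core">
  <xs:element name="Address" type="xs:string"/>
</xs:schema>""",
}


def concatenate(name, modules, seen=None):
    """Return one xs:schema element with every xs:include replaced by the
    declarations of the included module. Each module is inlined at most once."""
    seen = set() if seen is None else seen
    seen.add(name)
    root = ET.fromstring(modules[name])
    merged = ET.Element(root.tag, root.attrib)
    for child in root:
        if child.tag == f"{{{XS}}}include":
            location = child.get("schemaLocation")
            if location not in seen:
                # Recurse so that nested includes are flattened too.
                merged.extend(list(concatenate(location, modules, seen)))
        else:
            merged.append(child)
    return merged


BUILT = concatenate("root.xsd", MODULES)
BUILT_TEXT = ET.tostring(BUILT, encoding="unicode")
```

The result is a single schema document that a receiver can load without fetching further modules at parse time. One limitation of this sketch is that ElementTree does not preserve prefix declarations used only inside attribute values, so modules whose type references use a prefix for the target namespace would need a serializer that keeps those declarations.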