I've summarized Eve's argument in outline form below to highlight which
parts I agree with and which parts I dispute. My arguments are mixed into
the outline...
1. list containers are necessary in the physical model (*DISPUTED*)
1.a. containers (general) are useful (*AGREED*)
1.b. list containers (particular) have great value (*DISPUTED*)
1.b.1. (this point works against the thesis): the equivalent of DTD nested
parenthesized groups can be adequately modeled by introducing a complex type
for the nested group. I'd say further that any argument that starts from a
DTD construct with no analogue in XSD is a non-starter since UBL has
standardized on XSD.
1.b.2. (this point works against the thesis): multiple (neighboring) series
of like items are unambiguous. There is no need to introduce extra
intermediate "container" elements. Demonstrated by the discussion and the
XSLT example.
1.b.3. (*DISPUTED*): reference to (earlier) argument about easier
customization afforded by "general container" elements. Here's the crux of
that point verbatim:
"If a trading community ever wanted to add some *un*like elements to a
collection of like elements (such as metadata that applies across a
whole line item list), the only honest way to do this in XSD is to hang
the additions off a LineItemList structure. (It would be extremely
difficult and ugly to add a list container in the customization that
wasn't present in the original.)"
It seems to me, though, that once one adds *un*like elements to a collection
of like elements, that collection ceases to be a collection of like elements
and becomes *indistinguishable* from what we would all agree is just a
"plain old" ABIE. In that case, would we propose that the new ABIE follow
the same rules? Take for example, this instance document:
...
...
...
...
...
Now let's specialize the first-class container "LineItemList" to add
"Height", "Weight" and "ShoeSize"...
...
...
...
...
...
...
...
...
But what ho! Look at those nasty (repeated) "LineItem" elements -- boldly
mixing with their (different) neighbors (Height, Weight...) -- unsegregated
by a sheltering container element. We've violated the proposed rule that
all containers of like items be explicitly represented by a (container)
element in the instance document.
To resolve this, we have to modify the specialization like this:
...
...
...
...
...
...
...
...
But wasn't the whole point of the proposal that it be "easy" to extend
containers of like items to contain unlike ones? I guess it was easy -- the
first time, but the resultant specialization (CONTAINER1-PRIME) broke the
proposed rule. It suffered the supposed ills we were trying to eliminate.
The proposed rule pushed the problem down from the ABIE1 type to the
CONTAINER1 type. In order to apply the proposed rule consistently, the
CONTAINER1 type was forced to undergo wholesale restructuring
(CONTAINER1-PRIME-2).
Did the proposed rule achieve anything new and positive with respect to XSD
derivation? For instance, it would be good if CONTAINER1-PRIME-2 were a
subtype of CONTAINER1. Unfortunately that's not possible since CONTAINER1
specified zero or more LineItem and CONTAINER1-PRIME-2 specifies essentially
"zero or zero" LineItem (LineItem is "pushed down" under LineItemList).
Also, I see no qualitative difference between CONTAINER1-PRIME or
CONTAINER1-PRIME-2 and any other ABIE. CONTAINER1-PRIME is no longer a
container of like elements -- it's now a generalized object=relation=entity
with attributes and associations. It just so happens to have, among other
attributes, an association to one or more LineItem.
Since Eve introduced this whole "intellectual honesty" idea ;-) I guess I
should allow that there is precedent in OO, ER and other data modeling
methods for the idea of "first class association" and "first class
container". Container is the degenerate case of the more general
association. A container of like items can be modeled as a 2-way
association (i.e. an association between two classes/entities) with
navigability in one direction only. An association may in general be N-way
and have navigability in many directions.
The analogy between "first class container" and the proposal at hand would
be to explicitly model each container type, and to allow subclassing of
those to add attributes. For instance List of LineItem or List<LineItem>
would appear in the logical model and it would not be unusual to see a
subclass of List of LineItem that adds attributes (and semantics) to the
base class (in our example CONTAINER1-PRIME-2). My experience is that this
is almost never the approach taken by experienced designers in
environments where containers are the lingua franca (languages like
Smalltalk, Java, C++). In those environments, container semantics are used
"as is". ** Composition with, and not specialization of, containers is the
preferred form of reuse. **
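The contrast between specializing a container and composing with one can be sketched in a few lines. This is only an illustration in Python, not code from UBL or from any of the environments named above; the class names LineItemListWithShoeSize and OrderBody and the LineItem fields are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class LineItem:
    # Hypothetical stand-in for a line item; the fields are illustrative only.
    description: str
    quantity: int


# Specialization: subclass the container itself to bolt on an attribute.
class LineItemListWithShoeSize(list):
    def __init__(self, items=(), shoe_size=None):
        super().__init__(items)
        self.shoe_size = shoe_size


# Composition: an ordinary class that *has* a plain list, used "as is".
@dataclass
class OrderBody:
    shoe_size: int
    line_items: List[LineItem] = field(default_factory=list)


items = [LineItem("widget", 2), LineItem("gadget", 1)]

specialized = LineItemListWithShoeSize(items, shoe_size=9)
composed = OrderBody(shoe_size=9, line_items=list(items))

# Slicing the subclass hands back a plain list, so the extra attribute is lost.
sliced = specialized[:1]
```

In the composed form the extra attribute lives beside the list instead of on it, so ordinary list operations cannot silently discard it.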
There is support in the literature and practice of ER and relational
modeling for first class associations. Neither the Core Components
Technical Specification, nor XML Schema have a way to represent general
(2-way or N-way) associations. Since XML Schema can't represent them, their
only possible use by UBL is in the logical model. That being said, it has
been my experience with enterprise data models (granted, not Universal ones
like UBL) that the number of associations modeled in this first-class way is
small relative to the total number of associations in a model. I daresay
that the UBL 80/20 rule would dictate here that such first-class
associations be construed as comprising the "20" case and therefore
relegated to "exception" status rather than "rule" status.
One last issue occurs to me. If it's so important to have "container"
elements for sequences of like items, then why isn't it equally important to
have container elements for associations to (single) items? What's so
special about "1..n" and "0..n"? Is it not true that a "1..n" or a "0..n"
can become a "1..1" via contextual specialization? If one thinks about this
from an XML instance document perspective one arrives at structures
reminiscent of the "old days" of global element declarations in which
"types" were validated based solely on tag names so any situation where the
global tag name didn't make sense necessitated an intervening tag. But XSD
solves that problem for us by introducing the notion of a type, the name of
which need never appear in the instance document. This frees our tag names
to be descriptive of the _role_ of each element relative to its (I'm going
to say it) CONTEXT! Thus what would in general have been a 2*N deep
structure (in the old days) becomes an N level deep structure.
Now for those of you who will pooh-pooh this as a mere optimization-ism --
remember, that ain't no constant factor there -- the factor of 2 goes in the
_exponent_ since an XML document instance is a tree.
2. list containers are necessary in the logical model (*DISPUTED*)
2.1 because list containers are necessary in the physical model they are
necessary in the logical model (*DISPUTED*)
The foregoing discussion was meant to take the legs out from under point
2.1. I believe that Eve and I have shown that list containers don't really
solve any problems in the physical model, and that therefore any arguments
built on that premise must be supported by other means.
Bill Burcham
Sr. Software Architect, Standards and Applied Technology
Sterling Commerce, Inc.
469.524.2164
bill_burcham@stercomm.com
-----Original Message-----
From: Eve L. Maler [mailto:eve.maler@sun.com]
Sent: Friday, September 06, 2002 2:16 PM
To: 'ubl-ndrsc@lists.oasis-open.org'; 'ubl-lcsc@lists.oasis-open.org'
Cc: Norman Walsh
Subject: [ubl-lcsc] Eve's writeup on list containers
In Wednesday's meeting, I agreed to provide a writeup on the issues with
list containers. Here it is. I'm hoping that some of this might be
useful in the eventual creation/modification of a position paper on the
wider topic of containers in general.
*List Containers in UBL
*TOC
Executive summary
Introduction and definitions
Motivating the presence of containers in the physical model
The particular value of list containers in the physical model
The value of list-container ABIEs in the logical model
Conclusion
*Executive summary
This writeup considers whether list containers should appear in our
physical model and, separately, in our logical model. I conclude by
recommending that they should appear in both, but I take far too many
words and steps to get there. :-)
*Introduction and definitions
When we say "container" in this discussion, we mean an XML element,
plain and simple. XML is a hierarchical technology, leading to the
possibility -- indeed the likelihood -- of significantly nested element
structures in nearly all XML instance documents.
A BIE is a model for a piece of business information to which has been
applied a semantically unique and useful definition (and of course also
an identified business context, because it's a BIE and not a CC, but
that's not important right now). The containership discussion revolves
pretty much entirely around ABIEs, which are collections of other BIEs
and thus have a kind of hierarchy themselves.
Note that our process of turning our logical model (spreadsheet) into a
physical model (XSD) takes ABIEs and turns them into complex datatypes,
which each govern one or more XML elements. So there is more than just
a vague similarity here -- ABIE hierarchy pretty much turns into XML
hierarchy, to a first approximation.
One question we've been considering is whether to allow for any kinds of
containers that are *not* properly ABIEs. Do we ever need XML elements
in UBL that are not connected to a semantically unique and useful
definition but are rather some kind of artificial construct? Is it
useful to have a construct that adds nothing of value to the logical model,
but does add value to the physical model? We'll consider the logical
and the physical value propositions separately for list containers.
This writeup focuses on a specific kind of container, a "list
container". This is a container whose contents consist solely of a
series of like elements. An example is the series of line items that
appears in an order. Currently all multiple-cardinality constructs in
our logical model are of the form 0..n or 1..n; we don't (yet) have any
cardinalities like 3..35. But this discussion applies to them as well.
*Motivating the presence of containers in the physical model
It's pretty easy to see the value added when containers are present in
(appropriate) abundance in the physical model. All XML tools adapt to
XML's hierarchical nature and tend to hook up nearly all processing to
the presence of elements (or element boundaries, for streaming processors).
Thus, containers allow processing to "factor out" into the right place.
In OO-speak, if you turn containers into objects to which you attach
methods, it's desirable to hide certain data "lower down" from methods
operating on other data "higher up". (There's no doubt an analogy here
to relational processing of functional dependencies as well, but as my
grip on OO technology is somewhat tenuous and my grip on relational
technology is more so, I'll stop while I'm behind.) You wouldn't want
your Address information associated directly with a Party; it's useful
to have the intervening Address ABIE/container because you do things
with Party data that you don't do with Address data, and vice versa.
The benefit of factoring-out is true for XSD customization as well. If
the data is appropriately grouped, then each trading community needing
to add more data will be able to find the conceptually right place to
add those pieces. It would be weird to have to add new address-related
fields to PartyType, particularly since XSD would require that they get
stuck at the end of the content model, possibly far away from the other
address stuff.
As an aside, note that it's typically easier to flatten a deeply nested
XML document (removing information) than to "tree-ify" a relatively flat
XML document (adding new information, which may require human invention).
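As a rough illustration of why flattening is the mechanical direction, here is a minimal Python sketch using the standard library's xml.etree.ElementTree. The helper name unwrap is hypothetical, and the empty elements are placeholders rather than real UBL content.

```python
import xml.etree.ElementTree as ET


def unwrap(parent, container_tag):
    """Replace each child named container_tag with that child's own children."""
    for child in list(parent):
        if child.tag == container_tag:
            position = list(parent).index(child)
            parent.remove(child)
            # Splice the grandchildren in where the container used to be.
            for offset, grandchild in enumerate(list(child)):
                parent.insert(position + offset, grandchild)


nested = ET.fromstring(
    "<Order><Header/>"
    "<LineItemList><LineItem/><LineItem/></LineItemList>"
    "<Summary/></Order>"
)
unwrap(nested, "LineItemList")
flat_tags = [child.tag for child in nested]
```

Going the other way takes more than a loop: someone has to decide which runs of elements deserve a wrapper and what the wrapper should be called.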
*The particular value of list containers in the physical model
I have been arguing that XML processing benefits from list containers as
much as from other kinds of containers. However, intellectual honesty
(darn it :-) compels me to report some new findings that make this
position slightly harder to hold.
Our current modeling formalism allows for only relatively simple content
models. For example, the content model of an order is (in DTDspeak):
(Header, LineItem+, Summary?)
I don't believe it's possible to encode something more like the
following in our spreadsheet without inventing another layer of ABIE;
the nested parenthesized group can't be done and would probably
*deserve* its own ABIE and definition if we really wanted this structure:
(Header, (LineItem+, Summary?)+)
We do have some cases of multiple series of like elements within a
single parent element, but each set has a different element name. For
example, an order header looks like this:
(IssueDateTime, Identifier*, BuyerIdentifier?, SellerIdentifier?,
BuyerAccountId?, Quote*, ...)
Here, both Identifier and Quote are series of like elements, but they
have different names so the series are easily distinguishable in processing.
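A small Python sketch of that point, assuming a parent element named OrderHeader (a hypothetical name) whose children follow the content model above. Grouping consecutive children by tag name is enough to recover each series without any container element.

```python
import xml.etree.ElementTree as ET
from itertools import groupby

# Placeholder instance: empty elements, with names taken from the content model.
header = ET.fromstring(
    "<OrderHeader>"
    "<IssueDateTime/>"
    "<Identifier/><Identifier/>"
    "<BuyerIdentifier/>"
    "<Quote/><Quote/><Quote/>"
    "</OrderHeader>"
)

# Each run of same-named neighbors is one series of like elements.
series = [
    (tag, len(list(run)))
    for tag, run in groupby(header, key=lambda element: element.tag)
]
```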
Because of this situation, it turns out that XSLT processing (for
example) is not all that hard when there is no list container present.
All processing that needs to be done on the list as a whole could simply
be done in the XSLT template for the parent element. For example, a
skeleton template for handling the Order element could look like this:
...order setup stuff goes here...
(...shunt off to Header template...)
...line item setup stuff goes here...
(...shunt off to one LineItem template that works for all
of them equally...)
...line item wrapup stuff goes here...
(...shunt off to Summary template...)
...order wrapup stuff goes here...
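The same shape can be sketched in procedural code. This is a hypothetical Python analogue of the skeleton, not the stylesheet itself; the function name process_order and the logged step names are illustrative only.

```python
import xml.etree.ElementTree as ET


def process_order(order):
    """Walk an Order element, doing the list-level work in the parent handler."""
    log = ["order setup"]
    log.append("header")  # stands in for the Header handler
    line_items = order.findall("LineItem")
    log.append("line item setup")
    for index, _item in enumerate(line_items, start=1):
        log.append(f"line item {index}")  # one handler serves every LineItem
    log.append("line item wrapup")
    if order.find("Summary") is not None:
        log.append("summary")  # Summary is optional in the content model
    log.append("order wrapup")
    return log


order = ET.fromstring("<Order><Header/><LineItem/><LineItem/><Summary/></Order>")
steps = process_order(order)
```

Here the line item setup and wrapup sit inside the order handler, which is exactly the placement questioned in the next paragraph.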
I have been arguing that it would be more appropriate to put the line
item setup and wrapup processing in its own dedicated LineItemList
template, because these things are specific to the line items, not the
order. I asked Norm Walsh about this (he supplied an XSLT stylesheet
from which I derived the above example); while he agreed that my
position is conceptually sound, he felt that it may not be all that
compelling in practical terms for XSLT designers. Those of you who are
planning to compile the UBL schemas into an object representation, or
use procedural code to operate on UBL, might want to weigh in on this as
well.
Of course, the argument made above for easier customization still holds
true for list containers just as much as for other containers. If a
trading community ever wanted to add some *un*like elements to a
collection of like elements (such as metadata that applies across a
whole line item list), the only honest way to do this in XSD is to hang
the additions off a LineItemList structure. (It would be extremely
difficult and ugly to add a list container in the customization that
wasn't present in the original.)
The only reason I can think of *not* to have list container elements is
that it creates a bunch of new elements. The question then becomes one
of whether it's a true economy or a false economy not to have them, at
which point the above arguments need to be referred to.
On balance, I maintain that the presence of list containers in the
physical model is a good thing. My general policy is to be generous
with containers because they are the common currency of XML; I hope I've
shown that the presence of list containers makes processing more natural
and object-oriented, and allows for customization where the lack of a
container would effectively disallow it.
*The value of list-container ABIEs in the logical model
If you accept my reasoning on why list containers are good in the
physical model, it's time to consider whether we also need them in the
logical model.
List containers have an interesting characteristic not shared by other
kinds of containers: You can inspect the logical model and know exactly
where all the list containers would go -- and therefore it would be easy
to have our perl script generate them in the physical model while having
no annoying traces of them appear in the logical model. They would
never have to be ABIEs, and we would never incur the expense of having to
write and maintain spreadsheet rows for them. So why might it be
desirable for a list container to be a true first-class ABIE?
One reason is that there is overhead in having to manage an obvious
mismatch like this between the logical and physical models. Right now,
every XSD complex type has a connection to a BIE. If we generate
complex types with no connection back to a BIE, we have to explain a new
kind of "thing" in the UBL universe. (However, note that the advance
word on CCTS V1.85 is that they are inventing a non-BIE structure for
cases like this, so perhaps the mismatch would go away. Then again,
that means that a non-BIE list container would *still* appear in the
logical model, just as a new kind of second-class thing.)
Another reason is that even if the list container doesn't start out as
an ABIE logically, the moment that a trading community wants to use the
power of XSD customization to contextualize a UBL list, they'll need to
turn it into an ABIE to do so. If it can be an ABIE as a result of that
process, why not in its original state? It's not so hard to define a
list ABIE initially as "A collection of information about ", and I don't really see any impurity in it
when the first little contextualization could also use the same
definition for a real BIE. Also, I have no idea how a customizer would
go about turning a non-BIE structure into a true BIE, particularly if
the non-BIE really didn't appear anywhere in the logical model.
In fact, what this boils down to is that there's not so much difference,
after all, between an ABIE and a complex datatype in XSD. The reason
that complex datatypes exist in XSD is *not* to manage constraints on
the physical expression of the XML (DTDs did that just fine without the
notion of a type hierarchy); it's to encapsulate a particular chunk of
data that is likely to be associated with its own special functionality.
In this sense it's truly object-oriented, exactly as the BIE system is.
So if we agree that list containers are desirable in the physical model,
it seems like more trouble than it's worth to *avoid* them in the
logical model.
*Conclusion
List containers good as logical things!
List containers good as physical things!
:-)
--
Eve Maler +1 781 442 3190
Sun Microsystems cell +1 781 883 5917
XML Web Services / Industry Initiatives eve.maler @ sun.com
----------------------------------------------------------------
To subscribe or unsubscribe from this elist use the subscription
manager: