Data Types - Abstract Specification

This document specifies the HL7 Version 3 Data Types on an abstract layer, independent of representation. By "independent of representation" we mean independent of both abstract syntax and implementation in any particular implementation technology.

This document is accompanied by Implementation Technology Specifications (ITS). The ITS documents can serve as a quick compendium to the data types that is more practically oriented toward the representation in that particular implementation technology.

Vocabulary tables within this specification list the current contents of vocabulary domains for ease of reference by the reader. However, at any given time the normative source for these domains is the vocabulary tables in the RIM database. For some large domains, only a sample of possible values is shown. The complete domains can be referenced in the vocabulary tables by looking up the domain name associated with the table in the RIM vocabulary tables.

In the previous ballots of the v3 data types documents there were two ITS-independent specifications, called "Part-1" and "Part-2". Since a completely representation-independent data type specification is abstract, Part-1 was supposed to provide an easier read. However, it was also shallower and at times incorrect. Part-1 gave the wrong impression that HL7 version 3 data types were defined as an abstract syntax, which is an incorrect assumption. For that reason the specification has again been restructured.

The ITS documents now assume the function of a "practical" exposition of this material that is fairly concise and easy to read for those readers who know the respective implementation technology. The ITS documents quote the abstract specification on its concise definitions and possibly on tutorial material (that has been merged back from Part-1 into this abstract specification.)

During the last two ballots, the editorial process of these documents has been largely automated to minimize duplication of text and formalize the specification such that technical changes to the material can be implemented much more quickly and consistently. The restructuring work, both in document production (i.e., moving from word processor documents to XML, which was initiated by the HL7 publication committee several ballot cycles ago) and in the various attempts at restructuring this material, imposed a heavy editorial burden on the editors. We believe that the result to date is a great improvement, as future technical ballot comments can be accommodated more safely, consistently and easily.

On the other hand, this editorial work dominated much of the content work for the last two ballot preparations. The reader may remember that during the previous ballot the abstract data type specification was added without any changes (neither editorial nor content changes) because all the available resources had been spent on the ITS for XML. During this ballot preparation the abstract specification has been updated regarding the editorial process improvements, but only very few essential content modifications have been applied.

The editors therefore want to apologize to all those readers who had previously submitted technical ballot comments that had been agreed to during the reconciliation, as many of those suggestions and agreements are still not reflected in this ballot draft. We would ask the readers to please resubmit their understanding of the prior resolutions to their comments -- however informal this may be -- into their ballot. We understand that a formal resubmission of comments places an extra burden, which is why we will greatly appreciate notes about previous reconciliation agreements submitted as informal material (this would even include handwritten notes that can be submitted by FAX to +1-317-630-6962 attn: Gunther Schadow/V3DT.)

This standard is the result of several years of intense work through e-mail, telephone conferences and meeting discussions. Gunther Schadow (Regenstrief Institute for Health Care) chaired this task force, and is the main author of this document. Major contributions are from Mark Tucker (Regenstrief Institute), Paul V. Biron (Kaiser Permanente), Lloyd McKenzie (IBM), George Beeler, and Stan Huff (Intermountain Health Care), as well as Mike Henderson (Kaiser Permanente), Anthony Julian (Mayo), Joann Larson (Kaiser Permanente), Mark Shafarman (Oacis Healthcare Systems), Wes Rishel (Gartner Group), and Robin Zimmerman (Kaiser Permanente). Acknowledgements for their critical review and infusion of ideas go to Bob Dolin (Kaiser Permanente), Clem McDonald (Regenstrief Institute), Kai Heitmann (HL7 Germany), Rob Seliger (Sentillion), and Harold Solbrig (Mayo Clinic). Vital support came from the members of the task force, Laticia Fitzpatrick (Kaiser Permanente), Matt Huges, Randy Marbach (Kaiser Permanente), Larry Reis (Wizdom Systems), Carlos Sanroman (Kaiser Permanente), and Greg Thomas (Kaiser Permanente). Thanks to James Case (University of California, Davis), Norman Daoust (Partners HealthCare Systems), Irma Jongeneel (HL7 The Netherlands), Michio Kimura (HL7 Japan), John Molina (SMS), Richard Ohlmann (McKessonHBOC), David Rowed (HL7 Australia), and Klaus Veil (Macquarie Health Corp., HL7 Australia), for sharing their expertise in critical questions. This work was made possible by the Regenstrief Institute for Health Care.


Every data element has a data type. Data types define the meaning (semantics) of data values that can be assigned to a data element. Meaningful exchange of data requires that we know the definition of values so exchanged. This is true for complex "values" such as business messages as well as for simpler values such as character strings or integer numbers.

According to ISO 11404, a data type is "a set of distinct values, characterized by properties of those values and by operations on those values." A data type has intension and extension. Intensionally, the data type defines the properties exposed by every data value of that type. Extensionally, data types have a set of data values that are of that type (the type's "value set").

Semantic properties of data types are what ISO 11404 calls "properties of those values and [...] operations on those values." A semantic property of a data type is referred to by a name and has a value for each data value. The value of a data value's property must itself be a value defined by a data type - no data value exists that would not be defined by a data type.

Data types are thus the basic building blocks used to construct any higher order meaning: messages, computerized patient record documents, or business objects and their transactions. What, then, is the difference between a data type and a message, document, or business object? Data type values stand for themselves: the value is all that counts, and neither identity nor state nor change of state is defined for a data value. Conversely, in business objects we track state and identity; the properties of an identical object might change between now and later. Not so with data values: a data value and its properties are constant. For example, number 5 is always number 5; there is no difference between this number 5 and that number 5 (no identity distinguished from value), and number 5 never changes to number 6 (no change of state). One can think of data values as immutable objects where identity does not matter (identity and equality are the same.)¹
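The idea of a data value as an immutable object whose identity coincides with equality can be illustrated with a minimal Python sketch. The class name IntegerValue and its field magnitude are hypothetical and are not defined by this specification; the sketch only shows one way an implementation technology might honor these semantics.

```python
from dataclasses import dataclass, FrozenInstanceError


@dataclass(frozen=True)
class IntegerValue:
    """Hypothetical data value: compared by value, never mutated."""
    magnitude: int


this_five = IntegerValue(5)
that_five = IntegerValue(5)

# Equality is decided by value alone; which object holds the 5 is irrelevant.
same_value = this_five == that_five

# A data value never changes state: an attempt to mutate it is rejected.
try:
    this_five.magnitude = 6
    mutated = True
except FrozenInstanceError:
    mutated = False
```

By contrast, a business object would typically carry an identifier and allow its attributes to change over time while remaining "the same" object.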

Data values can be represented through various symbols but the data value's meaning is not bound to any particular representation.

For example, cardinal numbers (non-negative integers) are defined - intensionally - as a data type where each value has a successor value, and where zero is the successor of no other cardinal value. Based on this definition we can define addition, multiplication, and other mathematical operations. Whatever representation reflects the rules we stated in the intensional definition of the cardinal data type is a valid representation of cardinal numbers. Examples of valid cardinal number representations are decimal digit strings, bags of glass marbles, or scratches on a wall. The number five is represented by the word "five", by the Arabic numeral "5", or by the Roman numeral "V". The representation does not matter as long as it conforms to the semantic definition of the data type.
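As an informal illustration, the successor-based definition of cardinal numbers can be sketched in Python. The class Cardinal and the helper functions from_decimal and from_scratches are hypothetical names chosen for this example. Addition is derived from the successor rule alone, and two different representations yield the same value.

```python
class Cardinal:
    """Hypothetical cardinal number built only from zero and successor."""

    def __init__(self, predecessor=None):
        self.predecessor = predecessor  # None marks zero

    def successor(self):
        return Cardinal(self)

    def is_zero(self):
        return self.predecessor is None

    def plus(self, other):
        # x + 0 = x;  x + succ(y) = succ(x + y), unrolled as a loop
        result = self
        while not other.is_zero():
            result = result.successor()
            other = other.predecessor
        return result

    def equals(self, other):
        # Strip one successor from each side until one side reaches zero.
        a, b = self, other
        while not a.is_zero() and not b.is_zero():
            a, b = a.predecessor, b.predecessor
        return a.is_zero() and b.is_zero()


ZERO = Cardinal()


def from_decimal(text):
    """Interpret a decimal digit string as a cardinal."""
    value = ZERO
    for _ in range(int(text)):
        value = value.successor()
    return value


def from_scratches(marks):
    """Interpret scratches on a wall, one mark per unit, as a cardinal."""
    value = ZERO
    for _ in marks:
        value = value.successor()
    return value


five_from_digits = from_decimal("5")
five_from_scratches = from_scratches("|||||")
two = from_decimal("2")
three = from_scratches("|||")
```

Here five_from_digits.equals(five_from_scratches) holds, and so does two.plus(three).equals(five_from_digits), regardless of which representation produced the operands.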

As another example, the Boolean data type is defined by its extension, the two distinct values true and false, and by the rules of negation and of combining these values in conjunction and disjunction. The representation of Boolean values can be the words "true" and "false," "yes" and "no," the numbers 0 and 1, or any two signs that are distinct from each other. The representation of data types does not matter as long as it conforms to the semantic definition of the data type.

This specification defines the semantics, the meaning of the HL7 data types. This specification is about semantics only, independent from representational and operational concerns or specific implementation technologies. Additional standards for representing the data values defined here are being defined for various technological approaches. These standards are called "Implementable Technology Specifications" (ITS.) Those ITS define how values are represented so that they conform to the semantic definitions of this specification; this may include syntaxes for character or binary representations, and computer procedures to act on the representation of data values. The meaning of these ITS representations, as communicated, generated, and processed in computer programs, is defined based on this standard, the semantic data type specification.

Data values have properties defined by their data type. The "fields" of "composite data types" are the most common example of such properties. However, more generally one should think of a data value's properties as logical predicates or as mathematical functions; in simpler but still correct terms, properties are questions one can ask about a data value to receive another data value as an answer.

A property is referred to by its name. For example, the data type integer may have a property named "sign." A property has a domain, which is the set of possible "answer" values. The set of possible "answer" values is defined by the property's data type, but the domain of a property may be a subset of the data type's value set.

A property may also have arguments, additional information one must supply with a question to get an answer. For example, an important property of an integer number is that one integer plus another integer results in another integer, so the plus property of one integer needs an argument: the other integer.

Whether semantic properties have arguments is not a fundamentally relevant distinction. A data type's semantic property without arguments is not necessarily a "field" of a "composite" data type. For example, for integer values, we can define the property is-zero that has the Boolean value true when the number is zero and false when the number is not zero. This does not mean that is-zero must be an explicit component of any integer representation.

A data type's semantic property with arguments has no specific operational notions such as "procedure call," "passing arguments," "return values," "throwing exceptions," etc. These are all concepts of computer systems implementation of data types - but these operational notions are irrelevant for the semantics of data types.
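The notions of property name, domain, and argument can be illustrated with a small Python sketch. The class IntegerValue and its member names are hypothetical and not part of this specification. Method calls are merely one way an implementation technology might answer the "questions" that properties stand for.

```python
class IntegerValue:
    """Hypothetical integer wrapper; only `_n` is actually stored."""

    def __init__(self, n):
        self._n = n

    @property
    def sign(self):
        # The property's data type is integer, but its domain is only
        # the subset {-1, 0, 1} of the integer value set.
        return IntegerValue((self._n > 0) - (self._n < 0))

    @property
    def is_zero(self):
        # A property without arguments that is nevertheless not a field:
        # it is computed from the value rather than stored.
        return self._n == 0

    def plus(self, other):
        # A property with one argument: the other integer.
        return IntegerValue(self._n + other._n)

    def equals(self, other):
        return self._n == other._n


minus_seven = IntegerValue(-7)
sum_is_five = IntegerValue(2).plus(IntegerValue(3)).equals(IntegerValue(5))
stored_fields = sorted(vars(IntegerValue(0)))   # only '_n' is stored
```

The property is_zero can be asked of every integer value, yet it never appears among the stored components of the representation.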

This specification is about semantics of data types only. Neither is it about value representation syntax (not even an abstract syntax), nor is it about an operational interface to the data values.

Why does this specification make such a big issue about its being abstract from representation syntax as well as operational implementation?

HL7 needs this kind of abstract semantic data type specification for a very practical purpose. One important design feature of HL7 version 3 is its openness towards representation and implementation technologies. All HL7 version 3 specifications are supposed to be done in a form independent from specific representation and implementation technologies. HL7 acknowledges that, while at times some representation and implementation technologies may be more popular than others, technology is going to change - and with changing technology, representations of data values will change. HL7 standards are primarily targeted to healthcare domain information, independent from the technology supporting this information. HL7 expects that specifications defined independent from today's technology will continue to be useful, even after the next technological "paradigm shift".

The issue of data types is closer to implementation technology than most other HL7 information standards - and therein lies a certain danger that we define data types too dependent on current implementation technologies.

The majority of HL7 standards are about complex business objects. Complex business objects with many informational attributes can be specified as abstract syntax, where components are eventually defined in terms of data types. Conversely, defining data types in terms of abstract syntax is of little use because the components of such abstract syntax constructs would still have to have data types.²

Why doesn't this specification define a set of primitive data types based on which composite data types could be defined simply as abstract syntax?

Any concrete implementation of the HL7 standards must ultimately use the built-in data types of its implementation technology. Therefore, we need a very flexible mapping between HL7 abstract data types and those data types built into any specific implementation technology. With a semantic specification, an Implementable Technology Specification (ITS) can conform simply by stating a mapping between the constructs of its technology and the HL7 version 3 data type semantics. Whether a data type is primitive or composite is irrelevant from a semantic perspective, and the answer may be different for different implementation technologies.

For example, this standard specifies a character string as a data type with many properties (e.g., charset, language, etc.) However, in many implementation technologies, character strings are primitive first-class data types. We encourage that these native data types be used rather than a structure that slavishly represents all the semantic properties as "components." This specification only requires that the properties defined for data values can somehow be inferred from whatever representation is chosen; it does not matter how these values are represented. Whether "primitive" or "composite", with few or many "components", as "fields" or "methods" - this is all irrelevant.

For another example, a decimal representation, a floating-point register and a scaled integer are all possible native representations of real numbers for different implementation technologies. Some of these representations have properties that others do not have. Scaled integers, for instance, have a fixed precision and a relatively small range. Floating-point values have variable precision and a large range, but floating-point values lose any information about precision. Decimal representations are of variable precision and maintain the precision information (yet are slow to process.) The data type semantics must be independent from all these accidental properties of the various representations, and must define the essential properties that any technology should be able to represent.
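The differences among these representations can be observed with Python's standard library, shown here only as one example of an implementation technology. The measured value "1.20" and the scale of 1/100 for the scaled integer are assumptions made for the illustration.

```python
from decimal import Decimal

as_decimal = Decimal("1.20")   # decimal representation keeps the trailing zero
as_float = float("1.20")       # binary floating-point representation
as_scaled = 120                # scaled integer with an assumed scale of 1/100

# The decimal representation still knows it was given three significant digits.
decimal_digits = len(as_decimal.as_tuple().digits)

# The floating-point value no longer records how many digits were significant.
float_text = repr(as_float)

# The scaled integer recovers the same value, but its precision is fixed
# by the scale rather than carried by each value.
scaled_value = Decimal(as_scaled).scaleb(-2)
```

In this sketch decimal_digits is 3 and float_text is '1.2': the precision information survives in one representation and is lost in the other, which is why the semantics of real numbers cannot be tied to either.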

Why does HL7 need its own data type standard? Why can't HL7 simply adopt a standard defined by some other body?

As noted in the previous section, all HL7 implementation technologies have some data type system, but there are differences among the data type systems between implementation technologies. In addition, many implementation technologies' data type systems are not powerful enough to express the concepts that matter for the HL7 application layer.

For example, few implementation technologies provide the concepts of physical quantities, precision, ranges, missing information, and uncertainty that are so relevant in scientific and health care computing.

On the other hand, implementation technologies do make distinctions that are not relevant from the abstract semantics viewpoint, e.g., fixed point vs. floating-point real numbers; 8, 16, 32, or 64-bit integers; date vs. timestamp.

A number of data type systems have been used as input to this specification. These include the type systems of many major programming languages, including BASIC, Pascal, MODULA-2, C, C++, JAVA, ADA, LISP and SCHEME. This also includes type systems of language-independent implementation technologies, such as Abstract Syntax Notation One (ASN.1), Object Management Group's (OMG) Interface Definition Language (IDL) and Object Constraint Language (OCL), SQL 92 and SQL 99, the ISO 11404 language independent data types, and XML Schema Part 2 data types. Health care standards related data types have been considered as well, among these HL7 version 2.x, types used by CEN TC 251 messages and Electronic Health Record Architecture (EHCRA) and DICOM.

This specification defines data types in several forms, using textual description, UML diagrams, tables, and a formal definition.

A formal definition of data types is used in order to clarify the semantics of the proposed types as unambiguously as possible. This data type definition language is described in detail in Section 1.3. Formal languages make crisp essential statements and are therefore accessible to some formal argument of proof or rebuttal. However, the terseness of such formal statements may also make them difficult for humans to understand. Therefore, all the important inferences from the formal statements are also included as plain English statements.

For a quick overview, at the beginning of many data types this specification contains tables listing what are called "primary" properties. "Primary" properties are a somewhat fuzzy notion of those properties that are more likely to be thought of as "fields" if the data type were implemented as a record ("composite data type"). These tables only exist to facilitate an overview of the content and purpose of data types. While their content is part of the normative specification, the fact that a property is or is not listed in these tables has no significance. There is no requirement that the properties listed in these tables be represented as fields, and these tables are not abstract syntax definitions.

Property tables are not shown for all data types. Again, this does not mean that those data types have no properties. It also does not mean that those data types are "primitive" data types as per this specification. The property tables are used as a helpful summary only, and are not used when they would confuse more than they would help.

Each row of the property tables describes one property with the following columns:

The Unified Modeling Language (UML) is used for a graphical presentation of how data types relate. Data types are shown as UML classes. The name compartment contains the long name of the data type followed by a colon and the standard abbreviation. Properties of types are all shown in the UML operations compartment. No instance attributes are shown, in accordance with the fact that this abstract specification is not about implementation or concrete representation. Generalization links indicate extension and restriction relationships. Aggregations are an additional representation of properties, used when the relation between data types through that property is important. Generic types are shown as UML parameterized classes, with UML realization links relating their instantiations.

Definition of a data type occurs in two steps. First, the data type is declared. The declaration claims a name for a new data type with a list of names, types, and signatures of the new type's semantic properties. This declares the type but does not define it. Second, the type is defined through logic statements about what is always true of this type's values and their properties (invariant statements.)

Table 1: Overview of HL7 version 3 data types
Name	Symbol	Description
DataValue	ANY	Defines the basic properties of every data value. This is an abstract type, meaning that no value can be just a data value without belonging to any concrete type. Every concrete type is a specialization of this general abstract DataValue type.
Boolean	BL	The Boolean type stands for the values of two-valued logic. A Boolean value can be either true or false, or, as any other value, may be NULL.
Encapsulated Data	ED	Data that is primarily intended for human interpretation or for further machine processing outside the scope of HL7. This includes unformatted or formatted written language, multimedia data, or structured information as defined by a different standard (e.g., XML-signatures.) Instead of the data itself, an ED may contain only a reference. Note that the ST data type is a specialization of the ED data type when the media type is text/plain.
Character String	ST	The character string data type stands for text data, primarily intended for machine processing (e.g., sorting, querying, indexing, etc.) Used for names, symbols, and formal expressions.
Concept Descriptor	CD	A concept descriptor represents any kind of concept usually by giving a code defined in a code system. A concept descriptor can contain the original text or phrase that served as the basis of the coding and one or more translations into different coding systems. A concept descriptor can also contain qualifiers to describe, e.g., the concept of a "left foot" as a postcoordinated term built from the primary code "FOOT" and the qualifier "LEFT". In exceptional cases, the concept descriptor need not contain a code but only the original text describing that concept.
Coded Simple Value	CS	Coded data in its simplest form, where only the code and display name are not predetermined. The code system and code system version are fixed by the context in which the CS value occurs. CS is used for coded attributes that have a single HL7-defined value set.
Coded With Equivalents	CE	Coded data that consists of a coded value (CV) and, optionally, coded value(s) from other coding systems that identify the same concept. Used when alternative codes may exist.
Instance Identifier	II	An identifier that uniquely identifies a thing or object. Examples are object identifier for HL7 RIM objects, medical record number, order id, service catalog item id, Vehicle Identification Number (VIN), etc. Instance identifiers are defined based on ISO object identifiers.
Telecommunication Address	TEL	A telephone number (voice or fax), e-mail address, or other locator for a resource mediated by telecommunication equipment. The address is specified as a Uniform Resource Locator (URL) qualified by time specification and use codes that help decide which address to use for a given time and purpose.
Postal Address	AD	Mailing and home or office addresses. A sequence of address parts, such as street or post office Box, city, postal code, country, etc.
Entity Name	EN	A name for a person, organization, place or thing. A sequence of name parts, such as first name or family name, prefix, suffix, etc. Examples for entity name values are "Jim Bob Walton, Jr.", "Health Level Seven, Inc.", "Lake Tahoe", etc. An entity name may be as simple as a character string or may consist of several entity name parts, such as, "Jim", "Bob", "Walton", and "Jr.", "Health Level Seven" and "Inc.", "Lake" and "Tahoe".
Trivial Name	TN	A restriction of entity name that is effectively a simple string used for a simple name for things and places.
Person Name	PN	A name for a person. A sequence of name parts, such as first name or family name, prefix, suffix, etc.
Organization Name	ON	A name for an organization. A sequence of name parts.
Integer Number	INT	Integer numbers (-1,0,1,2, 100, 3398129, etc.) are precise numbers that are results of counting and enumerating. Integer numbers are discrete, the set of integers is infinite but countable. No arbitrary limit is imposed on the range of integer numbers. Two NULL flavors are defined for the positive and negative infinity.
Real Number	REAL	Fractional numbers. Typically used whenever quantities are measured, estimated, or computed from other real numbers. The typical representation is decimal, where the number of significant decimal digits is known as the precision.
Ratio	RTO	A quantity constructed as the quotient of a numerator quantity divided by a denominator quantity. Common factors in the numerator and denominator are not automatically cancelled out. The data type supports titers (e.g., "1:128") and other quantities produced by laboratories that truly represent ratios. Ratios are not simply "structured numerics", particularly blood pressure measurements (e.g. "120/60") are not ratios. In many cases the should be used instead of the .
Physical Quantity	PQ	A dimensioned quantity expressing the result of measuring.
Monetary Amount	MO	A monetary amount is a quantity expressing the amount of money in some currency. Currencies are the units in which monetary amounts are denominated in different economic regions. While the monetary amount is a single kind of quantity (money), the exchange rates between the different units are variable. This is the principal difference between physical quantities and monetary amounts, and the reason why currency units are not physical units.
Point in Time	TS	A quantity specifying a point on the axis of natural time. A point in time is most often represented as a calendar expression.
Set	SET	A value that contains other distinct values in no particular order.
Sequence	LIST	A value that contains other discrete values in a defined sequence.
Bag	BAG	An unordered collection of values, where each value can be contained more than once in the bag.
Interval	IVL	A set of consecutive values of an ordered base data type.
History	HIST	A set of data values that conform to the history item (HXIT) type, (i.e., that have a valid-time property). The history information is not limited to the past; expected future values can also appear.
Uncertain Value - Probabilistic	UVP	A generic data type extension used to specify a probability expressing the information producer's belief that the given value holds.
Parametric Probability Distribution	PPD	A generic data type extension specifying uncertainty of quantitative data using a distribution function and its parameters. Aside from the specific parameters of the distribution, a mean (expected value) and standard deviation are always given to help maintain a minimum layer of interoperability if receiving applications cannot deal with a certain probability distribution.
General Timing Specification	GTS	A set of points in time, specifying the timing of events and actions and the cyclical validity-patterns that may exist for certain kinds of information, such as phone numbers (evening, daytime), addresses (so called "snowbirds," residing in the south during winter and north during summer) and office hours.

Every data type is declared in a form that begins with the keyword type. For example, the following is the header of a declaration for the data type Boolean that has the short name alias BL and extends (specializes) the data type ANY.⁴

Definition 1:
type Boolean alias BL extends ANY
      values(true, false) {
   BL  not;
   BL  and(BL x);
};

The Boolean data type declaration also contains a values-clause that declares the Boolean's complete set of values (its extension) as named entities. These named values are also valid character string literals. None of the other data types defined in this specification has a finite value set, which is why the values-clause is unique to the Boolean. In the marked-up formal language, value names are set in italic font.

The block in curly braces following the header contains declarations of the semantic properties that hold for every value of the data type. A semicolon terminates each property declaration; and another semicolon after the closing curly brace terminates the data type declaration.

A property declaration mentions from left to right: (1) the data type of the property's value domain, (2) the property name, and (3) an optional argument list. The argument list of a property is enclosed in parentheses containing a sequence of argument declarations. Each argument is declared by the data type name and argument name. Semantic properties without arguments do not use an empty argument list.⁵

The extends-clause has the usual meaning of a specialization relationship known from the object-oriented method.⁶ Specialization means (a) inheritance of properties from the genus to the species, and (b) substitutability of values of the species type for variables of the genus type. In addition, however, this data type definition language specifies two variants of specialization: extension (extends) and restriction (restricts). Extension indicates that additional properties are being defined for the specialized type. Restriction indicates that the inherited properties are being constrained.

An example for inheritance is: when ANY has the property isNull and BL extends ANY then BL also has this property isNull even though isNull is not listed explicitly in the property declaration of BL. An example for substitutability is: when a property is declared as of a data type ANY and BL extends ANY then a value of such property may be of type BL. In other words, substitutability is the same as subsumption of all values of type BL being also values of type ANY.⁷
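Inheritance and substitutability as described above can be sketched in Python, whose class inheritance is used here as a stand-in for the extends-clause. The class bodies and the function describe are hypothetical. Only the names ANY, BL, and isNull come from this specification.

```python
class ANY:
    """Sketch of the genus type: every data value can be asked isNull."""

    def __init__(self, null=False):
        self._null = null

    @property
    def isNull(self):
        return self._null


class BL(ANY):
    """Sketch of the species type; isNull is not declared again here."""

    def __init__(self, value=None):
        # Assumption for this sketch: a missing value means NULL.
        super().__init__(null=value is None)
        self._value = value


def describe(value: ANY) -> str:
    # Declared for ANY, so any specialization may be substituted.
    return "NULL" if value.isNull else "non-NULL"


inherited = describe(BL(True))      # BL answers isNull through inheritance
substituted = describe(BL())        # a BL value used where ANY is expected
declared_in_bl = "isNull" in vars(BL)
```

In this sketch declared_in_bl is False: BL exposes isNull only because it extends ANY, and every BL value is also an ANY value.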

The type-declaration may be qualified by the keywords abstract or protected. An abstract type is a type where no value can be just of this type without belonging to a concrete specialization of the abstract type. A protected type is a type that is used inside this specification but no property outside this specification should be declared of a protected type.⁸ (We also use the qualifier private at one point. Private types are only specified for the sake of formal definition of other types and are not used in any form outside this specification.)

The declaration of semantic properties, their names, data types, and arguments provides only clues as to what the new data type might be about. The true definition lies in the invariant statements. Invariant statements are logical statements that are true at all times.

Throughout this specification, invariant statements are provided in a formal syntax but are also written in plain English. The advantage of the formal syntax is that it can be interpreted unambiguously, and that it is strongly typed. The advantage of plain English statements is that they are more understandable, especially to those untrained in reading formal languages.

The formal syntax does help to sharpen the decisiveness of this specification. In some cases, however, the full semantics of a type are beyond what can be fully expressed in such invariant statements. The combination of both plain and formal language helps to make this specification more clear.

Invariant statements are formed using the invariant keyword that declares one or more variables in the same form as an argument list of a property. The invariant statement can contain a where clause that constrains the arguments for the entire invariant body. The invariant body is enclosed in curly braces. It contains a list of assertions that must all be true.

The semantics of the invariant statement is a logic predicate with a universal quantifier ("for all").

Definition 2:
invariant(BL x)
      where x.nonNull {
   x.and(true).equals(x);
};

The above invariant statement can be read in English as "For all Boolean values x, where x is non-NULL it holds that x AND true equals x." All properties should be named such that one can read the assertions like English sentences.⁹
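Because the Boolean value set is finite, an invariant of this form can be checked exhaustively. The following Python sketch does so under stated assumptions: None stands in for NULL, the function bl_and is a hypothetical conjunction that yields NULL unless the result is already determined, and holds_for_all is a hypothetical helper modeling the universal quantifier with a where-clause.

```python
BL_VALUES = [True, False, None]   # None stands in for NULL (assumed encoding)


def bl_and(x, y):
    """Hypothetical conjunction; NULL propagates unless one side is false."""
    if x is False or y is False:
        return False
    if x is None or y is None:
        return None
    return True


def holds_for_all(values, where, assertion):
    """Universal quantifier: the assertion must hold for every value
    that satisfies the where-clause."""
    return all(assertion(x) for x in values if where(x))


# invariant(BL x) where x.nonNull { x.and(true).equals(x); };
identity_holds = holds_for_all(
    BL_VALUES,
    where=lambda x: x is not None,
    assertion=lambda x: bl_and(x, True) == x,
)

# A statement that is not an invariant: x AND false equals x fails for true.
bogus_holds = holds_for_all(
    BL_VALUES,
    where=lambda x: x is not None,
    assertion=lambda x: bl_and(x, False) == x,
)
```

For data types with infinite value sets such a check can only sample values; the invariant statement itself remains a logical assertion, not a test procedure.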

The argument list of an invariant statement need not be specified if no such argument is needed.

Definition 3:
invariant { true.not.equals(false); false.not.equals(true); };

Assertions in invariant statements are expressions built with the semantic properties of defined data types. Assertion expressions must have a Boolean value (true or false.)¹⁰ No primitive data types, or operations, pre-exist the definition of any data type. The only preexisting features of the assertion expression language are:¹¹

Within assertion expressions, nested quantifier statements can be formed similar to invariant statements. In fact, the universal quantifier built using the forall keyword is the same as the invariant statement. The universal quantifier can be used in a nested expression when the complexity of the problem requires it, such as in the following example:

Definition 4:
invariant(SET<T> x, y) where x.nonNull { x.subset(y).equals( forall(T element) where x.contains(element) { y.contains(element); }); };

The existence quantifier has the same meaning as in common predicate logic. For example, the following invariant means: "SET values x and y intersect if and only if there exists an element e that is contained in both sets x and y."

Definition 5:
invariant(SET<T> x, y) where x.nonNull { x.intersects(y).equals( exists(T e) { x.contains(e); y.contains(e); }); };

The existence quantifier may have a where-clause; however, there is no difference whether an assertion is made as a where-clause or in the body of the existence quantifier. Conversely, for universal quantifiers, the where-clause weakens the assertion since the body now only applies for values that meet the criterion in the where-clause.
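The different roles of the where-clause in the two quantifiers can be tried out over a finite domain. The following Python sketch is not part of the data type definition language; the helper names forall and exists, and the idea of passing an explicit finite domain, are assumptions made purely for illustration.

```python
from typing import Callable, Iterable, TypeVar

T = TypeVar("T")
Pred = Callable[[T], bool]


def forall(domain: Iterable[T], body: Pred, where: Pred = lambda e: True) -> bool:
    # The where-clause restricts the values the body is asserted for,
    # so it acts like the premise of an implication.
    return all(body(e) for e in domain if where(e))


def exists(domain: Iterable[T], body: Pred, where: Pred = lambda e: True) -> bool:
    # For the existence quantifier the where-clause is just one more
    # conjunct: moving it into the body gives the same result.
    return any(where(e) and body(e) for e in domain)


def subset(x: set, y: set) -> bool:
    # Definition 4: every element contained in x is contained in y.
    return forall(x | y, lambda e: e in y, where=lambda e: e in x)


def intersects(x: set, y: set) -> bool:
    # Definition 5: some element is contained in both x and y.
    return exists(x | y, lambda e: e in x and e in y)
```

Calling forall with a narrower where-clause can turn a false assertion into a true one, whereas for exists the where-clause and the body are interchangeable.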

This specification defines certain allowable conversions between data types. For example, there is a pair of conversions between the Character String (ST) and Encapsulated Data (ED). This means that if one expects an ED value but actually has an ST value instead, one can turn the ST value into an ED.¹²

Three kinds of type conversions are defined: promotion, demotion, and character string literals. Type conversions can be implicit or explicit. Implicit type conversion occurs when a certain type is expected (e.g. as an argument to a statement) but a different type is actually provided. If the type provided has a conversion to the type expected the conversion should be done implicitly.

An explicit conversion can be specified in an assertion expression using the converted-to type name in parentheses before the converted value. For example, the following is an explicit type conversion in the where clause of an invariant statement.

Definition 6:
invariant(ED x) where ((ST)x).nonNull { ... };

The type conversion has lower priority than the property resolution period. Thus "(T)a.b" converts the value of the property b of variable a to data type T while "((T)a).b" converts the value of variable a to T and then references property b of that converted value.

Implicit type conversions in the assertion expressions are performed where possible. If a property's formal argument is declared of data type T, the expression used as an actual argument is of type U, U does not extend T, and U defines a conversion to T, then that conversion from U to T takes effect.

A demotion is a conversion with a net loss of information. Generally, this means that a more complex type is converted into a simple type.

An example of a demotion is the conversion from Interval (IVL) to a simple Quantity (QTY), e.g. the center of the interval. In the data type definition language, a demotion is declared using the keyword demotion and the data type name to which to demote:

Definition 7:
type Interval alias IVL { ... demotion QTY; ... };

The specification of demotions shall indicate what information is lost and what the major consequences of losing this information are.

A promotion is a conversion where new information is generated. Generally, this means that a simpler type is converted into a more complex type.

For example, we allow any Quantity (QTY) to be converted to an Interval (IVL). However, IVL has more semantic properties than QTY, namely the low and high boundaries. Thus, the conversion of QTY to IVL is a promotion. The additional properties of IVL not present in QTY must assume new values, default values, or computed values. The specification of the promotion must indicate what these values are or how they can be generated.

A promoting conversion from type QTY to type IVL is defined as a semantic property of data type QTY using the keyword promotion and the data type name to which to promote:

Definition 8:
type Quantity alias QTY { ... promotion IVL; ... };

Typically, a promotion is defined from a simple type to a more complex type. Also typically, the simple type is declared earlier in this document than a more complex type. Declaring all promotions to complex types in the simple type would thus involve forward references and would be confusing to the reader. Therefore, an alternative syntax allows promotions to be defined in the more complex type. This is indicated by naming the type from which to promote in an argument list behind the type to which to promote.

Definition 9:
type Interval alias IVL { ... promotion IVL (QTY x); ... };
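The two directions of conversion between QTY and IVL can be sketched in Python. This is only an illustration under stated assumptions: a plain float stands in for QTY, the demotion takes the center of the interval as suggested in the text, and the promotion sets both boundaries to the quantity itself, which is one possible choice of "computed values" and not something this section prescribes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    low: float
    high: float

    def demote(self) -> float:
        # Demotion IVL -> QTY: the boundaries are lost and only the
        # center of the interval survives.
        return (self.low + self.high) / 2


def promote(quantity: float) -> Interval:
    # Promotion QTY -> IVL: the new low and high properties need values.
    # ASSUMPTION: both boundaries are set to the quantity itself.
    return Interval(low=quantity, high=quantity)
```

Promoting and then demoting returns the original quantity, while demoting and then promoting generally does not return the original interval, which shows the net loss of information in a demotion.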

A literal is a character string representation of a data value. Literals are defined for many types. A literal is a type conversion from and to a Character String (ST) with a specially defined syntax.

Not every conversion from and to an ST is a literal conversion, however. A literal for a data type should be able to represent the entire value set of a data type whereas any other conversion to and from ST may only map a smaller subset of the converted data type.

The purpose of having literals is so that one can write down values in a short human readable form. For example, literals for the types integer number (INT) and real number (REAL) are strings of sign, digits, possibly a decimal point, etc. The more important interval types (IVL<REAL>, IVL<PQ>, IVL<TS>) have literal representations that allow one to use, e.g., "<5" to mean "less than 5", which is much more readable than a fully structured form of the interval. For some of the more advanced data types such as intervals, general timing specification, and parametric probability distribution we expect that the literal form may be the only form seen for representing these values until users have become used to the underlying conceptualizations.

Each literal conversion has its own syntax (grammar,) often aligned with what people find intuitive. This syntax may therefore not be completely straightforward from a computer's perspective.¹³

In the data type definition language we declare a literal form as a property of a data type using the keyword literal followed by the data type name ST, since the literal is a conversion to and from the ST data type.

Definition 10:
type IntegerNumber alias INT { ... literal ST; ... };

The actual definition of the literal form occurs outside the data type declaration body using an attribute grammar. An attribute grammar is a grammar that specifies both syntax and semantics of language structures. The syntax is defined in essentially the Backus-Naur-Form (BNF).¹⁴

For example, consider the following simple definition of a data type for cardinal numbers (non-negative integers.) This type definition depends only on the Boolean data type (BL) and has a character string literal declared:

Definition 11:
type CardinalNumber alias CARD {
  BL isZero;
  BL equals(CARD x);
  CARD successor;
  CARD plus(CARD x);
  CARD timesTen;
  literal ST;
};

The literal syntax and semantics are first shown in full and then described in detail.

Definition 12:
CARD.literal ST {
  CARD : CARD digit { $.equals($1.timesTen.plus($2)); }
       | digit { $.equals($1); };
  CARD digit : "0" { $.isZero; }
             | "1" { $.equals(0.successor); }
             | "2" { $.equals(1.successor); }
             ...
             | "8" { $.equals(7.successor); }
             | "9" { $.equals(8.successor); };
};

Every syntactic rule consists of the name of a symbol, a colon and the definition (so-called production) of the symbol. A production is a sequence of symbols. These other symbols are also defined in the grammar, or they are terminal symbols. Terminal symbols are character strings written in double quotes or string patterns (called regular expressions.) Thus the form:

Definition 13:
CARD : CARD digit | digit;

means that any cardinal number symbol is a cardinal number symbol followed by a digit or just a digit. The vertical bar stands for a disjunction (logical OR.) A syntactic rule ends with a semicolon.

Every symbol has exactly one value of a defined data type. The data type of the symbol's value is declared where the symbol is defined:

Definition 14:
CARD digit : "0" | "1" | "2" | ... | "8" | "9";

means that the symbol digit has a value of type CARD. The start-symbol is the data type itself and does not need a separate name.

The semantics of the literal expression is specified in semantic rules enclosed in curly braces for each of the defined productions of a symbol:

symbol : production₁ { rule₁ } | production₂ { rule₂ } | ... | production_n { rule_n };

A semantic rule is simply a semicolon-separated list of Boolean assertion expressions of the same kind as those used in invariant statements. However, there are special variables defined in the semantic rule that all begin with a dollar character (e.g., $, $1, $2, $3, ...) The simple $ stands for the value of the currently defined symbol; while $1, $2, $3, etc. stand for the values of the parts of the semantic rule's associated production. For example, in

Definition 15:
CARD : CARD digit { $.equals($1.timesTen.plus($2)); } | digit { $.equals($1); };

the first production "CARD digit" has a semantic rule that says: the value $ of the defined symbol equals the value $1 of the first symbol CARD times ten plus the value $2 of the second symbol digit.¹⁵
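How the semantic rules determine the value of a literal can be followed in a small Python sketch. It is an illustration only: the class Card, its hidden integer representation, and the function names digit and parse_card are assumptions of the sketch, and the left-recursive production is evaluated with a loop instead of a generated parser.

```python
class Card:
    """Cardinal number offering only the properties of Definition 11."""

    def __init__(self, n: int = 0) -> None:
        self._n = n  # hidden representation; an assumption of this sketch

    def is_zero(self) -> bool:
        return self._n == 0

    def equals(self, other: "Card") -> bool:
        return self._n == other._n

    def successor(self) -> "Card":
        return Card(self._n + 1)

    def plus(self, other: "Card") -> "Card":
        return Card(self._n + other._n)

    def times_ten(self) -> "Card":
        return Card(self._n * 10)


def digit(ch: str) -> Card:
    # digit : "0" { $.isZero } | "1" { $.equals(0.successor) } | ...
    if len(ch) != 1 or ch not in "0123456789":
        raise ValueError(f"not a digit: {ch!r}")
    value = Card()
    for _ in range("0123456789".index(ch)):
        value = value.successor()
    return value


def parse_card(literal: str) -> Card:
    # CARD : CARD digit { $.equals($1.timesTen.plus($2)) } | digit { $.equals($1) }
    if not literal:
        raise ValueError("a CARD literal needs at least one digit")
    value = digit(literal[0])      # production "digit"
    for ch in literal[1:]:         # production "CARD digit", applied left to right
        value = value.times_ten().plus(digit(ch))
    return value
```

For the literal "1234" the loop applies the first production three times, so the value is built up as 1, 12, 123, 1234.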

A terminal symbol can be specified as a string pattern, a so-called regular expression. The regular expression syntax used here is the classic syntax invented by Aho and used in AWK, LEX, GREP, and PERL. Regular expressions appear between two slashes /.../. In a regular expression pattern every character except [ ] ^ $ . / : ( ) \ | ? * + { } matches itself. The special characters that are actually used in this specification are defined in Table 2.

Table 2: Special Characters for Regular Expressions
Pattern	Definition
[ ... ]	Specifies a character class. For example, /[A-Za-z]/ matches the characters of the upper and lower case English alphabet.
[^ ...]	Specifies a character class negatively. For example, /[^BCD]/ matches any character except B, C, and D.
...?	The preceding pattern is optional. For example, /ab?c/ matches "ac" and "abc".
...*	The preceding pattern may occur zero or many times. For example, /ab*c/ matches "ac", "abc", "abbc", "abbbc", etc.
...+	The preceding pattern may occur one or more times. For example, /ab+c/ matches "abc", "abbc", "abbbc", but not "ac".
... {n,m}	The preceding pattern may occur n to m times where n and m are cardinal numbers with 0 ≤ n ≤ m. For example, /ab{2,4}c/ matches "abbc", "abbbc", and "abbbbc".
... | ...	The pattern on either side of the bar may match. For example, /ab|cd/ matches "ab" and "cd" but not "abcd".
( ... )	The pattern in parentheses is used as one pattern for the above operators. For example, /a(bc)*/ matches "a", "abc", "abcbc", "abcbcbc", etc.
... : ...	The left pattern matches if followed by the right pattern, but the right pattern is not consumed by a match. For example, /ab:c/ matches "abc" but not "ab", however, the value of a symbol thus matched is "ab" and the "c" is left over for the next symbol. The colon is a slight deviation from the conventional slash / but the slash is also conventionally used to enclose the entire pattern and may occur as a character to match - three meanings is one too many.
... \ ...	Matches the following character literally, i.e. escapes from any special meaning of that character. For example, /a\+b/ matches "a+b".
... \/ ...	Matches the slash as a character. For example, /a\/bc/ matches "a/bc".
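Most of the patterns in the table above can be tried out with Python's re module, which follows largely the same classic syntax. Two points in this sketch are assumptions: patterns are matched against the whole string (re.fullmatch), which is how the examples in the table read, and the colon operator, which Python does not have, is approximated with the lookahead group (?=...).

```python
import re


def matches(pattern: str, text: str) -> bool:
    # Whole-string match, which is how the examples in Table 2 read.
    return re.fullmatch(pattern, text) is not None


examples = {
    r"[A-Za-z]": ["q"],
    r"[^BCD]": ["A"],
    r"ab?c": ["ac", "abc"],
    r"ab*c": ["ac", "abc", "abbbc"],
    r"ab+c": ["abc", "abbc"],
    r"ab{2,4}c": ["abbc", "abbbc", "abbbbc"],
    r"a(bc)*": ["a", "abc", "abcbc"],
    r"a\+b": ["a+b"],
    r"a/bc": ["a/bc"],  # no escape needed: Python does not delimit patterns with slashes
}

# Trailing context /ab:c/ approximated with a lookahead: "ab" is the value
# of the match and "c" is left for the next symbol.
trailing = re.match(r"ab(?=c)", "abc")
```

Here trailing.group(0) is the matched value without the trailing context, and trailing.end() marks where the next symbol would start.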

Generic data types are incomplete type definitions. This incompleteness is signified by one or more parameters to the type definition. Usually parameters stand for other types. Using parameters, a generic type might declare semantic properties of other not fully specified data types. For example, the generic data type Interval is declared with a parameter T that can stand for any Quantity data type (QTY). The components low and high are declared as being of type T.

Definition 16:
template<QTY T> type Interval<T> alias IVL<T> { T low; T high; };

Instantiating a generic type means completing its definition. For example, to instantiate an Interval, one must specify of what base data type the interval should be. This is done by binding the parameter T. To instantiate an Interval of Integer numbers, one would bind the parameter T to the type Integer. Thus, the incomplete data type Interval is completed to the data type Interval of Integer.
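Binding a type parameter can be compared to generic classes in a programming language. The following Python sketch mirrors Definition 16 under some assumptions: int and float stand in for the Quantity (QTY) types, and the contains method is a hypothetical helper that Definition 16 does not declare.

```python
from dataclasses import dataclass
from typing import Generic, TypeVar

# ASSUMPTION: int and float stand in for the Quantity (QTY) types here.
T = TypeVar("T", int, float)


@dataclass
class Interval(Generic[T]):
    low: T
    high: T

    def contains(self, value: T) -> bool:
        # Hypothetical helper, not a property declared in Definition 16.
        return self.low <= value <= self.high


# Binding T to int completes the type to "Interval of Integer".
IntInterval = Interval[int]
multiplicity = IntInterval(low=1, high=4)
```

Note that Python does not enforce the binding at run time; a static type checker is needed to reject, say, a string boundary.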

For example the following type definition for MyType declares a property named "multiplicity" that is an interval of the cardinal number data type used in the above examples.

Definition 17:
type MyType alias MT { IVL<CARD> multiplicity; };

Generic data types for collections are being used throughout this specification. The most important of them are:

Set (SET<T>.) A set contains elements in no particular order and without duplicate elements. The SET<T> data type requires all elements of a set to be of the same data type.

Sequence (LIST<T>.) A sequence is a collection of values in an arbitrary but particular order. A sequence has a head and a tail, where the head is an element and the tail is the sequence without its head.

These and other generic types are fully defined later in this specification. These generic data types and their properties are being used in this specification early on. For the best understanding of this specification, knowledge about the set, sequence and interval is important, and the reader is advised to refer to their definitions when coming across a generic type being used to define another type.

Generic data type extensions are generic types with one parameter type that the generic type extends. In the formal data type definition language, generic type extensions follow the pattern:

Definition 18:
template<ANY T> type GenericTypeExtensionName extends T { ... };

These generic type extensions inherit properties of their base type and add some specific feature to it. The generic type extension is a specialization of the base type, thus a value of the extension data type can be used instead of its base data type.¹⁶

Definition: A meta-type declared in order to allow the formal definitions to speak about the data type of a value. Any data type defined in this specification is a value of the type DataType.

Definition: A CE specifying the identifier of the data type. The short alias name, if defined, is the main code value, in which case the long name is an equivalent translation in the CE value.

Definition: Defines the basic properties of every data value. This is an abstract type, meaning that no value can be just a data value without belonging to any concrete type. Every concrete type is a specialization of this general abstract DataValue type.

Definition: Represents the fact that every data value implicitly carries information about its own data type. Thus, given a data value one can inquire about its data type.

Definition: Indicates that a value is a non-exceptional value of the data type.

When a property, RIM attribute, or message field is called mandatory, this means that any non-NULL value of the type to which the property belongs has a non-NULL value for that property. In other words, a field may not be NULL, provided that its container (object, segment, etc.) is to have a non-NULL value.

Definition: Indicates that a value is an exceptional value, or a NULL-value. A null value means that the information does not exist, is not available, or cannot be expressed in the data type's normal value set.

Every data element has either a proper value or it is considered NULL. If (and only if) it is NULL, the nullFlavor provides more detail as to in what way or why no proper value is supplied.

Definition: If a value is an exceptional value (NULL-value), this specifies in what way and why proper information is missing.

The null flavors are a general domain extension of all normal data types. Note the distinction between value domain of any data type and the vocabulary domain of coded data types. A vocabulary domain is a value domain for coded values, but not all value domains are vocabulary domains.

The null flavor "other" is used whenever the actual value is not in the required value domain. This may be the case, for example, when the value exceeds some constraints that are defined too restrictively (e.g., age less than 100 years.)

Some of these null flavors are defined as named properties that can be used as simple predicates for all data values. This is done to simplify the formulation of invariants in the remainder of this specification.

Remember the difference between semantic properties and representational "components" of data values. An ITS must only represent those components that are needed to infer the semantic properties. The null-flavor predicates ANY.nonNull, ANY.isNull, ANY.notApplicable, ANY.unknown, and ANY.other can all be inferred from the nullFlavor property.

Definition: A predicate indicating that this exceptional value is of ANY.nullFlavor not-applicable (NA), i.e., that a proper value is not meaningful in the given context.

Definition: A predicate indicating that this exceptional value is of ANY.nullFlavor unknown (UNK).

Definition: A predicate indicating that this exceptional value is of ANY.nullFlavor other (OTH), i.e., that the required value domain does not contain the appropriate value.

Definition: Equality is a reflexive, symmetric, and transitive relation between any two data values. Only proper values can be equal; null values are never equal (even if they have the same null flavor.)

How equality is determined must be defined for each data type. If nothing else is specified, two data values are equal if they are indistinguishable, that is, if they differ in none of their semantic properties. A data type can "override" this general definition of equality, by specifying its own equals relationship. This overriding of the equality relation can be used to exclude semantic properties from the equality test. If a data type excludes semantic properties from its definition of equality, this implies that certain properties (or aspects of properties) that are not part of the equality test are not essential to the meaning of the value.

For example the physical quantity has the two semantic properties (1) a real number and (2) a coded unit of measure. The equality test, however, must account for the fact that, e.g., 1 meter equals 100 centimeters; independent equality of the two semantic properties is too strong a criterion for the equality test. Therefore, physical quantity must override the equality definition.
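An overridden equality test of this kind can be sketched in Python. The sketch rests on assumptions that are not part of this specification: a tiny hypothetical table of length units, None as the model for a NULL value, and a small relative tolerance to absorb floating-point rounding.

```python
from dataclasses import dataclass
from typing import Optional

# ASSUMPTION: a tiny hypothetical table of length units and their size in meters.
METERS_PER_UNIT = {"m": 1.0, "cm": 0.01, "mm": 0.001}


@dataclass
class PhysicalQuantity:
    value: Optional[float]  # None models a NULL value
    unit: str

    def equals(self, other: "PhysicalQuantity") -> bool:
        # NULL values are never equal, even to each other.
        if self.value is None or other.value is None:
            return False
        # Compare in a common unit instead of property by property.
        a = self.value * METERS_PER_UNIT[self.unit]
        b = other.value * METERS_PER_UNIT[other.unit]
        return abs(a - b) <= 1e-9 * max(abs(a), abs(b))
```

A property-by-property comparison would report 1 m and 100 cm as different, which is exactly the case the override is meant to handle.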

Definition: The Boolean type stands for the values of two-valued logic. A Boolean value can be either true or false, or, as any other value may be NULL.

With any data value potentially being NULL, the two-valued logic is effectively extended to a three-valued logic as shown in the following truth tables:

Definition: Negation of a Boolean turns true into false and false into true and is NULL for NULL values.

Definition: Conjunction (AND) is associative and commutative, with true as a neutral element. False AND any Boolean value is false. These rules hold even if one or both of the operands are NULL. If both operands for AND are NULL, the result is NULL.

Definition: The disjunction x OR y is false if and only if x is false and y is false.

Definition: The exclusive-OR constrains OR such that the two operands may not both be true.

Definition: The logical implication is important to make invariant statements. An implication is a rule of the form IF condition THEN conclusion. Logically the implication is defined as the disjunction of the negated condition and the conclusion, meaning that when the condition is true the conclusion must be true to make the overall statement true.

The implication is not reversible and does not specify what is true when the condition is false (ex falso quodlibet, Latin for "from false follows anything").
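The Boolean operations with NULL can be sketched in Python, using None to model NULL. The rules for negation and conjunction are taken from the definitions above. Deriving OR from AND and NOT by De Morgan's law, and with it the NULL cases of OR, exclusive-OR, and implication, is an assumption of this sketch; the function names are hypothetical.

```python
from typing import Optional

BL = Optional[bool]  # None models NULL


def not3(x: BL) -> BL:
    return None if x is None else not x


def and3(x: BL, y: BL) -> BL:
    if x is False or y is False:
        return False          # false AND anything is false, even NULL
    if x is None or y is None:
        return None           # true is neutral, so NULL AND true is NULL
    return True


def or3(x: BL, y: BL) -> BL:
    # Defined through AND and NOT (De Morgan); false only if both are false.
    return not3(and3(not3(x), not3(y)))


def xor3(x: BL, y: BL) -> BL:
    # OR, constrained so that the operands may not both be true.
    return and3(or3(x, y), not3(and3(x, y)))


def implies3(x: BL, y: BL) -> BL:
    # Disjunction of the negated condition and the conclusion.
    return or3(not3(x), y)
```

Under these rules an implication with a false condition is true whatever the conclusion, including a NULL conclusion, which matches the "from false follows anything" reading.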

The literal form of the Boolean is determined by the named values specified in the values clause, i.e., true and false.

Definition: Data that is primarily intended for human interpretation or for further machine processing outside the scope of HL7. This includes unformatted or formatted written language, multimedia data, or structured information as defined by a different standard (e.g., XML-signatures.) Instead of the data itself, an ED may contain only a reference (see TEL.) Note that the ST data type is a specialization of the ED data type when the ED media type is text/plain.

Encapsulated data can be present in two forms, inline or by reference. Inline data is communicated or moved as part of the encapsulated data value, whereas by-reference data may reside at a different (remote) location. The data is the same whether it is located inline or remote.

Definition: Binary data is a raw block of bits. Binary data is a protected type that should not be declared outside the data type specification.

A bit is semantically identical with a non-null Boolean value. Thus, all binary data is — semantically — a sequence of non-null Boolean values.

An empty sequence is not considered binary data but counts as a NULL-value. In other words, non-NULL binary data contains at least one bit. No bit in a non-NULL binary data value can be NULL.

Definition: Identifies the encoding of the encapsulated data and identifies a method to interpret or render the data.

The mediaType is a mandatory property, i.e., every non-NULL instance of encapsulated data must have a defined type property.

The IANA defined domain of media types is established by the Internet standard RFC 2046 [http://www.isi.edu/in-notes/rfc2046.txt]. RFC 2046 defines the media type to consist of two parts:

However, this specification treats the entire media type as one atomic code symbol in the form defined by IANA, i.e., top level type followed by a slash "/" followed by media subtype. Currently defined media types are registered in a database [http://www.isi.edu/in-notes/iana/assignments/media-types] maintained by IANA. Currently more than 160 different MIME media types are defined, with the list growing rapidly. In general, all those types defined by the IANA may be used.

To promote interoperability, this specification prefers certain media types to others. This is to define a greatest common denominator on which interoperability is not only possible, but that is powerful enough to support even advanced multimedia communication needs.

Table 6 below assigns a status to certain MIME media types, where the status means one of the following:

The set of required media types is very small so that no undue requirements are forced on HL7 applications, especially legacy systems. In general, no HL7 application is forced to support any given kind of media other than written text. For example, many systems just do not want to receive audio data, because those systems can only show written text to their users. It is a matter of application conformance statements to say: "I will not handle audio". Only if a system claims to handle audio media must it support the required media type for audio.

Definition: For character-based encoding types, this property specifies the character set and character encoding used. The charset is defined according to Internet RFC 2278, [http://www.isi.edu/in-notes/rfc2278.txt].

The charset domain is maintained by the Internet Assigned Numbers Authority (IANA) [http://www.isi.edu/in-notes/iana/assignments/character-sets]. The IANA source specifies names and multiple aliases for most character sets. For HL7's purposes, use of multiple alias names is not allowed. The standard name for HL7 is the one marked by IANA as "preferred for MIME." If IANA has not marked one of the aliases as "preferred for MIME", the main name shall be the one used for HL7.

Table 7 lists a few of the IANA defined character sets that are of interest to current HL7 members.

Definition: For character based information the language property specifies the human language of the text.

The need for a language code for text data values is documented in RFC 2277, IETF Policy on Character Sets and Languages [http://www.isi.edu/in-notes/rfc2277.txt]. Further background information can be found in Using International Characters in Internet Mail [http://www.imc.org/mail-i18n.html], a memo by the Internet Mail Consortium.

The principles of the code domain of this attribute are specified by the Internet standard RFC 1766. It is a set of pre-coordinated pairs of one 2-letter ISO 639 language code and one 2-letter ISO 3166 country code.¹⁷

Language tags do not modify the meaning of the characters found in the text; they are only an advice on if and how to present or communicate the text.¹⁸

The language tag should not be mandatory if it is not mandatory in the implementation technology. Semantically, language tagging of strings follows a default-logic. If nothing else is specified the local language is assumed. If a language is set for an entire message or document, that language is the default. If any information element or value that is superior in the syntax hierarchy specifies a language, that language is the default for all subordinate text values.
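The default-logic for language tags amounts to a lookup that walks up the syntax hierarchy. The following Python sketch illustrates this; the class Node, its fields, and the function effective_language are hypothetical names, and modeling the hierarchy as simple parent links is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """A hypothetical element in the syntax hierarchy of a message."""
    language: Optional[str] = None      # explicit language tag, if any
    parent: Optional["Node"] = None     # superior element, None at the root


def effective_language(node: Node, local_language: str) -> str:
    # Walk upward until some element specifies a language.
    current: Optional[Node] = node
    while current is not None:
        if current.language is not None:
            return current.language
        current = current.parent
    # Nothing specified anywhere: the local language is assumed.
    return local_language
```

A text value without its own tag thus takes the language of the nearest superior element that has one, and a value with its own tag overrides any default.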

If language tags are present in the beginning of the encoded binary text (e.g., through Unicode's plane-14 tags) this is the source of the language property of the encapsulated data value.

Definition: Indicates whether the raw byte data is compressed, and what compression algorithm was used.

Definition: A telecommunication address (TEL), such as a URL for HTTP or FTP, which will resolve to precisely the same binary data that could as well have been provided as inline data.

The semantic value of an encapsulated data value is the same, regardless of whether the data is present inline or only by reference. However, an encapsulated data value without inline data behaves differently, since any attempt to examine the data requires the data to be downloaded from the reference.

An encapsulated data value may have both inline data and a reference. The reference must point to the same data as provided inline.

By-reference encapsulated data may not be allowed depending on the attribute or component that is declared encapsulated data. ST must always be inline.

Definition: The integrity check is a short binary value representing a cryptographically strong checksum that is calculated over the binary data. The purpose of this property, when communicated with a reference, is for anyone to validate later whether the reference still resolves to the same data that it resolved to when the encapsulated data value with reference was created.

The integrity check is calculated according to the ED.integrityCheckAlgorithm. By default, the Secure Hash Algorithm-1 (SHA-1) shall be used. The integrity check is binary encoded according to the rules of the integrity check algorithm.

The integrity check is calculated over the raw binary data that is contained in the data component, or that is accessible through the reference. No transformations are made before the integrity check is calculated. If the data is compressed, the Integrity Check is calculated over the compressed data.
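The calculation can be illustrated with Python's standard hashlib module. This is a sketch only; the function name integrity_check is hypothetical, and the use of zlib as the compression is an assumption chosen for the example.

```python
import hashlib
import zlib


def integrity_check(raw: bytes) -> bytes:
    # SHA-1 over the raw bytes exactly as stored or referenced;
    # no decompression or other transformation is applied first.
    return hashlib.sha1(raw).digest()


original = b"abc"
compressed = zlib.compress(original)  # ASSUMPTION: zlib as the compression

check_plain = integrity_check(original)
check_compressed = integrity_check(compressed)  # computed over the compressed bytes
```

Because the check is taken over the compressed bytes, the same content carried compressed and uncompressed yields two different integrity check values.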

Definition: Specifies the algorithm used to compute the integrityCheck value.¹⁹

Definition: A thumbnail is an abbreviated rendition of the full data.²⁰ A thumbnail requires significantly fewer resources than the full data, while still maintaining some distinctive similarity with the full data. A thumbnail is typically used with by-reference encapsulated data. It allows a user to select data more efficiently before actually downloading through the reference.

Thumbnails may not be allowed depending on the attribute or component that is declared encapsulated data. ST values never have thumbnails, and a thumbnail may not itself contain a thumbnail.

Two values of type Encapsulated Data are equal if and only if their type and referenced data are equal. For those ED values with compressed data or remote data, only the de-referenced and uncompressed data counts for the equality test. The compression and reference properties themselves are excluded from the equality test, as are the thumbnail and the language property. If the ED.mediaType is character based and the charset property is not equal, the charset property must be resolved through mapping of the data between the different character sets.

The integrity check algorithm and integrity check are excluded from the equality test. However, since equality of the integrity check value is a strong indication for equality of the data, the equality test can be practically based on the integrity check, given equal integrity check algorithm properties.

Definition: The character string data type stands for text data, primarily intended for machine processing (e.g., sorting, querying, indexing, etc.) Used for names, symbols, and formal expressions.

The character string is a restricted encapsulated data type (ED), whose type property is fixed to text/plain, and whose data must be inlined and not compressed. Thus, the properties compression, reference, integrity check, integrity check algorithm, and thumbnail are not applicable. The character string data type is used when the appearance of text does not bear meaning, which is true for formalized text and all kinds of names.

The character string (ST) data type interprets the encapsulated data as character data (as opposed to bits), depending on the charset property of the encapsulated data type.

The character string inherits the properties head, tail, and length from BIN (via ED). These properties head, tail, and length, are redefined so that the character string appears as a sequence of entities each of which uniquely identifies one character from the joint set of all characters known by any language of the world.²¹ The properties head, tail, and length therefore refer to character, string, and character counts respectively, rather than bits and bit counts.

The head of a string is a string of only one character. A character string must at least have one character or else it is NULL. The length of a character string is the number of characters in the string. A zero-length string is an exceptional value (NULL), not a proper character string value.

The length of a string is the number of characters, not the number of encoded bytes. Byte encoding is an ITS issue and is not relevant on the application layer.
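The difference between character count and byte count, and the treatment of the empty string, can be shown in a few lines of Python. The helper names st, head, and tail are hypothetical, None models NULL, and counting Unicode code points with len is an assumption about what counts as one character.

```python
from typing import Optional


def st(text: str) -> Optional[str]:
    # A zero-length string is NULL (modeled as None), not a proper ST value.
    return text if len(text) > 0 else None


def head(s: str) -> str:
    return s[0]            # a string of exactly one character


def tail(s: str) -> Optional[str]:
    return st(s[1:])       # the rest; NULL once nothing is left


name = "M\u00fcller"                      # "Müller" with a precomposed u-umlaut
char_count = len(name)                    # characters
byte_count = len(name.encode("utf-8"))    # bytes in one particular encoding
```

The two counts differ because UTF-8 needs two bytes for the umlaut; with a different encoding the byte count changes while the length of the string stays the same.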

Two variations of character string literals are defined, a token form and a quoted string.²² The token form consists only of the lower case and upper case English alphabet, the ten decimal digits and the underscore. The quoted string can contain any character between double-quotes. The double quotes prevent a character string from being interpreted as some other literal. The token form allows keywords and names to be parsed from the data type specification language.

Definition: A concept descriptor represents any kind of concept usually by giving a code defined in a code system. A concept descriptor can contain the original text or phrase that served as the basis of the coding and one or more translations into different coding systems. A concept descriptor can also contain qualifiers to describe, e.g., the concept of a "left foot" as a postcoordinated term built from the primary code "FOOT" and the qualifier "LEFT". In exceptional cases, the concept descriptor need not contain a code but only the original text describing that concept.

The concept descriptor is mostly used in one of its restricted or “profiled” forms, CS, CE, CV.

Definition: A concept qualifier code with an optionally named role. Both qualifier role and value codes must be defined by the coding system. For example, if SNOMED RT defines a concept "leg", a role relation "has-laterality", and another concept "left", the concept role relation allows adding the qualifier "has-laterality: left" to a primary code "leg" to construct the meaning "left leg".

The use of qualifiers is strictly governed by the code system used. The CD data type does not permit using code qualifiers with code systems that do not provide for qualifiers (e.g. pre-coordinated systems, such as LOINC, ICD-10 PCS.)

Definition: Specifies the manner in which the concept role value contributes to the meaning of a code phrase. For example, if SNOMED RT defines a concept "leg", a role relation "has-laterality", and another concept "left", the concept role relation allows adding the qualifier "has-laterality: left" to a primary code "leg" to construct the meaning "left leg". In this example "has-laterality" is the CR.name.

If a coding system allows postcoordination but no role names (e.g. SNOMED) the name attribute can be NULL.

Definition: The concept that modifies the primary code of a code phrase through the role relation. For example, if SNOMED RT defines a concept "leg", a role relation "has-laterality", and another concept "left", the concept role relation allows adding the qualifier "has-laterality: left" to a primary code "leg" to construct the meaning "left leg". In this example "left" is the CR.value.

This property is of type concept descriptor and thus can in turn have qualifiers. This allows qualifiers to nest. Qualifiers can only be used as far as the underlying code system defines them. It is not allowed to use any kind of qualifiers for code systems that do not explicitly allow and regulate such use of qualifiers.

Definition: Indicates if the sense of the role name is inverted. This can be used in cases where the underlying code system defines inversion but does not provide reciprocal pairs of role names. By default, inverted is false.

For example, a code system may define the role relation "causes" besides the concepts "Streptococcus pneumoniae" and "Pneumonia". If that code system allows its roles to be inverted, one can construct the post-coordinated concept "Pneumococcus pneumonia" through "Pneumonia - causes, inverted - Streptococcus pneumoniae."

Roles may only be inverted if the underlying coding system allows such inversion. Notably, if a coding system defines roles in inverse pairs or intentionally does not define certain inversions, the appropriate role code (e.g. "caused-by") must be used rather than inversion. It must be known whether the inverted property is true or false; if it is NULL, the role cannot be interpreted.

Definition: The plain code symbol defined by the code system. For example, "784.0" is the code symbol of the ICD-9 code "784.0" for headache.

A non-exceptional CD value has a non-NULL code property whose value is a character string that is a symbol defined by the coding system identified by the codeSystem property. Conversely, a CD value without a value for the code property, or with a value that is not from the cited coding system is an exceptional value (NULL of flavor other).

Code systems shall be referred to by Unique Identifier (UID). The UID allows unambiguous reference to standard HL7 codes, other standard code systems, as well as local codes. HL7 shall assign a UID to each of its code tables as well as to external standard coding systems that are being used with HL7. Local sites must use their ISO Object Identifier (OID) to construct a globally unique local coding system identifier.

Under HL7's branch, 2.16.840.1.113883, the sub-branches 5 and 6 contain HL7 standard and external code system identifiers respectively. The HL7 Vocabulary Technical Committee maintains these two branches.

A non-exceptional CD value (i.e. a CD value that has a non-null code property) has a non-NULL code system specifying the system of concepts that defines the code. In other words whenever there is a code there is also a code system.

An exceptional CD of NULL-flavor "other" indicates that a concept could not be coded in the coding system specified. Thus, for these coding exceptions, the code system that did not contain the appropriate concept must be provided in the code system property.

Some code domains are qualified such that they include the portion of any pertinent local coding system that does not simply paraphrase the standard coding system (coded with extensibility, CWE.) If a CWE qualified field actually contains such a local code, the coding system must specify the local coding system from which the local code was taken. However, for CWE domains the local code is a valid member of the domain, so that local codes in CWE domains constitute neither an error nor an exceptional (NULL/other) value in the sense of this specification.

The code system name is optional and has no function in communication. The purpose of a code system name is to assist an unaided human interpreter of a code value to interpret the code system UID. It is suggested — though not absolutely required — that ITS provide for code system name fields in order to annotate the UID for human comprehension.

HL7 systems must not functionally rely on the code system name. The code system name can never modify the meaning of the code system UID value and cannot exist without the UID value.

Definition: If applicable, a version descriptor defined specifically for the given code system.

HL7 shall specify how these version strings are formed for each external code system. If HL7 has not specified how version strings are formed for a particular coding system, version designations have no defined meaning for such coding system.

Different versions of one code system must be compatible. Whenever a code system changes in an incompatible way, it will constitute a new code system, not simply a different version, regardless of what the vocabulary publisher calls it.

For example, the publisher of ICD-9 and ICD-10 calls these code systems, "revision 9" and "revision 10" respectively. However, ICD-10 is a complete redesign of the ICD code, not a backward compatible version. Therefore, for the purpose of this data type specification, ICD-9 and ICD-10 are different code systems, not just different versions. By contrast, when LOINC updates from revision "1.0j" to "1.0k", HL7 would consider this to be just another version of LOINC, since LOINC revisions are backwards compatible.

Definition: A name or title for the code, under which the sending system shows the code value to its users.

The display name is included both as a courtesy to an unaided human interpreter of a code value and as a documentation of the name used to display the concept to the user. The display name has no functional meaning; it can never exist without a code; and it can never modify the meaning of the code.

The original text exists in a scenario where an originator of the information does not assign a code, but where the code is assigned later by a coder (post-coding.) In the production of a concept descriptor, original text may thus exist without a code.²³

Although the concept descriptor's value property is NULL, original text may still exist for the CD value. Any CD value with the code property of NULL signifies a coding exception. In this case, the text property is a name or description of the concept that was not coded. Such exceptional CD may contain translations. Such translations directly encode the concept described in the original text property.

Neither display name nor original text is part of the information a receiving system must automatically recognize. An information producer is responsible for the proper coding of all information in the value attribute, for any information consumer may safely ignore the display name and original text attributes.

A concept descriptor can be demoted into a character string (ST) value representing only the original text of the CD value.

Definition: A set of other concept descriptors that translate this concept descriptor into other code systems.

The translation property is a set of other concept descriptors that each translate the first concept descriptor into different code systems. Each element of the translation set was translated from the first concept descriptor. Each translation may, however, also contain translations. Thus, when a code is translated multiple times the information about which code served as the input to which translation will be preserved.

Definition: Specifies additional codes that increase the specificity of the primary code.

The primary code and all the qualifiers together make up one concept. A concept descriptor with qualifiers is also called a code phrase.

Qualifiers constrain the meaning of the primary code, but do not shift or even invert the meaning of the primary code. The meaning of the primary code without qualifiers must not be wrong, although it may be less specific.

Qualifiers can only be used according to well-defined rules of post-coordination. A concept descriptor may only have qualifiers if the code system defines the use of such qualifiers or if there is a third code system that specifies how other code systems may be combined.

For example, SNOMED allows constructing concepts as a combination of multiple codes. SNOMED RT defines a concept "cellulitis (morphologic abnormality)" (M-41650) a role "associated topography" (G-C505) and another concept "left foot (body structure)" (T-D9720). SNOMED-RT allows one to combine these codes in a code phrase:

In this example, there is one code system, SNOMED-RT, that defines the primary code, all the qualifiers, and how these are used, which is why in our example representation the codeSystem does not need to be mentioned for the qualifier name and value (the codeSystem is inherited from the primary code.)

Another common example is the U.S. Health Care Financing Administration (HCFA) procedure codes. HCFA procedure codes (HCPCS) are based on CPT-4 and add additional qualifiers to it. For example, the patient with above finding (plus peripheral arterial disease, diabetes mellitus, and a chronic skin lesion at the left great toe) may have an amputation of that toe. The CPT-4 concept is "Amputation, toe metatarsophalangeal joint" (28820) and a HCPCS qualifier needs to be added to indicate "left foot, great toe" (TA). Thus we code:

In this example, the code system of the qualifier (HCPCS) is different than the code system of the primary code (CPT-4.) It is only because there are well-defined rules that define how these codes can be combined, that the qualifier may be used. Note also, that the role name is optional, and for HCPCS codes there are no distinguished role names.
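
The structure of the two code phrases discussed above can be sketched in Python as follows. This is an illustration under stated assumptions, not the notation used by this specification: the class and field names are hypothetical, the code system identifiers are placeholder strings rather than real UIDs, and the sketch spells out the code system on every qualifier instead of modeling inheritance from the primary code.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Placeholder code system identifiers (hypothetical, not registered UIDs).
SNOMED_RT = "SNOMED-RT"
CPT4 = "CPT-4"
HCPCS = "HCPCS"

@dataclass
class CR:
    """Concept role: a qualifier value with an optional role name."""
    value: "CD"
    name: Optional["CD"] = None   # None models a NULL role name
    inverted: bool = False

@dataclass
class CD:
    """Concept descriptor reduced to the properties needed here."""
    code: Optional[str]
    code_system: Optional[str]
    qualifier: List[CR] = field(default_factory=list)  # order is preserved

# Cellulitis with associated topography "left foot", all from one code system.
cellulitis_left_foot = CD("M-41650", SNOMED_RT, qualifier=[
    CR(name=CD("G-C505", SNOMED_RT), value=CD("T-D9720", SNOMED_RT)),
])

# Toe amputation (CPT-4) qualified by a HCPCS code, with no role name.
toe_amputation = CD("28820", CPT4, qualifier=[
    CR(value=CD("TA", HCPCS)),
])
```

Using a list for the qualifiers keeps their order, which matters when a coding system allows post-coordination without role names.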

The order of qualifiers is preserved, particularly for the case where the coding system allows post-coordination but defines no role names. (e.g., some ICD-9CM codes, or the old SNOMED "multiaxial" coding.)

The main use of concept descriptors is for the purpose of indexing, querying and decision-making based on a coded value. A semantically unambiguous specification of coded values therefore requires a clear definition of what equality of concept descriptor values means and how CD values should be compared.

The equality of two concept descriptor values is determined solely based upon the code and coding system. The code system version is excluded from the equality test.²⁴ If qualifiers are present, the qualifiers are included in the equality test. Translations are not included in the equality test.²⁵ Exceptional concept descriptor values are not equal even if they have the same NULL-flavor or the same original text.²⁶
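
The equality rules can be sketched as a Python predicate. The names `Concept`, `Qualifier` and `cd_equal` are hypothetical, `None` models NULL, and comparing qualifiers in order is an assumption of this sketch rather than a rule stated here.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass(frozen=True)
class Qualifier:
    name: Optional[str]
    value: "Concept"
    inverted: bool = False

@dataclass(frozen=True)
class Concept:
    code: Optional[str]                      # None marks an exceptional value
    code_system: Optional[str]
    code_system_version: Optional[str] = None
    original_text: Optional[str] = None
    qualifiers: Tuple[Qualifier, ...] = ()
    translations: Tuple["Concept", ...] = ()

def cd_equal(a: Concept, b: Concept) -> bool:
    # Exceptional values are never equal, even with the same original text.
    if a.code is None or b.code is None:
        return False
    # Only code and code system decide; version and translations are ignored.
    if (a.code, a.code_system) != (b.code, b.code_system):
        return False
    # Qualifiers take part in the test (compared in order: an assumption).
    if len(a.qualifiers) != len(b.qualifiers):
        return False
    return all(
        qa.name == qb.name
        and qa.inverted == qb.inverted
        and cd_equal(qa.value, qb.value)
        for qa, qb in zip(a.qualifiers, b.qualifiers)
    )
```

With this sketch, two values for ICD-9 code "784.0" that differ only in version string compare as equal, while two uncoded values with identical original text do not.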

Some code systems define certain style options to their code values. For example, the U.S. National Drug Code (NDC) has a dash and a non-dash form. An example for the dash form may be 1234-5678-90 when the non-dash form is 01234567890. Another example for this problem is when certain ISO or ANSI code tables define optional alphanumeric and numeric forms of two or three character lengths all in one standard.

In the case where code systems provide for multiple representations, HL7 shall make a ruling about which is the preferred form. HL7 shall document that ruling where that respective external coding system is recognized. HL7 shall decide upon the preferred form based on criteria of practicality and common use. In absence of clear criteria of practicality and common use, the safest, most extensible, and least stylized (the least decorated) form shall be given preference.²⁷

Definition: Specifies whether this concept descriptor is a specialization of the operand concept descriptor.

Naturally, concepts can be narrowed and widened to include or exclude other concepts. Many coding systems have an explicit notion of concept specialization and generalization. The HL7 vocabulary principles also provide for concept specialization for HL7 defined value sets. The implies-property is a predicate that compares whether one concept is a specialization of another concept, and therefore implies that other concept.

When writing predicates (e.g., conditional statements) that compare two codes, one should usually test for implication not equality of codes.

For example, in Table 19 the "telecommunication use" concepts: work (W), home (H), primary home (HP), and vacation home (HV) are defined, where both HP and HV imply H. When selecting any home phone number, one should test whether the given use-code c implies H. Testing for c equals H would only find unspecified home phone numbers, but not the primary home phone number.
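
The difference between the two tests can be sketched in Python with a toy parent table holding only the four use codes mentioned above. The table layout and the function name `implies` are assumptions of this sketch, as is treating every concept as implying itself; a real code system would need a terminology service.

```python
from typing import Optional

# Toy hierarchy: HP and HV specialize H; W and H have no parent here.
PARENT = {"HP": "H", "HV": "H"}

def implies(code: str, other: str) -> bool:
    """True if code equals other or is a specialization of it."""
    current: Optional[str] = code
    while current is not None:
        if current == other:
            return True
        current = PARENT.get(current)
    return False

use_codes = ["W", "H", "HP", "HV"]
by_equality = [c for c in use_codes if c == "H"]
by_implication = [c for c in use_codes if implies(c, "H")]
```

Selecting by equality finds only the unspecified home code, while selecting by implication also finds the primary home and vacation home codes.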

Operationally, implication can be evaluated in one of two ways. The code system literals may be designed such that one single hierarchy is reflected in the code literal itself (e.g., ICD-9.) Apart from such special cases, however, a terminological knowledge base and an appropriate subsumption algorithm will be required to evaluate implication statements. For post-coordinated coding systems, designing such a subsumption algorithm is a non-trivial task.²⁸

Use of the full concept descriptor data type is exceptional. It requires a conscious decision and documented rationale. In all other cases, one of the CD restrictions shall be used.²⁹

All CD restrictions constrain certain properties of the CD. Properties may be constrained to the extent that only one value may be allowed for that property, in which case mentioning the property becomes redundant. Constraining a property to one value is referred to as suppressing that property. Although conceptually a suppressed property is still semantically applicable, it is safe for an HL7 interface to assume the implicit default value without testing.

Definition: Coded data in its simplest form, where only the code and display name are not predetermined. The code system and code system version are fixed by the context in which the CS value occurs. CS is used for coded attributes that have a single HL7-defined value set.

For example, since the ED type subscribes to the MIME design, it trusts IETF to manage the media type. This includes that this specification subscribes to the extension mechanism built into the MIME media type code (e.g., "application/x-myapp").

For CS values, the designation of the domain qualifier will always be CNE (coded, non-extensible) and the context determines unambiguously which HL7 value set applies.³⁰

Every non-NULL CS value has a defined code system. The external representation of the CS need not explicitly mention the code system, because the context mandates one and only one code system to be used. Specifying the code system explicitly would be redundant. However, the code system property assumes that context-specific default value and is not NULL.

Definition: Coded data, specifying only a code, code system, and optionally display name and original text. Used only as the data type for other data types' properties.

This type is used when any reasonable use case will require only a single code value to be sent. Thus, it should not be used in circumstances where multiple alternative codes for a given value are desired. This type may be used with both the CNE (coded, non-extensible) and the CWE (coded, with extensibility) domain qualifiers.

Definition: Coded data that consists of a coded value (CV) and, optionally, coded value(s) from other coding systems that identify the same concept. Used when alternative codes may exist.

The CE type is used when the use case indicates that alternative codes may exist and where it is useful to communicate these. The CE type provides for a primary code value, plus a set of alternative or equivalent representations.

Definition: An identifier that uniquely identifies a thing or object. Examples are object identifier for HL7 RIM objects, medical record number, order id, service catalog item id, Vehicle Identification Number (VIN), etc. Instance identifiers are defined based on ISO object identifiers.

Definition: A unique identifier string is a character string which identifies an object in a globally unique and timeless manner. The allowable formats and values and procedures of this data type are strictly controlled by HL7. At this time, user-assigned identifiers may be certain character representations of ISO Object Identifiers (OID) and DCE Universally Unique Identifiers (UUID). HL7 also reserves the right to assign other forms of UIDs, such as mnemonic identifiers for code systems.

The sole purpose of the UID is to be a globally and timelessly unique identifier. The form of the UID, whether it is an OID, a UUID or any other form, is entirely irrelevant. As far as HL7 is concerned, the only thing one can do with a UID is denote the object for which it stands. Comparison of UIDs is literal, i.e. if two UIDs are literally identical, they are assumed to denote the same object. If two UIDs are not literally identical they may not denote the same object (and in general are assumed to denote different objects.)

No difference in semantics is recognized between the different allowed forms of the UID. The different forms are not distinguished by a component within or aside from the identifier string itself.

Even though this specification recognizes no semantic difference between the different forms of the unique identifier, there are differences in how these identifiers are built and managed, which is the sole reason to define subtypes to the UID for each of the variants.

Definition: A globally unique string representing an ISO Object Identifier (OID) in a form that consists only of numbers and dots (e.g., "2.16.840.1.113883.3.1"). According to ISO, OIDs are paths in a tree structure, with the left-most number representing the root and the right-most number representing a leaf.

Each branch under the root corresponds to an assigning authority. Each of these assigning authorities may, in turn, designate its own set of assigning authorities that work under its auspices, and so on down the line. Eventually, one of these authorities assigns a unique (to it as an assigning authority) number that corresponds to a leaf node on the tree. The leaf may represent an assigning authority (in which case the OID identifies the authority), or an instance of an object. An assigning authority owns a namespace, consisting of its sub-tree.

OIDs are the preferred scheme for unique identifiers. OIDs should always be used except if one of the inclusion criteria for other schemes applies.

According to ISO/IEC 8824 an object identifier is a sequence of object identifier component values, which are integer numbers. These component values are ordered such that the root of the object identifier tree is the head of the list followed by all the arcs down to the leaf representing the information object identified by the OID. The fact that OID extends LIST<INT> represents this path of object identifier component values from the root to the leaf.

The leaf and "butLeaf" properties take the opposite view. The leaf is the last object identifier component value in the list, and the "butLeaf" property is all of the OID but the leaf. In a sense, the leaf is the identifier value and all of the OID but the leaf refers to the namespace in which the leaf is unique and meaningful.

However, what part of the OID is considered value and what is namespace may be viewed differently. In general, any OID component sequence to the left can be considered the namespace in which the rest of the sequence to the right is defined as a meaningful and unique identifier value. The value-property with a namespace OID as its argument represents this point of view.³¹
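
The three views of an OID described above (the full path, leaf versus "butLeaf", and value relative to a namespace) can be sketched in Python. The function names are hypothetical, and an OID is modeled simply as a list of integers.

```python
from typing import List, Optional

def parse_oid(literal: str) -> List[int]:
    """Split the dotted literal into component values, root first."""
    return [int(part) for part in literal.split(".")]

def leaf(oid: List[int]) -> int:
    """The last component value."""
    return oid[-1]

def but_leaf(oid: List[int]) -> List[int]:
    """Everything except the leaf, i.e. the namespace of the leaf."""
    return oid[:-1]

def value(oid: List[int], namespace: List[int]) -> Optional[List[int]]:
    """The components to the right of the namespace, or None if the OID
    does not lie strictly below that namespace."""
    if len(namespace) < len(oid) and oid[:len(namespace)] == namespace:
        return oid[len(namespace):]
    return None

oid = parse_oid("2.16.840.1.113883.3.1")
hl7_branch = parse_oid("2.16.840.1.113883")
```

Here `leaf(oid)` is 1, `but_leaf(oid)` is the path up to and including 3, and `value(oid, hl7_branch)` is the remaining sequence 3, 1.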

HL7 shall establish an OID registry and assign OIDs in its branch for HL7 users and vendors upon their request. HL7 shall also assign OIDs to public identifier-assigning authorities both U.S. nationally (e.g., the U.S. State driver license bureaus, U.S. Social Security Administration, HIPAA Provider ID registry, etc.) and internationally (e.g., other countries' Social Security Administrations, Citizen ID registries, etc.) The HL7 assigned OIDs must be used for these organizations, regardless of whether these organizations have other OIDs assigned from other sources.

When assigning OIDs to third parties or entities, HL7 shall investigate whether an OID is already assigned for such entities through other sources. If this is the case, HL7 shall record such OID in a catalog, but HL7 shall not assign a duplicate OID in the HL7 branch. If possible, HL7 shall notify a third party when an OID is being assigned for that party in the HL7 branch.

Though HL7 shall exercise diligence before assigning an OID in the HL7 branch to third parties, given the lack of a global OID registry mechanism, one cannot make absolutely certain that there is no preexisting OID assignment for such third-party entity. Also, a duplicate assignment can happen in the future through another source. If such cases of duplicate assignment become known to HL7, HL7 shall make efforts to resolve this situation. For continued interoperability in the meantime, the HL7 assigned OID shall be the preferred OID used.

While most owners of an OID will "design" their namespace sub-tree in some meaningful way, there is no way to generally infer any meaning on the parts of an OID. HL7 does not standardize or require any namespace sub-structure. An OID owner, or anyone having knowledge about the logical structure of part of an OID, may still use that knowledge to infer information about the associated object; however, the techniques cannot be generalized.

An HL7 interface must not rely on any knowledge about the substructure of an OID for which it cannot control the assignment policies.

The structured definition of the OID is provided mostly to be faithful to the OID specification. Within HL7, OIDs are used as UID strings only, i.e., the literal string value is the only thing that is communicated and is the only thing that a receiver should have to consider when working with UIDs in the scope of the HL7 specification.

For compatibility with the DICOM standard, the literal form of the OID should not exceed 64 characters. (see DICOM part 5, section 9).

Definition: A globally unique string representing a DCE Universal Unique Identifier (UUID) in the common UUID format that consists of 5 hyphen-separated groups of hexadecimal digits having 8, 4, 4, 4, and 12 places respectively.

Both the UUID and its string representation are defined by the Open Group, DCE 1.1 Remote Procedure Call specification, Appendix A.

UUIDs are assigned based on Ethernet MAC addresses, the point in time of creation and some random component. This mix is believed to generate sufficiently unique identifiers without any organizational policy for identifier assignment (in fact this piggy-backs on the organization of MAC address assignment.)

UUIDs are not the preferred identifier scheme for use as HL7 UIDs. UUIDs may be used when identifiers are issued to objects representing individuals (e.g., entity instance identifiers, act event identifiers, etc.) For objects describing classes of things or events (e.g., catalog items), OIDs are the preferred identifier scheme.

The structured definition of the UUID is provided mostly to be faithful to the UUID specification. Within HL7, UUIDs are used as UID strings only, i.e., the literal string value is the only thing that is communicated and is the only thing that a receiver should have to consider when working with UIDs in the scope of the HL7 specification.

The literal form for the UUID is defined according to the original specification of the UUID. However, because the HL7 UIDs are case sensitive, for use with HL7, the hexadecimal digits A-F in UUIDs must be converted to upper case.
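
A minimal sketch of producing and checking the upper-case literal form, assuming Python's standard `uuid` module, whose string form uses lower-case hexadecimal digits. The helper names are hypothetical, and the sample UUID is an arbitrary well-formed value used only for illustration.

```python
import re
import uuid

# Five hyphen-separated groups of 8, 4, 4, 4 and 12 upper-case hex digits.
HL7_UUID = re.compile(
    r"[0-9A-F]{8}-[0-9A-F]{4}-[0-9A-F]{4}-[0-9A-F]{4}-[0-9A-F]{12}"
)

def to_hl7_uuid(u: uuid.UUID) -> str:
    """Render a UUID in the hyphenated form with upper-case A-F."""
    return str(u).upper()

def is_hl7_uuid(literal: str) -> bool:
    """True if the literal has the 8-4-4-4-12 upper-case form."""
    return HL7_UUID.fullmatch(literal) is not None

sample = uuid.UUID("6ba7b810-9dad-11d1-80b4-00c04fd430c8")
literal = to_hl7_uuid(sample)
```

Because UID comparison is literal, converting before storing or sending avoids two spellings of the same UUID being treated as different identifiers.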

Definition: A globally unique string defined exclusively by HL7. Identifiers in this scheme are only defined by balloted HL7 specifications. Local communities or systems must never use such reserved identifiers based on bilateral negotiations.

HL7 reserved identifiers are strings that consist only of (US-ASCII) letters, digits and hyphens, where the first character must be a letter. HL7 may assign these reserved identifiers as mnemonic identifiers for major concepts of interest to HL7.
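
The lexical rules for the three UID forms can be sketched as a purely syntactic check in Python. The function name, the returned labels and the order of the tests are assumptions of this sketch; it says nothing about whether an identifier has actually been assigned. Note that a UUID literal beginning with one of the letters A-F also satisfies the reserved-identifier pattern, so the sketch tests the UUID form first.

```python
import re
from typing import Optional

OID_FORM = re.compile(r"[0-9]+(\.[0-9]+)*")
UUID_FORM = re.compile(
    r"[0-9A-F]{8}-[0-9A-F]{4}-[0-9A-F]{4}-[0-9A-F]{4}-[0-9A-F]{12}"
)
# US-ASCII letters, digits and hyphens; the first character is a letter.
RESERVED_FORM = re.compile(r"[A-Za-z][A-Za-z0-9-]*")

def uid_form(uid: str) -> Optional[str]:
    """Return a label for the lexical form of a UID, or None if none fits."""
    if OID_FORM.fullmatch(uid):
        return "OID"
    if UUID_FORM.fullmatch(uid):
        return "UUID"
    if RESERVED_FORM.fullmatch(uid):
        return "reserved"
    return None
```

Such a check is useful for validating input only; comparison of UIDs remains literal regardless of form.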

Definition: A unique identifier that guarantees the global uniqueness of the instance identifier. The root alone may be the entire instance identifier.

In the presence of a non-null extension, the root is commonly interpreted as the "assigning authority", that is, it is supposed that the root somehow refers to an organization that assigns identifiers sent in the extension. However, the root does not have to be an organizational UID, it can also be a UID specifically registered for an identifier scheme.³²

Definition: A character string as a unique identifier within the scope of the identifier root.

The extension is a character string that is unique in the namespace designated by the root. If a non-NULL extension exists, the root specifies a namespace (sometimes called "assigning authority" or "identifier type".) The extension property may be NULL in which case the root OID is the complete unique identifier.

It is recommended that systems use the OID scheme for external identifiers of their communicated objects. The extension property is mainly provided to accommodate legacy alphanumeric identifier schemes.

Some identifier schemes define certain style options to their code values. For example, the U.S. Social Security Number (SSN) is normally written with dashes that group the digits into a pattern "123-12-1234". However, the dashes are not meaningful and an SSN can just as well be represented as "123121234" without the dashes.

In the case where identifier schemes provide for multiple representations, HL7 shall make a ruling about which is the preferred form. HL7 shall document that ruling where that respective external identifier scheme is recognized. HL7 shall decide upon the preferred form based on criteria of practicality and common use. In absence of clear criteria of practicality and common use, the safest, most extensible, and least stylized (the least decorated) form shall be given preference.³³

HL7 may also decide to map common external identifiers to the value portion of the II.root OID. For example, the U.S. SSN could be represented as 2.16.840.1.113883.4.1.123121234. The criteria of practicality and common use will guide HL7's decision on each individual case.

Definition: A human readable name or mnemonic for the assigning authority. This name may be provided solely for the convenience of unaided humans interpreting an II value. Note: automated processing must not depend on the assigning authority name being present in any form.

The assigning authority name is not the name for the individually identified object, but for the namespace, that immediately contains that object identifier. Two cases exist.

Definition: Specifies if the identifier's extension is intended for human display and data entry (displayable = true) as opposed to pure machine interoperation (displayable = false).

Definition: If applicable, specifies during what time the identifier is valid. By default, the identifier is valid indefinitely. Any specific interval may be undefined on either side indicating unknown effective or expiry time. Note: identifiers for information objects in computer systems should not have restricted valid times, but should be globally unique at all times. The identifier valid time is provided mainly for real-world identifiers, whose maintenance policy may include expiry (e.g., credit card numbers.)

The II type conforms to the history item data type extension (Section 0). This means that the data types HXIT<II> and II are the same.

Two instance identifiers are equal if and only if their root and extension properties are equal.
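
The equality rule can be sketched in Python as follows. The class layout and the function name `ii_equal` are hypothetical, `None` models a NULL extension, and the root and extension values reuse the SSN illustration given earlier in this section.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class II:
    root: str
    extension: Optional[str] = None                 # None models NULL
    assigning_authority_name: Optional[str] = None  # for human readers only
    displayable: Optional[bool] = None

def ii_equal(a: II, b: II) -> bool:
    """Equal if and only if root and extension are equal; nothing else counts."""
    return a.root == b.root and a.extension == b.extension

first = II("2.16.840.1.113883.4.1", "123121234", assigning_authority_name="SSA")
second = II("2.16.840.1.113883.4.1", "123121234", displayable=True)
dashed = II("2.16.840.1.113883.4.1", "123-12-1234")
```

In this sketch `first` and `second` are equal although their other properties differ, while `dashed` is not equal to either because the comparison of extensions is literal. This is one reason a single preferred form per identifier scheme matters.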

Definition: A telephone number (voice or fax), e-mail address, or other locator for a resource mediated by telecommunication equipment. The address is specified as a Uniform Resource Locator (URL) qualified by time specification and use codes that help in deciding which address to use for a given time and purpose.

The semantics of a telecommunication address is that a communicating entity (the responder) listens and responds to that address, and therefore can be contacted by another communicating entity (the initiator.)

The responder of a telecommunication address may be an automatic service that can respond with information (e.g., FTP or HTTP services.) In such case a telecommunication address is a reference to that information accessible through that address. A telecommunication address value can thus be resolved to some information (in the form of encapsulated data, ED.)

The telecommunication address is an extension of the Uniform Resource Locator (URL) specified as an Internet standard RFC 1738 [http://www.isi.edu/in-notes/rfc1738.txt]. The URL specifies the protocol and the contact point defined by that protocol for the resource. Notable use cases for the telecommunication address data type are for telephone and fax numbers, e-mail addresses, Hypertext references, FTP references, etc.

Definition: A telecommunications address specified according to Internet standard RFC 1738 [http://www.isi.edu/in-notes/rfc1738.txt]. The URL specifies the protocol and the contact point defined by that protocol for the resource. Notable uses of the telecommunication address data type are for telephone and telefax numbers, e-mail addresses, Hypertext references, FTP references, etc.

Definition: Identifies the protocol used to interpret the address string and to access the resource so addressed.

Some URL schemes are registered by the Internet Assigned Numbers Authority (IANA) [http://www.iana.org]; however, IANA only registers URL schemes that are defined in Internet RFC documents. In fact there are a number of URL schemes defined outside RFC documents, part of which are registered with the World Wide Web Consortium (W3C).³⁴

Similar to the ED.mediaType, HL7 makes suggestions about URL schemes, classifying them as required, recommended, other, and deprecated. Any scheme not mentioned has status other.

Note that this specification explicitly limits itself to URLs. Uniform Resource Names (URN) are not covered by this specification. URNs are a kind of identifier scheme for other than accessible resources. This specification, however, is only concerned with accessible resources, which belong in the URL category.

Definition: The address is a character string whose format is entirely defined by the URL.scheme.

While conceptually URL has the properties scheme and address, the common appearance of a URL is as a string literal formed according to the Internet standard. The general syntax of the URL literal is:

Note that there is no special data type for telephone numbers; telephone numbers are TEL and are specified as URLs.

The telephone number URL is defined in Internet RFC 2806 [http://www.isi.edu/in-notes/rfc2806.txt]. Its definition is summarized in this subsection. This summary does not override or change any of the Internet specification's rulings.

The URL.address is the telephone number in accordance with ITU-T E.123 Telephone Network and ISDN Operation, Numbering, Routing and Mobile Service: Notation for National and International Telephone Numbers (1993). While HL7 does not add to or subtract from the URL specification, the preferred subset of the URL.address syntax is given as follows:

The global absolute telephone numbers starting with the "+" and country code are preferred. Separator characters serve as decoration but have no bearing on the meaning of the telephone number. For example: "tel:+13176307960" and "tel:+1(317)630-7960" are both the same telephone number; "fax:+49308101724" and "fax:+49(30)8101-724" are both the same fax number.
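The equivalence of decorated and undecorated telephone URLs described above can be illustrated with a minimal Python sketch. The function names and the exact set of separator characters (parentheses, hyphen, period, space) are assumptions made for this illustration; the normative rules are those of the Internet specification.

```python
import re

# Assumed set of decorative separators; they carry no meaning in the number.
_SEPARATORS = re.compile(r"[()\-. ]")


def normalize_tel_url(url: str) -> str:
    """Return the URL with decorative separators removed from the address."""
    scheme, _, address = url.partition(":")
    return scheme.lower() + ":" + _SEPARATORS.sub("", address)


def same_number(a: str, b: str) -> bool:
    """Two telephone URLs denote the same number if their normalized forms match."""
    return normalize_tel_url(a) == normalize_tel_url(b)
```

With this sketch, same_number("tel:+13176307960", "tel:+1(317)630-7960") holds, while a "tel" URL and a "fax" URL with the same digits remain different because their schemes differ.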

Definition: Specifies the periods of time during which the telecommunication address can be used. For a telephone number, this can indicate the time of day in which the party can be reached on that telephone. For a web address, it may specify a time range in which the web content is promised to be available under the given address.

The TEL data type whose valid time is constrained to a simple interval of time (IVL<TS>) conforms to the history item data type extension (HXIT). Thus, HXIT<TEL> is a simple restriction of TEL.

Definition: One or more codes advising a system or user which telecommunication address in a set of like addresses to select for a given telecommunication need.

The telecommunication use code is not a complete classification for equipment types or locations. Its main purpose is to suggest or discourage the use of a particular telecommunication address. There are no easily defined rules that govern the selection of a telecommunication address.

Two telecommunication address values are considered equal if both their URLs are equal. Use code and valid time are excluded from the equality test.

Definition: Mailing and home or office addresses. A sequence of address parts, such as street or post office box, city, postal code, country, etc.

The AD is primarily used to communicate data that will allow printing mail labels or that will allow a person to physically visit that address. The postal address data type is not supposed to be a container for additional information that might be useful for finding geographic locations (e.g., GPS coordinates) or for performing epidemiological studies. Such additional information is captured by other, more appropriate HL7 elements.

Addresses are conceptualized as text with added logical mark-up. The mark-up may break the address into lines and may describe in detail the role of each address part if it is known. Address parts occur in the address in the order in which they would be printed on a mailing label. The approach is similar to HTML or XML markup of text (but it is not technically limited to XML representations.)

Addresses are essentially sequences of address parts, but add a "use" code and a valid time range for information about if and when the address can be used for a given purpose.

Definition: A character string that may have a type-tag signifying its role in the address. Typical parts that exist in almost every address are street, house number or post box, postal code, city, and country, but other roles may be defined regionally, nationally, or on an enterprise level (e.g., in military addresses). Addresses are usually broken up into lines, which are indicated by special line-breaking delimiter elements (e.g., DEL).

Definition: Specifies whether an address part names the street, city, country, postal code, post box, etc. If the type is NULL the address part is unclassified and would simply appear on an address label as is.

Definition: A set of codes advising a system or user which address in a set of like addresses to select for a given purpose.

An address without a specific use code might be a default address useful for any purpose, but an address with a specific use code would be preferred for that respective purpose.

Definition: A General Timing Specification (GTS) specifying the periods of time during which the address can be used. This is used to specify different addresses for different times of the year or to refer to historical addresses.

The AD whose valid time is constrained to a simple interval of time (IVL<TS>) conforms to the history item data type extension (HXIT). Thus, HXIT<AD> is a simple restriction of AD.

Two address values are considered equal if both contain the same address parts, independent of ordering. Use code and valid time are excluded from the equality test.
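The equality rule above can be sketched in Python as follows. The class and field names and the part type labels are hypothetical, and treating the parts as a multiset (so that repeated parts are counted) is an assumption of this sketch rather than something the text states.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class AddressPart:
    value: str
    type: Optional[str] = None  # None stands in for a NULL part type


@dataclass(frozen=True)
class Address:
    parts: Tuple[AddressPart, ...]
    use: frozenset = frozenset()      # use codes, ignored by the equality test
    valid_time: Optional[str] = None  # valid time, ignored by the equality test


def ad_equal(a: Address, b: Address) -> bool:
    """Compare only the address parts, independent of their ordering."""
    return Counter(a.parts) == Counter(b.parts)
```

Two addresses holding the same parts in a different order, with different use codes, compare as equal under ad_equal, whereas an address missing one of the parts does not.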

Definition: A character string value with the address formatted in lines and with proper spacing. This is only a semantic property to define the function of some of the address part types.³⁵

The AD data type's main purpose is to capture postal addresses, such that one can visit that address or send mail to it. Humans will look at addresses in printed form, such as on a mailing label. The AD data type defines precise rules of how its data is formatted.³⁶

Addresses are ordered lists of address parts. Each address part is printed in the order of the list from left to right and top to bottom (or in any other language-specific reading direction, the determination of which is outside the scope of this specification.) Every address part value is printed. Most address parts are framed by white space. The following six rules govern the setting of white space.

This means that all address parts are generally surrounded by white space, but white space never accumulates. Delimiters are never surrounded by implicit white space and every white space contributed by preceding or succeeding address parts is discarded, whether it was implicit or explicit.

The first form would result from a system that only stores addresses as free text or in a list of fields line1, line2, etc.:

The second form is more specific about the role of the address parts than the first one:

This form is the typical form seen in the U.S., where street address is sometimes separated, and city, state and ZIP code are always separated.

The latter form above is not used in the USA. However, it is useful in Germany, where many systems keep house number as a distinct field. For example, the German address:

Definition: A name for a person, organization, place or thing. A sequence of name parts, such as first name or family name, prefix, suffix, etc. Examples for entity name values are "Jim Bob Walton, Jr.", "Health Level Seven, Inc.", "Lake Tahoe", etc. An entity name may be as simple as a character string or may consist of several entity name parts, such as, "Jim", "Bob", "Walton", and "Jr.", "Health Level Seven" and "Inc.", "Lake" and "Tahoe".

Entity names are conceptualized as text with added logical mark-up. Name parts occur in the natural order in which they would be displayed, as opposed to an order determined by name part type. The ordering of the name parts is significant, a feature that replaces the need for a separate "display name" property. Applications may change the ordering of name parts to account for their users' customary ordering of name parts. The approach is similar to HTML or XML markup of text (but it is not technically limited to XML representations.)

Entity names are essentially sequences of entity name parts, but add a "use" code and a valid time range for information about when the name was used and how to choose between multiple aliases that may be valid at the same point in time.

Definition: A character string token representing a part of a name. May have a type code signifying the role of the part in the whole entity name, and a qualifier code for more detail about the name part type. Typical name parts for person names are given names, family names, titles, etc.

Definition: Indicates whether the name part is a given name, family name, prefix, suffix, etc.

Not every name part must have a type code; if the type code is unknown, not applicable, or simply undefined, this is expressed by a NULL value (type.isNull). For example, a name may be "Rogan Sulma" and it may not be clear which one is a first name and which is a last name, or whether Rogan may be a title.

Entity names are conceptualized as text with added mark-up. The mark-up may break the name into lines and may describe in detail the role of each name part if it is known. Name parts occur in the order in which they would be printed on a mailing label. The model is similar to HTML or XML markup of text.

Definition: The qualifier is a set of codes, each of which specifies a certain subcategory of the name part in addition to the main name part type. For example, a given name may be flagged as a nickname, a family name may be a pseudonym or a name of public records.

Definition: A set of codes advising a system or user which name in a set of names to select for a given purpose.

A name without a specific use code might be a default name useful for any purpose, but a name with a specific use code would be preferred for that respective purpose.

Definition: An interval of time specifying the time during which the name is or was used for the entity. This accommodates the fact that people change names for people, places and things.

Two name values are considered equal if both contain the same name parts, independent of ordering. Use code and valid time are excluded from the equality test.

Definition: A character string value with the entity name formatted with proper spacing. This is only a semantic property to define the function of some of the name part types.³⁹

The EN data type's main purpose is to capture names of people, places, and things (entities), so that one can address and refer to these entities in speech and writing. Humans will look at names in printed form, such as on a mailing label. The EN data type therefore defines precise rules of how its data is formatted.⁴⁰

Entity names are ordered lists of entity name parts. Each entity name part is printed in the order of the list from left to right (or in any other language-specific reading direction.) Every entity name part (except for those marked "invisible") is printed. Most entity name parts are framed by whitespace. The following six rules govern the setting of whitespace.

Three restrictions to Entity Name are defined in order to allow making specific constraints for certain kinds of entities: trivial name (TN), person name (PN), and organization name (ON).

Definition: A restriction of entity name that is effectively a simple string used for a simple name for things and places.

The TN is an EN that consists of only one name part without any name part type or qualifier. The TN and its single name part are therefore equivalent to a simple character string. This equivalence is expressed by a defined demotion to ST and promotion from ST.

Trivial names are typically used for places and things, such as Lake Erie or Reagan National Airport:

Definition: A name for a person. A sequence of name parts, such as first name or family name, prefix, suffix, etc.

Since most of the functionality of entity name is in support of person names, the person name (PN) is only a very minor restriction on the entity name part qualifier.

None of the special qualifiers need to be mentioned if they are unknown or irrelevant. The next example shows extensive use of multiple given names, prefixes, and suffixes for academic degrees, nobility titles, voorvoegsels ("van"), and professional designations.

The next example is an organization name, "Health Level Seven, Inc." in simple string form:

The following example shows a Japanese name in the three forms: ideographic (Kanji), syllabic (Hiragana), and alphabetic (Romaji).

Definition: A restriction of entity name part that only allows those entity name part qualifiers applicable to person names. Since the structure of entity name is mostly determined by the requirements of person name, the restriction is very minor.

A name for an organization, such as "Health Level Seven, Inc." An organization name consists only of untyped name parts, prefixes, suffixes, and delimiters.

The following is the organization name, "Health Level Seven, Inc." in a simple string form:

Definition: A restriction of entity name part that only allows those entity name part qualifiers applicable to organization names.

Definition: The quantity data type is an abstract generalization for all data types (1) whose value set has an order relation (less-or-equal) and (2) where difference is defined in all of the data type's totally ordered value subsets. The quantity type abstraction is needed in defining certain other types, such as the interval and the probability distribution.

Definition: A predicate expressing an order relation that is reflexive, antisymmetric and transitive, between this quantity and another quantity.

The relation is defined on any totally ordered partition of the quantity data type. A totally ordered partition is a subset of the data type's defined values where all elements have a defined order (e.g., the integer and real numbers are totally ordered.)

By contrast, a partially ordered set is a set where some, but not all pairs of elements are comparable through the order relation (e.g., a tree structure or the set of physical quantities is a partially ordered set.) Two data values x and y of an ordered type are comparable (x.compares(y)) if the less-or-equal relation holds in either way (x ≤ y or y ≤ x).

A partial order relation generates totally ordered subsets whose union is the entire set (e.g., the set of all lengths is a totally ordered subset of the set of all physical quantities.)

For example, a tree structure is partially ordered, where the root is considered less or equal to a leaf, but there may not be an order among the leaves. Also, physical quantities are partially ordered, since an order exists only among quantities of the same dimension (e.g., between two lengths, but not between a length and a time.) A totally ordered subset of a tree is a path that transitively connects a leaf to the root. The physical dimension of time is a totally ordered subset of physical quantities.

Definition: A predicate indicating if this value and the operand can be compared as to which is greater than the other.

Two quantities are comparable if they are both elements of a common totally ordered partition of their data types' value space. The definition is based on QTY.lessOrEqual.
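As a rough illustration of how comparability follows from the less-or-equal relation, the following Python sketch models a partially ordered quantity as a value tagged with a dimension. The class name and the dimension labels are hypothetical; only the rule "comparable if the relation holds in either direction" is taken from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Quantity:
    value: float
    dimension: str  # hypothetical label such as "length" or "time"

    def less_or_equal(self, other: "Quantity") -> bool:
        # The order relation only holds inside one totally ordered subset,
        # modeled here as "same dimension".
        return self.dimension == other.dimension and self.value <= other.value

    def compares(self, other: "Quantity") -> bool:
        # Comparable if less-or-equal holds in either direction.
        return self.less_or_equal(other) or other.less_or_equal(self)
```

Two lengths are comparable under this sketch, whereas a length and a time are not, because neither direction of the less-or-equal relation holds between them.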

Definition: A quantity expressing the "distance" of this quantity from the operand quantity, that must be comparable. The data type of the difference quantity is related to the operand quantities but need not be the same.

A difference is defined in an ordered set if it is semantically meaningful to state that Δ is the difference between the values x and y. This difference Δ must be meaningful independently from the values x and y. This independence exists if for all values u one can meaningfully derive a value v such that Δ would also be the difference between u and v. The judgment for what is meaningful cannot be defined formally.⁴¹

The difference has a data type that can express the difference between two values for which the ordering relation is defined (i.e., two elements of a common totally ordered subset.) For example, the difference data type of integer number is integer number, but the difference type of point in time is a physical quantity in the dimension of time. A difference data type is a totally ordered data type.

The difference between two values x minus y must be defined for all x and y in a common totally ordered subset of the data type's value set. Zero is the difference between a value and itself.
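The distinction between a data type and its difference type can be seen in most programming languages. The following Python sketch uses the built-in int and the standard datetime and timedelta types as stand-ins for integer number, point in time, and elapsed time; these stand-ins are an assumption of the illustration, not part of this specification.

```python
from datetime import datetime, timedelta

# The difference of two integers is again an integer.
int_diff = 7 - 4

# The difference of two points in time is an elapsed time, not a point in time.
t1 = datetime(2001, 1, 1, 12, 0)
t2 = datetime(2001, 1, 1, 9, 30)
time_diff = t1 - t2

# The difference is meaningful independently of t1 and t2: adding it to any
# other point in time u yields a point v that is the same distance away.
u = datetime(1999, 5, 5)
v = u + time_diff
```

Here time_diff is a timedelta of two and a half hours, and v - u yields the same elapsed time again.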

Definition: The sum of this quantity and its operand. The operand must be of a data type that can express the difference between two values of this quantity's data type.

Definition: The neutral element in the difference and addition operations, i.e., if a quantity is zero, addition to, or subtraction from any other comparable quantity will result in that other quantity.

Definition: A predicate expressing an order relation that is asymmetric and transitive, between this quantity and another quantity. The ordering is the same as QTY.lessOrEqual, but irreflexive.

Definition: A predicate expressing an order relation that is reflexive, antisymmetric and transitive, between this quantity and another quantity. This is the inverse order of QTY.lessOrEqual.

Definition: A predicate expressing an order relation that is asymmetric and transitive, between this quantity and another quantity. This is the inverse of QTY.lessThan.

Definition: Integer numbers (-1, 0, 1, 2, 100, 3398129, etc.) are precise numbers that are results of counting and enumerating. Integer numbers are discrete; the set of integers is infinite but countable. No arbitrary limit is imposed on the range of integer numbers. Two NULL flavors are defined for the positive and negative infinity.

Since the integer number data type includes all of the semantics of the mathematical integer number concept, the basic operations plus (addition) and times (multiplication) are defined. These operations are defined here as characterizing operations in the sense of ISO 11404, and because these operations are needed in other parts of this specification, namely the semantics of the literal form.

The traditional recursive definitions of addition and multiplication are due to Grassmann, and use the notion of INT.successor.⁴²

Definition: The INT value that is greater than this INT value but where no INT value exists between this value and its successor.

Definition: The result of multiplying this integer with the operand, equivalent to repeated additions of this integer.
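The recursive definitions of addition and multiplication in terms of the successor can be sketched in Python as follows. This is only an illustration: the function names are hypothetical, and the sketch is restricted to non-negative recursion operands for simplicity, which is a simplification not made by this specification.

```python
def successor(n: int) -> int:
    """The integer directly following n."""
    return n + 1


def predecessor(n: int) -> int:
    """The integer whose successor is n."""
    return n - 1


def plus(x: int, y: int) -> int:
    """x + 0 = x;  x + successor(y) = successor(x + y).  Requires y >= 0."""
    if y < 0:
        raise ValueError("sketch is limited to a non-negative second operand")
    if y == 0:
        return x
    return successor(plus(x, predecessor(y)))


def times(x: int, y: int) -> int:
    """x * 0 = 0;  x * successor(y) = (x * y) + x.  Requires x, y >= 0."""
    if x < 0 or y < 0:
        raise ValueError("sketch is limited to non-negative operands")
    if y == 0:
        return 0
    return plus(times(x, predecessor(y)), x)
```

Multiplication thus reduces to repeated addition, and addition to repeated application of the successor.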

Definition: The inverse element of the INT value, another INT value, which, when added to that value yields zero (the neutral element.)

Definition: A predicate indicating whether the INT zero (neutral element) is less or equal to this INT.

Definition: A predicate indicating whether this INT is less than zero (not non-negative.)

Definition: The integer division operation of this integer (dividend) with another integer (divisor) is the integer number of times the divisor fits into the dividend.

The literal form of an integer is a simple decimal number, i.e. a string of decimal digits.

Definition: Fractional numbers. Typically used whenever quantities are measured, estimated, or computed from other real numbers. The typical representation is decimal, where the number of significant decimal digits is known as the precision.

The term "Real number" in this specification is used to mean that fractional values are covered without necessarily implying the full set of the mathematical real numbers that would include irrational numbers such as π, Euler's number, etc.⁴³

This specification offers two choices for a number data type. The choice is made as follows: Any number attribute is a real if it is not known for sure that it is an integer. A number is an integer if it is always counted, typically representing an ordinal number. If there are conceivable use cases where such a number would be estimated or averaged, it is not always an integer and thus should use the Real data type.

The algebraic operations are specified here as characterizing operations in the sense of ISO 11404, and because these operations are needed in other parts of this specification.

Unlike the integer numbers, the real number semantics are not inductively constructed but only intuitively described by the axioms of their algebraic properties. The completeness axioms are intentionally left out so as to make no statement about irrational numbers.

Definition: A REAL value, which, when added to another REAL value yields zero (the neutral element of addition.)

Definition: A predicate indicating if this value is the number one, i.e., the neutral element of multiplication. There is exactly one real number that has this property.

Definition: An operation in REAL that forms an abelian group and is related to addition by the law of distribution.

Definition: A REAL value, which, when multiplied with another REAL value yields one (the neutral element of multiplication). Zero (the neutral element of addition) has no inverse element.

The INT and REAL data types are related by a homomorphism that maps every value in INT to a value in REAL whereby the algebraic properties of INT are preserved. This means, an integer can be promoted to a real and a real can be demoted to an integer by means of rounding off the fractional part.

Definition: The basis of exponentiation is the iterative multiplication of a real number, and extended to rational exponents as the inverse operation.

The literal form of a real number is a string of decimal digits with optional leading "+" or "-" sign, an optional decimal point, and optional exponential notation using a case-insensitive "e" between the mantissa and the exponent. The number of significant digits must conform to the precision property.

Examples of real literals for two thousand are 2000, 2000., 2e3, 2.0e+3, +2.0e+3.
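As an informal check, the example literals can be read with the decimal module of Python's standard library, which is used here only as a convenient stand-in. Note that it accepts a somewhat larger syntax than the literal form described above; that difference is an artifact of the illustration.

```python
from decimal import Decimal

# The five example literals for two thousand.
literals = ["2000", "2000.", "2e3", "2.0e+3", "+2.0e+3"]

values = [Decimal(text) for text in literals]

# All literals denote the same numeric value.
all_equal = all(value == Decimal(2000) for value in values)

# The "e" of the exponential notation is case insensitive.
upper_case_exponent = Decimal("2E3")
```

All five literals parse to values that compare equal to two thousand, although they differ in the number of digits written.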

Note that the literal form does not carry type information. For example, "2000" is a valid representation of both a real number and an integer number. No trailing decimal point is used to disambiguate from integer numbers. An ITS that uses this literal form must recover the type information from other sources.

The precision attribute is only the precision of a decimal digit representation, not the accuracy of the real number value.

The purpose of the precision property for the real number data type is to faithfully capture the whole information presented to humans in a number. The amount of decimal digits shown conveys information about the uncertainty (i.e., precision and accuracy) of a measured value.

The precision of the representation should match the uncertainty of the value. However, precision of the representation and uncertainty of the value are separate independent concepts. Refer to Section 4.4.2 for details about uncertain real numbers.

For example, "0.123" has 3 significant digits in the representation, but the uncertainty of the value may be in any digit shown or not shown, i.e., the uncertainty may be 0.123±0.0005, 0.123±0.005 or 0.123±0.00005, etc. Note that external representations should adjust their representational precision to the uncertainty of the value. However, since the precision in the digit string is granular to 0.5 of the least significant digit, while uncertainty may be anywhere between these "grid lines", 0.123±0.005 would also be an adequate representation for the value between 0.118 and 0.128.

Equality of real numbers is determined based on the value and precision. The value with a higher precision is rounded to the precision of the other value and then the comparison is made.
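The equality rule can be sketched in Python with the standard decimal module. The function names are hypothetical, the precision is taken to be the number of digits in the parsed literal, and the rounding mode (round half to even, the module's default) is an assumption, since the text above does not name one.

```python
from decimal import Context, Decimal


def significant_digits(number: Decimal) -> int:
    """Number of digits in the decimal representation (leading zeros excluded)."""
    return len(number.as_tuple().digits)


def real_equal(a: str, b: str) -> bool:
    """Round the more precise value to the precision of the other, then compare."""
    x, y = Decimal(a), Decimal(b)
    precision = min(significant_digits(x), significant_digits(y))
    context = Context(prec=precision)  # default rounding: half to even (assumed)
    return context.create_decimal(x) == context.create_decimal(y)
```

Under this sketch "0.123" and "0.12" are equal, because 0.123 rounds to 0.12 at two significant digits, whereas "0.128" and "0.12" are not.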

Definition: A quantity constructed as the quotient of a numerator quantity divided by a denominator quantity. Common factors in the numerator and denominator are not automatically cancelled out. The RTO data type supports titers (e.g., "1:128") and other quantities produced by laboratories that truly represent ratios. Ratios are not simply "structured numerics"; in particular, blood pressure measurements (e.g., "120/60") are not ratios. In many cases the REAL should be used instead of the RTO.

Ratios are different from rational numbers, i.e., in ratios common factors in the numerator and denominator never cancel out. A ratio of two real or integer numbers is not automatically reduced to a real number.

The default value for both numerator and denominator is the integer number 1 (one.) The denominator may not be zero.

Definition: The quantity that is being divided in the ratio. The default is the integer number 1 (one.)

Definition: The quantity that divides the numerator in the ratio. The default is the integer number 1 (one.) The denominator must not be zero.

A ratio literal form exists for all ratios where both numerator and denominator have literal forms. A ratio literal is simply the numerator literal, a colon as separator, and the denominator literal. When the colon and denominator are missing, the integer number 1 is assumed as the denominator.

For example, the rubella virus antibody titer value 1:64 could be represented using the literal "1:64".
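A minimal Python sketch of reading such a literal is given below. The class and function names are hypothetical, and the sketch assumes purely numeric numerators and denominators, whereas the ratio itself is defined over quantities in general.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class Ratio:
    numerator: Decimal = Decimal(1)    # default is the number 1
    denominator: Decimal = Decimal(1)  # default is the number 1


def parse_ratio(literal: str) -> Ratio:
    """Parse 'numerator:denominator'; a missing denominator defaults to 1."""
    numerator_text, colon, denominator_text = literal.partition(":")
    numerator = Decimal(numerator_text)
    denominator = Decimal(denominator_text) if colon else Decimal(1)
    if denominator == 0:
        raise ValueError("the denominator must not be zero")
    # Common factors are deliberately not cancelled out.
    return Ratio(numerator, denominator)
```

Because common factors are never cancelled, parse_ratio("2:128") and parse_ratio("1:64") yield different ratio values in this sketch.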

Definition: An extension of the coded value data type representing a physical quantity using a unit from any code system. Used to show an alternative representation for a physical quantity.

Definition: The magnitude of the measurement value in terms of the unit specified by this code.

Definition: The unit of measure specified in the Unified Code for Units of Measure (UCUM) [http://aurora.rg.iupui.edu/UCUM].

Definition: An alternative representation of the same physical quantity expressed in a different unit, of a different unit code system and possibly with a different value.

Physical quantities semantically are the results of measurement acts. Although physical quantities are represented as pairs of value and unit, semantically, a physical quantity is more than that. To find out whether two physical quantities are equal, it is not enough to compare equality of their two values and units independently. For example, 100 cm equals 1 m although neither values nor units are equal. To define equality we introduce the notion of a canonical form.

Definition: A physical quantity expressed in a canonical unit. In any given unit system, every physical dimension can be assigned one canonical unit. Defining the canonical unit is not the subject of this specification, only asserting that such a canonical unit exists (and can be arbitrarily chosen) for every physical quantity. An abstract physical quantity is equal to its canonical form.

For example, for a unit system based on the Système International (SI) one can define the canonical form as (a) the product of only the base units; (b) without prefixes; where (c) only multiplication and exponents are used (no division operation); and (d) where the seven base units appear in a defined ordering (e.g., m, s, g...) Thus, 1 mm Hg would be expressed as 133322 m^-1 s^-2 g. As can be seen, the rules for building the canonical form of units may be quite complex. However, for the semantic specification it doesn't matter how the canonical form is built, nor what specific canonical form is chosen, only that some canonical form could be defined.

Two physical quantities are equal if both the values and the units of their canonical forms are equal.

Two physical quantities are comparable to each other (and have an ordering and difference) if the units of their canonical forms are equal.
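The role of the canonical form in equality and comparability can be sketched in Python as follows. The conversion table is a tiny hypothetical stand-in for a real unit system, and choosing "m" and "s" as canonical units is an arbitrary assumption of this illustration.

```python
from fractions import Fraction

# Hypothetical table: unit -> (factor to the canonical unit, canonical unit).
_TO_CANONICAL = {
    "m": (Fraction(1), "m"),
    "cm": (Fraction(1, 100), "m"),
    "mm": (Fraction(1, 1000), "m"),
    "s": (Fraction(1), "s"),
    "min": (Fraction(60), "s"),
}


def canonical(value, unit):
    """Return the (value, unit) pair of the canonical form."""
    factor, canonical_unit = _TO_CANONICAL[unit]
    return Fraction(value) * factor, canonical_unit


def pq_equal(a, b) -> bool:
    """Equal if both value and unit of the canonical forms are equal."""
    return canonical(*a) == canonical(*b)


def pq_compares(a, b) -> bool:
    """Comparable if the units of the canonical forms are equal."""
    return canonical(*a)[1] == canonical(*b)[1]
```

With this table, pq_equal((100, "cm"), (1, "m")) holds although neither the values nor the units match literally, and a length is not comparable to a time.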

Definition: A predicate indicating if this value is the number one, i.e., the neutral element of multiplication. There is exactly one physical quantity that has this property and is called the unity.

Algebraic operations are defined for physical quantities because they are characterizing operations in the sense of ISO 11404 and because this specification makes use of them when defining the literal form.

Definition: The product of two physical quantities is the product of their values times the product of their units.

Definition: A PQ value, which, when multiplied with another PQ value yields one (the neutral element of multiplication). Zero (the neutral element of addition) has no inverse element. The quotient of two comparable quantities is comparable to the unity (the unit 1).

Definition: Multiplication with a real number forms a scaled quantity. A scaled quantity is comparable to its original quantity.

If two quantities Q₁ and Q₂ compare each other, there exists a real number r such that r1 = Q₁ / Q₂.

A REAL value can be converted to a PQ value with the unity, i.e., the unit 1 (one). Likewise, a physical quantity that is comparable to the unity can be converted to a real number.

The literal form for a physical quantity is a real number literal followed by optional white space and a character string representing a valid code in the Unified Code for Units of Measure (UCUM) [http://aurora.rg.iupui.edu/UCUM].

Definition: A monetary amount is a quantity expressing the amount of money in some currency. Currencies are the units in which monetary amounts are denominated in different economic regions. While the monetary amount is a single kind of quantity (money) the exchange rates between the different units are variable. This is the principal difference between physical quantity and monetary amounts, and the reason why currency units are not physical units.

Definition: The magnitude of the monetary amount in terms of the currency unit.

The precision attribute of the real number type is the precision of the decimal representation, not the precision of the value. The real number type has no notion of uncertainty or accuracy. For example, "1.99 USD" (precision 3) times 7 is "13.93 USD" (precision 4) and should not be rounded to "13.9" to keep the precision constant.
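The arithmetic in this example can be reproduced with Python's standard decimal module, used here only as an illustration of exact decimal arithmetic in which the number of digits grows with the result instead of being held constant.

```python
from decimal import Decimal

price = Decimal("1.99")  # three significant digits
total = price * 7        # exact decimal product, not rounded back to 3 digits

digits_in_price = len(price.as_tuple().digits)
digits_in_total = len(total.as_tuple().digits)
```

The product is exactly 13.93, with four digits in its representation, matching the example above.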

Two MO values are equal if both their values and their currency units are equal.

Two MO values are comparable to each other (and have an ordering and difference) if their currency units are equal.

If the currencies are not equal, the amounts cannot be compared. Conversion between the currencies is outside the scope of this specification. In practice, foreign exchange rates are highly variable not only over long and short amounts of time, but also depending on location and access to currency trade markets.

Definition: Two monetary amounts can be added if they are denominated in the same currency.

Definition: Multiplication with a real number forms a scaled quantity. A scaled quantity is comparable to its original quantity.

The literal form for a monetary amount consists of the currency code string, optional white space, and REAL literal amount.
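A hedged Python sketch of reading this literal form follows. The function name is hypothetical, and the pattern assumes that currency codes are three upper-case letters (as in ISO 4217); the text above does not state the shape of the currency code string.

```python
import re
from decimal import Decimal

# Assumed shape: three-letter currency code, optional white space, real literal.
_MO_LITERAL = re.compile(
    r"^([A-Z]{3})\s*([+-]?\d+(?:\.\d*)?(?:[eE][+-]?\d+)?)$"
)


def parse_mo(literal: str):
    """Return (currency code, amount) for a monetary amount literal."""
    match = _MO_LITERAL.match(literal)
    if match is None:
        raise ValueError(f"not a monetary amount literal: {literal!r}")
    currency, amount = match.groups()
    return currency, Decimal(amount)
```

For instance, parse_mo("USD 1.99") returns the currency code "USD" together with the amount 1.99, and the white space between the two is optional.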

Definition: A quantity specifying a point on the axis of natural time. A point in time is most often represented as a calendar expression.

Semantically, however, time is independent from calendars and best described by its relationship to elapsed time (measured as a physical quantity in the dimension of time.) A point in time plus an elapsed time yields another point in time. Inversely, a point in time minus another point in time yields an elapsed time.

As nobody knows when time began, a point in time is conceptualized as the amount of time that has elapsed from some arbitrary zero-point, called an epoch. Because there is no absolute zero-point on the time axis, natural time is a difference-scale quantity, where only differences are defined but no ratios. (For example, no point in time is, absolutely speaking, "twice as late" as another point in time.)

Given some arbitrary zero-point, one can express any point in time as an elapsed time measured from that offset. Such an arbitrary zero-point is called an epoch. This epoch-offset form is used as a semantic representation here, without implying that any system would have to implement the TS data type in that way. Systems that do not need to compute distances between points in time will not need any other representation than a calendar expression literal.

Definition: The elapsed time since any constant epoch, measured as a physical quantity in the dimension of time (i.e., comparable to one second.)

It is not necessary for this specification to define a canonical epoch; the semantics is the same for any epoch, as long as the epoch is constant.

Two point-in-time values are equal if and only if their offsets (relative to the same epoch) are equal.
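The epoch-offset view can be illustrated with a small sketch. The class `PointInTime` and the choice of seconds as the unit are assumptions made for the example; the specification does not require any system to store points in time this way.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PointInTime:
    """Hypothetical TS model: seconds elapsed since an arbitrary but constant epoch."""
    offset_s: float

    def plus(self, elapsed_s: float) -> "PointInTime":
        # point in time + elapsed time -> point in time
        return PointInTime(self.offset_s + elapsed_s)

    def minus(self, other: "PointInTime") -> float:
        # point in time - point in time -> elapsed time (in seconds)
        return self.offset_s - other.offset_s
```

Equality here compares only the offsets, which matches the rule that two values are equal if and only if their offsets relative to the same epoch are equal.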

Definition: A code specifying the calendar used in the literal representation of this point in time.⁴⁴

The purpose of this property is mainly to faithfully convey what has been entered or seen by a user in a system originating such a point-in-time value. The calendar property also advises any system rendering a point-in-time value into a literal form of which calendar to use. However, this is only advice; any system that renders point-in-time values to users may choose to use the calendar and literal form demanded by its users rather than the calendar mentioned in the calendar property. Hence, the calendar property is not constant in communication between systems, and the calendar is not part of the equality test.

For the purpose of defining the relationship between calendar expression and epoch/offset form, two private data types, Calendar (CAL) and CalendarCycle (CLCY), are defined. These calendar data types exist only for defining this specification. These private data types may not be used at all outside this specification.

Definition: The number of significant digits of the calendar expression representation.

The precision attribute is only the precision of a decimal digit representation, not the accuracy of the point in time value.

The purpose of the precision property for the point in time data type is to faithfully capture the whole information presented to humans in a calendar expression. The number of digits shown conveys information about the uncertainty (i.e., precision and accuracy) of a measured point in time.

The precision property is dependent on the calendar. A given precision value relative to one calendar does not mean the same in another calendar with different periods.

For example "20000403" has 8 significant digits in the representation, but the uncertainty of the value may be in any digit shown or not shown, i.e., the uncertainty may be to the day, to the week, or to the hour. Note that external representations should adjust their representational precision with the uncertainty of the value. However, since the precision in the digit string depends on the calendar and is granular to the calendar periods, uncertainty may not fall into that grid (e.g., 2000040317 is an adequate representation for the value between 2000040305 and 2000040405.)

Definition: The difference between the local time in that time zone and Universal Coordinated Time (UTC, formerly called Greenwich Mean Time, GMT). The time zone is a physical quantity in the dimension of time (i.e., comparable to one second.) A zero time zone value specifies UTC. The time zone value does not permit conclusions about the geographical longitude or a conventional time zone name.

For example, 200005121800-0500 may be eastern standard time (EST) in Indianapolis, IN, or central daylight savings time (CDT) in Decatur, IL. Furthermore, in other countries at other latitudes, time zones with the same offset may be named differently.

When the time zone is NULL (unknown), "local time" is assumed. However, "local time" is always local to some place, and without knowledge of that place, the time zone is unknown. Hence, a local time cannot be converted into UTC. The time zone should be specified for all point in time values in order to avoid a significant loss of precision when points in time are compared. The difference of two local times where the locality is unknown has an error of ±12 hours.

In administrative data context, some time values do not carry a time zone. For a date of birth in administrative data, for example, it would be incorrect to specify a time zone, since this may effectively change the date of birth when converted into other time zones. For such administrative data the time zone is NULL (not applicable.)

Definition: A point in time plus an elapsed time (i.e., physical quantity in the dimension of time) is a point in time.

Point-in-time literals are simple calendar expressions, as defined by the calendar definition table. By default, the western (Gregorian) calendar shall be used (Table 37).

For the default Gregorian calendar the calendar expression literals of this specification conform to the constrained ISO 8601 that is defined in ISO 8824 (ASN.1) under clause 32 (generalized time) and to the HL7 version 2 TS data format.

Calendar expression literals are sequences of integer numbers ordered according to the "Counter/ord." column of Table 37. Periods with lower order numbers stand to the left of periods with higher order numbers. Periods with no assigned order number cannot occur in the calendar expression for points in time.

The "Counter/digits" column of Table 37 specifies the exact number of digits for the counter number for any period.

Thus, Table 37 specifies that western calendar expressions begin with the 4-digit year (beginning counting at zero); followed by the 2-digit month of the year (beginning counting at one); followed by the 2-digit day of the month (beginning with one); followed by the 2-digit hour of the day (beginning with zero); and so forth. For example, "200004010315" is a valid expression for April 1, 2000, 3:15 am.

A calendar expression can be of variable precision, omitting parts from the right.
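As a rough illustration of the rules above, the following sketch splits a digits-only Gregorian calendar expression into its counters. The function name and the `FIELDS` table are hypothetical; the year, month, day and hour widths come from the text, while the 2-digit minute and second widths are an assumption, since Table 37 is not reproduced here. Fractions, calendar prefixes and time zone suffixes are deliberately not handled.

```python
# Digit counts per period; minute and second widths are assumed to be 2 digits.
FIELDS = [("year", 4), ("month", 2), ("day", 2),
          ("hour", 2), ("minute", 2), ("second", 2)]


def parse_calendar_expression(literal: str) -> dict:
    """Split a digits-only Gregorian calendar expression into its cycle counters."""
    if not literal.isdigit():
        raise ValueError("only plain digit strings are handled in this sketch")
    parts = {}
    pos = 0
    for name, width in FIELDS:
        if pos == len(literal):
            break  # lower-order periods omitted from the right: reduced precision
        chunk = literal[pos:pos + width]
        if len(chunk) != width:
            raise ValueError("incomplete %s field" % name)
        parts[name] = int(chunk)
        pos += width
    if pos != len(literal):
        raise ValueError("too many digits")
    return parts
```

Applied to "200004010315", the sketch yields year 2000, month 4, day 1, hour 3 and minute 15; applied to "2000", it yields only the year.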

The least defined calendar period may be written as a real number, with the number of integer digits specified, followed by the decimal point and any number of fractional digits.

When other calendars will be used in the future, a prefix "GREG:" can be placed before the western (Gregorian) calendar expression to disambiguate from other calendars. Each calendar shall have its own prefix. However, the western calendar is the default if no prefix is present.

In the modern Gregorian calendar (and all calendars where time of day is based on UTC), the calendar expression may contain a time zone suffix. The time zone suffix begins with a plus (+) or minus (-) followed by digits for the hour and minute cycles. UTC is designated as offset "+00" or "-00"; the ISO 8601 and ISO 8824 suffix "Z" for UTC is not permitted.

Definition: A calendar is a concept of measuring time in various cycles. Such cycles are years, months, days, hours, minutes, seconds, and weeks. Some of these cycles are synchronized and some are not (e.g., weeks and months are not synchronized.)

After "rolling the time axis" into these cycles (See ) a calendar expresses a point in time as a sequence of integer counts of cycles, e.g., for year, month, day, hour, etc. The calendar is rooted in some conventional start point, called the "epoch."

Calendar is defined as a set of calendar cycles, and has a name and a code. The head of the Calendar is the largest CalendarCycle, appearing left-most in the calendar expression. The epoch is the beginning of that calendar, i.e., the point in time where all calendar cycles are zero.

The calendar definition can be shown as in Table 37 for the modern Gregorian calendar. The calendar definition table lists a calendar cycle in each row. The calendar units are dependent on each other and defined in the value column. The sequence column shows the relationship through the next property. The other columns are as in the formal calendar cycle definition.⁴⁵

Definition: A calendar cycle defines one group of decimal digits in the calendar expression. Examples for calendar cycles are year, month, day, hour, minute, second, and week.

A calendar cycle has a name and two codes, a one-letter code and a two-letter code. The property ndigits is the number of decimal digits occupied in the calendar expression. The property start specifies where counting starts (i.e., at 0 or 1.) The next property is the next lower cycle in the order of the calendar expression. The max(t) property is the maximum number of cycles at time t (max depends on the time t to account for leap years and leap seconds.) The property value(t) is the integer number of cycles shown in the calendar expression of time t. The property sum(t, n) is the sum of n calendar cycles added to the time t.

This section defines data types that can "collect" other data values, Set, Sequence, Bag and Interval.⁴⁶ These collection types are defined as generic (parameterized) types. The concept of generic types is described in (§ ).

Definition: A value that contains other distinct values in no particular order.

Definition: A relation of the set with its elements, true if the given value is an element of the set.

This is the primitive semantic property of a set, based on which all other properties are defined.

A set may only contain distinct non-NULL elements. Exceptional values (NULL-values) cannot be elements of a set.

Definition: The relation between a set and its subsets, where each element in the subset is also an element of the superset.

Definition: A predicate indicating that this set has no elements (negation of the SET.nonEmpty predicate). The empty set is a proper set value, not an exceptional (NULL) value.

Definition: The cardinality of a set is the number of distinct elements in the set.

The cardinality definition is not sufficient since it doesn't converge for uncountably infinite sets (REAL, PQ, etc.) and it doesn't terminate for infinite sets. In addition, the definition of integer number type in this specification is incomplete for these cases, as it doesn't account for infinities. Finally the cardinality value is an example where it would be necessary to distinguish the cardinality ℵ₀ (aleph₀) of countably infinite sets (e.g., INT) from ℵ₁ (aleph₁), the cardinality of uncountable sets (e.g., REAL, PQ).

Definition: A union of two sets (component sets) is a set where each of the union's elements is also an element of at least one of the component sets.

Definition: The difference of this set and its subtracting set is the set that contains all elements of this set that are not elements of the subtracting set.

Definition: The difference between this set and an element value is the set that contains all elements of this set except for the subtracting element value. If the element value is not contained in this set, the difference is equal to this set.

Definition: The intersection between two sets is a set containing all and only those elements that are contained in both of the operand sets.

When the element type T has a literal form, the set of T elements has a literal form, wherein the elements of the set are enumerated within curly braces and separated by semicolon characters.

A data value of type T can be promoted into a trivial set of T with that data value as its only element.

Sets of quantities may be totally ordered sets when there is an order relationship defined between any two elements in the set. Note that "ordered set" does not mean the same as Sequence (LIST). For example, the set {3; 2; 4; 88; 1} is an ordered set. The ordering of the elements in the set notation is still irrelevant, but elements can be compared to establish an order (1; 2; 3; 4; 88).

Totally ordered sets have a convex hull. A convex hull of a totally ordered set S is the smallest interval that is a superset of S. This concept is going to be important later on.

Note that hull is defined if and only if the actual set is a totally ordered set. The data type of the elements itself need not be totally ordered. For example, the data type PQ is only partially ordered (since only quantities of the same kind can be compared), but a SET<PQ> may still be totally ordered (if it contains only comparable quantities.) For example, the convex hull of {4 s; 20 s; 55 s} is [4 s;55 s]; the convex hull of {"apples"; "oranges"; "bananas"} is undefined because the elements have no order relationship among them; and the convex hull of {2 m; 4 m; 8 s} is likewise undefined, because it is not totally ordered (seconds are not comparable with meters.)
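The hull examples can be reproduced with a short sketch. It models each PQ value as a hypothetical (magnitude, unit) pair and approximates "totally ordered" as "all units are identical", since no unit conversion is attempted; returning `None` for an undefined hull (including for the empty set) is a choice made for this example only.

```python
def convex_hull(values):
    """Return ((low, unit), (high, unit)) for the smallest closed interval covering values.

    Returns None when the hull is undefined in this simplified model.
    """
    values = set(values)
    if not values:
        return None
    units = {unit for _, unit in values}
    if len(units) != 1:
        return None  # e.g. meters and seconds are not comparable
    magnitudes = [magnitude for magnitude, _ in values]
    unit = units.pop()
    return (min(magnitudes), unit), (max(magnitudes), unit)
```

For the set of 4 s, 20 s and 55 s this returns the boundaries 4 s and 55 s; for the set of 2 m, 4 m and 8 s it returns `None`.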

Definition: A value that contains other discrete values in a defined sequence.

Definition: The first item in this sequence. This is a definitional property for the semantics of the sequence.

Definition: The sequence following the first item in this sequence. This is a definitional property for the semantics of the sequence.

Definition: A predicate that is true if this sequence is an empty sequence, i.e., if it contains no items.

Notice the difference between empty-sequence and NULL: an empty sequence is a proper sequence, not a null-value.

Notice that head and tail being NULL is a necessary but not a sufficient condition for an empty list. Since a sequence may contain NULL-values as items, this condition can also mean that the list has only a head item that happens to be NULL.

Definition: A predicate that is true if this sequence is non-empty. Negation of LIST.isEmpty.

Definition: The item at the given sequential position (index) in the sequence. The index zero refers to the first element (head) of the sequence.

Definition: A predicate that is true if this sequence contains the given item value.

Definition: The number of elements in the sequence. NULL elements are counted as regular sequence elements.

Two lists are equal if and only if they are both empty, or if both their head and their tail are equal.

When the element type T has a literal form, the sequence LIST<T> has a literal form. List elements are enumerated, separated by semicolon, and enclosed in parentheses.

A data value of type T can be promoted into a trivial sequence of T with that data value as its only item.

Definition: A periodic or monotone sequence of values generated from a few parameters, rather than being enumerated. Used to specify regular sampling points for biosignals.

The item at a certain index in the list is calculated by performing an integer division of the index (i) by the GLIST.denominator (d) and then taking the remainder of that value with respect to the GLIST.period (p). This value is multiplied by the GLIST.increment (Δx) and added to the GLIST.head (x₀).

Definition: The difference between one value and its previous different value. For example, to generate the sequence (1; 4; 7; 10; 13; ...) the increment is 3; likewise to generate the sequence (1; 1; 4; 4; 7; 7; 10; 10; 13; 13; ...) the increment is also 3.

Definition: If non-NULL, specifies that the sequence alternates, i.e., after this many increments, the sequence item values roll over to start from the initial sequence item value. For example, the sequence (1; 2; 3; 1; 2; 3; 1; 2; 3; ...) has period 3; the sequence (1; 1; 2; 2; 3; 3; 1; 1; 2; 2; 3; 3; ...) also has period 3.

The period allows the same sample space to be sampled repeatedly. The "waveform" of this periodic generator is always a "saw", just like the x-function of your oscilloscope.⁴⁷

Definition: The integer by which the index for the sequence is divided, effectively the number of times the sequence generates the same sequence item value before incrementing to the next sequence item value. For example, to generate the sequence (1; 1; 1; 2; 2; 2; 3; 3; 3; ...) the denominator is 3.

The use of the denominator is to allow multiple generated sequences to periodically scan a multidimensional space. For example, an (abstract) TV screen uses 2 such generators for the columns and rows of pixels. For instance, if there are 200 scan lines and 320 raster columns, the column-generator would have denominator 1 and the line-generator would have denominator 320.
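The item rule for generated sequences can be written compactly as head + ((i div denominator) mod period) × increment. The sketch below is one possible reading of that rule; the function name `glist_item` and its parameter defaults (denominator 1, no period) are assumptions made for the example.

```python
from typing import Optional


def glist_item(i: int, head: float, increment: float,
               denominator: int = 1, period: Optional[int] = None) -> float:
    """Item at index i of a generated sequence, following the rule in the text."""
    step = i // denominator      # integer division by the denominator
    if period is not None:
        step %= period           # roll over after `period` increments
    return head + step * increment


# The abstract TV screen: 320 columns per line, 200 lines.
def pixel_position(i: int):
    column = glist_item(i, 0, 1, denominator=1, period=320)
    line = glist_item(i, 0, 1, denominator=320, period=200)
    return column, line
```

With head 1 and increment 3, the first five items are 1, 4, 7, 10 and 13; adding a denominator of 2 repeats each value twice. In the TV example, index 321 maps to column 1 of line 1.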

Definition: A sequence of sampled values scaled and translated from a list of integer values. Used to specify sampled biosignals.

The item at a certain index (i) in the list is calculated by multiplying the item at the same index in the SLIST.digits sequence (dᵢ) by the SLIST.scale (s) and then adding that value to the SLIST.origin (x₀).

Definition: A sequence of raw digits for the sample values. This is typically the raw output of an A/D converter.
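The scaling and translation of raw digits can be sketched in a few lines. The function name `slist_values` and the sample numbers are made up for illustration; only the rule "origin plus digit times scale" comes from the text.

```python
def slist_values(digits, scale, origin):
    """Convert raw integer digits to sample values: origin + digit * scale."""
    return [origin + d * scale for d in digits]


# Hypothetical raw A/D output with a scale of 0.5 and an origin of -1.0.
samples = slist_values([0, 512, 1023], scale=0.5, origin=-1.0)
```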

Definition: An unordered collection of values, where each value can be contained more than once in the bag.

This is the primitive semantic property of a bag, based on which all other properties are defined.

Definition: A predicate indicating that this bag has no elements (negation of the BAG.nonEmpty predicate). The empty bag is a proper bag value, not an exceptional (NULL) value.

Definition: A bag that contains all items of the operand bags, i.e. the number of items of each item value are added.

Definition: A bag that contains all items of this bag (minuend) diminished by the items in the other bag (subtrahend). Bags cannot carry deficits. When the subtrahend contains more items of one value than the minuend, the difference contains zero items of that value.
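Python's `collections.Counter` happens to behave like the bag described here, so it can serve as an illustration; it is not the specification's representation. Its `+` operator adds the counts per item value, and its `-` operator drops any count that would fall to zero or below, which mirrors the rule that bags cannot carry deficits.

```python
from collections import Counter

minuend = Counter({"a": 3, "b": 1})
subtrahend = Counter({"a": 1, "b": 2, "c": 1})

# Sum: the number of items of each value is added.
bag_sum = minuend + subtrahend

# Difference: counts that would go negative are dropped, not stored as deficits.
bag_difference = minuend - subtrahend
```

Here the sum holds four "a", three "b" and one "c", while the difference holds only two "a".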

A data value of type T can be promoted into a trivial bag of type T with that data value as its only item.

Any ordered type can be the basis of an interval; it does not matter whether the base type is discrete or continuous. If the base data type is only partially ordered, all elements of the interval must be elements of a totally ordered subset of the partially ordered data type.

For example, physical quantities are considered ordered. However, the ordering of physical quantities is only partial; a total order is only defined among comparable quantities (quantities of the same physical dimension.) While an interval between 2 and 4 meters exists, there is no interval between 2 meters and 4 seconds.

Intervals are sets and have all the properties of sets. However, union and differences of intervals may not be intervals any more, since the elements of these union and difference sets might not be contiguous. Intersections of intervals are always intervals.

Definition: The difference between high and low boundary. The purpose of distinguishing a width property is to handle all cases of incomplete information symmetrically. In any interval representation only two of the three properties high, low, and width need to be stated and the third can be derived.

When both boundaries are known, width can be derived as high minus low. When one boundary and the width are known, the other boundary is also known. When no boundary is known, the width may still be known. For example, one knows that an activity takes about 30 minutes, but one may not yet know when that activity is started.
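The derivation of the third property from the other two can be sketched as follows. The function name is hypothetical, plain numbers stand in for the boundary type, and `None` stands for an unknown (NULL) property. If all three are supplied, this sketch simply recomputes the width from the boundaries.

```python
def complete_interval(low=None, high=None, width=None):
    """Derive the missing one of low, high and width when the other two are known."""
    if low is not None and high is not None:
        width = high - low
    elif low is not None and width is not None:
        high = low + width
    elif high is not None and width is not None:
        low = high - width
    # With fewer than two known properties nothing can be derived.
    return low, high, width
```

An activity known only to take 30 minutes stays as a width without boundaries until a start or end becomes known.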

Note that the data type of the width is not always the same as for the boundaries. For ratio scale quantities (REAL, PQ, MO) it is the same. For difference scale quantities (e.g., TS) it is the data type of the difference (e.g., PQ in the dimension of time for TS). For discrete elements (INT) the width may be a REAL indicating the number of elements in the interval divided by 2.

Definition: The arithmetic mean of the interval (low plus high divided by 2). The purpose of distinguishing the center as a semantic property is for conversions of intervals from and to point values.

Note that a center doesn't always exist for every interval. Notably intervals that are infinite on one side do not have a center. Also intervals of discrete base types with an even number of elements do not have a center. If an interval is unknown on one (or both) boundaries, the center can still be asserted. In fact, the main use case for the center is to be asserted when no boundary is known.

Definition: Specifies whether the low limit is included in the interval (interval is closed) or excluded from the interval (interval is open).

Definition: Specifies whether the high limit is included in the interval (interval is closed) or excluded from the interval (interval is open).

The literal form for the interval data type is defined such that it is as intuitive to humans as possible. Five different forms are defined:⁴⁸

A quantity of type T can be promoted into a trivial interval of T where low and high boundaries are equal and boundaries closed.

An interval of T can be demoted to a simple quantity of type T that is representative for the whole interval. If both boundaries are finite, this is the IVL.center. If one boundary is infinite, the representative value is the other boundary. If both boundaries are infinite, the conversion to a point value is not applicable.
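The demotion rule can be illustrated with real-number boundaries, using floating-point infinities for infinite boundaries. The function name `demote` and the use of `None` for "not applicable" are assumptions of this sketch.

```python
import math


def demote(low: float, high: float):
    """Representative point value for the interval with the given boundaries."""
    low_infinite = math.isinf(low)
    high_infinite = math.isinf(high)
    if low_infinite and high_infinite:
        return None          # conversion to a point value is not applicable
    if low_infinite:
        return high          # only the high boundary is finite
    if high_infinite:
        return low           # only the low boundary is finite
    return (low + high) / 2  # both finite: the center
```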

Definition: A convex hull or "interval hull" of two intervals is the least interval that is a superset of its operands. This concept will play an important role later on.

An interval of physical quantities is constructed from the generic interval type. However, recognizing that the unit can be factored from the boundaries, we add additional semantics and a separate literal form. The additional view of an interval of physical quantities is an interval of real numbers with one unit.

The special literal form is simply an interval of real numbers, a space, and the unit.

For example: "[0;5] mmol/L" or "<20 mg/dL" are valid literal forms of intervals of physical quantities. The generic interval form, e.g., "[50 nm; 2 m]" is also allowed.

The generic interval data type defines the interval of points in time too. However, there are some special considerations about literal representations and conversions of intervals of point in time, which are specified in this section.

A TS can be promoted to an IVL<TS> whereby the low boundary is the TS value itself, and the width is inferred from the precision of the TS and the duration of the least significant calendar period specified. The high boundary is open. For example, the TS literal "200009" is converted to an IVL<TS> with low boundary 200009 and width 30 days, which is the interval "[200009;200010[".
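The promotion can be sketched with Python's `datetime` module. The function name `promote_ts` is hypothetical, only digits-only Gregorian literals of 4 to 14 digits are handled, the minute and second widths are assumed to be 2 digits, and time zones are ignored.

```python
from datetime import datetime, timedelta


def promote_ts(literal: str):
    """Return (low, high) datetimes for the half-open interval [low; high[."""
    n = len(literal)
    if n not in (4, 6, 8, 10, 12, 14) or not literal.isdigit():
        raise ValueError("unsupported literal in this sketch")
    year = int(literal[0:4])
    month = int(literal[4:6]) if n >= 6 else 1
    day = int(literal[6:8]) if n >= 8 else 1
    hour = int(literal[8:10]) if n >= 10 else 0
    minute = int(literal[10:12]) if n >= 12 else 0
    second = int(literal[12:14]) if n >= 14 else 0
    low = datetime(year, month, day, hour, minute, second)
    if n == 4:
        high = low.replace(year=year + 1)
    elif n == 6:
        # First day of the following month, rolling over into the next year after December.
        high = low.replace(year=year + month // 12, month=month % 12 + 1)
    else:
        step = {8: timedelta(days=1), 10: timedelta(hours=1),
                12: timedelta(minutes=1), 14: timedelta(seconds=1)}[n]
        high = low + step
    return low, high
```

For "200009" the sketch returns September 1, 2000 and October 1, 2000, a width of 30 days.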

In order to avoid syntactic conflicts with the timezone and slightly different usage profiles of the ISO 8601 that occur on some ITS platforms, the dash form of the interval is not permitted for IVL<TS>. The interval-form using square brackets is preferred.

The "hull-form" of the literal is defined as the convex hull (see IVL.hull) of interval-promotions from two time stamps.

For example, "19870901..19870930" is a valid literal using the hull form. The value is equivalent to the interval form "[19870901;19871001[". ⁵⁰

The hull-form further allows an abbreviation, where the higher timestamp literal does not need to repeat digits on the left that are the same as for the lower timestamp literal. The two timestamps are right-aligned and the digits to the left copied from the lower to the higher timestamp literal. This is a simple string operation and is not formally defined here.

Example: May 12, 1987 to May 23, 1987 is "19870512..23". However, note that May 12, 1987 to June 2, 1987 is "19870512..0602", and not "19870512..02".
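Since the abbreviation is described as a simple string operation, it can be illustrated directly. The function name `expand_hull_form` is hypothetical, and the sketch assumes digits-only timestamps without fractions or time zone suffixes.

```python
def expand_hull_form(literal: str):
    """Split 'low..high' and copy the missing leading digits from low into high."""
    low, high = literal.split("..")
    if len(high) < len(low):
        # Right-align the two timestamps and fill the gap from the lower one.
        high = low[:len(low) - len(high)] + high
    return low, high
```

Expanding "19870512..23" gives the pair "19870512" and "19870523"; an unabbreviated literal is returned unchanged.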

Generic type extensions are generic types with one parameter type, and that extend (specialize) their parameter type. In the formal data type definition language, generic type extensions follow the pattern: template<ANY T> type GenericTypeExtensionName extends T { ... }; These generic type extensions inherit most properties of their base type and add some specific feature to it. The generic type extension is a specialization of the base type, thus a value of the extension data type can be used instead of its base data type.

Definition: A generic data type extension that tags a time range to any data value of any data type. The time range is the time in which the information represented by the value is (was) valid.

If the base type T does not possess a valid time property, the HXIT adds that property to the base type. If, however, the base type T does have a valid time property, that property can be mapped to the valid time property of the HXIT.⁵²

Definition: The time interval during which the given information was, is, or is expected to be valid. The interval can be open or closed, and infinite or undefined on either side.

Definition: A set of data values that conform to the history item (HXIT) type, (i.e., that have a valid-time property). The history information is not limited to the past; expected future values can also appear.

The semantics does not principally forbid the time intervals to overlap. However, if two history items have the same low (high) boundary in the valid time interval, it is undefined which one is considered the earliest (latest).

Definition: The item in the set whose valid time's low boundary (validity start time) is less than or equal to (i.e., at or before) that of any other history item in the set.

Definition: The item in the set whose valid time's high boundary (validity end time) is greater than or equal to (i.e., at or after) that of any other history item in the set.

A type conversion between an entire history HIST<T> and a single history item HXIT<T>. This conversion takes the latest data from the history.

The purpose of this conversion is to allow an information producer to produce a history of any value instead of sending just one value. An information-consumer, who does not expect a history but a simple value, will convert the history to the latest value.
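A consumer-side conversion might look like the following sketch. The `HistoryItem` class, its field names and the use of numeric offsets for the valid-time boundaries are all assumptions; when several items share the same boundary, the text leaves the choice undefined, and this sketch simply returns the first such item it finds.

```python
from dataclasses import dataclass
from typing import Any, List


@dataclass
class HistoryItem:
    """Hypothetical HXIT<T>: a value tagged with its valid-time boundaries."""
    value: Any
    valid_low: float   # validity start, as an offset from some fixed epoch
    valid_high: float  # validity end; float("inf") for an interval with no end


def earliest(history: List[HistoryItem]) -> HistoryItem:
    return min(history, key=lambda item: item.valid_low)


def latest(history: List[HistoryItem]) -> HistoryItem:
    return max(history, key=lambda item: item.valid_high)
```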

Note from the definition of history item (HXIT) that HXIT semantically extends T. This means that the information-consumer expecting a T but given an HXIT extension of T will not recognize any difference (substitutability of specializations.)

Definition: A generic data type extension used to specify a probability expressing the information producer's belief that the given value holds.

How the probability number was arrived at is outside the scope of this specification.

Probabilities are subjective and (as any data value) must be interpreted in their individual context. For example, when new information is found, the probability might change. Thus, for any message (document, or other information representation) the information, and particularly the probabilities, reflect what the information producer believed was appropriate for the purpose and at the time the message (document) was created.

For example, at the beginning of the 2000 baseball season (May), the Las Vegas odds makers may have given the New York Yankees a probability of 1 in 10 (0.100) of winning the World Series. At the time of this writing, the Yankees and Mets have won their respective pennants, but the World Series has yet to begin. The probability of the Yankees winning the World Series is obviously significantly greater at this point in time, perhaps 6 in 10 (0.600). The context, and in particular the time of year, made all the difference in the world.

Since probabilities are subjective measures of belief, they can be stated without being "correct" or "incorrect" per se, let alone "precise" or "imprecise". Notably, one does not have to conduct experiments to measure a frequency of some outcome in order to specify a probability. In fact, whenever statements about individual people or events are made, it is not possible to confirm such probabilities with "frequentist" experiments.

Returning to our example, the Las Vegas odds makers can not insist on the Yankees and Mets playing 1000 trial games prior to the Series; even if they could, they would not have the fervor of the real Series and therefore not be accurate. Instead, the odds makers must derive the probability from past history, player statistics, injuries, etc.

The type T is not formally constrained. In theory, discrete probabilities can only be stated for discrete data values. Thus, generally UVP should not be used with REAL, PQ, or values.

Definition: The probability assigned to the value, a decimal number between 0 (very uncertain) and 1 (certain).

There is no "default probability" that one can assume when the probability is unstated. Therefore, it is impossible to make any semantic difference between a UVP of T without probability and a simple T. UVP of T does not mean "uncertain", and a simple T does not mean "certain". In fact, the probability of the UVP could be 0.999 or 1, which is quite certain, where a simple T value could be a very vague guess.

Definition: A set of uncertain values with probabilities (also known as histogram.) All the elements in the set are considered alternatives and are rated each with its probability expressing the belief (or frequency) that each given value holds.

The purpose of the non-parametric probability distribution is chiefly to support statistical data reporting as it occurs in measurements taken from many subjects and consolidated in a histogram. This occurs in epidemiology, veterinary medicine, laboratory medicine, but also in cost controlling and business process engineering.

Semantically, the information of a stated value exists in contrast to the complement set of unstated possible values. Thus, semantically, a non-parametric probability distribution contains all possible values and assigns probabilities to each of them.

This example illustrates the probability of selected major league baseball teams winning the World Series (prior to the season start). Each team is mutually exclusive, and were we to include all of the teams, the sum of the probabilities would equal 1 (i.e., it is certain that one of the teams will win).

Just as with UVP, the type T is not formally constrained, even though there are reasonable and unreasonable uses. Typically one would use the NPPD for unordered types, if only a "small" set of possible values is assigned explicit probabilities, or if the probability distribution cannot (or should not) be approximated with parametric methods. For other cases, one may prefer PPD.

Definition: A generic data type extension specifying uncertainty of quantitative data using a distribution function and its parameters. Aside from the specific parameters of the distribution, a mean (expected value) and standard deviation are always given to help maintain a minimum layer of interoperability if receiving applications cannot deal with a certain probability distribution.

For example, the most common college entrance exam in the United States is the SAT, which is comprised of two parts: verbal and math. Each part has a minimum score of 400 (no questions answered correctly) and a perfect score of 800. In 1998, according to the College Board, 1,172,779 college-bound seniors took the test. The mean score for the math portion of the test was 512, and the standard deviation 112. These parameter values (512, 112), tagged as the normal distribution parameters, paint a pretty good picture of test score distribution. In most cases, there is no need to specify all 1-million+ points of data when just 2 parameters will do!

Note that the normal distribution is only one of several distributions defined for HL7.

Since a PPD extends its parameter type T, a simple T value is the mean (expected value or first moment) of the probability distribution. Applications that cannot deal with distributions will take the simple T value neglecting the uncertainty. That simple value of type T is also used to standardize the data for computing the distribution.

Probability distributions are defined over integer or real numbers and normalized to a certain reference point (typically zero) and reference unit (e.g., standard deviation = 1). When other quantities defined in this specification are used as base types, the mean and the standard deviation are used to scale the probability distribution. For example, if a PPD of PQ for a length is given with mean 20 ft and a standard deviation of 2 in, the normalized distribution function f(x) that maps a real number x to a probability density would be translated to f′(x′) that maps a length x′ to a probability density as f′(x′) = f((x′ - μ ) / σ).
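The translation from the normalized function f to f′ can be illustrated for the normal distribution. The function names are hypothetical and the length example is expressed in inches (20 ft = 240 in) so that the mean and the standard deviation share one unit. The sketch follows the formula as written; it does not add the 1/σ factor that a density normalized over x′ would carry.

```python
import math


def standard_normal_density(z: float) -> float:
    """Normalized distribution function f: mean 0, standard deviation 1."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def scaled_density(x: float, mean: float, std_dev: float,
                   f=standard_normal_density) -> float:
    """f'(x') = f((x' - mean) / std_dev), as written in the text."""
    return f((x - mean) / std_dev)


# Mean 20 ft (240 in) with a standard deviation of 2 in, evaluated at 242 in.
value_at_one_sigma = scaled_density(242.0, mean=240.0, std_dev=2.0)
```

Evaluating at 242 in is the same as evaluating the normalized function one standard deviation above its mean.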

Where applicable, the PPD specification conforms to the ISO Guide to the Expression of Uncertainty in Measurement (GUM) as reflected by NIST publication 1297 Guidelines for Evaluating and Expressing the Uncertainty of NIST Measurement Results. The PPD specification does not describe how uncertainty is to be evaluated but only how it is expressed. The concept of "standard uncertainty" as set forth by the ISO GUM corresponds to the "standard deviation" property of the PPD.

Definition: The primary measure of variance/uncertainty of the value (the square root of the mean of the squared differences between the data points and the mean). The standard deviation is used to normalize the data for computing the distribution function. Applications that cannot deal with probability distributions can still get an idea about the confidence level by looking at the standard deviation.

The standard deviation of a probability distribution over a type T is of the related type T.diff that expresses differences between values of type T. If T is REAL or INT, T.diff is also REAL or INT respectively. However if T is a point in time (TS), T.diff is a physical quantity (PQ) in the dimension of time.

Definition: A code specifying the type of probability distribution. Possible values are as shown in the attached table. The NULL value (unknown) for the type code indicates that the probability distribution type is unknown. In that case, the standard deviation has the meaning of an informal guess.

Table 44 lists the defined probability distributions. Many distribution types are defined in terms of special parameters (e.g., the parameters α and β for the γ-distribution, number of degrees of freedom for the t-distribution, etc.) For all distribution types, however, the mean and standard deviation are defined.

The three distribution-types unknown (NULL), uniform and normal must be supported by every system that claims to support PPD. All other distribution types are optional. When a system interpreting a PPD representation encounters a distribution type that it does not recognize, it maps this type to the unknown (NULL) distribution-type.

The parametric probability distribution has a literal form. The general syntax is as follows:

Examples: an example for a PPD<REAL> is "1.23(N0.005)" for a normal distribution of a real number around 1.23 with a standard deviation of 0.005. An example for a PPD<PQ> is "1.23 m (5 mm)" for a distribution of unknown type around the length 1.23 meters with a standard deviation of 5 millimeters. An example for a PPD<TS> is "2000041113(U4 h)" for a uniform distribution around April 11, 2000 at 1 PM with a standard deviation of 4 hours.

The parametric probability distribution of real numbers is fully defined by the generic data type.

However, there are some special considerations about literal representations and conversions of probability distributions over real numbers, which are specified in this section.

When converting a REAL into a PPD<REAL>, the standard deviation is calculated from the REAL value's order of magnitude and precision (number of significant digits). Let x be a real number with precision n. We can determine the order of magnitude e of x as e = log10 |x| where e is rounded to the next integer that is closer to zero (special case: if x is zero, e is zero.) The value of the least significant digit l is then l = 10^(e-n) and the standard deviation σ is σ = l / 2.

Besides the generic literal form of the PPD, a concise literal form is defined for PPD over real numbers. This concise literal form is defined such that the standard deviation can be expressed in terms of the least significant digit in the mantissa. This literal is defined as an extension of the REAL literal:

Examples: "1.23e-3 (U5e-6)" is the uniform distribution around 1.23 × 10^-3 with 5 × 10^-6 standard deviation in generic literal form. "1.230(U5)e-3" is the same value in concise literal form.

A parametric probability distribution over physical quantities is constructed from the generic PPD type. However, recognizing that the unit can be factored from the boundaries, we add additional semantics and a separate literal form. The additional view of a probability distribution over physical quantities is a probability distribution over real numbers with one unit.

A concise literal form for probability distributions of physical quantities is defined based on the concise literal form of PPD<REAL> where REAL is the value. This literal is defined as an extension of the PQ literal.

Examples: "1.23e-3 m (N5e-6 m)" is the normal-distributed length of 1.23 × 10^-3 m with 5 × 10^-6 m standard deviation in generic literal form. "1.230(N5)e-3 m" is the same value in concise literal form. "1.23e-3(N0.005e-3) m" is also valid; it is the concise literal form for PPD<PQ> combined with the generic literal form for PPD<REAL>.

The parametric probability distribution over time points is fully defined by the generic data type.

The standard deviation is of type TS.diff, which is a duration (a physical quantity in the dimension of time.)

When converting a TS into a PPD<TS>, the standard deviation is calculated from the TS value's order of magnitude and precision (number of significant digits) such that two standard deviations span the maximal time range of the digits not specified. For example, in 20000609 the unspecified digits are hour of the day and lower. All these digits together span a duration of 24 hours, and thus the standard deviation σ is σ = 12 h, covering the range from 20000609000000.0000... up to 20000609999999.9999... (= 20000610).

This rule is different from real numbers in that the range of uncertainty lies above the time value specified. This is to go with the common sense judgment that June 9th spans all day of June 9th with noon as the center, not midnight.
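The conversion rule for time points can be sketched as follows. This is a minimal illustration, assuming TS literals of day, hour, minute, or second precision only (year and month precision are left out because their spans vary with the calendar); the function name ts_to_ppd and the lookup tables are hypothetical.

```python
from datetime import datetime, timedelta

# Span of the unspecified digits, keyed by the length of the TS literal.
UNSPECIFIED_SPAN = {
    8: timedelta(days=1),      # YYYYMMDD
    10: timedelta(hours=1),    # YYYYMMDDHH
    12: timedelta(minutes=1),  # YYYYMMDDHHMM
    14: timedelta(seconds=1),  # YYYYMMDDHHMMSS
}
FORMATS = {8: "%Y%m%d", 10: "%Y%m%d%H", 12: "%Y%m%d%H%M", 14: "%Y%m%d%H%M%S"}

def ts_to_ppd(literal: str):
    """Return (mean, standard deviation) for a TS literal of limited precision."""
    span = UNSPECIFIED_SPAN[len(literal)]
    low = datetime.strptime(literal, FORMATS[len(literal)])
    std_dev = span / 2    # two standard deviations span the unspecified range
    mean = low + std_dev  # the range of uncertainty lies above the stated value
    return mean, std_dev

mean, std_dev = ts_to_ppd("20000609")
```

For "20000609" this yields noon of June 9, 2000 as the center and a standard deviation of 12 hours, matching the example in the text.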

The timing specification suite of data types is used to specify the complex timing of events and actions such as those that occur in order management and scheduling systems. It also supports the cyclical validity patterns that may exist for certain kinds of information, such as phone numbers (evening, daytime), addresses (so called "snowbirds," residing in the south during winter and north during summer) and office hours.

The timing specification data types include point in time (TS) and the interval of time (IVL<TS>) and add types that are specifically suited to repeated schedules. These additional types include PIVL, EIVL, and finally the GTS type itself. All these timing types describe the time distribution of repeating states or events.

Definition: An interval of time that recurs periodically. Periodic intervals have two properties, phase and period. The phase specifies the "interval prototype" that is repeated every period.

For example, "every eight hours for two minutes" is a periodic interval where the interval's width equals 2 minutes and the period at which the interval recurs equals 8 hours.

The phase also marks the anchor point in time for the entire series of periodically recurring intervals. The recurrence of a periodic interval has no beginning or ending, but is infinite in both future and past.

A periodic interval is fully specified when both the period and the phase are fully specified. The interval may be only partially specified where either only the width or only one boundary is specified.

For example: "every eight hours for two minutes" specifies only the period and the phase's width but no boundary of the phase. Conversely, "every eight hours starting at 4 o'clock" specifies only the period and the phase's low boundary but not the phase's high boundary. "Every eight hours for two minutes starting at 4 o'clock" is fully specified since the period, and both the phase's low boundary and width are specified (low boundary and width implies the high boundary.)

The periodic interval of time is a generic type extension whose type parameter T is restricted to the point in time (TS) data type and its extensions. The parametric probability distribution of point in time (PPD<TS>) is an extension of point in time and therefore can be used to form periodic intervals of probability distributions of point in time (PIVL<PPD<TS>>) values (uncertain periodic interval.)

Oftentimes repeating schedules are only approximately specified. For instance "three times a day for ten minutes each" does not usually mean a period of precisely 8 hours and does often not mean exactly 10 minutes intervals. Rather the distance between each occurrence may vary as much as between 3 and 12 hours and the width of the interval may be less than 5 minutes or more than 15 minutes. An uncertain periodic interval can be used to indicate how much leeway is allowed or how "timing-critical" the specification is.

Definition: A prototype of the repeating interval that specifies the duration of each occurrence and anchors the periodic interval sequence at a certain point in time.

The phase also marks the anchor point in time for the entire series of periodically recurring intervals. The recurrence of a periodic interval has no beginning or end but is infinite in both future and past. A phase must be specified for every non-NULL periodic interval. The width of the phase must be less than or equal to the period.

Definition: A time duration specified as a reciprocal measure of the frequency at which the periodic interval repeats.

The period is a physical quantity in the dimension of time (TS.diff.) For an uncertain periodic interval the period is a probability distribution over elapsed time.

Definition: Specifies if and how the repetitions are aligned to the cycles of the underlying calendar (e.g., to distinguish every 30 days from "the 5th of every month".) A non-aligned periodic interval recurs independently from the calendar. An aligned periodic interval is synchronized with the calendar.

The calendar alignment specifies a calendar cycle to which the periodic interval is aligned. The even flow of time will then be partitioned by the calendar cycle. The partitioning is called the calendar "grid" generated by the aligned-to calendar cycle. The boundaries of each occurrence interval will then have equal distance from the earliest point in each partition. In other words, the distance from the next lower grid-line to the beginning of the interval is constant.

For example, "every 5th of the month" is a calendar aligned periodic interval. The period spans 28 to 31 days depending on the calendar month. Conversely, "every 30 days" is an independent period that will fall on a different date each month.

For example, with “every 5th of the month” the alignment calendar cycle would be month of the year (MY.) The even flow of time is partitioned in months of the year. The distance between the beginning of each month and the beginning of its occurrence interval is 4 days (4 days because day of month (DM) starts counting with 1.) Thus, as months differ in their number of days, the distances between the recurring intervals will vary slightly, so that the interval occurs always on the 5th.
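The difference between an aligned and a non-aligned periodic interval can be sketched as follows. This is only an illustration under the assumption that the calendar grid is "month of the year" and that only the start of each occurrence is of interest; the function names are hypothetical.

```python
from datetime import date, timedelta

def aligned_occurrences(year: int, offset: timedelta) -> list:
    """Occurrence starts: the beginning of each month of `year` plus a constant offset."""
    return [date(year, month, 1) + offset for month in range(1, 13)]

def unaligned_occurrences(first: date, period: timedelta, count: int) -> list:
    """Occurrence starts for a period that recurs independently of the calendar."""
    return [first + i * period for i in range(count)]

# "Every 5th of the month": 4 days after each grid line, since day of month starts at 1.
fifth_of_month = aligned_occurrences(2000, timedelta(days=4))

# "Every 30 days", starting on January 5, 2000.
every_30_days = unaligned_occurrences(date(2000, 1, 5), timedelta(days=30), 4)
```

The aligned series always falls on the 5th while the distance between occurrences varies with the length of the month; the non-aligned series keeps a constant distance and drifts across dates.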

Definition: Indicates whether the exact timing is up to the party executing the schedule (e.g., to distinguish "every 8 hours" from "3 times a day".)

For example, with a schedule "three times a day" the average time between repetitions is 8 hours. However, with the institution specified time indicator set to true, the timing could follow some rule made by the executing person or organization ("institution"), e.g., that three-times-a-day schedules are executed at 7 am, noon, and 7 pm.

Generic Literal Form. The generic literal form for periodic intervals of time is as follows:

For example, "[200004181100;200004181110]/(7 d)@DW" specifies every Tuesday from 11:00 to 11:10 AM. Conversely, "[200004181100;200004181110]/(1 mo)@DM" specifies every 18th of the month 11:00 to 11:10 AM.

See Table 37 for calendar-period codes defined for the Gregorian calendar. There are 1-character and 2-character symbols. The 2-character symbols are preferred for the alignment period identifier.

Calendar Pattern Form. This form is used to specify calendar-aligned timing more intuitively using "calendar patterns." The calendar pattern syntax is (semi-formally) defined as follows:

⟨anchor⟩ [ ⟨calendar digits⟩ [ .. ⟨calendar digits⟩ ]] / ⟨number : INT⟩ [ IST ]

A calendar pattern is a calendar date where the higher significant digits (e.g., year and month) are omitted. In order to interpret the digits, a period identifier is prefixed that identifies the calendar period of the left-most digits. This calendar period identifier anchors the calendar digits following to the right.

See Table 37 for calendar-period codes defined for the Gregorian calendar. There are 1-character and 2-character symbols. The 1-character symbols are preferred for the calendar pattern anchor.

For example: "M0219" is February 19 the entire day every year. This periodic interval has the February 19 of any year as its phase (e.g., "[19690219;19690220[" ), a period of one year, and alignment month of the year (M). The alignment calendar-cycle is the same as the anchor (e.g., in this example, month of the year.)

The calendar digits may also omit digits on the right. When digits are omitted on the right, this means the interval from lowest to highest for these digits. For example, "M0219" is February 19 the entire day; "M021918" is February 19, the entire hour between 6 and 7 PM.

In absence of a formal definition for this, the rules for parsing a calendar pattern are as follows (example is "M021918..21")

Interleave. A calendar pattern followed by a slash and an integer number n indicates that the given calendar pattern is to apply every nth time.

A calendar pattern expression is evaluated at the time the pattern is first enacted. At this time, the calendar digits missing from the left are completed using the earliest date matching the pattern (and following a preceding pattern in a combination of time sets).

For example: "D19/2" is the 19th of every second month. If this expression is evaluated on March 14, 2000 the phase is completed to: "[20000319;20000320[/(2 mo)@DM" and thus the two-month cycle begins with March 19, followed by May 19, etc. If the expression were evaluated on March 20, the cycle would begin at April 19, followed by June 19, etc.
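The completion of a day-of-month pattern with an interleave can be sketched as follows. This is a minimal illustration, assuming that an evaluation date falling exactly on the pattern day still matches and that the day of month exists in every month (28 or lower is always safe); the function names add_months and complete_pattern are hypothetical.

```python
from datetime import date

def add_months(year: int, month: int, k: int):
    """Return (year, month) shifted by k calendar months."""
    index = year * 12 + (month - 1) + k
    return index // 12, index % 12 + 1

def complete_pattern(day_of_month: int, interleave: int, evaluated_on: date, count: int = 3) -> list:
    """First `count` occurrence dates of a 'D<day>/<interleave>' style pattern."""
    year, month = evaluated_on.year, evaluated_on.month
    if evaluated_on.day > day_of_month:
        # The pattern day has already passed this month: start with the next month.
        year, month = add_months(year, month, 1)
    occurrences = []
    for i in range(count):
        y, m = add_months(year, month, i * interleave)
        occurrences.append(date(y, m, day_of_month))
    return occurrences

from_march_14 = complete_pattern(19, 2, date(2000, 3, 14))
from_march_20 = complete_pattern(19, 2, date(2000, 3, 20))
```

The two calls reproduce the two cycles of the example: one beginning on March 19 and one beginning on April 19.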

If no calendar digits follow after the calendar period identifier, the pattern matches any date. The integer number following the slash indicates the length of the cycle. The phase interval in these cases has only the width specified to be the duration of the anchoring calendar-cycle (e.g., in this example 1 day.)

For example: "CD/2" is every other day, "H/8" is every 8th hour, for the duration of one hour.

Institution Specified Time. Both a generic periodic interval literal and a calendar pattern may be followed by the three letters "IST" to indicate that within the larger calendar cycle (e.g., for "hour of the day" the larger calendar cycle is "day") the repeating events are to be appointed at institution specified times. This is used to specify such schedules as "three times a day" where the periods between two subsequent events may well vary between 4 hours (between breakfast and lunch) and 10 hours (over night.)

The essential property of a set is that it contains elements. For non-aligned periodic intervals, the contains-property is defined as follows. A point in time t is contained in the periodic interval of time if and only if there is an integer i for which t plus the period times i is an element of the phase interval.
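The contains-property for non-aligned periodic intervals can be sketched as follows. This is only an illustration, assuming a fully specified phase that is half-open at its high boundary and a width no greater than the period; the function name contains is hypothetical.

```python
from datetime import datetime, timedelta

def contains(t: datetime, phase_low: datetime, phase_width: timedelta, period: timedelta) -> bool:
    """True if t + i * period lies in [phase_low, phase_low + phase_width) for some integer i."""
    # Reducing the offset of t from the phase modulo the period tries all
    # integers i at once; the remainder is never negative for a positive period.
    return (t - phase_low) % period < phase_width

# "Every eight hours for two minutes starting at 4 o'clock" (anchor date is arbitrary).
low = datetime(2000, 4, 18, 4, 0)
width = timedelta(minutes=2)
period = timedelta(hours=8)

inside_later = contains(datetime(2000, 4, 18, 12, 1), low, width, period)
inside_earlier = contains(datetime(2000, 4, 17, 20, 0), low, width, period)
outside = contains(datetime(2000, 4, 18, 12, 5), low, width, period)
```

Because the recurrence is infinite in both directions, points before the phase are tested in the same way as points after it.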

For calendar-aligned periodic intervals the contains property is defined using the calendar-cycle's sum(t, n) property that adds n such calendar cycles to the time t.

Definition: Specifies a periodic interval of time where the recurrence is based on activities of daily living or other important events that are time-related but not fully determined by time.

For example, "one hour after breakfast" specifies the beginning of the interval at one hour after breakfast is finished. Breakfast is assumed to occur before lunch but is not determined to occur at any specific time.

Events qualify for adoption in the domain of this attribute if all of the following are true:

Definition: An interval of elapsed time (duration, not absolute point in time) that marks the offsets for the beginning, width and end of the event-related periodic interval measured from the time each such event actually occurred.

For example: if the specification is "one hour before breakfast for 10 minutes" the offset's low boundary is -1 h and the offset's width is 10 min (consequently the offset's high boundary is -50 min.)

The literal form for an event related interval begins with the event code followed by an optional interval of the time-difference.

For example, one hour after meal would be "PC+[1h;1h]". One hour before bedtime for 10 minutes: "HS-[50min;1h]".

An event-related periodic interval of time is a set of time, that is, one can test whether a particular time or time interval is an element of the set. Whether an event-related periodic interval of time contains a given interval of time is decided using a relation event × time referred to as EVENT(event, time). The property occurrenceAt(t) is the occurrence interval that would exist if the event occurred at time t.

Thus, an event related interval of time contains a point in time t if there is an event time e with an occurrence interval v such that v contains t.
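The occurrence interval and the contains test for an event-related periodic interval can be sketched as follows. This is a minimal illustration, assuming that the EVENT relation is available as a plain list of times at which the event occurred and that occurrence intervals are closed; the breakfast time and the function names are hypothetical.

```python
from datetime import datetime, timedelta

def occurrence_at(event_time: datetime, offset_low: timedelta, offset_high: timedelta):
    """Occurrence interval that would exist if the event occurred at event_time."""
    return (event_time + offset_low, event_time + offset_high)

def contains(t: datetime, event_times: list, offset_low: timedelta, offset_high: timedelta) -> bool:
    """True if some event time e has an occurrence interval that contains t."""
    for e in event_times:
        low, high = occurrence_at(e, offset_low, offset_high)
        if low <= t <= high:
            return True
    return False

# "One hour before breakfast for 10 minutes": offset from -1 h to -50 min.
offset_low = timedelta(hours=-1)
offset_high = timedelta(minutes=-50)
breakfasts = [datetime(2000, 6, 9, 7, 30)]  # assumed event time, for illustration only

interval = occurrence_at(breakfasts[0], offset_low, offset_high)
during = contains(datetime(2000, 6, 9, 6, 35), breakfasts, offset_low, offset_high)
too_late = contains(datetime(2000, 6, 9, 7, 0), breakfasts, offset_low, offset_high)
```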

Definition: A set of points in time, specifying the timing of events and actions and the cyclical validity-patterns that may exist for certain kinds of information, such as phone numbers (evening, daytime), addresses (so called "snowbirds," residing in the south during winter and north during summer) and office hours.

In all cases the GTS is defined as a set of point in time (SET<TS>). Using the set operations, union, intersection and difference, more complex sets of time can be constructed from simpler ones. Ultimately the building blocks from which all GTS values are constructed are interval, periodic interval, and event-related periodic interval. The construction of the GTS can be specified in the literal form. No special data type structure is defined that would generate a combination of simpler time-sets from a given GTS value. While any implementation would have to contain such a structured representation, it is not needed in order to exchange GTS values given the literal form.⁵³

The GTS data type is defined as using intervals, periodic intervals, and event-related periodic intervals. Intervals of time have been defined above.

A convex hull is the least interval that is a superset of all occurrence intervals. As noted in Section 3.1.2, all totally ordered sets have a convex hull. Because a GTS is a SET<TS> and because a SET<TS> is a totally ordered set, all GTS values have a convex hull.

The convex hull of a GTS can less formally be called "outer bound interval". Thus, the convex hull of a GTS describes the absolute beginning and end of the repeating schedule. For infinite repetitions (e.g., a simple periodic interval) the convex hull has infinite bounds.

A GTS value is a generator of a sequence of time intervals during which an event or activity occurs, or during which a state is effective.

The nextTo-property maps to every point in time t the greatest continuous subset (an "occurrence interval") v of the GTS value S, where v is the interval closest to t that begins later than t or that contains t.

The nextAfter-property maps to every point in time t the greatest continuous subset (an "occurrence interval") v of the GTS value S, where v is the interval closest to t that begins later than t.
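The two properties can be sketched for a finite schedule as follows. This is only an illustration, assuming that the occurrence intervals of the GTS value are already available as a sorted list of disjoint, half-open (low, high) pairs and that plain numbers (hours, say) stand in for points in time; the function names are hypothetical.

```python
def next_to(occurrences: list, t):
    """Closest occurrence interval that contains t or begins later than t."""
    for low, high in occurrences:
        if high > t:  # either low <= t < high (contains t) or low > t (begins later)
            return (low, high)
    return None

def next_after(occurrences: list, t):
    """Closest occurrence interval that begins later than t."""
    for low, high in occurrences:
        if low > t:
            return (low, high)
    return None

schedule = [(8, 10), (16, 18)]  # two occurrence intervals on an hour scale

inside_next_to = next_to(schedule, 9)
inside_next_after = next_after(schedule, 9)
between_next_to = next_to(schedule, 12)
```

The two properties differ only for a point that lies inside an occurrence interval: nextTo returns the containing interval, nextAfter the one that follows it.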

A GTS value can be converted into a generic sequence of time intervals (LIST<IVL<TS>>) of occurrence intervals.

For two GTS values A and B we say that A interleaves B if their occurrence intervals interleave on the time line. This concept is visualized in Figure 15.

For the GTS values A and B to interleave the occurrence intervals of both groups can be arranged in pairs of corresponding occurrence intervals. It must further hold that for all corresponding occurrence intervals a ∈ A and b ∈ B, a starts before b starts (or at the same time) and b ends after a ends (or at the same time).

The interleaves-relation holds when two schedules have the same average frequency, and when the second schedule never "outpaces" the first schedule. That is, no occurrence interval in the second schedule may start before its corresponding occurrence interval in the first schedule.

With two interleaving GTS values one can derive a periodic hull such that each occurrence interval of the periodic hull is the convex hull of the corresponding occurrence intervals.

The periodic hull is important for constructing a schedule by combining two GTS expressions. For example, to construct the periodic interval from Memorial Day to Labor Day every year, one first needs to set up the schedules M for Memorial Day (the last Monday in May) and L for Labor Day (the first Monday in September) and then combine these two schedules using the periodic hull of M and L.

For two GTS values A and B where A interleaves B, a periodic hull is defined as the pairwise convex hull of the corresponding occurrence intervals of A and B.
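The interleaves-relation and the periodic hull can be sketched for finite schedules as follows. This is a minimal illustration, assuming that each GTS value is given as a list of (low, high) occurrence intervals and that corresponding intervals are paired by position; the function names are hypothetical, and the holiday dates are written out for two years rather than computed.

```python
from datetime import date

def interleaves(a: list, b: list) -> bool:
    """True if each a starts no later than its b and each b ends no earlier than its a."""
    if len(a) != len(b):
        return False
    return all(a_low <= b_low and a_high <= b_high
               for (a_low, a_high), (b_low, b_high) in zip(a, b))

def periodic_hull(a: list, b: list) -> list:
    """Pairwise convex hull of the corresponding occurrence intervals of a and b."""
    if not interleaves(a, b):
        raise ValueError("the first schedule does not interleave the second")
    # Given interleaving, the hull of a pair runs from a's low to b's high.
    return [(a_low, b_high) for (a_low, _), (_, b_high) in zip(a, b)]

# Memorial Day (M) and Labor Day (L) as one-day, half-open intervals.
memorial = [(date(2000, 5, 29), date(2000, 5, 30)), (date(2001, 5, 28), date(2001, 5, 29))]
labor = [(date(2000, 9, 4), date(2000, 9, 5)), (date(2001, 9, 3), date(2001, 9, 4))]

season = periodic_hull(memorial, labor)
```

Swapping the arguments fails the interleaves test, which illustrates why the periodic hull operation is non-commutative.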

The interleaves-relation is reflexive, asymmetric, and intransitive. The periodic hull operation is non-commutative and non-associative.⁵⁴

The GTS literal allows specifying combinations of intervals, periodic intervals, and event related periodic intervals of time using the set operations, unions and intersection. This literal form is specified based on the simpler time set data types interval, periodic interval, and event related periodic interval.⁵⁵

Unions are specified by a semicolon-separated list. Intersections are specified by a white space separated list. Intersection has higher priority than union. Exclusions (set differences) can be specified using a backslash; exclusions have an intermediate priority, i.e. weaker than intersection but stronger than union.

The following table contains paradigmatic examples for complex GTS literals. For simpler examples refer to the literal forms for interval, periodic interval, and event related interval.
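The operator priorities can be sketched with a small evaluator. This is only an illustration, not a GTS parser: it assumes that the factors are names of predefined sets rather than full interval literals, that small Python sets of integers stand in for sets of time, and that consecutive exclusions apply from left to right; the function names are hypothetical.

```python
def intersect_factors(factors: str, named_sets: dict) -> set:
    """Intersect the white space separated named sets (highest priority)."""
    names = factors.split()
    result = set(named_sets[names[0]])
    for name in names[1:]:
        result &= named_sets[name]
    return result

def evaluate(term: str, named_sets: dict) -> set:
    """Evaluate a term built from intersection, exclusion, and union."""
    result = set()
    for union_part in term.split(";"):    # union: lowest priority
        operands = union_part.split("\\")  # exclusion: intermediate priority
        value = intersect_factors(operands[0], named_sets)
        for excluded in operands[1:]:
            value -= intersect_factors(excluded, named_sets)
        result |= value
    return result

named_sets = {"A": {1, 2, 3, 4}, "B": {2, 3, 4, 5}, "C": {3}, "D": {9}}

# Reads as ((A intersect B) minus C) union D.
combined = evaluate("A B \\ C ; D", named_sets)
```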

The following Table 46 defines symbolic abbreviations for GTS values that can be used in GTS literals instead of their equivalent GTS term. Abbreviations are defined for common periods of the day (AM, PM), for periods of the week (business day, weekend), and for holidays. The computation for the dates of some holidays, namely the Easter holiday, involves some sophistication that goes beyond what one would represent in a GTS literal term. It is assumed that the dates of these holidays are drawn from some table or some generator module that is outside the scope of this specification.

These abbreviations are named GTS values and they can in turn be a factor of a GTS term. For example, one can say "JHCHRXME H08..12" to indicate that the office hours on Christmas Eve are from 8 AM to 1 PM only. And one can say "JHNUSMEM..JHNUSLBR" for the typical midwestern swimming pool season from Memorial Day to Labor Day.

Endnotes

[source] The HL7 Message Development Framework defines "update modes" for fields in a message. Note that because data values have neither identity nor state nor changing of state, these update modes do not apply for the properties of data values. Data values and their properties are never updated. A field of an object (e.g., a message) can be updated in which case the field's value is replaced by another value. But the value itself is never updated.
[source] This is the reason why the ISO Abstract Syntax Notation 1 (ASN.1) is not an appropriate formalism for semantic data type specifications.
[source] The data type definition language employed here is a conclusion of experiments and experience with various alternatives. These alternatives include data type definition tables and the use of the Object Management Group's (OMG) Interface Definition Language (IDL). The disadvantage of the data type definition tables was that they gave the wrong impression of this specification being a specification of abstract syntax rather than semantics. Conversely, the disadvantage with IDL was that IDL gave the wrong impression of this specification being an application programming interface (API) definition.
The resulting data type definition language borrows significantly from IDL, the Object Constraint Language (OCL), JAVA, C++, and the parser generation tools LEX and YACC. It is inspired by features and style of these languages but amalgamates and augments them into precisely what is needed for this data type specification. The goal was a language that is minimal and self-consistent. Also, as the main purpose of this language is to define data types it tries to get by without any built-in data types.
[source] As can be seen, the type keyword is in place of IDL's and Java's interface and C++ and Java's class keyword. The alias clause is unique to this specification as we do have the need for extremely short data type mnemonics in addition to more descriptive names. The extends clause is the same as JAVA's, which is preferred over C++ or IDL's colon clause as its meaning is more obvious.
[source] Note that the IDL's notion of input and output arguments and IDL's, JAVA's and C++'s notion of return values and exceptions are all irrelevant concepts for this specification. The semantics of data types is not about procedure calls and parameter passing or normal and abnormal returns of control from a procedure body. Instead, each semantic property is conceptualized as a function that maps a value and optional arguments to another value. This mapping is not "computed" or "generated"; it logically exists and we do not need to "call" such a function to actualize the mapping.
[source] "Extends" means "refines" or "specializes and adds properties." This kind of "extension" (specialization) has nothing to do with the "extensional" (vs. "intensional") definitions of data types.
[source] The restriction variant of specialization deserves explanation. It is generally touted that inheritance should not retract properties that have been defined for the genus. This is still true for the restriction as properties are not actually retracted but constrained to a smaller value set. This may mean constraining properties to NULL, if NULL was an allowed value for that property in the parent type. In any case, logically, restriction is a specialization, with inheritance and substitutability. Furthermore extends and restricts are not hard opposites as a specialized type may both extend and constrain; the two keywords are mainly used to guide a human reader as to the intention of the design.
[source] Note the meaning of protected is a little different from the accessibility qualifiers (public, package, protected, private) as known from JAVA and C++. The protection used here is not about hiding the type information or barring properties defined by a protected type from access outside of this specification "package." It mainly is a strong recommendation not to declare attributes or other features of such protected types. Protected types should be used as "wrapped" in other types. The protected type is still directly accessible within the "wrap," no notion of "delegated properties" exists.
[source] The invariant statement syntax and semantics is similar to the OCL "inv" clause. However, we did not use OCL in this specification for several reasons. (1) OCL syntax has a Smalltalk style that does not fit the C++/Java style of the data type definition language. (2) OCL has many primitive constructs and data types, while this specification avoids primitives as much as possible. (3) In part because of the richness in primitive constructs, OCL is fairly complex, more than is needed in this specification.
[source] This construct is somewhat cyclical; there is a preexisting notion of Boolean values even though the Boolean is a type defined just like any other type. In addition, since this data type definition language is written in character strings, the notion of character strings pre-exists the definition of the character string type. These two types, character string and Boolean are therefore exceptional, but on the surface, they are defined just like any other data type. Since this data type specification language is not meant to be implemented, the cyclicality is not a real issue. Even if this language was implemented, one can use a "bootstrapping" technique as is common, e.g., for compilers that compile themselves.
[source] Most of these syntactic features are in the spirit of the JAVA language: use of argument lists, curly braces to enclose blocks, semicolon to finish a statement, and the period to reference value properties. The double colon :: as used by C++ or IDL to distinguish between member-references and value-references is not used (as in Java). Unlike Java but like C++ and IDL, every statement is ended by a semicolon, including type declarations. Implicit type conversion is also retained from C++.
[source] This means that if one expects an ED value but actually has an ST value instead, one can turn the ST value into an ED.
[source] The different grammars of literals are not meant to be combined into one overall HL7 value expression grammar. Although attempts have been made to resolve potential ambiguities between the literals of different types where they would be harmful, some of these ambiguities still remain. For example "1.2" can be a valid literal for both Object Identifier (OID) and a Real Number.
[source] The BNF variant used here is similar to the YACC parser and LEX lexical analyzer generator languages but is simplified and made consistent to the syntax and declarative style of this data type definition language. The differences are that all symbols have exactly one attribute, their value strongly typed as one of the defined data types. Each symbol's type is declared in front of the symbol's definition (e.g.: INT digit : "0" | "1" | ... | "9";). The start symbol has no name but just a type (e.g., INT : digit | INT digit;). A data type name can occur as a symbol name meaning a literal of that data type.
[source] Note that the equals property (defined for all data types, see Section 1.4.2.3) is a relation, a test for equality, not an assignment statement. One cannot assign a value to another value. Unlike YACC and LEX analyzers, this data type definition language is purely declarative; it has no concept of assignment. For this reason, the grammar rules define both parsing and building literal expressions.
[source] Generic type extensions are sometimes called "mixins", since their effect is to mix certain properties into the preexisting data type.
[source] RFC 1766 is the HL7-approved coding system for all reference to human languages, in data types and elsewhere.
[source] For this reason, a system or site that does not deal with multilingual text or names in the real world can safely ignore the language property.
[source] The cryptographically strong checksum algorithm Secure Hash Algorithm-1 (SHA-1) is currently the industry standard. It has superseded the MD5 algorithm only a couple of years ago, when certain flaws in the security of MD5 were discovered. Currently the SHA-1 hash algorithm is the default and required only choice for the integrity check algorithm. However, there is no assurance that SHA-1 will not be superseded at anytime when its flaws will be discovered. In fact, by the time this specification reaches third ballot a new SHA-256 is beginning to pick up popularity.
[source] Originally, the term thumbnail refers to an image in a lower resolution (or smaller size) than another image. However, the thumbnail concept can be metaphorically used for media types other than images. For example, a movie may be represented by a shorter clip; an audio-clip may be represented by another audio-clip that is shorter, has a lower sampling rate, or uses lossy compression.
[source] ISO/IEC 10646-1: 1993 defines a character as "A member of a set of elements used for the organization, control, or representation of data." ISO/IEC TR 15285, An operational model for characters and glyphs, discusses the problems involved in defining characters. Notably, characters are abstract entities of information, independent of type font or language. ISO 10646 (UNICODE [http://www.unicode.org]), or in Japan JIS X0221, is a globally applicable character set that uniquely identifies all characters of any language in the world.
In this specification, ISO 10646 serves as a semantic model for character strings. The important point is that for semantic purposes, there is no notion of separate character sets and switching between character sets. Character set and character encoding are ITS layer considerations. The formal definition indicates this because each character is by itself an ST value that has a charset property. Thus, the binary encoding of each character is always understood in the context of a certain character set. This does not mean that the ITS should represent a character string as a sequence of full-blown ED values. What it means is that on the application layer the notion of character encoding is irrelevant when we deal with character strings.
[source] A character string literal is a conversion from a character string to another data type. Obviously, character string literals for character strings are a circular, if not redundant, feature. This literal form, therefore, mainly specifies how character strings are parsed in the data type specification language.
[source] Although post-coding is often performed from free text information, such as documents, scanned images or dictation, multi-media data is explicitly not permitted as original text. Also, the original text property is not meant to be a link into the entire source document. The link between different artifacts of medical information (e.g., document and coded result) is outside the scope of this specification and is maintained elsewhere in the HL7 standards. The original text is an excerpt of the relevant information in the original sources, rather than a pointer or exact reproduction. Thus the original text is to be represented in plain text form.
[source] The code system versions do not count in the equality test since by definition a code symbol must have the same meaning throughout all versions of a code system. Between versions, codes may be retired but not withdrawn or reused.
[source] Translations are not included in the equality test of concept descriptors for safety reasons. An alternative would have been to consider two CD values equal if any of their translations are equal. However, some translations may be equal because the coding system of that translation is very coarse-grained. More sophisticated comparisons between concept descriptors are application considerations that are not covered by this specification.
[source] NULL-values are exceptional values, not proper concepts. It would be unsafe to equate two values merely on the basis that both are exceptional (e.g., not codable or unknown.) Likewise there is no guarantee that original text represents a meaningful or unique description of the concept so that equality of that original text does not constitute concept equality. The reverse is also true: since there is more than one possible original text for a concept, the fact that original text differs does not constitute a difference of the concepts.
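The three notes above on concept descriptor equality can be sketched as follows. This is a minimal illustration, assuming that equality is decided by the code and code system alone and that a comparison involving an exceptional (NULL) value yields no answer rather than true; the class layout, field names and the cd_equal helper are hypothetical, and the code system identifiers in the usage are placeholders.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class CD:
    """Hypothetical, reduced stand-in for a concept descriptor."""
    code: Optional[str] = None
    code_system: Optional[str] = None
    code_system_version: Optional[str] = None
    original_text: Optional[str] = None
    translations: Tuple["CD", ...] = ()
    null_flavor: Optional[str] = None


def cd_equal(a: CD, b: CD) -> Optional[bool]:
    """Compare two concept descriptors.

    Version, translations and original text are deliberately ignored.
    Returns None (assumed reading) when either value is exceptional.
    """
    if a.null_flavor or b.null_flavor or a.code is None or b.code is None:
        return None
    return (a.code, a.code_system) == (b.code, b.code_system)
```

With this reading, two values carrying the same code in different versions of one code system compare equal, while two values that are both "unknown" do not.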
[source] This ruling at design-time is necessary to prevent HL7 interfaces from being burdened by code literal style conversions at runtime. This is notwithstanding the fact that some applications may require mapping from one form into another if that application has settled with the representation option that was not chosen by HL7.
[source] This is one reason why the CD.qualifiers for post-coordination are to be used sparingly and with caution. An additional problem of post-coordinated coding is that a general rule for equality may not exist at all.
[source] The advantage of the concept descriptor data type is its expressiveness. However, if all of its features, such as coding exceptions, text, translations and qualifiers, are used at all times, implementation and use become very difficult and unsafe. Therefore, the CD type is most often used in a restricted form with reduced features.
[source] This is notwithstanding the fact that an externally referenced domain, such as the IETF MIME media type, may include an extension mechanism. These extended MIME type codes would not be considered "extensions" in the sense of violating the CNE provision. The CNE provision is only violated if an attempt is made to use a different code system (by means of the CD.codeSystem property), which is not possible with the CS data type.
[source] The value/namespace view on ISO object identifiers has important semantic relevance. It represents the notion of identifier value versus identifier assigning authority (= namespace), which is common in healthcare information systems in general, and HL7 v2.x in particular.
[source] DICOM objects are identified by UID only. For the purpose of DICOM/HL7 integration, it would be awkward if HL7 required the extension to be mandatory and considered the UID only as an assigning authority. Since UID values are simpler and do not carry the risk of containing meaningless decoration, we do encourage systems to use simple UID identifiers as external references to their objects.
[source] This ruling at design-time is necessary to prevent HL7 interfaces from being burdened by identifier literal style conversions at runtime. This is notwithstanding the fact that some applications may require mapping from one form into another if that application has settled with the representation option that was not chosen by HL7.
[source] The data type of the URL scheme is still CS and, for HL7 purposes, the URL scheme is a CNE domain. This appears to be at odds with the fact that there is no one official list of URL schemes, and so many URL schemes in use may be defined locally. However, we cannot allow extension of the URL scheme using the HL7 mechanism of local alternative code systems, which is why technically the URL scheme is a CS data type.
[source] Remember that semantic properties are bare of all control flow semantics. The AD.formatted could be implemented as a "procedure" that would "return" the formatted address, but it would not usually be a variable to which one could assign a formatted address. However, HL7 does not define applications but only the semantics of exchanged data values. Hence, the semantic model abstracts from concepts like "procedure", "return", and "assignment" but speaks only of property and value.
[source] These rules for formatting addresses are part of the semantics of addresses because addresses are primarily defined as text displayed or printed and consumed by humans. Other uses (e.g., epidemiology) are secondary — although not forbidden, the AD data type might not serve these other use cases very well, and HL7 defines better ways to handle these use cases. Note that these formatting rules are not ITS issues, since this formatting applies to presentations for humans whereas ITS specifications are presentations for computer interchange.
[source] The XML encoding shown here is according to the XML ITS only in order to avoid introducing another instance notation. This does not imply that the function would only work in XML, nor even that XML is the preferred representation.
[source] This example shows the strength of the mark-up approach to addresses. A typical German system that stores house number and street name in separate fields would print the address with street name first followed by the house number. For U.S. addresses, this would be wrong as the house number in the U.S. is written before the street name. The marked-up address allows keeping the natural order of address parts and still understanding their role.
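The point of this note can be sketched in a few lines: if every address part carries its role, the parts can stay in the order in which the address is naturally written and still be found by role. The representation as (part type code, text) pairs, the helper names, the single-space joining and the sample values are all assumptions made for illustration; they are not taken from the specification.

```python
from typing import List, Optional, Tuple

# Each part is (part type code, text); HNR = house number, STR = street name.
AddressPart = Tuple[str, str]


def as_written(parts: List[AddressPart]) -> str:
    """Join the parts in recorded order (assumed: one space between parts)."""
    return " ".join(text for _, text in parts)


def find_part(parts: List[AddressPart], code: str) -> Optional[str]:
    """Return the text of the first part with the given type code, if any."""
    return next((text for part_code, text in parts if part_code == code), None)


# Made-up sample values for illustration only.
us_address = [("HNR", "1028"), ("STR", "Pinewood Street")]
german_address = [("STR", "Windsteiner Weg"), ("HNR", "54a")]
```

Here as_written(us_address) keeps the house number first and as_written(german_address) keeps the street name first, while find_part(..., "HNR") retrieves the house number in both cases.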
[source] Remember that semantic properties are bare of all control flow semantics. The AD.formatted could be implemented as a "procedure" that would "return" the formatted address, but it would not usually be a variable to which one could assign a formatted address. However, HL7 does not define applications but only the semantics of exchanged data values. Hence, the semantic model abstracts from concepts like "procedure", "return", and "assignment" but speaks only of property and value.
[source] These rules for formatting names are part of the semantics of names because the name parts have been designed with the important use case of displaying and rendering on labels. Note that these formatting rules are not ITS issues, since this formatting applies to presentations for humans whereas ITS specifications are presentations for computer interchange.
[source] The quantity data type abstraction corresponds to the notion of difference scales in contrast to ordinal scales and ratio scales (Guttman and Stevens). A data type with only the order requirement but not the difference requirement would be an ordinal. Ordinals are not currently defined with a special data type. Instead, ordinals are usually coded values, where the underlying code system specifies ordinal semantics. This ordinal semantics, however, is not reflected in the HL7 data type semantics at this time.
[source] H. Grassmann. Lehrbuch der Arithmetik. 1861. We prefer Grassmann's original axioms to the Peano axioms, because Grassmann's axioms work for all integers, not just for natural numbers. Also, "it is rather well-known, through Peano's own acknowledgment, that Peano borrowed his axioms from Dedekind and made extensive use of Grassmann's work in his development of the axioms." (Hao Wang. The Axiomatization of Arithmetic. J. Symb. Logic; 1957:22(2); p. 145.)
[source] The term "Real" for a fractional number data type originates and is well established in the Algol and Pascal tradition of programming languages.
[source] At this time, no calendars other than the Gregorian calendar are defined. However, the notion of a calendar as an arbitrary convention to specify absolute time is important to properly define the semantics of time and time-related data types. Furthermore, other calendars might be supported when needed to facilitate HL7's use in other cultures.
[source] At present, the CalendarCycle properties sum and value are not formally defined. The computation of calendar digits is complex, and a specification of it here would be hard to understand and to evaluate for correctness. Unfortunately, no standard exists that would formally define the relationship between calendar expressions and elapsed time since an epoch. ASN.1, the XML Schema Data Type specification and SQL92 all refer to ISO 8601; however, ISO 8601 specifies only the syntax of Gregorian calendar expressions, not their semantics. In this standard, we define the syntax and semantics formally; however, we presume the semantics of the sum and value properties to be defined elsewhere.
[source] In some programming languages, "collection types" are understood as containers of individually enumerated data items, and thus, an interval (low - high) would not be considered a collection. Such a narrow interpretation of "collection", however, is heavily representation/implementation dependent. From a data type semantics viewpoint, it doesn't matter whether an element of a collection "is actually contained in the collection" or not. There is no need for all elements in a collection to be individually enumerated.
[source] Note the difference to the GTS. The GTS is a generator for a SET<TS> not for a LIST<TS>. A sequence of discrete values from a continuous domain does not make much sense other than in sampling applications. The SET<TS>, however, can be thought of as a sequence of IVL<TS>, which still is different from a LIST<TS>.
[source] The presence of so many options deserves explanation. In principle, the interval form together with the width-only form would be sufficient. However, the interval form feels alien to many in the field of medical informatics. One important purpose of the literal forms is to eradicate non-compliance through making compliance easy, without compromising on the soundness of the concepts.
Furthermore, the different literal forms all have strengths and weaknesses. The interval and center-width forms' strength is that they are most exact, showing closed and open boundaries. The interval form's weakness, however, is that infinite boundaries require special symbols for infinities, not necessary in the "comparator" form. The center-width form cannot specify intervals with an infinite boundary at all. The "comparator" form, however, can only represent single-bounded intervals (i.e., where the other boundary is infinite or unknown.) The dash form, while being the weakest of all, is the most intuitive form for double bounded intervals.
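To make the point concrete, the following sketch treats an interval as a collection whose membership is decided by a test rather than by stored elements. The class name, field names and the use of plain numbers are assumptions for illustration; the specification does not prescribe any such structure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """Hypothetical interval over numbers; nothing is enumerated."""
    low: float
    high: float
    low_closed: bool = True
    high_closed: bool = True

    def __contains__(self, x: float) -> bool:
        # Membership is a property of the value, not a lookup in a container.
        above = x >= self.low if self.low_closed else x > self.low
        below = x <= self.high if self.high_closed else x < self.high
        return above and below
```

An expression such as 2.5 in Interval(1, 3) is meaningful even though the interval holds infinitely many values and stores only its two boundaries.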
[source] This statement seems to directly contradict the ruling about the promotion of TS to IVL<>. However, there is no contradiction. The precision of a boundary does not have any relevance, but the precision of a simple timestamp (not as an interval boundary) is relevant, when that timestamp is promoted to an interval.
[source] The hull form may appear superfluous for the simple interval all by itself. However, the hull form will become important for the periodic interval notation as it shortens the notation and (perhaps arguably) makes the notation of more complex timing structures more intuitive.
[source] This specification imposes a restraint upon itself to allow existing systems a graceful transition. However, the formal specification keeps the generic type extensions as substitutable for their base types. This self-restraint may be omitted in the future. New implementations are advised to accommodate some generalizable support for these generic data type extensions.
[source] Note that data types are specifications of abstract properties of values. This specification does not mandate how these values are represented in an ITS or implemented in an application. Specifically, it does not mandate how the represented components are named or positioned. In addition, the semantic generalization hierarchy may be different from a class hierarchy chosen for implementation (if the implementation technology has inheritance.) Keep the distinction between a type (interface) and an implementation (concrete data structure, class) in mind. The ITS must contain a mapping of ITS defined features of any data type to the semantic properties defined here.
[source] The GTS is an example of a data type that is only defined algebraically without giving any definition of a data structure that might implement the behavior of such a data type. The algebraic definition looks extremely simple, so that one might assume it is incomplete. Since at this point we are relying entirely on the literal form to represent GTS values, all the definition of data structur
[source] The interleaves property may appear overly constrained. However, these constraints are reasonable for the use case for which the interleaves and periodic hull properties are defined. To safely and predictably combine two schedules one would want to know which of the operands sets the start points and which sets the endpoints of the periodic hull's occurrence intervals.
[source] This literal specification again looks surprisingly simple, so one might assume it is incomplete. However, the GTS literal is based on the TS, IVL, PIVL, and EIVL literals and does also imply the literals for the extensions of TS, notably the PPD_TS. The GTS literal specification itself only needs to tie the other literal forms together, which is indeed a fairly simple task by itself.

Chair/Editor	Gunther Schadow Regenstrief Institute for Health Care
Editor	Paul Biron Kaiser Permanente, Southern California
Editor	Doug Pratt Siemens

code	name	definition
NI	NoInformation	No information whatsoever can be inferred from this exceptional value. This is the most general exceptional value. It is also the default exceptional value.
NA	not applicable	No proper value is applicable in this context (e.g., last menstrual period for a male.)
UNK	unknown	A proper value is applicable, but not known.
NASK	not asked	This information has not been sought (e.g., patient was not asked)
ASKU	asked but unknown	Information was sought but not found (e.g., patient was asked but didn't know)
NAV	temporarily unavailable	Information is not available at this time but it is expected that it will be available later.
OTH	other	The actual value is not an element in the value domain of a variable. (e.g., concept not provided by required code system.)
PINF	positive infinity	Positive infinity of numbers.
NINF	negative infinity	Negative infinity of numbers.
NP	not present	Value is not present in a message. This is only defined in messages, never in application data! All values not present in the message must be replaced by the applicable default, or no-information (NI) as the default of all defaults.
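The rule in the NP row can be sketched as a small lookup: a value that is not present in a message is replaced by the applicable default, and by NI when no default applies. The enumeration, the dictionary-based message and the function name resolve_not_present are hypothetical; they only illustrate the substitution rule stated in the table.

```python
from enum import Enum
from typing import Any, Dict


class NullFlavor(Enum):
    NI = "NoInformation"
    NA = "not applicable"
    UNK = "unknown"
    NASK = "not asked"
    ASKU = "asked but unknown"
    NAV = "temporarily unavailable"
    OTH = "other"
    PINF = "positive infinity"
    NINF = "negative infinity"
    NP = "not present"


def resolve_not_present(message: Dict[str, Any], name: str,
                        defaults: Dict[str, Any]) -> Any:
    """Return the value the application should see for one message field."""
    value = message.get(name, NullFlavor.NP)
    if value is NullFlavor.NP:
        # NP never reaches application data: use the default, else NI.
        return defaults.get(name, NullFlavor.NI)
    return value
```

Other exceptional values, such as UNK, pass through unchanged because only NP is confined to messages.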

code	name	status	definition
text/plain	Plain Text	required	For any plain text. This is the default and is equivalent to a character string (ST) data type.
text/x-hl7-ft	HL7 Text	recommended	For compatibility, this represents the HL7 v2.x FT data type. Its use is recommended only for backward compatibility with HL7 v2.x systems.
text/html	HTML Text	recommended	For marked-up text according to the Hypertext Mark-up Language. HTML markup is sufficient for typographically marking-up most written-text documents. HTML is platform independent and widely deployed.
application/pdf	PDF	recommended	The Portable Document Format is recommended for written text that is completely laid out and read-only. PDF is a platform independent, widely deployed, and open specification with freely available creation and rendering tools.
text/xml	XML Text	indifferent	For structured character based data. There is a risk that general SGML/XML is too powerful to allow a sharing of general SGML/XML documents between different applications.
text/rtf	RTF Text	indifferent	The Rich Text Format is widely used to share word-processor documents. However, RTF does have compatibility problems, as it is quite dependent on the word processor. May be useful if word processor edit-able text should be shared.
application/msword	MSWORD	deprecated	This format is very prone to compatibility problems. If sharing of edit-able text is required, text/plain, text/html or text/rtf should be used instead.
audio/basic	Basic Audio	required	This is a format for single channel audio, encoded using 8-bit ISDN mu-law [PCM] at a sample rate of 8000 Hz. This format is standardized by: CCITT, Fascicle III.4, Recommendation G.711. Pulse Code Modulation (PCM) of Voice Frequencies. Geneva, 1972.
audio/mpeg	MPEG audio layer 3	required	MPEG-1 Audio layer-3 is an audio compression algorithm and file format defined in ISO 11172-3 and ISO 13818-3. MP3 has an adjustable sampling frequency for highly compressed telephone to CD quality audio.
audio/k32adpcm	K32ADPCM Audio	indifferent	ADPCM allows compressing audio data. It is defined in the Internet specification RFC 2421 [ftp://ftp.isi.edu/in-notes/rfc2421.txt]. Its implementation base is unclear.
image/png	PNG Image	required	Portable Network Graphics (PNG) [http://www.cdrom.com/pub/png] is a widely supported lossless image compression standard with open source code available.
image/gif	GIF Image	indifferent	GIF is a popular format that is universally well supported. However GIF is patent encumbered and should therefore be used with caution.
image/jpeg	JPEG Image	required	This format is required for high compression of high color photographs. It is a "lossy" compression, but the difference to lossless compression is almost unnoticeable to human vision.
image/g3fax	G3Fax Image	recommended	This is recommended only for fax applications.
image/tiff	TIFF Image	indifferent	Although TIFF (Tag Image File Format) is an international standard, it has many interoperability problems in practice: there are too many different versions, and not all software handles them alike.
video/mpeg	MPEG Video	required	MPEG is an international standard, widely deployed, highly efficient for high color video; open source code exists; highly interoperable.
video/x-avi	X-AVI Video	deprecated	The AVI file format is just a wrapper for many different codecs; it is a source of many interoperability problems.
model/vrml	VRML Model	recommended	This is an openly standardized format for 3D models that can be useful for virtual reality applications such as anatomy or biochemical research (visualization of the steric structure of macromolecules)

code	name	definition
EBCDIC	EBCDIC	HL7 is indifferent to the use of this Charset.
ISO-10646-UCS-2	ISO-10646-UCS-2	Deprecated for HL7 use.
ISO-10646-UCS-4	ISO-10646-UCS-4	Deprecated for HL7 use.
ISO-8859-1	ISO-8859-1	HL7 is indifferent to the use of this Charset.
ISO-8859-2	ISO-8859-2	HL7 is indifferent to the use of this Charset.
ISO-8859-5	ISO-8859-5	HL7 is indifferent to the use of this Charset.
JIS-2022-JP	JIS-2022-JP	HL7 is indifferent to the use of this Charset.
US-ASCII	US-ASCII	Required for HL7 use.
UTF-7	UTF-7	HL7 is indifferent to the use of this Charset.
UTF-8	UTF-8	Required for Unicode support.

code	name	definition
DF	deflate	The deflate compressed data format as specified in RFC 1951 [ftp://ftp.isi.edu/in-notes/rfc1951.txt].
GZ	gzip	A compressed data format that is compatible with the widely used GZIP utility as specified in RFC 1952 [ftp://ftp.isi.edu/in-notes/rfc1952.txt] (uses the deflate algorithm.)
ZL	zlib	A compressed data format that also uses the deflate algorithm. Specified as RFC 1950 [ftp://ftp.isi.edu/in-notes/rfc1950.txt]
Z	compress	Original UNIX compress algorithm and file format using the LZC algorithm (a variant of LZW). Patent encumbered and less efficient than deflate.
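For orientation, the three deflate-based codes in this table correspond to formats that the Python standard library can produce and read. The sketch below is one possible mapping, not a normative one: the function names are hypothetical, and the "Z" code is left unsupported because the standard library has no LZW compress codec.

```python
import gzip
import zlib


def compress(data: bytes, code: str) -> bytes:
    """Compress data according to a code from the table (DF, ZL or GZ)."""
    if code == "DF":  # raw deflate stream, RFC 1951 (negative wbits: no header)
        c = zlib.compressobj(zlib.Z_DEFAULT_COMPRESSION, zlib.DEFLATED, -15)
        return c.compress(data) + c.flush()
    if code == "ZL":  # zlib container, RFC 1950
        return zlib.compress(data)
    if code == "GZ":  # gzip container, RFC 1952
        return gzip.compress(data)
    raise ValueError(f"unsupported compression code: {code}")


def decompress(data: bytes, code: str) -> bytes:
    """Inverse of compress() for the same code."""
    if code == "DF":
        return zlib.decompress(data, -15)
    if code == "ZL":
        return zlib.decompress(data)
    if code == "GZ":
        return gzip.decompress(data)
    raise ValueError(f"unsupported compression code: {code}")
```

All three variants carry the same deflate payload; they differ only in the header and checksum that wrap it, which is why the code must travel with the data.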

NOT		AND	true	false	NULL	OR	true	false	NULL
true	false	true	true	false	NULL	true	true	true	true
false	true	false	false	false	false	false	true	false	NULL
NULL	NULL	NULL	NULL	false	NULL	NULL	true	NULL	NULL
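The truth table above can be reproduced with a few functions. In this sketch the Python value None stands in for NULL, which is an assumption of the illustration, as are the function names.

```python
from typing import Optional

BL = Optional[bool]  # True, False, or None standing in for NULL


def bl_not(a: BL) -> BL:
    return None if a is None else (not a)


def bl_and(a: BL, b: BL) -> BL:
    if a is False or b is False:
        return False  # false dominates, even against NULL
    if a is None or b is None:
        return None
    return True


def bl_or(a: BL, b: BL) -> BL:
    if a is True or b is True:
        return True  # true dominates, even against NULL
    if a is None or b is None:
        return None
    return False
```

The design choice visible in the table is that NULL propagates only when the other operand cannot decide the result by itself.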

Name	Type	Description
mediaType	CS	Identifies the encoding of the encapsulated data and identifies a method to interpret or render the data.
charset	CS	For character-based encoding types, this property specifies the character set and character encoding used. The charset is defined according to Internet RFC 2278.
language	CS	For character based information the language property specifies the human language of the text.
compression	CS	Indicates whether the raw byte data is compressed, and what compression algorithm was used.
reference	TEL	A telecommunication address (TEL), such as a URL for HTTP or FTP, which will resolve to precisely the same binary data that could as well have been provided as inline data.
integrityCheck	BIN	The integrity check is a short binary value representing a cryptographically strong checksum that is calculated over the binary data. The purpose of this property, when communicated with a reference, is for anyone to validate later whether the reference still resolves to the same data that it resolved to when the encapsulated data value with reference was created.
integrityCheckAlgorithm	CS	Specifies the algorithm used to compute the integrityCheck value.
thumbnail	ED	A thumbnail is an abbreviated rendition of the full data. A thumbnail requires significantly fewer resources than the full data, while still maintaining some distinctive similarity with the full data. A thumbnail is typically used with by-reference encapsulated data. It allows a user to select data more efficiently before actually downloading through the reference.

code	name	definition
SHA-1	secure hash algorithm - 1	This algorithm is defined in FIPS PUB 180-1: Secure Hash Standard. As of April 17, 1995.
SHA-256	secure hash algorithm - 256	This algorithm is defined in FIPS PUB 180-2: Secure Hash Standard.
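As a rough illustration of how the integrityCheck and integrityCheckAlgorithm properties could work together, the sketch below computes a digest over the binary data and later checks whether re-fetched data still matches. The function names are hypothetical, and how the data is fetched through the reference is outside this sketch.

```python
import hashlib

# Codes from the table above mapped to standard library hash constructors.
ALGORITHMS = {"SHA-1": hashlib.sha1, "SHA-256": hashlib.sha256}


def integrity_check(data: bytes, algorithm: str = "SHA-1") -> bytes:
    """Return the binary checksum for the given algorithm code."""
    return ALGORITHMS[algorithm](data).digest()


def still_same_data(data: bytes, expected: bytes,
                    algorithm: str = "SHA-1") -> bool:
    """Check whether data resolves to the checksum recorded earlier."""
    return integrity_check(data, algorithm) == expected
```

A sender would store the result of integrity_check alongside the reference; a receiver who later resolves the reference can call still_same_data with the recorded value.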

Name	Type	Description
code	ST	The plain code symbol defined by the code system. For example, "784.0" is the code symbol of the ICD-9 code "784.0" for headache.
codeSystem	UID	Specifies the code system that defines the code.
codeSystemName	ST	A common name of the coding system.
codeSystemVersion	ST	If applicable, a version descriptor defined specifically for the given code system
displayName	ST	A name or title for the code, under which the sending system shows the code value to its users.
originalText	ED	The text or phrase used as the basis for the coding.
translation	SET<CD>	A set of other concept descriptors that translate this concept descriptor into other code systems.
qualifier	LIST<CR>	Specifies additional codes that increase the specificity of the primary code.

Name	Type	Description
name	CV	Specifies the manner in which the concept role value contributes to the meaning of a code phrase. For example, if SNOMED RT defines a concept "leg", a role relation "has-laterality", and another concept "left", the concept role relation allows adding the qualifier "has-laterality: left" to a primary code "leg" to construct the meaning "left leg". In this example "has-laterality" is the CR.name.
value	CD	The concept that modifies the primary code of a code phrase through the role relation. For example, if SNOMED RT defines a concept "leg", a role relation "has-laterality", and another concept "left", the concept role relation allows adding the qualifier "has-laterality: left" to a primary code "leg" to construct the meaning "left leg". In this example "left" is the CR.value.
inverted	BL	Indicates if the sense of the role name is inverted. This can be used in cases where the underlying code system defines inversion but does not provide reciprocal pairs of role names. By default, inverted is false.
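The "left leg" example from this table can be sketched as data. For brevity, the role name and value are plain strings here rather than CV and CD values, and the class and function names are hypothetical; the text form produced by describe is an assumption of the sketch, not a literal form defined by the specification.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class ConceptRole:
    name: str               # role relation, e.g. "has-laterality"
    value: str              # modifying concept, e.g. "left"
    inverted: bool = False  # by default, inverted is false


@dataclass(frozen=True)
class CodePhrase:
    code: str                                # primary code, e.g. "leg"
    qualifier: Tuple[ConceptRole, ...] = ()  # ordered list of roles


def describe(phrase: CodePhrase) -> str:
    """Render the primary code followed by its qualifiers."""
    if not phrase.qualifier:
        return phrase.code
    roles = ", ".join(f"{q.name}: {q.value}" for q in phrase.qualifier)
    return f"{phrase.code} ({roles})"


left_leg = CodePhrase("leg", (ConceptRole("has-laterality", "left"),))
```

Calling describe(left_leg) yields the primary code with the qualifier "has-laterality: left" attached, mirroring the construction described in the table.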

Name	Type	Description
root	UID	A unique identifier that guarantees the global uniqueness of the instance identifier. The root alone may be the entire instance identifier.
extension	ST	A character string as a unique identifier within the scope of the identifier root.
assigningAuthorityName	ST	A human readable name or mnemonic for the assigning authority. This name may be provided solely for the convenience of unaided humans interpreting an II value. Note: no automated processing must depend on the assigning authority name to be present in any form.
displayable	BL	Specifies if the identifier's extension is intended for human display and data entry (displayable = true) as opposed to pure machine interoperation (displayable = false).
validTime	IVL<TS>	If applicable, specifies during what time the identifier is valid. By default, the identifier is valid indefinitely. Any specific interval may be undefined on either side indicating unknown effective or expiry time. Note: identifiers for information objects in computer systems should not have restricted valid times, but should be globally unique at all times. The identifier valid time is provided mainly for real-world identifiers, whose maintenance policy may include expiry (e.g., credit card numbers.)

Name	Type	Description
validTime	GTS	Specifies the periods of time during which the telecommunication address can be used. For a telephone number, this can indicate the time of day in which the party can be reached on that telephone. For a web address, it may specify a time range in which the web content is promised to be available under the given address.
use	SET<CS>	One or more codes advising a system or user which telecommunication address in a set of like addresses to select for a given telecommunication need.

code	name	definition
tel	Telephone	A voice telephone number [draft-antti-telephony-url-11.txt]. Required for HL7 use.
fax	Fax	A telephone number served by a fax device [draft-antti-telephony-url-11.txt]. Required for HL7 use.
mailto	Mailto	Electronic mail address [RFC 2368]. Required for HL7 use.
http	HTTP	Hypertext Transfer Protocol [RFC 2068]. Required for HL7 use.
ftp	FTP	The File Transfer Protocol (FTP) [RFC 1738]. Required for HL7 use.
file	File	Host-specific local file names [RFC 1738]. Note that the file scheme works only for local files. There is little use for exchanging local file names between systems, since the receiving system likely will not be able to access the file. Deprecated for
nfs	NFS	Network File System protocol [RFC 2224]. Some sites use NFS servers to share data files.
telnet	Telnet	Reference to interactive sessions [RFC 1738]. Some sites, (e.g., laboratories) have TTY based remote query sessions that can be accessed through telnet.
modem	Modem	A telephone number served by a modem device [draft-antti-telephony-url-11.txt].
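Because the scheme is drawn from a closed set of codes, a receiver might check an address against the table before using it. The following is a minimal sketch under that assumption; the function name is hypothetical, the sample addresses in the usage are invented, and the check says nothing about whether the rest of the address is well formed.

```python
from urllib.parse import urlsplit

# Scheme codes listed in the table above.
URL_SCHEMES = {"tel", "fax", "mailto", "http", "ftp",
               "file", "nfs", "telnet", "modem"}


def url_scheme(address: str) -> str:
    """Return the scheme of a telecommunication address, or raise ValueError."""
    scheme = urlsplit(address).scheme.lower()
    if scheme not in URL_SCHEMES:
        raise ValueError(f"scheme not in the table: {scheme!r}")
    return scheme
```

For example, url_scheme("mailto:someone@example.org") returns the code mailto, while an address with a scheme outside the table is rejected rather than passed along as a local extension.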

code	name	definition
H	home	A communication address at a home, attempted contacts for business purposes might intrude privacy and chances are one will contact family or other household members instead of the person one wishes to call. Typically used with urgent cases, or if no othe
HP	primary home	The primary home, to reach a person after business hours.
HV	vacation home	A vacation home, to reach a person while on vacation.
WP	work place	An office address. First choice for business related contacts during business hours.
AS	answering service	An automated answering machine used for less urgent cases and if the main purpose of contact is to leave a message or access an automated announcement.
EC	emergency contact	A contact specifically designated to be used for emergencies. This is the first choice in emergencies, independent of any other use codes.
PG	pager	A paging device suitable to solicit a callback or to leave a very short message.
MC	mobile contact	A telecommunication device that moves and stays with its owner. May have characteristics of all other use codes, suitable for urgent matters, not the first choice for routine business.

Name	Type	Description
use	CS	A set of codes advising a system or user which address in a set of like addresses to select for a given purpose.
validTime	GTS	A General Timing Specification (GTS) specifying the periods of time during which the address can be used. This is used to specify different addresses for different times of the year or to refer to historical addresses.
formatted	ST	A character string value with the address formatted in lines and with proper spacing. This is only a semantic property to define the function of some of the address part types.

code	name	definition
DEL	delimiter	Delimiters are printed without framing white space. If no value component is provided, the delimiter appears as a line break.
CNT	country	Country
STA	state or province	A sub-unit of a country with limited sovereignty in a federally organized country.
CPA	County or Parish	A sub-unit of a state or province. (49 of the United States of America use the term "county;" Louisiana uses the term "parish").
CTY	city	City
ZIP	postal code	A postal code designating a region defined by the postal service.
STR	street name	Street name or number.
HNR	house number	The number of a house or lot alongside the street. Also known as "primary street number", although it numbers the house, not the street.
SAL	Street Address Line	A street address line is often used instead of separately distinguishing street name and house number. The street address line can repeat to represent "street address line 1" and "street address line 2".
DIR	direction	direction (e.g., N, S, W, E)
ADL	additional locator	This can be a unit designator, such as apartment number, suite number, or floor. There may be several unit designators in an address (e.g., "3rd floor, Appt. 342".) This can also be a designator pointing away from the location, rather than specifying a s
POB	post box	A numbered box located in a post station.
CEN	census tract	A sub-unit of country delineated for demographic purposes.

code	name	definition
PHYS	visit address	A physical address, used primarily to visit the addressee.
PST	postal address	Used to send mail.
TMP	temporary address	A temporary address, may be good for visit or mailing. Note that an address history can provide more detailed information.
BAD	bad address	A flag indicating that the address is bad, in fact, useless.
H	home	A communication address at a home, attempted contacts for business purposes might intrude privacy and chances are one will contact family or other household members instead of the person one wishes to call. Typically used with urgent cases, or if no othe
HP	primary home	The primary home, to reach a person after business hours.
HV	vacation home	A vacation home, to reach a person while on vacation.
WP	work place	An office address. First choice for business related contacts during business hours.
ABC	Alphabetic	Alphabetic transcription of name (Japanese: romaji)
SYL	Syllabic	Syllabic transcription of name (e.g., Japanese kana, Korean hangul)
IDE	Ideographic	Ideographic representation of name (e.g., Japanese kanji, Chinese characters)

Name	Type	Description
partType	CS	Indicates whether the name part is a given name, family name, prefix, suffix, etc.
qualifier	CS	The qualifier is a set of codes each of which specifies a certain subcategory of the name part in addition to the main name part type. For example, a given name may be flagged as a nickname, a family name may be a pseudonym or a name of public records

code	name	definition
FAM	family	Family name, this is the name that links to the genealogy. In some cultures (e.g. Eritrea) the family name of a son is the first name of his father.
GIV	given	Given name (don't call it "first name" since given names do not always come first)
PFX	prefix	A prefix has a strong association to the immediately following name part. A prefix has no implicit trailing white space (it has implicit leading white space though). Note that prefixes can be inverted.
SFX	suffix	A suffix has a strong association to the immediately preceding name part. A suffix has no implicit leading white space (it has implicit trailing white space though). Suffixes cannot be inverted.
DEL	delimiter	A delimiter has no meaning other than being literally printed in this name representation. A delimiter has no implicit leading and trailing white space.

code	name	definition
BR	birth	A name that a person had shortly after being born. Usually for family names but may be used to mark given names at birth that may have changed later.
SP	spouse	The name assumed from the partner in a marital relationship (hence the "M"). Usually the spouse's family name. Note that no inference about gender can be made from the existence of spouse names.
VV	voorvoegsel	A Dutch "voorvoegsel" is something like "van" or "de" that might have indicated nobility in the past but no longer does. Similar prefixes exist in other languages such as Spanish, French or Portuguese.
AC	academic	Indicates that a prefix like "Dr." or a suffix like "M.D." or "Ph.D." is an academic title.
PR	professional	Primarily in the British Imperial culture people tend to have an abbreviation of their professional organization as part of their credential suffixes.
NB	nobility	In Europe and Asia, there are still people with nobility titles (aristocrats.) German "von" is generally a nobility title, not a mere voorvoegsel. Others are "Earl of" or "His Majesty King of..." etc. Rarely used nowadays, but some systems do keep trac
LS	Legal status	For organizations a suffix indicating the legal status, e.g., "Inc.", "Co.", "AG", "GmbH", "B.V." "S.A.", "Ltd." etc.
CL	callme	A callme name is a name (usually a given name) that is preferred when a person is directly addressed.
IN	initial	Indicates that a name part is just an initial. Initials do not imply a trailing period since this would not work with non-Latin scripts. Initials may consist of more than one letter, e.g., "Ph." could stand for "Philippe" or "Th." for "Thomas".

code	name	definition
L	Legal	known as/conventional/the one you use
A	Artist/Stage	Includes writer's pseudonym, stage name, etc
I	Indigenous/Tribal	e.g. Chief Red Cloud
R	Religious	e.g. Sister Mary Francis, Brother John
ABC	Alphabetic	Alphabetic transcription of name (Japanese: romaji)
SYL	Syllabic	Syllabic transcription of name (e.g., Japanese kana, Korean hangul)
IDE	Ideographic	Ideographic representation of name (e.g., Japanese kanji, Chinese characters)

Literal	Number of Significant Digits
2000	has 4 significant digits.
2e3	has 1 significant digit, used if one would naturally say "2000" but precision is only 1.
0.001	has 1 significant digit.
1e-3	has 1 significant digit, use this if one would naturally say "0.001" but precision is only 1.
0	has 1 significant digit.
0.0	has 2 significant digits.
000.0	has 2 significant digits.
0.00	has 3 significant digits.
4.10	has 3 significant digits.
4.09	has 3 significant digits.
4.1	has 2 significant digits.
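
The counting rule implied by the rows above can be sketched in Python. This is an illustrative reading of the table, not a normative algorithm, and the function name `significant_digits` is hypothetical. It assumes the literal is a plain decimal number with an optional exponent. The exponent is ignored, leading zeros are not significant, trailing zeros are, and a zero value counts one digit plus its fractional digits.

```python
def significant_digits(literal: str) -> int:
    """Count significant digits of a decimal literal, following the table above."""
    # Drop the exponent and any sign; only the mantissa carries precision.
    mantissa = literal.strip().lower().split("e", 1)[0].lstrip("+-")
    int_part, _, frac_part = mantissa.partition(".")
    # Leading zeros (including those right after the decimal point) do not count.
    stripped = (int_part + frac_part).lstrip("0")
    if stripped:
        return len(stripped)
    # The value is zero: one digit for the zero itself plus every fractional digit.
    return 1 + len(frac_part)


for text in ("2000", "2e3", "0.001", "0", "0.0", "000.0", "0.00", "4.10"):
    print(text, significant_digits(text))
```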

value	precision	equals	value	precision
3.14	3	true	3.14	3
3.140000	7	true	3.14	3
3.1415	5	true	3.14	3
3.1415	5	false	3.1400	5
4	1	false	3	1
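
One reading that is consistent with every row of this table is that two values are compared after both have been rounded to the smaller of the two precisions. The sketch below assumes that reading; it is not taken from the specification text. The helper name `equal_at_common_precision` is hypothetical, and the rounding mode is the default of Python's `decimal` module.

```python
from decimal import Context


def equal_at_common_precision(a: str, prec_a: int, b: str, prec_b: int) -> bool:
    """Compare two decimal literals after rounding both to the lesser precision."""
    # create_decimal applies the context precision while converting the literal.
    ctx = Context(prec=min(prec_a, prec_b))
    return ctx.create_decimal(a) == ctx.create_decimal(b)


rows = [
    ("3.14", 3, "3.14", 3),
    ("3.140000", 7, "3.14", 3),
    ("3.1415", 5, "3.14", 3),
    ("3.1415", 5, "3.1400", 5),
    ("4", 1, "3", 1),
]
for a, pa, b, pb in rows:
    print(a, pa, equal_at_common_precision(a, pa, b, pb), b, pb)
```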

Name	Type	Description
numerator	N	The quantity that is being divided in the ratio. The default is the integer number 1 (one.)
denominator	D	The quantity that divides the numerator in the ratio. The default is the integer number 1 (one.) The denominator must not be zero.
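
As a minimal sketch of the two properties above, the following Python class models a ratio whose numerator and denominator both default to 1 and whose denominator must not be zero. The class name `Ratio` and the use of a dataclass are illustrative assumptions. The type parameters `N` and `D` mirror the table. The sketch does not reduce the ratio to lowest terms, since nothing in the table calls for it.

```python
from dataclasses import dataclass
from typing import Generic, TypeVar

N = TypeVar("N")
D = TypeVar("D")


@dataclass(frozen=True)
class Ratio(Generic[N, D]):
    """Hypothetical model of a ratio with the defaults stated in the table."""

    numerator: N = 1      # default is the integer number 1
    denominator: D = 1    # default is the integer number 1

    def __post_init__(self) -> None:
        # The denominator must not be zero.
        if self.denominator == 0:
            raise ValueError("denominator must not be zero")


titer = Ratio(1, 128)   # e.g. a titer of 1:128
print(titer, Ratio())
```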

Name	Type	Description
value	REAL	The magnitude of the quantity measured in terms of the unit.
unit	CS	The unit of measure specified in the Unified Code for Units of Measure (UCUM) [].
translation	SET<PQR>	An alternative representation of the same physical quantity expressed in a different unit, of a different unit code system and possibly with a different value.
canonical	PQ	A physical quantity expressed in a canonical unit. In any given unit system, every physical dimension can be assigned one canonical unit. Defining the canonical unit is not the subject of this specification; it only asserts that such a canonical unit exists (and can be arbitrarily chosen) for every physical quantity. An abstract physical quantity is equal to its canonical form.
	REAL
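
The idea that a physical quantity is equal to its canonical form can be illustrated with a small Python sketch. The conversion table `TO_CANONICAL`, the class name `PQ` as used here, and the choice of the metre as canonical unit are assumptions made for the example only. An actual implementation would derive conversions from UCUM. Exact fractions are used so that equality is not disturbed by floating-point rounding.

```python
from dataclasses import dataclass
from fractions import Fraction

# Hypothetical conversion table: unit -> (factor, canonical unit).
TO_CANONICAL = {
    "m": (Fraction(1), "m"),
    "cm": (Fraction(1, 100), "m"),
    "mm": (Fraction(1, 1000), "m"),
}


@dataclass(frozen=True)
class PQ:
    value: Fraction
    unit: str

    def canonical(self) -> "PQ":
        """Express this quantity in the (arbitrarily chosen) canonical unit."""
        factor, unit = TO_CANONICAL[self.unit]
        return PQ(self.value * factor, unit)

    def same_quantity(self, other: "PQ") -> bool:
        """Two quantities are the same if their canonical forms are equal."""
        return self.canonical() == other.canonical()


print(PQ(Fraction(250), "cm").canonical())
```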

Data Types - Abstract Specification

1 Preface
1.1 A note to all readers who participated in previous ballots
2 Acknowledgements
Table of contents
Appendices
1 Introduction
1.1 What is a Data Type?
1.2 Representation of Data Values
1.3 Properties of Data Values
1.4 Need for the Abstraction
1.5 Need for an HL7 Data Type Standard
1.6 Forms of Data Type Definitions
1.6.1 Formal Data Type Definition Language
1.6.2 Tables of Properties
1.6.3 Unified Modeling Language (UML) Diagrams
1.7 Overview of Data Types
1.8 Introduction to the Formal Data Type Definition Language
1.8.1 Declaration
1.8.2 Invariant Statements
1.8.2.1 Assertion Expressions
1.8.2.2 Nested Quantifier Expressions
1.8.3 Type Conversion
1.8.3.1 Demotion
1.8.3.2 Promotion
1.8.4 Literal Form
1.8.4.1 Declaration
1.8.4.2 Definition
1.8.5 Generic Data Types
1.8.5.1 Generic Collections
1.8.5.2 Generic Type Extensions
1.9 DataType (type)
1.9.1 Properties of DataType (type)
1.9.1.1 Name (name : CE)
1.10 DataValue (ANY)
1.10.1 Properties of DataValue (ANY)
1.10.1.1 Data Type (dataType : type)
1.10.1.2 Proper Value (nonNull : BL)
1.10.1.3 Exceptional Value (isNull : BL)
1.10.1.4 Exceptional Value Detail (nullFlavor : BL)
1.10.1.5 Inapplicable Proper Value (notApplicable : BL)
1.10.1.6

Name	Type	Description
value	REAL	The magnitude of the measurement value in terms of the unit specified by this code.
code	ST	The plain code symbol defined by the code system. For example, "784.0" is the code symbol of the ICD-9 code "784.0" for headache.
codeSystem	UID	Specifies the code system that defines the code.
codeSystemName	ST	A common name of the coding system.
codeSystemVersion	ST	If applicable, a version descriptor defined specifically for the given code system
displayName	ST	A name or title for the code, under which the sending system shows the code value to its users.
originalText	ED	The text or phrase used as the basis for the coding.

Name	Type	Description
value	REAL	The magnitude of the monetary amount in terms of the currency unit.
currency	CS	The currency unit as defined in ISO 4217.

code	name	definition
ARS	Argentine Peso	Argentine Peso, monetary currency of Argentina
AUD	Australian Dollar	Australian Dollar, monetary currency of Australia
BRL	Brazilian Real	Brazilian Real, monetary currency of Brazil
CAD	Canadian Dollar	Canadian Dollar, monetary currency of Canada
CHF	Swiss Franc	Swiss Franc, monetary currency of Switzerland
CLF	Unidades de Fomento	Unidades de Fomento, monetary currency of Chile
CNY	Yuan Renminbi	Yuan Renminbi, monetary currency of China
DEM	Deutsche Mark	Deutsche Mark, monetary currency of Germany
ESP	Spanish Peseta	Spanish Peseta, monetary currency of Spain
EUR	Euro	Euro, monetary currency of European Union
FIM	Markka	Markka, monetary currency of Finland
FRF	French Franc	French Franc, monetary currency of France
GBP	Pound Sterling	Pound Sterling, monetary currency of United Kingdom
ILS	Shekel	Shekel, monetary currency of Israel
INR	Indian Rupee	Indian Rupee, monetary currency of India
JPY	Yen	Yen, monetary currency of Japan
KRW	Won	Won, monetary currency of Korea (South)
MXN	Mexican Nuevo Peso	Mexican Nuevo Peso, monetary currency of Mexico
NLG	Netherlands Guilder	Netherlands Guilder, monetary currency of Netherlands
NZD	New Zealand Dollar	New Zealand Dollar, monetary currency of New Zealand
PHP	Philippine Peso	Philippine Peso, monetary currency of Philippines
RUR	Russian Ruble	Russian Ruble, monetary currency of Russian Federation
THB	Baht	Baht, monetary currency of Thailand
TRL	Lira	Lira, monetary currency of Turkey
TWD	Taiwan Dollar	Taiwan Dollar, monetary currency of Taiwan
USD	US Dollar	US Dollar, monetary currency of United States
ZAR	Rand	Rand, monetary currency of South Africa

name	code 1	code 2	counter	digits	start	condition
year	Y	CY	1	4	0	MY12
month of the year	M	MY	2	2	1	MY01,03,05,07,08,10,12 → DM31; MY04,06,09,11 → DM30; MY02 Y/4 Y/100 → DM28; MY02 Y/4 → DM29; MY02 → DM28
month (continuous)		CM			0	continuous MY
week (continuous)	W	CW			0	CD7
week of the year		WY		2	1	continuous DW7
day of the month	D	DM	3	2	1	HD24
day (continuous)		CD			0	CH24
day of the year		DY		3	1	HD24
day of the week (begins with Monday)	J	DW		1	1	HD24
hour of the day	H	HD	4	2	0	MH60
hour (continuous)		CH			0	CN60
minute of the hour	N	NH	5	2	0	UTC leap second → SN61; otherwise SN60
minute (continuous)		CN			0	CS60
second of the minute	S	SN	6	2	0	CS1
second (continuous)		CS			0	basis
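
The condition given for "month of the year" can be read as a rule list evaluated from first to last, yielding the number of days in the month. The Python sketch below follows that row literally, under two assumptions: the conditions are tested in the order listed, and "Y/4" and "Y/100" mean "year divisible by 4" and "year divisible by 100". The function name `days_in_month` is hypothetical. Note that the row as listed contains no case for years divisible by 400, which the Gregorian calendar treats as leap years; the sketch does not add one.

```python
def days_in_month(year: int, month: int) -> int:
    """Days in a month, following the 'month of the year' conditions as listed."""
    if month in (1, 3, 5, 7, 8, 10, 12):     # MY01,03,05,07,08,10,12 -> DM31
        return 31
    if month in (4, 6, 9, 11):               # MY04,06,09,11 -> DM30
        return 30
    if month == 2:
        if year % 4 == 0 and year % 100 == 0:  # MY02 Y/4 Y/100 -> DM28
            return 28
        if year % 4 == 0:                      # MY02 Y/4 -> DM29
            return 29
        return 28                              # MY02 -> DM28
    raise ValueError("month must be in 1..12")


print(days_in_month(2004, 2), days_in_month(1900, 2), days_in_month(2003, 2))
```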

literal	meaning
{1; 3; 5; 7; 19}	a set of integer numbers or real numbers
{3; 1; 5; 19; 7}	the same set of integer numbers or real numbers
{1.2 m; 2.67 m; 17.8 m}	a set of discrete physical quantities
{apple; orange; banana}	a set of character strings

literal	meaning
(1; 3; 5; 7; 19)	a sequence of integer numbers or real numbers
(3; 1; 5; 19; 7)	a different sequence of integer numbers or real numbers
(1.2 m; 17.8 m; 2.67 m)	a sequence of discrete physical quantities
(apple; orange; banana)	a sequence of character strings
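
The difference between the two literal forms can be illustrated with a small Python sketch: curly braces denote a set, where order is irrelevant, and parentheses denote a sequence, where order matters. The function `parse_collection` is a hypothetical helper that keeps items as plain strings and splits on semicolons only; it is not a full parser for the literal grammar.

```python
def parse_collection(literal: str):
    """Parse '{a; b; c}' into a frozenset and '(a; b; c)' into a tuple of strings."""
    text = literal.strip()
    if len(text) < 2:
        raise ValueError("not a collection literal")
    items = tuple(item.strip() for item in text[1:-1].split(";"))
    if text[0] == "{" and text[-1] == "}":
        return frozenset(items)   # set: order does not matter
    if text[0] == "(" and text[-1] == ")":
        return items              # sequence: order matters
    raise ValueError("expected {...} or (...)")


print(parse_collection("{1; 3; 5; 7; 19}") == parse_collection("{3; 1; 5; 19; 7}"))
print(parse_collection("(1; 3; 5; 7; 19)") == parse_collection("(3; 1; 5; 19; 7)"))
```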

Name	Type	Description
head	T	The first item in this sequence. This is a definitional property for the semantics of the sequence.
increment	T.diff	The difference between one value and its previous different value. For example, to generate the sequence (1; 4; 7; 10; 13; ...) the increment is 3; likewise to generate the sequence (1; 1; 4; 4; 7; 7; 10; 10; 13; 13; ...) the increment is also 3.
period	INT	If non-NULL, specifies that the sequence alternates, i.e., after this many increments, the sequence item values roll over to start from the initial sequence item value. For example, the sequence (1; 2; 3; 1; 2; 3; 1; 2; 3; ...) has period 3; the sequence (1; 1; 2; 2; 3; 3; 1; 1; 2; 2; 3; 3; ...) also has period 3.
denominator	INT	The integer by which the index for the sequence is divided, effectively the number of times the sequence generates the same sequence item value before incrementing to the next sequence item value. For example, to generate the sequence (1; 1; 1; 2; 2; 2; 3; 3; 3; ...) the denominator is 3.
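
The four properties above can be combined into a generator. The following Python sketch assumes that the item at index i is head + ((i div denominator) mod period) × increment, with the modulo step skipped when period is NULL. That formula is inferred from the examples in the property descriptions, and the names `generated_sequence` and `first_items` are hypothetical.

```python
from itertools import count, islice
from typing import Iterator, List, Optional


def generated_sequence(head, increment, denominator: int = 1,
                       period: Optional[int] = None) -> Iterator:
    """Yield items of a generated sequence (assumed formula, see lead-in)."""
    for index in count():
        step = index // denominator      # repeat each value `denominator` times
        if period is not None:
            step %= period               # roll over after `period` increments
        yield head + step * increment


def first_items(sequence: Iterator, n: int) -> List:
    """Take the first n items of a (possibly infinite) sequence."""
    return list(islice(sequence, n))


print(first_items(generated_sequence(1, 3), 5))
print(first_items(generated_sequence(1, 1, denominator=2, period=3), 12))
```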

head	increment	denominator	period	meaning
0	1	1	∞	The identity-sequence where each item is equal to its index.
198706052000	2 hour	1	∞	Sequence starting on June 5, 1987 at 8 PM and incrementing every two hours: 10 PM, 12 AM (June 6), 2 AM, 4 AM, and so on.
0 V	1 mV	1	100	The x-wave of a digital oscillograph scanning between 0 and 100 mV in 100 steps of 1 mV. The frequency is unknown from these data as we do not know how much time elapses between each step of the index.
2002072920300	100 us	1	∞	A timebase from June 29, 2002 at 8:30 PM with 100 us between each step of the index. If combined with the previous generator as a second sampling dimension this would now describe our digital oscilloscope's x-timebase as 1 mV per 100 us. At 100 steps per period, the period is 10 ms, which is equal to a frequency of 100 Hz.
0 V	1 mV	100	100	Combining this generator with the previous two generators could describe a three-dimensional sampling space with two voltages and time. This generator also steps at 1 mV and has 100 steps per period; however, it only steps every 100 index increments, so the first voltage generator makes one full cycle before this generator is incremented. One can think of the two voltages as "rows" and "columns" of a "sampling frame". With the previous generator as the timebase, this results in a scan of sampling frames of 100 mV × 100 mV with a framerate of 1 Hz.

Name	Type	Description
origin	T	The origin of the list item value scale.
scale	T.diff	A ratio quantity that is factored out of the digit sequence.
digits	LIST<INT>	A sequence of raw digits for the sample values. This is typically the raw output of an A/D converter.
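
A minimal Python sketch of how the three properties could be combined to recover sample values, assuming each item is computed as origin + scale × digit. That formula is a reading of the property descriptions rather than a quotation of the specification, and the function name `sample_values` is hypothetical.

```python
from typing import List, Sequence


def sample_values(origin: float, scale: float, digits: Sequence[int]) -> List[float]:
    """Reconstruct sample values from raw digits (assumed: origin + scale * digit)."""
    return [origin + scale * digit for digit in digits]


# Example: raw A/D converter output with a scale of 0.5 units per digit.
print(sample_values(0.0, 0.5, [0, 2, 4]))
```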

Name	Type	Description
standardDeviation	T.diff	The primary measure of variance/uncertainty of the value (the square root of the mean of the squared differences between the data points and the mean). The standard deviation is used to normalize the data for computing the distribution function. Applications that cannot deal with probability distributions can still get an idea about the confidence level by looking at the standard deviation.
distributionType	CE	A code specifying the type of probability distribution. Possible values are as shown in the attached table. The NULL value (unknown) for the type code indicates that the probability distribution type is unknown. In that case, the standard deviation has the meaning of an informal guess.