Data Types - Implementation Technology Specification for XML

This document specifies the HL7 Version 3 Data Types in the context of their XML Implementation Technology Specification (ITS). Although ITS dependent, this document can stand on its own for the purpose of this ballot.

This document is based on the Data Types Abstract Specification, which defines the data types on an abstract layer independent from representation. Because data types are very close to implementation technology (many even believe data types are nothing else than "formats" for data fields), the truly ITS independent specification was felt to be too abstract to serve as a commonly understood, primary reference for data types. Therefore it was decided at this point in time to make the data type ITS the primary normative specification.

Vocabulary tables within this specification list the current contents of vocabulary domains for ease of reference by the reader. However, at any given time the normative source for these domains is the vocabulary tables in the RIM database. For some large domains, only a sample of possible values is shown. The complete domains can be referenced in the vocabulary tables by looking up the domain name associated with the table in the RIM vocabulary tables.

This standard is the result of several years of intense work through e-mail, telephone conferences and meeting discussions. Gunther Schadow (Regenstrief Institute for Health Care) chaired this task force, and is the main author of this document. Major contributions are from Mark Tucker (Regenstrief Institute), Paul V. Biron (Kaiser Permanente), Lloyd McKenzie (IBM), George Beeler, Douglas Pratt (Siemens), and Stan Huff (Intermountain Health Care), as well as Mike Henderson (Kaiser Permanente, now Eastern Informatics), Anthony Julian (Mayo), Joann Larson (Kaiser Permanente), Mark Shafarman (Oacis Healthcare Systems, now Oracle), Wes Rishel (Gartner Group), and Robin Zimmerman (Kaiser Permanente). Acknowledgements for their critical review and infusion of ideas go to Bob Dolin (Kaiser Permanente), Clem McDonald (Regenstrief Institute), Kai Heitmann (HL7 Germany), Rob Seliger (Sentillion), and Harold Solbrig (Mayo Clinic). Vital support came from the members of the task force: Laticia Fitzpatrick (Kaiser Permanente), Matt Huges, Randy Marbach (Kaiser Permanente), Larry Reis (Wizdom Systems), Carlos Sanroman (Kaiser Permanente), and Greg Thomas (Kaiser Permanente). Thanks to James Case (University of California, Davis), Norman Daoust (Partners HealthCare Systems), Irma Jongeneel (HL7 The Netherlands), Michio Kimura (HL7 Japan), John Molina (SMS), Richard Ohlmann (McKessonHBOC), David Rowed (HL7 Australia), and Klaus Veil (Macquarie Health Corp., HL7 Australia), for sharing their expertise in critical questions. This work was made possible by the Regenstrief Institute for Health Care.

What is a Data Type? Data types are the basic building blocks used to construct messages, computerized patient record documents, business objects and their transactions. Data types define the meaning of any given field's value. Without knowing a field's data type, it is impossible to interpret the field's value.

Representation of Data Values. On an abstract layer, independent from representation, data types define properties of values. When values are represented, some of their properties are directly represented as atomic literal forms or as data structures. At that point we call those properties "components". On the representation layer we can also distinguish simple data types, represented as atomic literal forms, from complex ones, represented as structures with components. For the implementor, it is important to realize that data types have more properties than shown as components, and that it depends only on the implementation technology and ITS specification which data types are simple or complex, which of their properties are represented as "components", and which are inferred from those components.

This specification defines standard representations for data values in XML only. Other ITSs and programming environments may choose different representations and data structures, all of which must be consistent with the Data Types Abstract Specification.

The fully specified data types are organized approximately in the same order in which they appear in the Data Types Abstract Specification, divided into roughly three categories: (1) boolean, binary, text and multimedia, (2) codes and identifiers, and (3) quantitative data types.

Generic types are about collections (sets, lists, etc.) and common data type extensions to deal with uncertainty, time-dependency and other qualifications of data values. Finally, the framework of specifying complex timing patterns (e.g., for scheduling periodic activities) is mostly specified in terms of generic data types.

Table 1: Overview of HL7 version 3 data types
Name	Symbol	Description
Data Value	ANY	Defines the basic properties of every data value. This is an abstract type, meaning that no value can be just a data value without belonging to any concrete type. Every concrete type is a specialization of this general abstract DataValue type.
Boolean	BL	The Boolean type stands for the values of two-valued logic. A Boolean value can be either true or false, or, as any other value, may be NULL.
Binary Data	BIN	Binary data is a raw block of bits. Binary data is a protected type that should not be declared outside the data type specification.
Encapsulated Data	ED	Data that is primarily intended for human interpretation or for further machine processing outside the scope of HL7. This includes unformatted or formatted written language, multimedia data, or structured information as defined by a different standard (e.g., XML-signatures.) Instead of the data itself, an ED may contain only a reference (see TEL.) Note that the ST data type is a specialization of the ED data type when the ED media type is text/plain.
Character String	ST	The character string data type stands for text data, primarily intended for machine processing (e.g., sorting, querying, indexing, etc.) Used for names, symbols, and formal expressions.
Coded Simple Value	CS	Coded data in its simplest form, consists of a code and display name. The code system and code system version is fixed by the context in which the CS value occurs. CS is used for coded attributes that have a single HL7-defined value set.
Coded Value	CV	Coded data, consists of a code, display name, code system, and original text. Used when a single code value must be sent.
Coded with Equivalents	CE	Coded data, consists of a coded value (CV) and, optionally, coded value(s) from other coding systems that identify the same concept. Used when alternative codes may exist.
Concept Descriptor	CD	A concept descriptor represents any kind of concept usually by giving a code defined in a code system. A concept descriptor can contain the original text or phrase that served as the basis of the coding and one or more translations into different coding systems. A concept descriptor can also contain qualifiers to describe, e.g., the concept of a "left foot" as a postcoordinated term built from the primary code "FOOT" and the qualifier "LEFT". In exceptional cases, the concept descriptor need not contain a code but only the original text describing that concept.
Concept Role	CR	A concept qualifier code with optionally named role. Both qualifier role and value codes must be defined by the coding system. For example, if SNOMED RT defines a concept "leg", a role relation "has-laterality", and another concept "left", the concept role relation allows adding the qualifier "has-laterality: left" to a primary code "leg" to construct the meaning "left leg".
Unique Identifier String	UID	A unique identifier string is a character string which identifies an object in a globally unique and timeless manner. The allowable formats and values and procedures of this data type are strictly controlled by HL7. At this time, user-assigned identifiers may be certain character representations of ISO Object Identifiers (OID) and DCE Universally Unique Identifiers (UUID). HL7 also reserves the right to assign other forms of UIDs, such as mnemonic identifiers for code systems.
Instance Identifier	II	An identifier that uniquely identifies a thing or object. Examples are object identifier for HL7 RIM objects, medical record number, order id, service catalog item id, Vehicle Identification Number (VIN), etc. Instance identifiers are defined based on ISO object identifiers.
Universal Resource Locator	URL	A telecommunications address specified according to Internet standard RFC 1738. The URL specifies the protocol and the contact point defined by that protocol for the resource. Notable uses of the telecommunication address data type are for telephone and telefax numbers, e-mail addresses, Hypertext references, FTP references, etc.
Telecommunication Address	TEL	A telephone number (voice or fax), e-mail address, or other locator for a resource (information or service) mediated by telecommunication equipment. The address is specified as a Universal Resource Locator (URL) qualified by time specification and use codes that help deciding which address to use for a given time and purpose.
Address Part	ADXP	A character string that may have a type-tag signifying its role in the address. Typical parts that exist in about every address are street, house number, or post box, postal code, city, country but other roles may be defined regionally, nationally, or on an enterprise level (e.g. in military addresses). Addresses are usually broken up into lines, which are indicated by special line-breaking delimiter elements (e.g., DEL).
Postal Address	AD	Mailing and home or office addresses. A sequence of address parts, such as street or post office Box, city, postal code, country, etc.
Entity Name Part	ENXP	A character string token representing a part of a name. May have a type code signifying the role of the part in the whole entity name, and a qualifier code for more detail about the name part type. Typical name parts for person names are given names, family names, titles, etc.
Entity Name	EN	A name for a person, organization, place or thing. A sequence of name parts, such as first name or family name, prefix, suffix, etc. Examples for entity name values are "Jim Bob Walton, Jr.", "Health Level Seven, Inc.", "Lake Tahoe", etc. An entity name may be as simple as a character string or may consist of several entity name parts, such as, "Jim", "Bob", "Walton", and "Jr.", "Health Level Seven" and "Inc.", "Lake" and "Tahoe".
Person Name Part	PNXP	A restriction of entity name part that only allows those entity name part qualifiers applicable to person names. Since the structure of entity name is mostly determined by the requirements of person name, the restriction is very minor.
Person Name	PN	A name for a person. A sequence of name parts, such as first name or family name, prefix, suffix, etc.
Organization Name Part	ONXP	A restriction of entity name part that only allows those entity name part qualifiers applicable to organization names.
Organization Name	ON	A name for an organization. A sequence of name parts.
Trivial Name	TN	A restriction of entity name that is effectively a simple string used for a simple name for things and places.
Quantity	QTY	The quantity data type is an abstract generalization for all data types (1) whose value set has an order relation (less-or-equal) and (2) where difference is defined in all of the data type's totally ordered value subsets. The quantity type abstraction is needed in defining certain other types, such as the interval and the probability distribution.
Integer Number	INT	Integer numbers (-1,0,1,2, 100, 3398129, etc.) are precise numbers that are results of counting and enumerating. Integer numbers are discrete, the set of integers is infinite but countable. No arbitrary limit is imposed on the range of integer numbers. Two NULL flavors are defined for the positive and negative infinity.
Real Number	REAL	Fractional numbers. Typically used whenever quantities are measured, estimated, or computed from other real numbers. The typical representation is decimal, where the number of significant decimal digits is known as the precision. Real numbers are needed beyond integers whenever quantities of the real world are measured, estimated, or computed from other real numbers. The term "Real number" in this specification is used to mean that fractional values are covered without necessarily implying the full set of the mathematical real numbers.
Physical Quantity	PQ	A dimensioned quantity expressing the result of a measurement act.
Physical Quantity Representation	PQR	A representation of a physical quantity in a unit from any code system. Used to show alternative representation for a physical quantity.
Monetary Amount	MO	A monetary amount is a quantity expressing the amount of money in some currency. Currencies are the units in which monetary amounts are denominated in different economic regions. While the monetary amount is a single kind of quantity (money) the exchange rates between the different units are variable. This is the principal difference between physical quantity and monetary amounts, and the reason why currency units are not physical units.
Ratio	RTO	A quantity constructed as the quotient of a numerator quantity divided by a denominator quantity. Common factors in the numerator and denominator are not automatically cancelled out. The data type supports titers (e.g., "1:128") and other quantities produced by laboratories that truly represent ratios. Ratios are not simply "structured numerics", particularly blood pressure measurements (e.g. "120/60") are not ratios. In many cases the should be used instead of the .
Point in Time	TS	A quantity specifying a point on the axis of natural time. A point in time is most often represented as a calendar expression.
Sequence	LIST	A value that contains other discrete values in a defined sequence.
Bag	BAG	An unordered collection of values, where each value can be contained more than once in the bag.
Bag Item	BXIT	A generic data type extension that represents a collection of a certain number of identical items in a bag.
Set	SET	A value that contains other distinct values in no particular order.
Set Component	SXCM	An ITS-defined generic type extension for the base data type of a set, representing a component of a general set over a discrete or continuous value domain. Its use is mainly for continuous value domains. Discrete (enumerable) set components are the individual elements of the base data type.
Interval	IVL	A set of consecutive values of an ordered base data type.
History Item	HXIT	A generic data type extension that tags a time range to any data value of any data type. The time range is the time in which the information represented by the value is (was) valid.
History	HIST	A set of data values that conform to the history item (HXIT) type, (i.e., that have a valid-time property). The history information is not limited to the past; expected future values can also appear.
Uncertain Value - Narrative	UVN	A generic data type extension to specify one uncertain value tagged with a coded confidence qualifier.
Non-Parametric Probability Distribution	NPPD	A set of uncertain values with probabilities (also known as histogram.)
Uncertain Value - Probabilistic	UVP	A generic data type extension used to specify a probability expressing the information producer's belief that the given value holds.
Parametric Probability Distribution	PPD	A generic data type extension specifying uncertainty of quantitative data using a distribution function and its parameters. Aside from the specific parameters of the distribution, a mean (expected value) and standard deviation are always given to help maintain a minimum layer of interoperability if receiving applications cannot deal with a certain probability distribution.
Periodic Interval of Time	PIVL	An interval of time that recurs periodically. Periodic intervals have two properties, phase and period. The phase specifies the "interval prototype" that is repeated every period.
Event-Related Interval of Time	EIVL	Specifies a periodic interval of time where the recurrence is based on activities of daily living or other important events that are time-related but not fully determined by time.
Parenthetic Set Expression	SXPR	A set-component that is itself made up of set-components that are evaluated as one value.
General Timing Specification	GTS	A set of points in time, specifying the timing of events and actions and the cyclical validity-patterns that may exist for certain kinds of information, such as phone numbers (evening, daytime), addresses (so called "snowbirds," residing in the south during winter and north during summer) and office hours.

Definition: Defines the basic properties of every data value. This is an abstract type, meaning that no value can be just a data value without belonging to any concrete type. Every concrete type is a specialization of this general abstract DataValue type.

In the XML schema we use an abstract complex data type ANY. All other complex XML schema data types are extensions of this base data type.

Schema Fragment 1:
`<xsd:complexType name="ANY" abstract="true" mixed="true"> <xsd:attribute name='nullFlavor' type='cs_NullFlavor' ... /> </xsd:complexType>`

Definition: An exceptional value expressing missing information and possibly the reason why the information is missing.

Every data element has either a proper value or it is considered NULL. If (and only if) it is NULL, a "null flavor" provides more detail as to in what way or why no proper value is supplied.

Table 2: Domain NullFlavor:
code	name	definition
NI	NoInformation	No information whatsoever can be inferred from this exceptional value. This is the most general exceptional value. It is also the default exceptional value.
NA	not applicable	No proper value is applicable in this context (e.g., last menstrual period for a male.)
UNK	unknown	A proper value is applicable, but not known.
NASK	not asked	This information has not been sought (e.g., patient was not asked)
ASKU	asked but unknown	Information was sought but not found (e.g., patient was asked but didn't know)
NAV	temporarily unavailable	Information is not available at this time but it is expected that it will be available later.
OTH	other	The actual value is not an element in the value domain of a variable. (e.g., concept not provided by required code system.)
PINF	positive infinity	Positive infinity of numbers.
NINF	negative infinity	Negative infinity of numbers.
NP	not present	Value is not present in a message. This is only defined in messages, never in application data! All values not present in the message must be replaced by the applicable default, or no-information (NI) as the default of all defaults.

Schema Fragment 2:
`<xsd:attribute name="nullFlavor" type="cs_NullFlavor" use="optional"/>`

Schema Fragment 3:
`<xsd:simpleType name="cs_NullFlavor"> <xsd:restriction base="cs"> <xsd:enumeration value="NI"/> <xsd:enumeration value="NA"/> <xsd:enumeration value="UNK"/> <xsd:enumeration value="NASK"/> <xsd:enumeration value="ASKU"/> <xsd:enumeration value="NAV"/> <xsd:enumeration value="OTH"/> <xsd:enumeration value="PINF"/> <xsd:enumeration value="NINF"/> </xsd:restriction> </xsd:simpleType>`
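As a minimal sketch of how a receiving application might apply Table 2 together with Schema Fragment 3, the following Python fragment checks a null flavor code against the enumerated values and falls back to NI when a value is missing without a stated flavor. The names NULL_FLAVORS and effective_null_flavor are hypothetical and not defined by this specification. NP is left out because Schema Fragment 3 does not enumerate it.

```python
# Codes enumerated by cs_NullFlavor in Schema Fragment 3, with names from Table 2.
# NP ("not present") is deliberately absent: the schema fragment does not list it.
NULL_FLAVORS = {
    "NI": "NoInformation",
    "NA": "not applicable",
    "UNK": "unknown",
    "NASK": "not asked",
    "ASKU": "asked but unknown",
    "NAV": "temporarily unavailable",
    "OTH": "other",
    "PINF": "positive infinity",
    "NINF": "negative infinity",
}


def effective_null_flavor(value, null_flavor=None):
    """Return None for a proper value, otherwise the applicable null flavor code.

    Hypothetical helper: a missing value without a stated flavor is treated
    as NI, the default exceptional value.
    """
    if value is not None:
        if null_flavor is not None:
            # A data element is either a proper value or NULL, never both.
            raise ValueError("a proper value must not carry a null flavor")
        return None
    if null_flavor is None:
        return "NI"
    if null_flavor not in NULL_FLAVORS:
        raise ValueError(f"unknown null flavor: {null_flavor!r}")
    return null_flavor


print(effective_null_flavor(None, "ASKU"))
```

Raising an error when both a value and a flavor are supplied is a design choice of this sketch; it follows the rule that a null flavor applies if and only if the element is NULL.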

Definition: The Boolean type stands for the values of two-valued logic. A Boolean value can be either true or false, or, as any other value, may be NULL.

All Boolean values obey the common operators negation, conjunction, and disjunction. With the NULL value these common Boolean operations are extended as shown in the following tables.

The XML representation of a Boolean can be either as a simple XML schema type (suitable for use in an XML attribute) or as a complex XML schema type preferred for use as an XML element. The simple type is used only in the HL7 data type schema (including defining the complex type.) The simple type cannot distinguish the different flavors of null.

By convention, simple types have a schema name that uses the lower case form of the HL7 data type short name.

Table 3: Truth tables for Boolean logic with NULL values
NOT		AND	true	false	NULL	OR	true	false	NULL
true	false	true	true	false	NULL	true	true	true	true
false	true	false	false	false	false	false	true	false	NULL
NULL	NULL	NULL	NULL	false	NULL	NULL	true	NULL	NULL
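The truth tables in Table 3 can be sketched in Python as follows. This is an illustration only: NULL is modelled as Python's None, and the function names not3, and3 and or3 are hypothetical.

```python
# Three-valued Boolean operators following Table 3.
# Assumption of this sketch: NULL is represented by None.

def not3(a):
    """Negation: NOT NULL is NULL."""
    return None if a is None else (not a)


def and3(a, b):
    """Conjunction: false dominates, so false AND NULL is false."""
    if a is False or b is False:
        return False
    if a is None or b is None:
        return None
    return True


def or3(a, b):
    """Disjunction: true dominates, so true OR NULL is true."""
    if a is True or b is True:
        return True
    if a is None or b is None:
        return None
    return False


print(and3(None, False), or3(None, True), not3(None))
```

The point the tables make is that NULL only propagates when the other operand cannot decide the result on its own.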

Schema Fragment 4:
`<xsd:simpleType name="bl"> <xsd:restriction base="xsd:boolean"> <xsd:pattern value="true|false"/> </xsd:restriction> </xsd:simpleType>`

Normally the Boolean is used as a complex XML schema type that has all the common data type components, such as the null flavor available. The simple Boolean value itself appears as the XML-attribute value.

By convention, complex XML types have a schema name that uses the upper case form of the HL7 data type short name.

Schema Fragment 5:
`<xsd:complexType name="BL"> <xsd:complexContent> <xsd:extension base="ANY"> <xsd:attribute name="value" use="optional" type="bl"/> </xsd:extension> </xsd:complexContent> </xsd:complexType>`

Although the use of the XML-attribute value is optional, the constraint (expressed as an XPath predicate) specifies that there must be either an XML-attribute value or the ANY.nullFlavor attribute, but not both.

Schema Fragment 6:
`<hl7:pr assert="(@nullFlavor or @value) and not(@nullFlavor and @value)"/>`
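The constraint above is an exclusive-or over the presence of two attributes. A minimal sketch of the same check in Python, using the standard library's ElementTree, might look as follows. The function name satisfies_bl_constraint and the element name negationInd are hypothetical and chosen for illustration only.

```python
import xml.etree.ElementTree as ET


def satisfies_bl_constraint(element):
    """True when exactly one of @value and @nullFlavor is present.

    Mirrors the XPath predicate
    (@nullFlavor or @value) and not(@nullFlavor and @value).
    """
    has_value = "value" in element.attrib
    has_null_flavor = "nullFlavor" in element.attrib
    return has_value != has_null_flavor


# "negationInd" is a made-up element name for this example.
for xml in (
    '<negationInd value="true"/>',
    '<negationInd nullFlavor="UNK"/>',
    '<negationInd value="true" nullFlavor="UNK"/>',
    '<negationInd/>',
):
    print(xml, satisfies_bl_constraint(ET.fromstring(xml)))
```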

Definition: Binary data is a raw block of bits. Binary data is a protected type that should not be declared outside the data type specification.

The XML representation of binary data exists in two forms, a simple type and a complex type. The simple type is used for certain data type components represented as XML attributes carrying arbitrary binary values. Simple binary data is always base 64 encoded.

The complex type serves as the basis of the encapsulated data type, used for both written text and multimedia (binary data.) When an element is defined to be of type BIN that property is represented as character data (e.g., #PCDATA) in the content model of the data type in question.

Schema Fragment 7:
`<xsd:simpleType name="bin"> <xsd:restriction base="xsd:base64Binary"/> </xsd:simpleType>`

Schema Fragment 8:
`<xsd:complexType name="BIN" mixed="true"> <xsd:complexContent> <xsd:extension base="ANY"> <xsd:attribute name='encoding' type='cs_BinaryDataEncoding' ... /> </xsd:extension> </xsd:complexContent> </xsd:complexType>`

Definition: The data itself represented in the XML instance encoding according to the binary data encoding element (text or base64 form.) Character strings and derived types use only the text encoding, which is the same encoding as the overall XML instance.

Definition: Specifies the encoding of the binary data that is the content of the binary data complex XML schema data type.

The Data Types Abstract Specification does not recognize an encoding property for the BIN data type. The XML-attribute encoding is a component that only specifies the representation of the data in the XML rendition. The encoding is not a meaningful property of the data, only of its XML representation. This means that applications need not retain the encoding of the data.

Binary content can appear either as text (in the encoding of the overall message) or as base 64 data.

Schema Fragment 9:
`<xsd:attribute name="encoding" use="optional" type="cs_BinaryDataEncoding" default="TXT"/>`

Schema Fragment 10:
`<xsd:simpleType name="cs_BinaryDataEncoding"> <xsd:restriction base="xsd:NMTOKEN"> <xsd:enumeration value="B64"/> <xsd:enumeration value="TXT"/> </xsd:restriction> </xsd:simpleType>`
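As an illustration of how the two enumerated encoding values might be handled on receipt, the sketch below recovers the raw bytes from an element's character content. It is a sketch under stated assumptions, not a normative procedure: the function name bin_content, the element name data, and the charset parameter (standing in for the encoding of the overall XML instance) are all hypothetical.

```python
import base64
import xml.etree.ElementTree as ET


def bin_content(element, charset="utf-8"):
    """Return the bytes carried as character content of a BIN-like element.

    The encoding attribute defaults to "TXT", as in Schema Fragment 9.
    `charset` is an assumption standing in for the XML instance's encoding.
    """
    encoding = element.get("encoding", "TXT")
    text = element.text or ""
    if encoding == "B64":
        # Base64 content must be decoded to recover the original data.
        return base64.b64decode(text)
    if encoding == "TXT":
        # Text content is interpreted directly as characters.
        return text.encode(charset)
    raise ValueError(f"unsupported encoding: {encoding!r}")


# "data" is a made-up element name for this example.
print(bin_content(ET.fromstring('<data encoding="B64">SGVsbG8=</data>')))
print(bin_content(ET.fromstring('<data>Hello</data>')))
```

Because the encoding attribute describes only the XML rendition, both calls yield the same bytes and the application has no need to remember which form was used.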

"TXT" indicates that the character data of the ED is to be interpreted directly as characters; "B64" indicates that the BIN data has been base64 encoded and must be decoded in order to recover the original data.

Definition: Data that is primarily intended for human interpretation or for further machine processing outside the scope of HL7. This includes unformatted or formatted written language, multimedia data, or structured information as defined by a different standard (e.g., XML-signatures.) Instead of the data itself, an ED may contain only a reference (see TEL.) Note that the ST data type is a specialization of the ED data type when the ED media type is text/plain.

Encapsulated data can be present in two forms, inline or by reference. Inline data is communicated or moved as part of the encapsulated data value, whereas by-reference data may reside at a different (remote) location. The data is the same whether it is located inline or remote.

The XML schema for ED is a complex type that is an extension of BIN. Inline binary data is conveyed as character content.

Table 4: Components of Encapsulated Data
Name	Type	Description
data	xsd:string	The data itself represented in the XML instance encoding according to the binary data encoding element (text or base64 form.) Character strings and derived types use only the text encoding, which is the same encoding as the overall XML instance.
encoding	cs	Specifies the encoding of the binary data that is the content of the binary data complex XML schema data type.
mediaType	cs	Identifies the encoding of the encapsulated data and identifies a method to interpret or render the data.
xml:lang	cs	For character based information the language property specifies the human language of the text.
compression	cs	Indicates whether the raw byte data is compressed, and what compression algorithm was used.
reference	TEL	A telecommunication address (TEL), such as a URL for HTTP or FTP, which will resolve to precisely the same binary data that could as well have been provided as inline data.
integrityCheck	bin	The integrity check is a short binary value representing a cryptographically strong checksum that is calculated over the binary data. The purpose of this property, when communicated with a reference, is for anyone to validate later whether the reference still resolves to the same data that it resolved to when the encapsulated data value with reference was created.
integrityCheckAlgorithm	cs	Specifies the algorithm used to compute the integrityCheck value.
thumbnail	ED	A thumbnail is an abbreviated rendition of the full data. A thumbnail requires significantly fewer resources than the full data, while still maintaining some distinctive similarity with the full data. A thumbnail is typically used with by-reference encapsulated data. It allows a user to select data more efficiently before actually downloading through the reference.
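The role of integrityCheck for by-reference data can be sketched as follows. This is a hedged illustration: the codes of the integrity check algorithm domain are not listed in this section, so SHA-1 is assumed here purely as an example, and the function name integrity_check_matches is hypothetical. The check value is treated as base 64 text because integrityCheck has the simple type bin.

```python
import base64
import hashlib


def integrity_check_matches(data, integrity_check_b64, algorithm="sha1"):
    """Compare a base64-encoded checksum with one computed over `data`.

    `algorithm` is a hashlib algorithm name; SHA-1 is an assumption of this
    sketch, not a statement about the integrityCheckAlgorithm domain.
    """
    expected = base64.b64decode(integrity_check_b64)
    actual = hashlib.new(algorithm, data).digest()
    return expected == actual


# Sender side: compute the check when the reference is created.
original = b"bytes the reference resolved to at creation time"
check = base64.b64encode(hashlib.sha1(original).digest()).decode("ascii")

# Receiver side: verify after dereferencing later.
print(integrity_check_matches(original, check))
print(integrity_check_matches(original + b" (changed)", check))
```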

Schema Fragment 11:
`<xsd:complexType name="ED" mixed="true"> <xsd:complexContent> <xsd:extension base="BIN"> <xsd:sequence> <xsd:element name='reference' type='TEL' ... /> <xsd:element name='thumbnail' ... /> </xsd:sequence> <xsd:attribute name='mediaType' type='cs' ... /> <xsd:attribute ref='xml:lang' ... /> <xsd:attribute name='compression' type='cs_CompressionAlgorithm' ... /> <xsd:attribute name='integrityCheck' type='bin' ... /> <xsd:attribute name='integrityCheckAlgorithm' type='cs_IntegrityCheckAlgorithm' ... /> </xsd:extension> </xsd:complexContent> </xsd:complexType>`

ED (and its restricted form, ST) are encoded as character data, resulting in a mixed content model for ED. Because of this mixed content model, the text content could be split into several text nodes. Only XML encodings are valid that produce at most one text node.

EDITORIAL NOTE: Perhaps this is not correct, we may want to allow any content here (open content) so that another XML document (most likely XHTML or HL7 CDA) could be enclosed here effortlessly.

Schema Fragment 12:
`<hl7:pr assert="count(text())&lt;=1"/>`
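A rough Python approximation of this constraint is sketched below. ElementTree does not expose text nodes directly, so the sketch counts the text before the first child element plus any text following a child element. The function name text_node_count and the element name text are hypothetical, and edge cases such as comments between text runs are not modelled.

```python
import xml.etree.ElementTree as ET


def text_node_count(element):
    """Approximate XPath count(text()) for the direct children of `element`.

    element.text is the text before the first child; child.tail is the text
    that follows a child element inside the same parent.
    """
    count = 1 if element.text else 0
    count += sum(1 for child in element if child.tail)
    return count


def has_single_text_run(element):
    """True when the element has at most one text run."""
    return text_node_count(element) <= 1


# "text" is a made-up element name for this example.
print(has_single_text_run(ET.fromstring("<text>Hello<reference/></text>")))
print(has_single_text_run(ET.fromstring("<text>Hello<thumbnail/>world</text>")))
```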

Definition: Identifies the encoding of the encapsulated data and identifies a method to interpret or render the data.

To promote interoperability, this specification prefers certain media types to others. This is to define a greatest common denominator on which interoperability is not only possible, but that is powerful enough to support even advanced multimedia communication needs.

Table 5 below assigns a status to certain MIME media types, where the status means one of the following:

required: Every HL7 application must support at least the required media types if it supports a given kind of media. One required media type exists for each kind of media. Some media types are required for a specific purpose, which is then indicated as "required for ...".

recommended: Other media types are recommended for a particular purpose. For any given purpose there should be only very few additionally recommended media types, and the rationale, conditions and assumptions of such recommendations must be made very clear.

indifferent: This status means that HL7 neither forbids nor endorses the use of this media type. All media types not mentioned here belong to the indifferent category by default. Since there is one required and several recommended media types for most practically relevant use cases, media types of this status should be used very conservatively.

deprecated: Deprecated media types should not be used, because these media types are flawed, because there are better alternatives, or because of certain risks. Such risks could be security risks, for example, the risk that such a media type could spread computer viruses. Not every flawed media type is marked as deprecated, though. A media type that is not mentioned, and thus considered indifferent by default, may well be flawed.

The set of required media types is very small so that no undue requirements are forced on HL7 applications, especially legacy systems. In general, no HL7 application is forced to support any given kind of media other than written text. For example, many systems just do not want to receive audio data, because those systems can only show written text to their users. It is a matter of application conformance statements to say: "I will not handle audio". Only if a system claims to handle audio media must it support the required media type for audio.

Table 5: Domain MediaType:
code	name	status	definition
text/plain	Plain Text	required	For any plain text. This is the default and is equivalent to a character string (ST) data type.
text/x-hl7-ft	HL7 Text	recommended	For compatibility, this represents the HL7 v2.x FT data type. Its use is recommended only for backward compatibility with HL7 v2.x systems.
text/html	HTML Text	recommended	For marked-up text according to the Hypertext Mark-up Language. HTML markup is sufficient for typographically marking-up most written-text documents. HTML is platform independent and widely deployed.
application/pdf	PDF	recommended	The Portable Document Format is recommended for written text that is completely laid out and read-only. PDF is a platform independent, widely deployed, and open specification with freely available creation and rendering tools.
text/xml	XML Text	indifferent	For structured character based data. There is a risk that general SGML/XML is too powerful to allow a sharing of general SGML/XML documents between different applications.
text/rtf	RTF Text	indifferent	The Rich Text Format is widely used to share word-processor documents. However, RTF does have compatibility problems, as it is quite dependent on the word processor. May be useful if word processor edit-able text should be shared.
application/msword	MSWORD	deprecated	This format is very prone to compatibility problems. If sharing of edit-able text is required, text/plain, text/html or text/rtf should be used instead.
audio/basic	Basic Audio	required	This is a format for single channel audio, encoded using 8bit ISDN mu-law [PCM] at a sample rate of 8000 Hz. This format is standardized by: CCITT, Fascicle III.4 - Recommendation G.711. Pulse Code Modulation (PCM) of Voice Frequencies. Geneva, 1972.
audio/mpeg	MPEG audio layer 3	required	MPEG-1 Audio layer-3 is an audio compression algorithm and file format defined in ISO 11172-3 and ISO 13818-3. MP3 has an adjustable sampling frequency for highly compressed telephone to CD quality audio.
audio/k32adpcm	K32ADPCM Audio	indifferent	ADPCM allows compressing audio data. It is defined in the Internet specification RFC 2421 [ftp://ftp.isi.edu/in-notes/rfc2421.txt]. Its implementation base is unclear.
image/png	PNG Image	required	Portable Network Graphics (PNG) [http://www.cdrom.com/pub/png] is a widely supported lossless image compression standard with open source code available.
image/gif	GIF Image	indifferent	GIF is a popular format that is universally well supported. However GIF is patent encumbered and should therefore be used with caution.
image/jpeg	JPEG Image	required	This format is required for high compression of high color photographs. It is a "lossy" compression, but the difference from lossless compression is almost unnoticeable to human vision.
image/g3fax	G3Fax Image	recommended	This is recommended only for fax applications.
image/tiff	TIFF Image	indifferent	Although TIFF (Tag Image File Format) is an international standard, it has many interoperability problems in practice. There are too many different versions, and not all software handles them alike.
video/mpeg	MPEG Video	required	MPEG is an international standard, widely deployed, highly efficient for high color video; open source code exists; highly interoperable.
video/x-avi	X-AVI Video	deprecated	The AVI file format is just a wrapper for many different codecs; it is a source of many interoperability problems.
model/vrml	VRML Model	recommended	This is an openly standardized format for 3D models that can be useful for virtual reality applications such as anatomy or biochemical research (visualization of the steric structure of macromolecules).

In XML mediaType is represented with the XML-attribute mediaType. The default value for the XML-attribute mediaType is "text/plain".

Schema Fragment 13:
`<xsd:attribute name="mediaType" type="cs" use="optional" default="text/plain"/>`
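A receiver can apply this default when the attribute is absent. The following is a minimal sketch using Python's standard xml.etree.ElementTree module; the element name `text`, the helper name `media_type_of`, and the sample instances are hypothetical and chosen only for illustration. Namespaces are left out for brevity.

```python
import xml.etree.ElementTree as ET

# Default stated for the XML-attribute mediaType.
DEFAULT_MEDIA_TYPE = "text/plain"


def media_type_of(ed_element: ET.Element) -> str:
    """Return the mediaType of an ED element, applying the default if absent."""
    return ed_element.get("mediaType", DEFAULT_MEDIA_TYPE)


# Hypothetical instances; the element name "text" is an assumption.
explicit = ET.fromstring('<text mediaType="text/html">&lt;b&gt;bold&lt;/b&gt;</text>')
implicit = ET.fromstring("<text>plain content</text>")

print(media_type_of(explicit))  # text/html
print(media_type_of(implicit))  # text/plain
```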

The ED charset semantic property is not explicitly represented in the XML ITS. Rather, the value of charset is to be inferred from the value of the XML-attribute encoding in the XML-Declaration of the XML entity containing the instance. If the XML-Declaration or the XML-attribute encoding is not present in the instance, then charset defaults to UTF-8.
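The inference rule can be sketched as follows. This is an illustration only, not a full implementation of XML encoding detection: the helper name `infer_charset` is hypothetical, and the sketch assumes the entity starts in an ASCII-compatible encoding (it would not recognize a UTF-16 encoded declaration).

```python
import re

# Matches an XML declaration at the start of the entity that carries an
# "encoding" pseudo-attribute, and captures the encoding name.
_XML_DECL_ENCODING = re.compile(
    rb"^<\?xml\s[^>]*?encoding\s*=\s*[\"']([A-Za-z][A-Za-z0-9._-]*)[\"']"
)


def infer_charset(xml_bytes: bytes) -> str:
    """Infer the ED charset from the XML declaration, defaulting to UTF-8."""
    # Skip a UTF-8 byte order mark if one is present.
    if xml_bytes.startswith(b"\xef\xbb\xbf"):
        xml_bytes = xml_bytes[3:]
    match = _XML_DECL_ENCODING.match(xml_bytes)
    if match is None:
        # No XML declaration, or no encoding in it: fall back to the default.
        return "UTF-8"
    return match.group(1).decode("ascii")


print(infer_charset(b'<?xml version="1.0" encoding="ISO-8859-1"?><x/>'))
print(infer_charset(b'<?xml version="1.0"?><x/>'))
```

The first call reports the declared encoding; the second has a declaration without an encoding and therefore falls back to UTF-8.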

Definition: For character based information the language property specifies the human language of the text.

The HL7 table for human languages is based on RFC 1766, Tags for the Identification of Languages [http://www.isi.edu/in-notes/rfc1766.txt]. It is a set of pre-coordinated pairs of one 2-letter ISO 639 language code and one 2-letter ISO 3166 country code (e.g., en-us [English, United States]).

Language tags do not modify the meaning of the characters found in the text; they only advise on whether and how to present or communicate the text. For this reason, any system or site that does not deal with multilingual text or names in the real world can safely ignore the language property.

In XML a special language setting can be conveyed with the common XML-attribute xml:lang, which has no specific default value.

Schema Fragment 14:
`<xsd:attribute ref="xml:lang" use="optional"/>`

The XML-attribute xml:lang is defined in as being designed for exactly the same purpose as the language property, and hence, it has been adopted for use in the XML ITS.

Definition: Indicates whether the raw byte data is compressed, and what compression algorithm was used.

Table 6: Domain CompressionAlgorithm:
code	name	definition
DF	deflate	The deflate compressed data format as specified in RFC 1951 [ftp://ftp.isi.edu/in-notes/rfc1951.txt].
GZ	gzip	A compressed data format that is compatible with the widely used GZIP utility as specified in RFC 1952 [ftp://ftp.isi.edu/in-notes/rfc1952.txt] (uses the deflate algorithm.)
ZL	zlib	A compressed data format that also uses the deflate algorithm. Specified as RFC 1950 [ftp://ftp.isi.edu/in-notes/rfc1950.txt].
Z	compress	Original UNIX compress algorithm and file format using the LZC algorithm (a variant of LZW). Patent encumbered and less efficient than deflate.

This component is represented as the XML-attribute compression. There is no compression by default.

Schema Fragment 15:
`<xsd:attribute name="compression" type="cs_CompressionAlgorithm" use="optional"/>`

Schema Fragment 16:
`<xsd:simpleType name="cs_CompressionAlgorithm"> <xsd:restriction base="cs"> <xsd:enumeration value="DF"/> <xsd:enumeration value="GZ"/> <xsd:enumeration value="ZL"/> <xsd:enumeration value="Z"/> </xsd:restriction> </xsd:simpleType>`
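As an illustration of how the deflate-based codes relate to one another, the sketch below maps DF, GZ and ZL onto Python's standard zlib and gzip modules. The function names `compress` and `decompress` are hypothetical. The code Z (UNIX compress) is deliberately left unsupported, because the Python standard library offers no implementation of it.

```python
import gzip
import zlib


def compress(data: bytes, code: str) -> bytes:
    """Compress raw bytes according to a CompressionAlgorithm code."""
    if code == "DF":
        # Negative wbits selects a raw deflate stream (RFC 1951), no header.
        compressor = zlib.compressobj(wbits=-zlib.MAX_WBITS)
        return compressor.compress(data) + compressor.flush()
    if code == "GZ":
        return gzip.compress(data)  # gzip container (RFC 1952)
    if code == "ZL":
        return zlib.compress(data)  # zlib container (RFC 1950)
    raise ValueError(f"unsupported compression code: {code!r}")


def decompress(data: bytes, code: str) -> bytes:
    """Reverse compress() for the same CompressionAlgorithm code."""
    if code == "DF":
        return zlib.decompress(data, wbits=-zlib.MAX_WBITS)
    if code == "GZ":
        return gzip.decompress(data)
    if code == "ZL":
        return zlib.decompress(data)
    raise ValueError(f"unsupported compression code: {code!r}")


payload = b"encapsulated data " * 20
for code in ("DF", "GZ", "ZL"):
    packed = compress(payload, code)
    assert decompress(packed, code) == payload
    print(code, len(payload), "->", len(packed), "bytes")
```

All three formats carry the same deflate-compressed payload; they differ only in the header and trailer that wrap it.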

Definition: A telecommunication address (TEL), such as a URL for HTTP or FTP, which will resolve to precisely the same binary data that could as well have been provided as inline data.

The semantic value of an encapsulated data value is the same regardless of whether the data is present inline or just by reference. However, an encapsulated data value without inline data behaves differently, since any attempt to examine the data requires the data to be downloaded from the reference.

An encapsulated data value may have both inline data and a reference. The reference must point to the same data as provided inline.

Schema Fragment 17:
`<xsd:element name="reference" type="TEL" minOccurs="0" maxOccurs="1"/>`

Receivers must ignore all occurrences of the XML-element reference after the first occurrence following the first text node.
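A receiver might pick out the reference as sketched below. This is a simplification under several assumptions: the helper name `first_reference` is hypothetical, the TEL value is assumed to be carried in an XML-attribute named `value`, namespaces are omitted, and the sketch simply takes the first reference child in document order without modelling the position of text nodes. The example.org URLs are placeholders.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def first_reference(ed_element: ET.Element) -> Optional[str]:
    """Return the address of the first reference child, ignoring any others."""
    # find() returns only the first matching child, or None if there is none.
    ref = ed_element.find("reference")
    if ref is None:
        return None
    # Assumption: the TEL address is held in an attribute named "value".
    return ref.get("value")


# Hypothetical instance with a superfluous second reference.
instance = ET.fromstring(
    '<text mediaType="image/png">'
    '<reference value="http://example.org/first.png"/>'
    '<reference value="http://example.org/second.png"/>'
    "</text>"
)

print(first_reference(instance))  # http://example.org/first.png
```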

Definition: The integrity check is a short binary value representing a cryptographically strong checksum that is calculated over the binary data. The purpose of this property, when communicated with a reference, is to allow anyone to validate later whether the reference still resolves to the same data that it resolved to when the encapsulated data value with reference was created.

The integrity check is calculated according to the integrity check algorithm. By default, the Secure Hash Algorithm-1 (SHA-1) shall be used. The integrity check is binary encoded according to the rules of the integrity check algorithm.

The integrity check is calculated over the raw binary data that is contained in the data component, or that is accessible through the reference. No transformations are made before the integrity check is calculated. If the data is compressed, the Integrity Check is calculated over the compressed data.

In XML, the integrity check component is represented with the XML-attribute integrityCheck, which has no specific default value.

Schema Fragment 18:
`<xsd:attribute name="integrityCheck" type="bin" use="optional"/>`

When generating instances, applications must base-64 encode the integrity check prior to populating the XML-attribute integrityCheck. When receiving instances, applications must base-64 decode the value of the XML-attribute integrityCheck to obtain the integrity check value.
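The generating and receiving steps can be sketched with Python's standard hashlib and base64 modules. The function names `integrity_check_attribute` and `verify_integrity` are hypothetical. The bytes passed in are assumed to be the raw data exactly as communicated, so if the data is compressed, the caller passes the compressed bytes.

```python
import base64
import hashlib


def integrity_check_attribute(raw: bytes) -> str:
    """Compute the SHA-1 integrity check and base-64 encode it for the attribute."""
    digest = hashlib.sha1(raw).digest()  # 20-byte binary checksum
    return base64.b64encode(digest).decode("ascii")


def verify_integrity(raw: bytes, attribute_value: str) -> bool:
    """Base-64 decode the attribute value and compare it with a fresh SHA-1."""
    expected = base64.b64decode(attribute_value)
    return hashlib.sha1(raw).digest() == expected


data = b"abc"
attribute = integrity_check_attribute(data)
print(attribute)
print(verify_integrity(data, attribute))         # True
print(verify_integrity(b"tampered", attribute))  # False
```

A receiver that later resolves the reference can run the verification over the downloaded bytes to detect whether the referenced data has changed.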

Definition: Specifies the algorithm used to compute the integrityCheck value.

Table 7: Integrity Check Algorithm (domain = IntegrityCheckAlgorithm)
Name	Code	Description
Secure Hash Algorithm - 1	SHA-1	This algorithm is defined in FIPS PUB 180-1: Secure Hash Standard. As of April 17, 1995.

The integrity check algorithm is represented in XML with the XML-attribute integrityCheckAlgorithm, whose value is (currently) fixed to be "SHA-1" and cannot be changed in an instance.

Schema Fragment 19:
`<xsd:attribute name="integrityCheckAlgorithm" type="cs_IntegrityCheckAlgorithm" use="optional" fixed="SHA-1"/>`

Schema Fragment 20:
`<xsd:simpleType name="cs_IntegrityCheckAlgorithm"> <xsd:restriction base="cs"> <xsd:enumeration value="SHA-1"/> </xsd:restriction> </xsd:simpleType>`

Definition: A thumbnail is an abbreviated rendition of the full data. A thumbnail requires significantly fewer resources than the full data, while still maintaining some distinctive similarity with the full data. A thumbnail is typically used with by-reference encapsulated data. It allows a user to select data more efficiently before actually downloading through the reference.

For example, a large image may be represented by a small image; a high quality audio sequence by a shorter low-quality audio; a movie may be represented by a shorter clip (or just an image); text may be summarized to an abstract.

Thumbnail in XML is represented with the child XML-element thumbnail which is a restriction on ED itself.

Schema Fragment 21:
`<xsd:element name="thumbnail" minOccurs="0" maxOccurs="1"> <xsd:complexType> <xsd:complexContent> <xsd:restriction base="ED"> <xsd:sequence> <xsd:element name="reference" type="TEL" minOccurs="0" maxOccurs="1"/> <xsd:element name="thumbnail" type="ED" minOccurs="0" maxOccurs="0"/> </xsd:sequence> </xsd:restriction> </xsd:complexContent> </xsd:complexType> </xsd:element>`

Receivers must ignore all occurrences of the XML-element thumbnail after the first occurrence following the first text node.

A more complex example contains a reference to an image, stored at a particular URL and available for the next month. An integrity check is provided for the image, as well as an inline thumbnail.

Lastly, we show a Microsoft Word document that has been compressed using the GZip compression algorithm.

Schema Fragment 22:
`<hl7:pr assert="(@nullFlavor or text()) and not(@nullFlavor and text())"/>`
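The assertion states that an element carries either a nullFlavor or text content, but never both and never neither. A minimal sketch of the same check in Python follows; the function name `satisfies_assertion`, the element name `name`, and the null flavor code `NI` used in the samples are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET


def satisfies_assertion(elem: ET.Element) -> bool:
    """Check (@nullFlavor or text()) and not(@nullFlavor and text())."""
    has_null = "nullFlavor" in elem.attrib
    # Text nodes directly inside elem: its leading text plus any text that
    # follows one of its child elements (ElementTree calls that the "tail").
    has_text = bool(elem.text) or any(child.tail for child in elem)
    # Exactly one of the two must hold.
    return has_null != has_text


samples = [
    "<name>Rose</name>",                  # text only: valid
    '<name nullFlavor="NI"/>',            # NULL only: valid
    "<name/>",                            # neither: invalid
    '<name nullFlavor="NI">Rose</name>',  # both: invalid
]
for xml in samples:
    print(xml, satisfies_assertion(ET.fromstring(xml)))
```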

Definition: The character string data type stands for text data, primarily intended for machine processing (e.g., sorting, querying, indexing, etc.). It is used for names, symbols, and formal expressions.

A character string must have at least one character or else it is NULL. The length of a string is the number of characters, not the number of encoded bytes. Byte encoding is an ITS issue and is not relevant on the application layer.

The character string (ST) data type interprets the encapsulated data as character data (as opposed to bits), depending on the charset property. In other words, the string S1 "Rose" is equal to the string S2 "Rose" even if S1 is ASCII-encoded (hex '526f7365') and S2 is EBCDIC-encoded (hex 'd996a285').
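The equality of the two encodings can be checked directly. The sketch below decodes both byte sequences with Python's built-in codecs; the choice of code page `cp037` as the EBCDIC variant is an assumption, since the text does not name a specific code page. The second part illustrates that string length counts characters rather than encoded bytes, using a hypothetical sample name.

```python
# The same four characters in two different byte encodings.
s1 = bytes.fromhex("526f7365").decode("ascii")
s2 = bytes.fromhex("d996a285").decode("cp037")  # assumed EBCDIC code page

print(s1, s2, s1 == s2)

# Length is measured in characters, not in encoded bytes.
name = "M\u00fcller"  # hypothetical sample containing one non-ASCII character
char_count = len(name)
utf8_byte_count = len(name.encode("utf-8"))
print(char_count, utf8_byte_count)
```

Both decoded values compare equal as strings even though their byte representations differ, and the sample name has six characters although its UTF-8 form occupies seven bytes.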

The HL7 character string data type in XML is essentially just that: the simple schema data type string.

However, because any HL7 data value can be NULL, when the string is used by HL7 elements other than data type components, the string data is packaged as the text content of a complex XML schema type.

In the normative schema, the complex XML schema type for ST is formally a restriction of the ED data type, which corresponds with the conceptualization set forth by the Data Types Abstract Specification. However, essentially we arrive at the ST schema type by disabling or fixing all ED components. The media type is fixed to text/plain, and the data encoding (inherited from BIN) is fixed to TXT. This effectively leaves a very simple schema type that only allows one text node or the NULL value shown above. The effect of specializing ST from ED in this way is that HL7 instances can always supply a simple ST value in place of an element of ED data type.

Definition: Specifies the encoding of the binary data that is the content of the binary data complex XML schema data type.

This component is inherited from a generalization data type but its use in this specialization is prohibited.