MULTEXT - Document MQL2. SgmlQL reference/Data types.


SgmlQL data types







Introduction

The SgmlQL type system consists of atomic types (numbers, booleans, strings, and names), and complex types (elements, documents, sets, and lists), which are built up from the atomic types.



Atomic types


Numbers

In SgmlQL there is a single numeric type. Numeric literals are specified in the usual floating point or integer formats, according to the following syntax:

[0-9]+("."[0-9]+)?([eE][-+]?[0-9]+)?

Examples

1
1.45
1977
44
.21e02
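
The literal syntax above can be checked mechanically. The following Python sketch is not part of SgmlQL; it simply transcribes the documented pattern into Python's re syntax (the helper name is_numeric_literal is hypothetical) and tests candidate literals against it. Read literally, the pattern requires at least one digit before the decimal point, so a form such as .21e02 is accepted only if the implementation is more permissive than the pattern as written.

```python
import re

# Transcription of the documented pattern; the quoted "." becomes an escaped dot.
NUMERIC_LITERAL = re.compile(r"[0-9]+(\.[0-9]+)?([eE][-+]?[0-9]+)?")


def is_numeric_literal(text: str) -> bool:
    """Return True if the whole string matches the documented pattern."""
    return NUMERIC_LITERAL.fullmatch(text) is not None


# Print each candidate together with the result of the check.
for candidate in ["1", "1.45", "1977", "44", "2.1e+01", ".21e02"]:
    print(candidate, is_numeric_literal(candidate))
```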


Booleans

There are two boolean values, TRUE and FALSE (upper case).

Examples

TRUE
FALSE


Strings

Literal strings are enclosed either within single quotes or double quotes, in SGML style. The usual backslash rules apply for representing characters such as newline, tab, etc.:

        \t              tab
        \n              newline
        \r              return
        \f              form feed
        \b              backspace
        \"              double quote
        \'              single quote
        \%              percent sign

The empty string is denoted by "" or ''.
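
As an illustration of the escape table above, here is a minimal Python sketch that expands the listed escapes in the body of a string literal (the text between the quotes). The function name unescape is hypothetical, and the handling of escapes that are not in the table (they are left unchanged here) is an assumption, since the text does not specify it.

```python
# Escape table copied from the list above.
ESCAPES = {
    "t": "\t", "n": "\n", "r": "\r", "f": "\f", "b": "\b",
    '"': '"', "'": "'", "%": "%",
}


def unescape(body: str) -> str:
    """Expand the documented backslash escapes in a string literal body."""
    out = []
    i = 0
    while i < len(body):
        ch = body[i]
        if ch == "\\" and i + 1 < len(body) and body[i + 1] in ESCAPES:
            # Known escape: emit the character it stands for.
            out.append(ESCAPES[body[i + 1]])
            i += 2
        else:
            # Assumption: anything else, including unknown escapes, is kept verbatim.
            out.append(ch)
            i += 1
    return "".join(out)


print(repr(unescape(r"Hello\n")))
```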

Examples

"Hello"
"Hello\n"
"He said: 'Hello'"
'He said: "Hello"'
""


Names

Names are identifiers which follow the SGML syntax for names, with the following exceptions:

The resulting syntax is:

#?[A-Za-z][\-.0-9A-Za-z]*

Note that there is no length limitation.

Names starting with # are hidden names; all others are visible names. Hidden names are used only within an SgmlQL query; they do not appear in any SGML document--neither a document that is the object of a query, nor one created from the result of a query.

The hidden names #PCDATA and #FROM are predefined.

Examples

HEAD
P
DIV2
AUTHOR.NAME
#PCDATA
#FOO
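
The name syntax and the hidden/visible distinction can be sketched in Python as follows. This is an illustration only, not SgmlQL code; the function name classify_name and the three result labels are hypothetical.

```python
import re

# Transcription of the documented name syntax.
NAME = re.compile(r"#?[A-Za-z][\-.0-9A-Za-z]*")


def classify_name(text: str) -> str:
    """Return 'hidden', 'visible', or 'invalid' for a candidate name."""
    if NAME.fullmatch(text) is None:
        return "invalid"
    # Names starting with # are hidden; all others are visible.
    return "hidden" if text.startswith("#") else "visible"


for candidate in ["HEAD", "DIV2", "AUTHOR.NAME", "#PCDATA", "2DIV"]:
    print(candidate, classify_name(candidate))
```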



Complex types

Elements

An element is a triple composed of:

Elements with content string are called pseudo-elements. A (hidden) generic identifier #PCDATA is generated for such elements. Pseudo-elements are treated as strings by operators that require strings as arguments.

Literal elements are enclosed within percent signs: % %.

Examples

%<HEAD>Subject: Energy cooperation: assessment</HEAD>%

%<MEMO>
 <FROM>me</FROM>
 <TO>you</TO>
 <BODY>Hello!</BODY>
 </MEMO>%


When a document is read by SgmlQL, the location of each element in the SGML tree is stored in a hidden attribute on the element itself, called #FROM. The location is represented as a string of numbers separated by dots, where each number indicates the sibling number at the corresponding level of the tree. For example, the location 1.2.1.3.1.2.1 starts at the root of the SGML tree (the initial 1), and descends taking the designated child at each node (i.e., the second child of the root, then the first child of this node, then the third child of this node, etc.). Note that PCDATA is considered as a node in the tree and therefore is given a location. For example, the elements in the following document

<!DOCTYPE MEMO SYSTEM "memo.dtd">
<MEMO>
<FROM>me</FROM>
<TO>you</TO>
<BODY>Hello <EMPH>old</EMPH> friend!</BODY>
</MEMO>


have the following locations:


        Element         Location
        <MEMO>          1
        <FROM>          1.1
        me              1.1.1\1
        <TO>            1.2
        you             1.2.1\1
        <BODY>          1.3
        Hello           1.3.1\1
        <EMPH>          1.3.2
        old             1.3.2.1\1
        friend!         1.3.3\1
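
The numbering scheme can be reproduced with a short tree walk. The Python sketch below is an illustration under stated assumptions, not the SgmlQL implementation: an element is modelled as a (name, children) tuple, a PCDATA node as a plain string, and the function name locations is hypothetical. The \1 suffix shown for PCDATA nodes in the table is not reproduced, because the text does not define it.

```python
def locations(node, path="1"):
    """Yield (label, location) pairs in document order."""
    if isinstance(node, str):
        # PCDATA counts as a node and receives its own location.
        yield node, path
        return
    name, children = node
    yield f"<{name}>", path
    # Children are numbered from 1 at each level of the tree.
    for index, child in enumerate(children, start=1):
        yield from locations(child, f"{path}.{index}")


# The MEMO document from the example above.
memo = ("MEMO", [
    ("FROM", ["me"]),
    ("TO", ["you"]),
    ("BODY", ["Hello ", ("EMPH", ["old"]), " friend!"]),
])

for label, location in locations(memo):
    print(f"{label!r:12} {location}")
```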


Documents

A document is a pair composed of:

Literal documents are enclosed within percent signs: % %.

Example

%<!DOCTYPE MEMO SYSTEM "memo.dtd">
 <MEMO>
 <FROM>me</FROM>
 <TO>you</TO>
 <BODY>Hello!</BODY>
 </MEMO>%


Sets

Sets can contain members of any atomic type. Sets must be enclosed within curly brackets, with their members separated by commas ({ ... , ... , ... }). The empty set is denoted by {}. Note that atomic objects are treated as sets of cardinality one.

Examples

{3, "hello", DIV}
{}

Special types of sets are associative sets, in which each member is a list of two elements, a key and a datum. In the current version, only attribute-value sets are implemented.

An attribute-value set, or atvset, is an associative set, of which each member is composed of:

Literal atvsets are enclosed within curly brackets. Attribute names and values are separated by an equal sign.

Examples

{TYPE="section" , N="4" , ID="s4"}
{}

The datum associated with a given key is accessed with the -> sign, written between the atvset and the key.

Example

$d->TYPE

returns: section (assuming that $d holds the first atvset shown above)
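
For readers more familiar with general-purpose languages, an atvset behaves much like a dictionary keyed by attribute name. The Python sketch below is only an analogy: the function name datum is hypothetical, and returning None for a missing key is an assumption, since the text does not say what -> yields in that case.

```python
# A Python dict standing in for the atvset {TYPE="section" , N="4" , ID="s4"}.
d = {"TYPE": "section", "N": "4", "ID": "s4"}


def datum(atvset: dict, key: str):
    """Analogue of atvset->KEY: return the datum stored under key."""
    # Assumption: a missing key yields None.
    return atvset.get(key)


print(datum(d, "TYPE"))
```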


Lists

A list is an ordered set. Lists can contain members of any type except lists and sets.

Lists must be enclosed within [ ], their members being separated by commas ([ ... , ... , ... ]). The empty list is denoted by []. Note that atomic objects are treated as lists of length one.

Examples

[3, "hello", DIV]
[]

An element list is a special type of list whose members are all elements. If some of these members are pseudo-elements, the list is a mixed content list. Two pseudo-elements cannot be adjacent in a mixed content list.

Element lists can be written either using the notation above, or they can be written within % %, as for elements. When the % % notation is used, the content follows the format of SGML element content (i.e., no commas between elements, no quotes around strings, etc.).

Examples

[ %<FROM>me</FROM>% , %<TO>you</TO>% ]

[ "He was born on ", %<DATE>15 May 1950</DATE>%, " in New York" ]

[]

are equivalent, respectively, to:

%<FROM>me</FROM><TO>you</TO>%

%He was born on <DATE>15 May 1950</DATE> in New York%

%%
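
The correspondence between the two notations can be sketched in Python. This is an illustration under stated assumptions, not SgmlQL code: an element is modelled as a (name, content) tuple, a pseudo-element as a plain string, attributes are ignored, and the function names to_percent_notation and render are hypothetical.

```python
def render(element):
    """Serialize a (name, content) tuple as SGML element text."""
    name, content = element
    inner = "".join(c if isinstance(c, str) else render(c) for c in content)
    return f"<{name}>{inner}</{name}>"


def to_percent_notation(items):
    """Render a list of elements and strings in % % notation."""
    parts = []
    previous_was_text = False
    for item in items:
        is_text = isinstance(item, str)
        # Two pseudo-elements cannot be adjacent in a mixed content list.
        if is_text and previous_was_text:
            raise ValueError("two pseudo-elements cannot be adjacent")
        parts.append(item if is_text else render(item))
        previous_was_text = is_text
    return "%" + "".join(parts) + "%"


print(to_percent_notation([("FROM", ["me"]), ("TO", ["you"])]))
print(to_percent_notation(
    ["He was born on ", ("DATE", ["15 May 1950"]), " in New York"]))
print(to_percent_notation([]))
```

Rejecting adjacent strings with an exception is one possible design choice; the text only states that such lists cannot occur, not how an implementation reacts to them.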






Copyright © Centre National de la Recherche Scientifique, 1997.