SgmlQL: elements of the language
An SgmlQL program, or script, is composed of a series of expressions called top-level queries, which can contain keywords, special symbols, variables and literals, and embedded sub-queries. Top-level queries must be separated by a semi-colon.
Example
top DATE within stdin;
Comments can be inserted at any point in a script. Comments must start with a sharp sign (#) followed by a space. The rest of the line is treated as a comment.
Example
# this extracts all dates in a document given on standard input
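The separator and comment rules above can be illustrated with a short Python sketch. It is not part of MtSgmlQL and does not reproduce the interpreter's real lexer: the function name `split_queries` is hypothetical, and the sketch assumes that string literals are delimited by double quotes and contain no embedded double quote.

```python
def split_queries(script: str) -> list[str]:
    """Split script text into top-level queries (simplified sketch)."""
    queries: list[str] = []
    current: list[str] = []
    in_string = False
    i = 0
    while i < len(script):
        ch = script[i]
        if in_string:
            # Inside a string literal: copy everything, including ';' and '#'.
            current.append(ch)
            if ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
            current.append(ch)
        elif ch == "#" and (
            script[i + 1 : i + 2] == " "                # "# " starts a comment
            or (i == 0 and script[1:2] == "!")         # "#!" on the first line
        ):
            # Skip to the end of the line; the newline itself is kept.
            while i < len(script) and script[i] != "\n":
                i += 1
            continue
        elif ch == ";":
            # A semicolon outside a string ends the current top-level query.
            text = "".join(current).strip()
            if text:
                queries.append(text)
            current = []
        else:
            current.append(ch)
        i += 1
    # Assumption: a final query without a trailing semicolon is still kept.
    tail = "".join(current).strip()
    if tail:
        queries.append(tail)
    return queries


script = (
    "#!/usr/local/bin/mtsgmlql\n"
    "# this extracts all dates in a document given on standard input\n"
    "top DATE within stdin;\n"
)
print(split_queries(script))
```

Because only a sharp sign followed by a space is treated as a comment, keywords such as #PCDATA pass through this sketch unchanged.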
The first line of the program can optionally start with the characters #! followed by the location of the MtSgmlQL interpreter. Then you can invoke the script file directly from your shell if you mark the file as executable.
The following example assumes that MtSgmlQL has been installed in the default location /usr/local/bin; if it is installed somewhere else then you will have to modify the line to match.
Example
#!/usr/local/bin/mtsgmlql
This is an example of a typical SgmlQL script:
Example
#!/usr/local/bin/mtsgmlql
# this script will search for all (upper-level) divisions
# that contain a paragraph matching a European
# country name, and output the number of the
# section together with the country code
global $countries = top COUNTRY within file "countries.sgml" ;
select text(first NUM within $d) . "\t" . text($c->CODE) . "\n"
from $d in (top DIV within stdin), $c in $countries
where exists $p in (every P within $d): text($p) match text($c) ;
In interactive mode, queries are typed and evaluated one by one, and results are output after each query.
Example
Query[001]>> global $mybook = "myfile-ces.sgml";
Query[002]>> global $divs = top DIV within file $mybook;
Query[003]>> global $countries = top COUNTRY within file "countries.sgml" ;
Query[003]>> select text(first NUM within $d) . "\t" . text($c->CODE) . "\n"
from $d in $divs, $c in $countries
where exists $p in (every P within $d): text($p) match text($c) ;
1463/91 DE
1463/91 UK
2579/91 DE
2579/91 NL
Query[004]>> quit;
Goodbye!
The keywords of the language are the following:
#FROM #PCDATA and as attr() attr: body() body: bye content() content: count() docdtd: dockw: doctype: document element else empty() empty: eq eval() every excluding exec() exists exit FALSE file first forall format: from ge gi() global gt if in into le let ls lt match ne not() or quit replace remove restrict-to select transpose shrink() split() stderr stdin stdout text() then top TRUE where within
Special symbols are as follows:
? " ~ # % ' ( ) * + , - . / : ; < = > [ ] ` { }
!= -> <= == >=
Variables can be of any of the SgmlQL types. There is no type declaration; typing is implicit. Variable names can contain lower-case and upper-case letters and are not case-sensitive. Variable names must begin with a dollar sign ($).
The syntax of user-definable variables is as follows:
$[A-Za-z_][0-9A-Za-z_]*
Note that there is no length limitation.
Examples
$n $firstname $div3 $div_q
Variables have a scope, also called an environment. The scope of a variable can be global to all subsequent top-level queries (see section "Queries and evaluation"), or local to a given query.
Note: In the current version of the MtSgmlQL interpreter, all variables are kept in memory. Some assignments (such as the content of an entire file) may cause memory problems.
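The naming rule for user-definable variables given earlier in this section can be checked mechanically. The Python sketch below uses that pattern as written. The helper names `is_user_variable` and `same_variable` are hypothetical, and treating case-insensitivity as simple lower-casing is an assumption rather than documented interpreter behaviour.

```python
import re

# Pattern copied from the syntax rule for user-definable variables.
USER_VARIABLE = re.compile(r"\$[A-Za-z_][0-9A-Za-z_]*")


def is_user_variable(name: str) -> bool:
    """Return True if the whole string matches the variable syntax."""
    return USER_VARIABLE.fullmatch(name) is not None


def same_variable(a: str, b: str) -> bool:
    """Compare two names ignoring case (assumed meaning of 'not case-sensitive')."""
    return a.lower() == b.lower()


for candidate in ["$n", "$firstname", "$div3", "$div_q", "$3", "div"]:
    print(candidate, is_user_variable(candidate))
```

Names such as $3 do not match the pattern. They belong to the predefined special variables rather than to the user-definable ones.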
A number of special variables are predefined:
name             definition
$_version        MtSgmlQL version number
$_authors        Names of the MtSgmlQL authors
$_copyright      MtSgmlQL copyright information
$_lang           The working language (default: fr)
$_wsd            The working charset (default: iso_8859_1)
$_interactive**  TRUE in interactive mode, FALSE otherwise
$_status**       TRUE when the last query returned normally, FALSE otherwise
$_query**        The last query, as a #PCDATA
$_leasy          TRUE if you want dynamic data type control, FALSE otherwise
$_globals**      A list of #PCDATA: the names of all global variables
$_functions**    A list of #PCDATA: the names of all user functions
$_format         Default format for output (NSGML, SGMLS or LINES)
$#**             Number of arguments on the command line (corresponds to argc in C)
$0**             Interpreter name
$1**             Script name (in non-interactive mode)
$2**             First argument on the command line (in non-interactive mode)
$3**             Second argument on the command line (in non-interactive mode)
...              etc.
$***             List of all arguments on the command line (in non-interactive mode)
** These variables are read-only and cannot be modified. The value of the other predefined variables can be changed using the global operator.
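The positional variables listed above can be pictured as a numbering of the command line. The Python sketch below is only an illustration: the function `positional_variables` and the file names are hypothetical, and counting the interpreter name in $# is an assumption drawn from the remark that $# corresponds to argc in C. The $* variable is left out because the text does not say which arguments it contains.

```python
def positional_variables(argv: list[str]) -> dict[str, object]:
    """Map a command line to the $#, $0, $1, ... variables (illustrative only)."""
    # Assumption: like argc in C, the count includes the interpreter name.
    variables: dict[str, object] = {"$#": len(argv)}
    for index, value in enumerate(argv):
        # $0 is the interpreter, $1 the script, $2 the first argument, and so on.
        variables[f"${index}"] = value
    return variables


# Hypothetical invocation: mtsgmlql dates.sq input.sgml
print(positional_variables(["mtsgmlql", "dates.sq", "input.sgml"]))
```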
SgmlQL is a functional language, like Lisp, SQL, etc. All data and program statements are expressions, called queries. A query is either a literal, such as a number, boolean, string, name, etc.; a variable; or an expression built from predefined operators. The arguments of an operator are themselves queries.
Examples:
3;
3
global $n = 3;
$n = 3;
TRUE
global $myfile = "myfile-ces.sgml";
count(top P within file $myfile);
17
A query is evaluated in the following way: