Please note that this is a draft, and may be subject to change.
SAX -- the Simple API for XML -- is a simple, common event-based interface for XML parsers written in object-oriented languages. This document presents a draft specification for the interface, with examples in Java (to be replaced with IDL in a future draft).
The first version of SAX is designed for the finite set of XML applications that require access only to the logical structure of XML documents. These cover a very wide range of applications, including most browsers, formatters, production systems, database tools, search engines, online transaction processors, and meta-data exchange.
There are, however, some important XML applications -- most notably authoring tools and document repositories -- that require access to purely lexical information such as comments and the boundaries of CDATA sections, character references, and internal entity references. SAX has been designed with an open architecture so that, if desired, future versions may add additional types of handlers for this sort of information, but the special needs of these applications are not supported by the current version of the API, since their required information set can be extremely large.
SAX consists of four core interfaces: one for the parser (Parser) and three for user-supplied event handlers (EntityHandler, DocumentHandler, and ErrorHandler).
In addition to the core interfaces, SAX implementations contain a convenience base class for deriving handlers (HandlerBase) and, in languages that support exceptions, an exception (XmlException).
Parser Interface

Every SAX-conformant XML parser (or front-end driver) must implement the following methods:

public void setEntityHandler (EntityHandler handler)
You must register the EntityHandler before the parse begins. If no handler is registered, the parser will perform the default actions specified under the description of EntityHandler.

public void setDocumentHandler (DocumentHandler handler)
You must register the DocumentHandler before the parse begins. If no handler is registered, the parser will perform the default actions specified under the description of DocumentHandler.

public void setErrorHandler (ErrorHandler handler)
You must register the ErrorHandler before the parse begins. If no handler is registered, the parser will perform the default actions specified under the description of ErrorHandler.

public void parse (String publicID, String systemID) throws Exception
The parse method may throw any exception at all, though, except for I/O-related exceptions, the exception will originate in your own handler code rather than in the parser.

In Java, the parser implements an interface named org.xml.sax.Parser; in languages that do not support interfaces, it may extend an abstract base class.
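
The registration order described above can be sketched in Java. This is only an illustration under stated assumptions, not a real parser: the Parser interface is transcribed from the signatures in this draft, the three handler interfaces are reduced to empty placeholders, and StubParser, ParserSketch, the log field, and the example URL are hypothetical names introduced for the sketch. Passing null as the public identifier is also an assumption.

```java
import java.util.ArrayList;
import java.util.List;

// Placeholders: the handler methods are omitted to keep the sketch short.
interface EntityHandler {}
interface DocumentHandler {}
interface ErrorHandler {}

// Transcribed from the method list in this draft.
interface Parser {
    void setEntityHandler(EntityHandler handler);
    void setDocumentHandler(DocumentHandler handler);
    void setErrorHandler(ErrorHandler handler);
    void parse(String publicID, String systemID) throws Exception;
}

// Hypothetical stub: records which handlers were registered when parse() ran.
class StubParser implements Parser {
    private EntityHandler entityHandler;
    private DocumentHandler documentHandler;
    private ErrorHandler errorHandler;
    final List<String> log = new ArrayList<>();

    public void setEntityHandler(EntityHandler handler) { entityHandler = handler; }
    public void setDocumentHandler(DocumentHandler handler) { documentHandler = handler; }
    public void setErrorHandler(ErrorHandler handler) { errorHandler = handler; }

    public void parse(String publicID, String systemID) throws Exception {
        // A real parser would read the document here and fire events.
        log.add("entity handler: " + (entityHandler != null ? "registered" : "default"));
        log.add("document handler: " + (documentHandler != null ? "registered" : "default"));
        log.add("error handler: " + (errorHandler != null ? "registered" : "default"));
        log.add("parsed " + systemID);
    }
}

class ParserSketch {
    static List<String> run() throws Exception {
        StubParser parser = new StubParser();
        // Register handlers before the parse begins; unregistered
        // handlers fall back to the default behaviour.
        parser.setDocumentHandler(new DocumentHandler() {});
        parser.parse(null, "http://www.example.com/doc.xml");
        return parser.log;
    }
}
```

Only the document handler is registered in this sketch, so the stub reports the other two as falling back to their defaults.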
EntityHandler Interface

While SAX concentrates on logical structure, there are two areas where a document's physical structure affects general processing. In the first case, a user might want to substitute a different URI than the default provided in an XML document, possibly by looking up the public identifier in a table. In the second case, a user might want to resolve a relative URI against the URI of the current external entity. The EntityHandler interface provides the following methods:

public String resolveEntity (String entityName, String publicID, String systemID) throws Exception
public void changeEntity (String systemID) throws Exception

EntityHandler: Default Behaviour

If the user does not register an EntityHandler, the parser will behave as if the handlers were implemented as follows:

public String resolveEntity (String entityName, String publicID, String systemID) {
    return systemID;
}
public void changeEntity (String systemID) {}
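
The two cases above (substituting a URI through a public-identifier table, and resolving relative URIs against the current external entity) might look like this in Java. This is a hedged sketch: the interface is transcribed from the signatures in this draft, while CatalogEntityHandler, addMapping, and resolve are hypothetical names, and the sketch assumes the parser calls changeEntity whenever the current external entity changes.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Map;

// Transcribed from the signatures in this draft.
interface EntityHandler {
    String resolveEntity(String entityName, String publicID, String systemID) throws Exception;
    void changeEntity(String systemID) throws Exception;
}

// Hypothetical handler: a public-identifier lookup table plus base-URI tracking.
class CatalogEntityHandler implements EntityHandler {
    private final Map<String, String> catalog = new HashMap<>();
    private URI currentEntity; // URI of the external entity being parsed

    void addMapping(String publicID, String systemID) {
        catalog.put(publicID, systemID);
    }

    public String resolveEntity(String entityName, String publicID, String systemID) {
        // Substitute the mapped URI when the public identifier is in the table;
        // otherwise return systemID, as the default behaviour does.
        String mapped = (publicID == null) ? null : catalog.get(publicID);
        return (mapped != null) ? mapped : systemID;
    }

    public void changeEntity(String systemID) {
        // Assumption: called whenever the current external entity changes.
        currentEntity = URI.create(systemID);
    }

    // Resolve a relative URI against the current external entity.
    String resolve(String relative) {
        return currentEntity.resolve(relative).toString();
    }
}
```

With the current entity set to http://www.example.com/docs/book.xml, resolving images/fig1.gif yields http://www.example.com/docs/images/fig1.gif.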
DocumentHandler Interface

The DocumentHandler interface provides most of the basic functionality of SAX. The parser will inform this interface of basic XML structural events, such as character data and the start and end of elements:

public void startDocument () throws Exception

public void endDocument () throws Exception

public void doctype (String name, String publicID, String systemID) throws Exception

public void startElement (String name, AttributeMap attributes) throws Exception

public void endElement (String name) throws Exception

public void characters (char ch[], int start, int length) throws Exception
The ch argument will provide correct results only during the invocation of the characters method -- if you need to use the characters elsewhere, you must copy them.

public void ignorable (char ch[], int start, int length) throws Exception
The ch argument is an array containing the whitespace characters, the start argument provides the starting offset in the array, and the length argument provides the number of characters to read. Note that the ch argument is volatile, and will provide correct results only during the invocation of the ignorable method -- if you need to use the whitespace characters elsewhere, you must copy them. See also the characters callback.

public void processingInstruction (String target, String remainder) throws Exception

DocumentHandler: Default Behaviour

If the user does not register a DocumentHandler, the parser will behave as if the handlers were implemented as follows:

public void startDocument () {}
public void endDocument () {}
public void doctype (String name, String publicID, String systemID) {}
public void startElement (String name, AttributeMap attributes) {}
public void endElement (String name) {}
public void characters (char ch[], int start, int length) {}
public void ignorable (char ch[], int start, int length) {}
public void processingInstruction (String target, String remainder) {}
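
Because the ch array is only valid during the callback, a handler that wants to keep character data has to copy it. The following Java sketch illustrates that point under stated assumptions: DocumentHandler is transcribed from the signatures in this draft, AttributeMap is abbreviated to a single method, and TextCollector and DocumentHandlerSketch are hypothetical names. The run method stands in for a parser and deliberately reuses one buffer, which is one way a real parser might behave.

```java
// Abbreviated: only one method is declared for this sketch.
interface AttributeMap {
    String getValue(String attributeName);
}

// Transcribed from the signatures in this draft.
interface DocumentHandler {
    void startDocument() throws Exception;
    void endDocument() throws Exception;
    void doctype(String name, String publicID, String systemID) throws Exception;
    void startElement(String name, AttributeMap attributes) throws Exception;
    void endElement(String name) throws Exception;
    void characters(char ch[], int start, int length) throws Exception;
    void ignorable(char ch[], int start, int length) throws Exception;
    void processingInstruction(String target, String remainder) throws Exception;
}

// Hypothetical handler: counts elements and accumulates character data.
class TextCollector implements DocumentHandler {
    int elementCount = 0;
    final StringBuilder text = new StringBuilder();

    public void startDocument() {}
    public void endDocument() {}
    public void doctype(String name, String publicID, String systemID) {}
    public void startElement(String name, AttributeMap attributes) { elementCount++; }
    public void endElement(String name) {}
    public void characters(char ch[], int start, int length) {
        // Copy now: the array is only valid during this call.
        text.append(ch, start, length);
    }
    public void ignorable(char ch[], int start, int length) {}
    public void processingInstruction(String target, String remainder) {}
}

class DocumentHandlerSketch {
    // Fires the events for <p>Hello, <b>world</b></p> by hand.
    static TextCollector run() throws Exception {
        TextCollector handler = new TextCollector();
        char[] buffer = new char[16];
        handler.startDocument();
        handler.startElement("p", null); // null attributes: placeholder for the sketch
        "Hello, ".getChars(0, 7, buffer, 0);
        handler.characters(buffer, 0, 7);
        handler.startElement("b", null);
        "world".getChars(0, 5, buffer, 0); // overwrites the earlier text
        handler.characters(buffer, 0, 5);
        handler.endElement("b");
        handler.endElement("p");
        handler.endDocument();
        return handler;
    }
}
```

Had the handler stored a reference to the array instead of appending its contents, the first chunk of text would have been lost when the buffer was overwritten.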
ErrorHandler Interface

This interface gives you a chance to implement your own error handling routines. Upon encountering a fatal error, the behaviour of parsers (after calling the fatal handler) is unspecified: some may attempt to continue parsing normally, some may report errors, and some may stop parsing altogether.

public void warning (String message, String systemID, int line, int column) throws Exception
public void fatal (String message, String systemID, int line, int column) throws Exception

ErrorHandler: Default Behaviour

If the user does not register an ErrorHandler, the parser will print a warning to the standard error stream for warning. For fatal, the parser will throw an exception of type XmlException in languages that support exceptions, or will invoke some other sort of non-local goto in other languages.
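
As an illustration of a user-supplied error handler, the Java sketch below collects warnings instead of printing them and aborts on the first fatal error. The interface is transcribed from the signatures in this draft; CollectingErrorHandler, the locate helper, and the systemID:line:column message format are assumptions made for the example. Throwing a plain Exception from fatal is this sketch's choice, not something the draft requires.

```java
import java.util.ArrayList;
import java.util.List;

// Transcribed from the signatures in this draft.
interface ErrorHandler {
    void warning(String message, String systemID, int line, int column) throws Exception;
    void fatal(String message, String systemID, int line, int column) throws Exception;
}

// Hypothetical handler: collects warnings and aborts on the first fatal error.
class CollectingErrorHandler implements ErrorHandler {
    final List<String> warnings = new ArrayList<>();

    // Formats a message with its location (format chosen for this sketch).
    private static String locate(String message, String systemID, int line, int column) {
        return systemID + ":" + line + ":" + column + ": " + message;
    }

    public void warning(String message, String systemID, int line, int column) {
        warnings.add(locate(message, systemID, line, column));
    }

    public void fatal(String message, String systemID, int line, int column) throws Exception {
        // Throwing stops the parse from the handler's side; what a parser
        // does after the fatal handler returns normally is unspecified.
        throw new Exception(locate(message, systemID, line, column));
    }
}
```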
AttributeMap Interface

This interface represents a map of attributes for a single element. It allows you to retrieve an attribute's value, to check for special characteristics (whether it is an entity, notation, ID, or IDREF), and to look up related information if the attribute value is an entity or notation name (applies only to documents with DTDs parsed with a DTD-aware parser).

public Enumeration getAttributeNames ()
In languages that lack an Enumeration data type, return an array or list of attribute names.

public String getValue (String attributeName)
public boolean isEntity (String attributeName)
public boolean isNotation (String attributeName)
public boolean isId (String attributeName)
public boolean isIdref (String attributeName)
public String getEntityPublicID (String attributeName)
public String getEntitySystemID (String attributeName)
public String getNotationNameID (String attributeName)
public String getNotationPublicID (String attributeName)
public String getNotationSystemID (String attributeName)
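
A handler would typically walk the attribute names and ask for each value. The Java sketch below shows one way to do that. The interface follows the signatures in this draft, except that a generic parameter is added to Enumeration for modern Java. SimpleAttributeMap, its put method, and AttributeMapSketch are hypothetical. Returning false and null for the entity and notation queries is an assumption suited to a parser that is not DTD-aware.

```java
import java.util.Collections;
import java.util.Enumeration;
import java.util.LinkedHashMap;
import java.util.Map;

// Follows the signatures in this draft (generic parameter added).
interface AttributeMap {
    Enumeration<String> getAttributeNames();
    String getValue(String attributeName);
    boolean isEntity(String attributeName);
    boolean isNotation(String attributeName);
    boolean isId(String attributeName);
    boolean isIdref(String attributeName);
    String getEntityPublicID(String attributeName);
    String getEntitySystemID(String attributeName);
    String getNotationNameID(String attributeName);
    String getNotationPublicID(String attributeName);
    String getNotationSystemID(String attributeName);
}

// Hypothetical map-backed implementation with no DTD information.
class SimpleAttributeMap implements AttributeMap {
    private final Map<String, String> values = new LinkedHashMap<>();

    SimpleAttributeMap put(String name, String value) {
        values.put(name, value);
        return this;
    }

    public Enumeration<String> getAttributeNames() {
        return Collections.enumeration(values.keySet());
    }
    public String getValue(String attributeName) { return values.get(attributeName); }
    public boolean isEntity(String attributeName) { return false; }
    public boolean isNotation(String attributeName) { return false; }
    public boolean isId(String attributeName) { return false; }
    public boolean isIdref(String attributeName) { return false; }
    public String getEntityPublicID(String attributeName) { return null; }
    public String getEntitySystemID(String attributeName) { return null; }
    public String getNotationNameID(String attributeName) { return null; }
    public String getNotationPublicID(String attributeName) { return null; }
    public String getNotationSystemID(String attributeName) { return null; }
}

class AttributeMapSketch {
    // Renders the attributes as name="value" pairs, marking ID attributes.
    static String describe(AttributeMap attributes) {
        StringBuilder out = new StringBuilder();
        Enumeration<String> names = attributes.getAttributeNames();
        while (names.hasMoreElements()) {
            String name = names.nextElement();
            out.append(name).append("=\"").append(attributes.getValue(name)).append("\"");
            if (attributes.isId(name)) out.append(" (ID)");
            if (names.hasMoreElements()) out.append(' ');
        }
        return out.toString();
    }
}
```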
XmlException Class

XmlException is an exception designed specifically for reporting XML errors (in languages that support exceptions). The exception encapsulates all of the information provided to handlers in the ErrorHandler interface:

public XmlException (String message, String systemID, int line, int column)
Creates a new XmlException. For the significance of the arguments, see below.

public String getMessage()
public String getSystemID()
public int getLine()
public int getColumn()
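
A class with this constructor and these accessors could be written as follows. This is a minimal sketch, not the reference implementation: extending java.lang.Exception (and so inheriting getMessage) is an assumption, and XmlExceptionSketch with its report method is a hypothetical helper showing how a caller might use the accessors.

```java
// Minimal sketch of the class; extending Exception is an assumption.
class XmlException extends Exception {
    private final String systemID;
    private final int line;
    private final int column;

    public XmlException(String message, String systemID, int line, int column) {
        super(message); // getMessage() is inherited from Throwable
        this.systemID = systemID;
        this.line = line;
        this.column = column;
    }

    public String getSystemID() { return systemID; }
    public int getLine() { return line; }
    public int getColumn() { return column; }
}

class XmlExceptionSketch {
    // Hypothetical helper: formats the error with its location.
    static String report(XmlException e) {
        return e.getSystemID() + ":" + e.getLine() + ":" + e.getColumn() + ": " + e.getMessage();
    }
}
```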
HandlerBase Class

HandlerBase is a convenience base class that provides default implementations for the EntityHandler, DocumentHandler, and ErrorHandler interfaces, as specified under the Default Behaviour heading for each interface. A user can simply extend this class and override the default behaviour where necessary.
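
The extend-and-override pattern might look like this in Java. This is a sketch under stated assumptions: the three handler interfaces are transcribed from the signatures in this draft, AttributeMap is abbreviated to one method, XmlException is reduced to a bare stand-in, and the warning format is invented. ElementCounter is a hypothetical subclass that overrides a single callback.

```java
interface AttributeMap { String getValue(String attributeName); } // abbreviated

interface EntityHandler {
    String resolveEntity(String entityName, String publicID, String systemID) throws Exception;
    void changeEntity(String systemID) throws Exception;
}

interface DocumentHandler {
    void startDocument() throws Exception;
    void endDocument() throws Exception;
    void doctype(String name, String publicID, String systemID) throws Exception;
    void startElement(String name, AttributeMap attributes) throws Exception;
    void endElement(String name) throws Exception;
    void characters(char ch[], int start, int length) throws Exception;
    void ignorable(char ch[], int start, int length) throws Exception;
    void processingInstruction(String target, String remainder) throws Exception;
}

interface ErrorHandler {
    void warning(String message, String systemID, int line, int column) throws Exception;
    void fatal(String message, String systemID, int line, int column) throws Exception;
}

// Bare stand-in for the draft's XmlException (accessors omitted).
class XmlException extends Exception {
    XmlException(String message, String systemID, int line, int column) { super(message); }
}

// Default implementations, following the Default Behaviour descriptions.
class HandlerBase implements EntityHandler, DocumentHandler, ErrorHandler {
    public String resolveEntity(String entityName, String publicID, String systemID)
            throws Exception { return systemID; }
    public void changeEntity(String systemID) throws Exception {}
    public void startDocument() throws Exception {}
    public void endDocument() throws Exception {}
    public void doctype(String name, String publicID, String systemID) throws Exception {}
    public void startElement(String name, AttributeMap attributes) throws Exception {}
    public void endElement(String name) throws Exception {}
    public void characters(char ch[], int start, int length) throws Exception {}
    public void ignorable(char ch[], int start, int length) throws Exception {}
    public void processingInstruction(String target, String remainder) throws Exception {}
    public void warning(String message, String systemID, int line, int column)
            throws Exception {
        System.err.println("warning: " + message); // output format is an assumption
    }
    public void fatal(String message, String systemID, int line, int column)
            throws Exception {
        throw new XmlException(message, systemID, line, column);
    }
}

// Hypothetical subclass: overrides one callback and inherits the rest.
class ElementCounter extends HandlerBase {
    int count = 0;

    @Override
    public void startElement(String name, AttributeMap attributes) { count++; }
}
```

Because ElementCounter inherits every other method, it can be registered as the entity, document, and error handler at once.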
dmeggins@microstar.com