<?xml version="1.0" encoding="UTF-8"?>
<section version="5.0" xml:id="sax" xmlns="http://docbook.org/ns/docbook"
         xmlns:xlink="http://www.w3.org/1999/xlink"
         xmlns:xi="http://www.w3.org/2001/XInclude"
         xmlns:svg="http://www.w3.org/2000/svg"
         xmlns:m="http://www.w3.org/1998/Math/MathML"
         xmlns:html="http://www.w3.org/1999/xhtml"
         xmlns:db="http://docbook.org/ns/docbook">
  <title>XML APIs, the Simple API for XML (SAX)</title>

  <section xml:id="sda1SaxRecommendedReading">
    <title>Recommended reading</title>

    <itemizedlist>
      <listitem>
        <para><link
        xlink:href="https://www.ibm.com/developerworks/xml/tutorials/x-usax/x-usax.html">Understanding
        SAX</link></para>
      </listitem>

      <listitem>
        <para>Sections <link
        xlink:href="http://tutorials.jenkov.com/java-xml/">1</link> through <link
        xlink:href="http://tutorials.jenkov.com/java-xml/sax-example.html">6</link>
        of the <link xlink:href="http://tutorials.jenkov.com/java-xml/">Java
        &amp; XML Tutorial</link>.</para>
      </listitem>
    </itemizedlist>

    <para>Try to answer the following question: Why do developers sometimes
    derive a handler from <classname
    xlink:href="https://docs.oracle.com/javase/10/docs/api/org/xml/sax/helpers/DefaultHandler.html">DefaultHandler</classname>
    and why do they sometimes prefer implementing <link
    xlink:href="https://docs.oracle.com/javase/10/docs/api/org/xml/sax/ContentHandler.html">ContentHandler</link>?</para>
  </section>

  <section xml:id="saxPrinciple">
    <title>The principle of a <acronym
    xlink:href="http://www.saxproject.org">SAX</acronym> application</title>

    <para>We are already familiar with transformations of XML document
    instances to other formats. Sometimes the capabilities offered by a
    given transformation approach do not suffice for the problem at hand. A
    general purpose programming language like <xref
    linkend="glo_Java"/> offers more powerful means to perform advanced
    manipulations of XML document trees.</para>

    <para>Before diving into technical details we present an example exceeding
    the limits of our present transformation capabilities. We want to format
    an XML catalog document with article descriptions to HTML. The price
    information, however, resides outside the XML document in a database,
    namely an RDBMS:</para>

    <figure xml:id="saxRdbmsAccessPrinciple">
      <title>Generating HTML from an XML document and an RDBMS.</title>

      <mediaobject>
        <imageobject>
          <imagedata fileref="Ref/Fig/saxxmlrdbms.fig" scale="65"/>
        </imageobject>
      </mediaobject>
    </figure>

    <para>Our catalog might look like:</para>

    <figure xml:id="simpleCatalog">
      <title>A <xref linkend="glo_XML"/> based catalog.</title>

      <programlisting language="xml">&lt;catalog&gt;
  &lt;item orderNo="<emphasis role="bold">3218</emphasis>"&gt;Swinging headset&lt;/item&gt;
  &lt;item orderNo="<emphasis role="bold">9921</emphasis>"&gt;200W Stereo Amplifier&lt;/item&gt;
&lt;/catalog&gt;</programlisting>
    </figure>

    <para>The RDBMS may hold some relation with a field <code>orderNo</code>
    as primary key and a corresponding attribute like <code>price</code>. In a
    real world application <code>orderNo</code> should probably be an integer
    typed <code>IDENTITY</code> attribute.</para>

    <figure xml:id="saxRdbmsSchema">
      <title>A Relation containing price information.</title>

      <programlisting language="sql">CREATE TABLE Product (
  orderNo CHAR(10) PRIMARY KEY
 ,price Money
);

INSERT INTO Product VALUES('<emphasis role="bold">3218</emphasis>', 42.57);
INSERT INTO Product VALUES('<emphasis role="bold">9921</emphasis>', 121.50);</programlisting>

      <caption>
        <para>Prices depend on order numbers.</para>
      </caption>
    </figure>

    <para>The intended HTML output with order numbers being highlighted looks
    like:</para>

    <figure xml:id="saxPriceOut">
      <title>HTML generated output.</title>

      <programlisting language="xml">&lt;!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"&gt;
        &lt;html&gt;
          &lt;head&gt;&lt;title&gt;Available products&lt;/title&gt;&lt;/head&gt;
          &lt;body&gt;
            &lt;table border="1"&gt;
              &lt;tbody&gt;
                &lt;tr&gt;
                  &lt;th&gt;<emphasis role="bold">Order number</emphasis>&lt;/th&gt;
                  &lt;th&gt;Price&lt;/th&gt;
                  &lt;th&gt;Product&lt;/th&gt;
                &lt;/tr&gt;
                &lt;tr&gt;
                  &lt;td&gt;<emphasis role="bold">3218</emphasis>&lt;/td&gt;
                  &lt;td&gt;42,57&lt;/td&gt;
                  &lt;td&gt;Swinging headset&lt;/td&gt;
                &lt;/tr&gt;
                &lt;tr&gt;
                  &lt;td&gt;<emphasis role="bold">9921</emphasis>&lt;/td&gt;
                  &lt;td&gt;121,50&lt;/td&gt;
                  &lt;td&gt;200W Stereo Amplifier&lt;/td&gt;
                &lt;/tr&gt;
              &lt;/tbody&gt;
            &lt;/table&gt;
          &lt;/body&gt;
        &lt;/html&gt;</programlisting>

      <caption>
        <para>The resulting HTML document combines content from our XML
        document and from the database table <code>Product</code>.</para>
      </caption>
    </figure>

    <para>The intended transformation is beyond the XSLT standard's processing
    capabilities: XSLT does not enable us to access RDBMS content. However,
    some XSLT processors provide extensions for this task.</para>

    <para>It is tempting to write a <xref linkend="glo_Java"/> application
    which might use e.g. <trademark
    xlink:href="https://en.wikipedia.org/wiki/Java_Database_Connectivity">JDBC</trademark>
    for database access. But how do we actually read and parse an XML file?
    Sticking to the <xref linkend="glo_Java"/> standard we might use a <link
    xlink:href="https://docs.oracle.com/javase/10/docs/api/java/io/InputStream.html">FileInputStream</link>
    instance to read from <code>catalog.xml</code> and write an XML parser
    ourselves. Fortunately <orgname>SUN</orgname>'s <trademark
    xlink:href="http://www.oracle.com/technetwork/java/javase">JDK</trademark>
    already includes an API called <acronym
    xlink:href="http://www.saxproject.org">SAX</acronym>, the
    <emphasis>S</emphasis>imple <emphasis>A</emphasis>PI for
    <emphasis>X</emphasis>ML. The <productname
    xlink:href="http://www.oracle.com/technetwork/java/javase/jdk-7-readme-429198.html">JDK</productname>
    also includes a corresponding parser implementation. In addition there are
    third party <acronym xlink:href="http://www.saxproject.org">SAX</acronym>
    parser implementations available like <productname
    xlink:href="https://xerces.apache.org">Xerces</productname> from the
    <orgname xlink:href="https://www.apache.org">Apache Software
    Foundation</orgname>.</para>

    <para>The <acronym xlink:href="http://www.saxproject.org">SAX</acronym>
    API is event based and will be illustrated by the relationship between
    customers and a software vendor company:</para>

    <mediaobject>
      <imageobject>
        <imagedata fileref="Ref/Fig/updateinfo.fig"/>
      </imageobject>
    </mediaobject>

    <para>After purchasing software, customers are asked to register it. This
    way the vendor receives each customer's address. Each time a new release
    is completed all registered customers receive a notification, typically
    including a <quote>special offer</quote> to upgrade their software. From
    an abstract point of view the following two actions take place:</para>

    <variablelist>
      <varlistentry>
        <term>Registration</term>

        <listitem>
          <para>The customer registers at the company's site, indicating
          interest in updated versions.</para>
        </listitem>
      </varlistentry>

      <varlistentry>
        <term>Notification</term>

        <listitem>
          <para>Upon completion of each new software release (considered to be
          an <emphasis>event</emphasis>) a message is sent to all registered
          customers.</para>
        </listitem>
      </varlistentry>
    </variablelist>

    <para>The same principle applies to GUI applications in software
    development. A key press <emphasis>event</emphasis> for example will be
    forwarded by an application's <emphasis>event handler</emphasis> to a
    callback function (sometimes called a <emphasis>handler</emphasis> method)
    implemented by an application developer. The <acronym
    xlink:href="http://www.saxproject.org">SAX</acronym> API works the same
    way: a parser reads an XML document and generates events which
    <emphasis>may</emphasis> be handled by an application. During document
    parsing the XML tree structure gets <quote>flattened</quote> to a sequence
    of events:</para>

    <figure xml:id="saxFlattenEvent">
      <title>Parsing an XML document creates a corresponding sequence of
      events.</title>

      <mediaobject>
        <imageobject>
          <imagedata fileref="Ref/Fig/saxmodel.pdf"/>
        </imageobject>
      </mediaobject>
    </figure>

    <para>An application may register components with the parser:</para>

    <figure xml:id="figureSax">
      <title><acronym xlink:href="http://www.saxproject.org">SAX</acronym>
      Principle</title>

      <mediaobject>
        <imageobject>
          <imagedata fileref="Ref/Fig/saxapparch.pdf"/>
        </imageobject>

        <caption>
          <para>A <acronym
          xlink:href="http://www.saxproject.org">SAX</acronym> application
          consists of a <acronym
          xlink:href="http://www.saxproject.org">SAX</acronym> parser and an
          implementation of event handlers being specific to the application.
          The application is developed by implementing the two
          handlers.</para>
        </caption>
      </mediaobject>
    </figure>

    <para>An Error Handler is required since the XML stream may contain
    errors. In order to implement a <acronym
    xlink:href="http://www.saxproject.org">SAX</acronym> application we have
    to:</para>

    <orderedlist>
      <listitem>
        <para>Instantiate required objects:</para>

        <itemizedlist>
          <listitem>
            <para>Parser</para>
          </listitem>

          <listitem>
            <para>Event Handler</para>
          </listitem>

          <listitem>
            <para>Error Handler</para>
          </listitem>
        </itemizedlist>
      </listitem>

      <listitem>
        <para>Register handler instances</para>

        <itemizedlist>
          <listitem>
            <para>register Event Handler to Parser</para>
          </listitem>

          <listitem>
            <para>register Error Handler to Parser</para>
          </listitem>
        </itemizedlist>
      </listitem>

      <listitem>
        <para>Start the parsing process by calling the parser's appropriate
        method.</para>
      </listitem>
    </orderedlist>
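
    <para>The three steps above can be sketched as follows. This is a minimal
    sketch rather than a reference implementation: the class name
    <classname>SaxStepsSketch</classname> and the method
    <function>elementNames()</function> are hypothetical and chosen for
    illustration only, and the XML input is assumed to be supplied as an
    in-memory string rather than a file.</para>

    <programlisting language="java"><![CDATA[
```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical class: illustrates instantiate / register / parse.
class SaxStepsSketch {

  /** Returns the names of all opened elements, separated by blanks. */
  static String elementNames(String xml) throws Exception {
    // Step 1: instantiate parser, event handler and error handler.
    XMLReader parser = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
    StringBuilder names = new StringBuilder();
    DefaultHandler eventHandler = new DefaultHandler() {
      @Override
      public void startElement(String uri, String localName, String qName, Attributes attrs) {
        names.append(qName).append(' ');
      }
    };
    ErrorHandler errorHandler = new ErrorHandler() {
      @Override public void warning(SAXParseException e) {
        System.err.println("[Warning] " + e.getMessage());
      }
      @Override public void error(SAXParseException e) {
        System.err.println("[Error] " + e.getMessage());
      }
      @Override public void fatalError(SAXParseException e) throws SAXException {
        throw e; // a document that is not well-formed aborts parsing
      }
    };

    // Step 2: register both handlers with the parser.
    parser.setContentHandler(eventHandler);
    parser.setErrorHandler(errorHandler);

    // Step 3: start the parsing process.
    parser.parse(new InputSource(new StringReader(xml)));
    return names.toString().trim();
  }

  public static void main(String[] args) throws Exception {
    System.out.println(
        elementNames("<catalog><item orderNo=\"3218\">Swinging headset</item></catalog>"));
  }
}
```
]]></programlisting>

    <para>The event handler only reacts to element start events. All other
    events are silently ignored by the empty default implementations inherited
    from <classname>org.xml.sax.helpers.DefaultHandler</classname>.</para>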
  </section>

  <section xml:id="saxIntroExample">
    <title>First steps</title>

    <para>Our first <acronym
    xlink:href="http://www.saxproject.org">SAX</acronym> toy application
    <classname>sax.stat.v1.ElementCount</classname> shall simply count the
    number of elements it finds in an arbitrary XML document. In addition the
    <acronym xlink:href="http://www.saxproject.org">SAX</acronym> events shall
    be written to standard output generating output sketched in <xref
    linkend="saxFlattenEvent"/>. The application's central implementation
    reads:</para>

    <figure xml:id="saxElementCount">
      <title>Counting XML elements.</title>

      <programlisting language="java">package sax.stat.v1;
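
import java.io.IOException;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;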

public class ElementCount {

  public void parse(final String uri) {
    try {
      final SAXParserFactory saxPf = SAXParserFactory.newInstance();
      final SAXParser saxParser = saxPf.newSAXParser();
      saxParser.parse(uri, eventHandler);
    } catch (ParserConfigurationException e){
      e.printStackTrace(System.err);
    } catch (org.xml.sax.SAXException e) {
      e.printStackTrace(System.err);
    } catch (IOException e){
      e.printStackTrace(System.err);
    }
  }

  public int getElementCount() {
    return eventHandler.getElementCount();
  }
  private final MyEventHandler eventHandler = new MyEventHandler();
}</programlisting>

      <caption>
        <para>This application works for arbitrary well-formed XML
        documents.</para>
      </caption>
    </figure>

    <para>We now explain this application in detail. The first part deals with
    the instantiation of a parser:</para>

    <programlisting language="java">try {
   final SAXParserFactory saxPf = <emphasis role="bold">SAXParserFactory</emphasis>.newInstance();
   final SAXParser saxParser = saxPf.newSAXParser();
   saxParser.parse(uri, eventHandler);
} catch (ParserConfigurationException e){
   e.printStackTrace(System.err);
} ...</programlisting>

    <para>In order to keep an application independent of a specific parser
    implementation the <acronym
    xlink:href="http://www.saxproject.org">SAX</acronym> parser API uses the
    so-called <link
    xlink:href="http://www.dofactory.com/Patterns/PatternAbstract.aspx">Abstract
    Factory Pattern</link> instead of simply calling a constructor of a
    vendor specific parser class.</para>
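
    <para>The effect of the factory can be observed by asking the returned
    parser object for its concrete class. The following is a small sketch
    only; the class name <classname>FactorySketch</classname> and the method
    <function>parserImplementation()</function> are hypothetical. Which class
    name gets reported depends on the parser implementation found at runtime
    and is deliberately not assumed here.</para>

    <programlisting language="java"><![CDATA[
```java
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;

// Hypothetical class: reveals which concrete parser class the factory chose.
class FactorySketch {

  /** Returns the fully qualified name of the concrete parser class. */
  static String parserImplementation() throws Exception {
    // The application only refers to the abstract classes
    // SAXParserFactory and SAXParser ...
    SAXParserFactory factory = SAXParserFactory.newInstance();
    SAXParser parser = factory.newSAXParser();
    // ... while the factory decides which vendor class gets instantiated.
    return parser.getClass().getName();
  }

  public static void main(String[] args) throws Exception {
    System.out.println("Parser implementation: " + parserImplementation());
  }
}
```
]]></programlisting>

    <para>Since the application code never mentions the reported class name,
    replacing the parser implementation does not require any source code
    change.</para>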

    <para>In order to be useful the parser has to be instructed to do
    something meaningful when an XML document gets parsed. For this purpose
    our application supplies an event handler instance:</para>

    <programlisting language="java">public void parse(final String uri) {
  try {
    final SAXParserFactory saxPf = SAXParserFactory.newInstance();
    final SAXParser saxParser = saxPf.newSAXParser();
    saxParser.parse(uri, <emphasis role="bold">eventHandler</emphasis>);
  } catch (org.xml.sax.SAXException e) {
 ...
  private final MyEventHandler <emphasis role="bold">eventHandler = new MyEventHandler()</emphasis>;
}</programlisting>

    <para>What does the event handler actually do? It provides methods which
    the parser calls during the parsing process:</para>

    <programlisting language="java">package sax.stat.v1;

import org.xml.sax.Attributes;

public class MyEventHandler extends <classname>org.xml.sax.helpers.DefaultHandler</classname> {

  public void <emphasis role="bold">startDocument()</emphasis><co
        xml:id="programlisting_eventhandler_startDocument"/> {
    System.out.println("Opening Document");
  }
  public void <emphasis role="bold">endDocument()</emphasis><co
        xml:id="programlisting_eventhandler_endDocument"/> {
    System.out.println("Closing Document");
  }
  public void <emphasis role="bold">startElement(String namespaceUri, String localName, String rawName,
                     Attributes attrs)</emphasis> <co
        xml:id="programlisting_eventhandler_startElement"/>{
    System.out.println("Opening \"" + rawName + "\"");
    elementCount++;
  }
  public void <emphasis role="bold">endElement(String namespaceUri, String localName,
    String rawName)</emphasis><co
        xml:id="programlisting_eventhandler_endElement"/>{
    System.out.println("Closing \"" + rawName + "\"");
  }
  public void <emphasis role="bold">characters(char[] ch, int start, int length)</emphasis><co
        xml:id="programlisting_eventhandler_characters"/>{
    System.out.println("Content \"" + new String(ch, start, length) + '"');
  }
  public int getElementCount() <co
        xml:id="programlisting_eventhandler_getElementCount"/>{
    return elementCount;
  }
  private int elementCount = 0;
}</programlisting>

    <calloutlist>
      <callout arearefs="programlisting_eventhandler_startDocument">
        <para>This method gets called exactly once, namely when the parser
        starts processing the XML document as a whole.</para>
      </callout>

      <callout arearefs="programlisting_eventhandler_endDocument">
        <para>After successfully parsing the whole document instance this
        method will finally be called.</para>
      </callout>

      <callout arearefs="programlisting_eventhandler_startElement">
        <para>This method gets called each time a new element is opened. In
        the given <code>catalog.xml</code> example it will be called three
        times: first when <tag class="starttag">catalog</tag> appears and then
        once for each of the two <tag class="starttag">item ...</tag>
        elements. The values of the supplied parameters depend on whether or
        not namespace processing is enabled.</para>
      </callout>

      <callout arearefs="programlisting_eventhandler_endElement">
        <para>Called each time an element like <tag class="starttag">item
        ...</tag> gets closed by its counterpart <tag
        class="endtag">item</tag>.</para>
      </callout>

      <callout arearefs="programlisting_eventhandler_characters">
        <para>This method is responsible for the treatment of textual content
        i.e. handling <code>#PCDATA</code> element content. We will explain
        its uncommon signature a little bit later.</para>
      </callout>

      <callout arearefs="programlisting_eventhandler_getElementCount">
        <para><function>getElementCount()</function> is a getter method
        providing read-only access to the private field
        <varname>elementCount</varname> which gets incremented in <coref
        linkend="programlisting_eventhandler_startElement"/> each time an XML
        element opens.</para>
      </callout>
    </calloutlist>

    <para>The call <code>saxParser.parse(uri, eventHandler)</code> actually
    initiates the parsing process and tells the parser to:</para>

    <itemizedlist>
      <listitem>
        <para>Open the XML document being referenced by the URI
        argument.</para>
      </listitem>

      <listitem>
        <para>Forward XML events to the event handler instance supplied by the
        second argument.</para>
      </listitem>
    </itemizedlist>

    <para>A driver class containing a <code>main(...)</code> method may start
    the whole process and print out the desired number of elements upon
    completion of a parsing run:</para>

    <programlisting language="java">package sax.stat.v1;

public class ElementCountDriver {
  public static void main(String argv[]) {
    ElementCount xmlStats = new ElementCount();
    xmlStats.parse("<emphasis role="bold">Input/Sax/catalog.xml</emphasis>");
    System.out.println("Document contains " + xmlStats.<emphasis role="bold">getElementCount()</emphasis> + " elements");
  }
}</programlisting>

    <para>Processing the catalog example instance yields:</para>

    <screen>Opening Document
<emphasis role="bold">Opening "catalog"</emphasis> <co
        xml:id="programlisting_catalog_output"/>
Content "
  "
<emphasis role="bold">Opening "item"</emphasis> <co
        xml:id="programlisting_catalog_item1"/>
Content "Swinging headset"
Closing "item"
Content "
  "
<emphasis role="bold">Opening "item"</emphasis>  <co
        xml:id="programlisting_catalog_item2"/>
Content "200W Stereo Amplifier"
Closing "item"
Content "
"
Closing "catalog"
Closing Document
<emphasis role="bold">Document contains 3 elements</emphasis> <co
        xml:id="programlisting_catalog_elementcount"/></screen>

    <calloutlist>
      <callout arearefs="programlisting_catalog_output">
        <para>Start parsing element <tag
        class="starttag">catalog</tag>.</para>
      </callout>

      <callout arch="" arearefs="programlisting_catalog_item1">
        <para>Start parsing element <tag class="starttag">item
        orderNo="3218"</tag>Swinging headset<tag class="endtag"
        role="">item</tag>.</para>
      </callout>

      <callout arch="" arearefs="programlisting_catalog_item2">
        <para>Start parsing element <tag class="starttag">item
        orderNo="9921"</tag>200W Stereo Amplifier<tag class="endtag"
        role="">item</tag>.</para>
      </callout>

      <callout arearefs="programlisting_catalog_elementcount">
        <para>After the parsing process has completed the application outputs
        the number of elements counted.</para>
      </callout>
    </calloutlist>

    <para>The output contains some lines of <quote>empty</quote> content. This
    content is due to whitespace located between elements. For example a
    newline appears between the <tag class="starttag">catalog</tag> and
    the first <tag class="starttag">item</tag> element. The parser
    passes this whitespace to the <methodname
    xlink:href="https://docs.oracle.com/javase/10/docs/api/org/xml/sax/ContentHandler.html#characters(char%5B%5D,int,int)">characters()</methodname>
    method. An application will typically ignore such calls. Machine
    generated XML document instances frequently contain no newline characters
    at all; the whole document is then represented as a single line. This
    impairs human readability, which is not required as long as the
    processing applications work well. In this case empty content as above
    will not appear.</para>
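
    <para>Ignoring whitespace-only content may be sketched as follows. The
    class names <classname>WhitespaceFilterSketch</classname> and
    <classname>TextCollector</classname> are hypothetical, and the sketch
    assumes that content consisting solely of whitespace carries no meaning
    for the application, which holds for our catalog but not for every
    document type.</para>

    <programlisting language="java"><![CDATA[
```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical class: drops characters() calls containing only whitespace.
class WhitespaceFilterSketch {

  /** Collects every piece of text that is not purely whitespace. */
  static class TextCollector extends DefaultHandler {
    final List<String> texts = new ArrayList<>();

    @Override
    public void characters(char[] ch, int start, int length) {
      String content = new String(ch, start, length);
      if (!content.trim().isEmpty()) { // skip newlines and indentation
        texts.add(content);
      }
    }
  }

  static List<String> nonBlankContent(String xml) throws Exception {
    TextCollector handler = new TextCollector();
    SAXParserFactory.newInstance().newSAXParser()
        .parse(new InputSource(new StringReader(xml)), handler);
    return handler.texts;
  }

  public static void main(String[] args) throws Exception {
    String catalog = "<catalog>\n"
        + "  <item orderNo=\"3218\">Swinging headset</item>\n"
        + "  <item orderNo=\"9921\">200W Stereo Amplifier</item>\n"
        + "</catalog>";
    for (String text : nonBlankContent(catalog)) {
      System.out.println("Content \"" + text + "\"");
    }
  }
}
```
]]></programlisting>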

    <para>The <code>characters(char[] ch, int start, int length)</code>
    method's signature looks somewhat strange regarding <xref
    linkend="glo_Java"/> conventions. One might expect <code>characters(String
    s)</code>. But this way the <acronym
    xlink:href="http://www.saxproject.org">SAX</acronym> API allows efficient
    parser implementations: A parser may initially allocate a reasonably large
    <code>char</code> array of say 128 bytes sufficient to hold 64 (<link
    xlink:href="http://unicode.org">Unicode</link>) characters. If this buffer
    gets exhausted the parser might allocate a second buffer of double size
    thus implementing an <quote>amortized doubling</quote> algorithm:</para>

    <mediaobject>
      <imageobject>
        <imagedata fileref="Ref/Fig/saxcharacter.pdf"/>
      </imageobject>
    </mediaobject>

    <para>In this example the first element content fits in the first buffer.
    The second content <code>200W Stereo Amplifier</code> and the third
    content <code>Earphone</code> both fit in the second buffer. Subsequent
    content may require further buffer allocations. Such a strategy minimizes
    the number of time consuming <code>new </code> <classname
    xlink:href="https://docs.oracle.com/javase/10/docs/api/java/lang/String.html">String</classname>
    <code>(...)</code> constructor calls being necessary for the more
    convenient API variant <code>characters(String s)</code>.</para>
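
    <para>A consequence of this signature is that a handler must only read
    the range given by <code>start</code> and <code>length</code>, and that a
    parser is free to deliver one piece of element content in several
    consecutive <code>characters()</code> calls. A handler interested in the
    complete text of an element may therefore accumulate the chunks. The
    following is a minimal sketch under the assumption that
    <tag class="starttag">item</tag> elements contain text only; the names
    <classname>ChunkedTextSketch</classname> and
    <classname>ItemTextHandler</classname> are hypothetical.</para>

    <programlisting language="java"><![CDATA[
```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical class: assembles element text from one or more chunks.
class ChunkedTextSketch {

  /** Collects the complete text of each item element. */
  static class ItemTextHandler extends DefaultHandler {
    final List<String> items = new ArrayList<>();
    private final StringBuilder text = new StringBuilder();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attrs) {
      text.setLength(0); // forget text collected before this start tag
    }

    @Override
    public void characters(char[] ch, int start, int length) {
      // Copy only the valid range: the array belongs to the parser.
      text.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
      if ("item".equals(qName)) {
        items.add(text.toString()); // one String per element, not per chunk
      }
    }
  }

  static List<String> itemTexts(String xml) throws Exception {
    ItemTextHandler handler = new ItemTextHandler();
    SAXParserFactory.newInstance().newSAXParser()
        .parse(new InputSource(new StringReader(xml)), handler);
    return handler.items;
  }

  public static void main(String[] args) throws Exception {
    System.out.println(itemTexts("<catalog><item>AT&amp;T headset</item></catalog>"));
  }
}
```
]]></programlisting>

    <para>Whether the parser actually splits a given text, for example at an
    entity reference or at a buffer boundary, is up to the implementation.
    The accumulated result is the same in either case.</para>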
  </section>

  <section xml:id="saxRegistry">
    <title>Event- and error handler registration</title>

    <para>Our first <acronym
    xlink:href="http://www.saxproject.org">SAX</acronym> application suffers
    from the following deficiencies:</para>

    <itemizedlist>
      <listitem>
        <para>The error handling is very sparse. It relies entirely on
        exceptions such as <classname
        xlink:href="https://docs.oracle.com/javase/10/docs/api/org/xml/sax/SAXException.html">SAXException</classname>
        which frequently do not supply meaningful error information.</para>
      </listitem>

      <listitem>
        <para>The application is not aware of namespaces. Thus when reading
        e.g. <abbrev xlink:href="https://www.w3.org/Style/XSL">XSL</abbrev>
        document instances it cannot distinguish between elements from
        different namespaces such as XSL and HTML.</para>
      </listitem>

      <listitem>
        <para>The parser does not validate a document instance against a
        schema, even if one is present.</para>
      </listitem>
    </itemizedlist>

    <para>We now incrementally add these features to the <acronym
    xlink:href="http://www.saxproject.org">SAX</acronym> parsing process.
    <acronym xlink:href="http://www.saxproject.org">SAX</acronym> offers an
    interface <link
    xlink:href="https://docs.oracle.com/javase/10/docs/api/org/xml/sax/XMLReader.html">XMLReader</link>
    to conveniently <emphasis>register</emphasis> event- and error handler
    instances independently instead of passing both interfaces as a single
    argument to the <link
    xlink:href="https://docs.oracle.com/javase/10/docs/api/javax/xml/parsers/SAXParser.html#parse(java.io.File,org.xml.sax.helpers.DefaultHandler)">parse()</link>
    method. We first code an error handler class by implementing the interface
    <classname>org.xml.sax.ErrorHandler</classname> being part of the <acronym
    xlink:href="http://www.saxproject.org">SAX</acronym> API:</para>

    <programlisting language="java">package sax.stat.v2;
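
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
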
public class MyErrorHandler implements ErrorHandler {

  <emphasis role="bold">public void warning(SAXParseException e)</emphasis> {
    System.err.println("[Warning]" + getLocationString(e));
  }
  <emphasis role="bold">public void error(SAXParseException e)</emphasis> {
    System.err.println("[Error]" + getLocationString(e));
  }
  <emphasis role="bold">public void fatalError(SAXParseException e)</emphasis> throws SAXException{
    System.err.println("[Fatal Error]" + getLocationString(e));
  }
  private String getLocationString(SAXParseException e) {
    return " line " + e.getLineNumber() +
    ", column " + e.getColumnNumber()+ ":" +  e.getMessage();
  }
}</programlisting>

    <para>The three public methods implement the
    <classname>org.xml.sax.ErrorHandler</classname> interface. The method
    <function>getLocationString</function> is used to supply precise parsing
    error locations by means of line- and column numbers within a document
    instance. If errors or warnings are encountered the parser will call one
    of the appropriate public methods:</para>

    <figure xml:id="saxMissItem">
      <title>A document that is not well-formed.</title>

      <programlisting language="xml">&lt;?xml version="1.0" encoding="UTF-8"?&gt;
&lt;catalog&gt;
  &lt;item orderNo="3218"&gt;Swinging headset&lt;/item&gt;
  &lt;item orderNo="9921"&gt;200W Stereo Amplifier
&lt;/catalog&gt;</programlisting>

      <caption>
        <para>This document is not well-formed because the second item
        element's closing <tag class="endtag">item</tag> tag is missing.</para>
      </caption>
    </figure>

    <para>Our error handler method gets called yielding an informative
    message:</para>

    <screen>[Fatal Error] line 5, column -1:Expected "&lt;/item&gt;" to terminate
element starting on line 4.</screen>

    <para>This error output is achieved by <emphasis>registering</emphasis> an
    instance of <classname>sax.stat.v2.MyErrorHandler</classname> to the
    parser prior to starting the parsing process. In the following code
    snippet we also register a content handler instance to the parser and thus
    separate the parser's configuration from its invocation:</para>

    <programlisting language="java">package sax.stat.v2;
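
import java.io.IOException;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
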
public class ElementCount {
  public ElementCount()
   throws SAXException, ParserConfigurationException{
      final SAXParserFactory saxPf = SAXParserFactory.newInstance();
      final SAXParser saxParser = saxPf.newSAXParser();
      xmlReader = saxParser.getXMLReader();
      xmlReader.setContentHandler(eventHandler); <co
        xml:id="programlisting_assemble_parser_setcontenthandler"/>
      xmlReader.setErrorHandler(errorHandler); <co
        xml:id="programlisting_assemble_parser_seterrorhandler"/>
  }
  public void parse(final String uri)
    throws IOException, SAXException{
    xmlReader.parse(uri); <co
        xml:id="programlisting_assemble_parser_invokeparse"/>
  }
  public int getElementCount() {
    return eventHandler.getElementCount(); <co
        xml:id="programlisting_assemble_parser_getelementcount"/>
  }
  private final XMLReader xmlReader;
  private final MyEventHandler eventHandler = new MyEventHandler(); <co
        xml:id="programlisting_assemble_parser_createeventhandler"/>
  private final MyErrorHandler errorHandler = new MyErrorHandler(); <co
        xml:id="programlisting_assemble_parser_createerrorhandler"/>
}</programlisting>

    <calloutlist>
      <callout arearefs="programlisting_assemble_parser_setcontenthandler programlisting_assemble_parser_seterrorhandler">
        <para>Referring to <xref linkend="figureSax" os=""/> these two calls
        attach the event- and error handler objects to the parser thus
        implementing the two arrows from the parser to the application's
        implementation.</para>
      </callout>

      <callout arearefs="programlisting_assemble_parser_invokeparse">
        <para>The parser is invoked. Note that in this example we only pass a
        document's URI but no reference to a handler object.</para>
      </callout>

      <callout arearefs="programlisting_assemble_parser_getelementcount">
        <para>The method <function>getElementCount()</function> is needed to
        allow a calling object to access the private
        <varname>eventHandler</varname> object's
        <function>getElementCount()</function> method.</para>
      </callout>

      <callout arearefs="programlisting_assemble_parser_createeventhandler programlisting_assemble_parser_createerrorhandler">
        <para>An event handling and an error handling object are created to
        handle events during the parsing process.</para>
      </callout>
    </calloutlist>

    <para>The careful reader might notice a subtle difference between the
    content- and the error handler implementation: The class
    <classname>sax.stat.v2.MyErrorHandler</classname> implements the interface
    <classname>org.xml.sax.ErrorHandler</classname>. But
    <classname>sax.stat.v2.MyEventHandler</classname> is derived from
    <classname>org.xml.sax.helpers.DefaultHandler</classname> which itself
    implements the <classname>org.xml.sax.ContentHandler</classname>
    interface. One might as well implement the latter interface directly,
    but this requires implementing all 11 of its methods. In most
    circumstances this only complicates the application's code since it is
    unnecessary to react to events such as processing instructions. For this
    reason it is good coding practice to use the empty default implementations
    in <classname>org.xml.sax.helpers.DefaultHandler</classname> and to
    redefine only those methods corresponding to events actually being handled
    by the application in question.</para>

    <qandaset defaultlabel="qanda" xml:id="sda1SaxReadAttributes">
      <title>SAX and attribute values</title>

      <qandadiv>
        <qandaentry>
          <question>
            <label>Reading an element's set of attributes.</label>

            <para>The example document instance does include <tag
            class="attribute">orderNo</tag> attribute values for each <tag
            class="starttag">item</tag> element. The parser does not yet show
            these attribute keys and their corresponding values. Read the
            documentation for <classname
            xlink:href="https://docs.oracle.com/javase/10/docs/api/org/xml/sax/Attributes.html">org.xml.sax.Attributes</classname>
            and extend the given code to use it.</para>

            <para>You should start from the <xref linkend="glo_MIB"/> Maven
            archetype <code>mi-maven-archetype-sax</code>. Configuration hints
            are available at <xref linkend="sd1_sect_idea"/>.</para>
          </question>

          <answer>
            <para>For the given example it would suffice to read the known
            <tag class="attribute">orderNo</tag> attribute's value. A generic
            solution may iterate over all attributes of an element and show
            their values:</para>

            <programlisting language="java">package sax;

import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

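import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

/** Prints each element's name followed by all of its attributes. */
public class AttribEventHandler extends DefaultHandler {

  @Override
  public void startElement(String namespaceUri, String localName,
      String rawName, Attributes attrs) {
    System.out.println("Opening Element " + rawName);
    // Attributes are accessed by index, their order is not guaranteed.
    for (int i = 0; i &lt; attrs.getLength(); i++) {
      System.out.println(attrs.getQName(i) + "=" + attrs.getValue(i) + "\n");
    }
  }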
}</programlisting>
          </answer>
        </qandaentry>
      </qandadiv>
    </qandaset>

    <section xml:id="sda1SecElementLists">
      <title>The set of element names</title>

      <qandaset defaultlabel="qanda" xml:id="sda1QandaElementNames">
        <title>Element lists of arbitrary XML documents.</title>

        <qandadiv>
          <qandaentry>
            <question>
              <para>We reconsider the simple application reading arbitrary XML
              documents and providing a list of XML Elements being contained
              within:</para>

              <screen>Opening Document
<emphasis role="bold">Opening "catalog"</emphasis>
Content "
  "
<emphasis role="bold">Opening "item"</emphasis>
Content "Swinging headset"
Closing "item"
Content " ...</screen>

              <para>If an element like e.g. <tag class="starttag">item</tag>
              appears multiple times it will also be written to standard
              output multiple times.</para>

              <para>We are now interested in the list of all element names
              present in an arbitrary XML document. Consider the following
              example:</para>

              <programlisting language="xml">&lt;memo&gt;
  &lt;from&gt;
    &lt;name&gt;Martin&lt;/name&gt;
    &lt;surname&gt;Goik&lt;/surname&gt;
  &lt;/from&gt;
  &lt;to&gt;
    &lt;name&gt;Adam&lt;/name&gt;
    &lt;surname&gt;Hacker&lt;/surname&gt;
  &lt;/to&gt;
  &lt;to&gt;
    &lt;name&gt;Eve&lt;/name&gt;
    &lt;surname&gt;Intruder&lt;/surname&gt;
  &lt;/to&gt;
  &lt;date year="2005" month="1" day="6"/&gt;
  &lt;subject&gt;Firewall problems&lt;/subject&gt;
  &lt;content&gt;
    &lt;para&gt;Thanks for your excellent work.&lt;/para&gt;
    &lt;para&gt;Our firewall is definitely broken!&lt;/para&gt;
  &lt;/content&gt;
&lt;/memo&gt;</programlisting>

              <para>The elements <tag class="starttag">to</tag>, <tag
              class="starttag">name</tag>, <tag class="starttag">surname</tag>
              and <tag class="starttag">para</tag> all appear multiple times.
              Write a SAX application which processes arbitrary XML documents
              and creates an alphabetically sorted list of elements being
              contained <emphasis role="bold">excluding duplicates</emphasis>.
              The intended output for the above example is:</para>

              <screen>List of elements: {content date from memo name para subject surname to }</screen>

              <para>The corresponding handler should be implemented in a
              re-usable way. Thus if different XML documents are being handled
              in succession the list of elements should be erased prior to
              processing the current document. Hints:</para>

              <itemizedlist>
                <listitem>
                  <para>Use a <classname>java.util.SortedSet</classname>
                  instance to collect element names thereby excluding
                  duplicates.</para>
                </listitem>

                <listitem>
                  <para>The method
                  <methodname>sax.count.ListTagNamesHandler.startDocument()</methodname>
                  may be used to initialize your handler.</para>
                </listitem>
              </itemizedlist>
            </question>

            <answer>
              <para>A suitable handler reads:</para>

              <programlisting language="java">package sax.count;

import java.util.SortedSet;
import java.util.TreeSet;

import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

/** Collecting the names of all elements of a document excluding duplicates */
public class ListTagNamesHandler extends DefaultHandler {

  // A SortedSet by definition does not contain any duplicates.
  private SortedSet&lt;String&gt; elementNames = new TreeSet&lt;&gt;();

  @Override
  public void startDocument() throws SAXException {
    elementNames.clear(); // May contain elements from a previous run.
  }

  @Override
  public void startElement(String namespaceUri, String localName,
      String rawName, Attributes attrs) {
    // In case the current element name has already been inserted
    // this method call will be silently ignored.
    elementNames.add(rawName);
  }

  /**
   * @return A sorted list of element names of the currently processed XML
   *         document without duplicates.
   */
  public String[] getTagNames() {
    return elementNames.toArray(new String[0]);
  }
}</programlisting>

              <para>A complete application requires a driver:</para>

              <programlisting language="java">package sax.count;

import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.XMLReader;

import sax.stat.v2.MyErrorHandler;

public class Driver {

  public static void main(String argv[]) throws Exception {

    final SAXParserFactory saxPf = SAXParserFactory.newInstance();
    final SAXParser saxParser = saxPf.newSAXParser();
    final XMLReader xmlReader = saxParser.getXMLReader();
    final ListTagNamesHandler handler = new ListTagNamesHandler();
    xmlReader.setContentHandler(handler);
    xmlReader.setErrorHandler(new MyErrorHandler());
    xmlReader.parse("Input/Xml/Memo/message.xml");

    System.out.print("List of elements: {");
    for (String elementName : handler.getTagNames()) {
      System.out.print(elementName + " ");
    }
    System.out.println("}");
  }
}</programlisting>
            </answer>
          </qandaentry>
        </qandadiv>
      </qandaset>
    </section>

    <section xml:id="sda1SaxView">
      <title>A limited view on a given XML document instance</title>

      <qandaset defaultlabel="qanda" xml:id="sda1QandamemoView">
        <title>A specific view on memo documents</title>

        <qandadiv>
          <qandaentry>
            <question>
              <para>We reconsider the following memo instance:</para>

              <programlisting language="xml">&lt;memo&gt;
  &lt;from&gt;
    &lt;name&gt;Martin&lt;/name&gt;
    &lt;surname&gt;Goik&lt;/surname&gt;
  &lt;/from&gt;
  &lt;to&gt;
    &lt;name&gt;Adam&lt;/name&gt;
    &lt;surname&gt;Hacker&lt;/surname&gt;
  &lt;/to&gt;
  &lt;to&gt;
    &lt;name&gt;Eve&lt;/name&gt;
    &lt;surname&gt;Intruder&lt;/surname&gt;
  &lt;/to&gt;
  &lt;date year="2005" month="1" day="6"/&gt;
  &lt;subject&gt;Firewall problems&lt;/subject&gt;
  &lt;content&gt;
    &lt;para&gt;Thanks for your excellent work.&lt;/para&gt;
    &lt;para&gt;Our firewall is definitely broken!&lt;/para&gt;
  &lt;/content&gt;
&lt;/memo&gt;</programlisting>

              <para>Every memo instance does have exactly one sender and one
              subject. Write a SAX application to achieve the following
              output:</para>

              <screen>Sender: Martin Goik
Subject: Firewall problems</screen>

              <para>Hint: The callback implementation of
              <methodname>org.xml.sax.helpers.DefaultHandler.characters(char[],int,int)</methodname>
              may be used to filter the desired output. You have to limit its
              output to <tag class="starttag">from</tag> and <tag
              class="starttag">subject</tag> descendant content. Taking the
              <tag class="starttag">subject</tag>Firewall problems<tag
              class="endtag">subject</tag> element as an example the
              corresponding event sequence reads:</para>

              <informaltable border="1">
                <tr>
                  <th>Event</th>

                  <th>Corresponding callback</th>
                </tr>

                <tr>
                  <td>...</td>

                  <td>...</td>
                </tr>

                <tr>
                  <td>Opening <tag class="starttag">subject</tag> element</td>

                  <td>startElement(...)</td>
                </tr>

                <tr>
                  <td>Firewall problems</td>

                  <td>characters(...)</td>
                </tr>

                <tr>
                  <td>Closing <tag class="endtag">subject</tag> element</td>

                  <td>endElement(...)</td>
                </tr>

                <tr>
                  <td>...</td>

                  <td>...</td>
                </tr>
              </informaltable>

              <para>The output of our
              <methodname>org.xml.sax.helpers.DefaultHandler.characters(char[],int,int)</methodname>
              callback method can be limited by introducing boolean instance
              variables. Set them to true or false inside your
              <methodname>org.xml.sax.helpers.DefaultHandler.startElement(String
              uri,String localName,String qName,org.xml.sax.Attributes
              attributes)</methodname> and
              <methodname>org.xml.sax.helpers.DefaultHandler.endElement(String
              uri, String localName, String qName)</methodname>
              implementations in order to keep track of the current event
              state.</para>
            </question>

            <answer>
              <programlisting language="java">package sax.view;
...
/** A view on memo documents restricting to sender name and subject. */
public class MemoViewHandler extends DefaultHandler {

  // These variables help us to keep track of the current event state spanning
  // each startElement(...) -- characters(...) -- endElement(...) event sequence
  boolean inFromContext = false,
      inSubjectContext = false;

  public void startElement(String namespaceUri, String localName,
      String rawName, Attributes attrs) {
    switch(rawName) {
    case "from":
      inFromContext = true;
      System.out.print("Sender: ");
      break;
    case "subject":
      inSubjectContext = true;
      System.out.print("Subject: ");
      break;
    case "surname":
      if (inFromContext) {
        System.out.print(" "); // Adding additional space between &lt;name&gt;
      }                        // and &lt;surname&gt;  content.
      break;
    }
  }

  @Override
  public void endElement(String uri, String localName, String rawName)
      throws SAXException {
    switch(rawName) {
    case "from":
      inFromContext = false;
      System.out.println();
      break;
    case "subject":
      inSubjectContext = false;
      System.out.println();
      break;
    }
  }

  @Override
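  public void characters(char[] ch, int start, int length) throws SAXException {
    if (inFromContext || inSubjectContext) {
      final String text = new String(ch, start, length);
      // Skip whitespace-only chunks, e.g. the line breaks and indentation
      // between &lt;from&gt; and &lt;name&gt;, to keep the output on one line.
      if (!text.trim().isEmpty()) {
        System.out.print(text);
      }
    }
  }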
}</programlisting>
            </answer>
          </qandaentry>
        </qandadiv>
      </qandaset>
    </section>

    <section xml:id="sda1SectImgAlign">
      <title>Searching <tag class="emptytag">img</tag> elements for obsoleted
      attributes.</title>

      <qandaset defaultlabel="qanda" xml:id="sda1QandaImgAlign">
        <qandadiv>
          <qandaentry>
            <question>
              <para>Consider the following <xref linkend="glo_XHTML"/>
              document instance example:</para>

              <programlisting language="xml">&lt;html xmlns='http://www.w3.org/1999/xhtml'&gt;
    &lt;head&gt;
        &lt;title&gt;A simple image&lt;/title&gt;
    &lt;/head&gt;
    &lt;body&gt;
        &lt;img src='a.gif' align='top'/&gt;

        &lt;p&gt;Some inline image without alignment &lt;img src="b.gif"/&gt;&lt;/p&gt;
        &lt;p&gt;Some inline image with alignment &lt;img src="c.gif" align="bottom"/&gt;&lt;/p&gt;
    &lt;/body&gt;
&lt;/html&gt;</programlisting>

              <para>This instance contains three <tag
              class="emptytag">img</tag> elements. Two of them have an old
              style <property>align</property> property. Modern HTML versions
              prohibit this usage in favour of CSS <code>&lt;img
              style="vertical-align: text-top;" /&gt;</code>.</para>

              <para>Write an application which produces the following list of
              non-conforming images:</para>

              <screen>Found image element 'a.gif' having attribute align='top'
Found image element 'c.gif' having attribute align='bottom'
</screen>

              <para>Write your application in a testable fashion and provide
              unit test(s).</para>
            </question>

            <answer>
              <annotation role="make">
                <para role="eclipse">P/Sda1/Alignimg</para>
              </annotation>
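
              <para>Independently of the referenced project, the core idea may
              be sketched as follows. The class name
              <classname>ImgAlignHandler</classname> and its helper method are
              assumptions made for illustration. The sketch also assumes a
              parser that is not namespace aware, so that the raw name of each
              image element is simply <code>img</code>. Messages are collected
              in a list rather than printed, which allows a unit test to
              compare them to expected values:</para>

              <programlisting language="java">import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical sketch: reports img elements carrying an align attribute. */
public class ImgAlignHandler extends DefaultHandler {

  private final List&lt;String&gt; messages = new ArrayList&lt;&gt;();

  @Override
  public void startDocument() {
    messages.clear(); // May contain messages from a previous run.
  }

  @Override
  public void startElement(String namespaceUri, String localName,
      String rawName, Attributes attrs) {
    if (!"img".equals(rawName)) {
      return;
    }
    final String align = attrs.getValue("align"); // null if absent
    if (align != null) {
      messages.add("Found image element '" + attrs.getValue("src")
          + "' having attribute align='" + align + "'");
    }
  }

  public List&lt;String&gt; getMessages() {
    return messages;
  }

  /** Parses the given XHTML text and returns one message per offending image. */
  public static List&lt;String&gt; findAlignedImages(String xhtml) throws Exception {
    final ImgAlignHandler handler = new ImgAlignHandler();
    SAXParserFactory.newInstance().newSAXParser()
        .parse(new InputSource(new StringReader(xhtml)), handler);
    return handler.getMessages();
  }
}</programlisting>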
            </answer>
          </qandaentry>
        </qandadiv>
      </qandaset>
    </section>

    <section xml:id="sda1SectFilterImg">
      <title>Filtering <tag class="emptytag">img</tag> elements.</title>

      <qandaset defaultlabel="qanda" xml:id="sda1QandaFilterImg">
        <qandadiv>
          <qandaentry>
            <question>
              <para>Consider the following <xref linkend="glo_XHTML"/>
              document instance example:</para>

              <programlisting language="xml">&lt;?xml version="1.0" encoding="UTF-8"?&gt;
&lt;!DOCTYPE html&gt;
&lt;html xmlns:html="http://www.w3.org/1999/xhtml"
    xmlns="http://www.w3.org/1999/xhtml"&gt;
    &lt;head&gt;
        &lt;title&gt;&lt;/title&gt;
    &lt;/head&gt;
    &lt;body&gt;
        &lt;h1&gt;Some Title&lt;/h1&gt;
        &lt;!-- Block level image --&gt;

        &lt;div&gt;
            &lt;img src="dsfcjws.jpeg"/&gt; <co
                  linkends="sda1XhtmlImgBlockInline-1"
                  xml:id="sda1XhtmlImgBlockInline-1-co"/>
        &lt;/div&gt;

        &lt;img src="someimage.png"/&gt; <co
                  linkends="sda1XhtmlImgBlockInline-2"
                  xml:id="sda1XhtmlImgBlockInline-2-co"/>

        &lt;!-- inline image within a paragraph --&gt;
        &lt;p&gt;This is an &lt;em&gt;&lt;img src="fds.gif"/&gt;<co
                  linkends="sda1XhtmlImgBlockInline-3"
                  xml:id="sda1XhtmlImgBlockInline-3-co"/>&lt;/em&gt; inline image:&lt;img
                                                   src="otherdata.png"/&gt;<co
                  linkends="sda1XhtmlImgBlockInline-4"
                  xml:id="sda1XhtmlImgBlockInline-4-co"/>.&lt;/p&gt;

    &lt;/body&gt;
&lt;/html&gt;</programlisting>

              <calloutlist>
                <callout arearefs="sda1XhtmlImgBlockInline-1-co"
                         xml:id="sda1XhtmlImgBlockInline-1">
                  <para>First block level image.</para>
                </callout>

                <callout arearefs="sda1XhtmlImgBlockInline-2-co"
                         xml:id="sda1XhtmlImgBlockInline-2">
                  <para>Second block level image.</para>
                </callout>

                <callout arearefs="sda1XhtmlImgBlockInline-3-co"
                         xml:id="sda1XhtmlImgBlockInline-3">
                  <para>First inline image.</para>
                </callout>

                <callout arearefs="sda1XhtmlImgBlockInline-4-co"
                         xml:id="sda1XhtmlImgBlockInline-4">
                  <para>Second inline image.</para>
                </callout>
              </calloutlist>

              <para>We will assume:</para>

              <orderedlist>
                <listitem>
                  <para>All <tag class="emptytag">img</tag> elements having
                  either <tag class="starttag">body</tag>, <tag
                  class="starttag">div</tag>, <tag class="starttag">th</tag>
                  or <tag class="starttag">td</tag> parent elements are
                  considered to be block level images.</para>
                </listitem>

                <listitem>
                  <para>All remaining <tag class="emptytag">img</tag> elements
                  are to be considered inline.</para>
                </listitem>
              </orderedlist>

              <para>Write a <xref linkend="glo_SAX"/> application which counts
              block level and inline images separately. On invocation the
              above instance shall yield the following output:</para>

              <screen>Document contains 2 block level &lt;img&gt; elements.
Document contains 2 inline &lt;img&gt; elements.</screen>

              <para>Write your application in a testable fashion and provide
              unit test(s).</para>
            </question>

            <answer>
              <annotation role="make">
                <para role="eclipse">P/Sda1/ImageSearch</para>
              </annotation>
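
              <para>Independently of the referenced project, one possible
              approach is to keep a stack of currently open elements: When an
              image element starts, the top of the stack is its parent. The
              following sketch assumes a parser that is not namespace aware,
              and the class name <classname>ImgLevelCounter</classname> and
              its methods are hypothetical:</para>

              <programlisting language="java">import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.HashSet;
import java.util.Set;

import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical sketch: counts block level and inline img elements. */
public class ImgLevelCounter extends DefaultHandler {

  private static final Set&lt;String&gt; BLOCK_PARENTS =
      new HashSet&lt;&gt;(Arrays.asList("body", "div", "th", "td"));

  // Names of all elements being open at the current parsing position.
  private final Deque&lt;String&gt; openElements = new ArrayDeque&lt;&gt;();
  private int blockLevel = 0, inline = 0;

  @Override
  public void startDocument() {
    openElements.clear();
    blockLevel = 0;
    inline = 0;
  }

  @Override
  public void startElement(String namespaceUri, String localName,
      String rawName, Attributes attrs) {
    if ("img".equals(rawName)) {
      // peek() yields the innermost open element i.e. the parent of this img.
      if (BLOCK_PARENTS.contains(openElements.peek())) {
        blockLevel++;
      } else {
        inline++;
      }
    }
    openElements.push(rawName);
  }

  @Override
  public void endElement(String namespaceUri, String localName, String rawName) {
    openElements.pop();
  }

  public int getBlockLevelCount() {
    return blockLevel;
  }

  public int getInlineCount() {
    return inline;
  }

  /** Parses the given XHTML text and returns the handler holding both counts. */
  public static ImgLevelCounter count(String xhtml) throws Exception {
    final ImgLevelCounter handler = new ImgLevelCounter();
    SAXParserFactory.newInstance().newSAXParser()
        .parse(new InputSource(new StringReader(xhtml)), handler);
    return handler;
  }
}</programlisting>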
            </answer>
          </qandaentry>
        </qandadiv>
      </qandaset>
    </section>
  </section>

  <section xml:id="saxValidate">
    <title><acronym xlink:href="http://www.saxproject.org">SAX</acronym>
    validation</title>

    <para>So far we only parsed well formed document instances. Our current
    parser does not check validity and thus also accepts invalid XML
    instances:</para>

    <figure xml:id="saxNotValid">
      <title>An invalid XML document.</title>

      <programlisting language="xml">&lt;xs:element name="catalog"&gt;
  &lt;xs:complexType&gt;
    &lt;xs:sequence&gt;
      &lt;xs:element ref="item"/&gt;
    &lt;/xs:sequence&gt;
  &lt;/xs:complexType&gt;
&lt;/xs:element&gt;

&lt;xs:element name="item"&gt;
  &lt;xs:complexType mixed="true"&gt;
    &lt;xs:attribute name="orderNo" type="xs:int" use="required"/&gt;
  &lt;/xs:complexType&gt;
&lt;/xs:element&gt;</programlisting>

      <programlisting language="xml">&lt;catalog&gt;
  &lt;item orderNo="3218"&gt;Swinging headset&lt;/item&gt;
  &lt;item orderNo="9921"&gt;200W Stereo Amplifier&lt;/item&gt; <emphasis
          role="bold">&lt;!-- second entry forbidden
                                                                      by schema --&gt;</emphasis>
&lt;/catalog&gt;</programlisting>

      <caption>
        <para>In contrast to <xref linkend="saxMissItem"/> this document is
        well formed. But it is not <emphasis role="bold">valid</emphasis> with
        respect to the schema since more than one <tag
        class="starttag">item</tag> element is present.</para>
      </caption>
    </figure>

    <para>This document instance is well-formed but not valid: The
    deliberately ill-defined schema allows only one <tag
    class="starttag">item</tag> element. Nevertheless the parser will not
    report any error or warning since validation is disabled by default. In
    order to enable validation we need to configure our parser:</para>

    <programlisting language="java">xmlReader.setFeature("http://xml.org/sax/features/validation", true);</programlisting>

    <para>The string <code>http://xml.org/sax/features/validation</code>
    serves as a key. Since this is an ordinary string value a parser may or
    may not implement it. The <acronym
    xlink:href="http://www.saxproject.org">SAX</acronym> standard defines two
    exception classes for dealing with feature related errors:</para>

    <variablelist>
      <varlistentry>
        <term><link
        xlink:href="https://docs.oracle.com/javase/10/docs/api/org/xml/sax/SAXNotRecognizedException.html">SAXNotRecognizedException</link></term>

        <listitem>
          <para>The feature is not known to the parser.</para>
        </listitem>
      </varlistentry>

      <varlistentry>
        <term><link
        xlink:href="https://docs.oracle.com/javase/10/docs/api/org/xml/sax/SAXNotSupportedException.html">SAXNotSupportedException</link></term>

        <listitem>
          <para>The feature is known to the parser but the parser does not
          support it or does not support the requested value.</para>
        </listitem>
      </varlistentry>
    </variablelist>
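
    <para>An application may catch both exceptions separately when
    configuring its parser. The following sketch assumes a hypothetical helper
    class <classname>FeatureProbe</classname>, and the second feature URI is
    made up solely to provoke the error case:</para>

    <programlisting language="java">import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.SAXNotRecognizedException;
import org.xml.sax.SAXNotSupportedException;
import org.xml.sax.XMLReader;

/** Hypothetical helper showing how feature related errors may be told apart. */
public class FeatureProbe {

  /** @return A short status text describing the outcome of setting the feature. */
  public static String trySetFeature(XMLReader xmlReader, String featureUri,
      boolean value) {
    try {
      xmlReader.setFeature(featureUri, value);
      return "set";
    } catch (SAXNotRecognizedException e) {
      return "not recognized"; // The parser does not know this key at all.
    } catch (SAXNotSupportedException e) {
      return "not supported";  // Known key, but feature or value unsupported.
    }
  }

  public static void main(String[] args) throws Exception {
    final XMLReader xmlReader =
        SAXParserFactory.newInstance().newSAXParser().getXMLReader();
    System.out.println(trySetFeature(xmlReader,
        "http://xml.org/sax/features/validation", true));
    // Made-up feature URI, used only to provoke the error case.
    System.out.println(trySetFeature(xmlReader,
        "http://example.org/no-such-feature", true));
  }
}</programlisting>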

    <para>The <productname
    xlink:href="https://projects.apache.org/project.html?xerces-xml_commons_resolver">xml-commons
    resolver project</productname> offers an implementation being able to
    process various catalog file formats. Maven based projects allow the
    corresponding library import by adding the following dependency:</para>

    <programlisting language="xml">&lt;dependency&gt;
  &lt;groupId&gt;xml-resolver&lt;/groupId&gt;
  &lt;artifactId&gt;xml-resolver&lt;/artifactId&gt;
  &lt;version&gt;1.2&lt;/version&gt;
&lt;/dependency&gt;</programlisting>

    <para>We need a properties file <link
    xlink:href="https://xerces.apache.org/xml-commons/components/resolver/tips.html">CatalogManager.properties</link>
    defining XML catalogs to be used and additional parameters:</para>

    <literallayout># Catalogs are relative to this properties file
relative-catalogs=false
# Catalog list

catalogs=\
/.../plugins/com.oxygenxml.editor_.../frameworks/xhtml/dtd/xhtmlcatalog.xml;\
/.../plugins/com.oxygenxml.editor_.../frameworks/xhtml11/dtd/xhtmlcatalog.xml
# PUBLIC in favour of SYSTEM
prefer=public</literallayout>

    <para>This configuration uses some catalogs from the
    <trademark>Oxygen</trademark> <trademark>Eclipse</trademark> plugin. We
    may now add a resolver to our SAX application by referencing the above
    configuration file <coref linkend="resolverPropertyFile"/> and registering
    the resolver to our SAX parser instance <coref
    linkend="resolverRegister"/>:</para>

    <programlisting language="java">xmlReader = saxParser.getXMLReader();

      // Set up resolving PUBLIC identifier
      final CatalogManager cm = new CatalogManager("<emphasis role="bold">CatalogManager.properties</emphasis>" <co
        xml:id="resolverPropertyFile"/> );
      final CatalogResolver resolver = new CatalogResolver(cm);
      xmlReader.setEntityResolver(resolver) <co xml:id="resolverRegister"/>;</programlisting>
  </section>

  <section xml:id="saxNamespace">
    <title>Namespaces</title>

    <para>In order to make a <acronym
    xlink:href="http://www.saxproject.org">SAX</acronym> parser application
    namespace aware we have to activate two <acronym
    xlink:href="http://www.saxproject.org">SAX</acronym> parsing
    features:</para>

    <programlisting language="java">xmlReader = saxParser.getXMLReader();
xmlReader.setFeature("http://xml.org/sax/features/namespaces", true);
xmlReader.setFeature("http://xml.org/sax/features/namespace-prefixes", true);</programlisting>

    <para>This instructs the parser to pass the namespace's name for each
    element. Namespace prefixes like <code>xsl</code> in <tag
    class="starttag">xsl:for-each</tag> are also passed and may be used by an
    application:</para>

    <programlisting language="java">package sax;
...
public class NamespaceEventHandler extends DefaultHandler {
...
 public void startElement(String <emphasis role="bold">namespaceUri</emphasis>, String localName,
                           String rawName, Attributes attrs) {
   System.out.println("Opening Element rawName='" + rawName + "'\n"
       + "namespaceUri='" + <emphasis role="bold">namespaceUri</emphasis> + "'\n"
       + "localName='" + localName
       + "'\n--------------------------------------------");
}</programlisting>

    <para>As an example we take an XSLT script:</para>

    <programlisting language="xml">&lt;?xml version="1.0" encoding="utf-8"?&gt;
&lt;xsl:stylesheet version="1.0"
  xmlns:xsl='http://www.w3.org/1999/XSL/Transform'
  xmlns:fo='http://www.w3.org/1999/XSL/Format'&gt;

  &lt;xsl:template match="/"&gt;
    &lt;fo:block&gt;A block&lt;/fo:block&gt;
    &lt;HTML/&gt;
  &lt;/xsl:template&gt;

&lt;/xsl:stylesheet&gt;</programlisting>

    <para>This XSLT script being conceived as an XML document instance
    contains elements belonging to two different namespaces namely
    <code>http://www.w3.org/1999/XSL/Transform</code> and
    <code>http://www.w3.org/1999/XSL/Format</code>. The script also contains a
    <quote>raw</quote> <tag class="emptytag">HTML</tag> element being
    introduced only for demonstration purposes. Since no default namespace is
    declared this element does not belong to any namespace. The result
    reads:</para>

    <screen>Opening Element rawName='xsl:stylesheet'
namespaceUri='http://www.w3.org/1999/XSL/Transform'
localName='stylesheet'
--------------------------------------------
Opening Element rawName='xsl:template'
namespaceUri='http://www.w3.org/1999/XSL/Transform'
localName='template'
--------------------------------------------
Opening Element rawName='fo:block'
namespaceUri='http://www.w3.org/1999/XSL/Format'
localName='block'
--------------------------------------------
Opening Element rawName='HTML'
namespaceUri=''
localName='HTML'</screen>

    <para>Now the parser tells us to which namespace a given element node
    belongs. An XSLT engine for example uses this information to distinguish
    two classes of elements:</para>

    <itemizedlist>
      <listitem>
        <para>Elements belonging to the namespace
        <code>http://www.w3.org/1999/XSL/Transform</code> like <tag
        class="emptytag">xsl:value-of select="..."</tag> have to be
        interpreted as instructions by the processor.</para>
      </listitem>

      <listitem>
        <para>Elements <emphasis role="bold">not</emphasis> belonging to the
        namespace <code>http://www.w3.org/1999/XSL/Transform</code> like <tag
        class="emptytag">html</tag> or <tag class="starttag">fo:block</tag>
        are copied <quote>as is</quote> to the output.</para>
      </listitem>
    </itemizedlist>
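
    <para>The distinction above can be sketched in a few lines. The class
    <classname>XslElementClassifier</classname> is a hypothetical illustration
    and by no means a real XSLT engine: It merely sorts the local names of
    all elements into two lists depending on their namespace:</para>

    <programlisting language="java">import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical sketch: separates XSLT instructions from literal result elements. */
public class XslElementClassifier extends DefaultHandler {

  private static final String XSL_NS = "http://www.w3.org/1999/XSL/Transform";

  private final List&lt;String&gt; instructions = new ArrayList&lt;&gt;();
  private final List&lt;String&gt; literals = new ArrayList&lt;&gt;();

  @Override
  public void startDocument() {
    instructions.clear();
    literals.clear();
  }

  @Override
  public void startElement(String namespaceUri, String localName,
      String rawName, Attributes attrs) {
    if (XSL_NS.equals(namespaceUri)) {
      instructions.add(localName); // To be interpreted by a processor.
    } else {
      literals.add(localName);     // To be copied "as is".
    }
  }

  public List&lt;String&gt; getInstructions() {
    return instructions;
  }

  public List&lt;String&gt; getLiterals() {
    return literals;
  }

  /** Parses the given XML text using a namespace aware parser. */
  public static XslElementClassifier classify(String xml) throws Exception {
    final SAXParserFactory factory = SAXParserFactory.newInstance();
    factory.setNamespaceAware(true); // Required for namespace URIs and local names.
    final XslElementClassifier handler = new XslElementClassifier();
    factory.newSAXParser().parse(new InputSource(new StringReader(xml)), handler);
    return handler;
  }
}</programlisting>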

    <qandaset defaultlabel="qanda" xml:id="quandaentry_SqlFromXml">
      <title>Generating SQL INSERT statements from XML data</title>

      <qandadiv>
        <qandaentry>
          <question>
            <para>Consider the following schema and document instance
            example:</para>

            <figure xml:id="catalogProductDescriptionsExample">
              <title>A sample catalog containing products and corresponding
              descriptions.</title>

              <programlisting language="xml">&lt;xs:element name="catalog"&gt;
  &lt;xs:complexType&gt;
    &lt;xs:sequence&gt;
      &lt;xs:element ref="product" minOccurs="0" maxOccurs="unbounded"/&gt;
    &lt;/xs:sequence&gt;
  &lt;/xs:complexType&gt;
&lt;/xs:element&gt;

&lt;xs:element name="product"&gt;
  &lt;xs:complexType&gt;
    &lt;xs:sequence&gt;
      &lt;xs:element name="name" type="xs:string"/&gt;
      &lt;xs:element name="description" type="xs:string" minOccurs="0"
                         maxOccurs="unbounded"/&gt;
      &lt;xs:element name="age" type="xs:int" minOccurs="0" maxOccurs="1"/&gt;
    &lt;/xs:sequence&gt;
    &lt;xs:attribute name="id" type="xs:ID" use="required"/&gt;
  &lt;/xs:complexType&gt;
&lt;/xs:element&gt;</programlisting>

              <programlisting language="xml">&lt;catalog ... xsi:noNamespaceSchemaLocation="catalog.xsd"&gt;
   &lt;product id="mpt"&gt;
       &lt;name&gt;Monkey Picked Tea&lt;/name&gt;
       &lt;description&gt;Rare wild Chinese tea&lt;/description&gt;
       &lt;description&gt;Picked only by specially trained monkeys&lt;/description&gt;
   &lt;/product&gt;
    &lt;product id="instantTent"&gt;
        &lt;name&gt;4-Person Instant Tent&lt;/name&gt;
        &lt;description&gt;4-person, 1-room tent&lt;/description&gt;
        &lt;description&gt;Pre-attached tent poles&lt;/description&gt;
        &lt;description&gt;Exclusive WeatherTec system.&lt;/description&gt;
        &lt;age&gt;15&lt;/age&gt;
    &lt;/product&gt;
&lt;/catalog&gt;</programlisting>
            </figure>

            <para>Data being contained in catalog instances shall be
            transferred to a relational database system. Implement and test a
            <xref linkend="glo_SAX"/> application by following the
            subsequently described steps:</para>

            <glosslist>
              <glossentry>
                <glossterm>Database schema</glossterm>

                <glossdef>
                  <para>Create a database schema matching a product of your
                  choice (<productname>PostgreSQL</productname>,
                  <productname>Oracle</productname>, ...). Your schema should
                  map type and integrity constraints of the given XML schema.
                  In particular:</para>

                  <itemizedlist>
                    <listitem>
                      <para>The element <tag class="starttag">age</tag> is
                      optional.</para>
                    </listitem>

                    <listitem>
                      <para><tag class="starttag">description</tag> elements
                      are children of <tag class="starttag">product</tag>
                      elements and should thus be modeled by a 1:n
                      relation.</para>
                    </listitem>

                    <listitem>
                      <para>In a catalog the order of descriptions of a given
                      product matters. Thus your schema should allow for
                      descriptions being ordered.</para>
                    </listitem>
                  </itemizedlist>
                </glossdef>
              </glossentry>

              <glossentry>
                <glossterm>SAX Application</glossterm>

                <glossdef>
                  <para>The order of appearance of the XML elements <tag
                  class="starttag">product</tag>, <tag
                  class="starttag">name</tag> and <tag
                  class="starttag">age</tag> does not permit a linear
                  generation of suitable SQL <code>INSERT</code> statements by
                  a <xref linkend="glo_SAX"/> content handler. Instead you
                  will have to keep copies of local element values when
                  implementing
                  <methodname>org.xml.sax.ContentHandler.startElement(String,String,String,org.xml.sax.Attributes)</methodname>
                  and related callback methods. The following sequence of
                  insert statements corresponds to the XML data being
                  contained in <xref
                  linkend="catalogProductDescriptionsExample"/>. You may use
                  these statements as a blueprint to be generated by your
                  <xref linkend="glo_SAX"/> application:</para>

                  <programlisting language="sql"><emphasis role="bold">INSERT INTO Product VALUES ('mpt', 'Monkey Picked Tea', NULL);</emphasis>
INSERT INTO Description VALUES('mpt', 0, 'Rare wild Chinese tea');
INSERT INTO Description VALUES('mpt', 1,
                                'Picked only by specially trained monkeys');

<emphasis role="bold">INSERT INTO Product VALUES ('instantTent', '4-Person Instant Tent', 15);</emphasis>
INSERT INTO Description VALUES('instantTent', 0, '4-person, 1-room tent');
INSERT INTO Description VALUES('instantTent', 1, 'Pre-attached tent poles');
INSERT INTO Description VALUES('instantTent', 2, 'Exclusive WeatherTec system.');</programlisting>

                  <para>Provide a suitable <xref linkend="glo_Junit"/>
                  test.</para>
                </glossdef>
              </glossentry>
            </glosslist>
          </question>

          <answer>
            <annotation role="make">
              <para role="eclipse">P/Sda1/catalog2sql</para>
            </annotation>

            <para>Running this project and executing its tests requires the
            following Maven project to be installed (e.g. locally via
            <command>mvn</command> <option>install</option>) in order to
            satisfy a dependency:</para>

            <annotation role="make">
              <para role="eclipse">P/Sda1/saxerrorhandler</para>
            </annotation>
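
            <para>Independently of the referenced project, the buffering idea
            may be sketched as follows. The class name
            <classname>Catalog2SqlHandler</classname>, its helper methods and
            the quoting strategy are assumptions made for illustration. The
            sketch relies on a parser that is not namespace aware and on
            input being valid with respect to the schema. Values are kept
            until the closing <tag class="endtag">product</tag> tag arrives,
            since only then all data of a product is known:</para>

            <programlisting language="java">import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical sketch: buffers product data and creates INSERT statements. */
public class Catalog2SqlHandler extends DefaultHandler {

  private final List&lt;String&gt; statements = new ArrayList&lt;&gt;();
  private final List&lt;String&gt; descriptions = new ArrayList&lt;&gt;();
  private final StringBuilder text = new StringBuilder();
  private String id, name;
  private Integer age; // null represents a missing &lt;age&gt; element.

  @Override
  public void startDocument() {
    statements.clear();
  }

  @Override
  public void startElement(String namespaceUri, String localName,
      String rawName, Attributes attrs) {
    text.setLength(0); // Start collecting the current element's text.
    if ("product".equals(rawName)) {
      id = attrs.getValue("id");
      name = null;
      age = null;
      descriptions.clear();
    }
  }

  @Override
  public void characters(char[] ch, int start, int length) {
    text.append(ch, start, length); // Text may arrive in several chunks.
  }

  @Override
  public void endElement(String namespaceUri, String localName, String rawName) {
    switch (rawName) {
    case "name":
      name = text.toString();
      break;
    case "description":
      descriptions.add(text.toString());
      break;
    case "age":
      age = Integer.valueOf(text.toString().trim());
      break;
    case "product": // All values of the current product are known now.
      statements.add("INSERT INTO Product VALUES (" + quote(id) + ", "
          + quote(name) + ", " + (age == null ? "NULL" : age.toString()) + ");");
      for (int i = 0; i &lt; descriptions.size(); i++) {
        statements.add("INSERT INTO Description VALUES(" + quote(id) + ", "
            + i + ", " + quote(descriptions.get(i)) + ");");
      }
      break;
    }
  }

  /** Wraps a value in single quotes, doubling embedded single quotes. */
  private static String quote(String value) {
    return "'" + value.replace("'", "''") + "'";
  }

  public List&lt;String&gt; getStatements() {
    return statements;
  }

  /** Parses the given catalog text and returns the resulting statements. */
  public static List&lt;String&gt; toSql(String xml) throws Exception {
    final Catalog2SqlHandler handler = new Catalog2SqlHandler();
    SAXParserFactory.newInstance().newSAXParser()
        .parse(new InputSource(new StringReader(xml)), handler);
    return handler.getStatements();
  }
}</programlisting>

            <para>A production quality solution would rather use
            <classname>java.sql.PreparedStatement</classname> instances than
            assembling statement strings by hand.</para>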

            <para>Some remarks are in order here:</para>

            <orderedlist>
              <listitem>
                <para>The <xref linkend="glo_SQL"/> database schema might
                read:</para>

                <programlisting language="sql">CREATE TABLE Product (
   id CHAR(20) NOT NULL PRIMARY KEY <co linkends="catalog2sqlSchema-1"
                    xml:id="catalog2sqlSchema-1-co"/>
  ,name VARCHAR(255) NOT NULL
  ,age SMALLINT <co linkends="catalog2sqlSchema-2"
                    xml:id="catalog2sqlSchema-2-co"/>
);

CREATE TABLE Description (
   product CHAR(20) NOT NULL REFERENCES Product <co
                    linkends="catalog2sqlSchema-3"
                    xml:id="catalog2sqlSchema-3-co"/>
  ,orderIndex INT NOT NULL <co linkends="catalog2sqlSchema-4"
                    xml:id="catalog2sqlSchema-4-co"/> -- preserving the order of descriptions
                               -- belonging to a given product
  ,text VARCHAR(255) NOT NULL
  ,UNIQUE(product, orderIndex) <co linkends="catalog2sqlSchema-5"
                    xml:id="catalog2sqlSchema-5-co"/>
);</programlisting>

                <calloutlist>
                  <callout arearefs="catalog2sqlSchema-1-co"
                           xml:id="catalog2sqlSchema-1">
                    <para>The primary key constraint enforces the uniqueness
                    of <tag class="starttag">product id='xyz'</tag>
                    values.</para>
                  </callout>

                  <callout arearefs="catalog2sqlSchema-2-co"
                           xml:id="catalog2sqlSchema-2">
                    <para>The <code>age</code> column is nullable since <tag
                    class="starttag">age</tag> elements are optional.</para>
                  </callout>

                  <callout arearefs="catalog2sqlSchema-3-co"
                           xml:id="catalog2sqlSchema-3">
                    <para><tag class="starttag">description</tag> elements are
                    children of <tag class="starttag">product</tag>. They are
                    implemented by a foreign key referencing their identifying
                    owner, thus forming weak entities.</para>
                  </callout>

                  <callout arearefs="catalog2sqlSchema-4-co"
                           xml:id="catalog2sqlSchema-4">
                    <para>The attribute <code>orderIndex</code> allows
                    descriptions to be sorted, thus maintaining the original
                    order of appearance of <tag
                    class="starttag">description</tag> elements.</para>
                  </callout>

                  <callout arearefs="catalog2sqlSchema-5-co"
                           xml:id="catalog2sqlSchema-5">
                    <para>The <code>orderIndex</code> attribute is unique
                    within the set of descriptions belonging to the same
                    product.</para>
                  </callout>
                </calloutlist>
              </listitem>

              <listitem>
                <para>The output generated from the given XML sample file
                should be similar to the content of the supplied reference
                file <filename>products.reference.xml</filename>:</para>

                <programlisting language="sql">INSERT INTO Product (id, name) VALUES ('mpt', 'Monkey Picked Tea');
INSERT INTO Description VALUES('mpt', 0, 'Rare wild Chinese tea');
INSERT INTO Description VALUES('mpt', 1,
                               'Picked only by specially trained monkeys');
-- end of current product entry --

INSERT INTO Product VALUES ('instantTent', '4-Person Instant Tent', 15);
INSERT INTO Description VALUES('instantTent', 0, '4-person, 1-room tent');
INSERT INTO Description VALUES('instantTent', 1, 'Pre-attached tent poles');
INSERT INTO Description VALUES('instantTent', 2, 'Exclusive WeatherTec system.');
-- end of current product entry --</programlisting>

                <para>So a <xref linkend="glo_Junit"/> test may simply execute
                the XML to SQL converter and then compare its actual output to
                the above reference file.</para>
              </listitem>
            </orderedlist>
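
            <para>The remarks above might be sketched as a single handler.
            This is a minimal sketch under stated assumptions and not the code
            of the project referenced above: It assumes that <tag
            class="starttag">name</tag>, <tag
            class="starttag">description</tag> and <tag
            class="starttag">age</tag> appear as child elements of <tag
            class="starttag">product</tag> and that the input has already been
            validated. The class name <classname>Catalog2SqlSketch</classname>
            and its <methodname>convert</methodname> method are hypothetical,
            and the layout of the reference file is only approximated.</para>

            <programlisting language="java"><![CDATA[
```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical sketch: assumes <product id="..."> with <name>, <description>
// and optional <age> child elements, and an already validated document.
final class Catalog2SqlSketch extends DefaultHandler {

    private final StringBuilder sql = new StringBuilder();
    private final StringBuilder text = new StringBuilder(); // text of the current element
    private final List<String> descriptions = new ArrayList<>();
    private String id;
    private String name;
    private String age; // stays null if the product has no <age> child

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        text.setLength(0);
        if ("product".equals(qName)) {
            id = atts.getValue("id");
            name = null;
            age = null;
            descriptions.clear();
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length); // may be called several times per text node
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        switch (qName) {
            case "name": name = text.toString().trim(); break;
            case "age": age = text.toString().trim(); break;
            case "description": descriptions.add(text.toString().trim()); break;
            case "product": emitProduct(); break;
            default: break;
        }
    }

    // Writes one Product row followed by its Description rows.
    private void emitProduct() {
        if (age == null) {
            sql.append("INSERT INTO Product (id, name) VALUES (")
               .append(quote(id)).append(", ").append(quote(name)).append(");\n");
        } else {
            sql.append("INSERT INTO Product VALUES (")
               .append(quote(id)).append(", ").append(quote(name)).append(", ")
               .append(Integer.parseInt(age)).append(");\n"); // rejects non-numeric ages
        }
        for (int i = 0; i < descriptions.size(); i++) { // i becomes orderIndex
            sql.append("INSERT INTO Description VALUES(").append(quote(id)).append(", ")
               .append(i).append(", ").append(quote(descriptions.get(i))).append(");\n");
        }
        sql.append("-- end of current product entry --\n");
    }

    // Wraps a value in single quotes, doubling embedded single quotes.
    private static String quote(String value) {
        return "'" + value.replace("'", "''") + "'";
    }

    static String convert(String xml) {
        try {
            Catalog2SqlSketch handler = new Catalog2SqlSketch();
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
            return handler.sql.toString();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Unable to convert catalog", e);
        }
    }

    public static void main(String[] args) {
        System.out.print(convert("<catalog><product id='mpt'><name>Monkey Picked Tea</name>"
            + "<description>Rare wild Chinese tea</description></product></catalog>"));
    }
}
```
]]></programlisting>

            <para>In this sketch all statements of a product are written when
            its closing tag is reached. Buffering the descriptions until then
            keeps the <code>Product</code> row ahead of its
            <code>Description</code> rows regardless of where an <tag
            class="starttag">age</tag> element appears among the
            children.</para>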
          </answer>
        </qandaentry>
      </qandadiv>
    </qandaset>

    <qandaset defaultlabel="qanda" xml:id="quandaentry_NumElemByNs">
      <title>Counting element names grouped by namespaces</title>

      <qandadiv>
        <qandaentry>
          <question>
            <para>We want to extend the SAX examples counting <link
            linkend="saxElementCount">elements</link> of arbitrary document
            instances. Consider the following XSL sample document containing
            <xref linkend="glo_XHTML"/>:</para>

            <programlisting language="xml">&lt;?xml version="1.0" encoding="UTF-8"?&gt;
&lt;xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema" <co
                xml:id="xhtmlCombinedNs_Svg"/>
    xmlns:h="http://www.w3.org/1999/xhtml" <co xml:id="xhtmlCombinedNs_Xhtml"/>
    exclude-result-prefixes="xs" version="2.0"&gt;

    &lt;xsl:template match="/"&gt;
        &lt;h:html&gt;
            &lt;h:head&gt;
                &lt;h:title&gt;&lt;/h:title&gt;
            &lt;/h:head&gt;
            &lt;h:body&gt;
                &lt;h:h1&gt;A heading&lt;/h:h1&gt;
                &lt;h:p&gt;A paragraph&lt;/h:p&gt;
                &lt;h:h1&gt;Yet another heading&lt;/h:h1&gt;
                &lt;xsl:apply-templates/&gt;
            &lt;/h:body&gt;
        &lt;/h:html&gt;
    &lt;/xsl:template&gt;

    &lt;xsl:template match="*"&gt;
        &lt;xsl:message&gt;
            &lt;xsl:text&gt;No template defined for element '&lt;/xsl:text&gt;
            &lt;xsl:value-of select="name(.)"/&gt;
            &lt;xsl:text&gt;'&lt;/xsl:text&gt;
        &lt;/xsl:message&gt;
    &lt;/xsl:template&gt;

&lt;/xsl:stylesheet&gt;</programlisting>

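            <para>Apart from the <xref linkend="glo_XSL"/> namespace itself
            this stylesheet declares the XML Schema namespace <coref
            linkend="xhtmlCombinedNs_Svg"/> and the <xref
            linkend="glo_XHTML"/> namespace <coref
            linkend="xhtmlCombinedNs_Xhtml"/>. Its elements belong to the
            <xref linkend="glo_XSL"/> and <xref linkend="glo_XHTML"/>
            namespaces only.</para>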

            <para>Implement a <xref linkend="glo_SAX"/> application that
            groups the elements of arbitrary XML documents by namespace and
            reports each element's frequency of occurrence. The intended
            output for the previous <xref linkend="glo_XSL"/> example looks
            like:</para>

            <screen>Namespace '<emphasis role="bold">http://www.w3.org/1999/xhtml</emphasis>' contains:
&lt;head&gt; (1 occurrence)
&lt;p&gt; (1 occurrence)
&lt;h1&gt; (2 occurrences)
&lt;html&gt; (1 occurrence)
&lt;title&gt; (1 occurrence)
&lt;body&gt; (1 occurrence)

Namespace '<emphasis role="bold">http://www.w3.org/1999/XSL/Transform</emphasis>' contains:
&lt;stylesheet&gt; (1 occurrence)
&lt;template&gt; (2 occurrences)
&lt;value-of&gt; (1 occurrence)
&lt;apply-templates&gt; (1 occurrence)
&lt;text&gt; (2 occurrences)
&lt;message&gt; (1 occurrence)</screen>

            <para>Hint: Counting frequencies and grouping by namespace may be
            achieved by using standard Java container implementations of
            <classname>java.util.Map</classname>. You may for example map each
            element name to its frequency and group these maps by their
            corresponding namespace. This requires nested maps.</para>
          </question>

          <answer>
            <annotation role="make">
              <para role="eclipse">P/Sda1/xmlstatistics</para>
            </annotation>

            <para>Running this project and executing its tests requires the
            following Maven project to be installed first (e.g. locally via
            <command>mvn</command> <option>install</option>):</para>

            <annotation role="make">
              <para role="eclipse">P/Sda1/saxerrorhandler</para>
            </annotation>

            <para>The above solution contains both a running application and an
            (incomplete) <xref linkend="glo_Junit"/> test.</para>
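
            <para>The nested map approach from the hint might be sketched as
            follows. This is a minimal sketch and not the code of the project
            referenced above; the class name
            <classname>NamespaceElementCounter</classname> and its
            <methodname>count</methodname> method are hypothetical. It uses
            sorted <classname>java.util.TreeMap</classname> instances, so
            namespaces and element names appear in alphabetical order rather
            than in the order of the sample output given in the
            question.</para>

            <programlisting language="java"><![CDATA[
```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Map;
import java.util.TreeMap;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical sketch: counts elements per namespace using nested maps.
final class NamespaceElementCounter extends DefaultHandler {

    // namespace URI -> (local element name -> number of occurrences)
    private final Map<String, Map<String, Integer>> counts = new TreeMap<>();

    @Override
    public void startElement(String uri, String localName, String qName,
                             Attributes attributes) {
        // Elements without a namespace are reported with an empty URI.
        counts.computeIfAbsent(uri, key -> new TreeMap<>())
              .merge(localName, 1, Integer::sum);
    }

    static Map<String, Map<String, Integer>> count(String xml) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            // Without namespace awareness uri and localName are not reliable.
            factory.setNamespaceAware(true);
            NamespaceElementCounter handler = new NamespaceElementCounter();
            factory.newSAXParser().parse(new InputSource(new StringReader(xml)), handler);
            return handler.counts;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Unable to parse document", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<xsl:stylesheet version='2.0'"
            + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'"
            + " xmlns:h='http://www.w3.org/1999/xhtml'>"
            + "<xsl:template match='/'><h:h1>A heading</h:h1><h:p>A paragraph</h:p>"
            + "<h:h1>Yet another heading</h:h1></xsl:template></xsl:stylesheet>";

        count(xml).forEach((namespace, elements) -> {
            System.out.println("Namespace '" + namespace + "' contains:");
            elements.forEach((name, n) -> System.out.println(
                "<" + name + "> (" + n + (n == 1 ? " occurrence)" : " occurrences)")));
            System.out.println();
        });
    }
}
```
]]></programlisting>

            <para>The outer map is keyed by namespace URI rather than by
            prefix. Two documents binding different prefixes to the same
            namespace are thus grouped identically.</para>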
          </answer>
        </qandaentry>
      </qandadiv>
    </qandaset>
  </section>
</section>