<?xml version="1.0" encoding="UTF-8"?> <section version="5.0" xml:id="sax" xmlns="http://docbook.org/ns/docbook" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:xi="http://www.w3.org/2001/XInclude" xmlns:svg="http://www.w3.org/2000/svg" xmlns:m="http://www.w3.org/1998/Math/MathML" xmlns:html="http://www.w3.org/1999/xhtml" xmlns:db="http://docbook.org/ns/docbook"> <title>XML APIs, the Simple API for XML (SAX)</title> <section xml:id="sda1SaxRecommendedReading"> <title>Recommended reading</title> <itemizedlist> <listitem> <para><link xlink:href="https://www.ibm.com/developerworks/xml/tutorials/x-usax/x-usax.html">Understanding SAX</link></para> </listitem> <listitem> <para>Sections <link xlink:href="http://tutorials.jenkov.com/java-xml/">1</link> through <link xlink:href="http://tutorials.jenkov.com/java-xml/sax-example.html">6</link> from the <link xlink:href="http://tutorials.jenkov.com/java-xml/">Java & XML Tutorial</link>.</para> </listitem> </itemizedlist> <para>Try to answer the following question: Why do developers sometimes derive a handler from <classname xlink:href="https://docs.oracle.com/javase/10/docs/api/org/xml/sax/helpers/DefaultHandler.html">DefaultHandler</classname> and why do they sometimes prefer implementing <link xlink:href="https://docs.oracle.com/javase/10/docs/api/org/xml/sax/ContentHandler.html">ContentHandler</link>?</para> </section> <section xml:id="saxPrinciple"> <title>The principle of a <acronym xlink:href="http://www.saxproject.org">SAX</acronym> application</title> <para>We are already familiar with transformations of XML document instances to other formats. Sometimes the capabilities offered by a given transformation approach do not suffice for the problem at hand.
Obviously a general purpose programming language like <xref linkend="glo_Java"/> offers superior means to perform advanced manipulations of XML document trees.</para> <para>Before diving into technical details we present an example exceeding the limits of our present transformation capabilities. We want to format an XML catalog document with article descriptions to HTML. The price information, however, resides in a database external to the XML document, namely an RDBMS:</para> <figure xml:id="saxRdbmsAccessPrinciple"> <title>Generating HTML from an XML document and an RDBMS.</title> <mediaobject> <imageobject> <imagedata fileref="Ref/Fig/saxxmlrdbms.fig" scale="65"/> </imageobject> </mediaobject> </figure> <para>Our catalog might look like:</para> <figure xml:id="simpleCatalog"> <title>A <xref linkend="glo_XML"/> based catalog.</title> <programlisting language="xml"><catalog>
  <item orderNo="<emphasis role="bold">3218</emphasis>">Swinging headset</item>
  <item orderNo="<emphasis role="bold">9921</emphasis>">200W Stereo Amplifier</item>
</catalog></programlisting> </figure> <para>The RDBMS may hold some relation with a field <code>orderNo</code> as primary key and a corresponding attribute like <code>price</code>.
In a real world application <code>orderNo</code> should probably be an integer typed <code>IDENTITY</code> attribute.</para> <figure xml:id="saxRdbmsSchema"> <title>A relation containing price information.</title> <programlisting language="sql">CREATE TABLE Product (
   orderNo CHAR(10) PRIMARY KEY
  ,price Money
);

INSERT INTO Product VALUES('<emphasis role="bold">3218</emphasis>', 42.57);
INSERT INTO Product VALUES('<emphasis role="bold">9921</emphasis>', 121.50);</programlisting> <caption> <para>Prices depend on article numbers.</para> </caption> </figure> <para>The intended HTML output with order numbers being highlighted looks like:</para> <figure xml:id="saxPriceOut"> <title>HTML generated output.</title> <programlisting language="xml"><!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
  <head><title>Available products</title></head>
  <body>
    <table border="1">
      <tbody>
        <tr>
          <th><emphasis role="bold">Order number</emphasis></th>
          <th>Price</th>
          <th>Product</th>
        </tr>
        <tr>
          <td><emphasis role="bold">3218</emphasis></td>
          <td>42,57</td>
          <td>Swinging headset</td>
        </tr>
        <tr>
          <td><emphasis role="bold">9921</emphasis></td>
          <td>121,50</td>
          <td>200W Stereo Amplifier</td>
        </tr>
      </tbody>
    </table>
  </body>
</html></programlisting> <caption> <para>This result HTML document contains content both from our XML document and from the database table <code>Product</code>.</para> </caption> </figure> <para>The intended transformation is beyond the XSLT standard's processing capabilities: XSLT does not enable us to access RDBMS content. However, some XSLT processors provide extensions for this task.</para> <para>It is tempting to write a <xref linkend="glo_Java"/> application which might use e.g. <trademark xlink:href="https://en.wikipedia.org/wiki/Java_Database_Connectivity">JDBC</trademark> for database access. But how do we actually read and parse an XML file?
Sticking to the <xref linkend="glo_Java"/> standard we might use a <link xlink:href="https://docs.oracle.com/javase/10/docs/api/java/io/InputStream.html">FileInputStream</link> instance to read from <code>catalog.xml</code> and write an XML parser ourselves. Fortunately <orgname>SUN</orgname>'s <trademark xlink:href="http://www.oracle.com/technetwork/java/javase">JDK</trademark> already includes an API denoted <acronym xlink:href="http://www.saxproject.org">SAX</acronym>, the <emphasis>S</emphasis>imple <emphasis>A</emphasis>PI for <emphasis>X</emphasis>ML. The <productname xlink:href="http://www.oracle.com/technetwork/java/javase/jdk-7-readme-429198.html">JDK</productname> also includes a corresponding parser implementation. In addition there are third party <acronym xlink:href="http://www.saxproject.org">SAX</acronym> parser implementations available like <productname xlink:href="https://xerces.apache.org">Xerces</productname> from the <orgname xlink:href="https://www.apache.org">Apache Foundation</orgname>.</para> <para>The <acronym xlink:href="http://www.saxproject.org">SAX</acronym> API is event based and will be illustrated by the relationship between customers and a software vendor company:</para> <mediaobject> <imageobject> <imagedata fileref="Ref/Fig/updateinfo.fig"/> </imageobject> </mediaobject> <para>After purchasing software customers are asked to register their software. This way the vendor receives the customer's address. Each time a new release is completed all registered customers will receive a notification typically including a <quote>special offer</quote> to upgrade their software.
From an abstract point of view the following two actions take place:</para> <variablelist> <varlistentry> <term>Registration</term> <listitem> <para>The customer registers at the company's site indicating its interest in updated versions.</para> </listitem> </varlistentry> <varlistentry> <term>Notification</term> <listitem> <para>Upon completion of each new software release (considered to be an <emphasis>event</emphasis>) a message is sent to all registered customers.</para> </listitem> </varlistentry> </variablelist> <para>The same principle applies to GUI applications in software development. A key press <emphasis>event</emphasis> for example will be forwarded by an application's <emphasis>event handler</emphasis> to a callback function (sometimes called a <emphasis>handler</emphasis> method) implemented by an application developer. The <acronym xlink:href="http://www.saxproject.org">SAX</acronym> API works the same way: A parser reads an XML document generating events which <emphasis>may</emphasis> be handled by an application. During document parsing the XML tree structure gets <quote>flattened</quote> to a sequence of events:</para> <figure xml:id="saxFlattenEvent"> <title>Parsing an XML document creates a corresponding sequence of events.</title> <mediaobject> <imageobject> <imagedata fileref="Ref/Fig/saxmodel.pdf"/> </imageobject> </mediaobject> </figure> <para>An application may register components with the parser:</para> <figure xml:id="figureSax"> <title><acronym xlink:href="http://www.saxproject.org">SAX</acronym> Principle</title> <mediaobject> <imageobject> <imagedata fileref="Ref/Fig/saxapparch.pdf"/> </imageobject> <caption> <para>A <acronym xlink:href="http://www.saxproject.org">SAX</acronym> application consists of a <acronym xlink:href="http://www.saxproject.org">SAX</acronym> parser and an implementation of event handlers specific to the application.
The application is developed by implementing the two handlers.</para> </caption> </mediaobject> </figure> <para>An Error Handler is required since the XML stream may contain errors. In order to implement a <acronym xlink:href="http://www.saxproject.org">SAX</acronym> application we have to:</para> <orderedlist> <listitem> <para>Instantiate required objects:</para> <itemizedlist> <listitem> <para>Parser</para> </listitem> <listitem> <para>Event Handler</para> </listitem> <listitem> <para>Error Handler</para> </listitem> </itemizedlist> </listitem> <listitem> <para>Register handler instances</para> <itemizedlist> <listitem> <para>register Event Handler to Parser</para> </listitem> <listitem> <para>register Error Handler to Parser</para> </listitem> </itemizedlist> </listitem> <listitem> <para>Start the parsing process by calling the parser's appropriate method.</para> </listitem> </orderedlist> </section> <section xml:id="saxIntroExample"> <title>First steps</title> <para>Our first <acronym xlink:href="http://www.saxproject.org">SAX</acronym> toy application <classname>sax.stat.v1.ElementCount</classname> shall simply count the number of elements it finds in an arbitrary XML document. In addition the <acronym xlink:href="http://www.saxproject.org">SAX</acronym> events shall be written to standard output generating output sketched in <xref linkend="saxFlattenEvent"/>. The application's central implementation reads:</para> <figure xml:id="saxElementCount"> <title>Counting XML elements.</title> <programlisting language="java">package sax.stat.v1; ... 
public class ElementCount {

  public void parse(final String uri) {
    try {
      final SAXParserFactory saxPf = SAXParserFactory.newInstance();
      final SAXParser saxParser = saxPf.newSAXParser();
      saxParser.parse(uri, eventHandler);
    } catch (ParserConfigurationException e){
      e.printStackTrace(System.err);
    } catch (org.xml.sax.SAXException e) {
      e.printStackTrace(System.err);
    } catch (IOException e){
      e.printStackTrace(System.err);
    }
  }

  public int getElementCount() {
    return eventHandler.getElementCount();
  }

  private final MyEventHandler eventHandler = new MyEventHandler();
}</programlisting> <caption> <para>This application works for arbitrary well-formed XML documents.</para> </caption> </figure> <para>We now explain this application in detail. The first part deals with the instantiation of a parser:</para> <programlisting language="java">try {
  final SAXParserFactory saxPf = <emphasis role="bold">SAXParserFactory</emphasis>.newInstance();
  final SAXParser saxParser = saxPf.newSAXParser();
  saxParser.parse(uri, eventHandler);
} catch (ParserConfigurationException e){
  e.printStackTrace(System.err);
} ...</programlisting> <para>In order to keep an application independent from a specific parser implementation <acronym xlink:href="http://www.saxproject.org">SAX</acronym> parsers are created using the so-called <link xlink:href="http://www.dofactory.com/Patterns/PatternAbstract.aspx">Abstract Factory Pattern</link> instead of simply calling a constructor from a vendor specific parser class.</para> <para>In order to be useful the parser has to be instructed to do something meaningful when an XML document gets parsed. For this purpose our application supplies an event handler instance:</para> <programlisting language="java">public void parse(final String uri) {
  try {
    final SAXParserFactory saxPf = SAXParserFactory.newInstance();
    final SAXParser saxParser = saxPf.newSAXParser();
    saxParser.parse(uri, <emphasis role="bold">eventHandler</emphasis>);
  } catch (org.xml.sax.SAXException e) {
  ...
private final MyEventHandler <emphasis role="bold">eventHandler = new MyEventHandler()</emphasis>;
}</programlisting> <para>What does the event handler actually do? It offers methods which the parser calls during the parsing process:</para> <programlisting language="java">package sax.stat.v1;
...
public class MyEventHandler extends <classname>org.xml.sax.helpers.DefaultHandler</classname> {

  public void <emphasis role="bold">startDocument()</emphasis><co xml:id="programlisting_eventhandler_startDocument"/> {
    System.out.println("Opening Document");
  }
  public void <emphasis role="bold">endDocument()</emphasis><co xml:id="programlisting_eventhandler_endDocument"/> {
    System.out.println("Closing Document");
  }
  public void <emphasis role="bold">startElement(String namespaceUri, String localName, String rawName, Attributes attrs)</emphasis> <co xml:id="programlisting_eventhandler_startElement"/>{
    System.out.println("Opening \"" + rawName + "\"");
    elementCount++;
  }
  public void <emphasis role="bold">endElement(String namespaceUri, String localName, String rawName)</emphasis><co xml:id="programlisting_eventhandler_endElement"/>{
    System.out.println("Closing \"" + rawName + "\"");
  }
  public void <emphasis role="bold">characters(char[] ch, int start, int length)</emphasis><co xml:id="programlisting_eventhandler_characters"/>{
    System.out.println("Content \"" + new String(ch, start, length) + '"');
  }
  public int getElementCount() <co xml:id="programlisting_eventhandler_getElementCount"/>{
    return elementCount;
  }
  private int elementCount = 0;
}</programlisting> <calloutlist> <callout arearefs="programlisting_eventhandler_startDocument"> <para>This method gets called exactly once, namely when opening the XML document as a whole.</para> </callout> <callout arearefs="programlisting_eventhandler_endDocument"> <para>After successfully parsing the whole document instance this method will finally be called.</para> </callout> <callout
arearefs="programlisting_eventhandler_startElement"> <para>This method gets called each time the start of a new element is parsed. In the given catalog.xml example it will be called three times: First when the <tag class="starttag">catalog</tag> appears and then once for each of the two <item ... > elements. The supplied parameters depend on whether or not namespace processing is enabled.</para> </callout> <callout arearefs="programlisting_eventhandler_endElement"> <para>Called each time an element like <tag class="starttag">item ...</tag> gets closed by its counterpart <tag class="endtag">item</tag>.</para> </callout> <callout arearefs="programlisting_eventhandler_characters"> <para>This method is responsible for the treatment of textual content i.e. handling <code>#PCDATA</code> element content. We will explain its uncommon signature a little bit later.</para> </callout> <callout arearefs="programlisting_eventhandler_getElementCount"> <para><function>getElementCount()</function> is a getter method providing read-only access to the private field <varname>elementCount</varname> which gets incremented in <coref linkend="programlisting_eventhandler_startElement"/> each time an XML element opens.</para> </callout> </calloutlist> <para>The call <code>saxParser.parse(uri, eventHandler)</code> actually initiates the parsing process and tells the parser to:</para> <itemizedlist> <listitem> <para>Open the XML document referenced by the URI argument.</para> </listitem> <listitem> <para>Forward XML events to the event handler instance supplied by the second argument.</para> </listitem> </itemizedlist> <para>A driver class containing a <code>main(...)</code> method may start the whole process and print out the desired number of elements upon completion of a parsing run:</para> <programlisting language="java">package sax.stat.v1;

public class ElementCountDriver {

  public static void main(String argv[]) {
    ElementCount xmlStats = new ElementCount();
    xmlStats.parse("<emphasis
role="bold">Input/Sax/catalog.xml</emphasis>");
    System.out.println("Document contains " + xmlStats.<emphasis role="bold">getElementCount()</emphasis> + " elements");
  }
}</programlisting> <para>Processing the catalog example instance yields:</para> <screen>Opening Document
<emphasis role="bold">Opening "catalog"</emphasis> <co xml:id="programlisting_catalog_output"/>
Content " "
<emphasis role="bold">Opening "item"</emphasis> <co xml:id="programlisting_catalog_item1"/>
Content "Swinging headset"
Closing "item"
Content " "
<emphasis role="bold">Opening "item"</emphasis> <co xml:id="programlisting_catalog_item2"/>
Content "200W Stereo Amplifier"
Closing "item"
Content " "
Closing "catalog"
Closing Document
<emphasis role="bold">Document contains 3 elements</emphasis> <co xml:id="programlisting_catalog_elementcount"/></screen> <calloutlist> <callout arearefs="programlisting_catalog_output"> <para>Start parsing element <tag class="starttag">catalog</tag>.</para> </callout> <callout arch="" arearefs="programlisting_catalog_item1"> <para>Start parsing element <tag class="starttag">item orderNo="3218"</tag>Swinging headset<tag class="endtag" role="">item</tag>.</para> </callout> <callout arch="" arearefs="programlisting_catalog_item2"> <para>Start parsing element <tag class="starttag">item orderNo="9921"</tag>200W Stereo Amplifier<tag class="endtag" role="">item</tag>.</para> </callout> <callout arearefs="programlisting_catalog_elementcount"> <para>After the parsing process has completed the application outputs the number of elements counted.</para> </callout> </calloutlist> <para>The output contains some lines of <quote>empty</quote> content. This content is due to whitespace located between elements. For example a newline appears between the <tag class="starttag">catalog</tag> and the first <tag class="starttag">item</tag> element.
The parser encapsulates this whitespace in a call to the <methodname xlink:href="https://docs.oracle.com/javase/10/docs/api/org/xml/sax/ContentHandler.html#characters(char%5B%5D,int,int)">characters()</methodname> method. In an application this call will typically be ignored. XML document instances in a professional context will often not contain any newline characters at all. Instead the whole document is represented as a single line. This impairs human readability, which is not required as long as the processing applications work well. In this case empty content as above will not appear.</para> <para>The <code>characters(char[] ch, int start, int length)</code> method's signature looks somewhat strange regarding <xref linkend="glo_Java"/> conventions. One might expect <code>characters(String s)</code>. But this way the <acronym xlink:href="http://www.saxproject.org">SAX</acronym> API allows efficient parser implementations: A parser may initially allocate a reasonably large <code>char</code> array of say 128 bytes sufficient to hold 64 (<link xlink:href="http://unicode.org">Unicode</link>) characters. If this buffer gets exhausted the parser might allocate a second buffer of double size thus implementing an <quote>amortized doubling</quote> algorithm:</para> <mediaobject> <imageobject> <imagedata fileref="Ref/Fig/saxcharacter.pdf"/> </imageobject> </mediaobject> <para>In this example the first element content fits in the first buffer. The second content <code>200W Stereo Amplifier</code> and the third content <code>Earphone</code> both fit in the second buffer. Subsequent content may require further buffer allocations.
Such a strategy minimizes the number of time consuming <code>new </code> <classname xlink:href="https://docs.oracle.com/javase/10/docs/api/java/lang/String.html">String</classname> <code>(...)</code> constructor calls which would be necessary for the more convenient API variant <code>characters(String s)</code>.</para> </section> <section xml:id="saxRegistry"> <title>Event- and error handler registration</title> <para>Our first <acronym xlink:href="http://www.saxproject.org">SAX</acronym> application suffers from the following deficiencies:</para> <itemizedlist> <listitem> <para>The error handling is very sparse. It completely relies on exceptions being thrown by classes like <classname xlink:href="https://docs.oracle.com/javase/10/docs/api/org/xml/sax/SAXException.html">SAXException</classname> which frequently do not supply meaningful error information.</para> </listitem> <listitem> <para>The application is not aware of namespaces. Thus when reading e.g. <abbrev xlink:href="https://www.w3.org/Style/XSL">XSL</abbrev> document instances it cannot distinguish between elements from different namespaces like HTML.</para> </listitem> <listitem> <para>The parser will not validate a document instance against a schema even if one is present.</para> </listitem> </itemizedlist> <para>We now incrementally add these features to the <acronym xlink:href="http://www.saxproject.org">SAX</acronym> parsing process. <acronym xlink:href="http://www.saxproject.org">SAX</acronym> offers an interface <link xlink:href="https://docs.oracle.com/javase/10/docs/api/org/xml/sax/XMLReader.html">XMLReader</link> to conveniently <emphasis>register</emphasis> event- and error handler instances independently instead of passing a single object implementing both interfaces as an argument to the <link xlink:href="https://docs.oracle.com/javase/10/docs/api/javax/xml/parsers/SAXParser.html#parse(java.io.File,org.xml.sax.helpers.DefaultHandler)">parse()</link> method.
We first code an error handler class by implementing the interface <classname>org.xml.sax.ErrorHandler</classname> which is part of the <acronym xlink:href="http://www.saxproject.org">SAX</acronym> API:</para> <programlisting language="java">package sax.stat.v2;
...
public class MyErrorHandler implements ErrorHandler {

  <emphasis role="bold">public void warning(SAXParseException e)</emphasis> {
    System.err.println("[Warning]" + getLocationString(e));
  }
  <emphasis role="bold">public void error(SAXParseException e)</emphasis> {
    System.err.println("[Error]" + getLocationString(e));
  }
  <emphasis role="bold">public void fatalError(SAXParseException e)</emphasis> throws SAXException{
    System.err.println("[Fatal Error]" + getLocationString(e));
  }
  private String getLocationString(SAXParseException e) {
    return " line " + e.getLineNumber() +
      ", column " + e.getColumnNumber()+ ":" + e.getMessage();
  }
}</programlisting> <para>The three public methods constitute the <classname>org.xml.sax.ErrorHandler</classname> interface. The method <function>getLocationString</function> is used to supply precise parsing error locations by means of line- and column numbers within a document instance.
If errors or warnings are encountered the parser will call the appropriate public method:</para> <figure xml:id="saxMissItem"> <title>A non well-formed document.</title> <programlisting language="xml"><?xml version="1.0" encoding="UTF-8"?>
<catalog>
  <item orderNo="3218">Swinging headset</item>
  <item orderNo="9921">200W Stereo Amplifier
</catalog></programlisting> <caption> <para>This document is not well-formed since a closing <tag class="endtag">item</tag> tag is missing.</para> </caption> </figure> <para>Our error handler method gets called yielding an informative message:</para> <screen>[Fatal Error] line 5, column -1:Expected "</item>" to terminate element starting on line 4.</screen> <para>This error output is achieved by <emphasis>registering</emphasis> an instance of <classname>sax.stat.v2.MyErrorHandler</classname> to the parser prior to starting the parsing process. In the following code snippet we also register a content handler instance to the parser and thus separate the parser's configuration from its invocation:</para> <programlisting language="java">package sax.stat.v2;
...
public class ElementCount {

  public ElementCount() throws SAXException, ParserConfigurationException{
    final SAXParserFactory saxPf = SAXParserFactory.newInstance();
    final SAXParser saxParser = saxPf.newSAXParser();
    xmlReader = saxParser.getXMLReader();
    xmlReader.setContentHandler(eventHandler); <co xml:id="programlisting_assemble_parser_setcontenthandler"/>
    xmlReader.setErrorHandler(errorHandler); <co xml:id="programlisting_assemble_parser_seterrorhandler"/>
  }
  public void parse(final String uri) throws IOException, SAXException{
    xmlReader.parse(uri); <co xml:id="programlisting_assemble_parser_invokeparse"/>
  }
  public int getElementCount() {
    return eventHandler.getElementCount(); <co xml:id="programlisting_assemble_parser_getelementcount"/>
  }
  private final XMLReader xmlReader;
  private final MyEventHandler eventHandler = new MyEventHandler(); <co xml:id="programlisting_assemble_parser_createeventhandler"/>
  private final MyErrorHandler errorHandler = new MyErrorHandler(); <co xml:id="programlisting_assemble_parser_createerrorhandler"/>
}</programlisting> <calloutlist> <callout arearefs="programlisting_assemble_parser_setcontenthandler programlisting_assemble_parser_seterrorhandler"> <para>Referring to <xref linkend="figureSax" os=""/> these two calls attach the event- and error handler objects to the parser thus implementing the two arrows from the parser to the application's implementation.</para> </callout> <callout arearefs="programlisting_assemble_parser_invokeparse"> <para>The parser is invoked.
Note that in this example we only pass a document's URI but no reference to a handler object.</para> </callout> <callout arearefs="programlisting_assemble_parser_getelementcount"> <para>The method <function>getElementCount()</function> is needed to allow a calling object to access the private <varname>eventHandler</varname> object's <function>getElementCount()</function> method.</para> </callout> <callout arearefs="programlisting_assemble_parser_createeventhandler programlisting_assemble_parser_createerrorhandler"> <para>An event handling and an error handling object are created to handle events during the parsing process.</para> </callout> </calloutlist> <para>The careful reader might notice a subtle difference between the content- and the error handler implementation: The class <classname>sax.stat.v2.MyErrorHandler</classname> implements the interface <classname>org.xml.sax.ErrorHandler</classname>. But <classname>sax.stat.v2.MyEventHandler</classname> is derived from <classname>org.xml.sax.helpers.DefaultHandler</classname> which itself implements the <classname>org.xml.sax.ContentHandler</classname> interface. Actually one might as well start from the latter interface, which requires implementing all of its 11 methods. In most circumstances this only complicates the application's code since it is unnecessary to react to events belonging for example to processing instructions.
For this reason it is good coding practice to use the empty default implementations in <classname>org.xml.sax.helpers.DefaultHandler</classname> and to redefine only those methods corresponding to events actually handled by the application in question.</para> <qandaset defaultlabel="qanda" xml:id="sda1SaxReadAttributes"> <title>SAX and attribute values</title> <qandadiv> <qandaentry> <question> <label>Reading an element's set of attributes.</label> <para>The example document instance does include <tag class="attribute">orderNo</tag> attribute values for each <tag class="starttag">item</tag> element. Our application does not yet show these attribute names and their corresponding values. Read the documentation for <classname xlink:href="https://docs.oracle.com/javase/10/docs/api/org/xml/sax/Attributes.html">org.xml.sax.Attributes</classname> and extend the given code to use it.</para> <para>You should start from the <xref linkend="glo_MIB"/> Maven archetype <code>mi-maven-archetype-sax</code>. Configuration hints are available at <xref linkend="sd1_sect_idea"/>.</para> </question> <answer> <para>For the given example it would suffice to read the known <tag class="attribute">orderNo</tag> attribute's value.
A generic solution may ask for the set of all defined attributes and show their values:</para> <programlisting language="java">package sax;

import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

public class AttribEventHandler extends DefaultHandler {

  @Override
  public void startElement(String namespaceUri, String localName,
      String rawName, Attributes attrs) {
    System.out.println("Opening Element " + rawName);
    // Show each attribute's qualified name and its value.
    for (int i = 0; i < attrs.getLength(); i++){
      System.out.println(attrs.getQName(i) + "=" + attrs.getValue(i) + "\n");
    }
  }
}</programlisting> </answer> </qandaentry> </qandadiv> </qandaset> <section xml:id="sda1SecElementLists"> <title>The set of element names</title> <qandaset defaultlabel="qanda" xml:id="sda1QandaElementNames"> <title>Element lists of arbitrary XML documents.</title> <qandadiv> <qandaentry> <question> <para>We reconsider the simple application reading arbitrary XML documents and providing a list of the XML elements contained within:</para> <screen>Opening Document
<emphasis role="bold">Opening "catalog"</emphasis>
Content " "
<emphasis role="bold">Opening "item"</emphasis>
Content "Swinging headset"
Closing "item"
Content " ...</screen> <para>If an element like <tag class="starttag">item</tag> appears multiple times it will also be written to standard output multiple times.</para> <para>We are now interested in the list of all element names present in an arbitrary XML document.
Consider the following example:</para> <programlisting language="xml"><memo>
  <from>
    <name>Martin</name>
    <surname>Goik</surname>
  </from>
  <to>
    <name>Adam</name>
    <surname>Hacker</surname>
  </to>
  <to>
    <name>Eve</name>
    <surname>Intruder</surname>
  </to>
  <date year="2005" month="1" day="6"/>
  <subject>Firewall problems</subject>
  <content>
    <para>Thanks for your excellent work.</para>
    <para>Our firewall is definitely broken!</para>
  </content>
</memo></programlisting> <para>The elements <tag class="starttag">to</tag>, <tag class="starttag">name</tag>, <tag class="starttag">surname</tag> and <tag class="starttag">para</tag> all appear multiple times. Write a SAX application which processes arbitrary XML documents and creates an alphabetically sorted list of the element names contained <emphasis role="bold">excluding duplicates</emphasis>. The intended output for the above example is:</para> <screen>List of elements: {content date from memo name para subject surname to }</screen> <para>The corresponding handler should be implemented in a re-usable way. Thus if different XML documents are being handled in succession the list of elements should be erased prior to processing the current document. Hints:</para> <itemizedlist> <listitem> <para>Use a <classname>java.util.SortedSet</classname> instance to collect element names thereby excluding duplicates.</para> </listitem> <listitem> <para>The method <methodname>sax.count.ListTagNamesHandler.startDocument()</methodname> may be used to initialize your handler.</para> </listitem> </itemizedlist> </question> <answer> <para>A suitable handler reads:</para> <programlisting language="java">package sax.count;

import java.util.SortedSet;
import java.util.TreeSet;

import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

/** Collecting the set of element names from element events */
public class ListTagNamesHandler extends DefaultHandler {

  // A SortedSet by definition does not contain any duplicates.
  private final SortedSet<String> elementNames = new TreeSet<>();

  @Override
  public void startDocument() throws SAXException {
    elementNames.clear(); // May contain elements from a previous run.
  }

  @Override
  public void startElement(String namespaceUri, String localName,
      String rawName, Attributes attrs) {
    // In case the current element name has already been inserted
    // this method call will be silently ignored.
    elementNames.add(rawName);
  }

  /**
   * @return A sorted list of element names of the currently processed XML
   *         document without duplicates.
   */
  public String[] getTagNames() {
    return elementNames.toArray(new String[0]);
  }
}</programlisting> <para>A complete application requires a driver:</para> <programlisting language="java">package sax.count;

import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.XMLReader;

import sax.stat.v2.MyErrorHandler;

public class Driver {

  public static void main(String argv[]) throws Exception {
    final SAXParserFactory saxPf = SAXParserFactory.newInstance();
    final SAXParser saxParser = saxPf.newSAXParser();
    final XMLReader xmlReader = saxParser.getXMLReader();
    final ListTagNamesHandler handler = new ListTagNamesHandler();
    xmlReader.setContentHandler(handler);
    xmlReader.setErrorHandler(new MyErrorHandler());
    xmlReader.parse("Input/Xml/Memo/message.xml");

    System.out.print("List of elements: {");
    for (String elementName : handler.getTagNames()) {
      System.out.print(elementName + " ");
    }
    System.out.println("}");
  }
}</programlisting> </answer> </qandaentry> </qandadiv> </qandaset> </section> <section xml:id="sda1SaxView"> <title>A limited view on a given XML document instance</title> <qandaset defaultlabel="qanda" xml:id="sda1QandamemoView"> <title>A specific view on memo documents</title> <qandadiv> <qandaentry> <question> <para>We reconsider the following memo instance:</para> <programlisting language="xml"><memo>
  <from>
    <name>Martin</name>
    <surname>Goik</surname>
  </from>
  <to>
    <name>Adam</name>
    <surname>Hacker</surname>
  </to>
  <to>
    <name>Eve</name>
    <surname>Intruder</surname>
  </to>
  <date year="2005" month="1" day="6"/>
  <subject>Firewall problems</subject>
  <content>
    <para>Thanks for your excellent work.</para>
    <para>Our firewall is definitely broken!</para>
  </content>
</memo></programlisting>
<para>Every memo instance has exactly one sender and one subject. Write a SAX application to achieve the following output:</para>
<screen>Sender: Martin Goik
Subject: Firewall problems</screen>
<para>Hint: Your implementation of the <methodname>org.xml.sax.helpers.DefaultHandler.characters(char[],int,int)</methodname> callback may be used to filter the desired output. You have to limit its output to character data descending from <tag class="starttag">from</tag> and <tag class="starttag">subject</tag>. Taking the <tag class="starttag">subject</tag>Firewall problems<tag class="endtag">subject</tag> element as an example, the corresponding event sequence reads:</para>
<informaltable border="1">
<tr> <th>Event</th> <th>Corresponding callback</th> </tr>
<tr> <td>...</td> <td>...</td> </tr>
<tr> <td>Opening <tag class="starttag">subject</tag> element</td> <td>startElement(...)</td> </tr>
<tr> <td>Firewall problems</td> <td>characters(...)</td> </tr>
<tr> <td>Closing <tag class="endtag">subject</tag> element</td> <td>endElement(...)</td> </tr>
<tr> <td>...</td> <td>...</td> </tr>
</informaltable>
<para>Limiting the output of our <methodname>org.xml.sax.helpers.DefaultHandler.characters(char[],int,int)</methodname> callback method can be achieved by introducing instance scope boolean variables. Set them to true or false inside your <methodname>org.xml.sax.helpers.DefaultHandler.startElement(String uri,String localName,String qName,org.xml.sax.Attributes attributes)</methodname> and <methodname>org.xml.sax.helpers.DefaultHandler.endElement(String uri, String localName, String qName)</methodname> implementations to keep track of the current event state.</para>
</question> <answer> <programlisting
language="java">package sax.view;
...
/** A view on memo documents restricting to sender name and subject. */
public class MemoViewHandler extends DefaultHandler {

  // These variables help us to keep track of the current event state spanning
  // each startElement(...) -- characters(...) -- endElement(...) event sequence
  boolean inFromContext = false,
          inSubjectContext = false;

  @Override
  public void startElement(String namespaceUri, String localName,
      String rawName, Attributes attrs) {
    switch(rawName) {
    case "from":
      inFromContext = true;
      System.out.print("Sender: ");
      break;
    case "subject":
      inSubjectContext = true;
      System.out.print("Subject: ");
      break;
    case "surname":
      if (inFromContext) {
        System.out.print(" "); // Adding additional space between <name>
      }                        // and <surname> content.
      break;
    }
  }

  @Override
  public void endElement(String uri, String localName, String rawName)
      throws SAXException {
    switch(rawName) {
    case "from":
      inFromContext = false;
      System.out.println();
      break;
    case "subject":
      inSubjectContext = false;
      System.out.println();
      break;
    }
  }

  @Override
  public void characters(char[] ch, int start, int length)
      throws SAXException {
    if (inFromContext || inSubjectContext) {
      System.out.print(new String(ch, start, length));
    }
  }
}</programlisting>
</answer> </qandaentry> </qandadiv> </qandaset> </section>
<section xml:id="sda1SectImgAlign">
<title>Searching <tag class="emptytag">img</tag> elements for obsolete attributes.</title>
<qandaset defaultlabel="qanda" xml:id="sda1QandaImgAlign">
<qandadiv> <qandaentry> <question>
<para>Consider the following <xref linkend="glo_XHTML"/> document instance example:</para>
<programlisting language="xml"><html xmlns='http://www.w3.org/1999/xhtml'>
  <head>
    <title>A simple image</title>
  </head>
  <body>
    <img src='a.gif' align='top'/>
    <p>Some inline image without alignment <img src="b.gif"/></p>
    <p>Some inline image with alignment <img src="c.gif" align="bottom"/></p>
  </body>
</html></programlisting>
<para>This instance contains three <tag
class="emptytag">img</tag> elements. Two of them have an old style <property>align</property> property. Modern HTML versions prohibit this usage in favour of CSS <code><img style="vertical-align: text-top;" /></code>.</para> <para>Write an application which produces the following list of non-conforming images:</para> <screen>Found image element 'a.gif' having attribute align='top' Found image element 'c.gif' having attribute align='bottom' </screen> <para>Write your application in a testable fashion and provide unit test(s).</para> </question> <answer> <annotation role="make"> <para role="eclipse">P/Sda1/Alignimg</para> </annotation> </answer> </qandaentry> </qandadiv> </qandaset> </section> <section xml:id="sda1SectFilterImg"> <title>Filtering <tag class="emptytag">img</tag> elements.</title> <qandaset defaultlabel="qanda" xml:id="sda1QandaFilterImg"> <qandadiv> <qandaentry> <question> <para>Consider the following <xref linkend="glo_XHTML"/> document instance example:</para> <programlisting language="xml"><?xml version="1.0" encoding="UTF-8"?> <!DOCTYPE html> <html xmlns:html="http://www.w3.org/1999/xhtml" xmlns="http://www.w3.org/1999/xhtml"> <head> <title></title> </head> <body> <h1>Some Title</h1> <!-- Block level image --> <div> <img src="dsfcjws.jpeg"/> <co linkends="sda1XhtmlImgBlockInline-1" xml:id="sda1XhtmlImgBlockInline-1-co"/> </div> <img src="someimage.png"/> <co linkends="sda1XhtmlImgBlockInline-2" xml:id="sda1XhtmlImgBlockInline-2-co"/> <!-- inline image within a paragraph --> <p>This is an <em><img src="fds.gif"/><co linkends="sda1XhtmlImgBlockInline-3" xml:id="sda1XhtmlImgBlockInline-3-co"/></em> inline image:<img src="otherdata.png"/><co linkends="sda1XhtmlImgBlockInline-4" xml:id="sda1XhtmlImgBlockInline-4-co"/>.</p> </body> </html></programlisting> <calloutlist> <callout arearefs="sda1XhtmlImgBlockInline-1-co" xml:id="sda1XhtmlImgBlockInline-1"> <para>First block level image.</para> </callout> <callout arearefs="sda1XhtmlImgBlockInline-2-co" 
xml:id="sda1XhtmlImgBlockInline-2"> <para>Second block level image.</para> </callout> <callout arearefs="sda1XhtmlImgBlockInline-3-co" xml:id="sda1XhtmlImgBlockInline-3"> <para>First inline image.</para> </callout> <callout arearefs="sda1XhtmlImgBlockInline-4-co" xml:id="sda1XhtmlImgBlockInline-4"> <para>Second inline image.</para> </callout> </calloutlist> <para>We will assume:</para> <orderedlist> <listitem> <para>All <tag class="emptytag">img</tag> elements having either <tag class="emptytag">body</tag>, <tag class="emptytag">div</tag>, <tag class="emptytag">th</tag> or <tag class="emptytag">td</tag> parent elements are considered to be block level images.</para> </listitem> <listitem> <para>All remaining <tag class="emptytag">img</tag> elements are to be considered inline.</para> </listitem> </orderedlist> <para>Write a <xref linkend="glo_SAX"/> application which counts both the number of block level and inline images separately. On invocation the above instance shall yield the following output:</para> <screen>Document contains 2 block level <img> elements. Document contains 2 inline <img> elements.</screen> <para>Write your application in a testable fashion and provide unit test(s).</para> </question> <answer> <annotation role="make"> <para role="eclipse">P/Sda1/ImageSearch</para> </annotation> </answer> </qandaentry> </qandadiv> </qandaset> </section> </section> <section xml:id="saxValidate"> <title><acronym xlink:href="http://www.saxproject.org">SAX</acronym> validation</title> <para>So far we only parsed well formed document instances. 
Our current parser also accepts well-formed but invalid XML instances:</para>
<figure xml:id="saxNotValid">
<title>An invalid XML document.</title>
<programlisting language="xml"><xs:element name="catalog">
  <xs:complexType>
    <xs:sequence>
      <xs:element ref="item"/>
    </xs:sequence>
  </xs:complexType>
</xs:element>

<xs:element name="item">
  <xs:complexType mixed="true">
    <xs:attribute name="orderNo" type="xs:int" use="required"/>
  </xs:complexType>
</xs:element></programlisting>
<programlisting language="xml"><catalog>
  <item orderNo="3218">Swinging headset</item>
  <item orderNo="9921">200W Stereo Amplifier</item> <emphasis role="bold"><!-- second entry forbidden by schema --></emphasis>
</catalog></programlisting>
<caption>
<para>In contrast to <xref linkend="saxMissItem"/> this document is well formed. But it is not <emphasis role="bold">valid</emphasis> with respect to the schema since more than one <tag class="starttag">item</tag> element is present.</para>
</caption>
</figure>
<para>This document instance is well-formed but not valid: Only one element <tag class="starttag">item</tag> is allowed due to an ill-defined schema. Nevertheless the parser will not report any error or warning. In order to enable validation we need to configure our parser:</para>
<programlisting language="java">xmlReader.setFeature("http://xml.org/sax/features/validation", true);</programlisting>
<para>The string <code>http://xml.org/sax/features/validation</code> serves as a key. Since this key is an ordinary string, a given parser may or may not recognize or support the corresponding feature.
The <acronym xlink:href="http://www.saxproject.org">SAX</acronym> standard defines two exception classes for dealing with feature related errors:</para>
<variablelist>
<varlistentry>
<term><link xlink:href="https://docs.oracle.com/javase/10/docs/api/org/xml/sax/SAXNotRecognizedException.html">SAXNotRecognizedException</link></term>
<listitem> <para>The feature is not known to the parser.</para> </listitem>
</varlistentry>
<varlistentry>
<term><link xlink:href="https://docs.oracle.com/javase/10/docs/api/org/xml/sax/SAXNotSupportedException.html">SAXNotSupportedException</link></term>
<listitem> <para>The feature is known to the parser, but the parser either does not support it or does not support the specific value being set.</para> </listitem>
</varlistentry>
</variablelist>
<para>The <productname xlink:href="https://projects.apache.org/project.html?xerces-xml_commons_resolver">xml-commons resolver project</productname> offers an implementation able to process various catalog file formats. Maven based projects may import the corresponding library by adding the following dependency:</para>
<programlisting language="xml"><dependency>
  <groupId>xml-resolver</groupId>
  <artifactId>xml-resolver</artifactId>
  <version>1.2</version>
</dependency></programlisting>
<para>We need a properties file <link xlink:href="https://xerces.apache.org/xml-commons/components/resolver/tips.html">CatalogManager.properties</link> defining the XML catalogs to be used and additional parameters:</para>
<literallayout># Catalogs are relative to this properties file
relative-catalogs=false
# Catalog list
catalogs=\
/.../plugins/com.oxygenxml.editor_.../frameworks/xhtml/dtd/xhtmlcatalog.xml;\
/.../plugins/com.oxygenxml.editor_.../frameworks/xhtml11/dtd/xhtmlcatalog.xml
# PUBLIC in favour of SYSTEM
prefer=public</literallayout>
<para>This configuration uses some catalogs from the <trademark>Oxygen</trademark> <trademark>Eclipse</trademark> plugin.
We may now add a resolver to our SAX application by referencing the above configuration file <coref linkend="resolverPropertyFile"/> and registering the resolver to our SAX parser instance <coref linkend="resolverRegister"/>:</para>
<programlisting language="java">xmlReader = saxParser.getXMLReader();

// Set up resolving PUBLIC identifier
final CatalogManager cm = new CatalogManager("<emphasis role="bold">CatalogManager.properties</emphasis>" <co xml:id="resolverPropertyFile"/> );
final CatalogResolver resolver = new CatalogResolver(cm);
xmlReader.setEntityResolver(resolver) <co xml:id="resolverRegister"/>;</programlisting>
</section>
<section xml:id="saxNamespace">
<title>Namespaces</title>
<para>In order to make a <acronym xlink:href="http://www.saxproject.org">SAX</acronym> parser application namespace aware we have to activate two <acronym xlink:href="http://www.saxproject.org">SAX</acronym> parsing features:</para>
<programlisting language="java">xmlReader = saxParser.getXMLReader();
xmlReader.setFeature("http://xml.org/sax/features/namespaces", true);
xmlReader.setFeature("http://xml.org/sax/features/namespace-prefixes", true);</programlisting>
<para>This instructs the parser to pass the namespace's name for each element. Namespace prefixes like <code>xsl</code> in <tag class="starttag">xsl:for-each</tag> are also passed and may be used by an application:</para>
<programlisting language="java">package sax;
...
public class NamespaceEventHandler extends DefaultHandler {
  ...
  public void startElement(String <emphasis role="bold">namespaceUri</emphasis>, String localName,
      String rawName, Attributes attrs) {
    System.out.println("Opening Element rawName='" + rawName + "'\n"
        + "namespaceUri='" + <emphasis role="bold">namespaceUri</emphasis> + "'\n"
        + "localName='" + localName
        + "'\n--------------------------------------------");
  }</programlisting>
<para>As an example we take an XSLT script:</para>
<programlisting language="xml"><?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl='http://www.w3.org/1999/XSL/Transform'
    xmlns:fo='http://www.w3.org/1999/XSL/Format'>

  <xsl:template match="/">
    <fo:block>A block</fo:block>
    <HTML/>
  </xsl:template>

</xsl:stylesheet></programlisting>
<para>This XSLT script, regarded as an XML document instance, contains elements belonging to two different namespaces, namely <code>http://www.w3.org/1999/XSL/Transform</code> and <code>http://www.w3.org/1999/XSL/Format</code>. The script also contains a <quote>raw</quote> <tag audience="" class="emptytag">HTML</tag> element introduced only for demonstration purposes. Since the script declares no default namespace, this element does not belong to any namespace. The result reads:</para>
<screen>Opening Element rawName='xsl:stylesheet'
namespaceUri='http://www.w3.org/1999/XSL/Transform'
localName='stylesheet'
--------------------------------------------
Opening Element rawName='xsl:template'
namespaceUri='http://www.w3.org/1999/XSL/Transform'
localName='template'
--------------------------------------------
Opening Element rawName='fo:block'
namespaceUri='http://www.w3.org/1999/XSL/Format'
localName='block'
--------------------------------------------
Opening Element rawName='HTML'
namespaceUri=''
localName='HTML'</screen>
<para>Now the parser tells us which namespace a given element node belongs to.
An XSLT engine, for example, uses this information to distinguish two classes of elements:</para>
<itemizedlist>
<listitem> <para>Elements belonging to the namespace <code>http://www.w3.org/1999/XSL/Transform</code> like <tag class="emptytag">xsl:value-of select="..."</tag> have to be interpreted as instructions by the processor.</para> </listitem>
<listitem> <para>Elements <emphasis role="bold">not</emphasis> belonging to the namespace <code>http://www.w3.org/1999/XSL/Transform</code> like <tag class="emptytag">html</tag> or <tag class="starttag">fo:block</tag> are copied <quote>as is</quote> to the output.</para> </listitem>
</itemizedlist>
<qandaset defaultlabel="qanda" xml:id="quandaentry_SqlFromXml">
<title>Generating SQL INSERT statements from XML data</title>
<qandadiv> <qandaentry> <question>
<para>Consider the following schema and document instance example:</para>
<figure xml:id="catalogProductDescriptionsExample">
<title>A sample catalog containing products and corresponding descriptions.</title>
<programlisting language="xml"><xs:element name="catalog">
  <xs:complexType>
    <xs:sequence>
      <xs:element ref="product" minOccurs="0" maxOccurs="unbounded"/>
    </xs:sequence>
  </xs:complexType>
</xs:element>

<xs:element name="product">
  <xs:complexType>
    <xs:sequence>
      <xs:element name="name" type="xs:string"/>
      <xs:element name="description" type="xs:string" minOccurs="0" maxOccurs="unbounded"/>
      <xs:element name="age" type="xs:int" minOccurs="0" maxOccurs="1"/>
    </xs:sequence>
    <xs:attribute name="id" type="xs:ID" use="required"/>
  </xs:complexType>
</xs:element></programlisting>
<programlisting language="xml"><catalog ...
         xsi:noNamespaceSchemaLocation="catalog.xsd">
  <product id="mpt">
    <name>Monkey Picked Tea</name>
    <description>Rare wild Chinese tea</description>
    <description>Picked only by specially trained monkeys</description>
  </product>
  <product id="instantTent">
    <name>4-Person Instant Tent</name>
    <description>4-person, 1-room tent</description>
    <description>Pre-attached tent poles</description>
    <description>Exclusive WeatherTec system.</description>
    <age>15</age>
  </product>
</catalog></programlisting>
</figure>
<para>Data contained in catalog instances shall be transferred to a relational database system. Implement and test a <xref linkend="glo_SAX"/> application by following the steps described below:</para>
<glosslist>
<glossentry>
<glossterm>Database schema</glossterm>
<glossdef>
<para>Create a database schema matching a product of your choice (<productname>PostgreSQL</productname>, <productname>Oracle</productname>, ...). Your schema should map the type and integrity constraints of the given XML schema. In particular:</para>
<itemizedlist>
<listitem> <para>The element <tag class="starttag">age</tag> is optional.</para> </listitem>
<listitem> <para><tag class="starttag">description</tag> elements are children of <tag class="starttag">product</tag> elements and should thus be modeled by a 1:n relation.</para> </listitem>
<listitem> <para>In a catalog the order of descriptions of a given product matters. Thus your schema should allow for descriptions being ordered.</para> </listitem>
</itemizedlist>
</glossdef>
</glossentry>
<glossentry>
<glossterm>SAX Application</glossterm>
<glossdef>
<para>The order of appearance of the XML elements <tag class="starttag">product</tag>, <tag class="starttag">name</tag> and <tag class="starttag">age</tag> does not permit a linear generation of suitable SQL <code>INSERT</code> statements by a <xref linkend="glo_SAX"/> content handler.
Instead you will have to keep copies of local element values when implementing <methodname>org.xml.sax.ContentHandler.startElement(String,String,String,org.xml.sax.Attributes)</methodname> and related callback methods. The following sequence of insert statements corresponds to the XML data contained in <xref linkend="catalogProductDescriptionsExample"/>. You may use these statements as a blueprint to be generated by your <xref linkend="glo_SAX"/> application:</para>
<programlisting language="sql"><emphasis role="bold">INSERT INTO Product VALUES ('mpt', 'Monkey Picked Tea', NULL);</emphasis>
INSERT INTO Description VALUES('mpt', 0, 'Rare wild Chinese tea');
INSERT INTO Description VALUES('mpt', 1, 'Picked only by specially trained monkeys');

<emphasis role="bold">INSERT INTO Product VALUES ('instantTent', '4-Person Instant Tent', 15);</emphasis>
INSERT INTO Description VALUES('instantTent', 0, '4-person, 1-room tent');
INSERT INTO Description VALUES('instantTent', 1, 'Pre-attached tent poles');
INSERT INTO Description VALUES('instantTent', 2, 'Exclusive WeatherTec system.');</programlisting>
<para>Provide a suitable <xref linkend="glo_Junit"/> test.</para>
</glossdef>
</glossentry>
</glosslist>
</question> <answer>
<annotation role="make"> <para role="eclipse">P/Sda1/catalog2sql</para> </annotation>
<para>Running this project and executing tests requires the following Maven project dependency to be installed (e.g.
locally via <command>mvn</command> <option>install</option>):</para>
<annotation role="make"> <para role="eclipse">P/Sda1/saxerrorhandler</para> </annotation>
<para>Some remarks are in order here:</para>
<orderedlist>
<listitem>
<para>The <xref linkend="glo_SQL"/> database schema might read:</para>
<programlisting language="sql">CREATE TABLE Product (
   id CHAR(20) NOT NULL PRIMARY KEY <co linkends="catalog2sqlSchema-1" xml:id="catalog2sqlSchema-1-co"/>
  ,name VARCHAR(255) NOT NULL
  ,age SMALLINT <co linkends="catalog2sqlSchema-2" xml:id="catalog2sqlSchema-2-co"/>
);

CREATE TABLE Description (
   product CHAR(20) NOT NULL REFERENCES Product <co linkends="catalog2sqlSchema-3" xml:id="catalog2sqlSchema-3-co"/>
  ,orderIndex int NOT NULL <co linkends="catalog2sqlSchema-4" xml:id="catalog2sqlSchema-4-co"/> -- preserving the order of descriptions
                                                 -- belonging to a given product
  ,text VARCHAR(255) NOT NULL
  ,UNIQUE(product, orderIndex) <co linkends="catalog2sqlSchema-5" xml:id="catalog2sqlSchema-5-co"/>
);</programlisting>
<calloutlist>
<callout arearefs="catalog2sqlSchema-1-co" xml:id="catalog2sqlSchema-1">
<para>The primary key constraint implements the uniqueness of <tag class="starttag">product id='xyz'</tag> values.</para>
</callout>
<callout arearefs="catalog2sqlSchema-2-co" xml:id="catalog2sqlSchema-2">
<para>Nullability of <code>age</code> implements <tag class="starttag">age</tag> elements being optional.</para>
</callout>
<callout arearefs="catalog2sqlSchema-3-co" xml:id="catalog2sqlSchema-3">
<para><tag class="starttag">description</tag> elements being children of <tag class="starttag">product</tag> are implemented by a foreign key to their identifying owner, thus forming weak entities.</para>
</callout>
<callout arearefs="catalog2sqlSchema-4-co" xml:id="catalog2sqlSchema-4">
<para>The attribute <code>orderIndex</code> allows descriptions to be sorted thus maintaining the original order of appearance of <tag class="starttag">description</tag>
elements.</para>
</callout>
<callout arearefs="catalog2sqlSchema-5-co" xml:id="catalog2sqlSchema-5">
<para>The <code>orderIndex</code> attribute is unique within the set of descriptions belonging to the same product.</para>
</callout>
</calloutlist>
</listitem>
<listitem>
<para>The result for the given input XML sample file should be similar to the content of the supplied reference file <filename>products.reference.xml</filename>:</para>
<programlisting language="sql">INSERT INTO Product (id, name) VALUES ('mpt', 'Monkey Picked Tea');
INSERT INTO Description VALUES('mpt', 0, 'Rare wild Chinese tea');
INSERT INTO Description VALUES('mpt', 1, 'Picked only by specially trained monkeys');
-- end of current product entry --

INSERT INTO Product VALUES ('instantTent', '4-Person Instant Tent', 15);
INSERT INTO Description VALUES('instantTent', 0, '4-person, 1-room tent');
INSERT INTO Description VALUES('instantTent', 1, 'Pre-attached tent poles');
INSERT INTO Description VALUES('instantTent', 2, 'Exclusive WeatherTec system.');
-- end of current product entry --</programlisting>
<para>So a <xref linkend="glo_Junit"/> test may just execute the XML to SQL converter and then compare the actual output to the above reference file.</para>
</listitem>
</orderedlist>
</answer> </qandaentry> </qandadiv> </qandaset>
<qandaset defaultlabel="qanda" xml:id="quandaentry_NumElemByNs">
<title>Counting element names grouped by namespaces</title>
<qandadiv> <qandaentry> <question>
<para>We want to extend the SAX examples counting <link linkend="saxElementCount">elements</link> of arbitrary document instances.
Consider the following XSL sample document containing <xref linkend="glo_XHTML"/>:</para>
<programlisting language="xml"><?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema" <co xml:id="xhtmlCombinedNs_Svg"/>
    xmlns:h="http://www.w3.org/1999/xhtml" <co xml:id="xhtmlCombinedNs_Xhtml"/>
    exclude-result-prefixes="xs" version="2.0">

  <xsl:template match="/">
    <h:html>
      <h:head>
        <h:title></h:title>
      </h:head>
      <h:body>
        <h:h1>A heading</h:h1>
        <h:p>A paragraph</h:p>
        <h:h1>Yet another heading</h:h1>
        <xsl:apply-templates/>
      </h:body>
    </h:html>
  </xsl:template>

  <xsl:template match="*">
    <xsl:message>
      <xsl:text>No template defined for element '</xsl:text>
      <xsl:value-of select="name(.)"/>
      <xsl:text>'</xsl:text>
    </xsl:message>
  </xsl:template>

</xsl:stylesheet></programlisting>
<para>Apart from the XSL namespace itself this stylesheet declares two further namespaces <coref linkend="xhtmlCombinedNs_Svg"/> and <coref linkend="xhtmlCombinedNs_Xhtml"/>.</para>
<para>Implement a <xref linkend="glo_SAX"/> application able to group elements from arbitrary XML documents by namespaces along with their corresponding frequencies of occurrence. The intended output for the previous <xref linkend="glo_XSL"/> example shall look like:</para>
<screen>Namespace '<emphasis role="bold">http://www.w3.org/1999/xhtml</emphasis>' contains:
<head> (1 occurrence)
<p> (1 occurrence)
<h1> (2 occurrences)
<html> (1 occurrence)
<title> (1 occurrence)
<body> (1 occurrence)
Namespace '<emphasis role="bold">http://www.w3.org/1999/XSL/Transform</emphasis>' contains:
<stylesheet> (1 occurrence)
<template> (2 occurrences)
<value-of> (1 occurrence)
<apply-templates> (1 occurrence)
<text> (2 occurrences)
<message> (1 occurrence)</screen>
<para>Hint: Counting frequencies and grouping by namespaces may be achieved by using standard Java container implementations of <classname>java.util.Map</classname>.
You may for example define sets of related XML elements and group them by their corresponding namespaces. Thus nested maps are required.</para>
</question> <answer>
<annotation role="make"> <para role="eclipse">P/Sda1/xmlstatistics</para> </annotation>
<para>Running this project and executing tests requires the following Maven project dependency to be installed (e.g. locally via <command>mvn</command> <option>install</option>):</para>
<annotation role="make"> <para role="eclipse">P/Sda1/saxerrorhandler</para> </annotation>
<para>The above solution contains both a running application and an (incomplete) <xref linkend="glo_Junit"/> test.</para>
</answer> </qandaentry> </qandadiv> </qandaset> </section> </section>