From 2593f32208fee7b2f62683725b51e136203068cc Mon Sep 17 00:00:00 2001 From: Martin Goik <> Date: Tue, 3 Feb 2015 23:14:04 +0100 Subject: [PATCH] Transforming into a modular document structure --- Sda1/dom.xml | 1544 +++++ Sda1/fo.xml | 1306 ++++ Sda1/items.xml | 191 - Sda1/jdbc.xml | 3740 ++++++++++ Sda1/prerequisites.xml | 753 ++ Sda1/sax.xml | 1614 +++++ Sda1/sda1.xml | 14354 --------------------------------------- Sda1/testng.xml | 326 + Sda1/try.xml | 0 Sda1/xmlintro.xml | 529 ++ Sda1/xmlschema.xml | 1832 +++++ Sda1/xslt.xml | 2253 ++++++ lectures.xml | 54 + 13 files changed, 13951 insertions(+), 14545 deletions(-) create mode 100644 Sda1/dom.xml create mode 100644 Sda1/fo.xml delete mode 100644 Sda1/items.xml create mode 100644 Sda1/jdbc.xml create mode 100644 Sda1/prerequisites.xml create mode 100644 Sda1/sax.xml delete mode 100644 Sda1/sda1.xml create mode 100644 Sda1/testng.xml mode change 100755 => 100644 Sda1/try.xml create mode 100644 Sda1/xmlintro.xml create mode 100644 Sda1/xmlschema.xml create mode 100644 Sda1/xslt.xml create mode 100644 lectures.xml diff --git a/Sda1/dom.xml b/Sda1/dom.xml new file mode 100644 index 000000000..9737da833 --- /dev/null +++ b/Sda1/dom.xml @@ -0,0 +1,1544 @@ + <chapter xml:id="dom" version="5.0" xmlns="" + xmlns:xlink="" + xmlns:xi="" + xmlns:svg="" + xmlns:m="" + xmlns:html="" + xmlns:db=""> + <title>The Document Object Model (<acronym + xlink:href="">DOM</acronym>)</title> + + <titleabbrev><acronym + xlink:href="">DOM</acronym></titleabbrev> + + <section xml:id="domBase"> + <title>Language independent specification</title> + + <titleabbrev>Language independence</titleabbrev> + + <para>XML documents allow for automated content processing. We already + discussed the <acronym + xlink:href="">SAX</acronym> API to access XML + documents by <xref linkend="glo_Java"/> applications. 
+ There are however situations where <acronym + xlink:href="">SAX</acronym> is not + appropriate:</para> + + <itemizedlist> + <listitem> + <para>The <acronym + xlink:href="">SAX</acronym> API is event + based. XML node elements are passed to handler methods. Sometimes + we want to access neighbouring nodes from a context node in our + handler methods, for example a <tag class="starttag">title</tag> + following a <tag class="starttag">chapter</tag> node. <acronym + xlink:href="">SAX</acronym> does not + offer any support for this. If we need references to neighbouring + nodes we have to create them ourselves during a <acronym + xlink:href="">SAX</acronym> parsing run. + This is tedious and leads to code that is hard to understand.</para> + </listitem> + + <listitem> + <para>Some applications may want to select node sets by <acronym + xlink:href="">XPath</acronym> + expressions, which is impossible in a <acronym + xlink:href="">SAX</acronym> + application.</para> + </listitem> + + <listitem> + <para>We may want to move subtrees within a document itself (for + example exchanging two <tag class="starttag">chapter</tag> nodes) + or even transfer them to a different document.</para> + </listitem> + </itemizedlist> + + <para>The greatest deficiency of <acronym + xlink:href="">SAX</acronym> is the fact that + an XML instance is not represented as a tree-like structure but as a + succession of events. 
The <acronym + xlink:href="">DOM</acronym> allows us to + represent XML document instances as tree-like structures and thus + enables navigational operations between nodes.</para> + + <para>In order to achieve language <emphasis>and</emphasis> software + vendor independence the <acronym + xlink:href="">DOM</acronym> approach uses two + stages:</para> + + <itemizedlist> + <listitem> + <para>The <acronym + xlink:href="">DOM</acronym> is formulated in + an Interface Definition Language (<abbrev + xlink:href="">IDL</abbrev>).</para> + </listitem> + + <listitem> + <para>In order to use the <acronym + xlink:href="">DOM</acronym> API from a concrete + programming language a so-called <emphasis>language + binding</emphasis> is required. In languages like <xref linkend="glo_Java"/> the + language binding will still be a set of <xref linkend="glo_Java"/> + interfaces. Thus for actually coding an application an + implementation of these interfaces is needed.</para> + </listitem> + </itemizedlist> + + <para>So what exactly is an <abbrev + xlink:href="">IDL</abbrev>? + The programming language <xref linkend="glo_Java"/> already allows + pure interface definitions without any implementation. In C++ the same + result can be achieved by so-called <emphasis>abstract + classes</emphasis> declaring only pure virtual member functions. An <abbrev + xlink:href="">IDL</abbrev> + offers extended features to describe such interfaces. For <acronym + xlink:href="">DOM</acronym> the <productname + xlink:href="">CORBA + 2.2</productname> <abbrev + xlink:href="">IDL</abbrev> + was chosen to describe an XML document programming interface. As + a first example we take an excerpt from the <acronym + xlink:href="">DOM</acronym>'s <link + xlink:href="">Node</link> + interface definition:</para> + + <programlisting language="none">interface Node { + // NodeType + const unsigned short ELEMENT_NODE = 1; + const unsigned short ATTRIBUTE_NODE = 2; + const unsigned short TEXT_NODE = 3; + ... 
+ + readonly attribute DOMString nodeName; + attribute DOMString nodeValue; + // raises(DOMException) on setting + // raises(DOMException) on retrieval + readonly attribute unsigned short nodeType; + readonly attribute Node parentNode; + ... + readonly attribute NodeList childNodes; + readonly attribute Node firstChild; + ... + Node insertBefore(in Node newChild, + in Node refChild) + raises(DOMException); + ...</programlisting> + + <para>If we want to implement the <abbrev + xlink:href="">IDL</abbrev> + <classname>org.w3c.dom.Node</classname> specification in e.g. <xref linkend="glo_Java"/> a language + binding has to be defined. This means writing <xref linkend="glo_Java"/> code which + closely resembles the <abbrev + xlink:href="">IDL</abbrev> + specification. Obviously this task depends on and is restricted by the + constructs being offered by the target programming language. The W3C + <link + xlink:href="">defines</link> + the <xref linkend="glo_Java"/> + <classname>org.w3c.dom.Node</classname> interface by:</para> + + <programlisting language="none">package org.w3c.dom; + +public interface Node { + public static final short ELEMENT_NODE = 1; // Node Types + public static final short ATTRIBUTE_NODE = 2; + public static final short TEXT_NODE = 3; + ... + public String getNodeName(); + public String getNodeValue() throws DOMException; + public void setNodeValue(String nodeValue) throws DOMException; + public short getNodeType(); + public Node getParentNode(); + public NodeList getChildNodes(); + public Node getFirstChild(); + ... + public Node insertBefore(Node newChild, + Node refChild) + throws DOMException; + ... 
+ }</programlisting> + + <para>We take + <methodname>org.w3c.dom.Node.getChildNodes()</methodname> as an + example:</para> + + <figure xml:id="domRetrieveChildren"> + <title>Retrieving child nodes of a given context node</title> + + <mediaobject> + <imageobject> + <imagedata fileref="Ref/Fig/domtree.fig" scale="65"/> + </imageobject> + </mediaobject> + </figure> + + <para>The <classname>org.w3c.dom.Node</classname> interface offers a + set of common operations for objects being part of an XML document. But + an XML document tree contains different types of nodes such as:</para> + + <itemizedlist> + <listitem> + <para>Elements</para> + </listitem> + + <listitem> + <para>Attributes</para> + </listitem> + + <listitem> + <para>Entities</para> + </listitem> + </itemizedlist> + + <para>An XML API may address this issue by offering data types to + represent these different kinds of nodes. The <acronym + xlink:href="">DOM</acronym> <xref linkend="glo_Java"/> binding + defines an inheritance hierarchy of interfaces for this + purpose:</para> + + <figure xml:id="domJavaNodeInterfaces"> + <title>Interface inheritance hierarchy in the <acronym + xlink:href="">DOM</acronym> <xref linkend="glo_Java"/> + binding</title> + + <mediaobject> + <imageobject> + <imagedata fileref="Ref/Fig/nodeHierarchy.svg"/> + </imageobject> + </mediaobject> + </figure> + + <para>Two commonly used <xref linkend="glo_Java"/> + implementations of these interfaces are:</para> + + <variablelist> + <varlistentry> + <term>Xerces</term> + + <listitem> + <para><orgname + xlink:href="">Apache Software + Foundation</orgname></para> + </listitem> + </varlistentry> + + <varlistentry> + <term>JAXP</term> + + <listitem> + <para><orgname xlink:href="">Sun + Microsystems</orgname></para> + </listitem> + </varlistentry> + </variablelist> + + <para>Both implementations offer additional interfaces beyond the + <acronym xlink:href="">DOM</acronym>'s + scope.</para> + + <para>Going back to the <acronym +
 xlink:href="">DOM</acronym> itself, the + specification is divided into <link + xlink:href="">modules</link>:</para> + + <figure xml:id="figureDomModules"> + <title><acronym xlink:href="">DOM</acronym> + modules.</title> + + <mediaobject> + <imageobject> + <imagedata fileref="Ref/Screen/dom-architecture.screen.png"/> + </imageobject> + </mediaobject> + </figure> + </section> + + <section xml:id="domCreate"> + <title>Creating a new document from scratch</title> + + <titleabbrev>New document</titleabbrev> + + <para>If we want to export non-XML content (e.g. from an RDBMS) into + XML we may achieve this by the following recipe:</para> + + <orderedlist> + <listitem> + <para>Create a document builder instance.</para> + </listitem> + + <listitem> + <para>Create an empty <link + xlink:href="">Document</link> + instance.</para> + </listitem> + + <listitem> + <para>Fill in the desired elements and attributes.</para> + </listitem> + + <listitem> + <para>Create a serializer.</para> + </listitem> + + <listitem> + <para>Serialize the resulting tree to a stream.</para> + </listitem> + </orderedlist> + + <para>An introductory piece of code illustrates these steps:</para> + + <figure xml:id="simpleDomCreate"> + <title>Creation of an XML document instance from scratch.</title> + + <programlisting language="none">package dom; +... 
+public class CreateDoc { + public static void main(String[] args) throws Exception { + + // Create the root element + <emphasis role="bold">final Element titel = new Element("titel"); +</emphasis> + //Set a date + <emphasis role="bold">titel.setAttribute("date", "23.02.2000");</emphasis> + + // Append a text node as child + <emphasis role="bold">titel.addContent(new Text("Versuch 1"));</emphasis> + + + // Set formatting for the XML output + <emphasis role="bold">final Format outFormat = Format.getPrettyFormat();</emphasis> + + // Serialize to console + <emphasis role="bold">final XMLOutputter printer = new XMLOutputter(outFormat); + printer.output(titel, System.out);</emphasis> + } +}</programlisting> + </figure> + + <para>We get the following result:</para> + + <programlisting language="none"><?xml version="1.0" encoding="UTF-8"?> +<titel date="23.02.2000">Versuch 1</titel></programlisting> + </section> + + <section xml:id="domCreateExercises"> + <title>Exercises</title> + + <qandaset defaultlabel="qanda" xml:id="createDocModify"> + <title>A sub structured <tag class="starttag">title</tag></title> + + <qandadiv> + <qandaentry> + <question> + <label>Creation of an extended XML document instance</label> + + <para>In order to run the examples given during the lecture + the <filename + xlink:href="">jdom2.jar</filename> + library must be added to the <envar>CLASSPATH</envar>.</para> + + <para>The <acronym + xlink:href="">DOM</acronym> creating + example given before may be used as a starting point. Extend + the <acronym xlink:href="">DOM</acronym> + tree created in <xref linkend="simpleDomCreate"/> to produce + an extended XML document:</para> + + <programlisting language="none"><title> + <long>The long version of this title</long> + <short>Short version</short> +</title></programlisting> + </question> + + <answer> + <programlisting language="none">package dom; +... 
+public class CreateExtended { + /** + * @param args + * @throws IOException + */ + public static void main(String[] args) throws IOException { + + final Element titel = new Element("title"), + tLong = new Element("long"), + tShort = new Element("short"); + + <emphasis role="bold">// Append <long> and <short> to parent <title></emphasis> + titel.addContent(tLong).addContent(tShort); + + <emphasis role="bold">// Append text to <long> and <short></emphasis> + tLong.addContent(new Text("The long version of this title")); + tShort.addContent(new Text("Short version")); + + <emphasis role="bold">// Set formatting for the XML output</emphasis> + Format outFormat = Format.getPrettyFormat(); + + <emphasis role="bold">// Serialize to console</emphasis> + final XMLOutputter printer = new XMLOutputter(outFormat); + printer.output(titel, System.out); + } +}</programlisting> + </answer> + </qandaentry> + </qandadiv> + </qandaset> + </section> + + <section xml:id="domParse"> + <title>Parsing existing XML documents</title> + + <titleabbrev>Parsing</titleabbrev> + + <para>We already used a <acronym + xlink:href="">SAX</acronym> parser to parse an XML + document. Rather than handling <acronym + xlink:href="">SAX</acronym> events ourselves + these events may be used to construct a <acronym + xlink:href="">DOM</acronym> representation of our + document. This work is done by an instance of. We use our catalog + example from <xref linkend="simpleCatalog"/> as an introductory + example.</para> + + <para>We already noticed the need for an + <classname>org.xml.sax.ErrorHandler</classname> object during <acronym + xlink:href="">SAX</acronym> processing. A + <acronym xlink:href="">DOM</acronym> parser + requires a similar type of object in order to react to parsing errors + in a meaningful way. 
In principle a <acronym + xlink:href="">DOM</acronym> parser implementor is + free to choose any implementation strategy but most implementations are built + on top of a <acronym + xlink:href="">SAX</acronym> parser. For this + reason it was natural to choose a <acronym + xlink:href="">DOM</acronym> error handling + interface which is similar to a <acronym + xlink:href="">SAX</acronym> + <classname>org.xml.sax.ErrorHandler</classname>. The following code + serves the needs described before:</para> + + <figure xml:id="domTreeTraversal"> + <title>Accessing an XML tree purely by <acronym + xlink:href="">DOM</acronym> methods.</title> + + <programlisting language="none">package dom; +... +public class ArticleOrder { + +<emphasis role="bold"> // Though we are playing DOM here, a <acronym + xlink:href="">SAX</acronym> parser still + // assembles our DOM tree.</emphasis> + private SAXBuilder builder = new SAXBuilder(); + + public ArticleOrder() { + <emphasis role="bold">// Though an ErrorHandler is not strictly required it allows + // for easier localization of XML document errors</emphasis> + builder.setErrorHandler(new MySaxErrorHandler(System.out));<co + linkends="domSetSaxErrorHandler-co" + xml:id="domSetSaxErrorHandler"/> + } + + /** Descending a catalog till its <item> elements. For each product + * its name and order number are written to the output. + * @throws ... 
+ */ + public void process(final String filename) throws JDOMException, IOException { + + <emphasis role="bold">// Parsing our XML file</emphasis> + final Document docInput =; + + <emphasis role="bold">// Accessing the document's root element</emphasis> + final Element docRoot = docInput.getRootElement(); + + <emphasis role="bold">// Accessing the <item> children of parent element <catalog></emphasis> + final List<Element> items = docRoot.getChildren(); // Element nodes only + for (final Element item : items) { + System.out.println("Article: " + item.getText() + + ", order number: " + item.getAttributeValue("orderNo")); + } ...</programlisting> + + <para>Note <coref linkend="domSetSaxErrorHandler" + xml:id="domSetSaxErrorHandler-co"/>: This is our standard <acronym + xlink:href="">SAX</acronym> error handler + implementing the <classname>org.xml.sax.ErrorHandler</classname> + interface.</para> + </figure> + + <para>Executing this method needs a driver instance providing an input + XML filename:</para> + + <programlisting language="none">package dom; +... 
+public class ArticleOrderDriver { + public static void main(String[] argv) throws Exception { + final ArticleOrder ao = new ArticleOrder(); + ao.process("<emphasis role="bold">Input/article.xml</emphasis>"); + } +}</programlisting> + + <para>This yields:</para> + + <programlisting language="none">Article: Swinging headset, order number: 3218 +Article: 200W Stereo Amplifier, order number: 9921</programlisting> + + <para>To illustrate the internal processes we take a look at the + sequence diagram:</para> + + <mediaobject> + <imageobject> + <imagedata fileref="Ref/Fig/sequenceDomParser.svg"/> + </imageobject> + </mediaobject> + + <qandaset defaultlabel="qanda" xml:id="exercise_domHtmlSimple"> + <title>Creating HTML output</title> + + <qandadiv> + <qandaentry> + <question> + <label>Simple HTML output</label> + + <para>Instead of exporting simple text output as in <xref + linkend="domTreeTraversal"/> we may also create HTML pages + like:</para> + + <programlisting language="none"><!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"> +<html> + <head> + <title>Available articles</title> + </head> + <body> + <h1>Available articles</h1> + <table> + <tbody> + <tr> + <th align="left">Article Description</th><th>Order Number</th> + </tr> + <tr> + <td align="left"><emphasis role="bold">Swinging headset</emphasis></td><td><emphasis + role="bold">3218</emphasis></td> + </tr> + <tr> + <td align="left"><emphasis role="bold">200W Stereo Amplifier</emphasis></td><td><emphasis + role="bold">9921</emphasis></td> + </tr> + </tbody> + </table> + </body> +</html></programlisting> + + <para>Instead of simply writing + <code>...println(<html>\n\t<head>...)</code> + statements you are expected to code a more sophisticated + solution. We may combine <xref linkend="createDocModify"/> and + <xref linkend="createDocModify"/>. The idea is to read the XML + catalog instance into a <acronym + xlink:href="">DOM</acronym> as before. 
+ Then construct a <emphasis>second</emphasis> <acronym + xlink:href="">DOM</acronym> tree for the + desired HTML output and fill in the article information from + the first <acronym + xlink:href="">DOM</acronym> tree + accordingly.</para> + </question> + + <answer> + <para>We introduce a class + <classname>solve.dom.HtmlTree</classname>:</para> + + <programlisting language="none">package solve.dom; + +import; +import; + +import org.jdom2.DocType; +import org.jdom2.Document; +import org.jdom2.Element; +import org.jdom2.Text; +import org.jdom2.output.Format; +import org.jdom2.output.XMLOutputter; + +/** + * Holding a HTML DOM to produce output. + * @author goik + */ +public class HtmlTree { + + private Document htmlOutput; + private Element tableBody; + + public HtmlTree(final String titleText, + final String[] tableHeaderFields) { <co + linkends="programlisting_catalog2html_htmlskel_co" + xml:id="programlisting_catalog2html_htmlskel"/> + + DocType doctype = new DocType("html", + "-//W3C//DTD XHTML 1.0 Strict//EN", + ""); + + final Element htmlRoot = new Element("html"); <co + linkends="programlisting_catalog2html_tablehead_co" + xml:id="programlisting_catalog2html_tablehead"/> + htmlOutput = new Document(htmlRoot); + htmlOutput.setDocType(doctype); + + // We create a HTML skeleton including an "empty" table + final Element head = new Element("head"), + body = new Element("body"), + table = new Element("table"); + + htmlRoot.addContent(head).addContent(body); + + head.addContent(new Element("title").addContent(new Text(titleText))); + + body.addContent(new Element("h1").addContent(new Text(titleText))); + + body.addContent(table); + + + tableBody = new Element("tbody"); + table.addContent(tableBody); + + // addContent() returns the parent element, so the row is created first + final Element tr = new Element("tr"); + tableBody.addContent(tr); + for (final String headerField: tableHeaderFields) { + tr.addContent(new Element("th").addContent(new Text(headerField))); + } + } + + public void appendItem(final String itemName, 
final String orderNo) {<co + linkends="programlisting_catalog2html_insertproduct_co" + xml:id="programlisting_catalog2html_insertproduct"/> + final Element tr = new Element("tr"); + tableBody.addContent(tr); + tr.addContent(new Element("td").addContent(new Text(itemName))); + tr.addContent(new Element("td").addContent(new Text(orderNo))); + } + public void serialize(PrintStream out){ + + // Set formatting for the XML output + final Format outFormat = Format.getPrettyFormat(); + + // Serialize to the stream passed by the caller + final XMLOutputter printer = new XMLOutputter(outFormat); + try { + printer.output(htmlOutput, out); + } catch (IOException e) { + e.printStackTrace(); + System.exit(1); + } + } + /** + * @return the table's <tbody> element + */ + public Element getTable() { + return tableBody; + } +} + + </programlisting> + + <calloutlist> + <callout arearefs="programlisting_catalog2html_htmlskel" + xml:id="programlisting_catalog2html_htmlskel_co"> + <para>A basic HTML skeleton is being created:</para> + + <programlisting language="none"><?xml version="1.0" encoding="UTF-8"?> +<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" + ""> +<html xmlns=""> + <head> + <title>Available articles</title> + </head> + <body> + <h1>Available articles</h1> + <table> + <emphasis role="bold"><tbody></emphasis> <!-- Data to be inserted here in next step --> + <emphasis role="bold"></tbody></emphasis> + </table> + </body> +</html></programlisting> + + <para>The table containing the product's data is empty at + this point and thus invalid.</para> + </callout> + + <callout arearefs="programlisting_catalog2html_tablehead" + xml:id="programlisting_catalog2html_tablehead_co"> + <para>The table's header is appended but the actual data + from our two products is still missing:</para> + + <programlisting language="none">... 
 <h1>Available articles</h1> + <table> + <tbody> + <tr> + <th>Article Description</th> + <th>Order Number</th> + <emphasis role="bold"></tr></emphasis><!-- Data to be appended after this row in next step --> + <emphasis role="bold"></tbody></emphasis> + </table> ...</programlisting> + </callout> + + <callout arearefs="programlisting_catalog2html_insertproduct" + xml:id="programlisting_catalog2html_insertproduct_co"> + <para>Calling + <methodname>solve.dom.HtmlTree.appendItem(String,String)</methodname> + once per product completes the creation of our HTML DOM + tree:</para> + + <programlisting language="none">... </tr> + <tr> + <td>Swinging headset</td> + <td>3218</td> + </tr> + <tr> + <td>200W Stereo Amplifier</td> + <td>9921</td> + </tr> + </tbody> ...</programlisting> + </callout> + </calloutlist> + + <para>The class <classname>solve.dom.Article2Html</classname> + reads the catalog data:</para> + + <programlisting language="none">package solve.dom; +... +public class Article2Html { + + private final SAXBuilder builder = new SAXBuilder(); + private final HtmlTree htmlResult; + + public Article2Html() { + + builder.setErrorHandler(new MySaxErrorHandler(System.out)); + + htmlResult = new HtmlTree("Available articles", new String[] { <co + linkends="programlisting_catalog2html_glue_createhtmldom_co" + xml:id="programlisting_catalog2html_glue_createhtmldom"/> + "Article Description", "Order Number" }); + } + + /** Read an XML catalog instance and insert product names along with their + * order numbers into the HTML DOM. Then serialize the HTML tree to a stream. + * + * @param filename + * Name of the XML source file. + * @param out + * The output stream for HTML serialization. 
+ * @throws IOException + * @throws JDOMException + */ + public void process(final String filename, final PrintStream out) throws JDOMException, IOException { + final List<Element> items = +; + + for (final Element item : items) { <co + linkends="programlisting_catalog2html_glue_prodloop_co" + xml:id="programlisting_catalog2html_glue_prodloop"/> + htmlResult.appendItem(item.getText(), item.getAttributeValue("orderNo")); <co + linkends="programlisting_catalog2html_glue_insertprod_co" + xml:id="programlisting_catalog2html_glue_insertprod"/> + } + htmlResult.serialize(out); <co + linkends="programlisting_catalog2html_glue_serialize_co" + xml:id="programlisting_catalog2html_glue_serialize"/> + } +}</programlisting> + + <calloutlist> + <callout arearefs="programlisting_catalog2html_glue_createhtmldom" + xml:id="programlisting_catalog2html_glue_createhtmldom_co"> + <para>Create an instance holding a HTML <acronym + xlink:href="">DOM</acronym> with a + table header containing the strings <emphasis>Article + Description</emphasis> and <emphasis>Order + Number</emphasis>.</para> + </callout> + + <callout arearefs="programlisting_catalog2html_glue_prodloop" + xml:id="programlisting_catalog2html_glue_prodloop_co"> + <para>Iterate over all product nodes.</para> + </callout> + + <callout arearefs="programlisting_catalog2html_glue_insertprod" + xml:id="programlisting_catalog2html_glue_insertprod_co"> + <para>Insert the product's name and order number into the + HTML <acronym + xlink:href="">DOM</acronym>.</para> + </callout> + + <callout arearefs="programlisting_catalog2html_glue_serialize" + xml:id="programlisting_catalog2html_glue_serialize_co"> + <para>Serialize the completed HTML <acronym + xlink:href="">DOM</acronym> tree to + the output stream.</para> + </callout> + </calloutlist> + </answer> + </qandaentry> + </qandadiv> + </qandaset> + </section> + + <section xml:id="domJavaScript"> + <title>Using <acronym xlink:href="">DOM</acronym> + with HTML/JavaScript</title> + + 
 <para>Due to script language support in a variety of browsers we may + also use the <acronym xlink:href="">DOM</acronym> + to implement client side event handling. As an example we <link + xlink:href="Ref/src/tablesort.html">demonstrate</link> how a HTML + table can be made sortable by clicking on a column's header. The + example code along with the code description can be found at <uri + xlink:href=""></uri>.</para> + + <para>Quite remarkably only a few ingredients are required to + enrich an ordinary static HTML table with this functionality:</para> + + <itemizedlist> + <listitem> + <para>An external JavaScript library has to be included via + <code><script type="text/javascript" + src="sorttable.js"></code></para> + </listitem> + + <listitem> + <para>Each sortable HTML table needs:</para> + + <itemizedlist> + <listitem> + <para>A unique <code>id</code> attribute</para> + </listitem> + + <listitem> + <para>A <code>class="sortable"</code> attribute</para> + </listitem> + </itemizedlist> + </listitem> + </itemizedlist> + </section> + + <section xml:id="domXpath"> + <title>Using <acronym + xlink:href="">XPath</acronym></title> + + <para><xref linkend="domTreeTraversal"/> demonstrated the possibility + to traverse trees solely by using <acronym + xlink:href="">DOM</acronym> method calls. Though + this approach is possible it will in general not lead to stable + applications. Real world examples are often based on large XML + documents with complex hierarchical structures. Thus using this rather + primitive approach requires deeply nested method calls + to access the desired node sets. In addition changing the + conceptual schema will require rewriting large code + portions.</para> + + <para>As we already know from <abbrev + xlink:href="">XSL</abbrev> transformations + <code>XPath</code> allows addressing node sets inside an XML tree. 
The + role of <acronym + xlink:href="">XPath</acronym> can be + compared to SQL queries when working with relational databases. + <acronym xlink:href="">XPath</acronym> may + also be used within <xref linkend="glo_Java"/> code. As a + first example we show an image filename extracting application + operating on XHTML documents. The following example contains three + <tag class="starttag">img</tag> elements:</para> + + <figure xml:id="htmlGallery"> + <title>A HTML document containing <code>IMG</code> tags.</title> + + <programlisting language="none"><?xml version="1.0"?> +<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" + ""> +<html> + <head> + <title>Picture gallery</title> + </head> + <body> + <h1>Picture gallery</h1> + <p>Images may appear inline:<emphasis role="bold"><img src="inline.gif" alt="none"/></emphasis></p> + <table> + <tbody> + <tr> + <td>Number one:</td> + <td><emphasis role="bold"><img src="one.gif" alt="none"/></emphasis></td> + </tr> + <tr> + <td>Number two:</td> + <td><emphasis role="bold"><img src="" alt="none"/></emphasis></td> + </tr> + </tbody> + </table> + </body> +</html> +</programlisting> + </figure> + + <para>A given HTML document may contain <tag + class="starttag">img</tag> elements at <emphasis>arbitrary</emphasis> + positions. It is sometimes desirable to check for existence and + accessibility of such external objects being necessary for the page's + correct rendering. 
A simple XSL script will do the first part of the job, + namely extracting the <tag class="starttag">img</tag> elements:</para> + + <figure xml:id="gallery2imagelist"> + <title>An <abbrev + xlink:href="">XSL</abbrev> script for + image name extraction.</title> + + <programlisting language="none"><xsl:stylesheet version="1.0" xmlns:xsl="" + xmlns:html=""> + <xsl:output method="text"/> + + <xsl:template match="/"> + <xsl:for-each select="//html:img"> + <xsl:value-of select="@src"/> + <xsl:text> </xsl:text> + </xsl:for-each> + </xsl:template> + +</xsl:stylesheet></programlisting> + </figure> + + <para>Note the necessity for <code>html</code> namespace inclusion + into the <acronym + xlink:href="">XPath</acronym> expression in + <code><xsl:for-each select="//html:img"></code>. A simple + <code>select="//img"</code> results in an empty node set. + Executing the <abbrev + xlink:href="">XSL</abbrev> script yields a + list of image filenames being contained in the HTML page i.e. + <code>inline.gif one.gif two.gif</code>.</para> + + <para>Now we want to write a <xref linkend="glo_Java"/> application + which checks whether these referenced image files do exist + and have sufficient permissions to be accessed. A simple approach may + pipe the <abbrev xlink:href="">XSL</abbrev> + output to our application which then executes the readability checks. + Instead we want to incorporate the <acronym + xlink:href="">XPath</acronym> based search + into the application. Ignoring namespaces and trying to resemble the + <abbrev xlink:href="">XSL</abbrev> actions + as closely as possible our application will have to search for <link + xlink:href="">Element</link> + nodes by the <acronym + xlink:href="">XPath</acronym> expression + <code>//img</code>:</para> + + <figure xml:id="domFindImages"> + <title>Extracting <tag class="emptytag">img</tag> element image + references from a HTML document.</title> + + <programlisting language="none">package dom.xpath; +... 
+public class DomXpath { + private final SAXBuilder builder = new SAXBuilder(); + + public DomXpath() { + builder.setErrorHandler(new MySaxErrorHandler(System.err)); + } + public void process(final String xhtmlFilename) throws JDOMException, IOException { + + final Document htmlInput =;<co + linkends="programlisting_java_searchimg_parse_co" + xml:id="programlisting_java_searchimg_parse"/> + final XPathExpression<Object> xpath = XPathFactory.instance().compile( "//img" ); <co + linkends="programlisting_java_searchimg_pf_co" + xml:id="programlisting_java_searchimg_pf"/> <co + linkends="programlisting_java_searchimg_newxpath_co" + xml:id="programlisting_java_searchimg_newxpath"/> + final List<Object> images = xpath.evaluate(htmlInput);<co + linkends="programlisting_java_searchimg_execquery_co" + xml:id="programlisting_java_searchimg_execquery"/> + + for (Object o: images) { <co + linkends="programlisting_java_searchimg_loop_co" + xml:id="programlisting_java_searchimg_loop"/> + final Element image = (Element) o;<co + linkends="programlisting_java_searchimg_cast_co" + xml:id="programlisting_java_searchimg_cast"/> + System.out.print(image.getAttributeValue("src") + " "); + } + } +}</programlisting> + + <caption> + <para>This application searches for <tag + class="emptytag">img</tag> elements and shows their + <code>src</code> attribute value.</para> + </caption> + </figure> + + <calloutlist> + <callout arearefs="programlisting_java_searchimg_parse" + xml:id="programlisting_java_searchimg_parse_co"> + <para>Parse an XHTML document instance into a DOM tree.</para> + </callout> + + <callout arearefs="programlisting_java_searchimg_pf" + xml:id="programlisting_java_searchimg_pf_co"> + <para>Create an <acronym + xlink:href="">XPath</acronym> + factory.</para> + </callout> + + <callout arearefs="programlisting_java_searchimg_newxpath" + xml:id="programlisting_java_searchimg_newxpath_co"> + <para>Create an <acronym + xlink:href="">XPath</acronym> query + instance. 
This may be used to search for a set of nodes starting + from a context node.</para> + </callout> + + <callout arearefs="programlisting_java_searchimg_execquery" + xml:id="programlisting_java_searchimg_execquery_co"> + <para>Using the document's root node as the context node we search + for <tag class="starttag">img</tag> elements appearing at + arbitrary positions in our document.</para> + </callout> + + <callout arearefs="programlisting_java_searchimg_loop" + xml:id="programlisting_java_searchimg_loop_co"> + <para>We iterate over the retrieved list of images.</para> + </callout> + + <callout arearefs="programlisting_java_searchimg_cast" + xml:id="programlisting_java_searchimg_cast_co"> + <para>Casting to the correct type.</para> + </callout> + </calloutlist> + + <para>The result is a list of image filename references:</para> + + <programlisting language="none">inline.gif one.gif </programlisting> + + <qandaset defaultlabel="qanda" xml:id="quandaentry_CastAlwaysLegal"> + <title>Legal casting?</title> + + <qandadiv> + <qandaentry> + <question> + <para>Why is the cast in <coref + linkend="programlisting_java_searchimg_cast"/> in <xref + linkend="domFindImages"/> guaranteed to never cause a + <classname>java.lang.ClassCastException</classname>?</para> + </question> + + <answer> + <para>The <acronym + xlink:href="">XPath</acronym> + <code>//img</code> expression is guaranteed to return only + <tag class="starttag">img</tag> elements. Thus within our + <xref linkend="glo_Java"/> + context we are sure to find only + <classname>org.jdom2.Element</classname> instances.</para> + </answer> + </qandaentry> + </qandadiv> + </qandaset> + + <qandaset defaultlabel="qanda" xml:id="exercise_htmlImageVerify"> + <title>Verification of referenced images readability</title> + + <qandadiv> + <qandaentry> + <question> + <para>We want to extend the example given in <xref + linkend="domFindImages"/> by testing the existence and + checking for readability of referenced images. 
The following + HTML document contains <quote>dead</quote> image + references:</para> + + <programlisting language="none" + xml:id="domCheckImageAccessibility"><!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" + ""> +<html xmlns=""> ... + <body> + <h1>External Pictures</h1> + <p>A local image reference:<img src="inline.gif" alt="none"/></p> + <table> + <tbody> + <tr> + <td>An existing picture:</td> + <td><img + src="" + alt="none"/></td> + </tr> + <tr> + <td>A non-existing picture:</td> + <td><img src="<emphasis role="bold"></emphasis>" alt="none"/></td> + </tr> + </tbody> + </table> + </body> +</html></programlisting> + + <para>Write an application which checks the readability of + <abbrev + xlink:href="">URL</abbrev> + image references to <emphasis>external</emphasis> servers + starting with either <code>http://</code> or + <code>ftp://</code>, ignoring other protocol types. Internal + image references referring to the <quote>current</quote> + server typically look like <code><img + src="/images/test.gif"</code>. So in order to distinguish + these two types of references we may use the XPath built-in + function <link + xlink:href="">starts-with()</link> + testing for the <code>http</code> or <code>ftp</code> protocol + definition part of a <abbrev + xlink:href="">URL</abbrev>. + A possible output for the given example is:</para> + + <programlisting language="none">Received 'sun.awt.image.URLImageSource' from + +Unable to open ''</programlisting> + + <para>The following code snippet shows a helpful class method + to check both the correctness of <abbrev + xlink:href="">URL</abbrev>s + and the accessibility of referenced objects:</para> + + <programlisting language="none">package dom.xpath; +... 
+public class CheckUrl { + public static void checkReadability(final String urlRef) { + try { + final URL url = new URL(urlRef); + try { + final Object imgCandidate = url.getContent(); + if (null == imgCandidate) { + System.err.println("Unable to open '" + urlRef + "'"); + } else { + System.out.println("Received '" + + imgCandidate.getClass().getName() + "' from " + + urlRef); + } + } catch (IOException e) { + System.err.println("Unable to open '" + urlRef + "'"); + } + } catch (MalformedURLException e) { + System.err.println("Address '" + urlRef + "' is malformed"); + } + } +}</programlisting> + </question> + + <answer> + <para>We are interested in the set of images within a given + HTML document containing a <link + xlink:href="">URL</link> reference + starting with either <code>http://</code> or + <code>ftp://</code>. This is achieved by the following + <acronym + xlink:href="">XPath</acronym> + expression:</para> + + <programlisting language="none">//html:img[starts-with(@src, 'http://') or starts-with(@src, 'ftp://')]</programlisting> + + <para>The application only needs to pass the corresponding + <abbrev + xlink:href="">URL</abbrev>s + to the method <link + xlink:href="domCheckUrlObjectExistence">CheckUrl.checkReadability()</link>. + The rest of the code is identical to the <link + linkend="domFindImages">introductory example</link>:</para> + + <informalfigure xml:id="solutionFintExtImgRef"> + <programlisting language="none">package dom.xpath; +... 
+public class CheckExtImage { + private final SAXBuilder builder = new SAXBuilder(); + + public CheckExtImage() { + builder.setErrorHandler(new MySaxErrorHandler(System.err)); + } + public void process(final String xhtmlFilename) throws JDOMException, IOException { + + final Document htmlInput =; + final XPathExpression<Object> xpath = XPathFactory.instance().compile( + "<emphasis role="bold">//img[starts-with(@src, 'http://') or starts-with(@src, 'ftp://')]</emphasis>"); + final List<Object> images = xpath.evaluate(htmlInput); + + for (Object o: images) { + final Element image = (Element ) o; + <emphasis role="bold">CheckUrl.checkReadability(image.getAttributeValue("src"));</emphasis> + } + } +}</programlisting> + </informalfigure> + </answer> + </qandaentry> + </qandadiv> + </qandaset> + </section> + + <section xml:id="domXsl"> + <title><acronym xlink:href="">DOM</acronym> and + <abbrev xlink:href="">XSL</abbrev></title> + + <para><xref linkend="glo_Java"/> + based <xref linkend="glo_XML"/> + applications may use XSL style sheets for processing. A <acronym + xlink:href="">DOM</acronym> tree may for example + be transformed into another tree. The package <link + xlink:href="">javax.xml.transform</link> + provides interfaces and classes for this purpose. 
We consider the + following product catalog example:</para> + + <figure xml:id="climbingCatalog"> + <title>A simplified <xref linkend="glo_XML"/> product + catalog</title> + + <programlisting language="none"><catalog xmlns:xsi="" + xsi:noNamespaceSchemaLocation="catalog.xsd"> + <title>Outdoor products</title> + <introduction> + <para>We offer a great variety of basic stuff for mountaineering + such as ropes, harnesses and tents.</para> + <para>Our shop is proud for its large number of available + sleeping bags.</para> + </introduction> + <product id="x-223"> + <title>Multi freezing bag Nightmare camper</title> + <description> + <para>You will feel comfortable till minus 20 degrees - At + least if you are a penguin or a polar bear.</para> + </description> + </product> + <product id="r-334"> + <title>Rope 40m</title> + <description> + <para>Excellent for indoor climbing.</para> + </description> + </product> +</catalog></programlisting> + + <para>A corresponding schema file <filename>catalog.xsd</filename> + is straightforward:</para> + + <programlisting language="none"><xs:schema xmlns:xs="" + xmlns:vc="" elementFormDefault="qualified" + vc:minVersion="1.0" vc:maxVersion="1.1"> + + <xs:simpleType name="money"> + <xs:restriction base="xs:decimal"> + <xs:fractionDigits value="2"/> + </xs:restriction> + </xs:simpleType> + + <xs:element name="title" type="xs:string"/> + <xs:element name="para" type="xs:string"/> + + <xs:element name="description" type="paraSequence"/> + <xs:element name="introduction" type="paraSequence"/> + + <xs:complexType name="paraSequence"> + <xs:sequence> + <xs:element ref="para" minOccurs="1" maxOccurs="unbounded"/> + </xs:sequence> + </xs:complexType> + + <xs:element name="product"> + <xs:complexType> + <xs:sequence> + <xs:element ref="title"/> + <xs:element ref="description"/> + </xs:sequence> + <xs:attribute name="id" type="xs:ID" use="required"/> + <xs:attribute name="price" type="money" use="optional"/> + </xs:complexType> + </xs:element> + 
+ <xs:element name="catalog"> + <xs:complexType> + <xs:sequence> + <xs:element ref="title"/> + <xs:element ref="introduction"/> + <xs:element ref="product" minOccurs="1" maxOccurs="unbounded"/> + </xs:sequence> + </xs:complexType> + </xs:element> + +</xs:schema> +</programlisting> + </figure> + + <para>An <abbrev xlink:href="">XSL</abbrev> + style sheet may be used to transform this document into the HTML + format:</para> + + <figure xml:id="catalog2html"> + <title>An <abbrev + xlink:href="">XSL</abbrev> style sheet + for catalog transformation to HTML.</title> + + <programlisting language="none"><?xml version="1.0" encoding="utf-8"?> +<xsl:stylesheet xmlns:xsl="" + version="2.0" xmlns=""> + + <xsl:template match="/catalog"> + <html> + <head><title><xsl:value-of select="title"/></title></head> + <body style="background-color:#FFFFFF"> + <h1><xsl:value-of select="title"/></h1> + <xsl:apply-templates select="product"/> + </body> + </html> + </xsl:template> + + <xsl:template match="product"> + <h3><xsl:value-of select="title"/></h3> + <xsl:for-each select="description/para"> + <p><xsl:value-of select="."/></p> + </xsl:for-each> + <!-- price is an optional attribute of product, see catalog.xsd --> + <xsl:if test="@price"> + <p> + <xsl:text>Price: </xsl:text> + <xsl:value-of select="@price"/> + </p> + </xsl:if> + </xsl:template> +</xsl:stylesheet></programlisting> + </figure> + + <para>As a preparation for <xref linkend="exercise_catalogRdbms"/> we + now demonstrate the usage of <abbrev + xlink:href="">XSL</abbrev> within a + <xref linkend="glo_Java"/> application. + This is done by a <link + xlink:href="">Transformer</link> + instance:</para> + + <figure xml:id="xml2xml"> + <title>Transforming an XML document instance to HTML by an XSL style + sheet.</title> + + <programlisting language="none">package dom.xsl; +... 
+public class Xml2Html { + private final SAXBuilder builder = new SAXBuilder(); + + final XSLTransformer transformer; + + public Xml2Html(final String xslFilename) throws XSLTransformException { + builder.setErrorHandler(new MySaxErrorHandler(System.err)); + transformer = new XSLTransformer(xslFilename); + } + public void transform(final String xmlInFilename, + final String resultFilename) throws JDOMException, IOException { + + final Document inDoc =; + Document result = transformer.transform(inDoc); + + // Set formatting for the XML output + final Format outFormat = Format.getPrettyFormat(); + + // Serialize to console + final XMLOutputter printer = new XMLOutputter(outFormat); + printer.output(result, System.out); + + } +}</programlisting> + </figure> + + <para>A corresponding driver class is needed to invoke a + transformation:</para> + + <figure xml:id="xml2xmlDriver"> + <title>A driver class for the xml2xml transformer.</title> + + <programlisting language="none">package dom.xsl; +... +public class Xml2HtmlDriver { +... + public static void main(String[] args) { + final String + inFilename = "Input/Dom/climbing.xml", + xslFilename = "Input/Dom/catalog2html.xsl", + htmlOutputFilename = "Input/Dom/climbing.html"; + try { + final Xml2Html converter = new Xml2Html(xslFilename); + converter.transform(inFilename, htmlOutputFilename); + } catch (Exception e) { + System.err.println("The conversion of '" + inFilename + + "' by stylesheet '" + xslFilename + + "' to output HTML file '" + htmlOutputFilename + + "' failed with the following error: " + e); + e.printStackTrace(); + } + } +}</programlisting> + </figure> + + <qandaset defaultlabel="qanda" xml:id="exercise_catalogRdbms"> + <title>HTML from XML and relational data</title> + + <qandadiv> + <qandaentry> + <question> + <label>Catalogs and RDBMS</label> + + <para>We want to extend the transformation described + in <xref linkend="xml2xml"/> by reading price + information from an RDBMS. 
Consider the following schema and + <code>INSERT</code>s:</para> + + <programlisting language="none">CREATE TABLE Product( + orderNo CHAR(10) + ,price NUMERIC(10,2) +); + +INSERT INTO Product VALUES('x-223', 330.20); +INSERT INTO Product VALUES('w-124', 110.40);</programlisting> + + <para>Adding prices may be implemented in the following + way:</para> + + <mediaobject> + <imageobject> + <imagedata fileref="Ref/Fig/xml2html.fig"/> + </imageobject> + </mediaobject> + + <para>You may implement this by following these steps:</para> + + <orderedlist> + <listitem> + <para>You may reuse class + <classname>sax.rdbms.RdbmsAccess</classname> from <xref + linkend="saxRdbms"/>.</para> + </listitem> + + <listitem> + <para>Use the previous class to modify <xref + linkend="xml2xml"/> by introducing a new method + <code>addPrices(final Document catalog)</code> which adds + prices to the <acronym + xlink:href="">DOM</acronym> tree + accordingly. The insertion points may be reached by an + <acronym + xlink:href="">XPath</acronym> + expression.</para> + </listitem> + </orderedlist> + </question> + + <answer> + <para>The additional functionality on top of <xref + linkend="xml2xml"/> is represented by a method + <methodname>dom.xsl.XmlRdbms2Html.addPrices()</methodname>. + This method modifies the <acronym + xlink:href="">DOM</acronym> input tree + prior to applying the XSL. Prices are inserted based on + data received from an RDBMS via <trademark + xlink:href="">JDBC</trademark>:</para> + + <programlisting language="none">package dom.xsl; +... +public class XmlRdbms2Html { + private final SAXBuilder builder = new SAXBuilder(); + + DbAccess db = new DbAccess(); + + final XSLTransformer transformer; + Document catalog; + + final org.jdom2.xpath.XPathExpression<Object> selectProducts = + XPathFactory.instance().compile("/catalog/product"); + + /** + * @param xslFilename the stylesheet being used for subsequent + * transformations by {@link #transform(String, String)}. 
+ * + * @throws XSLTransformException + */ + public XmlRdbms2Html(final String xslFilename) throws XSLTransformException { + builder.setErrorHandler(new MySaxErrorHandler(System.err)); + transformer = new XSLTransformer(xslFilename); + } + + /** + * The actual workhorse carrying out the transformation + * and adding prices from the database table. + * + * @param xmlInFilename input file to be transformed + * @param resultFilename the result file holding the generated HTML document + * @throws JDOMException The transformation may fail for various reasons. + * @throws IOException + */ + public void transform(final String xmlInFilename, + final String resultFilename) throws JDOMException, IOException { + + catalog =; + + addPrices(); + + final Document htmlResult = transformer.transform(catalog); + + // Set formatting for the XML output + final Format outFormat = Format.getPrettyFormat(); + + // Serialize to console + final XMLOutputter printer = new XMLOutputter(outFormat); + printer.output(htmlResult, System.out); + + } + private void addPrices() { + final List<Object> products = selectProducts.evaluate(catalog.getRootElement()); + + db.connect("jdbc:mysql://localhost:3306/hdm", "hdmuser", "XYZ"); + for (Object p: products) { + final Element product = (Element ) p; + final String productId = product.getAttributeValue("id"); + product.setAttribute("price", db.readPrice(productId)); + } + db.close(); + } +}</programlisting> + + <para>The method <code>addPrices(...)</code> utilizes our + RDBMS access class:</para> + + <programlisting language="none">package dom.xsl; +... 
+public class DbAccess { + public void connect(final String jdbcUrl, + final String userName, final String password) { + try { + conn = DriverManager.getConnection(jdbcUrl, userName, password); + priceQuery = conn.prepareStatement(sqlPriceQuery); + } catch (SQLException e) { + System.err.println("Unable to open connection to database:" + e);} + } + public String readPrice(final String articleNumber) { + String result; + try { + priceQuery.setString(1, articleNumber); + final ResultSet rs = priceQuery.executeQuery(); + if ( { + result = rs.getString("price"); + } else { + result = "No price available for article '" + articleNumber + "'"; + } + } catch (SQLException e) { + result = "Error reading price for article '" + articleNumber + "':" + e; + } + return result; + } + ... +}</programlisting> + + <para>Of course the connection details should be moved to a + configuration file.</para> + </answer> + </qandaentry> + </qandadiv> + </qandaset> + </section> + </chapter> diff --git a/Sda1/fo.xml b/Sda1/fo.xml new file mode 100644 index 000000000..f40464dd1 --- /dev/null +++ b/Sda1/fo.xml @@ -0,0 +1,1306 @@ + <chapter xml:id="fo" version="5.0" xmlns="" + xmlns:xlink="" + xmlns:xi="" + xmlns:svg="" + xmlns:m="" + xmlns:html="" + xmlns:db=""> + <title>Generating printed output</title> + + <titleabbrev>Print</titleabbrev> + + <section xml:id="foIntro"> + <title>Online and print versions</title> + + <titleabbrev>online / print</titleabbrev> + + <para>We already learned how to transform XML documents into HTML by + means of a <abbrev xlink:href="">XSL</abbrev> + style sheet processor. In principle we may create printed output by + using a HTML Browser's print function. However the result will not meet + reasonable typographical standards. A list of commonly required features + for printed output includes:</para> + + <variablelist> + <varlistentry> + <term>Line breaks</term> + + <listitem> + <para>Text paragraphs have to be divided into lines. 
To achieve + best results the processor must implement the hyphenation rules of + the language in question in order to automatically hyphenate long + words. This is especially important for text columns of limited + width as appearing in newspapers.</para> + </listitem> + </varlistentry> + + <varlistentry> + <term>Page breaks</term> + + <listitem> + <para>Since printed pages are limited in height the content has to + be broken into pages. This may be difficult to achieve:</para> + + <itemizedlist> + <listitem> + <para>Large images being indivisible may have to be deferred + to the following page leaving large amounts of empty + space.</para> + </listitem> + + <listitem> + <para>Long tables may have to be subdivided into smaller + blocks. Thus it may be required to define sets of additional + footers like <quote>to be continued on the next page</quote> + and additional table headers containing column descriptions on + subsequent pages.</para> + </listitem> + </itemizedlist> + </listitem> + </varlistentry> + + <varlistentry> + <term>Page references</term> + + <listitem> + <para>Document internal references via <link + xlink:href="">ID</link> / <link + xlink:href="">IDREF</link> pairs may + be represented as page references like <quote>see page + 32</quote>.</para> + </listitem> + </varlistentry> + + <varlistentry> + <term>Left and right pages</term> + + <listitem> + <para>Books usually have a different layout for + <quote>left</quote> and <quote>right</quote> pages. Page numbers + usually appear on the left side of a <quote>left</quote> page and + vice versa.</para> + + <para>Very often the head of each page contains additional + information e.g. a chapter's name on each <quote>left</quote> page + head and the actual section's name on each <quote>right</quote> + page's head.</para> + + <para>In addition chapters usually start on a <quote>right</quote> + page. Sometimes a chapter's starting page has special layout + features e.g. 
a missing description in the page's head which will + only be given on subsequent pages.</para> + </listitem> + </varlistentry> + + <varlistentry> + <term>Footnotes</term> + + <listitem> + <para>Footnotes have to be numbered on a per page basis and have + to appear on the current page.</para> + </listitem> + </varlistentry> + </variablelist> + </section> + + <section xml:id="foStart"> + <title>A simple <abbrev + xlink:href="">FO</abbrev> + document</title> + + <titleabbrev>Simple <abbrev + xlink:href="">FO</abbrev></titleabbrev> + + <para>A renderer for printed output from XML content also needs + instructions on how to format the different elements. A common way to + define these formatting properties is by using the <emphasis>Formatting + Objects</emphasis> (<abbrev + xlink:href="">FO</abbrev>) + standard. <abbrev + xlink:href="">FO</abbrev> + documents may be compared to HTML. An HTML document has to be rendered by + a piece of software called a browser in order to be viewed as an image. + Likewise <abbrev + xlink:href="">FO</abbrev> + documents have to be rendered by a piece of software called a formatting + objects processor which typically yields PostScript or PDF output. 
As a + starting point we take a simple example:</para> + + <figure xml:id="foHelloWorld"> + <title>The simplest <abbrev + xlink:href="">FO</abbrev> + document</title> + + <programlisting language="none"><?xml version="1.0" encoding="utf-8"?> +<fo:root xmlns:fo=""> + + <fo:layout-master-set> + <!-- Define a simple page layout --> + <fo:simple-page-master master-name="simplePageLayout" + page-width="60mm" page-height="100mm"> + <fo:region-body/> + </fo:simple-page-master> + </fo:layout-master-set> + <!-- Print a set of pages using the previously defined layout --> + <fo:page-sequence master-reference="simplePageLayout"> + <fo:flow flow-name="xsl-region-body"> + <emphasis role="bold"><fo:block>Hello, World ...</fo:block></emphasis> + </fo:flow> + </fo:page-sequence> +</fo:root></programlisting> + </figure> + + <para>PDF generation is initiated by executing an <abbrev + xlink:href="">FO</abbrev> + processor. At the MI department the script <code>fo2pdf</code> invokes + <orgname>RenderX</orgname>'s <productname + xlink:href="">xep</productname> processor:</para> + + <programlisting language="none">fo2pdf -fo -pdf hello.pdf</programlisting> + + <para>This creates a PDF file which may be printed or previewed by e.g. + <productname xlink:href="">Adobe</productname>'s + Acrobat Reader or evince under Linux. For a list of command line options + see <productname + xlink:href="">xep's + documentation</productname>.</para> + </section> + + <section xml:id="layoutParam"> + <title>Page layout</title> + + <para>The result of our <quote>Hello, World ...</quote> code is not + very impressive. In order to develop more elaborate examples we have to + understand the underlying layout model being defined in a <link + xlink:href="">fo:simple-page-master</link> + element. 
First of all <abbrev + xlink:href="">FO</abbrev> + allows a physical page to be subdivided into different regions:</para> + + <figure xml:id="foRegionList"> + <title>Regions being defined in a page.</title> + + <mediaobject> + <imageobject> + <imagedata fileref="Ref/Fig/regions.fig"/> + </imageobject> + </mediaobject> + </figure> + + <para>The most important area in this model is denoted by <link + xlink:href="">fo:region-body</link>. + Other regions like <link + xlink:href="">fo:region-before</link> + are typically used as containers for meta information such as chapter + headings and page numbering. We take a closer look at the <link + xlink:href="">fo:region-body</link> + area and supply an example of parameterization:</para> + + <figure xml:id="foParamRegBody"> + <title>A complete <abbrev + xlink:href="">FO</abbrev> + parameterization of a physical page and the <link + xlink:href="">fo:region-body</link>.</title> + + <programlisting language="none"><?xml version="1.0" encoding="utf-8"?> +<fo:root xmlns:fo="" + font-size="6pt"> + + <fo:layout-master-set> <co xml:id="programlisting_fobodyreg_masterset"/> + <fo:simple-page-master master-name="<emphasis role="bold">simplePageLayout</emphasis>" <co + xml:id="programlisting_fobodyreg_simplepagelayout"/> + page-width = "50mm" page-height = "80mm" + margin-top = "5mm" margin-bottom = "20mm" + margin-left = "5mm" margin-right = "10mm"> + + <fo:region-body <co xml:id="programlisting_fobodyreg_regionbody"/> + margin-top = "10mm" margin-bottom = "5mm" + margin-left = "10mm" margin-right = "5mm"/> + </fo:simple-page-master> + </fo:layout-master-set> + + <fo:page-sequence master-reference="<emphasis role="bold">simplePageLayout</emphasis>"> <co + xml:id="programlisting_fobodyreg_pagesequence"/> + <fo:flow flow-name="xsl-region-body"> <co + xml:id="programlisting_fobodyreg_flow"/> + <fo:block space-after="2mm">Dumb text .. 
dumb text.</fo:block> <co + xml:id="programlisting_fobodyreg_block"/> + <fo:block space-after="2mm">Dumb text .. dumb text.</fo:block> <coref + linkend="programlisting_fobodyreg_block"/> + <fo:block space-after="2mm">Dumb text .. dumb text.</fo:block> <coref + linkend="programlisting_fobodyreg_block"/> + <fo:block space-after="2mm">Dumb text .. dumb text.</fo:block> <coref + linkend="programlisting_fobodyreg_block"/> + </fo:flow> + </fo:page-sequence> +</fo:root></programlisting> + </figure> + + <calloutlist> + <callout arearefs="programlisting_fobodyreg_masterset"> + <para>As the name suggests multiple layout definitions can appear + here. In this example only one layout is defined.</para> + </callout> + + <callout arearefs="programlisting_fobodyreg_simplepagelayout"> + <para>Each layout definition carries a key attribute <code>master-name</code> + which is unique with respect to all defined layouts appearing in + <emphasis>the</emphasis> <tag + class="starttag">fo:layout-master-set</tag>. We may thus call it a + <emphasis>primary key</emphasis> attribute. The current layout + definition's key has the value <code>simplePageLayout</code>. The + length specifications appearing here are visualized in <xref + linkend="paramRegBodyVisul"/> and correspond to the white + rectangle.</para> + </callout> + + <callout arearefs="programlisting_fobodyreg_regionbody"> + <para>Each layout definition <emphasis>must</emphasis> have a region + body, the region in which the document's main text flow will + appear. A layout definition <emphasis>may</emphasis> also define + top, bottom and side regions as we will see <link + linkend="paramHeadFoot">later</link>. The body region is shown with + pink background in <xref linkend="paramRegBodyVisul"/>.</para> + </callout> + + <callout arearefs="programlisting_fobodyreg_pagesequence"> + <para>An <abbrev + xlink:href="">FO</abbrev> + document may have multiple page sequences, for example one per + chapter of a book. 
Each page sequence <emphasis>must</emphasis> reference an + <emphasis>existing</emphasis> layout definition via its + <code>master-reference</code> attribute. So we may regard this + attribute as a foreign key targeting the set of all defined layout + definitions.</para> + </callout> + + <callout arearefs="programlisting_fobodyreg_flow"> + <para>A flow allows us to define in which region output shall + appear. The current example defines only one layout, and that layout + contains a single region of type body, so this is the only region + able to receive text output.</para> + </callout> + + <callout arearefs="programlisting_fobodyreg_block"> + <para>A <tag class="starttag">fo:block</tag> element may be compared + to a paragraph element <tag class="starttag">p</tag> in HTML. The + attribute <link + xlink:href="">space-after</link>="2mm" + adds a space of 2 mm after each <link + xlink:href="">fo:block</link> + container.</para> + </callout> + </calloutlist> + + <para>The result looks like this:</para> + + <figure xml:id="paramRegBodyVisul"> + <title>Parameterizing page- and region view port. All length + dimensions are in mm.</title> + + <mediaobject> + <imageobject> + <imagedata fileref="Ref/Fig/overlay.fig"/> + </imageobject> + </mediaobject> + </figure> + </section> + + <section xml:id="headFoot"> + <title>Headers and footers</title> + + <titleabbrev>Header/footer</titleabbrev> + + <para>Referring to <xref linkend="foRegionList"/> we now want to add + fixed headers and footers, which are frequently used for page numbers. In a + textbook each page might have the actual chapter's name in its header. + This name should not change as long as the text below <link + xlink:href="">fo:region-body</link> + still belongs to the same chapter. 
In <abbrev + xlink:href="">FO</abbrev> + this is achieved by:</para> + + <itemizedlist> + <listitem> + <para>Encapsulating each chapter's content in a <link + xlink:href="">fo:page-sequence</link> + of its own.</para> + </listitem> + + <listitem> + <para>Defining the desired header text below <link + xlink:href="">fo:static-content</link> + in the area defined by <link + xlink:href="">fo:region-before</link>.</para> + </listitem> + </itemizedlist> + + <para>The notion <link + xlink:href="">fo:static-content</link> + refers to the fact that the content is constant (static) within the + given page sequence. The new version reads:</para> + + <figure xml:id="paramHeadFoot"> + <title>Parameterizing header and footer.</title> + + <programlisting language="none"><?xml version="1.0" encoding="utf-8"?> +<fo:root xmlns:fo="" + font-size="6pt"> + + <fo:layout-master-set> + <fo:simple-page-master master-name="simplePageLayout" + page-width = "50mm" page-height = "80mm" + margin-top = "5mm" margin-bottom = "20mm" + margin-left = "5mm" margin-right = "10mm"> + + <fo:region-body margin-top = "10mm" margin-bottom = "5mm" <co + xml:id="programlisting_head_foot_bodydef"/> + margin-left = "10mm" margin-right = "5mm"/> + + <fo:region-before extent="5mm"/> <co + xml:id="programlisting_head_foot_beforedef"/> + <fo:region-after extent="5mm"/> <co + xml:id="programlisting_head_foot_afterdef"/> + + </fo:simple-page-master> + </fo:layout-master-set> + + <fo:page-sequence master-reference="simplePageLayout"> + + <fo:static-content flow-name="xsl-region-before"> <co + xml:id="programlisting_head_foot_beforeflow"/> + <fo:block + font-weight="bold" + font-size="8pt">Headertext</fo:block> + </fo:static-content> + + <fo:static-content flow-name="xsl-region-after"> <co + xml:id="programlisting_head_foot_afterflow"/> + <fo:block> + <fo:page-number/> + </fo:block> + </fo:static-content> + + <fo:flow flow-name="xsl-region-body"> + <fo:block space-after="8mm">Dumb text .. 
dumb text.</fo:block> + <fo:block space-after="8mm">Dumb text .. dumb text.</fo:block> + <fo:block space-after="8mm">More text .. more text.</fo:block> + <fo:block space-after="8mm">More text .. more text.</fo:block> + <fo:block space-after="8mm">More text .. more text.</fo:block> + </fo:flow> + </fo:page-sequence> +</fo:root></programlisting> + </figure> + + <calloutlist> + <callout arearefs="programlisting_head_foot_bodydef"> + <para>Defining the body region.</para> + </callout> + + <callout arearefs="programlisting_head_foot_beforedef programlisting_head_foot_afterdef"> + <para>Defining two regions at the top and bottom of each page. The + <code>extent</code> attribute denotes the height of these regions. + <emphasis>Caveat</emphasis>: The attribute <code>extent</code>'s + value gets subtracted from the <code>margin-top</code> or + <code>margin-bottom</code> value being defined in the corresponding + <tag class="starttag">fo:region-body</tag> element. So if we + consider for example the <tag>fo:region-before</tag> we have to + observe:</para> + + <para>extent <= margin-top</para> + + <para>Otherwise we may not even see any output.</para> + </callout> + + <callout arearefs="programlisting_head_foot_beforeflow"> + <para>A <code>fo:static-content</code> denotes text portions which + are decoupled from the <quote>usual</quote> text flow. For example + as a book's chapter advances over multiple pages we expect the + constant chapter's title to appear on top of each page. In the + current example the static string <code>Headertext</code> will + appear on each page's top for the whole <tag + class="starttag">fo:page-sequence</tag> in which it is defined. + Notice the <code>flow-name="xsl-region-before"</code> reference to + the region being defined in <coref + linkend="programlisting_head_foot_beforedef"/>.</para> + </callout> + + <callout arearefs="programlisting_head_foot_afterflow"> + <para>We do the same here for the page's footer. 
Instead of static + text we output <tag>fo:page-number</tag> yielding the current page's + number.</para> + + <para>This time <code>flow-name="xsl-region-after"</code> references + the region definition in <coref + linkend="programlisting_head_foot_afterdef"/>. Unless a region is + renamed via its <code>region-name</code> attribute, the + attribute <code>flow-name</code> takes one of the following five + default values corresponding to all possible region definitions within a + layout:</para> + + <informaltable> + <?dbhtml table-width="50%" ?> + + <?dbfo table-width="50%" ?> + + <tgroup cols="2"> + <colspec align="left" colwidth="1*"/> + + <colspec align="left" colwidth="1*"/> + + <tbody> + <row> + <entry><tag class="starttag">fo:region-body</tag></entry> + + <entry>xsl-region-body</entry> + </row> + + <row> + <entry><tag class="starttag">fo:region-before</tag></entry> + + <entry>xsl-region-before</entry> + </row> + + <row> + <entry><tag class="starttag">fo:region-after</tag></entry> + + <entry>xsl-region-after</entry> + </row> + + <row> + <entry><tag class="starttag">fo:region-start</tag></entry> + + <entry>xsl-region-start</entry> + </row> + + <row> + <entry><tag class="starttag">fo:region-end</tag></entry> + + <entry>xsl-region-end</entry> + </row> + </tbody> + </tgroup> + </informaltable> + </callout> + </calloutlist> + + <para>This results in two pages with page numbers 1 and 2:</para> + + <mediaobject> + <imageobject> + <imagedata fileref="Ref/Fig/headfoot.fig"/> + </imageobject> + </mediaobject> + + <para>The free chapter of the <xref linkend="bib_Harold04"/> book contains + additional information on extended <link + xlink:href="">layout + definitions</link>. 
The <orgname + xlink:href="">W3C</orgname> as the holder of the FO + standard defines the elements <link + xlink:href="">fo:layout-master-set</link>, + <link + xlink:href="">fo:simple-page-master</link> + and <link + xlink:href="">fo:page-sequence</link></para> + </section> + + <section xml:id="foContainer"> + <title>Important Objects</title> + + <section xml:id="fo_block"> + <title><code>fo:block</code></title> + + <para>The FO standard borrows a lot from the CSS standard. Most + formatting objects may have <link + xlink:href="">CSS + like properties</link> with similar semantics, some properties have + been added. We take a <link + xlink:href="">fo:block</link> + container as an example:</para> + + <figure xml:id="blockInline"> + <title>A <link + xlink:href="">fo:block</link> with + a <link + xlink:href="">fo:inline</link> + descendant.</title> + + <mediaobject> + <imageobject> + <imagedata fileref="Ref/Fig/"/> + </imageobject> + </mediaobject> + + <programlisting language="none">... +<fo:block font-weight='bold' + border-bottom-style='dashed' + border-style='solid' + border='1mm'>A lot of attributes and <fo:inline background-color='black' + color='white'>inverted</fo:inline> text.</fo:block> ...</programlisting> + </figure> + + <para>The <link + xlink:href="">fo:inline</link> + descendant serves as a means to change the <quote>current</quote> + property set. 
In HTML/CSS this may be achieved by using the + <code>SPAN</code> tag:</para> + + <programlisting language="none"><!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"> +<html> + <head> + <title>Blocks/spans and CSS</title> + </head> + <body> + <h1>Blocks/spans and CSS</h1> + <p style="font-weight: bold; border: 1mm; + border-style: solid; border-bottom-style: dashed;" + >A lot of attributes and + <span style="color: white;background-color: black;" + >inverted</span> text.</p> + </body> +</html></programlisting> + + <para>Though the properties are encapsulated in an attribute <code>style</code> we + find a one-to-one correspondence between FO and CSS in this case. The + HTML rendering works as expected:<mediaobject> + <imageobject> + <imagedata fileref="Ref/Screen/mozparaspancss.screen.png"/> + </imageobject> + </mediaobject></para> + </section> + + <section xml:id="fo_list"> + <title>Lists</title> + + <para>The simplest type of list is the unlabeled (itemized) list as + expressed by the <code>UL</code>/<code>LI</code> tags in HTML. + FO allows a much more detailed parametrization regarding indents and + distances between labels and item content. Relevant elements are <link + xlink:href="">fo:list-block</link>, + <link + xlink:href="">fo:list-item</link> + and <link + xlink:href="">fo:list-item-body</link>. + The drawback is a more complex setup for <quote>default</quote> + lists:</para> + + <figure xml:id="listItemize"> + <title>An itemized list and result.</title> + + <programlisting language="none">... 
+<fo:list-block + provisional-distance-between-starts="2mm"> + <fo:list-item> + <fo:list-item-label end-indent="label-end()"> + <fo:block>&#8226;</fo:block> + </fo:list-item-label> + <fo:list-item-body start-indent="body-start()"> + <fo:block>Flowers</fo:block> + </fo:list-item-body> + </fo:list-item> + + <fo:list-item> + <fo:list-item-label end-indent="label-end()"> + <fo:block>&#8226;</fo:block> + </fo:list-item-label> + <fo:list-item-body start-indent="body-start()"> + <fo:block>Animals</fo:block> + </fo:list-item-body> + </fo:list-item> +</fo:list-block> ...</programlisting> + + <mediaobject> + <imageobject> + <imagedata align="left" fileref="Ref/Fig/"/> + </imageobject> + </mediaobject> + </figure> + + <para>The result looks somewhat primitive in relation to the amount of + source code it necessitates. The power of these constructs shows up + when trying to format nested lists of possibly different types like + enumerations or definition lists under the requirement of + typographical excellence. More complex examples are presented in <link + xlink:href="">Xmlbible + book</link> of <xref linkend="bib_Harold04"/>.</para> + </section> + + <section xml:id="leaderRule"> + <title>Leaders and rules</title> + + <titleabbrev>Leaders/rules</titleabbrev> + + <para>Sometimes adjustable horizontal space between two neighbouring + objects has to be filled e.g. in a book's table of contents. The <link + xlink:href="">fo:leader</link> + serves this purpose:</para> + + <figure xml:id="leaderToc"> + <title>Two simulated entries in a table of contents.</title> + + <programlisting language="none">... 
+<fo:block text-align-last='justify'>Valid + XML<fo:leader leader-pattern="dots"/> +page 7</fo:block> + +<fo:block text-align-last='justify'>XSL +<fo:leader leader-pattern='dots'/> +page 42</fo:block> ...</programlisting> + + <mediaobject> + <imageobject> + <imagedata align="left" fileref="Ref/Fig/"/> + </imageobject> + </mediaobject> + </figure> + + <para>The attribute value <link + xlink:href="">text-align-last</link> + = <code>'justify'</code> forces the <link + xlink:href="">fo:block</link> to + extend to the available width of the current <link + xlink:href="">fo:region-body</link> + area. The <link + xlink:href="">fo:leader</link> + inserts the necessary amount of content of the type specified + in <link + xlink:href="">leader-pattern</link> + to fill up the gap between its neighbouring components. This principle + can be extended to multiple objects:</para> + + <figure xml:id="leaderMulti"> + <title>Four entries separated by equal amounts of dotted + space.</title> + + <programlisting language="none"><fo:block text-align-last='justify'>A<fo:leader +leader-pattern="dots"/>B<fo:leader +leader-pattern="dots"/>C<fo:leader leader-pattern="dots"/>D</fo:block></programlisting> + + <mediaobject> + <imageobject> + <imagedata fileref="Ref/Fig/"/> + </imageobject> + </mediaobject> + </figure> + + <para>A <link + xlink:href="">fo:leader</link> may + also be used to draw horizontal lines to separate objects. In this + case there are no neighbouring components within the + <quote>current</quote> line in which the <link + xlink:href="">fo:leader</link> + appears. This is frequently used to draw a border between + <code>xsl-region-body</code> and <code>xsl-region-before</code> and/or + <code>xsl-region-after</code>:</para> + + <figure xml:id="leaderSeparate"> + <title>A horizontal line separator between header and body of a + page.</title> + + <programlisting language="none">... 
+<fo:page-sequence master-reference="simplePageLayout"> + <fo:static-content flow-name="xsl-region-before"> + <fo:block text-align-last='justify'>FO<fo:leader/>page 5</fo:block> + <fo:block text-align-last='justify'> + <fo:leader leader-pattern="rule" leader-length="100%"/> + </fo:block> + </fo:static-content> + <fo:flow flow-name="xsl-region-body"> + <fo:block>Some body text ...</fo:block> + </fo:flow> +</fo:page-sequence>...</programlisting> + + <mediaobject> + <imageobject> + <imagedata align="left" fileref="Ref/Fig/"/> + </imageobject> + </mediaobject> + </figure> + + <para>Note the empty leader <code><</code> <link + xlink:href="">fo:leader</link> + <code>/></code> between the <quote> <code>FO</code> </quote> and + the <quote>page 5</quote> text node. It inserts horizontal whitespace that + pushes the page number to the header's right edge. This is in + accordance with the <link + xlink:href="">leader-pattern</link> + attribute's default value <code>space</code>.</para> + </section> + + <section xml:id="pageNumbering"> + <title>Page numbers</title> + + <para>We already saw an example of page numbering via <link + xlink:href="">fo:page-number</link> + in <xref linkend="paramHeadFoot"/>. Sometimes a different style for + page numbering is desired. The default page numbering style may be + changed by means of the <link + xlink:href="">fo:page-sequence</link> + element's attribute <link + xlink:href="">format</link>. For a + closer explanation the <link + xlink:href="">W3C + XSLT standards documentation</link> may be consulted:</para> + + <figure xml:id="pageNumberingRoman"> + <title>Roman style page numbers.</title> + + <programlisting language="none">... 
+<fo:page-sequence format="i" + master-reference="simplePageLayout"> + <fo:static-content + flow-name="xsl-region-after"> + <fo:block text-align-last='justify'> + <fo:leader leader-pattern="rule" + leader-length="100%"/> + </fo:block> + <fo:block font-weight="bold"> + <fo:page-number/> + </fo:block> + </fo:static-content> + + <fo:flow flow-name="xsl-region-body"> + <fo:block>Some text...</fo:block> + <fo:block>More text, more text, + more text.</fo:block> + <fo:block>More text, more text, + more text.</fo:block> + <fo:block>Enough text.</fo:block> + </fo:flow> +</fo:page-sequence> ...</programlisting> + + <mediaobject> + <imageobject> + <imagedata align="left" fileref="Ref/Fig/pageStack.fig"/> + </imageobject> + </mediaobject> + </figure> + </section> + + <section xml:id="foMarker"> + <title>Marker</title> + + <figure xml:id="dictionary"> + <title>A dictionary with running page headers.</title> + + <programlisting language="none">... +<fo:page-sequence + master-reference="simplePageLayout"> + <fo:static-content flow-name="xsl-region-before"> + <fo:block font-weight="bold"> + <fo:retrieve-marker retrieve-class-name="alpha" + retrieve-position="first-starting-within-page" + />-<fo:retrieve-marker + retrieve-position="last-starting-within-page" + retrieve-class-name="alpha"/> + </fo:block> + <fo:block text-align-last='justify'> + <fo:leader leader-pattern="rule" leader-length="100%"/></fo:block> + </fo:static-content> + + <fo:flow flow-name="xsl-region-body"> + <fo:block> + <fo:marker marker-class-name="alpha">A + </fo:marker>Ant</fo:block> + <fo:block> + <fo:marker marker-class-name="alpha">B + </fo:marker>Bug</fo:block> + <fo:block> + <fo:marker marker-class-name="alpha">L + </fo:marker>Lion</fo:block> + <fo:block> + <fo:marker marker-class-name="alpha">N + </fo:marker>Nose</fo:block> + <fo:block> + <fo:marker marker-class-name="alpha">P + </fo:marker>Peg</fo:block> + </fo:flow> +</fo:page-sequence> ...</programlisting> + + <mediaobject> + <imageobject> + <imagedata 
align="left" fileref="Ref/Fig/dictionaryStack.fig"/> + </imageobject> + </mediaobject> + </figure> + </section> + + <section xml:id="foIntRef"> + <title>Internal references</title> + + <titleabbrev>References</titleabbrev> + + <para>Regarding printed documents we may define two categories of + document internal references:</para> + + <variablelist> + <varlistentry> + <term><emphasis>Page number references</emphasis></term> + + <listitem> + <para>This is the <quote>classical</quote> type of reference, + e.g. in books. An author refers the reader to a distant location + by writing <quote>... see further explanation in section 4.5 on + page 234</quote>. A book's table of contents assigning page + numbers to topics is another example. This way the + implementation of a reference relies solely on the features a + printed document offers.</para> + </listitem> + </varlistentry> + + <varlistentry> + <term><emphasis>Hypertext references</emphasis></term> + + <listitem> + <para>This way of implementing references utilizes features of + (online) viewers for printable documents. For example PDF + viewers like <productname + xlink:href="">Adobe's Acrobat + reader</productname> or the evince application are able to + follow hypertext links in a fashion known from HTML browsers. + This viewer feature is based on hypertext capabilities defined + in Adobe's PDF de-facto standard.</para> + </listitem> + </varlistentry> + </variablelist> + + <para>Of course the second type of reference is limited to people who + use an online viewer application instead of reading a document from + physical paper.</para> + + <para>We now show the implementation of <abbrev + xlink:href="">FO</abbrev> + based page references. As already discussed for <link + xlink:href="">ID</link> / <link + xlink:href="">IDREF</link> pairs we need + a link destination (anchor) and a link source. 
The <abbrev + xlink:href="">FO</abbrev> + standard uses the same anchor implementation as in XML for <link + xlink:href="">ID</link> typed attributes: + <abbrev + xlink:href="">FO</abbrev> + objects <emphasis>may</emphasis> have an attribute <link + xlink:href="">id</link> with a + document-wide unique value. The <abbrev + xlink:href="">FO</abbrev> + element <link + xlink:href="">fo:page-number-citation</link> + is used to actually create a page reference via its attribute <link + xlink:href="">ref-id</link>:</para> + + <figure xml:id="refJavaXml"> + <title>Two blocks referencing each other's page numbers.</title> + + <programlisting language="none">... + <fo:flow flow-name='xsl-region-body'> + <fo:block id='xml'>Java section see page + <fo:page-number-citation ref-id='java'/>. + </fo:block> + + <fo:block id='java'>XML section see page + <fo:page-number-citation ref-id='xml'/>. + </fo:block> + </fo:flow> ...</programlisting> + + <mediaobject> + <imageobject> + <imagedata align="left" fileref="Ref/Fig/pagerefStack.fig"/> + </imageobject> + </mediaobject> + </figure> + + <para>NB: Be careful defining <link + xlink:href="">id</link> attributes for + objects being descendants of <link + xlink:href="">fo:static-content</link> + nodes. Such objects typically appear on multiple pages and are + therefore not unique anchors. A reference carrying such an id value + thus actually refers to n >= 1 objects on n different pages. + Typically a user agent will choose the first object of this set when + the link is clicked. So in effect the parent <link + xlink:href="">fo:page-sequence</link> + is chosen as the effective link target.</para> + + <para>The element <link + xlink:href="">fo:basic-link</link> + creates PDF hypertext links. 
We extend the previous example:</para> + + <figure xml:id="refJavaXmlHyper"> + <title>Two blocks with mutual page- and hypertext + references.</title> + + <programlisting language="none"><fo:flow flow-name='xsl-region-body'> + <fo:block id='xml'>Java section see <fo:basic-link color="blue" + internal-destination="java">page <fo:page-number-citation + ref-id='java'/>.</fo:basic-link></fo:block> + +<fo:block id='java'>XML section see + <fo:basic-link color="blue" + internal-destination="xml">page <fo:page-number-citation + ref-id='xml'/>.</fo:basic-link></fo:block> +</fo:flow></programlisting> + + <mediaobject> + <imageobject> + <imagedata fileref="Ref/Fig/pagerefhyperStack.fig"/> + </imageobject> + </mediaobject> + </figure> + </section> + + <section xml:id="pdfBookmarks"> + <title>PDF bookmarks</title> + + <titleabbrev>Bookmarks</titleabbrev> + + <para>The PDF specification allows defining so-called bookmarks + offering an explorer-like navigation:</para> + + <mediaobject> + <imageobject> + <imagedata fileref="Ref/Screen/pdfbookmarks.screen.png"/> + </imageobject> + </mediaobject> + + <para>PDF bookmarks are <link + xlink:href="">part + of the XSL-FO 1.1</link> standard. Some <abbrev + xlink:href="">FO</abbrev> + processors still use proprietary solutions for bookmark + creation dating back to the older <abbrev + xlink:href="">FO</abbrev> + 1.0 standard. For details of bookmark extensions by + <orgname>RenderX</orgname>'s processor see <link + xlink:href="">xep's + documentation</link>.</para> + </section> + </section> + + <section xml:id="xml2fo"> + <title>Constructing <abbrev + xlink:href="">FO</abbrev> + from XML documents</title> + + <titleabbrev><abbrev + xlink:href="">FO</abbrev> + from XML</titleabbrev> + + <para>So far we have learnt some basic <abbrev + xlink:href="">FO</abbrev> + elements. As with HTML we typically generate FO code from other sources + rather than crafting it by hand. 
The general picture is:</para> + + <figure xml:id="htmlFoProduction"> + <title>Different target formats from common source.</title> + + <mediaobject> + <imageobject> + <imagedata fileref="Ref/Fig/crossmedia.fig" scale="65"/> + </imageobject> + + <caption> + <para>We may generate both online and printed documentation from a + common source. This requires style sheets for each of the desired + destination formats.</para> + </caption> + </mediaobject> + </figure> + + <para>We discussed the <abbrev + xlink:href="">FO</abbrev> + standard as an input format for printable output production by a + renderer. In this respect an <abbrev + xlink:href="">FO</abbrev> + document is similar to HTML, which is a format to be rendered by a web + browser for visual (screen oriented) output production. The + transformation from an XML source (e.g. a memo document) to <abbrev + xlink:href="">FO</abbrev> + is still missing. As for HTML we may use <abbrev + xlink:href="">XSL</abbrev> as a + transformation means. We generate the sender's surname from a memo + document instance:</para> + + <figure xml:id="memo2fosurname"> + <title>Generating a sender's surname for printing.</title> + + <programlisting language="none"><?xml version="1.0" encoding="utf-8"?> +<xsl:stylesheet version="1.0" + xmlns:fo="" + xmlns:xsl=""> + + <xsl:output method="xml" indent="yes"/> + + <xsl:template match="/"> + <fo:root> + <fo:layout-master-set> + <fo:simple-page-master master-name="simplePageLayout" + page-width="294mm" page-height="210mm" margin="5mm"> + <fo:region-body margin="15mm"/> + </fo:simple-page-master> + </fo:layout-master-set> + <fo:page-sequence master-reference="simplePageLayout"> + <fo:flow flow-name="xsl-region-body"> + <fo:block font-size="20pt"> + <xsl:text>Sender:</xsl:text> + <fo:inline font-weight='bold'> + <xsl:value-of select="memo/from/surname"/> + </fo:inline> + </fo:block> + </fo:flow> + </fo:page-sequence> + </fo:root> + </xsl:template> +</xsl:stylesheet></programlisting> + </figure> + + 
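+ <para>The same pattern extends to repeated elements. The following is a minimal sketch, not part of the original lecture material: it assumes the memo document contains one or more <tag class="starttag">to</tag> elements, each with a <tag class="starttag">surname</tag> child, and it uses the W3C namespace URIs for XSLT and XSL-FO. The template for <code>to</code> and the label text <quote>Recipient:</quote> are illustrative choices. If the source contained no <tag class="starttag">to</tag> element the resulting <code>fo:flow</code> would be empty, which an FO processor may reject.</para>

```xml
<?xml version="1.0" encoding="utf-8"?>
<!-- Sketch only: assumes memo/to/surname elements exist in the source document. -->
<xsl:stylesheet version="1.0"
    xmlns:fo="http://www.w3.org/1999/XSL/Format"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="xml" indent="no"/>

  <xsl:template match="/">
    <fo:root>
      <fo:layout-master-set>
        <fo:simple-page-master master-name="simplePageLayout"
            page-width="294mm" page-height="210mm" margin="5mm">
          <fo:region-body margin="15mm"/>
        </fo:simple-page-master>
      </fo:layout-master-set>
      <fo:page-sequence master-reference="simplePageLayout">
        <fo:flow flow-name="xsl-region-body">
          <!-- One fo:block per recipient, produced by the template below. -->
          <xsl:apply-templates select="memo/to"/>
        </fo:flow>
      </fo:page-sequence>
    </fo:root>
  </xsl:template>

  <!-- Hypothetical helper template: formats a single recipient. -->
  <xsl:template match="to">
    <fo:block font-size="20pt">
      <xsl:text>Recipient: </xsl:text>
      <fo:inline font-weight="bold">
        <xsl:value-of select="surname"/>
      </fo:inline>
    </fo:block>
  </xsl:template>
</xsl:stylesheet>
```

+ <para>Delegating each recipient to its own template keeps the page layout in the root template and the per-element formatting in one place, which is the usual way to grow such a stylesheet.</para>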
 <para>A suitable XML document instance reads:</para> + + <figure xml:id="memoMessage"> + <title>A <code>memo</code> document instance.</title> + + <programlisting language="none"><memo ...="memo.xsd"> + <from> + <name>Martin</name> + <surname>Goik</surname> + </from> + <to> + <name>Adam</name> + <surname>Hacker</surname> + </to> + <to> + <name>Eve</name> + <surname>Intruder</surname> + </to> + <date year="2005" month="1" day="6"/> + <subject>Firewall problems</subject> + <content> + <para>Thanks for your excellent work.</para> + <para>Our firewall is definitely broken!</para> + </content> +</memo></programlisting> + </figure> + + <para>Some remarks:</para> + + <orderedlist> + <listitem> + <para>The <link + xlink:href="">xsl:stylesheet</link> + element contains a namespace definition for the target FO document's + namespace, namely:</para> + + <programlisting language="none">xmlns:fo=""</programlisting> + + <para>This is required to use elements like <link + xlink:href="">fo:block</link> + belonging to the FO namespace.</para> + </listitem> + + <listitem> + <para>The option value <code>indent="yes"</code> in <link + xlink:href="">xsl:output</link> + is usually set to "no" in a production environment to avoid + whitespace related problems.</para> + </listitem> + + <listitem> + <para>The generation of a print format like PDF is actually a two + step process. 
To generate message.pdf from message.xml by a + stylesheet memo2fo.xsl we need the following calls:</para> + + <variablelist> + <varlistentry> + <term><emphasis>XML document instance to FO</emphasis></term> + + <listitem> + <programlisting language="none">xml2xml message.xml memo2fo.xsl -o</programlisting> + </listitem> + </varlistentry> + + <varlistentry> + <term><emphasis>FO to PDF</emphasis></term> + + <listitem> + <programlisting language="none">fo2pdf -fo -pdf message.pdf</programlisting> + </listitem> + </varlistentry> + </variablelist> + + <mediaobject> + <imageobject> + <imagedata fileref="Ref/Fig/xml2fo2pdf.fig"/> + </imageobject> + </mediaobject> + + <para>When debugging of the intermediate <abbrev + xlink:href="">FO</abbrev> + file is not required both steps may be combined into a single + call:</para> + + <programlisting language="none">fo2pdf -xml message.xml -xsl memo2fo.xsl -pdf message.pdf</programlisting> + </listitem> + </orderedlist> + </section> + + <section xml:id="foCatalog"> + <title>Formatting a catalog.</title> + + <titleabbrev>A catalog</titleabbrev> + + <para>We now take the <link linkend="climbingCatalog">climbing catalog + example</link> with prices being added and incrementally create a series + of PDF versions improving from one version to another.</para> + + <qandaset defaultlabel="qanda" xml:id="idCatalogStart"> + <title>A first PDF version of the catalog</title> + + <qandadiv> + <qandaentry> + <question> + <para>Write a <abbrev + xlink:href="">XSL</abbrev> script to + generate a starting version <filename + xlink:href="Ref/src/Dom/climbenriched.start.pdf">climbenriched.start.pdf</filename>.</para> + </question> + + <answer> + <programlisting language="none"><?xml version="1.0" encoding="utf-8"?> +<xsl:stylesheet version="1.0" + xmlns:fo="" + xmlns:xsl=""> + + <xsl:output method="xml" indent="yes"/> + + <xsl:template match="/"> + <fo:root font-size="10pt"> + <fo:layout-master-set> + <fo:simple-page-master master-name="productPage" + 
page-width="80mm" page-height="110mm" margin="5mm"> + <fo:region-body margin="15mm"/> + <fo:region-before extent="10mm"/> + </fo:simple-page-master> + </fo:layout-master-set> + <xsl:apply-templates select="catalog/product" /> + </fo:root> + </xsl:template> + + <xsl:template match="product"> + <fo:page-sequence master-reference="productPage"> + <fo:static-content flow-name="xsl-region-before"> + <fo:block font-weight="bold"> + <xsl:value-of select="title"/> + </fo:block> + </fo:static-content> + <fo:flow flow-name="xsl-region-body"> + <xsl:apply-templates select="description/para"/> + + <fo:block>Price:<xsl:value-of select="@price"/></fo:block> + <fo:block>Order no:<xsl:value-of select="@id"/></fo:block> + </fo:flow> + </fo:page-sequence> + </xsl:template> + + <xsl:template match="para"> + <fo:block space-after="10px"> + <xsl:value-of select="."/> + </fo:block> + </xsl:template> + +</xsl:stylesheet></programlisting> + </answer> + </qandaentry> + + <qandaentry xml:id="idCatalogProduct"> + <question> + <label>Header, page numbers and table formatting</label> + + <para>Extend <xref linkend="idCatalogStart"/> by adding page + numbers. The order number and prices shall be formatted as + tables. Add a ruler to each page's head. 
The result should look + like <filename + xlink:href="Ref/src/Dom/climbenriched.product.pdf">climbenriched.product.pdf</filename></para> + </question> + + <answer> + <para>Solution see <filename + xlink:href="Ref/src/Dom/catalog2fo.product.xsl">catalog2fo.product.xsl</filename>.</para> + </answer> + </qandaentry> + + <qandaentry xml:id="idCatalogToc"> + <question> + <label>A table of contents.</label> + + <para>Each product description's page number shall appear in a + table of contents together with the product's <code>title</code> + as in <filename + xlink:href="Ref/src/Dom/climbenriched.toc.pdf">climbenriched.toc.pdf</filename>.</para> + </question> + + <answer> + <para>Solution see <filename + xlink:href="Ref/src/Dom/catalog2fo.toc.xsl">catalog2fo.toc.xsl</filename>.</para> + </answer> + </qandaentry> + + <qandaentry xml:id="idCatalogToclink"> + <question> + <label>A table of contents with hypertext links.</label> + + <para>The table of contents' entries may offer hypertext + features to supporting browsers as in <filename + xlink:href="Ref/src/Dom/climbenriched.toclink.pdf">climbenriched.toclink.pdf</filename>. + In addition include the document's <tag + class="starttag">introduction</tag>.</para> + </question> + + <answer> + <para>Solution see <filename + xlink:href="Ref/src/Dom/catalog2fo.toclink.xsl">catalog2fo.toclink.xsl</filename>.</para> + </answer> + </qandaentry> + + <qandaentry xml:id="idCatalogFinal"> + <question> + <label>A final version.</label> + + <para>Add the following features:</para> + + <orderedlist> + <listitem> + <para>Number the table of contents starting with page i, ii, + iii, iv and so on. Start the product descriptions with page + 1. On each page's footer a text <quote>page xx of yy</quote> + shall be displayed. 
This requires the definition of an + anchor <code>id</code> on the <abbrev + xlink:href="">FO</abbrev> + document's last page.</para> + </listitem> + + <listitem> + <para>Add PDF bookmarks by using <orgname>XEP</orgname>'s + <abbrev + xlink:href="">FO</abbrev> + extensions. This requires the namespace declaration + <code>xmlns:rx=""</code> + in the XSLT script's header.</para> + </listitem> + </orderedlist> + + <para>The result may look like <filename + xlink:href="Ref/src/Dom/"></filename>. + N.B.: It may take some effort to achieve this result. This + effort is left to the <emphasis>interested</emphasis> + participants.</para> + </question> + + <answer> + <para>Solution see <filename + xlink:href="Ref/src/Dom/catalog2fo.toclink.xsl">catalog2fo.toclink.xsl</filename>.</para> + </answer> + </qandaentry> + </qandadiv> + </qandaset> + </section> + </chapter> + diff --git a/Sda1/items.xml b/Sda1/items.xml deleted file mode 100644 index 25cc5b834..000000000 --- a/Sda1/items.xml +++ /dev/null @@ -1,191 +0,0 @@ -<?xml version="1.0" encoding="UTF-8"?> -<section version="5.0" xml:id="abbreviations" - xmlns="" - xmlns:xlink="" - xmlns:xi="" - xmlns:svg="" - xmlns:m="" - xmlns:html="" - xmlns:db=""> - <title>Items used within Martin Goik's lecture notes</title> - - <section xml:id="trademarks"> - <title>Trademarks</title> - - <itemizedlist> - <listitem> - <para><trademark - xlink:href="" - xml:id="tm_J2ee">J2EE</trademark></para> - </listitem> - - <listitem> - <para><trademark - xlink:href="" - xml:id="tm_Java">Java</trademark></para> - </listitem> - - <listitem> - <para><trademark - xlink:href="" - xml:id="tm_Javadoc">Javadoc</trademark></para> - </listitem> - - <listitem> - <para><trademark - xlink:href="" - xml:id="tm_Jdbc">JDBC</trademark></para> - </listitem> - - <listitem> - <para><trademark - xlink:href="" - xml:id="tm_Jdk">JDK</trademark></para> - </listitem> - - <listitem> - <para><trademark - xlink:href="" - xml:id="tm_Jre">JRE</trademark></para> - </listitem> - - 
<listitem> - <para><trademark - xlink:href="" - xml:id="tm_Mysql">Mysql</trademark></para> - </listitem> - </itemizedlist> - - </section> - - <section xml:id="abbrev"> - <title>Abbreviations and acronyms</title> - - <itemizedlist> - <listitem> - <para><abbrev xlink:href="" - xml:id="abbr_api">API</abbrev></para> - </listitem> - - <listitem> - <para><abbrev - xlink:href="" - xml:id="abbr_Ddl">DDL (SQL)</abbrev></para> - </listitem> - - <listitem> - <para><acronym xlink:href="" - xml:id="abbr_Dom">DOM</acronym></para> - </listitem> - - <listitem> - <para><abbrev - xlink:href="" - xml:id="abbr_Dtd">DTD</abbrev></para> - </listitem> - - <listitem> - <para><abbrev - xlink:href="" - xml:id="abbr_Ftp">ftp</abbrev></para> - </listitem> - - <listitem> - <para><abbrev - xlink:href="" - xml:id="abbr_Fo">FO</abbrev></para> - </listitem> - - <listitem> - <para><abbrev - xlink:href="" - xml:id="abbr_Hql">HQL</abbrev></para> - </listitem> - - <listitem> - <para><abbrev xlink:href="" - xml:id="abbr_Http">http</abbrev></para> - </listitem> - - <listitem> - <para><abbrev - xlink:href="" - xml:id="abbr_Ide">IDE</abbrev></para> - </listitem> - - <listitem> - <para><abbrev - xlink:href="" - xml:id="abbr_Idl">IDL</abbrev></para> - </listitem> - - <listitem> - <para><abbrev - xlink:href="" - xml:id="abbr_Jpa">JPA</abbrev></para> - </listitem> - - <listitem> - <para><abbrev xlink:href="" - xml:id="abbr_Php">PHP</abbrev></para> - </listitem> - - <listitem> - <para><acronym xlink:href="" - xml:id="abbr_Sax">SAX</acronym></para> - </listitem> - - <listitem> - <para><acronym xlink:href="" - xml:id="abbr_Sql">SQL</acronym></para> - </listitem> - - <listitem> - <para><acronym - xlink:href="" - xml:id="abbr_Tcp">TCP</acronym></para> - </listitem> - - <listitem> - <para><abbrev xlink:href="" - xml:id="abbr_Url">URL</abbrev></para> - </listitem> - - <listitem> - <para><abbrev xlink:href="" - xml:id="abbr_Xml">Xml</abbrev></para> - </listitem> - - <listitem> - <para><acronym xlink:href="" - 
xml:id="abbr_Xpath">XPath</acronym></para> - </listitem> - - <listitem> - <para><abbrev xlink:href="" - xml:id="abbr_Xsl">XSL</abbrev></para> - </listitem> - </itemizedlist> - </section> - - <section xml:id="organizations"> - <title>Organizations</title> - - <itemizedlist> - <listitem> - <para><orgname xlink:href="" - xml:id="org_W3c">W3C</orgname></para> - </listitem> - - <listitem> - <para><orgname xlink:href="" - xml:id="org_Hdm">Hdm</orgname></para> - </listitem> - - <listitem> - <para><orgname xlink:href="" - xml:id="org_Mib">MIB</orgname></para> - </listitem> - </itemizedlist> - </section> -</section> diff --git a/Sda1/jdbc.xml b/Sda1/jdbc.xml new file mode 100644 index 000000000..4269ad64e --- /dev/null +++ b/Sda1/jdbc.xml @@ -0,0 +1,3740 @@ + <chapter xml:id="introPersistence" version="5.0" xmlns="" + xmlns:xlink="" + xmlns:xi="" + xmlns:svg="" + xmlns:m="" + xmlns:html="" + xmlns:db=""> + + + <title>Accessing Relational Data</title> + + <section xml:id="persistence"> + <title>Persistence in Object Oriented languages</title> + + <para>Following <xref linkend="bib_Bauer05"/> we may define persistence + by:</para> + + <blockquote> + <para>persistence allows an object to outlive the process that created + it. The state of the object may be stored to disk and an object with + the same state re-created at some point in the future.</para> + </blockquote> + + <para>The notion of <quote>process</quote> refers to operating systems. + Let us start with a simple example assuming a <xref linkend="glo_Java"/> class + User:</para> + + <programlisting language="none">public class User { + String cname; //The user's common name e.g. 'Joe Bix' + String uid; //The user's unique system ID (login name) e.g. 'bix' + +// getters, setters and other stuff + ... 
+}</programlisting> + + <para>A relational implementation might look like:</para> + + <programlisting language="none">CREATE TABLE User( + cname CHAR(80) + ,uid CHAR(10) PRIMARY KEY +)</programlisting> + + <para>Now a <xref linkend="glo_Java"/> application may + create instances of class <code>User</code> and save these to a + database:</para> + + <figure xml:id="processObjPersist"> + <title>Persistence across process boundaries</title> + + <mediaobject> + <imageobject> + <imagedata fileref="Ref/Fig/persistence.fig" scale="65"/> + </imageobject> + </mediaobject> + </figure> + + <para>Both the <trademark + xlink:href="">JRE</trademark> + instances and the RDBMS database server are processes (or sets of + processes) typically existing in different address spaces. The two + <trademark + xlink:href="">JRE</trademark> + processes mentioned here may as well be started in disjoint address + spaces. In fact we might even run two entirely different applications + implemented in different programming languages like <abbrev + xlink:href="">PHP</abbrev>.</para> + + <para>It is important to mention that the two arrows + <quote>save</quote> and <quote>load</quote> thus typically denote a + communication across machine boundaries.</para> + </section> + + <section xml:id="jdbcIntro"> + <title>Introduction to <trademark + xlink:href="">JDBC</trademark></title> + + <section xml:id="jdbcWrite"> + <title>Write access, principles</title> + + <para>Connecting an application to a database means establishing a + connection from a client to a database server:</para> + + <figure xml:id="jdbcClientServer"> + <title>Networking between clients and database servers</title> + + <mediaobject> + <imageobject> + <imagedata fileref="Ref/Fig/clientserv.fig"/> + </imageobject> + </mediaobject> + </figure> + + <para>So <trademark + xlink:href="">JDBC</trademark> + is just one among a whole bunch of protocol implementations connecting + database servers and applications. 
Consequently <trademark + xlink:href="">JDBC</trademark> + is expected to appear in the lower layer of multi-tier applications. + We take a three-tier application as a starting point:</para> + + <figure xml:id="jdbcThreeTier"> + <title>The role of <trademark + xlink:href="">JDBC</trademark> + in a three-tier application</title> + + <mediaobject> + <imageobject> + <imagedata fileref="Ref/Fig/jdbcThreeTier.fig"/> + </imageobject> + </mediaobject> + </figure> + + <para>We may add an additional layer. Web applications are typically + built on top of an application server (<productname + xlink:href="">WebSphere</productname>, + <productname + xlink:href="">Glassfish</productname>, + <productname + xlink:href="">Jboss</productname>,...) + providing additional services:</para> + + <figure xml:id="jdbcFourTier"> + <title><trademark + xlink:href="">JDBC</trademark> + connecting application server and database.</title> + + <mediaobject> + <imageobject> + <imagedata fileref="Ref/Fig/jdbcFourTier.fig"/> + </imageobject> + </mediaobject> + </figure> + + <para>So what is actually required to connect to a database server? A + client requires the following parameter values to open a + connection:</para> + + <orderedlist> + <listitem xml:id="ItemJdbcProtocol"> + <para>The type of database server e.g. <productname + xlink:href="">Oracle</productname>, + <productname + xlink:href="">DB2</productname>, + <productname + xlink:href="">Informix</productname>, + <productname xlink:href="">Mysql</productname> + etc. This information is needed because of vendor dependent + <trademark + xlink:href="">JDBC</trademark> + protocol implementations.</para> + </listitem> + + <listitem> + <para>The server's <link + xlink:href="">DNS</link> + name or IP address</para> + </listitem> + + <listitem> + <para>The database service's port number at the previously defined + host. 
The database server process listens for connections on this + port.</para> + </listitem> + + <listitem xml:id="itemJdbcDatabaseName"> + <para>The database name within the given database server</para> + </listitem> + + <listitem> + <para>Optional: A database user's account name and + password.</para> + </listitem> + </orderedlist> + + <para>Items <xref linkend="ItemJdbcProtocol"/> - <xref + linkend="itemJdbcDatabaseName"/> will be encapsulated into a so called + <trademark + xlink:href="">JDBC</trademark> + <link + xlink:href="">URL</link>. + We consider a typical example corresponding to the previous parameter + list:</para> + + <figure xml:id="jdbcUrlComponents"> + <title>Components of a <trademark + xlink:href="">JDBC</trademark> + URL</title> + + <mediaobject> + <imageobject> + <imagedata fileref="Ref/Fig/jdbcurl.fig" scale="65"/> + </imageobject> + </mediaobject> + </figure> + + <para>In fact this <trademark + xlink:href="">JDBC</trademark> + URL example closely resembles other types of URL strings as + defined in <uri + xlink:href=""></uri>. + Look for <code>opaque_part</code> to understand the second + <quote>:</quote> in the protocol definition part of a <trademark + xlink:href="">JDBC</trademark> + URL. Common examples of <abbrev + xlink:href="">URL</abbrev>s + are:</para> + + <itemizedlist> + <listitem> + <para><code></code></para> + </listitem> + + <listitem> + <para><code></code></para> + </listitem> + + <listitem> + <para><code></code></para> + </listitem> + </itemizedlist> + + <para>We notice the explicit port number 8080 in the + second example. The default <abbrev + xlink:href="">http</abbrev> protocol port + number is 80. So if a web server accepts connections at port 80 we do + not have to specify this value. 
A web browser will automatically use + this default port.</para> + + <para>Actually the prefix <quote><code>jdbc:mysql</code></quote> + denotes a sub protocol implementation namely <orgname>Mysql</orgname>'s + implementation of <trademark + xlink:href="">JDBC</trademark>. + Connecting to an IBM DB2 server would require <code>jdbc:db2</code> for this + protocol part.</para> + + <para>In contrast to <abbrev + xlink:href="">http</abbrev> no standard + ports are <quote>officially</quote> assigned for <trademark + xlink:href="">JDBC</trademark> + protocol variants. Due to vendor specific implementations this does + not make any sense. Thus we <emphasis role="bold">always</emphasis> + have to specify the port number when opening <trademark + xlink:href="">JDBC</trademark> + connections.</para> + + <para>Writing <trademark + xlink:href="">JDBC</trademark> + based applications follows a simple scheme:</para> + + <figure xml:id="jdbcArchitecture"> + <title>Architecture of JDBC</title> + + <mediaobject> + <imageobject> + <imagedata fileref="Ref/Fig/jdbcarch.fig"/> + </imageobject> + </mediaobject> + </figure> + + <para>From a programmer's point of view the + <classname>java.sql.DriverManager</classname> is a bootstrapping + object: Other objects like Statement instances are created from this + central and unique object.</para> + + <para>The first instance being created by the + <classname>java.sql.DriverManager</classname> is an object of type + <classname>java.sql.Connection</classname>. In <xref + linkend="exerciseJdbcWhyInterface"/> we discuss the way vendor + specific implementation details are hidden by interfaces. We can + distinguish between:</para> + + <orderedlist> + <listitem> + <para>Vendor neutral parts of a <trademark + xlink:href="">JDBC</trademark> + environment. These are the components shipped by Oracle or + other organizations providing <xref linkend="glo_Java"/> runtimes. 
+ The class <classname>java.sql.DriverManager</classname> belongs to + this domain.</para> + </listitem> + + <listitem> + <para>Vendor specific parts. In <xref linkend="jdbcArchitecture"/> + this starts with the <classname>java.sql.Connection</classname> + object.</para> + </listitem> + </orderedlist> + + <para>The <classname>java.sql.Connection</classname> object thus marks + the boundary between a <trademark + xlink:href="">JDK</trademark> + / <trademark + xlink:href="">JRE</trademark> + and a <trademark + xlink:href="">JDBC</trademark> + Driver implementation from e.g. Oracle or other institutions.</para> + + <para><xref linkend="jdbcArchitecture"/> does not show details about + the relations between <classname>java.sql.Connection</classname>, + <classname>java.sql.Statement</classname> and + <classname>java.sql.ResultSet</classname> objects. We start by giving + a rough description of the tasks and responsibilities these three + types have:</para> + + <glosslist> + <glossentry> + <glossterm><classname>java.sql.Connection</classname></glossterm> + + <glossdef> + <para>Holding a permanent connection to a database server. Both + client and server can contact each other. The database server + may for example terminate a transaction if problems like + deadlocks occur.</para> + </glossdef> + </glossentry> + + <glossentry> + <glossterm><classname>java.sql.Statement</classname></glossterm> + + <glossdef> + <para>We have two distinct classes of actions:</para> + + <orderedlist> + <listitem> + <para>Instructions to modify data on the database server. + These include <code>INSERT</code>, <code>UPDATE</code> and + <code>DELETE</code> operations as far as + <abbrev>SQL-DML</abbrev> is concerned. <trademark + xlink:href="">JDBC</trademark> + acts as a means of transport and merely returns integer + values back to the client like the number of rows being + affected by an UPDATE.</para> + </listitem> + + <listitem> + <para>Instructions reading data from the server. 
This is + done by sending SELECT statements. It is not sufficient to + just return integer values: Instead <trademark + xlink:href="">JDBC</trademark> + needs to copy complete datasets back to the client to fill + containers being accessible by applications. This is being + discussed in <xref linkend="jdbcRead"/>.</para> + </listitem> + </orderedlist> + </glossdef> + </glossentry> + </glosslist> + + <para>We shed some light on the relationship between these important + <trademark + xlink:href="">JDBC</trademark> + components and their respective creation:<figure + xml:id="jdbcObjectCreation"> + <title>Important <trademark + xlink:href="">JDBC</trademark> + instances and relationships.</title> + + <mediaobject> + <imageobject> + <imagedata fileref="Ref/Fig/jdbcObjectRelation.fig"/> + </imageobject> + </mediaobject> + </figure></para> + </section> + + <section xml:id="writeAccessCoding"> + <title>Write access, coding!</title> + + <para>So how does it actually work with respect to coding? You may + want to read <xref linkend="toolingConfigJdbc"/> before starting your + exercises. We first prepare a database table using Eclipse's database + tools:</para> + + <figure xml:id="figSchemaPerson"> + <title>A relation <code>Person</code> containing names and email + addresses</title> + + <programlisting language="none"><emphasis role="strong">CREATE</emphasis> <emphasis + role="strong">TABLE</emphasis> Person ( + name CHAR(20) + ,email CHAR(20) <emphasis>UNIQUE</emphasis>)</programlisting> + </figure> + + <para>Our actual (toy) <trademark + xlink:href="">JDBC</trademark> + application will insert a single object ('Jim', '') into + the <code>Person</code> relation. 
This is simpler than reading data + since no client <classname>java.sql.ResultSet</classname> container is + needed:</para> + + <figure xml:id="figJdbcSimpleWrite"> + <title>A simple <trademark + xlink:href="">JDBC</trademark> + application inserting data into a relational table.</title> + + <programlisting language="none">package sda.jdbc.intro.v1; + +import java.sql.Connection; +import java.sql.DriverManager; +import java.sql.SQLException; +import java.sql.Statement; + +public class SimpleInsert { + + public static void main(String[] args) throws SQLException { + // Step 1: Open a connection to the database server + final Connection conn = DriverManager.getConnection( + "jdbc:mysql://localhost:3306/hdm", "hdmuser", "XYZ"); + // Step 2: Create a Statement instance + final Statement stmt = conn.createStatement(); + // Step 3: Execute the desired INSERT + final int updateCount = stmt.executeUpdate( + "INSERT INTO Person VALUES('Jim', '')"); + // Step 4: Give feedback to the enduser + System.out.println("Successfully inserted " + updateCount + " dataset(s)"); + } +}</programlisting> + </figure> + + <para>Looks simple? Unfortunately it does not (yet) work:</para> + + <programlisting language="none">Exception in thread "main" java.sql.SQLException: <emphasis + role="bold"> + No suitable driver found for jdbc:mysql://localhost:3306/hdm</emphasis> + at java.sql.DriverManager.getConnection( + at java.sql.DriverManager.getConnection( + at sda.jdbc.intro.SimpleInsert.main(</programlisting> + + <para>What's wrong here? In <xref linkend="figureConfigJdbcDriver"/> + we needed a <productname + xlink:href="">Mysql</productname> <trademark + xlink:href="">JDBC</trademark> + Driver implementation <filename>mysql-connector-java.jar</filename> as + a prerequisite to open connections to a database server. This + implementation is mandatory for our toy application as well. 
All we + have to do is add <filename>mysql-connector-java.jar</filename> to + our <xref linkend="glo_Java"/> + <varname>CLASSPATH</varname> at <emphasis + role="bold">runtime</emphasis>.</para> + + <para>Depending on our <xref linkend="glo_Java"/> environment + this will be achieved by different means. Eclipse requires the + definition of a run configuration as described in <uri + xlink:href=""></uri>. + When setting up a run configuration for + <classname>sda.jdbc.intro.SimpleInsert</classname> we have to add + <filename>mysql-connector-java.jar</filename> to the + <varname>Classpath</varname> tab. The following screenshot shows a + working configuration:</para> + + <figure xml:id="figureConfigRunExtJar"> + <title>Creating an Eclipse run configuration containing a + <productname xlink:href="">Mysql</productname> + <trademark + xlink:href="">JDBC</trademark> + Driver Jar marked red.</title> + + <screenshot> + <mediaobject> + <imageobject> + <imagedata fileref="Ref/Screen/runConfigJarAnnot.screen.png" + scale="70"/> + </imageobject> + </mediaobject> + </screenshot> + </figure> + + <para>This time execution works as expected:</para> + + <programlisting language="none">Successfully inserted 1 dataset(s)</programlisting> + + <qandaset defaultlabel="qanda" xml:id="quandaentry_DupInsert"> + <title>Exception on inserting objects</title> + + <qandadiv> + <qandaentry> + <question> + <para>A second invocation of + <classname>sda.jdbc.intro.v1.SimpleInsert</classname> yields + the following runtime error:</para> + + <programlisting language="none">Exception in thread "main" + com.mysql.jdbc.exceptions.jdbc4.MySQLIntegrityConstraintViolationException: + <emphasis role="bold">Duplicate entry '' for key 'email'</emphasis> +... 
+ at com.mysql.jdbc.StatementImpl.executeUpdate( + at sda.jdbc.intro.SimpleInsert.main(</programlisting> + </question> + + <answer> + <para>This expected error is easy to understand: The + exception's message text <emphasis role="bold">Duplicate entry + '' for key 'email'</emphasis> informs us about a UNIQUE + key constraint violation with respect to the attribute + <code>email</code> in our schema definition in <xref + linkend="figSchemaPerson"/>. We cannot add a second entry with + the same value <code>''</code>.</para> + </answer> + </qandaentry> + </qandadiv> + </qandaset> + + <para>It is worth mentioning that the <productname + xlink:href="">Mysql</productname> driver + implementation does not have to be available at compile time. + <trademark + xlink:href="">JDBC</trademark> + defines interfaces rather than (concrete) classes. The latter are + only required at runtime.</para> + + <para>When working with Eclipse we need a separate run + configuration for each runnable <xref linkend="glo_Java"/> application to + add the <trademark + xlink:href="">JDBC</trademark> + driver implementation to the runtime <envar>CLASSPATH</envar>. This + may become tedious. Judging the pros and cons you may simply add + <filename>mysql-connector-java.jar</filename> to your compile time + <envar>CLASSPATH</envar> as well. As a drawback all <trademark + xlink:href="">JDBC</trademark> + implementing classes will now become visible when e.g. 
hitting + auto-completion.</para> + + <para>We now discuss some important methods being defined in the + <trademark + xlink:href="">JDBC</trademark> + interfaces:</para> + + <glosslist> + <glossentry> + <glossterm><classname>java.sql.Connection</classname></glossterm> + + <glossdef> + <itemizedlist> + <listitem> + <para><link + xlink:href="">createStatement()</link></para> + </listitem> + + <listitem> + <para><link + xlink:href="">setAutoCommit()</link>, + <link + xlink:href="">getAutoCommit()</link></para> + </listitem> + + <listitem> + <para><link + xlink:href="">getWarnings()</link></para> + </listitem> + + <listitem> + <para><link + xlink:href="">isClosed()</link>, + <link + xlink:href="">isValid(int + timeout)</link></para> + </listitem> + + <listitem> + <para><link + xlink:href="">rollback()</link>, + <link + xlink:href="">commit()</link> + and .</para> + </listitem> + + <listitem> + <para><link + xlink:href="">close()</link></para> + </listitem> + </itemizedlist> + </glossdef> + </glossentry> + + <glossentry> + <glossterm><classname>java.sql.Statement</classname></glossterm> + + <glossdef> + <itemizedlist> + <listitem> + <para><link + xlink:href="">executeUpdate(String + sql)</link></para> + </listitem> + + <listitem> + <para><link + xlink:href="">getConnection()</link></para> + </listitem> + + <listitem> + <para><link + xlink:href="">getResultSet()</link></para> + </listitem> + + <listitem> + <para><link + xlink:href="">close()</link> + and <link + xlink:href="">isClosed()</link></para> + </listitem> + </itemizedlist> + </glossdef> + </glossentry> + </glosslist> + + <qandaset defaultlabel="qanda" xml:id="quandaentry_AutoCommit"> + <title><trademark + xlink:href="">JDBC</trademark> + and transactions</title> + + <qandadiv> + <qandaentry> + <question> + <para><link + xlink:href="">How + does the method setAutoCommit()</link> relate to <link + xlink:href="">commit()</link> + and <link + xlink:href="">rollback()</link>?</para> + </question> + + <answer> + 
<para>A connection's default state is <code>autocommit == + true</code>. This means that individual SQL statements are + executed as separate transactions.</para> + + <para>If we want to group two or more statements into a + transaction we have to:</para> + + <orderedlist> + <listitem> + <para>Call + <code>connection.setAutoCommit(false)</code></para> + </listitem> + + <listitem> + <para>From now on subsequent SQL statements will + implicitly become part of a transaction until one of the + following three events happens:</para> + + <orderedlist numeration="loweralpha"> + <listitem> + <para><code>connection.commit()</code></para> + </listitem> + + <listitem> + <para><code>connection.rollback()</code></para> + </listitem> + + <listitem> + <para>The transaction gets aborted by the database + server. This may for example happen in case of a + deadlock conflict with a second transaction.</para> + </listitem> + </orderedlist> + + <para>Note that the first two events are initiated by our + client software. The third one is + carried out by the database server.</para> + </listitem> + </orderedlist> + </answer> + </qandaentry> + </qandadiv> + </qandaset> + + <qandaset defaultlabel="qanda" xml:id="quandaentry_Close"> + <title>Closing <trademark + xlink:href="">JDBC</trademark> + connections</title> + + <qandadiv> + <qandaentry> + <question> + <para>Why is it very important to call the close() method for + <classname>java.sql.Connection</classname> and / or + <classname>java.sql.Statement</classname> instances?</para> + </question> + + <answer> + <para>A <trademark + xlink:href="">JDBC</trademark> + connection ties up network resources (socket connections). These + may be used up if e.g. 
new connections get established within + a loop without being closed.</para> + + <para>The situation is comparable to memory leaks when using + programming languages lacking a garbage collector.</para> + </answer> + </qandaentry> + </qandadiv> + </qandaset> + + <qandaset defaultlabel="qanda" xml:id="quandaentry_AbortTran"> + <title>Aborted transactions</title> + + <qandadiv> + <qandaentry> + <question> + <para>In the previous exercise we mentioned the possibility of + a transaction abort issued by the database server. Which + responsibility arises for an application programmer? Hint: How + may an implementation become aware of such an abort + transaction event?</para> + </question> + + <answer> + <para>If a database server aborts a transaction a + <classname>java.sql.SQLException</classname> will be thrown. + An application must be aware of this possibility and thus + implement a sensible <code>catch(...)</code> clause + accordingly.</para> + </answer> + </qandaentry> + </qandadiv> + </qandaset> + + <qandaset defaultlabel="qanda" xml:id="exerciseJdbcWhyInterface"> + <title>Interfaces and classes in <trademark + xlink:href="">JDBC</trademark></title> + + <qandadiv> + <qandaentry> + <question> + <para>The <trademark + xlink:href="">JDBC</trademark> + standard mostly defines interfaces as + <classname>java.sql.Connection</classname> and + <classname>java.sql.Statement</classname>. Why are these not + being defined as classes? Moreover why is + <classname>java.sql.DriverManager</classname> being defined as + a class rather than an interface?</para> + + <para>You may want to supply code examples to explain your + argumentation.</para> + </question> + + <answer> + <para>Figure <xref linkend="jdbcArchitecture"/> tells us about + the vendor independent architecture of <trademark + xlink:href="">JDBC</trademark>. 
+ Oracle for example may implement a class + <code></code>:</para> + + <programlisting annotations="nojavadoc" language="java">package; + +import java.sql.Connection; +import java.sql.Statement; +import java.sql.SQLException; + +public class OracleConnection implements Connection { + +... + +public Statement createStatement(int resultSetType, + int resultSetConcurrency) + throws SQLException { + // Implementation omitted here due to + // limited personal hacking capabilities + ... +} +... +}</programlisting> + + <para>If a programmer only uses the <trademark + xlink:href="">JDBC</trademark> + interfaces rather than a vendor's classes it is much easier to + make the resulting application work with different databases + from other vendors. This way a company's implementation is not + exposed to our own <xref linkend="glo_Java"/> + code.</para> + + <para>Regarding the special role of + <classname>java.sql.DriverManager</classname> we notice the + need for a starting point: We have to create an initial + instance of some class. In theory (<emphasis role="bold">BUT + NOT IN PRACTICE!!!</emphasis>) the following (ugly code) might + be possible:</para> + + <programlisting language="none">package my.personal.application; + +import java.sql.Connection; +import java.sql.Statement; +import java.sql.SQLException; + +public class someClass { + + public void someMethod(){ + + Connection conn = <emphasis role="bold">new OracleConnection()</emphasis>; // bad idea! + ... + } + ... 
+}</programlisting> + + <para>The problem with this approach is the explicit + constructor call: Whenever we want to use another database we + have two possibilities:</para> + + <itemizedlist> + <listitem> + <para>Rewrite our code.</para> + </listitem> + + <listitem> + <para>Introduce some sort of switch statement to provide a + fixed number of databases beforehand:</para> + + <programlisting language="none">public void someMethod(final String vendor){ + + final Connection conn; + + switch(vendor) { + case "ORACLE": + conn = new OracleConnection(); + break; + + case "DB2": + conn = new Db2Connection(); + break; + + default: + conn = null; + break; + } + ... +}</programlisting> + + <para>Adding a new database still requires code + rewriting.</para> + </listitem> + </itemizedlist> + </answer> + </qandaentry> + </qandadiv> + </qandaset> + + <qandaset defaultlabel="qanda" xml:id="quandaentry_DriverDispatch"> + <title>Driver dispatch mechanism</title> + + <qandadiv> + <qandaentry> + <question> + <para>In exercise <xref linkend="exerciseJdbcWhyInterface"/> + we saw a hypothetic way to resolve the interface/class + resolution problem by using a switch clause. How is this + <code>switch</code> clause's logic actually realized in a + <trademark + xlink:href="">JDBC</trademark> + based application? (<quote>behind the scenes</quote>)</para> + + <para>Hint: Read the documentation of + <classname>java.sql.DriverManager</classname>.</para> + </question> + + <answer> + <para>Prior to opening a Connection a <trademark + xlink:href="">JDBC</trademark> + driver registers itself at the + <classname>java.sql.DriverManager</classname> singleton + instance. For this purpose the standard defined the method + <link + xlink:href="">registerDriver(Driver)</link>. 
+ On success the <classname>java.sql.DriverManager</classname> + adds the driver to an internal dictionary:</para> + + <informaltable border="1"> + <col width="20%"/> + + <col width="30%"/> + + <tr> + <th>protocol</th> + + <th>driver instance</th> + </tr> + + <tr> + <td>jdbc:mysql</td> + + <td>mysqlDriver instance</td> + </tr> + + <tr> + <td>jdbc:oracle</td> + + <td>oracleDriver instance</td> + </tr> + + <tr> + <td>...</td> + + <td>...</td> + </tr> + </informaltable> + + <para>So whenever the method <link + xlink:href=",%20java.lang.String,%20java.lang.String)">getConnection()</link> + is being called the + <classname>java.sql.DriverManager</classname> will scan the + <trademark + xlink:href="">JDBC</trademark> + URL and isolate the protocol part. If we start with + <code>jdbc:mysql://</code> + this is just <code>jdbc:mysql</code>. The value is then being + looked up in the above table of registered drivers to choose + an appropriate instance or null otherwise. This way our + hypothetic switch including the default value null is actually + implemented.</para> + </answer> + </qandaentry> + </qandadiv> + </qandaset> + </section> + + <section xml:id="propertiesFile"> + <title>Connection properties</title> + + <para>So far our application depicted in <xref + linkend="figJdbcSimpleWrite"/> suffers both from missing error + handling and hard-coded parameters.</para> + + <para>Professional applications must be configurable. Changing the + password currently requires source code modification and + recompilation. 
<xref linkend="glo_Java"/> offers a + standard procedure to externalize parameters like + <varname>username</varname>, <varname>password</varname> and the <trademark + xlink:href="">JDBC</trademark> + connection URL present in <xref + linkend="figJdbcSimpleWrite"/>: We may move these parameters to + so called properties files:</para> + + <figure xml:id="propertyExternalization"> + <title>Externalize a single string <code>"User name"</code> to a + separate file <filename></filename>.</title> + + <mediaobject> + <imageobject> + <imagedata fileref="Ref/Fig/externalize.fig"/> + </imageobject> + </mediaobject> + </figure> + + <para>The current figure shows the externalization of just a single + property. The file <filename></filename> contains + key-value pairs. The key <code>PropHello.uname</code> maps to the + value <code>User name</code>. Multiple strings may be externalized to + the same properties file.</para> + + <para>Eclipse does have tool support for externalization. Simply hit + Source --> Externalize Strings from the context menu. This + activates a wizard to define property keys, rename the generated + helper class and finally create the actual + <filename></filename> file.</para> + + <qandaset defaultlabel="qanda" xml:id="quandaentry_WritProps"> + <title>Moving <trademark + xlink:href="">JDBC</trademark> + <abbrev + xlink:href="">URL</abbrev> and + credentials to a property file</title> + + <qandadiv> + <qandaentry> + <question> + <para>Start by executing the code given in <xref + linkend="figJdbcSimpleWrite"/>. 
Then extend this example by + externalizing all <trademark + xlink:href="">JDBC</trademark> + related connection parameters to a + <filename></filename> file like:</para> + + <programlisting language="none">SimpleInsert.jdbcUrl=jdbc:mysql://localhost:3306/hdm +SimpleInsert.password=XYZ +SimpleInsert.username=hdmuser</programlisting> + + <para>As stated earlier the Eclipse wizard assists you + by generating both the properties file and a helper class + reading that file at runtime.</para> + </question> + + <answer> + <para>The current exercise is mostly related to tooling. From + our <xref linkend="glo_Java"/> code + the context menu allows us to choose the desired + wizard:</para> + + <informalfigure> + <mediaobject> + <imageobject> + <imagedata fileref="Ref/Screen/externalize.screen.png"/> + </imageobject> + </mediaobject> + </informalfigure> + + <para>We may now:</para> + + <itemizedlist> + <listitem> + <para>Select the strings to be externalized.</para> + </listitem> + + <listitem> + <para>Supply key names. In the subsequent screenshot this + task has already been started by manually replacing the + default <code>SimpleInsert.1</code> by + <code>SimpleInsert.jdbc</code>.</para> + </listitem> + + <listitem> + <para>Redefine other parameters like prefix, properties + file name etc. In the following screenshot only the first + of three keys has been manually renamed to the sensible + value <varname>SimpleInsert.jdbc</varname>.</para> + </listitem> + </itemizedlist> + + <informalfigure> + <mediaobject> + <imageobject> + <imagedata fileref="Ref/Screen/externalize2.screen.png"/> + </imageobject> + </mediaobject> + </informalfigure> + + <para>The wizard also generates a class + <classname>sda.jdbc.intro.v1.DbProps</classname> to actually + access our properties:</para> + + <programlisting language="none">package sda.jdbc.intro.v1; +... 
+public class DbProps { + private static final String BUNDLE_NAME = "sda.jdbc.intro.v1.database"; + + private static final ResourceBundle RESOURCE_BUNDLE = ResourceBundle + .getBundle(BUNDLE_NAME); + + private DbProps() { + } + + public static String getString(String key) { + try { + return RESOURCE_BUNDLE.getString(key); + } catch (MissingResourceException e) { + return '!' + key + '!'; + } + } +}</programlisting> + + <para>Our <trademark + xlink:href="">JDBC</trademark> + related code now contains three references to external + properties:</para> + + <programlisting language="none">package sda.jdbc.intro.v1; +... +public class SimpleInsert { + + + public static void main(String[] args) throws SQLException { + // Step 1: Open a connection to the database server + final Connection conn = DriverManager.getConnection ( + <emphasis role="bold">DbProps.getString("PersistenceHandler.jdbcUrl"), </emphasis> + <emphasis role="bold">DbProps.getString("PersistenceHandler.username")</emphasis>, + <emphasis role="bold">DbProps.getString("PersistenceHandler.password")</emphasis>); + // Step 2: Create a Statement instance + final Statement stmt = conn.createStatement(); + // Step 3: Execute the desired INSERT + final int updateCount = stmt.executeUpdate( + "INSERT INTO Person VALUES('Jim', '')"); + // Step 4: Give feedback to the enduser + System.out.println("Successfully inserted " + updateCount + " dataset(s)"); + } +}</programlisting> + + <para>The current base name + <classname>sda.jdbc.intro.v1.PersistenceHandler</classname> is + related to a later exercise.</para> + </answer> + </qandaentry> + </qandadiv> + </qandaset> + </section> + + <section xml:id="xmldata2rdbms"> + <title>Moving data from XML to relational systems</title> + + <qandaset defaultlabel="qanda" xml:id="qandaXmldata2relational"> + <title>Avoiding intermediate <xref linkend="glo_SQL"/> file + export</title> + + <qandadiv> + <qandaentry> + <question> + <para>In <xref linkend="quandaentry_SqlFromXml"/> you + 
implemented a <xref linkend="glo_SAX"/> application + transforming XML product catalog instances into a series of + SQL statements. Modify your solution by directly inserting + corresponding data by means of <xref linkend="glo_JDBC"/> into + a relational database.</para> + + <para>Error handling may be implemented by simply issuing a + corresponding message before exiting the application. To + ensure data integrity the data transfer shall be realized + in an all-or-nothing fashion by grouping all + <code>INSERT</code>s into a single transaction. You may want + to read about <link + xlink:href="">setAutoCommit(boolean + autoCommit)</link> and <link + xlink:href="">commit()</link> + for this purpose.</para> + </question> + + <answer> + <annotation role="make"> + <para role="eclipse">P/catalog2rdbms</para> + </annotation> + + <para>This solution requires a <command>mvn</command> + <option>install</option> on the dependent project + <quote>saxerrorhandler</quote>:</para> + + <annotation role="make"> + <para role="eclipse">P/saxerrorhandler</para> + </annotation> + </answer> + </qandaentry> + </qandadiv> + </qandaset> + </section> + + <section xml:id="sectSimpleInsertGui"> + <title>A first GUI sketch</title> + + <para>So far all data records being transferred to the database server + are still hard-coded in our application. In practice a user wants to + enter data of persons to be submitted to the database.</para> + + <para>We now guide you through developing a first version of a simple GUI for + this task. A more <link linkend="figureDataInsert2">elaborate + version</link> will be presented in a follow-up exercise. The + 
The + screenshot illustrates the intended application behaviour:</para> + + <figure xml:id="simpleInsertGui"> + <title>A simple GUI to insert data into a database server.</title> + + <mediaobject> + <imageobject> + <imagedata fileref="Ref/Screen/simpleInsertGui.screen.png"/> + </imageobject> + + <caption> + <para>After clicking <quote>Insert</quote> a message is being + presented to the user. This message may as well indicate a + failure.</para> + </caption> + </mediaobject> + </figure> + + <para>Implementing Swing GUI applications requires knowledge as being + taught in e.g. <link + xlink:href="">113300 + Entwicklung von Web-Anwendungen</link>. If you do not (yet) feel + comfortable writing <productname + xlink:href="">Swing</productname> + applications you may want to read <uri + xlink:href=""></uri> + and <emphasis role="bold">really</emphasis> understand the examples + being presented therein.</para> + + <qandaset defaultlabel="qanda" xml:id="quandaentry_GuiDb"> + <title>GUI for inserting Person data to a database server</title> + + <qandadiv> + <qandaentry> + <question> + <para>Write a GUI application as being outlined in <xref + linkend="simpleInsertGui"/>. You may proceed as + follows:</para> + + <orderedlist> + <listitem> + <para>Write a dummy GUI without any database + functionality. Only present the two labels an input fields + and the Insert button.</para> + </listitem> + + <listitem> + <para>Add an + <classname>java.awt.event.ActionListener</classname> which + generates a SQL INSERT Statement when clicking the Insert + button. Return this string to the user as being shown in + the message window of <xref + linkend="simpleInsertGui"/>.</para> + + <para>At this point you still do not need a database + connection. 
The message shown to the user is just a fake, + so the GUI <emphasis role="bold">appears</emphasis> to be + working.</para> + </listitem> + + <listitem> + <para>Establish a + <classname>java.sql.Connection</classname> and create a + <classname>java.sql.Statement</classname> instance when + launching your application. Use the latter in your + <classname>java.awt.event.ActionListener</classname> to + actually insert datasets into your database.</para> + </listitem> + </orderedlist> + </question> + + <answer> + <para>The complete implementation resides in + <classname>sda.jdbc.intro.v01.InsertPerson</classname>:</para> + + <programlisting language="none">package sda.jdbc.intro.v01; + +import ... + +public class InsertPerson extends JFrame { + + ... + + public InsertPerson () throws SQLException{ + super ("Add a person's data"); + + setDefaultCloseOperation(JFrame.EXIT_ON_CLOSE); + + final JPanel databaseFieldPanel = new JPanel(); + databaseFieldPanel.setLayout(new GridLayout(0,2)); + add(databaseFieldPanel, BorderLayout.CENTER); + + databaseFieldPanel.add(new JLabel("Name:")); + final JTextField nameField = new JTextField(15); + databaseFieldPanel.add(nameField); + + databaseFieldPanel.add(new JLabel("E-mail:")); + final JTextField emailField = new JTextField(15); + databaseFieldPanel.add(emailField); + + final JButton insertButton = new JButton("Insert"); + add(insertButton, BorderLayout.SOUTH); + + final Connection conn = DriverManager.getConnection( + "jdbc:mysql://localhost:3306/hdm", "hdmuser", "XYZ"); + final Statement stmt = conn.createStatement(); + + insertButton.addActionListener(new ActionListener() { + // Linking the GUI to the database server. 
We assume an open + // connection and a correctly initialized Statement instance + @Override + public void actionPerformed(ActionEvent event) { + final String sql = "INSERT INTO Person VALUES('" + nameField.getText()+ "', '" + + emailField.getText() + "')"; + // We have to catch this Exception because an ActionListener's signature + // prohibits the existence of a "throws" clause. + try { + final int updateCount = stmt.executeUpdate(sql); + JOptionPane.showMessageDialog(null, "Successfully executed \n'" + sql + "'\nand inserted " + + updateCount + " dataset"); + } catch (SQLException e) { + e.printStackTrace(); + } + } + }); + pack(); + } +}</programlisting> + </answer> + </qandaentry> + </qandadiv> + </qandaset> + </section> + + <section xml:id="jdbcExceptions"> + <title>Handling possible exceptions</title> + + <para>Our current code lacks any kind of error handling: Exceptions + will not be caught at all and invariably lead to program termination. + This is of course inadequate regarding professional software. In case + of problems we have to:</para> + + <itemizedlist> + <listitem> + <para>Gracefully recover or shut down our application. We may for + example show a pop up window <quote>Terminating due to an internal + error</quote>.</para> + </listitem> + + <listitem> + <para>Enable the customer to supply the development team with + helpful information. The user may for example be asked to submit a + log file in case of errors.</para> + </listitem> + </itemizedlist> + + <para>In addition the solution + <classname>sda.jdbc.intro.v01.InsertPerson</classname> contains an + ugly mix of GUI components and database related code. 
We take a first + step to decouple these two distinct concerns:</para> + + <qandaset defaultlabel="qanda" xml:id="quandaentry_DbLayer"> + <title>Handling the database layer</title> + + <qandadiv> + <qandaentry> + <question> + <para>Implement a class <code>PersistenceHandler</code> to be + later used as a component of our next step GUI application + prototype. This class should have the following + methods:</para> + + <programlisting language="none">... +/** + * Handle database communication. There are two + * distinct internal states <q>disconnected</q> and <q>connected</q>, see + * {@link #isConnected()}. These two states may be toggled by invoking + * {@link #connect()} and {@link #disconnect()} respectively. + * + * The following snippet illustrates the intended usage: + * <pre> public static void main(String[] args) { + final PersistenceHandler ph = new PersistenceHandler(); + if (ph.connect()) { + if (!ph.add("Jim", "")) { + System.err.println("Insert Error:" + ph.getErrorMessage()); + } + } else { + System.err.println("Connect error:" + ph.getErrorMessage()); + } + }</pre> + * + * @author goik + */ +public class PersistenceHandler { + ... + /** + * Instance in <q>disconnected</q> state. See {@link #isConnected()} + */ + public PersistenceHandler() {/* only present here to supply Javadoc comment */} + + /** + * Inserting a (name, email) record into the database server. In case of + * errors corresponding messages may subsequently be retrieved by calling + * {@link #getErrorMessage()}. + * + * <dt><b>Precondition:</b></dt> <dd>must be in + * <q>connected</q> state, see {@link #isConnected()}</dd> + * + * @param name + * A person's name + * @param email + * A person's email address + * + * @return true if the current data record has been successfully inserted + * into the database server. false in case of error(s). + */ + public boolean add(final String name, final String email){ + ... 
+ } + + /** + * Retrieving error messages in case a call to {@link #add(String, String)}, + * {@link #connect()}, or {@link #disconnect()} yields an error. + * + * @return the error explanation corresponding to the latest failed + * operation, null if no error yet occurred. + */ + public String getErrorMessage() { + return ...; + } + + /** + * Open a connection to a database server. + * + * <dt><b>Precondition:</b><dd> + * <dd>must be in <q>disconnected</q> state, see {@link #isConnected()}</dd> + * + * <dt><b>Precondition:</b><dd> + * <dd>The following properties must be set: + * <pre>PersistenceHandler.jdbcUrl=jdbc:mysql://localhost:3306/hdm +PersistenceHandler.password=XYZ +PersistenceHandler.username=foo</pre> + * </dd> + * + * @return true if connecting was successful + */ + public boolean connect () { + ... + } + + /** + * Close a connection to a database server and clean up JDBC related resources + * + * Error messages in case of failure may subsequently be retrieved by + * calling {@link #getErrorMessage()}. + * + * <dt><b>Precondition:</b></dt> + * <dd>must be in <q>connected</q> state, see {@link #isConnected()}</dd> + * + * @return true if disconnecting was successful, false in case error(s) occur. + */ + public boolean disconnect() { + ... + } + + /** + * An instance can either be in <q>connected</q> or <q>disconnected</q> state. The + * state can be toggled by invoking {@link #connect()} or + * {@link #disconnect()} respectively. 
+ * + * @return true if connected, false otherwise + */ + public boolean isConnected() { + return ...; + } +}</programlisting> + + <para>Notice the two internal states + <quote>disconnected</quote> and + <quote>connected</quote>:</para> + + <figure xml:id="figPersistenceHandlerStates"> + <title>Possible states and transitions for instances of + <code>PersistenceHandler</code>.</title> + + <mediaobject> + <imageobject> + <imagedata fileref="Ref/Fig/persistHandlerStates.fig"/> + </imageobject> + </mediaobject> + </figure> + + <para>According to the above documentation a newly created + <code>PersistenceHandler</code> instance should be in + disconnected state. As being shown in the <xref linkend="glo_Java"/> class + description you may test your implementation without any GUI + code. If you are already familiar with unit testing this might + be a good start as well.</para> + </question> + + <answer> + <para>We show a possible implementation of + <classname>sda.jdbc.intro.v1.PersistenceHandler</classname>:</para> + + <programlisting language="none">package sda.jdbc.intro.v1; +... + +public class PersistenceHandler { + + Connection conn = null; + Statement stmt = null; + + String errorMessage = null; + + /** + * New instances are in <q>disconnected</q> state. See {@link #isConnected()} + */ + public PersistenceHandler() {/* only present here to supply Javadoc comment */} + + /** + * Inserting a (name, email) record into the database server. In case of + * errors corresponding messages may subsequently be retrieved by calling + * {@link #getErrorMessage()}. + * + * <dt><b>Precondition:</b></dt> <dd>must be in + * <q>connected</q> state, see {@link #isConnected()}</dd> + * + * @param name + * A person's name + * @param email + * A person's email address + * + * @return true if the current data record has been successfully inserted + * into the database server. false in case of error(s). 
+ */ + public boolean add(final String name, final String email){ + final String sql = "INSERT INTO Person VALUES('" + name + "', '" + + email + "')"; + try { + stmt.executeUpdate(sql); + return true; + } catch (SQLException e) { + errorMessage = "Unable to execute '" + sql + "': '" + e.getMessage() + "'"; + return false; + } + } + + /** + * Retrieving error messages in case a call to {@link #add(String, String)}, + * {@link #connect()}, or {@link #disconnect()} yields an error. + * + * @return the error explanation corresponding to the latest failed + * operation, null if no error yet occurred. + */ + public String getErrorMessage() { + return errorMessage; + } + + /** + * Open a connection to a database server. + * + * <dt><b>Precondition:</b><dd> + * <dd>must be in <q>disconnected</q> state, see {@link #isConnected()}</dd> + * + * <dt><b>Precondition:</b><dd> + * <dd>The following properties must be set: + * <pre>PersistenceHandler.jdbcUrl=jdbc:mysql://localhost:3306/hdm +PersistenceHandler.password=XYZ +PersistenceHandler.username=foo</pre> + * </dd> + * + * @return true if connecting was successful + */ + public boolean connect () { + try { + conn = DriverManager.getConnection( + DbProps.getString("PersistenceHandler.jdbcUrl"), + DbProps.getString("PersistenceHandler.username"), + DbProps.getString("PersistenceHandler.password")); + try { + stmt = conn.createStatement(); + return true; + } catch (SQLException e) { + errorMessage = "Connection opened but Statement creation failed:\"" + e.getMessage() + "\"."; + try { + conn.close(); + } catch (SQLException ee) { + // Report the exception raised by close() itself rather than the outer one + errorMessage += "Closing connection failed:\"" + ee.getMessage() + "\"."; + } + conn = null; + } + + } catch (SQLException e) { + errorMessage = "Unable to open connection:\"" + e.getMessage() + "\"."; + } + return false; + } + + /** + * Close a connection to a database server and clean up JDBC related resources + * + * Error messages in case of failure may subsequently be retrieved by + * calling 
{@link #getErrorMessage()}. + * + * <dt><b>Precondition:</b></dt> + * <dd>must be in <q>connected</q> state, see {@link #isConnected()}</dd> + * + * @return true if disconnecting was successful, false in case error(s) occur. + */ + public boolean disconnect() { + boolean resultStatus = true; + final StringBuffer messageCollector = new StringBuffer(); + try { + stmt.close(); + } catch (SQLException e) { + resultStatus = false; + messageCollector.append("Unable to close Statement:\"" + e.getMessage() + "\"."); + } + stmt = null; + try { + conn.close(); + } catch (SQLException e) { + resultStatus = false; + messageCollector.append("Unable to close connection:\"" + e.getMessage() + "\"."); + } + conn = null; + if (!resultStatus) { + errorMessage = messageCollector.toString(); + } + return resultStatus; + } + + /** + * An instance can either be in <q>connected</q> or <q>disconnected</q> state. The + * state can be toggled by invoking {@link #connect()} or + * {@link #disconnect()} respectively. + * + * @return true if connected, false otherwise + */ + public boolean isConnected() { + return null != conn; + } +}</programlisting> + </answer> + </qandaentry> + </qandadiv> + </qandaset> + + <para>We may now complete the next enhancement step of our GUI + database client.</para> + + <qandaset defaultlabel="qanda" xml:id="exerciseGuiWriteTakeTwo"> + <title>Connection on user action</title> + + <qandadiv> + <qandaentry> + <question> + <label>An application writing records to a database + server</label> + + <para>Our aim is to enhance the first GUI prototype being + described in <xref linkend="simpleInsertGui"/>. The + application shall start being disconnected from the database + server. Prior to entering data the user shall be guided to + open a connection. 
The following video illustrates the desired + user interface:</para> + + <figure xml:id="figureDataInsert2"> + <title>A GUI frontend for adding personal data to a + server.</title> + + <mediaobject> + <videoobject> + <videodata fileref="Ref/Video/dataInsert.mp4"/> + </videoobject> + </mediaobject> + </figure> + + <para>In case a user closes the main window while still being + connected a disconnect from the database server shall be + enforced. For this purpose we must handle the event when the + user clicks on the closing button within the window + decoration. An exit handler method is being required to + terminate a potentially open database connection.</para> + </question> + + <answer> + <para>Our implementation uses the class + <classname>sda.jdbc.intro.v1.PersistenceHandler</classname> + for handling all database communication. The GUI needs to + visualize the two different states <quote>disconnected</quote> + and <quote>connected</quote>. In <quote>disconnected</quote> + state the whole input pane for entering datasets and clicking + the <quote>Insert</quote> button is locked. So the user is + forced to actively open a database connection.</para> + + <para>Notice also the + <classname>java.awt.event.WindowAdapter</classname> + implementation being executed when closing the application's + main window. The + <methodname>java.awt.event.WindowAdapter.windowClosing(java.awt.event.WindowEvent)</methodname> + method disconnects any existing database connection thus + freeing resources.</para> + + <programlisting language="none">package sda.jdbc.intro.v1; + +import ... 
+ +public class InsertPerson extends JFrame { + + private static final long serialVersionUID = 6815975741605247675L; + + final PersistenceHandler persistenceHandler = new PersistenceHandler(); + + final JTextField nameField = new JTextField(15), + emailField = new JTextField(20); + + final JButton toggleConnectButton = new JButton(), + insertButton = new JButton("Insert"); + + final JPanel databaseFieldPanel = new JPanel(); + + private void setGuiConnectionState(final boolean state) { + if (state) { + toggleConnectButton.setText("Disconnect"); + } else { + toggleConnectButton.setText("Connect"); + } + for (final Component c: databaseFieldPanel.getComponents()){ + c.setEnabled(state); + } + } + + public static void main(String[] args) throws SQLException { + InsertPerson app = new InsertPerson(); + app.setVisible(true); + } + + public InsertPerson (){ + super ("Add a person's data"); + + setSize(500, 500); + + addWindowListener(new WindowAdapter() { + // In case a user closes our application window while still being connected + // we have to close the database connection. 
+ @Override + public void windowClosing(WindowEvent e) { + super.windowClosing(e); + if (persistenceHandler.isConnected() && !persistenceHandler.disconnect()) { + System.exit(1); + } else { + System.exit(0); + } + } + }); + Box top = Box.createHorizontalBox(); + add(top, BorderLayout.NORTH); + top.add(toggleConnectButton); + + toggleConnectButton.addActionListener(new ActionListener() { + + @Override + public void actionPerformed(ActionEvent e) { + if (persistenceHandler.isConnected()) { + if (persistenceHandler.disconnect()){ + setGuiConnectionState(false); + } else { + JOptionPane.showMessageDialog(null, persistenceHandler.getErrorMessage()); + } + } else { + if (persistenceHandler.connect()){ + setGuiConnectionState(true); + } else { + JOptionPane.showMessageDialog(null, persistenceHandler.getErrorMessage()); + } + } + } + }); + + databaseFieldPanel.setLayout(new GridLayout(0,2)); + add(databaseFieldPanel); + + databaseFieldPanel.add(new JLabel("Name:")); + databaseFieldPanel.add(nameField); + + databaseFieldPanel.add(new JLabel("E-mail:")); + databaseFieldPanel.add(emailField); + + insertButton.addActionListener(new ActionListener() { + @Override + public void actionPerformed(ActionEvent e) { + if (persistenceHandler.add(nameField.getText(), emailField.getText())) { + nameField.setText(""); + emailField.setText(""); + JOptionPane.showMessageDialog(null, "Successfully inserted dataset"); + } else { + JOptionPane.showMessageDialog(null, persistenceHandler.getErrorMessage()); + } + } + }); + databaseFieldPanel.add(Box.createGlue()); + databaseFieldPanel.add(insertButton); + setGuiConnectionState(false); + pack(); + } +}</programlisting> + </answer> + </qandaentry> + </qandadiv> + </qandaset> + </section> + + <section xml:id="jdbcSecurity"> + <title><trademark + xlink:href="">JDBC</trademark> + and security</title> + + <section xml:id="jdbcSecurityNetwork"> + <title>Network sniffing</title> + + <para>Sniffing <trademark + xlink:href="">JDBC</trademark> + network traffic 
is one possibility for intruders to compromise + database applications. This requires physical access to either + of:</para> + + <itemizedlist> + <listitem> + <para>Server host</para> + </listitem> + + <listitem> + <para>Client host</para> + </listitem> + + <listitem> + <para>intermediate hub, switch or router.</para> + </listitem> + </itemizedlist> + + <figure xml:id="figJdbcSniffing"> + <title>Sniffing a <trademark + xlink:href="">JDBC</trademark> + connection by an intruder.</title> + + <mediaobject> + <imageobject> + <imagedata fileref="Ref/Fig/jdbcSniffing.fig"/> + </imageobject> + </mediaobject> + </figure> + + <para>We demonstrate a possible attack by analyzing the network + traffic between our application shown in <xref + linkend="figJdbcSimpleWrite"/> and the <productname + xlink:href="">Mysql</productname> database + server. Prior to starting the application we set up <productname + xlink:href="">Wireshark</productname> for + filtered capturing:</para> + + <itemizedlist> + <listitem> + <para>Connecting to the <varname>loopback</varname> (lo) + interface only. This is sufficient since our client connects to + <varname>localhost</varname>.</para> + </listitem> + + <listitem> + <para>Filtering packets if not of type <acronym + xlink:href="">TCP</acronym> + and having port number 3306</para> + </listitem> + </itemizedlist> + + <para>This yields the following capture being shortened for the sake + of brevity:</para> + + <programlisting language="none">[... +5.5.24-0ubuntu0.12.04.1.%...X*e?I1ZQ...................e,F[yoA5$T[N.mysql_native_password. + A...........!.......................hdmuser <co xml:id="tcpCaptureUsername"/>......U.>S.%..~h...!.xhdm............j..../* + + ... INSERT INTO Person VALUES('Jim', '') <co + xml:id="tcpCaptureSqlInsert"/>6... 
+ .&.#23000Duplicate entry '' for key 'email' <co + xml:id="tcpCaptureErrmsg"/></programlisting> + + <calloutlist> + <callout arearefs="tcpCaptureUsername"> + <para>The <varname>username</varname> initiating the connection + to the database server.</para> + </callout> + + <callout arearefs="tcpCaptureSqlInsert"> + <para>The <code>INSERT ...</code> statement.</para> + </callout> + + <callout arearefs="tcpCaptureErrmsg"> + <para>The resulting error message being sent back to the + client.</para> + </callout> + </calloutlist> + + <para>Something seems to be missing here: The user's password. Our + code in <xref linkend="figJdbcSimpleWrite"/> contains the password + <quote><varname>XYZ</varname></quote> in clear text. But even using + the search function of <productname + xlink:href="">Wireshark</productname> does + not show any such string within the above capture. The <productname + xlink:href="">Mysql</productname> documentation + however <link + xlink:href="">reveals</link> + that everything but the password is transmitted in clear text. So + all we might identify is a hash of <code>XYZ</code>.</para> + + <para>So regarding our (current) <productname + xlink:href="">Mysql</productname> implementation + the impact of this attack type is somewhat limited but still severe: + All data being transmitted between client and server may be + disclosed. This typically comprises sensitive data as well. Possible + solutions:</para> + + <itemizedlist> + <listitem> + <para>Create an encrypted tunnel between client and server like + e.g. <link + xlink:href="">ssh + port forwarding</link> or <link + xlink:href="">VPN</link>.</para> + </listitem> + + <listitem> + <para>Many database vendors <link + xlink:href="">supply + SSL</link> or similar <trademark + xlink:href="">JDBC</trademark> + protocol encryption extensions. This requires additional + configuration procedures like setting up server side + certificates. 
Moreover similar to the http/https protocols + encryption generally slows down data traffic.</para> + </listitem> + </itemizedlist> + + <para>Of course this is only relevant if the transport layer is + considered to be insecure. If both server and client reside within + the same trusted infrastructure no action has to be taken. We also + note that this kind of problem is not limited to <trademark + xlink:href="">JDBC</trademark>. + In fact all protocols lacking encryption are subject to this type of + attack.</para> + </section> + + <section xml:id="sqlInjection"> + <title>SQL injection</title> + + <para>Before diving into technical details we shed some light on the + possible impact of this common attack type being described in this + chapter. Our example is the well known Heartland Payment Systems + data breach:</para> + + <figure xml:id="figHeartlandSecurityBreach"> + <title>Summary about possible SQL injection impact based on the + Heartland security breach</title> + + <mediaobject> + <imageobject> + <imagedata fileref="Ref/Fig/heartland.fig"/> + </imageobject> + </mediaobject> + </figure> + + <para>Why should we be concerned with SQL injection? In the + introduction of <xref linkend="bib_Clarke09"/> a compelling argument + is being given:</para> + + <blockquote> + <para>Many people say they know what SQL injection is, but all + they have heard about or experienced are trivial examples. SQL + injection is one of the most devastating vulnerabilities to impact + a business, as it can lead to exposure of all of the sensitive + information stored in an application's database, including handy + information such as usernames, passwords, names, addresses, phone + numbers, and credit card details.</para> + </blockquote> + + <para>In this lecture due to limited resources we only deal with + trivial examples mentioned above. 
One possible way SQL injection + attacks work is by inserting SQL code into fields being designed for + end user input:</para> + + <figure xml:id="figSqlInject"> + <title>SQL injection triggered by ordinary user input.</title> + + <mediaobject> + <imageobject> + <imagedata fileref="Ref/Fig/sqlinject.fig"/> + </imageobject> + </mediaobject> + </figure> + + <qandaset defaultlabel="qanda" xml:id="sqlInjectDropTable"> + <title>Attack from the dark side</title> + + <qandadiv> + <qandaentry> + <question> + <para>Use the application from <xref + linkend="exerciseGuiWriteTakeTwo"/> and <xref + linkend="figSqlInject"/> to launch a SQL injection attack. + We provide some hints:</para> + + <orderedlist> + <listitem> + <para>The <productname + xlink:href="">Mysql</productname> + <trademark + xlink:href="">JDBC</trademark> + driver implementation already provides precautions to + hamper SQL injection attacks. In its default + configuration a sequence of SQL commands separated by + semicolons (<quote>;</quote>) will not be executed but + flagged as a SQL syntax error. We take an + example:</para> + + <programlisting language="none">INSERT INTO Person VALUES (...);DROP TABLE Person</programlisting> + + <para>In order to execute these so-called multi-statement + queries we explicitly have to enable a <productname + xlink:href="">Mysql</productname> + property. 
This may be achieved by extending our + <trademark + xlink:href="">JDBC</trademark> + URL:</para> + + <programlisting language="none">jdbc:mysql://localhost:3306/hdm?<emphasis + role="bold">allowMultiQueries=true</emphasis></programlisting> + + <para>The <productname + xlink:href="">Mysql</productname> + manual <link + xlink:href="">contains + </link>a remark regarding this parameter:</para> + + <remark>Notice that this has the potential for SQL + injection if using plain java.sql.Statements and your + code doesn't sanitize input correctly.</remark> + + <para>In other words: You have been warned!</para> + </listitem> + + <listitem> + <para>You may now use either of the two input fields + <quote>name</quote> or <quote>email</quote> to inject + arbitrary SQL code.</para> + </listitem> + </orderedlist> + </question> + + <answer> + <para>We construct a suitable string being injected to drop + our <code>Person</code> table:</para> + + <programlisting language="none">Jim', '');DROP TABLE Person;INSERT INTO Person VALUES('Joe</programlisting> + + <para>This being entered into the name field kills our + <code>Person</code> relation effectively. As the error + message shows two INSERT statements are separated by a DROP + TABLE statement. So after executing the first INSERT our + database server drops the whole table. Finally the second + INSERT statement fails giving rise to an error message no + end user will ever understand:</para> + + <figure xml:id="figSqlInjectDropPerson"> + <title>Dropping the <code>Person</code> table by SQL + injection</title> + + <mediaobject> + <imageobject> + <imagedata fileref="Ref/Screen/sqlInject.screen.png"/> + </imageobject> + </mediaobject> + </figure> + + <para>According to the message text the table + <code>Person</code> gets dropped as expected. Thus the + subsequent (second) <code>INSERT</code> action is bound to + fail.</para> + + <para>In practice this result may be avoided. The database + user will (hopefully!) 
not have sufficient permissions to + drop the whole table. Malicious modifications by INSERT, + UPDATE or DELETE statements are still possible.</para> + </answer> + </qandaentry> + </qandadiv> + </qandaset> + </section> + + <section xml:id="sanitizeUserInput"> + <title>Sanitizing user input</title> + + <para>There are at least two general ways to deal with the + disastrous result of <xref linkend="sqlInjectDropTable"/>:</para> + + <itemizedlist> + <listitem> + <para>Keep the database server from interpreting user input + completely. This is probably the best way and will be discussed + in <xref linkend="sectPreparedStatements"/>.</para> + </listitem> + + <listitem> + <para>Let the application check and process user input. + Dangerous user input may be modified prior to being embedded in + SQL statements or be rejected completely.</para> + </listitem> + </itemizedlist> + + <para>The first method is definitely superior in most cases. There + are however cases where the restrictions being implied are too + severe. We may for example choose dynamically which tables shall be + accessed. So an SQL statement's structure rather than just its + predicates is affected by user input. There are at least two + standard procedures dealing with this problem:</para> + + <glosslist> + <glossentry> + <glossterm>Input Filtering</glossterm> + + <glossdef> + <para>In the simplest case we check a user's input by regular + expressions. An example is an input field in a login window + representing a system user name. Legal input may allow + letters and digits only. Special characters, whitespace etc. + are typically prohibited. The input does have a minimum length + of one character. A maximum length may be imposed as well. 
So + we may choose the regular expression <code>[A-Za-z0-9]+</code> + to check valid user names.</para> + </glossdef> + </glossentry> + + <glossentry> + <glossterm><foreignphrase>Whitelisting</foreignphrase></glossterm> + + <glossdef> + <para>In many cases Input fields only allow a restricted set + of values. Consider an input field for names of planets. An + application may keep a dictionary table to validate user + input:</para> + + <informaltable border="1"> + <col width="10%"/> + + <col width="5%"/> + + <tr> + <td>Mercury</td> + + <td>1</td> + </tr> + + <tr> + <td>Venus</td> + + <td>2</td> + </tr> + + <tr> + <td>Earth</td> + + <td>3</td> + </tr> + + <tr> + <td>...</td> + + <td>...</td> + </tr> + + <tr> + <td>Neptune</td> + + <td>9</td> + </tr> + + <tr> + <td><emphasis role="bold">Default:</emphasis></td> + + <td><emphasis role="bold">0</emphasis></td> + </tr> + </informaltable> + + <para>So if a user enters a valid planet name a corresponding + number representing this particular planet will be sent to the + database. If the user enters an invalid string an error + message may be raised.</para> + + <para>In a GUI in many situations this may be better + accomplished by presenting the list of planets to choose from. 
+ In this case a user has no chance to enter invalid or even + malicious code.</para> + </glossdef> + </glossentry> + </glosslist> + + <para>So we have an <quote>interceptor</quote> sitting between user + input fields and SQL generating code:</para> + + <figure xml:id="figInputFiltering"> + <title>Validating user input prior to dynamically composing SQL + statements.</title> + + <mediaobject> + <imageobject> + <imagedata fileref="Ref/Fig/filtering.fig"/> + </imageobject> + </mediaobject> + </figure> + + <qandaset defaultlabel="qanda" xml:id="quandaentry_RegexpUse"> + <title>Using regular expressions in <xref linkend="glo_Java"/></title> + + <qandadiv> + <qandaentry> + <question> + <para>This exercise is a preparation for <xref + linkend="exercisefilterUserInput"/>. The aim is to deal with + regular expressions and to use them in + <xref linkend="glo_Java"/>. If + you don't know yet about regular expressions / pattern + matching you may want to read either of:</para> + + <itemizedlist> + <listitem> + <para><link + xlink:href="">Regular + expressions - An introduction</link></para> + </listitem> + + <listitem> + <para><link + xlink:href="">An + Introduction to Regular Expressions</link></para> + </listitem> + + <listitem> + <para><link + xlink:href="">Regular + Expression Tutorial</link></para> + </listitem> + </itemizedlist> + + <para>Complete the implementation of the following + skeleton:</para> + + <programlisting language="none">... +import java.util.regex.Matcher; +import java.util.regex.Pattern; + +public static void main(String[] args) { + final String [] wordList = new String [] {"Eric", "126653BBb", "_login","some text"}; + final String [] regexpList = new String[] {"[A-K].*", "[^0-9]+.*", "_[a-z]+", ""}; + + for (final String word: wordList) { + for (final String regexp: regexpList) { + testMatch(word, regexp); + } + } +} + +/** + * Matching a given word by a regular expression. A log message is being + * written to stdout. 
+ * + * Hint: The implementation is based on the explanation being given in the + * introduction to {@link Pattern} + * + * @param word This string will be matched by the subsequent argument. + * @param regexp The regular expression tested to match the previous argument. + * @return true if regexp matches word, false otherwise. + */ +public static boolean testMatch(final String word, final String regexp) { +.../* to be implemented by <emphasis role="bold">**YOU**</emphasis> */ +}</programlisting> + + <para>As being noted in the <xref linkend="glo_Java"/> + above you may want to read the documentation of class + <classname>java.util.regex.Pattern</classname>. The intended + output of the above application is:</para> + + <programlisting language="none">The expression '[A-K].*' matches 'Eric' +The expression '[^0-9]+.*' ... +...</programlisting> + </question> + + <answer> + <para>A possible implementation is given by + <classname>sda.regexp.RegexpPrimer</classname>.</para> + </answer> + </qandaentry> + </qandadiv> + </qandaset> + + <qandaset defaultlabel="qanda" xml:id="exercisefilterUserInput"> + <title>Input validation by regular expressions</title> + + <qandadiv> + <qandaentry> + <question> + <para>The application of <xref + linkend="sqlInjectDropTable"/> proved to be vulnerable to + SQL injection. Sanitize the two user input fields' values to + prevent such behaviour.</para> + + <itemizedlist> + <listitem> + <para>Find appropriate regular expressions to check both + username and email. Some hints:</para> + + <glosslist> + <glossentry> + <glossterm>username</glossterm> + + <glossdef> + <para>Regarding SQL injection the <quote>;</quote> + character is among the most critical. You may want + to exclude certain special characters. 
This does no + harm since their presence in a user's name is likely + to be a typo rather than any sensible input.</para> + </glossdef> + </glossentry> + + <glossentry> + <glossterm>email</glossterm> + + <glossdef> + <para>There are tons of <quote>ultimate</quote> + regular expressions available to check email + addresses. Remember that the present task is to + avoid SQL injection rather than to reject + <quote>wrong</quote> email addresses. So find a reasonable + one which may be too permissive regarding RFC email + syntax rules but sufficient to secure your + application.</para> + + <para>A concise definition of an email's syntax is + given in <link + xlink:href="">RFC5322</link>. + Its implementation is beyond the scope of the current + lecture. Moreover it is questionable whether E-mail + clients and mail transfer agents implement strict + RFC compliance.</para> + </glossdef> + </glossentry> + </glosslist> + + <para>Both regular expressions must cover the whole user + input from the beginning to the end. This can be + achieved by using <code>^ ... $</code>.</para> + </listitem> + + <listitem> + <para>The <xref linkend="glo_Java"/> + standard class + <classname>javax.swing.InputVerifier</classname> may + help you validate user input.</para> + </listitem> + + <listitem> + <para>The following screenshot may provide an idea for + GUI realization and user interaction in case of errors. + Of course the submit button's action should be disabled + in case of erroneous input.
The user should receive a + helpful error message instead.</para> + + <figure xml:id="figInsertValidate"> + <title>Error message being presented to the + user.</title> + + <mediaobject> + <imageobject> + <imagedata fileref="Ref/Screen/insertValidate.screen.png"/> + </imageobject> + + <caption> + <para>In the current example the trailing + <quote>;</quote> within the E-Mail field is + invalid.</para> + </caption> + </mediaobject> + </figure> + </listitem> + </itemizedlist> + </question> + + <answer> + <para>Extending + <classname>javax.swing.InputVerifier</classname> allows us + to build a generic class to filter user text input by + arbitrary regular expressions:</para> + + <programlisting language="none">package sda.jdbc.intro.v1.sanitize; +... +public class RegexpVerifier extends InputVerifier { + + final Pattern syntaxPattern; + final JLabel validationLabel; + private boolean inputValid = false; + private final String errMsg; +... + public RegexpVerifier (final String regex, final JLabel validationLabel, final String errMsg) { + this.validationLabel = validationLabel; + this.errMsg = errMsg; + syntaxPattern = Pattern.compile(regex); + } + + @Override + public boolean verify(JComponent input) { + if (input instanceof JTextField) { + final String userInput = ((JTextField) input).getText(); + if (syntaxPattern.matcher(userInput).find()) { + validationLabel.setText(""); + inputValid = true; + } else { + validationLabel.setText(errMsg); + inputValid = false; + } + } + return inputValid; + } + public boolean inputIsValid () { + return inputValid; + } +}</programlisting> + + <para>Instances of + <classname>sda.jdbc.intro.v1.sanitize.RegexpVerifier</classname> + <coref linkend="emailVerifier"/> <coref + linkend="nameVerifier"/> may now be used to validate our two + input data fields <coref linkend="setNameValidation"/> + <coref linkend="setEmailValidation"/>. 
We put emphasis on + the changes with respect to + <classname>sda.jdbc.intro.v1.InsertPerson</classname>:</para> + + <programlisting language="none">package sda.jdbc.intro.v1.sanitize; +... +public class InsertPerson extends JFrame { + + final JTextField nameField = new JTextField(15); + final JLabel nameFieldValidationLabel <co xml:id="nameVerifier"/> = new JLabel(); + final RegexpVerifier nameFieldVerifier = new RegexpVerifier( + "^[^;'\"]+$", + nameFieldValidationLabel, + "No special characters"); + + final JTextField emailField = new JTextField(20); + final JLabel emailFieldValidationLabel <co xml:id="emailVerifier"/> = new JLabel(); + final RegexpVerifier emailFieldVerifier = + new RegexpVerifier("^[\\w\\-\\.\\_]+@[\\w\\-\\.]*[a-zA-Z]{2,4}$", + emailFieldValidationLabel, + "email not valid"); +... + public static void main(String[] args) throws SQLException { + InsertPerson app = new InsertPerson(); + app.setVisible(true); + } + public InsertPerson (){ +... + databaseFieldPanel.add(nameField); + <emphasis role="bold">nameFieldValidationLabel.setForeground(Color.RED); + databaseFieldPanel.add(nameFieldValidationLabel); + nameField.setInputVerifier(nameFieldVerifier);</emphasis> <co + xml:id="setNameValidation"/> + + databaseFieldPanel.add(new JLabel("E-mail:")); + databaseFieldPanel.add(emailField); + <emphasis role="bold">databaseFieldPanel.add(emailFieldValidationLabel); + emailFieldValidationLabel.setForeground(Color.RED); + emailField.setInputVerifier(emailFieldVerifier);</emphasis> <co + xml:id="setEmailValidation"/> + + insertButton.addActionListener(new ActionListener() { + @Override + public void actionPerformed(ActionEvent e) { + <emphasis role="bold">if (!nameFieldVerifier.inputIsValid() || !emailFieldVerifier.inputIsValid()) { + JOptionPane.showMessageDialog(null, "Invalid input value(s)"); + }</emphasis> else { +...</programlisting> + </answer> + </qandaentry> + </qandadiv> + </qandaset> + </section> + + <section xml:id="sectPreparedStatements"> + 
<title><classname>java.sql.PreparedStatement</classname> + objects</title> + + <para>Sanitizing user input is an essential means to secure an + application. The <trademark + xlink:href="">JDBC</trademark> + standard however provides a superior mechanism for + protecting applications against SQL injection attacks. We + shed some light on our current mechanism for sending SQL statements to a + database server:</para> + + <figure xml:id="sqlTransport"> + <title>SQL statements in <xref linkend="glo_Java"/> + applications get parsed at the database server</title> + + <mediaobject> + <imageobject> + <imagedata fileref="Ref/Fig/sqlTransport.fig"/> + </imageobject> + </mediaobject> + </figure> + + <para>This architecture raises two questions:</para> + + <orderedlist> + <listitem> + <para>What happens in case identical SQL statements are executed + repeatedly? This may happen inside a loop when thousands of + records with identical structure are being sent to a + database.</para> + </listitem> + + <listitem> + <para>Is this architecture adequate with respect to security + concerns?</para> + </listitem> + </orderedlist> + + <para>The first question is related to performance: Repeatedly parsing + statements that are identical except for the values they contain + is a waste of resources. We consider the transfer of records + between different databases:</para> + + <programlisting language="none">INSERT INTO Person VALUES ('Jim', '') +INSERT INTO Person VALUES ('Eve', '') +INSERT INTO Person VALUES ('Pete', '') +...</programlisting> + + <para>In this case it does not make sense to repeatedly parse + structurally identical SQL statements.
Using single <code>INSERT</code> + statements with multiple data records may not be an option when the + number of records grows.</para> + + <para>The second question is related to our current security topic: + The database server's interpreter may be so <quote>kind</quote> as to + interpret an attacker's malicious code as well.</para> + + <para>Both topics are addressed by + <classname>java.sql.PreparedStatement</classname> objects. Basically + these objects allow for separation of an SQL statement's structure + from the parameter values contained within. The scenario given in <xref + linkend="sqlTransport"/> may be implemented as:</para> + + <figure xml:id="sqlTransportPrepare"> + <title>Using <classname>java.sql.PreparedStatement</classname> + objects.</title> + + <mediaobject> + <imageobject> + <imagedata fileref="Ref/Fig/sqlTransportPrepare.fig"/> + </imageobject> + </mediaobject> + </figure> + + <para>Prepared statements are an example of parameterized SQL + statements which exist in various programming languages. When using + <classname>java.sql.PreparedStatement</classname> instances we + actually have three distinct phases:</para> + + <orderedlist> + <listitem> + <para xml:id="exerciseGuiWritePrepared">Creating an instance of + <classname>java.sql.PreparedStatement</classname>. The SQL + statement possibly containing placeholders gets parsed.</para> + </listitem> + + <listitem> + <para>Setting all placeholder values. This does not involve any + further SQL syntax parsing.</para> + </listitem> + + <listitem> + <para>Executing the statement.</para> + </listitem> + </orderedlist> + + <para>Steps 2. and 3.
may be repeated as often as desired without + any re-parsing of SQL statements thus saving resources on the + database server side.</para> + + <para>Our introductory toy application <xref + linkend="figJdbcSimpleWrite"/> may be rewritten using + <classname>java.sql.PreparedStatement</classname> objects:</para> + + <programlisting language="none">package sda.jdbc.intro.v1; +... +public class SimpleInsert { + + public static void main(String[] args) throws SQLException { + + final Connection conn = DriverManager.getConnection (... + + // Step 2: Create a PreparedStatement instance + final PreparedStatement pStmt = conn.prepareStatement( + "INSERT INTO Person VALUES(<emphasis role="bold">?, ?</emphasis>)");<co + xml:id="listPrepCreate"/> + + // Step 3a: Fill in desired attribute values + pStmt.setString(1, "Jim");<co xml:id="listPrepSet1"/> + pStmt.setString(2, "");<co xml:id="listPrepSet2"/> + + // Step 3b: Execute the desired INSERT + final int updateCount = pStmt.executeUpdate();<co xml:id="listPrepExec"/> + + // Step 4: Give feedback to the end user + System.out.println("Successfully inserted " + updateCount + " dataset(s)"); + } +}</programlisting> + + <calloutlist> + <callout arearefs="listPrepCreate"> + <para>An instance of + <classname>java.sql.PreparedStatement</classname> is being + created. Notice the two question marks representing two + placeholders for string values to be inserted in the next + step.</para> + </callout> + + <callout arearefs="listPrepSet1 listPrepSet2"> + <para>Fill in the two placeholder values being defined at <coref + linkend="listPrepCreate"/>.</para> + + <caution> + <para>Since half the world of programming folks will index a + list of n elements starting from 0 to n-1, <trademark + xlink:href="">JDBC</trademark> + apparently counts from 1 to n. Working with <trademark + xlink:href="">JDBC</trademark> + would have been too easy otherwise.</para> + </caution> + </callout> + + <callout arearefs="listPrepExec"> + <para>Execute the beast!
Notice the empty parameter list. No SQL + is required since we already prepared it in <coref + linkend="listPrepCreate"/>.</para> + </callout> + </calloutlist> + + <para>The problem of SQL injection disappears completely when using + <classname>java.sql.PreparedStatement</classname> instances. An + attacker may safely enter offending strings like:</para> + + <programlisting language="none">Jim', '');DROP TABLE Person;INSERT INTO Person VALUES('Joe</programlisting> + + <para>The above string will be taken <quote>as is</quote> and thus + simply becomes part of the database server's content.</para> + + <qandaset defaultlabel="qanda" xml:id="exerciseSqlInjectPrepare"> + <title>Prepared Statements to keep the barbarians at the + gate</title> + + <qandadiv> + <qandaentry> + <question> + <para>In <xref linkend="sqlInjectDropTable"/> we found our + implementation in <xref linkend="exerciseGuiWriteTakeTwo"/> + to be vulnerable with respect to SQL injection. Rather than + sanitizing user input you shall use + <classname>java.sql.PreparedStatement</classname> objects to + secure the application.</para> + </question> + + <answer> + <para>Due to our separation of GUI and persistence handling + we only need to re-implement + <classname>sda.jdbc.intro.sqlinject.PersistenceHandler</classname>. + We have to replace <classname>java.sql.Statement</classname> + by <classname>java.sql.PreparedStatement</classname> + instances. A possible implementation is + <classname>sda.jdbc.intro.v1.prepare.PersistenceHandler</classname>. 
+ We may now safely enter offending strings like:</para> + + <programlisting language="none">Jim', '');DROP TABLE Person;INSERT INTO Person VALUES('Joe</programlisting> + + <para>This time the input value is taken <quote>as + is</quote> and yields the following error message:</para> + + <informalfigure> + <mediaobject> + <imageobject> + <imagedata fileref="Ref/Screen/sqlInjectPrepare.screen.png"/> + </imageobject> + </mediaobject> + </informalfigure> + + <para>The offending string exceeds the length of the + attribute <code>name</code> within the database table + <code>Person</code>. We may enlarge this value to allow the + <code>INSERT</code> operation:</para> + + <programlisting language="none">CREATE TABLE Person ( + name char(<emphasis role="bold">80</emphasis>) <emphasis role="bold">-- a little bit longer --</emphasis> + ,email CHAR(20) UNIQUE +);</programlisting> + </answer> + </qandaentry> + </qandadiv> + </qandaset> + + <para>We could have followed a test-driven development approach, in + which case we would have written tests before actually implementing + our application. In the current lecture we will do this the other + way round in the following exercise. The idea is to assure software + quality when fixing bugs or extending an application.</para> + + <para>The subsequent exercise requires the <productname + xlink:href="">TestNG</productname> + plugin for Eclipse to be installed. This should already be the case + both in the MI exercise classrooms and in the VirtualBox image + provided at <uri + xlink:href=""></uri>. + If you use a private Eclipse installation you may want to follow + <xref linkend="testngInstall"/>.</para> + + <qandaset defaultlabel="qanda" xml:id="quandaentry_DbLayerUnitTest"> + <title>Testing + <classname>sda.jdbc.intro.v1.PersistenceHandler</classname> using + <productname + xlink:href="">TestNG</productname></title> + + <qandadiv> + <qandaentry> + <question> + <para>Read <xref linkend="chapUnitTesting"/>.
Then + test:</para> + + <itemizedlist> + <listitem> + <para>Proper behaviour when opening and closing + connections.</para> + </listitem> + + <listitem> + <para>Proper behaviour when inserting data.</para> + </listitem> + + <listitem> + <para>Expected behaviour when entering duplicate values + violating integrity constraints. Look for error messages + as well.</para> + </listitem> + </itemizedlist> + + <para>You may write code to initialize the database state + appropriately prior to running the tests.</para> + </question> + + <answer> + <para><productname + xlink:href="">TestNG</productname> may be + directed by + <classname>sda.jdbc.intro.v1.prepare.PersistenceHandlerTest</classname>.</para> + </answer> + </qandaentry> + </qandadiv> + </qandaset> + </section> + </section> + + <section xml:id="jdbcRead"> + <title>Read Access</title> + + <para>So far we've sent records to a database server. Applications + however need both directions: Pushing data to a server and receiving + data as well. The overall process looks like:</para> + + <figure xml:id="jdbcReadWrite"> + <title>Server / client object's life cycle</title> + + <mediaobject> + <imageobject> + <imagedata fileref="Ref/Fig/jdbcReadWrite.fig"/> + </imageobject> + </mediaobject> + </figure> + + <para>So far we've only covered the second (<code>UPDATE</code>) part + of this picture. Reading objects from a database server into a + client's (transient) address space requires a container object to hold + the data in question. Though <xref linkend="glo_Java"/> offers + standard container interfaces like + <classname>java.util.List</classname> the <trademark + xlink:href="">JDBC</trademark> + standard has created separate specifications like + <classname>java.sql.ResultSet</classname>. Instances of + <classname>java.sql.ResultSet</classname> will hold transient copies + of (database) objects.
The next figure outlines the basic + approach:</para> + + <figure xml:id="figJdbcRead"> + <title>Reading data from a database server.</title> + + <mediaobject> + <imageobject> + <imagedata fileref="Ref/Fig/jdbcread.fig" scale="65"/> + </imageobject> + </mediaobject> + </figure> + + <para>Consider an example. Suppose our database contains a table of our + friends' nicknames and their respective birth dates:</para> + + <table border="1" xml:id="figRelationFriends"> + <caption>Names and birth dates of friends.</caption> + + <tr> + <td><programlisting language="none">CREATE TABLE Friends ( + id INTEGER NOT NULL PRIMARY KEY + ,nickname char(10) + ,birthdate DATE +);</programlisting></td> + + <td><programlisting language="none">INSERT INTO Friends VALUES + (1, 'Jim', '1991-10-10') + ,(2, 'Eve', '2003-05-24') + ,(3, 'Mick','2001-12-30') + ;</programlisting></td> + </tr> + </table> + + <para>Following the outline in <xref linkend="figJdbcRead"/> we may + access our data by:</para> + + <figure xml:id="listingJdbcRead"> + <title>Accessing relational data</title> + + <programlisting language="none">package sda.jdbc.intro; +...
+public class SimpleRead { + + public static void main(String[] args) throws SQLException { + + // Step 1: Open a connection to the database server + final Connection conn = DriverManager.getConnection ( + DbProps.getString("PersistenceHandler.jdbcUrl"), + DbProps.getString("PersistenceHandler.username"), + DbProps.getString("PersistenceHandler.password")); + + // Step 2: Create a Statement instance + final Statement stmt = conn.createStatement(); + + <emphasis role="bold">// Step 3: Creating the client side JDBC container holding our data records</emphasis> + <emphasis role="bold">final ResultSet data = stmt.executeQuery("SELECT * FROM Friends");</emphasis> <co + linkends="listingJdbcRead-1" xml:id="listingJdbcRead-1-co"/> + + <emphasis role="bold">// Step 4: Dataset iteration + while (data.next()) {</emphasis> <co linkends="listingJdbcRead-2" + xml:id="listingJdbcRead-2-co"/> + <emphasis role="bold">System.out.println(data.getInt("id")</emphasis> <co + linkends="listingJdbcRead-3" xml:id="listingJdbcRead-3-co"/> + <emphasis role="bold">+ ", " + data.getString("nickname")</emphasis> <co + linkends="listingJdbcRead-3" xml:id="listingJdbcRead-4-co"/> + <emphasis role="bold">+ ", " + data.getString("birthdate"));</emphasis> <co + linkends="listingJdbcRead-3" xml:id="listingJdbcRead-5-co"/> + } + } +}</programlisting> + </figure> + + <para>The marked code segment above shows the differences with respect to + our data insertion application + <classname>sda.jdbc.intro.SimpleInsert</classname>.
Some remarks are + in order:</para> + + <calloutlist> + <callout arearefs="listingJdbcRead-1-co" xml:id="listingJdbcRead-1"> + <para>As mentioned in the introduction to this section the + <trademark + xlink:href="">JDBC</trademark> + standard comes with its own container interface rather than + <classname>java.util.List</classname> or similar.</para> + </callout> + + <callout arearefs="listingJdbcRead-2-co" xml:id="listingJdbcRead-2"> + <para>Calling <link + xlink:href="">next()</link> + prior to actually accessing data on the client side is mandatory! + The first call to <link + xlink:href="">next()</link> + moves the internal cursor to the first row of our + result set if it is not empty. Follow the link address and + <emphasis role="bold">read</emphasis> the + documentation.</para> + </callout> + + <callout arearefs="listingJdbcRead-3-co listingJdbcRead-4-co listingJdbcRead-5-co" + xml:id="listingJdbcRead-3"> + <para>The access methods have to be chosen according to matching + types. An overview of database/<xref linkend="glo_Java"/> type + mappings is given in <uri + xlink:href=""></uri>.</para> + </callout> + </calloutlist> + + <para>We now present a series of exercises thereby exploring important + aspects of <xref linkend="glo_JDBC"/> read access.</para> + + <section xml:id="sectGetterTypeConversion"> + <title>Getter methods and type conversion</title> + + <qandaset defaultlabel="qanda" + xml:id="quandaentry_JdbcTypeConversion"> + <qandadiv> + <qandaentry> + <question> + <para>Apart from type mappings the <xref + linkend="glo_JDBC"/> access methods like <link + xlink:href="">getString()</link> + may also be used for type conversion.
Modify <xref + linkend="listingJdbcRead"/> by:</para> + + <itemizedlist> + <listitem> + <para>Read the database attribute <code>id</code> by + <link + xlink:href="">getString(String)</link>.</para> + </listitem> + + <listitem> + <para>Read the database attribute <code>nickname</code> by <link + xlink:href="">getInt(String)</link>.</para> + </listitem> + </itemizedlist> + + <para>What do you observe?</para> + </question> + + <answer> + <para>Modifying our iteration loop:</para> + + <programlisting language="none">// Step 4: Dataset iteration +while (data.next()) { + System.out.println(data.<emphasis role="bold">getString</emphasis>("id") <co + linkends="jdbcReadWrongType-1" + xml:id="jdbcReadWrongType-1-co"/> + + ", " + data.<emphasis role="bold">getInt</emphasis>("nickname") <co + linkends="jdbcReadWrongType-2" + xml:id="jdbcReadWrongType-2-co"/> + + ", " + data.getString("birthdate")); +}</programlisting> + + <para>We observe:</para> + + <calloutlist> + <callout arearefs="jdbcReadWrongType-1-co" + xml:id="jdbcReadWrongType-1"> + <para>Calling <link + xlink:href="">getString()</link> + for a database attribute of type INTEGER does not cause + any trouble: The value gets silently converted to a + string value.</para> + </callout> + + <callout arearefs="jdbcReadWrongType-2-co" + xml:id="jdbcReadWrongType-2"> + <para>Calling <link + xlink:href="">getInt(String)</link> + for the database field of type CHAR yields an (expected) + Exception:</para> + </callout> + </calloutlist> + + <programlisting language="none">Exception in thread "main" java.sql.SQLException: Invalid value for getInt() - 'Jim' + at com.mysql.jdbc.SQLError.createSQLException( +...</programlisting> + + <para>We may however provide <quote>compatible</quote> data + records:</para> + + <programlisting language="none">DELETE FROM Friends; +INSERT INTO Friends VALUES (1, <emphasis role="bold">'31'</emphasis>, '1991-10-10');</programlisting> + + <para>This time our application executes perfectly + well:</para> + + <programlisting
language="none">1, 31, 1991-10-10</programlisting> + + <para>Conclusion: The <xref linkend="glo_JDBC"/> driver + performs a conversion from a string type to an integer + similar to the <link + xlink:href="">parseInt(String)</link> + method.</para> + + <para>The next series of exercises aims at a more powerful + implementation of our person data insertion application in + <xref linkend="exerciseInsertLoginCredentials"/>.</para> + </answer> + </qandaentry> + </qandadiv> + </qandaset> + </section> + + <section xml:id="sectHandlingNullValues"> + <title>Handling NULL values</title> + + <qandaset defaultlabel="qanda" xml:id="quandaentry_HandlingNull"> + <qandadiv> + <qandaentry> + <question> + <para>The attribute <code>nickname</code> in our database + table <code>Friends</code> allows <code>NULL</code> values:</para> + + <programlisting language="none">INSERT INTO Friends VALUES + (1, 'Jim', '1991-10-10') + ,(2, <emphasis role="bold"> NULL</emphasis>, '2003-5-24') + ,(3, 'Mick', '2001-12-30');</programlisting> + + <para>Starting our current application yields:</para> + + <programlisting language="none">1, Jim, 1991-10-10 +2, null, 2003-05-24 +3, Mick, 2001-12-30</programlisting> + + <para>This might be confused with a person having the + nickname <quote>null</quote>.
Instead we would like to + have:</para> + + <programlisting language="none">1, Jim, 1991-10-10 +2, -Name unknown- , 2003-05-24 +3, Mick, 2001-12-30</programlisting> + + <para>Extend the current code of + <classname>sda.jdbc.intro.SimpleRead</classname> to produce + the above result in case of nickname <code>NULL</code> + values.</para> + + <para>Hint: Read the documentation of <link + xlink:href="">wasNull()</link>.</para> + </question> + + <answer> + <para>A possible implementation is given in + <classname>sda.jdbc.intro.v1.SimpleRead</classname>.</para> + </answer> + </qandaentry> + </qandadiv> + </qandaset> + </section> + + <section xml:id="sectUserAuthStrategy"> + <title>A user authentication <quote>strategy</quote></title> + + <qandaset defaultlabel="qanda" xml:id="exerciseInsecureAuth"> + <qandadiv> + <qandaentry> + <question> + <para>Our current application for entering + <code>Person</code> records lacks authentication: A user + simply connects to the database using credentials + hard-coded in a properties file.
A programmer suggests + implementing authentication based on the following extension of + the <code>Person</code> table:</para> + + <programlisting language="none">CREATE TABLE Person ( + name char(80) NOT NULL + ,email CHAR(20) NOT NULL UNIQUE + ,login CHAR(10) UNIQUE -- login names must be unique -- + ,password CHAR(20) +);</programlisting> + + <para>On clicking <quote>Connect</quote> a user may enter + his login name and password, <quote>fred</quote> and + <quote>12345678</quote> in the following example:</para> + + <figure xml:id="figLogin"> + <title>Login credentials for database connection</title> + + <mediaobject> + <imageobject> + <imagedata fileref="Ref/Screen/login.screen.png" + scale="90"/> + </imageobject> + </mediaobject> + </figure> + + <para>Based on these input values the following SQL query is + being executed by a + <classname>java.sql.Statement</classname> object:</para> + + <programlisting language="none">SELECT * FROM Person WHERE login='<emphasis + role="bold">fred</emphasis>' and password = '<emphasis + role="bold">12345678</emphasis>'</programlisting> + + <para>Since the login attribute is UNIQUE we are sure to + receive either 0 or 1 dataset. Our programmer proposes to + grant login if the query returns at least one + dataset.</para> + + <para>Discuss this implementation sketch with a colleague. + Do you think this is a sensible approach? <emphasis + role="bold">Write down</emphasis> your results.</para> + </question> + + <answer> + <para>The approach is essentially unusable due to severe + security implications. Since it is based on + <classname>java.sql.Statement</classname> rather than on + <classname>java.sql.PreparedStatement</classname> objects it + is vulnerable to SQL injection attacks.
A user may enter the + following password value in the GUI:</para> + + <programlisting language="none">sd' OR '1' = '1</programlisting> + + <para>Based on the login name <quote>fred</quote> the + following SQL string is being crafted:</para> + + <programlisting language="none">SELECT * FROM Person WHERE login='fred' and password = 'sd' OR <emphasis + role="bold">'1' = '1'</emphasis>;</programlisting> + + <para>Since the WHERE clause's last component always + evaluates to true, all objects from the <code>Person</code> + relation are returned thus permitting login.</para> + + <para>The implementation approach suffers from a second + deficiency: The passwords are stored in clear text. If an + attacker gains access to the <code>Person</code> table he'll + immediately retrieve the passwords of all users. This + problem can be solved by storing hash values of passwords + rather than the clear text values themselves.</para> + </answer> + </qandaentry> + </qandadiv> + </qandaset> + </section> + + <section xml:id="sectPasswordsHashed"> + <title>Passwords and hash values</title> + + <qandaset defaultlabel="qanda" xml:id="exerciseHashTraining"> + <qandadiv> + <qandaentry> + <question> + <para>In exercise <xref linkend="exerciseInsecureAuth"/> we + discarded the idea of clear text passwords in favour of + password hashes. To defend against rainbow table attacks, + so-called salted hashes are superior. You should read <uri + xlink:href=""></uri> + for overview purposes. The article contains further + references at the bottom of the page.</para> + + <para>With respect to an implementation <uri + xlink:href=""></uri> + provides a simple example for:</para> + + <itemizedlist> + <listitem> + <para>Creating a salted hash from a given password + string.</para> + </listitem> + + <listitem> + <para>Verifying whether a hash string matches a given clear text + password.</para> + </listitem> + </itemizedlist> + + <para>The example uses an external library.
On <productname + xlink:href="">Ubuntu</productname> + Linux this may be installed by issuing + <command>aptitude</command> <option>install</option> + <option>libcommons-codec-java</option>. On successful + install the file + <filename>/usr/share/java/commons-codec-1.5.jar</filename> + may be appended to your <envar>CLASSPATH</envar>.</para> + + <para>You may as well use <uri + xlink:href=""></uri> + as a starting point. This example works standalone without + needing an external library. Note: This example produces + different (incompatible) hash values.</para> + + <para>Create a simple main() method to experiment with the + two class methods.</para> + </question> + + <answer> + <para>Starting from <uri + xlink:href=""></uri> + we create a slightly modified class + <classname>sda.jdbc.intro.auth.HashProvider</classname> + offering both hash providing <coref + linkend="hashProviderMethod"/> and verifying <coref + linkend="hashVerifyMethod"/> methods:</para> + + <programlisting language="none">package sda.jdbc.intro.auth; +... +public class HashProvider { +... + /** Computes a salted PBKDF2 hash of given plaintext password + suitable for storing in a database. */ + public static <emphasis role="bold">String getSaltedHash</emphasis> <co + xml:id="hashProviderMethod"/>(char [] password) { + byte[] salt; + try { + salt = SecureRandom.getInstance("SHA1PRNG").generateSeed(saltLen); + // store the salt with the password + return Base64.encodeBase64String(salt) + "$" + hash(password, salt); + } catch (NoSuchAlgorithmException e) { + e.printStackTrace(); + } + System.exit(1); + return null; + } + + /** Checks whether given plaintext password corresponds + to a stored salted hash of the password.
*/ + public static <emphasis role="bold">boolean check</emphasis> <co + xml:id="hashVerifyMethod"/>(char[] password, String stored){ + String[] saltAndPass = stored.split("\\$"); + if (saltAndPass.length != 2) + return false; + String hashOfInput = hash(password, Base64.decodeBase64(saltAndPass[0])); + return hashOfInput.equals(saltAndPass[1]); + } +...}</programlisting> + + <para>We may test the two class methods + <methodname>sda.jdbc.intro.auth.HashProvider.getSaltedHash(char[])</methodname> + and + <methodname>sda.jdbc.intro.auth.HashProvider.check(char[],String)</methodname> + by a separate driver class. Notice the <quote>$</quote> sign + <coref linkend="saltPwhashSeparator"/> separating salt and + password hash:</para> + + <programlisting language="none">package sda.jdbc.intro.auth; + +public class TestHashProvider { + + public static void main(String [] args) throws Exception { + final char [] clearText = {'s', 'e', 'c'}; + final String hash = <emphasis role="bold">HashProvider.getSaltedHash(clearText)</emphasis>; + System.out.println("Hash:" + hash); + if (HashProvider.check(clearText, <co + xml:id="saltPwhashSeparator"/> + "<emphasis role="bold">HwX2DkuYiwp7xogm3AGndza8DKRVvCMntxRvCrCGFPw=</emphasis>$<emphasis + role="bold">6Ix11yHNB4uPZuF2IQYxVV/MYragJwTDE33OIFR9a24=</emphasis>")) { + System.out.println("hash matches"); + } else { + System.out.println("hash does not match"); ...</programlisting> + </answer> + </qandaentry> + </qandadiv> + </qandaset> + </section> + + <section xml:id="guiAuthenticateTheRealMcCoy"> + <title>GUI authentication: The real McCoy</title> + + <qandaset defaultlabel="qanda" + xml:id="exerciseInsertLoginCredentials"> + <qandadiv> + <qandaentry> + <question> + <para>We now implement a refined version to enter + <code>Person</code> records based on the solutions of two + related exercises:</para> + + <glosslist> + <glossentry> + <glossterm><xref + linkend="exercisefilterUserInput"/></glossterm> + + <glossdef> + <para>Avoiding
SQL injection by sanitizing user + input</para> + </glossdef> + </glossentry> + + <glossentry> + <glossterm><xref + linkend="exerciseSqlInjectPrepare"/></glossterm> + + <glossdef> + <para>Avoiding SQL injection by using + <classname>java.sql.PreparedStatement</classname> + objects.</para> + </glossdef> + </glossentry> + </glosslist> + + <para>A better solution should combine both techniques. + Protection against SQL injection is a basic requirement. Checking an + e-mail address for minimal conformance adds further value.</para> + + <para>In order to address authentication the relation Person + has to be extended appropriately. The GUI needs two + additional fields for login name and password as well. The + following video demonstrates the intended behaviour:</para> + + <figure xml:id="videoConnectAuth"> + <title>Intended usage behaviour for insertion of data + records.</title> + + <mediaobject> + <videoobject> + <videodata fileref="Ref/Video/connectauth.mp4"/> + </videoobject> + </mediaobject> + </figure> + + <para>Don't forget to use password hashes like those from + <xref linkend="exerciseHashTraining"/>. Due to their length + you may want to consider the data type + <code>TEXT</code>.</para> + </question> + + <answer> + <para>In comparison to earlier versions it does make sense + to add some internal container structures.
First we note, + that each GUI input field requires:</para> + + <itemizedlist> + <listitem> + <para>A label like <quote>Enter password</quote>.</para> + </listitem> + + <listitem> + <para>A corresponding field object to hold user entered + input.</para> + </listitem> + + <listitem> + <para>A validator checking for correctness of entered + data.</para> + </listitem> + + <listitem> + <para>A label or text field for warning messages in case + of invalid user input.</para> + </listitem> + </itemizedlist> + + <para>First we start by grouping label <coref + linkend="uiuLabel"/>, input field's verifier <coref + linkend="uiuVerifier"/> and the error message label <coref + linkend="uiuErrmsg"/> in + <classname>sda.jdbc.intro.auth.UserInputUnit</classname>:</para> + + <programlisting language="none">package sda.jdbc.intro.auth; +... +public class UserInputUnit { + + final JLabel label; <co xml:id="uiuLabel"/> + final InputVerifierNotify verifier; <co xml:id="uiuVerifier"/> + final JLabel errorMessage; <co xml:id="uiuErrmsg"/> + + public UserInputUnit(final String guiText, final InputVerifierNotify verifier) { + this.label = new JLabel(guiText); + this.verifier = verifier; + errorMessage = new JLabel(); + } ...</programlisting> + + <para>The actual GUI text field is being defined <coref + linkend="verfierGuiField"/> in class + <classname>sda.jdbc.intro.auth.InputVerifierNotify</classname>:</para> + + <programlisting language="none">package sda.jdbc.intro.auth; +... 
+public abstract class InputVerifierNotify extends InputVerifier { + + protected final String errorMessage; + public final JLabel validationLabel; + public final JTextField field; <co xml:id="verfierGuiField"/> + + public InputVerifierNotify(final JTextField field, final String errorMessage) { ...</programlisting> + + <para>We need two field verifier classes derived from + <classname>sda.jdbc.intro.auth.InputVerifierNotify</classname>:</para> + + <glosslist> + <glossentry> + <glossterm><classname>sda.jdbc.intro.auth.RegexpVerifier</classname></glossterm> + + <glossdef> + <para>This one is well known from earlier versions and + is used to validate text input fields by regular + expressions.</para> + </glossdef> + </glossentry> + + <glossentry> + <glossterm><classname>sda.jdbc.intro.auth.EqualValueVerifier</classname></glossterm> + + <glossdef> + <para>This verifier class is responsible for checking + that our two password fields hold identical + values.</para> + </glossdef> + </glossentry> + </glosslist> + + <para>All these components get assembled in + <classname>sda.jdbc.intro.auth.InsertPerson</classname>. We + highlight some important points:</para> + + <programlisting language="none">package sda.jdbc.intro.auth; +... +public class InsertPerson extends JFrame { +... // GUI attributes for user input + final UserInputUnit name = <co linkends="listingInsertUserAuth-1" + xml:id="listingInsertUserAuth-1-co"/> + new UserInputUnit( + "Name", + new RegexpVerifier(new JTextField(15), "^[^;'\"]+$", "No special characters allowed")); + + // We need a reference to the password field to avoid + // casting from JTextField later. + private final JPasswordField passwordField = new JPasswordField(10); <co + linkends="listingInsertUserAuth-2" + xml:id="listingInsertUserAuth-2-co"/> + private final UserInputUnit password = + new UserInputUnit( + "Password", + new RegexpVerifier(passwordField, "^.{6,20}$", "length from 6 to 20 characters")); +...
+ private final UserInputUnit passwordRepeat = + new UserInputUnit( + "repeat pass.", + new EqualValueVerifier <co linkends="listingInsertUserAuth-3" + xml:id="listingInsertUserAuth-3-co"/> (new JPasswordField(10), passwordField, "Passwords do not match")); + + private final UserInputUnit [] userInputUnits = <co + linkends="listingInsertUserAuth-4" + xml:id="listingInsertUserAuth-4-co"/> + {name, email, login, password, passwordRepeat}; +... + private void userLoginDialog() {...} +... + public InsertPerson (){ +... + databaseFieldPanel.setLayout(new GridLayout(0, 3)); //Third column for validation label + add(databaseFieldPanel); + + for (UserInputUnit unit: userInputUnits) { <co + linkends="listingInsertUserAuth-5" + xml:id="listingInsertUserAuth-5-co"/> + databaseFieldPanel.add(unit.label); + databaseFieldPanel.add(unit.verifier.field); + databaseFieldPanel.add(unit.verifier.validationLabel); + } + insertButton.addActionListener(new ActionListener() { + @Override public void actionPerformed(ActionEvent e) { + if (inputValuesAllValid()) { + if (persistenceHandler.add( <co + linkends="listingInsertUserAuth-6" + xml:id="listingInsertUserAuth-6-co"/> + name.getText(), + email.getText(), + login.getText(), + passwordField.getPassword())) { + clearMask(); +...} + private void clearMask() { <co linkends="listingInsertUserAuth-7" + xml:id="listingInsertUserAuth-7-co"/> + for (UserInputUnit unit: userInputUnits) { + unit.verifier.field.setText(""); + unit.verifier.clear(); + } + } + private boolean inputValuesAllValid() {<co + linkends="listingInsertUserAuth-8" + xml:id="listingInsertUserAuth-8-co"/> + for (UserInputUnit unit: userInputUnits) { + if (!unit.verifier.verify(unit.verifier.field)){ + return false; + } + } + return true; + } +}</programlisting> + + <calloutlist> + <callout arearefs="listingInsertUserAuth-1-co" + xml:id="listingInsertUserAuth-1"> + <para>All GUI related stuff for entering a user's + name</para> + </callout> + + <callout 
arearefs="listingInsertUserAuth-2-co" + xml:id="listingInsertUserAuth-2"> + <para>Password fields need special treatment: + <code>getText()</code> is superseded by + <code>getPassword()</code>. In order to avoid casts from + <classname>javax.swing.JTextField</classname> to + <classname>javax.swing.JPasswordField</classname> we + simply keep an extra reference.</para> + </callout> + + <callout arearefs="listingInsertUserAuth-3-co" + xml:id="listingInsertUserAuth-3"> + <para>In order to check both password fields for + identical values we need a different validator + <classname>sda.jdbc.intro.auth.EqualValueVerifier</classname> + expecting both password fields in its + constructor.</para> + </callout> + + <callout arearefs="listingInsertUserAuth-4-co" + xml:id="listingInsertUserAuth-4"> + <para>All five user input elements are grouped in an array. + This allows for iterations like in <coref + linkend="listingInsertUserAuth-7-co"/> or <coref + linkend="listingInsertUserAuth-8-co"/>.</para> + </callout> + + <callout arearefs="listingInsertUserAuth-5-co" + xml:id="listingInsertUserAuth-5"> + <para>Adding all GUI elements to the base pane in a + loop.</para> + </callout> + + <callout arearefs="listingInsertUserAuth-6-co" + xml:id="listingInsertUserAuth-6"> + <para>Providing user entered values to the persistence + provider.</para> + </callout> + + <callout arearefs="listingInsertUserAuth-7-co" + xml:id="listingInsertUserAuth-7"> + <para>Whenever a dataset has been successfully sent to + the database we have to clear our GUI so that another + record can be entered.</para> + </callout> + + <callout arearefs="listingInsertUserAuth-8-co" + xml:id="listingInsertUserAuth-8"> + <para>Thanks to our grouping, aggregating the validation + states of the individual GUI input fields becomes easy.</para> + </callout> + </calloutlist> + </answer> + </qandaentry> + </qandadiv> + </qandaset> + </section> + + <section xml:id="sectArchitectSecurityConsiderations"> + <title>Architectural security
considerations</title> + + <qandaset defaultlabel="qanda" xml:id="quandaentry_ArchSecurity"> + <qandadiv> + <qandaentry> + <question> + <para>In <xref linkend="exerciseInsertLoginCredentials"/> we + achieved end user credential protection. How about the + overall application security? Provide improvement proposals + if appropriate. Hint: Consider the way credentials are being + supplied.</para> + </question> + + <answer> + <para>Connecting the client to our database server solely + depends on credentials <coref + linkend="databaseUserHdmPassword"/> being stored in a + properties file + <filename></filename>:</para> + + <programlisting language="none">PersistenceHandler.jdbcUrl=jdbc:mysql://localhost:3306/hdm +PersistenceHandler.username=hdmuser <co xml:id="databaseUserHdmUsername"/> +PersistenceHandler.password=<emphasis role="bold">XYZ</emphasis> <co + xml:id="databaseUserHdmPassword"/></programlisting> + + <para>This properties file is user accessible and contains + the password in clear text. Any application + connecting to the database server using this account has + all permissions granted to <code>hdmuser</code> <coref + linkend="databaseUserHdmUsername"/>. In order for our + application to work correctly the set of granted permissions + must at least include inserting datasets. Thus new users, e.g. + <code>smith</code>, may be inserted including their credentials. + Afterwards the original application can be started by + logging in as <code>smith</code>.</para> + + <para>Conclusion: The current application architecture is + seriously flawed with respect to security.</para> + + <para>Rather than using a common database account + <code>hdmuser</code> we may configure per-user accounts on + the database server having individual user credentials. This + way user credentials are no longer stored in our + <code>Person</code> table but are managed by the + database server's user management and privilege facilities.
+ This completely avoids storing credentials on the client + side.</para> + </answer> + </qandaentry> + </qandadiv> + </qandaset> + </section> + + <section xml:id="sectRelationadatal2Xml"> + <title>Reversing <xref linkend="glo_XML"/> to Rdbms</title> + + <qandaset defaultlabel="qanda" xml:base="qandaRelationaldata2Xml" + xml:id="qandaRelationaldata2Xml"> + <qandadiv> + <qandaentry> + <question> + <para>Reverse exercise <xref + linkend="qandaXmldata2relational"/> to read Rdbms data via + <xref linkend="glo_JDBC"/> and export corresponding XML data + using Jdom.</para> + </question> + + <answer> + <annotation role="make"> + <para role="eclipse">P/rdbms2catalog</para> + </annotation> + </answer> + </qandaentry> + </qandadiv> + </qandaset> + </section> + + <section xml:id="sda1SaxRdbms"> + <title>Generating HTML from XML and Rdbms data using SAX and <xref + linkend="glo_JDBC"/>.</title> + + <qandaset defaultlabel="qanda" xml:id="exercise_saxAttrib"> + <qandadiv> + <qandaentry xml:id="saxRdbms"> + <question> + <para>Implement the example given in <xref + linkend="saxRdbmsAccessPrinciple"/> to produce the output + sketched in <xref linkend="saxPriceOut"/>. 
You may start by + implementing <emphasis>and testing</emphasis> the following + methods of an RDBMS interfacing class using <xref + linkend="glo_JDBC"/>:</para> + + <programlisting language="none">package sax.rdbms; + +public class RdbmsAccess { + + public void connect(final String host, final int port, + final String userName, final String password) { + // <emphasis role="bold">open connection to a database</emphasis> + } + public String readPrice(final String articleNumber) { + return "0"; // <emphasis role="bold">To be implemented as access to a ResultSet object</emphasis> + } + public void close() { + // <emphasis role="bold">close database connection</emphasis> + } +}</programlisting> + + <para>You may find it helpful to write a small testbed for + the RDBMS access functionality prior to integrating it into + your <acronym + xlink:href="">SAX</acronym> + application producing HTML output.</para> + </question> + + <answer> + <para>We start by creating a suitable RDBMS table:</para> + + <programlisting language="none">CREATE SCHEMA AUTHORIZATION midb2 +CREATE TABLE Product( + orderNo CHAR(10) NOT NULL PRIMARY KEY + ,price DECIMAL (9,2) NOT NULL +)</programlisting> + + <para>Next we insert some toy data:</para> + + <programlisting language="none">INSERT INTO Product VALUES('x-223', 330.20); +INSERT INTO Product VALUES('w-124', 110.40);</programlisting> + + <para>Now we implement our RDBMS access class:</para> + + <programlisting language="none">package dom.xsl; +...
+public class DbAccess { + + public void connect(final String jdbcUrl, + final String userName, final String password) { + try { + conn = DriverManager.getConnection(jdbcUrl, userName, password); + priceQuery = conn.prepareStatement(sqlPriceQuery); + } catch (SQLException e) { + System.err.println("Unable to open connection to database:" + e);} + } + public String readPrice(final String articleNumber) { + String result; + try { + priceQuery.setString(1, articleNumber); + final ResultSet rs = priceQuery.executeQuery(); + if (rs.next()) { + result = rs.getString("price"); + } else { + result = "No price available for article '" + articleNumber + "'"; + } + } catch (SQLException e) { + result = "Error reading price for article '" + articleNumber + "':" + e; + } + return result; + } + public void close() { + try {conn.close();} catch (SQLException e) { + System.err.println("Error closing database connection:" + e); + } + } + static { + try { Class.forName(""); + } catch (ClassNotFoundException e) { + System.err.println("Unable to register Driver:" + e);} + } + private static final String sqlPriceQuery = + "SELECT price FROM Product WHERE orderNo = ?"; + private PreparedStatement priceQuery = null; + private Connection conn = null; +}</programlisting> + + <para>This access layer may be tested independently from + handling catalog instances:</para> + + <programlisting language="none">package dom.xsl; + +public class DbAccessDriver { + + public static void main(String[] args) { + final DbAccess dbaccess = new DbAccess(); + dbaccess.connect("jdbc:db2://", + "midb2", "password"); + System.out.println(dbaccess.readPrice("x-223")); + System.out.println(dbaccess.readPrice("..aaargh!")); + dbaccess.close(); + } +}</programlisting> + + <para>If the above test succeeds we may embed the RDBMS + access layer into our <acronym + xlink:href="">SAX</acronym> + handler:</para> + + <programlisting language="none">package sax.rdbms; +...
+public class HtmlEventHandler extends DefaultHandler{ + public void startDocument() { + dbaccess.connect("jdbc:db2://", + "midb2", "password"); + System.out.println("<html><head><title>Catalog</title></head>"); + } + public void endDocument() { + System.out.println("</html>"); + dbaccess.close(); + } + public void startElement(String namespaceUri, String localName, + String rawName, Attributes attrs){ + if (rawName.equals("catalog")){ + System.out.println("<body><H1>A catalog</H1>" + +"<table border='1'><tbody>"); + System.out.println("<tr><th>Order number</th>\n" + + "<th>Price</th>\n" + +" <th>Product</th></tr>"); + } else if (rawName.equals("item")){ + final String orderNo = attrs.getValue("orderNo"); + System.out.print("<tr><td>" + orderNo + + "</td>\n<td>" + dbaccess.readPrice(orderNo) + + "</td>\n<td>"); + } else { + System.err.println("Element '" + rawName + "' unknown"); + } + } + public void endElement(String namespaceUri, String localName, + String rawName) { + if (rawName.equals("catalog")){ + System.out.println("</tbody></table>"); + } else if (rawName.equals("item")){ + System.out.println("</td></tr>\n"); + } + } + public void characters(char[] ch, int start, int length) { + System.out.print(new String(ch, start, length)); + } + private DbAccess dbaccess = new DbAccess(); +}</programlisting> + </answer> + </qandaentry> + </qandadiv> + </qandaset> + </section> + </section> + </section> + </chapter> + diff --git a/Sda1/prerequisites.xml b/Sda1/prerequisites.xml new file mode 100644 index 000000000..7c7f3235f --- /dev/null +++ b/Sda1/prerequisites.xml @@ -0,0 +1,753 @@ +<?xml version="1.0" encoding="UTF-8"?> +<chapter version="5.0" xml:id="prerequisites" + xmlns="" + xmlns:xlink="" + xmlns:xi="" + xmlns:svg="" + xmlns:m="" + xmlns:html="" + xmlns:db=""> + <title>Prerequisites</title> + + <section xml:id="resources"> + <title>Lecture resources</title> + + <glosslist> + <glossentry> + <glossterm>Recommended books</glossterm> + + <glossdef> + <itemizedlist> 
+ <listitem> + <para><xref linkend="bib_fawcett2012"/></para> + </listitem> + + <listitem> + <para><xref linkend="bib_Walmsley02"/></para> + </listitem> + </itemizedlist> + </glossdef> + </glossentry> + + <glossentry> + <glossterm>Lecture notes as PDF</glossterm> + + <glossdef> + <para><uri + xlink:href=""></uri></para> + + <caution> + <para>Some figures and videos are left blank.</para> + </caution> + </glossdef> + </glossentry> + + <glossentry> + <glossterm>Live lecture additions</glossterm> + + <glossdef> + <para><link + xlink:href=""></link></para> + </glossdef> + </glossentry> + + <glossentry> + <glossterm>List of exercises</glossterm> + + <glossdef> + <para>The lecture notes contain exercises to be solved by you! A + complete list is available at <uri + xlink:href=""></uri>.</para> + + <para>You may also want to use the corresponding PDF version of the + above table within <filename + xlink:href="">printversion.pdf</filename> + to keep track of your personal advances by filling in your + completion status on individual exercises.</para> + </glossdef> + </glossentry> + + <glossentry> + <glossterm><xref linkend="glo_Javadoc"/> references and source + code</glossterm> + + <glossdef> + <para>The lecture notes contain a lot of <xref + linkend="glo_Javadoc"/> references. Most classes appearing within + these lecture notes have <xref linkend="glo_Javadoc"/> generated + links to the source code as well. For example when clicking on the + class name in <classname>sda.jdbc.intro.v1.SimpleInsert</classname> + you will see the complete implementation.</para> + </glossdef> + </glossentry> + + <glossentry> + <glossterm>Links to animated figures</glossterm> + + <glossdef> + <para>The lecture notes' online version contains links to <uri + xlink:href="">PDF + images</uri>. 
Clicking on <quote>Animated PDF Version</quote> takes + you to a referenced PDF which in full screen mode of Acrobat Reader + or <trademark>google-chrome</trademark> provides a slide like + animation.</para> + </glossdef> + </glossentry> + + <glossentry> + <glossterm><trademark>Virtualbox</trademark> image</glossterm> + + <glossdef> + <para>A <productname + xlink:href="">Virtualbox</productname> + image is available at <uri + xlink:href=""></uri> + <link + xlink:href=""></link>.</para> + + <caution> + <para>Access from networks being external to + <uri></uri> requires <acronym>VPN</acronym> + access.</para> + </caution> + + <para>It contains (hopefully) all related tools from the <link + xlink:href="">CSM</link> department's + lecture room Linux installation:</para> + + <itemizedlist> + <listitem> + <para>Eclipse J2EE version with <productname + xlink:href="">Database developer + tools</productname>, <productname + xlink:href="">git</productname>, <trademark + xlink:href="">Oxygenxml</trademark>, + <productname + xlink:href="">TestNG</productname> + and <productname + xlink:href="">svn</productname> + plugins installed.</para> + </listitem> + + <listitem> + <para>A running <productname + xlink:href="">Mysql</productname> server + preconfigured with user <quote><code>hdmuser</code></quote>, + password <quote><code>XYZ</code></quote> (<emphasis + role="bold">capital letters!</emphasis>) and database + <quote><code>hdm</code></quote>.</para> + </listitem> + + <listitem> + <para><productname + xlink:href="">Xmlmind XML + editor</productname> for visually editing technical documents + based on <productname + xlink:href="">docbook</productname> + or <productname + xlink:href="">DITA</productname>.</para> + </listitem> + </itemizedlist> + + <caution> + <para>This VM is only accessible from within the <orgname + xlink:href="">HdM</orgname> network. 
+ External downloads require <productname + xlink:href="">OpenVPN</productname>.</para> + </caution> + + <para>The virtual machine is based on the <productname + xlink:href="">Lubuntu</productname> fork of the + <productname xlink:href="">Ubuntu</productname> + Linux distribution for resource saving reasons.</para> + </glossdef> + </glossentry> + + <glossentry xml:id="oxygenLicenseKey"> + <glossterm><uri>Oxygen Xml Editor</uri> license key</glossterm> + + <glossdef> + <para>This is the only software component in this lecture requiring + a license. Your <orgname>HdM</orgname> affiliation entitles you to + use the <productname + xlink:href="">Oxygenxml</productname> software + for educational (non-commercial) purposes. The corresponding key is + available at <uri + xlink:href=""></uri>.</para> + + <para>This license key is compatible both with the standalone and + the eclipse plugin version of the product.</para> + + <caution> + <para>The license key's <abbrev + xlink:href="">ftp</abbrev> + URL is only accessible from within the <orgname + xlink:href="">HdM</orgname> network. + External access requires <link + xlink:href="">Vpn + activation</link>.</para> + </caution> + </glossdef> + </glossentry> + + <glossentry> + <glossterm>Source code of lecture resources</glossterm> + + <glossdef> + <para>The complete lecture sources are available from <link + xlink:href=""></link>.</para> + + <para>You may simply execute <quote><command + xlink:href="">git</command> + <option>clone</option> + <option></option> + <option>.</option></quote> to check out the master tree.</para> + </glossdef> + </glossentry> + + <glossentry> + <glossterm>Source code of exercises and examples</glossterm> + + <glossdef> + <para>These sources contain a subdirectory + <filename>ws/eclipse/Jdbc</filename> which can be imported as an + eclipse project. This allows for browsing solutions to the exercises + and executing sample applications. 
Import into eclipse works the + following way:</para> + + <itemizedlist> + <listitem> + <para>When starting eclipse choose + <filename>.../ws/eclipse</filename> as workspace.</para> + </listitem> + + <listitem> + <para>In eclipse click <quote>File --> Import --> General + --> Existing Projects into Workspace</quote>. After + re-selecting the current workspace + <filename>.../ws/eclipse</filename> the folder + <filename>Jdbc</filename> should be on the list of importable + projects.</para> + + <para>Depending on your eclipse installation you may have to + adjust the <xref linkend="glo_Java"/> system libraries. Right + click on your project root in the package explorer and choose + <quote>Build Path --> Configure Buildpath</quote>. The + <quote>JRE System Library</quote> entry in the + <quote>Libraries</quote> tab may have to be changed to suit your + eclipse installation. You may want to create a dummy + <xref linkend="glo_Java"/> project to find the correct + setting.</para> + </listitem> + </itemizedlist> + </glossdef> + </glossentry> + </glosslist> + </section> + + <section xml:id="tools"> + <title>Tools</title> + + <para>The subsequent sections describe tools that help you + carry out the exercises. These descriptions are suitable for current + Linux/Ubuntu systems. However these tools are available for + <trademark>Windows</trademark> or <trademark>Apple</trademark> systems as + well. For the latter some command line hints may have to be replaced by + using GUI based tools.</para> + + <para>You may want to use the <link + xlink:href="">corresponding</link> + <link xlink:href="">Virtualbox image</link> + containing a complete system avoiding installation hassles.
This should + work well on reasonably current hardware.</para> + + <section xml:id="eclipse"> + <title><productname + xlink:href="">JDK</productname> + and Eclipse</title> + + <para>So you would like to take the hard way rather than using <link + xlink:href="">the + virtualbox image</link>? Good! Real programmers tend to complicate + things!</para> + + <para>The Eclipse IDE will be used as the primary coding tool especially + for <xref linkend="glo_Java"/> and XML. Users may use different tools + such as <productname + xlink:href="">Netbeans</productname> or <productname + xlink:href="">XML-Spy</productname>. + There are however some caveats:</para> + + <itemizedlist> + <listitem> + <para>Certain functionalities may not be provided.</para> + </listitem> + + <listitem> + <para><orgname>HdM</orgname> staff support in case of trouble will + be limited to coding excluding tool support. In other words: You are + on your own!</para> + </listitem> + </itemizedlist> + + <para>Installation of eclipse requires a suitable <xref + linkend="glo_Java"/> Development Kit.</para> + + <caution> + <para>Your <productname + xlink:href="">JDK</productname> + selection may be affected by your system's hardware. On a 64 bit + system you may install either a 32 bit or a 64 bit <productname + xlink:href="">JDK</productname>. + If you subsequently install eclipse you must select the appropriate 32 + or 64 bit version matching your <productname + xlink:href="">JDK</productname> + choice.</para> + </caution> + + <para>Due to Oracle's (end-user unfriendly) licensing policy you may + have to install this component manually. For <productname + xlink:href="">Ubuntu</productname> and <productname + xlink:href="">Debian</productname> systems a + standard (package manager compatible) procedure is described at + <uri + xlink:href=""></uri>.
This boils down to (being executed as user root or preceded by + <command>sudo</command> <option>...</option>):</para> + + <programlisting language="none">add-apt-repository ppa:webupd8team/java +apt-get update +apt-get install oracle-jdk7-installer</programlisting> + + <para>During the installation process you will have to accept Oracle's + license terms. If you do so this information will be cached and you will + not be asked again when updating via <command>aptitude</command> + <option>update</option>; <command>aptitude</command> + <option>safe-upgrade</option>. After successful installation, + executing <command + xlink:href="">java</command> + <option>-version</option> in a shell should show something similar + to:</para> + + <programlisting language="none">goik@goiki:~$ <emphasis role="bold">java -version</emphasis> +java version "1.7.0_07" +Java(TM) SE Runtime Environment (build 1.7.0_07-b10) +Java HotSpot(TM) Server VM (build 23.3-b01, mixed mode)</programlisting> + + <para>The Eclipse IDE comes <link + xlink:href="">in various + flavours</link> depending on which plugins are already shipped. + For our purposes the <quote><productname>Eclipse + Classic</productname></quote> <xref linkend="glo_Java"/> edition is + sufficient. You may however want to install other flavours like + <quote><productname>Eclipse IDE for Java EE + Developers</productname></quote> if you require features beyond this + course's needs. Remember to download the correct 32 or 64 bit version + corresponding to your <productname + xlink:href="">JDK</productname>.</para> + + <para>Follow <uri + xlink:href=""></uri> + to install eclipse on your system.</para> + </section> + + <section xml:id="oxygenxmlInstall"> + <title><productname + xlink:href="">Oxygenxml</productname> plugin</title> + + <para>Go to <uri + xlink:href=""></uri>. + You may choose between the <quote>Plugin Update site</quote> and + <quote>Plugin zip distribution</quote> installation method.
The latter + allows for better long term eclipse plugin management and is being + described at</para> + + <para>There are two different ways to install Eclipse plugins:</para> + + <itemizedlist> + <listitem> + <para>Use Eclipse's built in Update manager by <link + xlink:href="">defining + a corresponding update site</link>.</para> + </listitem> + + <listitem> + <para>Unzip <filename></filename> in + a subfolder of <filename>.../eclipse/dropins</filename> and restart + eclipse (as root).</para> + </listitem> + </itemizedlist> + + <para>See <xref linkend="oxygenLicenseKey"/> for obtaining a license + key. You may as well install the standalone version of the Oxygen XML + Editor.</para> + </section> + + <section xml:id="erMaster"> + <title>ERMaster</title> + + <para>Visual editing of physical entity relationship diagrams. See <link + xlink:href="">installation + instructions</link> on top of an existing eclipse installation.</para> + </section> + + <section xml:id="testngInstall"> + <title><foreignphrase>TestNG</foreignphrase> plugin</title> + + <para>Some exercises require the TestNG plugin to be installed in the + Eclipse IDE. You may proceed in a similar way as in <uri + linkend="oxygenxmlInstall">Oxygenxml</uri>. According to <uri + xlink:href=""></uri> + the Eclipse URL being needed is + <quote></quote>.</para> + </section> + + <section xml:id="mysql"> + <title><productname + xlink:href="">Mysql</productname> Database + components</title> + + <para>We start by installing the <productname + xlink:href="">Mysql</productname> server:</para> + + <programlisting language="none">root@goiki:~# aptitude install mysql-server +The following NEW packages will be installed: + libdbd-mysql-perl{a} libdbi-perl{a} libnet-daemon-perl{a} libplrpc-perl{a} + mysql-client-5.5{a} mysql-server-5.5 +0 packages upgraded, 6 newly installed, 0 to remove and 0 not upgraded. +Need to get 0 B/17.8 MB of archives. After unpacking 63.2 MB will be used. +Do you want to continue? 
[Y/n/?]</programlisting> + + <para>Hit <keycap>Y - return</keycap> to start. During the installation + you will be asked for the <productname + xlink:href="">Mysql</productname> server's + <quote>root</quote> (Administrator) password:</para> + + <programlisting language="none">Package configuration + + + ┌───────────────────────────┤ Configuring mysql-server-5.5 ├────────────────────────────┐ + │ While not mandatory, it is highly recommended that you set a password for the MySQL │ + │ administrative "root" user. │ + │ │ + │ If this field is left blank, the password will not be changed. │ + │ │ + │ New password for the MySQL "root" user: │ + │ │ + │ ********_____________________________________________________________________________ │ + │ │ + │ <Ok> │ + │ │ + └───────────────────────────────────────────────────────────────────────────────────────┘ + + + </programlisting> + + <para>This has to be entered twice. Keep a <emphasis + role="bold">permanent</emphasis> record of this entry. Alternatively set + a bookmark to <uri + xlink:href=""></uri> + for later reference *** and don't blame me! ***.</para> + + <para>At this point we should be able to connect to our newly installed + server. We create a database <quote>hdm</quote> to be used for our + exercises:</para> + + <programlisting language="none">goik@goiki:~$ mysql -u root -p +Enter password: +Welcome to the MySQL monitor. Commands end with ; or \g. +Your MySQL connection id is 42 +Server version: 5.5.24-0ubuntu0.12.04.1 (Ubuntu) + +Copyright (c) 2000, 2011, Oracle and/or its affiliates. All rights reserved. + +Oracle is a registered trademark of Oracle Corporation and/or its +affiliates. Other names may be trademarks of their respective +owners. + +Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
+ +mysql> <emphasis role="bold">create database hdm;</emphasis> +Query OK, 1 row affected (0.00 sec)</programlisting> + + <para>Following <uri + xlink:href=""></uri> + we add a new user and grant full access to the newly created + database:</para> + + <programlisting language="none">goik@goiki:~$ mysql -u root -p +Enter password: + ... +mysql> CREATE USER 'hdmuser'@'localhost' IDENTIFIED BY 'XYZ'; +mysql> use hdm; +mysql> GRANT ALL PRIVILEGES ON *.* TO 'hdmuser'@'localhost' WITH GRANT OPTION; +mysql> FLUSH PRIVILEGES;</programlisting> + + <para>The next step is optional. The <productname + xlink:href="">Ubuntu</productname> <productname + xlink:href="">Mysql</productname> server default + configuration allows connections only via <varname>loopback</varname> + interface i.e. <varname>localhost</varname>. If you want your + <productname xlink:href="">Mysql</productname> + server to listen to the external network interface comment out the + bind-address parameter in <filename>/etc/mysql/my.cnf</filename>:</para> + + <programlisting language="none"># Instead of skip-networking the default is now to listen only on +# localhost which is more compatible and is not less secure. +# <emphasis role="bold">bind-address =</emphasis></programlisting> + + <para>Since we are dealing with <xref linkend="glo_Java"/> a <trademark + xlink:href="">JDBC</trademark> + driver is needed to connect Applications to our <productname + xlink:href="">Mysql</productname> server:</para> + + <programlisting language="none">root@goiki:~# aptitude install libmysql-java</programlisting> + + <para>This provides the file + /usr/share/java/mysql-connector-java-5.1.16.jar and two symbolic + links:</para> + + <programlisting language="none">goik@goiki:~$ cd /usr/share/java +goik@goiki:/usr/share/java$ ls -al mysql* +-rw-r--r-- 1 ... 2011 <emphasis role="bold">mysql-connector-java-5.1.16.jar</emphasis> +lrwxrwxrwx 1 ... 
2011 <emphasis role="bold">mysql-connector-java.jar -> mysql-connector-java-5.1.16.jar</emphasis> +lrwxrwxrwx 1 ... 2011 <emphasis role="bold">mysql.jar -> mysql-connector-java.jar</emphasis></programlisting> + </section> + </section> + + <section xml:id="lectureNotes"> + <title>Lecture related resources</title> + + <para>The sources for lecture notes and exercises are available from the + <orgname xlink:href="">MIB</orgname> + <productname xlink:href="">git</productname> + repository:</para> + + <para><uri + xlink:href=""></uri></para> + + <para>Check-out is straightforward:</para> + + <programlisting language="none">goik@goiki:~$ mkdir StructuredData;cd StructuredData + +goik@goiki:~/StructuredData$ git clone . +Cloning into '.'... +remote: Counting objects: 694, done +... +Resolving deltas: 100% (296/296), done.</programlisting> + + <para>After checkout an eclipse workspace holding the complete example + source code becomes visible:</para> + + <programlisting language="none">goik@goiki:~/StructuredData$ cd ws/eclipse +goik@goiki:~/StructuredData/ws/eclipse$ ls -al +insgesamt 16 +drwxr-xr-x 3 goik fb1prof 4096 Nov 8 22:04 . +drwxr-xr-x 4 goik fb1prof 4096 Nov 8 22:04 .. +-rw-r--r-- 1 goik fb1prof 11 Nov 8 22:04 .gitignore +<emphasis role="bold">drwxr-xr-x 6 goik fb1prof 4096 Nov 8 22:04 Jdbc</emphasis></programlisting> + + <para>The subdirectory <filename>Jdbc</filename> can be imported as an + eclipse project via File --> import --> General --> Existing + Projects into workspace. This should enable each participant to browse and + execute the examples being provided in the lecture notes. 
It also contains + a <productname xlink:href="">Mysql</productname> + driver in Jdbc/lib/mysql-connector-java-5.1.16.jar which is required to set + up a <trademark + xlink:href="">JDBC</trademark> + connection.</para> + </section> + + <section xml:id="repeatRelational"> + <title>Some notes on relational databases</title> + + <qandaset defaultlabel="qanda" xml:id="airlineRelationalSchema"> + <title>Airlines, airports and flights</title> + + <qandadiv> + <qandaentry> + <question> + <para>Implement a relational schema describing airlines, flights, + airports and their respective relationships:</para> + + <itemizedlist> + <listitem> + <para>Airline:</para> + + <itemizedlist> + <listitem> + <para>An informal unique name e.g. + <quote>Lufthansa</quote>.</para> + </listitem> + + <listitem> + <para>A unique <link + xlink:href="">ICAO + abbreviation</link>.</para> + </listitem> + </itemizedlist> + </listitem> + + <listitem> + <para>Destination</para> + + <itemizedlist> + <listitem> + <para>Full name like <quote>Frankfurt am Main + International</quote></para> + </listitem> + + <listitem> + <para>World airport code like <quote>FRA</quote>.</para> + </listitem> + </itemizedlist> + </listitem> + + <listitem> + <para>Flight</para> + + <itemizedlist> + <listitem> + <para>A unique flight number e.g. LH 4234</para> + </listitem> + + <listitem> + <para>The <quote>owning</quote> airline.</para> + </listitem> + + <listitem> + <para>originating airport</para> + </listitem> + + <listitem> + <para>destination airport</para> + </listitem> + + <listitem> + <para>Constraint: origin and destination must differ. + Hint: <productname>Mysql</productname> provides a + syntactical means to implement this constraint. It will + however not be enforced at runtime. 
Database vendors like + Oracle, IBM/DB2, <productname>Sybase</productname>, + <productname>Informix</productname> <abbrev>etc.</abbrev> + support this type of runtime integrity constraint + enforcement.</para> + </listitem> + </itemizedlist> + </listitem> + </itemizedlist> + + <para>Provide surrogate keys for all entities and provide names + for all constraints (<abbrev>e.g.</abbrev> defining + <code>CONSTRAINT _PK_XYZ PRIMARY KEY(...)</code> etc.).</para> + </question> + + <answer> + <programlisting language="sql">CREATE TABLE Airline ( + id INT NOT NULL + ,name CHAR(20) NOT NULL + ,airlineCode CHAR(5) NOT NULL + + ,CONSTRAINT _PK_Airline_id PRIMARY KEY(id) + ,CONSTRAINT _UN_Airline_name UNIQUE(name) + ,CONSTRAINT _UN_Airline_airlineCode UNIQUE(airlineCode) +); + +CREATE TABLE Destination ( + id INT NOT NULL + ,fullName CHAR(20) NOT NULL + ,airportCode CHAR(5) + + ,CONSTRAINT _PK_Destination_id PRIMARY KEY(id) + ,CONSTRAINT _UN_Destination_airportCode UNIQUE(airportCode) +); + +CREATE TABLE Flight ( + id INT NOT NULL + ,flightNumber CHAR(10) NOT NULL + ,airline INT NOT NULL REFERENCES Airline + ,origin INT NOT NULL REFERENCES Destination + ,destination INT NOT NULL REFERENCES Destination + + -- For yet unknown reasons the following alternative MySQL 5.1 syntax compatible + -- statements fail with message 'Cannot add foreign key constraint': + -- ,CONSTRAINT _FK_Flight_airline FOREIGN KEY(airline) REFERENCES Airline + -- ,CONSTRAINT _FK_Flight_origin FOREIGN KEY(origin) REFERENCES Destination + -- ,CONSTRAINT _FK_Flight_destination FOREIGN KEY(destination) REFERENCES Destination + + ,CONSTRAINT _PK_Flight_id PRIMARY KEY(id) + ,CONSTRAINT _UN_Flight_flightNumber UNIQUE(flightNumber) + ,CONSTRAINT _CK_Flight_origin_destination CHECK(NOT(origin = destination)) +);</programlisting> + </answer> + </qandaentry> + </qandadiv> + </qandaset> + </section> + + <section xml:id="toolingConfigJdbc"> + <title>Tooling: Configuring and using the <link + xlink:href="">Eclipse database + 
development</link> plugin</title> + + <para>For some basic SQL communications the Eclipse environment offers a + standard plugin (Database development). Establishing connections to a + specific database server generally requires prior installation of a + <trademark + xlink:href="">JDBC</trademark> + driver on the client side as being shown in the following video:</para> + + <figure xml:id="figureConfigJdbcDriver"> + <title>Adding a <trademark + xlink:href="">JDBC</trademark> + Driver for <productname + xlink:href="">Mysql</productname> to the database + plugin.</title> + + <mediaobject> + <videoobject> + <videodata fileref="Ref/Video/jdbcDriverConfig.mp4"/> + </videoobject> + </mediaobject> + </figure> + + <para>During the exercises the eclipse database development perspective + may be used to browse and structure SQL tables and data. The following + video demonstrates the configuration of a <trademark + xlink:href="">JDBC</trademark> + connection to a local (<varname>localhost</varname> network interface) + database server. With respect to the introduction given in <xref + linkend="mysql"/> we assume the existence of a database <code>hdm</code> + and a corresponding account <quote>hdmuser</quote> and password + <quote><code>XYZ</code></quote> (<emphasis role="bold">capital + letters!</emphasis>) on our database server.</para> + + <figure xml:id="figureConfigJdbcConnection"> + <title>Configuring a <trademark + xlink:href="">JDBC</trademark> + connection to a (local) <productname + xlink:href="">Mysql</productname> database + server.</title> + + <mediaobject> + <videoobject> + <videodata fileref="Ref/Video/jdbcConnection.mp4"/> + </videoobject> + </mediaobject> + </figure> + + <para>We are now ready to communicate with our database server. 
The last + video in this section shows some basic SQL tasks:</para> + + <figure xml:id="figureEclipseBasicSql"> + <title>Executing SQL statements, browsing schema and retrieving + data</title> + + <mediaobject> + <videoobject> + <videodata fileref="Ref/Video/eclipseBasicSql.mp4"/> + </videoobject> + </mediaobject> + </figure> + </section> +</chapter> diff --git a/Sda1/sax.xml b/Sda1/sax.xml new file mode 100644 index 000000000..70ce88838 --- /dev/null +++ b/Sda1/sax.xml @@ -0,0 +1,1614 @@ + <chapter xml:id="sax" version="5.0" xmlns="" + xmlns:xlink="" + xmlns:xi="" + xmlns:svg="" + xmlns:m="" + xmlns:html="" + xmlns:db=""> + + <title>XML APIs, the Simple API for XML (SAX)</title> + + <section xml:id="saxPrinciple"> + <title>The principle of a <acronym + xlink:href="">SAX</acronym> + application</title> + + <para>We are already familiar with transformations of XML document + instances to other formats. Sometimes the capabilities being offered + by a given transformation approach do not suffice for a given problem. + Obviously a general purpose programming language like <xref linkend="glo_Java"/> offers + superior means to perform advanced manipulations of XML document + trees.</para> + + <para>Before diving into technical details we present an example + exceeding the limits of our present transformation capabilities. We + want to format an XML catalog document with article descriptions to + HTML. 
The price information however shall reside in a database external + to the XML document, namely an RDBMS:</para> + + <figure xml:id="saxRdbmsAccessPrinciple"> + <title>Generating HTML from an XML document and an RDBMS.</title> + + <mediaobject> + <imageobject> + <imagedata fileref="Ref/Fig/saxxmlrdbms.fig" scale="65"/> + </imageobject> + </mediaobject> + </figure> + + <para>Our catalog might look like:</para> + + <figure xml:id="simpleCatalog"> + <title>A <xref linkend="glo_XML"/> based + catalog.</title> + + <programlisting language="none"><catalog> + <item orderNo="<emphasis role="bold">3218</emphasis>">Swinging headset</item> + <item orderNo="<emphasis role="bold">9921</emphasis>">200W Stereo Amplifier</item> +</catalog></programlisting> + </figure> + + <para>The RDBMS may hold some relation with a field + <code>orderNo</code> as primary key and a corresponding attribute like + <code>price</code>. In a real world application <code>orderNo</code> + should probably be an integer typed <code>IDENTITY</code> + attribute.</para> + + <figure xml:id="saxRdbmsSchema"> + <title>A relation containing price information.</title> + + <programlisting language="none">CREATE TABLE Product ( + orderNo CHAR(10) PRIMARY KEY + ,price Money +); + +INSERT INTO Product VALUES('<emphasis role="bold">3218</emphasis>', 42.57); +INSERT INTO Product VALUES('<emphasis role="bold">9921</emphasis>', 121.50);</programlisting> + + <caption> + <para>Prices depend on article numbers.</para> + </caption> + </figure> + + <para>The intended HTML output with order numbers being highlighted + looks like:</para> + + <figure xml:id="saxPriceOut"> + <title>HTML generated output.</title> + + <programlisting language="none"><!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"> + <html> + <head><title>Available products</title></head> + <body> + <table border="1"> + <tbody> + <tr> + <th><emphasis role="bold">Order number</emphasis></th> + <th>Price</th> + <th>Product</th> + </tr> + <tr> + <td><emphasis 
role="bold">3218</emphasis></td> + <td>42,57</td> + <td>Swinging headset</td> + </tr> + <tr> + <td><emphasis role="bold">9921</emphasis></td> + <td>121,50</td> + <td>200W Stereo Amplifier</td> + </tr> + </tbody> + </table> + </body> + </html></programlisting> + + <caption> + <para>This result HTML document contains content both from our XML + document and from the database table <code>Product</code>.</para> + </caption> + </figure> + + <para>The intended transformation is beyond the XSLT standard's + processing capabilities: XSLT does not enable us to access RDBMS content. + However some XSLT processors provide extensions for this task.</para> + + <para>It is tempting to write a <xref linkend="glo_Java"/> application + which might use e.g. <trademark + xlink:href="">JDBC</trademark> + for database access. But how do we actually read and parse an XML file? + Sticking to the <xref linkend="glo_Java"/> standard we + might use a <link + xlink:href="">FileInputStream</link> + instance to read from <code>catalog.xml</code> and write an XML parser + ourselves. Fortunately <orgname>SUN</orgname>'s <trademark + xlink:href="">JDK</trademark> + already includes an API denoted <acronym + xlink:href="">SAX</acronym>, the + <emphasis>S</emphasis>imple <emphasis>A</emphasis>pi for + <emphasis>X</emphasis>ml. The <productname + xlink:href="">JDK</productname> + also includes a corresponding parser implementation. 
In addition there + are third party <acronym + xlink:href="">SAX</acronym> parser + implementations available like <productname + xlink:href="">Xerces</productname> from the + <orgname xlink:href="">Apache + Foundation</orgname>.</para> + + <para>The <acronym + xlink:href="">SAX</acronym> API is event + based and will be illustrated by the relationship between customers + and a software vendor company:</para> + + <mediaobject> + <imageobject> + <imagedata fileref="Ref/Fig/updateinfo.fig"/> + </imageobject> + </mediaobject> + + <para>After purchasing software customers are asked to register their + software. This way the vendor receives the customer's address. Each + time a new release is completed all registered customers will + receive a notification typically including a <quote>special + offer</quote> to upgrade their software. From an abstract point of + view the following two actions take place:</para> + + <variablelist> + <varlistentry> + <term>Registration</term> + + <listitem> + <para>The customer registers itself at the company's site + indicating its interest in updated versions.</para> + </listitem> + </varlistentry> + + <varlistentry> + <term>Notification</term> + + <listitem> + <para>Upon completion of each new software release (considered + to be an <emphasis>event</emphasis>) a message is sent to all + registered customers.</para> + </listitem> + </varlistentry> + </variablelist> + + <para>The same principle applies to GUI applications in software + development. A key press <emphasis>event</emphasis> for example will + be forwarded by an application's <emphasis>event handler</emphasis> to + a callback function (sometimes called a <emphasis>handler</emphasis> + method) being implemented by an application developer. The <acronym + xlink:href="">SAX</acronym> API works the + same way: A parser reads an XML document generating events which + <emphasis>may</emphasis> be handled by an application. 
During document + parsing the XML tree structure gets <quote>flattened</quote> to a + sequence of events:</para> + + <figure xml:id="saxFlattenEvent"> + <title>Parsing a XML document creates a corresponding sequence of + events.</title> + + <mediaobject> + <imageobject> + <imagedata fileref="Ref/Fig/saxmodel.pdf"/> + </imageobject> + </mediaobject> + </figure> + + <para>An application may register components to the parser:</para> + + <figure xml:id="figureSax"> + <title><acronym xlink:href="">SAX</acronym> + Principle</title> + + <mediaobject> + <imageobject> + <imagedata fileref="Ref/Fig/saxapparch.pdf"/> + </imageobject> + + <caption> + <para>A <acronym + xlink:href="">SAX</acronym> application + consists of a <acronym + xlink:href="">SAX</acronym> parser and + an implementation of event handlers being specific to the + application. The application is developed by implementing the + two handlers.</para> + </caption> + </mediaobject> + </figure> + + <para>An Error Handler is required since the XML stream may contain + errors. 
In order to implement a <acronym + xlink:href="">SAX</acronym> application we + have to:</para> + + <orderedlist> + <listitem> + <para>Instantiate required objects:</para> + + <itemizedlist> + <listitem> + <para>Parser</para> + </listitem> + + <listitem> + <para>Event Handler</para> + </listitem> + + <listitem> + <para>Error Handler</para> + </listitem> + </itemizedlist> + </listitem> + + <listitem> + <para>Register handler instances</para> + + <itemizedlist> + <listitem> + <para>register Event Handler to Parser</para> + </listitem> + + <listitem> + <para>register Error Handler to Parser</para> + </listitem> + </itemizedlist> + </listitem> + + <listitem> + <para>Start the parsing process by calling the parser's + appropriate method.</para> + </listitem> + </orderedlist> + </section> + + <section xml:id="saxIntroExample"> + <title>First steps</title> + + <para>Our first <acronym + xlink:href="">SAX</acronym> toy application + <classname>sax.stat.v1.ElementCount</classname> shall simply count the + number of elements it finds in an arbitrary XML document. In addition + the <acronym xlink:href="">SAX</acronym> + events shall be written to standard output generating output sketched + in <xref linkend="saxFlattenEvent"/>. The application's central + implementation reads:</para> + + <figure xml:id="saxElementCount"> + <title>Counting XML elements.</title> + + <programlisting language="none">package sax.stat.v1; +... 
+ +public class ElementCount { + + public void parse(final String uri) { + try { + final SAXParserFactory saxPf = SAXParserFactory.newInstance(); + final SAXParser saxParser = saxPf.newSAXParser(); + saxParser.parse(uri, eventHandler); + } catch (ParserConfigurationException e){ + e.printStackTrace(System.err); + } catch (org.xml.sax.SAXException e) { + e.printStackTrace(System.err); + } catch (IOException e){ + e.printStackTrace(System.err); + } + } + + public int getElementCount() { + return eventHandler.getElementCount(); + } + private final MyEventHandler eventHandler = new MyEventHandler(); +}</programlisting> + + <caption> + <para>This application works for arbitrary well-formed XML + documents.</para> + </caption> + </figure> + + <para>We now explain this application in detail. The first part deals + with the instantiation of a parser:</para> + + <programlisting language="none">try { + final SAXParserFactory saxPf = <emphasis role="bold">SAXParserFactory</emphasis>.newInstance(); + final SAXParser saxParser = saxPf.newSAXParser(); + saxParser.parse(uri, eventHandler); +} catch (ParserConfigurationException e){ + e.printStackTrace(System.err); +} ...</programlisting> + + <para>In order to keep an application independent from a specific + parser implementation the <acronym + xlink:href="">SAX</acronym> uses the so + called <link + xlink:href="">Abstract + Factory Pattern</link> instead of simply calling a constructor from a + vendor specific parser class.</para> + + <para>In order to be useful the parser has to be instructed to do + something meaningful when a XML document gets parsed. 
For this purpose + our application supplies an event handler instance:</para> + + <programlisting language="none">public void parse(final String uri) { + try { + final SAXParserFactory saxPf = SAXParserFactory.newInstance(); + final SAXParser saxParser = saxPf.newSAXParser(); + saxParser.parse(uri, <emphasis role="bold">eventHandler</emphasis>); + } catch (org.xml.sax.SAXException e) { + ... + private final MyEventHandler <emphasis role="bold">eventHandler = new MyEventHandler()</emphasis>; +}</programlisting> + + <para>What does the event handler actually do? It offers methods to + the parser being callable during the parsing process:</para> + + <programlisting language="none">package sax.stat.v1; +... +public class MyEventHandler extends <classname>org.xml.sax.helpers.DefaultHandler</classname> { + + public void <emphasis role="bold"><emphasis role="bold">startDocument()</emphasis></emphasis><co + xml:id="programlisting_eventhandler_startDocument"/> { + System.out.println("Opening Document"); + } + public void <emphasis role="bold">endDocument()</emphasis><co + xml:id="programlisting_eventhandler_endDocument"/> { + System.out.println("Closing Document"); + } + public void <emphasis role="bold">startElement(String namespaceUri, String localName, String rawName, + Attributes attrs)</emphasis> <co + xml:id="programlisting_eventhandler_startElement"/>{ + System.out.println("Opening \"" + rawName + "\""); + elementCount++; + } + public void <emphasis role="bold">endElement(String namespaceUri, String localName, + String rawName)</emphasis><co + xml:id="programlisting_eventhandler_endElement"/>{ + System.out.println("Closing \"" + rawName + "\""); + } + public void <emphasis role="bold">characters(char[] ch, int start, int length)</emphasis><co + xml:id="programlisting_eventhandler_characters"/>{ + System.out.println("Content \"" + new String(ch, start, length) + '"'); + } + public int getElementCount() <co + xml:id="programlisting_eventhandler_getElementCount"/>{ + 
return elementCount; + } + private int elementCount = 0; +}</programlisting> + + <calloutlist> + <callout arearefs="programlisting_eventhandler_startDocument"> + <para>This method gets called exactly once namely when opening the + XML document as a whole.</para> + </callout> + + <callout arearefs="programlisting_eventhandler_endDocument"> + <para>After successfully parsing the whole document instance this + method will finally be called.</para> + </callout> + + <callout arearefs="programlisting_eventhandler_startElement"> + <para>This method gets called each time a new element is parsed. + In the given catalog.xml example it will be called three times: + First when the <tag class="starttag">catalog</tag> appears and + then once for each of the two <item ... > elements. The supplied + parameters depend on whether or not namespace processing is + enabled.</para> + </callout> + + <callout arearefs="programlisting_eventhandler_endElement"> + <para>Called each time an element like <tag class="starttag">item + ...</tag> gets closed by its counterpart <tag + class="endtag">item</tag>.</para> + </callout> + + <callout arearefs="programlisting_eventhandler_characters"> + <para>This method is responsible for the treatment of textual + content i.e. handling <code>#PCDATA</code> element content. 
We + will explain its uncommon signature a little bit later.</para> + </callout> + + <callout arearefs="programlisting_eventhandler_getElementCount"> + <para><function>getElementCount()</function> is a getter method to + read only access the private field <varname>elementCount</varname> + which gets incremented in <coref + linkend="programlisting_eventhandler_startElement"/> each time an + XML element opens.</para> + </callout> + </calloutlist> + + <para>The call <code>saxParser.parse(uri, eventHandler)</code> + actually initiates the parsing process and tells the parser to:</para> + + <itemizedlist> + <listitem> + <para>Open the XML document being referenced by the URI + argument.</para> + </listitem> + + <listitem> + <para>Forward XML events to the event handler instance supplied by + the second argument.</para> + </listitem> + </itemizedlist> + + <para>A driver class containing a <code>main(...)</code> method may + start the whole process and print out the desired number of elements + upon completion of a parsing run:</para> + + <programlisting language="none">package sax.stat.v1; + +public class ElementCountDriver { + public static void main(String argv[]) { + ElementCount xmlStats = new ElementCount(); + xmlStats.parse("<emphasis role="bold">Input/Sax/catalog.xml</emphasis>"); + System.out.println("Document contains " + xmlStats.<emphasis role="bold">getElementCount()</emphasis> + " elements"); + } +}</programlisting> + + <para>Processing the catalog example instance yields:</para> + + <programlisting language="none">Opening Document +<emphasis role="bold">Opening "catalog"</emphasis> <co + xml:id="programlisting_catalog_output"/> +Content " + " +<emphasis role="bold">Opening "item"</emphasis> <co + xml:id="programlisting_catalog_item1"/> +Content "Swinging headset" +Closing "item" +Content " + " +<emphasis role="bold">Opening "item"</emphasis> <co + xml:id="programlisting_catalog_item2"/> +Content "200W Stereo Amplifier" +Closing "item" +Content " +" +Closing 
"catalog" +Closing Document +<emphasis role="bold">Document contains 3 elements</emphasis> <co + xml:id="programlisting_catalog_elementcount"/></programlisting> + + <calloutlist> + <callout arearefs="programlisting_catalog_output"> + <para>Start parsing element <tag + class="starttag">catalog</tag>.</para> + </callout> + + <callout arch="" arearefs="programlisting_catalog_item1"> + <para>Start parsing element <tag class="starttag">item + orderNo="3218"</tag>Swinging headset<tag class="endtag" + role="">item</tag>.</para> + </callout> + + <callout arch="" arearefs="programlisting_catalog_item2"> + <para>Start parsing element <tag class="starttag">item + orderNo="9921"</tag>200W Stereo Amplifier<tag class="endtag" + role="">item</tag>.</para> + </callout> + + <callout arearefs="programlisting_catalog_elementcount"> + <para>After the parsing process has completed the application + outputs the number of elements being counted so far.</para> + </callout> + </calloutlist> + + <para>The output contains some lines of <quote>empty</quote> content. + This content is due to whitespace being located between elements. For + example a newline appears between the the <tag + class="starttag">catalog</tag> and the first <tag + class="starttag">item</tag> element. The parser encapsulates this + whitespace in a call to the <link + xlink:href="[],%20int,%20int)">characters</link> + method. In an application this call will typically be ignored. XML + document instances in a professional context will typically not + contain any newline characters at all. Instead the whole document is + represented as a single line. This inhibits human readability which is + not required if the processing applications work well. In this case + empty content as above will not appear.</para> + + <para>The <code>characters(char[] ch, int start, int length)</code> + method's signature looks somewhat strange regarding <xref linkend="glo_Java"/> conventions. 
One might expect <code>characters(String s)</code>. But this way the + <acronym xlink:href="">SAX</acronym> API + allows efficient parser implementations: A parser may initially + allocate a reasonably large <code>char</code> array of say 128 bytes + sufficient to hold 64 (<link + xlink:href="">Unicode</link>) characters. If this + buffer gets exhausted the parser might allocate a second buffer of + double size thus implementing an <quote>amortized doubling</quote> + algorithm:</para> + + <mediaobject> + <imageobject> + <imagedata fileref="Ref/Fig/saxcharacter.pdf"/> + </imageobject> + </mediaobject> + + <para>In this example the first element content fits in the first + buffer. The second content <code>200W Stereo Amplifier</code> and the + third content <code>Earphone</code> both fit in the second buffer. + Subsequent content may require further buffer allocations. Such a + strategy minimizes the number of time consuming <code>new </code> + <link + xlink:href="">String</link> + <code>(...)</code> constructor calls being necessary for the more + convenient API variant <code>characters(String s)</code>.</para> + </section> + + <section xml:id="saxRegistry"> + <title>Event- and error handler registration</title> + + <para>Our first <acronym + xlink:href="">SAX</acronym> application + suffers from the following deficiencies:</para> + + <itemizedlist> + <listitem> + <para>The error handling is very sparse. It completely relies on + exceptions like <link + xlink:href="">SAXException</link> + which frequently do not supply meaningful error + information.</para> + </listitem> + + <listitem> + <para>The application is not aware of namespaces. Thus reading + e.g. 
<abbrev xlink:href="">XSL</abbrev> + document instances will not allow us to distinguish between elements + from different namespaces like HTML.</para> + </listitem> + + <listitem> + <para>The parser will not validate a document instance against a + schema even if one is present.</para> + </listitem> + </itemizedlist> + + <para>We now incrementally add these features to the <acronym + xlink:href="">SAX</acronym> parsing process. + <acronym xlink:href="">SAX</acronym> offers + an interface <link + xlink:href="">XMLReader</link> + to conveniently <emphasis>register</emphasis> event- and error handler + instances independently instead of passing both interfaces as a single + argument to the <link + xlink:href=",%20org.xml.sax.helpers.DefaultHandler)">parse</link> + method. We first code an error handler class by implementing the + interface <classname>org.xml.sax.ErrorHandler</classname> being part + of the <acronym xlink:href="">SAX</acronym> + API:</para> + + <programlisting language="none">package sax.stat.v2; +... +public class MyErrorHandler implements ErrorHandler { + + <emphasis role="bold">public void warning(SAXParseException e)</emphasis> { + System.err.println("[Warning]" + getLocationString(e)); + } + <emphasis role="bold">public void error(SAXParseException e)</emphasis> { + System.err.println("[Error]" + getLocationString(e)); + } + <emphasis role="bold">public void fatalError(SAXParseException e)</emphasis> throws SAXException{ + System.err.println("[Fatal Error]" + getLocationString(e)); + } + private String getLocationString(SAXParseException e) { + return " line " + e.getLineNumber() + + ", column " + e.getColumnNumber()+ ":" + e.getMessage(); + } +}</programlisting> + + <para>These three methods represent the + <classname>org.xml.sax.ErrorHandler</classname> interface. The method + <function>getLocationString</function> is used to supply precise + parsing error locations by means of line- and column numbers within a + document instance. 
If errors or warnings are encountered the parser + will call the appropriate public method:</para> + + <figure xml:id="saxMissItem"> + <title>A non well formed document.</title> + + <programlisting language="none"><?xml version="1.0" encoding="UTF-8"?> +<catalog> + <item orderNo="3218">Swinging headset</item> + <item orderNo="9921">200W Stereo Amplifier +</catalog></programlisting> + + <caption> + <para>This document is not well formed since a + closing <tag class="endtag">item</tag> tag is missing.</para> + </caption> + </figure> + + <para>Our error handler method gets called yielding an informative + message:</para> + + <programlisting language="none">[Fatal Error] line 5, column -1:Expected "</item>" to terminate +element starting on line 4.</programlisting> + + <para>This error output is achieved by + <emphasis>registering</emphasis> an instance of + <classname>sax.stat.v2.MyErrorHandler</classname> to the parser prior + to starting the parsing process. In the following code snippet we also + register a content handler instance to the parser and thus separate + the parser's configuration from its invocation:</para> + + <programlisting language="none">package sax.stat.v2; +... 
+public class ElementCount { + public ElementCount() + throws SAXException, ParserConfigurationException{ + final SAXParserFactory saxPf = SAXParserFactory.newInstance(); + final SAXParser saxParser = saxPf.newSAXParser(); + xmlReader = saxParser.getXMLReader(); + xmlReader.setContentHandler(eventHandler); <co + xml:id="programlisting_assemble_parser_setcontenthandler"/> + xmlReader.setErrorHandler(errorHandler); <co + xml:id="programlisting_assemble_parser_seterrorhandler"/> + } + public void parse(final String uri) + throws IOException, SAXException{ + xmlReader.parse(uri); <co + xml:id="programlisting_assemble_parser_invokeparse"/> + } + public int getElementCount() { + return eventHandler.getElementCount(); <co + xml:id="programlisting_assemble_parser_getelementcount"/> + } + private final XMLReader xmlReader; + private final MyEventHandler eventHandler = new MyEventHandler(); <co + xml:id="programlisting_assemble_parser_createeventhandler"/> + private final MyErrorHandler errorHandler = new MyErrorHandler(); <co + xml:id="programlisting_assemble_parser_createerrorhandler"/> +}</programlisting> + + <calloutlist> + <callout arearefs="programlisting_assemble_parser_setcontenthandler programlisting_assemble_parser_seterrorhandler"> + <para>Referring to <xref linkend="figureSax" os=""/> these two + calls attach the event- and error handler objects to the parser + thus implementing the two arrows from the parser to the + application's implementation.</para> + </callout> + + <callout arearefs="programlisting_assemble_parser_invokeparse"> + <para>The parser is invoked. 
Note that in this example we only + pass a document's URI but no reference to a handler object.</para> + </callout> + + <callout arearefs="programlisting_assemble_parser_getelementcount"> + <para>The method <function>getElementCount()</function> is needed + to allow a calling object to access the private + <varname>eventHandler</varname> object's + <function>getElementCount()</function> method.</para> + </callout> + + <callout arearefs="programlisting_assemble_parser_createeventhandler programlisting_assemble_parser_createerrorhandler"> + <para>An event handling and an error handling object are created + to handle events during the parsing process.</para> + </callout> + </calloutlist> + + <para>The careful reader might notice a subtle difference between the + content- and the error handler implementation: The class + <classname>sax.stat.v2.MyErrorHandler</classname> implements the + interface <classname>org.xml.sax.ErrorHandler</classname>. But + <classname>sax.stat.v2.MyEventHandler</classname> is derived from + <classname>org.xml.sax.helpers.DefaultHandler</classname> which itself + implements the <classname>org.xml.sax.ContentHandler</classname> + interface. Actually one might as well start from the latter interface, + which requires implementing all of its 11 methods. In most circumstances + this only complicates the application's code since it is unnecessary + to react to events belonging for example to processing instructions. 
+ For this reason it is good coding practice to use the empty default + implementations in + <classname>org.xml.sax.helpers.DefaultHandler</classname> and to + redefine only those methods corresponding to events actually being + handled by the application in question.</para> + + <qandaset defaultlabel="qanda" xml:id="sda1SaxReadAttributes"> + <title>SAX and attribute values</title> + + <qandadiv> + <qandaentry> + <question> + <label>Reading an element's set of attributes.</label> + + <para>The example document instance includes <tag + class="attribute">orderNo</tag> attribute values for each <tag + class="starttag">item</tag> element. The parser does not yet + show these attribute keys and their corresponding values. Read + the documentation for <classname + xlink:href="">org.xml.sax.Attributes</classname> + and extend the given code to use it.</para> + + <para>You should start from the <xref linkend="glo_MIB"/> + Maven archetype <code>mi-maven-archetype-sax</code>. + Configuration hints are available at <uri + xlink:href=""></uri>.</para> + </question> + + <answer> + <para>For the given example it would suffice to read the known + <tag class="attribute">orderNo</tag> attribute's value. 
A + generic solution may ask for the set of all defined attributes + and show their values:</para> + + <programlisting language="none">package sax; + +import org.xml.sax.Attributes; +import org.xml.sax.helpers.DefaultHandler; + +public class AttribEventHandler extends DefaultHandler { + + @Override + public void startElement(String namespaceUri, String localName, + String rawName, Attributes attrs) { + System.out.println("Opening Element " + rawName); + for (int i = 0; i < attrs.getLength(); i++){ + System.out.println(attrs.getQName(i) + "=" + attrs.getValue(i) + "\n"); + } + } +}</programlisting> + </answer> + </qandaentry> + </qandadiv> + </qandaset> + + <section xml:id="sda1SecElementLists"> + <title>The set of element names</title> + + <qandaset defaultlabel="qanda" xml:id="sda1QandaElementNames"> + <title>Element lists of arbitrary XML documents.</title> + + <qandadiv> + <qandaentry> + <question> + <para>We reconsider the simple application reading arbitrary + XML documents and providing a list of XML elements being + contained within:</para> + + <programlisting language="none">Opening Document +<emphasis role="bold">Opening "catalog"</emphasis> +Content " + " +<emphasis role="bold">Opening "item"</emphasis> +Content "Swinging headset" +Closing "item" +Content " ...</programlisting> + + <para>If an element like <tag + class="starttag">item</tag> appears multiple times it will + also be written to standard output multiple times.</para> + + <para>We are now interested in the list of all element + names present in an arbitrary XML document. 
Consider + the following example:</para> + + <programlisting language="none"><memo> + <from> + <name>Martin</name> + <surname>Goik</surname> + </from> + <to> + <name>Adam</name> + <surname>Hacker</surname> + </to> + <to> + <name>Eve</name> + <surname>Intruder</surname> + </to> + <date year="2005" month="1" day="6"/> + <subject>Firewall problems</subject> + <content> + <para>Thanks for your excellent work.</para> + <para>Our firewall is definitely broken!</para> + </content> +</memo></programlisting> + + <para>The elements <tag class="starttag">to</tag>, <tag + class="starttag">name</tag>, <tag + class="starttag">surname</tag> and <tag + class="starttag">para</tag> all appear multiple times. + Write a SAX application which processes arbitrary XML + documents and creates an alphabetically sorted list of + the element names being contained <emphasis role="bold">excluding + duplicates</emphasis>. The intended output for the above + example is:</para> + + <programlisting language="none">List of elements: {content date from memo name para subject surname to }</programlisting> + + <para>The corresponding handler should be implemented in a + re-usable way. Thus if different XML documents are being + handled in succession the list of elements should be erased + prior to processing the current document. 
Hints:</para> + + <itemizedlist> + <listitem> + <para>Use a <classname>java.util.SortedSet</classname> + instance to collect element names thereby excluding + duplicates.</para> + </listitem> + + <listitem> + <para>The method + <methodname>sax.count.ListTagNamesHandler.startDocument()</methodname> + may be used to initialize your handler.</para> + </listitem> + </itemizedlist> + </question> + + <answer> + <para>A suitable handler reads:</para> + + <programlisting language="none">package sax.count; + +import java.util.SortedSet; +import java.util.TreeSet; + +import org.xml.sax.Attributes; +import org.xml.sax.SAXException; +import org.xml.sax.helpers.DefaultHandler; + +/** Collecting the names of all elements appearing in a document */ +public class ListTagNamesHandler extends DefaultHandler { + + // A SortedSet by definition does not contain any duplicates. + private final SortedSet<String> elementNames = new TreeSet<>(); + + @Override + public void startDocument() throws SAXException { + elementNames.clear(); // May contain elements from a previous run. + } + + @Override + public void startElement(String namespaceUri, String localName, + String rawName, Attributes attrs) { + // In case the current element name has already been inserted + // this method call will be silently ignored. + elementNames.add(rawName); + } + + /** + * @return A sorted list of element names of the currently processed XML + * document without duplicates. 
+ */ + public String[] getTagNames() { + return elementNames.toArray(new String[0]); + } +}</programlisting> + + <para>A complete application requires a driver:</para> + + <programlisting language="none">package sax.count; + +import javax.xml.parsers.SAXParser; +import javax.xml.parsers.SAXParserFactory; + +import org.xml.sax.XMLReader; + +import sax.stat.v2.MyErrorHandler; + +public class Driver { + + public static void main(String argv[]) throws Exception { + + final SAXParserFactory saxPf = SAXParserFactory.newInstance(); + final SAXParser saxParser = saxPf.newSAXParser(); + final XMLReader xmlReader = saxParser.getXMLReader(); + final ListTagNamesHandler handler = new ListTagNamesHandler(); + xmlReader.setContentHandler(handler); + xmlReader.setErrorHandler(new MyErrorHandler()); + xmlReader.parse("Input/Xml/Memo/message.xml"); + + System.out.print("List of elements: {"); + for (String elementName : handler.getTagNames()) { + System.out.print(elementName + " "); + } + System.out.println("}"); + } +}</programlisting> + </answer> + </qandaentry> + </qandadiv> + </qandaset> + </section> + + <section xml:id="sda1SaxView"> + <title>A limited view on a given XML document instance</title> + + <qandaset defaultlabel="qanda" xml:id="sda1QandamemoView"> + <title>A specific view on memo documents</title> + + <qandadiv> + <qandaentry> + <question> + <para>We reconsider the following memo instance:</para> + + <programlisting language="none"><memo> + <from> + <name>Martin</name> + <surname>Goik</surname> + </from> + <to> + <name>Adam</name> + <surname>Hacker</surname> + </to> + <to> + <name>Eve</name> + <surname>Intruder</surname> + </to> + <date year="2005" month="1" day="6"/> + <subject>Firewall problems</subject> + <content> + <para>Thanks for your excellent work.</para> + <para>Our firewall is definitely broken!</para> + </content> +</memo></programlisting> + + <para>Every memo instance does have exactly one sender and + one subject. 
Write a SAX application to achieve the + following output:</para> + + <programlisting language="none">Sender: Martin Goik +Subject: Firewall problems</programlisting> + + <para>Hint: The callback implementation of + <methodname>org.xml.sax.helpers.DefaultHandler.characters(char[],int,int)</methodname> + may be used to filter the desired output. You have to limit + its output to <tag class="starttag">from</tag> and <tag + class="starttag">subject</tag> descendant content. Taking + the <tag class="starttag">subject</tag>Firewall problems<tag + class="endtag">subject</tag> element as an example the + corresponding event sequence reads:</para> + + <informaltable border="1"> + <tr> + <th>Event</th> + + <th>Corresponding callback</th> + </tr> + + <tr> + <td>...</td> + + <td>...</td> + </tr> + + <tr> + <td>Opening <tag class="starttag">subject</tag> + element</td> + + <td>startElement(...)</td> + </tr> + + <tr> + <td>Firewall problems</td> + + <td>characters(...)</td> + </tr> + + <tr> + <td>Closing <tag class="endtag">subject</tag> + element</td> + + <td>endElement(...)</td> + </tr> + + <tr> + <td>...</td> + + <td>...</td> + </tr> + </informaltable> + + <para>Limiting output of our + <methodname>org.xml.sax.helpers.DefaultHandler.characters(char[],int,int)</methodname> + callback method can be achieved by introducing instance + scope boolean variables being set to true or false inside + your + <methodname>org.xml.sax.helpers.DefaultHandler.startElement(String + uri,String localName,String qName,org.xml.sax.Attributes + attributes)</methodname> and + <methodname>org.xml.sax.helpers.DefaultHandler.endElement(String + uri, String localName, String qName)</methodname> + implementations accordingly to keep track of the current + event state.</para> + </question> + + <answer> + <programlisting language="none">package sax.view; +... +/** A view on memo documents restricting to sender name and subject. 
*/ +public class MemoViewHandler extends DefaultHandler { + + // These variables help us to keep track of the current event state spanning + // each startElement(...) -- characters(...) -- endElement(...) event sequence + boolean inFromContext = false, + inSubjectContext = false; + + @Override + public void startElement(String namespaceUri, String localName, + String rawName, Attributes attrs) { + switch(rawName) { + case "from": + inFromContext = true; + System.out.print("Sender: "); + break; + case "subject": + inSubjectContext = true; + System.out.print("Subject: "); + break; + case "surname": + if (inFromContext) { + System.out.print(" "); // Adding additional space between <name> and <surname> content. + } + break; + } + } + + @Override + public void endElement(String uri, String localName, String rawName) + throws SAXException { + switch(rawName) { + case "from": + inFromContext = false; + System.out.println(); + break; + case "subject": + inSubjectContext = false; + System.out.println(); + break; + } + } + + @Override + public void characters(char[] ch, int start, int length) throws SAXException { + if (inFromContext || inSubjectContext) { + System.out.print(new String(ch, start, length)); + } + } +}</programlisting> + </answer> + </qandaentry> + </qandadiv> + </qandaset> + </section> + </section> + + <section xml:id="saxValidate"> + <title><acronym xlink:href="">SAX</acronym> + validation</title> + + <para>So far we only parsed well formed document instances. 
Our + current parser does not yet check whether an XML instance is valid:</para> + + <figure xml:id="saxNotValid"> + <title>An invalid XML document.</title> + + <programlisting language="none"><xs:element name="catalog"> + <xs:complexType> + <xs:sequence> + <xs:element ref="item"/> + </xs:sequence> + </xs:complexType> +</xs:element> + +<xs:element name="item"> + <xs:complexType mixed="true"> + <xs:attribute name="orderNo" type="xs:int" use="required"/> + </xs:complexType> +</xs:element></programlisting> + + <programlisting language="none"><catalog> + <item orderNo="3218">Swinging headset</item> + <item orderNo="9921">200W Stereo Amplifier</item> <emphasis + role="bold"><!-- second entry forbidden by schema --></emphasis> +</catalog></programlisting> + + <caption> + <para>In contrast to <xref linkend="saxMissItem"/> this document + is well formed. But it is not <emphasis + role="bold">valid</emphasis> with respect to the schema since more + than one <tag class="starttag">item</tag> element is + present.</para> + </caption> + </figure> + + <para>This document instance is well-formed but not valid: Only one + element <tag class="starttag">item</tag> is allowed due to an + ill-defined schema. Nevertheless the parser will not report any error or warning. + In order to enable validation we need to configure our parser:</para> + + <programlisting language="none">xmlReader.setFeature("", true);</programlisting> + + <para>The string <code></code> + serves as a key. Since this is an ordinary string value a parser may + or may not implement it. 
The <acronym + xlink:href="">SAX</acronym> standard defines + two exception classes for dealing with feature related errors:</para> + + <variablelist> + <varlistentry> + <term><link + xlink:href="">SAXNotRecognizedException</link></term> + + <listitem> + <para>The feature is not known to the parser.</para> + </listitem> + </varlistentry> + + <varlistentry> + <term><link + xlink:href="">SAXNotSupportedException</link></term> + + <listitem> + <para>The feature is known to the parser but the parser does not + support it or it does not support the specific value being + set.</para> + </listitem> + </varlistentry> + </variablelist> + + <para>The <productname + xlink:href="">xml-commons + resolver project</productname> offers an implementation able to + process various catalog file formats. Maven based projects may import the + corresponding library by adding the following + dependency:</para> + + <programlisting language="none"><dependency> + <groupId>xml-resolver</groupId> + <artifactId>xml-resolver</artifactId> + <version>1.2</version> +</dependency></programlisting> + + <para>We need a properties file <link + xlink:href=""></link> + defining XML catalogs to be used and additional parameters:</para> + + <programlisting language="none"># Catalogs are relative to this properties file +relative-catalogs=false +# Catalog list + +catalogs=\ +/usr/share/eclipse/dropins/oxygenxml.oxygen_14.2/plugins/com.oxygenxml.editor_14.2.0.v2013021115/frameworks/xhtml/dtd/xhtmlcatalog.xml;\ +/usr/share/eclipse/dropins/oxygenxml.oxygen_14.2/plugins/com.oxygenxml.editor_14.2.0.v2013021115/frameworks/xhtml11/dtd/xhtmlcatalog.xml +# PUBLIC in favour of SYSTEM +prefer=public</programlisting> + + <para>This configuration uses some catalogs from the + <trademark>Oxygen</trademark> <trademark>Eclipse</trademark> plugin. 
+ We may now add a resolver to our SAX application by referencing the + above configuration file <coref linkend="resolverPropertyFile"/> and + registering the resolver to our SAX parser instance <coref + linkend="resolverRegister"/>:</para> + + <programlisting language="none">xmlReader = saxParser.getXMLReader(); + + // Set up resolving PUBLIC identifier + final CatalogManager cm = new CatalogManager("<emphasis role="bold"></emphasis>" <co + xml:id="resolverPropertyFile"/> ); + final CatalogResolver resolver = new CatalogResolver(cm); + xmlReader.setEntityResolver(resolver) <co xml:id="resolverRegister"/>;</programlisting> + </section> + + <section xml:id="saxNamespace"> + <title>Namespaces</title> + + <para>In order to make a <acronym + xlink:href="">SAX</acronym> parser + application namespace aware we have to activate two <acronym + xlink:href="">SAX</acronym> parsing + features:</para> + + <programlisting language="none">xmlReader = saxParser.getXMLReader(); +xmlReader.setFeature("", true); +xmlReader.setFeature("", true);</programlisting> + + <para>This instructs the parser to pass the namespace's name for each + element. Namespace prefixes like <code>xsl</code> in <tag + class="starttag">xsl:for-each</tag> are also passed and may be used by + an application:</para> + + <programlisting language="none">package sax; +... +public class NamespaceEventHandler extends DefaultHandler { +... 
+ public void startElement(String <emphasis role="bold">namespaceUri</emphasis>, String localName, + String rawName, Attributes attrs) { + System.out.println("Opening Element rawName='" + rawName + "'\n" + + "namespaceUri='" + <emphasis role="bold">namespaceUri</emphasis> + "'\n" + + "localName='" + localName + + "'\n--------------------------------------------"); + } +}</programlisting> + + <para>As an example we take an XSLT script:</para> + + <programlisting language="none"><?xml version="1.0" encoding="utf-8"?> +<xsl:stylesheet version="1.0" + xmlns:xsl='' + xmlns:fo=''> + + <xsl:template match="/"> + <fo:block>A block</fo:block> + <HTML/> + </xsl:template> + +</xsl:stylesheet></programlisting> + + <para>This XSLT script, regarded as an XML document instance, + contains elements belonging to two different namespaces namely + <code></code> and + <code></code>. The script also + contains a <quote>raw</quote> <tag audience="" + class="emptytag">HTML</tag> element introduced only for + demonstration purposes. It carries no prefix and thus belongs to no namespace + since the script does not declare a default namespace. The result + reads:</para> + + <programlisting language="none">Opening Element rawName='xsl:stylesheet' +namespaceUri='' +localName='stylesheet' +-------------------------------------------- +Opening Element rawName='xsl:template' +namespaceUri='' +localName='template' +-------------------------------------------- +Opening Element rawName='fo:block' +namespaceUri='' +localName='block' +-------------------------------------------- +Opening Element rawName='HTML' +namespaceUri='' +localName='HTML'</programlisting> + + <para>Now the parser tells us to which namespace a given element node + belongs. 
An XSLT engine for example uses this information to distinguish + two classes of elements:</para> + + <itemizedlist> + <listitem> + <para>Elements belonging to the namespace + <code></code> like <tag + class="emptytag">xsl:value-of select="..."</tag> have to be + interpreted as instructions by the processor.</para> + </listitem> + + <listitem> + <para>Elements <emphasis role="bold">not</emphasis> belonging to + the namespace <code></code> + like <tag class="emptytag">html</tag> or <tag + class="starttag">fo:block</tag> are copied <quote>as is</quote> to + the output.</para> + </listitem> + </itemizedlist> + + <qandaset defaultlabel="qanda" xml:id="quandaentry_SqlFromXml"> + <title>Generating SQL INSERT statements from XML data</title> + + <qandadiv> + <qandaentry> + <question> + <para>Consider the following schema and document instance + example:</para> + + <figure xml:id="catalogProductDescriptionsExample"> + <title>A sample catalog containing products and + corresponding descriptions.</title> + + <programlisting language="none"><xs:element name="catalog"> + <xs:complexType> + <xs:sequence> + <xs:element ref="product" minOccurs="0" maxOccurs="unbounded"/> + </xs:sequence> + </xs:complexType> +</xs:element> + +<xs:element name="product"> + <xs:complexType> + <xs:sequence> + <xs:element name="name" type="xs:string"/> + <xs:element name="description" type="xs:string" minOccurs="0" maxOccurs="unbounded"/> + <xs:element name="age" type="xs:int" minOccurs="0" maxOccurs="1"/> + </xs:sequence> + <xs:attribute name="id" type="xs:ID" use="required"/> + </xs:complexType> +</xs:element></programlisting> + + <programlisting language="none"><catalog ... 
xsi:noNamespaceSchemaLocation="catalog.xsd"> + <product id="mpt"> + <name>Monkey Picked Tea</name> + <description>Rare wild Chinese tea</description> + <description>Picked only by specially trained monkeys</description> + </product> + <product id="instantTent"> + <name>4-Person Instant Tent</name> + <description>4-person, 1-room tent</description> + <description>Pre-attached tent poles</description> + <description>Exclusive WeatherTec system.</description> + <age>15</age> + </product> +</catalog></programlisting> + </figure> + + <para>Data contained in catalog instances shall be + transferred to a relational database system. Implement and + test a <xref linkend="glo_SAX"/> + application by following the subsequently described + steps:</para> + + <glosslist> + <glossentry> + <glossterm>Database schema</glossterm> + + <glossdef> + <para>Create a database schema matching a product of + your choice (<productname>Mysql</productname>, + <productname>Oracle</productname>, ...). Your schema + should map type and integrity constraints of the given + XML schema. In particular:</para> + + <itemizedlist> + <listitem> + <para>The element <tag class="starttag">age</tag> is + optional.</para> + </listitem> + + <listitem> + <para><tag class="starttag">description</tag> + elements are children of <tag class="starttag">product</tag> elements + and should thus be modeled by a 1:n relation.</para> + </listitem> + + <listitem> + <para>In a catalog the order of descriptions of a + given product matters. Thus your schema should allow + for descriptions being ordered.</para> + </listitem> + </itemizedlist> + </glossdef> + </glossentry> + + <glossentry> + <glossterm>SAX Application</glossterm> + + <glossdef> + <para>The order of appearance of the XML elements <tag + class="starttag">product</tag>, <tag + class="starttag">name</tag> and <tag + class="starttag">age</tag> does not permit a linear + generation of suitable SQL <code>INSERT</code> + statements by a <xref linkend="glo_SAX"/> content + handler. 
Instead you will have to keep copies of local + element values when implementing + <methodname>org.xml.sax.ContentHandler.startElement(String,String,String,org.xml.sax.Attributes)</methodname> + and related callback methods. The following sequence of + insert statements corresponds to the XML data being + contained in <xref + linkend="catalogProductDescriptionsExample"/>. You may + use these statements as a blueprint to be generated by + your <xref linkend="glo_SAX"/> + application:</para> + + <programlisting language="none"><emphasis role="bold">INSERT INTO Product VALUES ('mpt', 'Monkey picked tea', NULL);</emphasis> +INSERT INTO Description VALUES('mpt', 0, 'Picked only by specially trained monkeys'); +INSERT INTO Description VALUES('mpt', 1, 'Rare wild Chinese tea'); + +<emphasis role="bold">INSERT INTO Product VALUES ('instantTent', '4-person instant tent', 15);</emphasis> +INSERT INTO Description VALUES('instantTent', 0, 'Exclusive WeatherTec system.'); +INSERT INTO Description VALUES('instantTent', 1, '4-person, 1-room tent'); +INSERT INTO Description VALUES('instantTent', 2, 'Pre-attached tent poles');</programlisting> + + <para>Provide a suitable <xref linkend="glo_Junit"/> + test.</para> + </glossdef> + </glossentry> + </glosslist> + </question> + + <answer> + <annotation role="make"> + <para role="eclipse">P/catalog2sql</para> + </annotation> + + <para>Running this project and executing tests requires the + following Maven project dependency to be installed (e.g. 
+ locally via <command>mvn</command> <option>install</option>) + to satisfy a dependency:</para> + + <annotation role="make"> + <para role="eclipse">P/saxerrorhandler</para> + </annotation> + + <para>Some remarks are in order here:</para> + + <orderedlist> + <listitem> + <para>The <xref linkend="glo_SQL"/> database schema might + read:</para> + + <programlisting language="sql">CREATE TABLE Product ( + id CHAR(20) NOT NULL PRIMARY KEY <co linkends="catalog2sqlSchema-1" + xml:id="catalog2sqlSchema-1-co"/> + ,name VARCHAR(255) NOT NULL + ,age SMALLINT <co linkends="catalog2sqlSchema-2" + xml:id="catalog2sqlSchema-2-co"/> +); + +CREATE TABLE Description ( + product CHAR(20) NOT NULL REFERENCES Product <co + linkends="catalog2sqlSchema-3" + xml:id="catalog2sqlSchema-3-co"/> + ,orderIndex int NOT NULL <co linkends="catalog2sqlSchema-4" + xml:id="catalog2sqlSchema-4-co"/> -- preserving the order of descriptions belonging to a given product + ,text VARCHAR(255) NOT NULL + ,UNIQUE(product, orderIndex) <co linkends="catalog2sqlSchema-5" + xml:id="catalog2sqlSchema-5-co"/> +);</programlisting> + + <calloutlist> + <callout arearefs="catalog2sqlSchema-1-co" + xml:id="catalog2sqlSchema-1"> + <para>The primary key constraint implements the + uniqueness of <tag class="starttag">product + id='xyz'</tag> values</para> + </callout> + + <callout arearefs="catalog2sqlSchema-2-co" + xml:id="catalog2sqlSchema-2"> + <para>Nullability of <code>age</code> implements <tag + class="starttag">age</tag> elements being + optional.</para> + </callout> + + <callout arearefs="catalog2sqlSchema-3-co" + xml:id="catalog2sqlSchema-3"> + <para><tag class="starttag">description</tag> elements + being children of <tag class="starttag">product</tag> + are being implemented by a foreign key to its + identifying owner thus forming weak entities.</para> + </callout> + + <callout arearefs="catalog2sqlSchema-4-co" + xml:id="catalog2sqlSchema-4"> + <para>The attribute <code>orderIndex</code> allows + 
descriptions to be sorted thus maintaining the + original order of appearance of <tag + class="starttag">description</tag> elements.</para> + </callout> + + <callout arearefs="catalog2sqlSchema-5-co" + xml:id="catalog2sqlSchema-5"> + <para>The <code>orderIndex</code> attribute is unique + within the set of descriptions belonging to the same + product.</para> + </callout> + </calloutlist> + </listitem> + + <listitem> + <para>The result of the given input XML sample file should + be similar to the content of the supplied reference file + <filename>products.reference.xml</filename>:</para> + + <programlisting language="sql">INSERT INTO Product (id, name) VALUES ('mpt', 'Monkey Picked Tea'); +INSERT INTO Description VALUES('mpt', 0, 'Rare wild Chinese tea'); +INSERT INTO Description VALUES('mpt', 1, 'Picked only by specially trained monkeys'); +-- end of current product entry -- + +INSERT INTO Product VALUES ('instantTent', '4-Person Instant Tent', 15); +INSERT INTO Description VALUES('instantTent', 0, '4-person, 1-room tent'); +INSERT INTO Description VALUES('instantTent', 1, 'Pre-attached tent poles'); +INSERT INTO Description VALUES('instantTent', 2, 'Exclusive WeatherTec system.'); +-- end of current product entry --</programlisting> + + <para>So a <xref linkend="glo_Junit"/> test may just + execute the XML to SQL converter and then compare the + effective output to the above reference file.</para> + </listitem> + </orderedlist> + </answer> + </qandaentry> + </qandadiv> + </qandaset> + + <qandaset defaultlabel="qanda" xml:id="quandaentry_NumElemByNs"> + <title>Counting element names grouped by namespaces</title> + + <qandadiv> + <qandaentry> + <question> + <para>We want to extend the SAX examples counting <link + linkend="saxElementCount">elements</link> and <link + linkend="exercise_saxAttrib">attributes</link> of arbitrary + document instances. 
Consider the following XSL sample document + containing <xref linkend="glo_XHTML"/>:</para> + + <programlisting language="none"><?xml version="1.0" encoding="UTF-8"?> +<xsl:stylesheet xmlns:xsl="" + xmlns:xs="" <co + xml:id="xhtmlCombinedNs_Svg"/> + xmlns:h="" <co xml:id="xhtmlCombinedNs_Xhtml"/> + exclude-result-prefixes="xs" version="2.0"> + + <xsl:template match="/"> + <h:html> + <h:head> + <h:title></h:title> + </h:head> + <h:body> + <h:h1>A heading</h:h1> + <h:p>A paragraph</h:p> + <h:h1>Yet another heading</h:h1> + <xsl:apply-templates/> + </h:body> + </h:html> + </xsl:template> + + <xsl:template match="*"> + <xsl:message> + <xsl:text>No template defined for element '</xsl:text> + <xsl:value-of select="name(.)"/> + <xsl:text>'</xsl:text> + </xsl:message> + </xsl:template> + +</xsl:stylesheet></programlisting> + + <para>This XSL stylesheet defines two different namespaces + <coref linkend="xhtmlCombinedNs_Svg"/> and <coref + linkend="xhtmlCombinedNs_Xhtml"/>.</para> + + <para>Implement a <xref linkend="glo_SAX"/> + application being able to group elements from arbitrary XML + documents by namespaces along with their corresponding + frequencies of occurrence. The intended output for the + previous <xref linkend="glo_XSL"/> example shall look + like:</para> + + <programlisting language="none">Namespace '<emphasis + role="bold"></emphasis>' contains: +<head> (1 occurrence) +<p> (1 occurrence) +<h1> (2 occurrences) +<html> (1 occurrence) +<title> (1 occurrence) +<body> (1 occurrence) + +Namespace '<emphasis role="bold"></emphasis>' contains: +<stylesheet> (1 occurrence) +<template> (2 occurrences) +<value-of> (1 occurrence) +<apply-templates> (1 occurrence) +<text> (2 occurrences) +<message> (1 occurrence)</programlisting> + + <para>Hint: Counting frequencies and grouping by namespaces + may be achieved by using standard Java container + implementations of <classname>java.util.Map</classname>. 
You + may for example define sets of related XML elements and group + them by their corresponding namespaces. Thus nested maps are + required.</para> + </question> + + <answer> + <annotation role="make"> + <para role="eclipse">P/xmlstatistics</para> + </annotation> + + <para>Running this project and executing tests requires the + following Maven project to be installed (e.g. + locally via <command>mvn</command> <option>install</option>) + as a dependency:</para> + + <annotation role="make"> + <para role="eclipse">P/saxerrorhandler</para> + </annotation> + + <para>The above solution contains both a running application + and an (incomplete) <xref linkend="glo_Junit"/> test.</para> + </answer> + </qandaentry> + </qandadiv> + </qandaset> + </section> + </chapter> + diff --git a/Sda1/sda1.xml b/Sda1/sda1.xml deleted file mode 100644 index f8dc3c847..000000000 --- a/Sda1/sda1.xml +++ /dev/null @@ -1,14354 +0,0 @@ -<?xml version="1.0" encoding="UTF-8"?> -<part version="5.0" xml:id="sda1" xmlns="" - xmlns:xlink="" - xmlns:xi="" - xmlns:svg="" - xmlns:m="" - xmlns:html="" - xmlns:db=""> - <info> - <title>Structured Data and Applications 1</title> - - <author> - <personname><firstname>Martin</firstname> - <surname>Goik</surname></personname> - - <affiliation> - <orgname></orgname> - </affiliation> - </author> - - <legalnotice> - <para>Source code available at <uri - xlink:href=""></uri></para> - </legalnotice> - </info> - - <chapter xml:id="prerequisites"> - <title>Prerequisites</title> - - <section xml:id="resources"> - <title>Lecture resources</title> - - <glosslist> - <glossentry> - <glossterm>Recommended books</glossterm> - - <glossdef> - <itemizedlist> - <listitem> - <para><xref linkend="bib_fawcett2012"/></para> - </listitem> - - <listitem> - <para><xref linkend="bib_Walmsley02"/></para> - </listitem> - </itemizedlist> - </glossdef> - </glossentry> - - <glossentry> - <glossterm>Lecture notes as PDF</glossterm> - - <glossdef> - 
<para><uri - xlink:href=""></uri></para> - - <caution> - <para>Some figures and videos are left blank.</para> - </caution> - </glossdef> - </glossentry> - - <glossentry> - <glossterm>Live lecture additions</glossterm> - - <glossdef> - <para><link - xlink:href=""></link></para> - </glossdef> - </glossentry> - - <glossentry> - <glossterm>List of exercises</glossterm> - - <glossdef> - <para>The lecture notes contain exercises to be solved by you! A - complete list is available at <uri - xlink:href=""></uri>.</para> - - <para>You may also want to use the corresponding PDF - version of the above table within <filename - xlink:href="">printversion.pdf</filename> - to keep track of your personal advances by filling in your - completion status on individual exercises.</para> - </glossdef> - </glossentry> - - <glossentry> - <glossterm><link - linkend="gloss_Javadoc"><trademark>Javadoc</trademark></link> - references and source code</glossterm> - - <glossdef> - <para>The lecture notes contain a lot of <link - linkend="gloss_Javadoc"><trademark>Javadoc</trademark></link> - references. Most classes appearing within these lecture notes have - <link - linkend="gloss_Javadoc"><trademark>Javadoc</trademark></link> - generated links to the source code as well. For example when - clicking on the class name in - <classname>sda.jdbc.intro.v1.SimpleInsert</classname> you will see - the complete implementation.</para> - </glossdef> - </glossentry> - - <glossentry> - <glossterm>Links to animated figures</glossterm> - - <glossdef> - <para>The lecture notes' online version contains links to <uri - xlink:href="">PDF - images</uri>.
Clicking on <quote>Animated PDF Version</quote> - takes you to a referenced PDF which in full screen mode of Acrobat - Reader or <trademark>google-chrome</trademark> provides a slide - like animation.</para> - </glossdef> - </glossentry> - - <glossentry> - <glossterm><trademark>Virtualbox</trademark> image</glossterm> - - <glossdef> - <para>A <productname - xlink:href="">Virtualbox</productname> - image is available at <uri - xlink:href=""></uri> - <link - xlink:href=""></link>.</para> - - <caution> - <para>Access from networks being external to - <uri></uri> requires <acronym>VPN</acronym> - access.</para> - </caution> - - <para>It contains (hopefully) all related tools from the <link - xlink:href="">CSM</link> - department's lecture room Linux installation:</para> - - <itemizedlist> - <listitem> - <para>Eclipse J2EE version with <productname - xlink:href="">Database - developer tools</productname>, <productname - xlink:href="">git</productname>, <trademark - xlink:href="">Oxygenxml</trademark>, - <productname - xlink:href="">TestNG</productname> - and <productname - xlink:href="">svn</productname> - plugins installed.</para> - </listitem> - - <listitem> - <para>A running <productname - xlink:href="">Mysql</productname> server - preconfigured with user <quote><code>hdmuser</code></quote>, - password <quote><code>XYZ</code></quote> (<emphasis - role="bold">capital letters!</emphasis>) and database - <quote><code>hdm</code></quote>.</para> - </listitem> - - <listitem> - <para><productname - xlink:href="">Xmlmind XML - editor</productname> for visually editing technical documents - based on <productname - xlink:href="">docbook</productname> - or <productname - xlink:href="">DITA</productname>.</para> - </listitem> - </itemizedlist> - - <caution> - <para>This VM is only accessible from within the <orgname - xlink:href="">HdM</orgname> network. 
- External downloads require <productname - xlink:href="">OpenVPN</productname>.</para> - </caution> - - <para>The virtual machine is based on the <productname - xlink:href="">Lubuntu</productname> fork of the - <productname - xlink:href="">Ubuntu</productname> Linux - distribution for resource saving reasons.</para> - </glossdef> - </glossentry> - - <glossentry xml:id="oxygenLicenseKey"> - <glossterm><uri>Oxygen Xml Editor</uri> license key</glossterm> - - <glossdef> - <para>This is the only software component in this lecture - requiring a license. Your <orgname>HdM</orgname> affiliation - entitles you to use the <productname - xlink:href="">Oxygenxml</productname> - software for educational (non-commercial) purposes. The - corresponding key is available at <uri - xlink:href=""></uri>.</para> - - <para>This license key is compatible both with the standalone and - the eclipse plugin version of the product.</para> - - <caution> - <para>The license key's <abbrev - xlink:href="">ftp</abbrev> - URL is only accessible from within the <orgname - xlink:href="">HdM</orgname> network. - External access requires <link - xlink:href="">Vpn - activation</link>.</para> - </caution> - </glossdef> - </glossentry> - - <glossentry> - <glossterm>Source code of lecture resources</glossterm> - - <glossdef> - <para>The complete lecture sources are available from <link - xlink:href=""></link>.</para> - - <para>You may simply execute <quote><command - xlink:href="">git</command> - <option>clone</option> - <option></option> - <option>.</option></quote> to check out the master tree.</para> - </glossdef> - </glossentry> - - <glossentry> - <glossterm>Source code of exercises and examples</glossterm> - - <glossdef> - <para>These sources contain a subdirectory - <filename>ws/eclipse/Jdbc</filename> which can be imported as an - eclipse project. This allows for browsing solutions to the - exercises and executing sample applications. 
Import into eclipse - works the following way:</para> - - <itemizedlist> - <listitem> - <para>When starting eclipse choose - <filename>.../ws/eclipse</filename> as workspace</para> - </listitem> - - <listitem> - <para>In eclipse click <quote>File --> Import --> - General --> Existing Projects into Workspace</quote>. After - re-selecting the current workspace - <filename>.../ws/eclipse</filename> the folder - <filename>Jdbc</filename> should be on the list of importable - projects.</para> - - <para>Depending on your eclipse installation you may have to - adjust the <link - linkend="gloss_Java"><trademark>Java</trademark></link> system - libraries. Right click on your project root in the package - explorer and choose <quote>Build Path --> Configure - Buildpath</quote>. The <quote>JRE System Library</quote> entry - in the <quote>Libraries</quote> tab may have to be changed to - suit your eclipse's installation needs. You may want to create - a dummy <link - linkend="gloss_Java"><trademark>Java</trademark></link> - project to find the correct setting.</para> - </listitem> - </itemizedlist> - </glossdef> - </glossentry> - </glosslist> - </section> - - <section xml:id="tools"> - <title>Tools</title> - - <para>The subsequent sections describe tools that help to - successfully carry out the exercises. These descriptions are suitable - for current Linux/Ubuntu systems. However these tools are available for - <trademark>Windows</trademark> or <trademark>Apple</trademark> systems - as well. For the latter some command line hints may have to be replaced - by using GUI based tools.</para> - - <para>You may want to use the <link - xlink:href="">corresponding</link> - <link xlink:href="">Virtualbox image</link> - containing a complete system avoiding installation hassles.
This should - work well on reasonably current hardware systems.</para> - - <section xml:id="eclipse"> - <title><productname - xlink:href="">JDK</productname> - and Eclipse</title> - - <para>So you like to take the hard way rather than using <link - xlink:href="">the - virtualbox image</link>? Good! Real programmers tend to complicate - things!</para> - - <para>The Eclipse IDE will be used as the primary coding tool - especially for <link - linkend="gloss_Java"><trademark>Java</trademark></link> and XML. Users - may use different tools like e.g. <productname - xlink:href="">Netbeans</productname> or - <productname - xlink:href="">XML-Spy</productname>. - There are however some caveats:</para> - - <itemizedlist> - <listitem> - <para>Certain functionalities may not be provided</para> - </listitem> - - <listitem> - <para><orgname>HdM</orgname> staff support in case of troubles - will be limited to coding excluding tool support. In other words: - You are on your own!</para> - </listitem> - </itemizedlist> - - <para>Installation of eclipse requires a suitable <link - linkend="gloss_Java"><trademark>Java</trademark></link> Development - Kit.</para> - - <caution> - <para>Your <productname - xlink:href="">JDK</productname> - selection may be affected by your system's hardware. On a 64 bit - system you may install either a 32 bit or a 64 bit <productname - xlink:href="">JDK</productname>. - If you subsequently install eclipse you must select the appropriate - 32 or 64 Bit version matching your <productname - xlink:href="">JDK</productname> - choice.</para> - </caution> - - <para>Due to Oracle's (end-user unfriendly) licensing policy you may - have to install this component manually. For <productname - xlink:href="">Ubuntu</productname> and - <productname xlink:href="">Debian</productname> - systems a standard (package manager compatible) procedure is being - described at <uri - xlink:href=""></uri>.
- This boils down to (being executed as user root or preceded by - <command>sudo</command> <option>...</option>):</para> - - <programlisting language="none">add-apt-repository ppa:webupd8team/java -apt-get update -apt-get install oracle-jdk7-installer</programlisting> - - <para>During the installation process you will have to accept Oracle's - license terms. If you do so this information will be cached and not be - asked again for when updating via <command>aptitude - </command><option>update</option>;<command>aptitude</command> - <option>safe-upgrade</option>. After successful installation when - executing <command - xlink:href="">java</command> - <option>-version</option> in a shell you should see something similar - to:</para> - - <programlisting language="none">goik@goiki:~$ <emphasis role="bold">java -version</emphasis> -java version "1.7.0_07" -Java(TM) SE Runtime Environment (build 1.7.0_07-b10) -Java HotSpot(TM) Server VM (build 23.3-b01, mixed mode)</programlisting> - - <para>The Eclipse IDE comes <link - xlink:href="">with various - flavours</link> depending on which plugins are already being shipped. - For our purposes the <quote><productname>Eclipse - Classic</productname></quote> <link - linkend="gloss_Java"><trademark>Java</trademark></link> edition is - sufficient. You may however want to install other flavours like - <quote><productname>Eclipse IDE for Java EE - Developers</productname></quote> if you require features beyond this - course's needs. Remember to download the correct 32 or 64 bit version - corresponding to your<productname - xlink:href="">JDK</productname>.</para> - - <para>Follow <uri - xlink:href=""></uri> - to install eclipse on your system.</para> - </section> - - <section xml:id="oxygenxmlInstall"> - <title><productname - xlink:href="">Oxygenxml</productname> - plugin</title> - - <para>Go to <uri - xlink:href=""></uri>. 
- You may choose between the <quote>Plugin Update site</quote> and - <quote>Plugin zip distribution</quote> installation method. The latter - allows for better long term eclipse plugin management and is being - described at</para> - - <para>There are two different ways to install Eclipse plugins:</para> - - <itemizedlist> - <listitem> - <para>Use Eclipse's built in Update manager by <link - xlink:href="">defining - a corresponding update site</link>.</para> - </listitem> - - <listitem> - <para>Unzip <filename></filename> - in a subfolder of <filename>.../eclipse/dropins</filename> and - restart eclipse (as root).</para> - </listitem> - </itemizedlist> - - <para>See <xref linkend="oxygenLicenseKey"/> for obtaining a license - key. You may as well install the standalone version of the Oxygen XML - Editor.</para> - </section> - - <section xml:id="erMaster"> - <title>ERMaster</title> - - <para>Visual editing of physical entity relationship diagrams. See - <link xlink:href="">installation - instructions</link> on top of an existing eclipse installation.</para> - </section> - - <section xml:id="testngInstall"> - <title><foreignphrase>TestNG</foreignphrase> plugin</title> - - <para>Some exercises require the TestNG plugin to be installed in the - Eclipse IDE. You may proceed in a similar way as in <uri - linkend="oxygenxmlInstall">Oxygenxml</uri>. 
According to <uri - xlink:href=""></uri> - the Eclipse URL being needed is - <quote></quote>.</para> - </section> - - <section xml:id="mysql"> - <title><productname - xlink:href="">Mysql</productname> Database - components</title> - - <para>We start by installing the <productname - xlink:href="">Mysql</productname> server:</para> - - <programlisting language="none">root@goiki:~# aptitude install mysql-server -The following NEW packages will be installed: - libdbd-mysql-perl{a} libdbi-perl{a} libnet-daemon-perl{a} libplrpc-perl{a} - mysql-client-5.5{a} mysql-server-5.5 -0 packages upgraded, 6 newly installed, 0 to remove and 0 not upgraded. -Need to get 0 B/17.8 MB of archives. After unpacking 63.2 MB will be used. -Do you want to continue? [Y/n/?]</programlisting> - - <para>Hit <keycap>Y - return</keycap> to start. During the - installation you will be asked for the <productname - xlink:href="">Mysql</productname> server's - <quote>root</quote> (Administrator) password:</para> - - <programlisting language="none">Package configuration - - - ┌───────────────────────────┤ Configuring mysql-server-5.5 ├────────────────────────────┐ - │ While not mandatory, it is highly recommended that you set a password for the MySQL │ - │ administrative "root" user. │ - │ │ - │ If this field is left blank, the password will not be changed. │ - │ │ - │ New password for the MySQL "root" user: │ - │ │ - │ ********_____________________________________________________________________________ │ - │ │ - │ <Ok> │ - │ │ - └───────────────────────────────────────────────────────────────────────────────────────┘ - - - </programlisting> - - <para>This has to be entered twice. Keep a <emphasis - role="bold">permanent</emphasis> record of this entry. Alternatively - set a bookmark to <uri - xlink:href=""></uri> - for later reference *** and don't blame me! ***.</para> - - <para>At this point we should be able to connect to our newly - installed Server.
We create a database <quote>hdm</quote> to be used - for our exercises:</para> - - <programlisting language="none">goik@goiki:~$ mysql -u root -p -Enter password: -Welcome to the MySQL monitor. Commands end with ; or \g. -Your MySQL connection id is 42 -Server version: 5.5.24-0ubuntu0.12.04.1 (Ubuntu) - -Copyright (c) 2000, 2011, Oracle and/or its affiliates. All rights reserved. - -Oracle is a registered trademark of Oracle Corporation and/or its -affiliates. Other names may be trademarks of their respective -owners. - -Type 'help;' or '\h' for help. Type '\c' to clear the current input statement. - -mysql> <emphasis role="bold">create database hdm;</emphasis> -Query OK, 1 row affected (0.00 sec)</programlisting> - - <para>Following <uri - xlink:href=""></uri> - we add a new user and grant full access to the newly created - database:</para> - - <programlisting language="none">goik@goiki:~$ mysql -u root -p -Enter password: - ... -mysql> CREATE USER 'hdmuser'@'localhost' IDENTIFIED BY 'XYZ'; -mysql> use hdm; -mysql> GRANT ALL PRIVILEGES ON *.* TO 'hdmuser'@'localhost' WITH GRANT OPTION; -mysql> FLUSH PRIVILEGES;</programlisting> - - <para>The next step is optional. The <productname - xlink:href="">Ubuntu</productname> <productname - xlink:href="">Mysql</productname> server default - configuration allows connections only via <varname>loopback</varname> - interface i.e. <varname>localhost</varname>. If you want your - <productname xlink:href="">Mysql</productname> - server to listen to the external network interface comment out the - bind-address parameter in - <filename>/etc/mysql/my.cnf</filename>:</para> - - <programlisting language="none"># Instead of skip-networking the default is now to listen only on -# localhost which is more compatible and is not less secure. 
-# <emphasis role="bold">bind-address =</emphasis></programlisting> - - <para>Since we are dealing with <link - linkend="gloss_Java"><trademark>Java</trademark></link> a <trademark - xlink:href="">JDBC</trademark> - driver is needed to connect Applications to our <productname - xlink:href="">Mysql</productname> server:</para> - - <programlisting language="none">root@goiki:~# aptitude install libmysql-java</programlisting> - - <para>This provides the file - /usr/share/java/mysql-connector-java-5.1.16.jar and two symbolic - links:</para> - - <programlisting language="none">goik@goiki:~$ cd /usr/share/java -goik@goiki:/usr/share/java$ ls -al mysql* --rw-r--r-- 1 ... 2011 <emphasis role="bold">mysql-connector-java-5.1.16.jar</emphasis> -lrwxrwxrwx 1 ... 2011 <emphasis role="bold">mysql-connector-java.jar -> mysql-connector-java-5.1.16.jar</emphasis> -lrwxrwxrwx 1 ... 2011 <emphasis role="bold">mysql.jar -> mysql-connector-java.jar</emphasis></programlisting> - </section> - </section> - - <section xml:id="lectureNotes"> - <title>Lecture related resources</title> - - <para>The sources for lecture notes and exercises are available from the - <orgname xlink:href="">MIB</orgname> - <productname xlink:href="">git</productname> - repository:</para> - - <para><uri - xlink:href=""></uri></para> - - <para>Check-out is straightforward:</para> - - <programlisting language="none">goik@goiki:~$ mkdir StructuredData;cd StructuredData - -goik@goiki:~/StructuredData$ git clone . -Cloning into '.'... -remote: Counting objects: 694, done -... -Resolving deltas: 100% (296/296), done.</programlisting> - - <para>After checkout an eclipse workspace holding the complete example - source code becomes visible:</para> - - <programlisting language="none">goik@goiki:~/StructuredData$ cd ws/eclipse -goik@goiki:~/StructuredData/ws/eclipse$ ls -al -insgesamt 16 -drwxr-xr-x 3 goik fb1prof 4096 Nov 8 22:04 . -drwxr-xr-x 4 goik fb1prof 4096 Nov 8 22:04 .. 
--rw-r--r-- 1 goik fb1prof 11 Nov 8 22:04 .gitignore -<emphasis role="bold">drwxr-xr-x 6 goik fb1prof 4096 Nov 8 22:04 Jdbc</emphasis></programlisting> - - <para>The subdirectory <filename>Jdbc</filename> can be imported as an - eclipse project via File --> import --> General --> Existing - Projects into workspace. This should enable each participant to browse - and execute the examples being provided in the lecture notes. It also - contains a <productname - xlink:href="">Mysql</productname> driver in - Jdbc/lib/mysql-connector-java-5.1.16.jar being required to set up a - <trademark - xlink:href="">JDBC</trademark> - connection.</para> - </section> - - <section xml:id="repeatRelational"> - <title>Some notes on relational databases</title> - - <qandaset defaultlabel="qanda" xml:id="airlineRelationalSchema"> - <title>Airlines, airports and flights</title> - - <qandadiv> - <qandaentry> - <question> - <para>Implement a relational schema describing airlines, - flights, airports and their respective relationships:</para> - - <itemizedlist> - <listitem> - <para>Airline:</para> - - <itemizedlist> - <listitem> - <para>An informal unique name like e.g. - <quote>Lufthansa</quote>.</para> - </listitem> - - <listitem> - <para>A unique <link - xlink:href="">ICAO - abbreviation</link>.</para> - </listitem> - </itemizedlist> - </listitem> - - <listitem> - <para>Destination</para> - - <itemizedlist> - <listitem> - <para>Full name like <quote>Frankfurt am Main - International</quote></para> - </listitem> - - <listitem> - <para>World airport code like <quote>FRA</quote>.</para> - </listitem> - </itemizedlist> - </listitem> - - <listitem> - <para>Flight</para> - - <itemizedlist> - <listitem> - <para>A unique flight number e.g.
LH 4234</para> - </listitem> - - <listitem> - <para>The <quote>owning</quote> airline.</para> - </listitem> - - <listitem> - <para>originating airport</para> - </listitem> - - <listitem> - <para>destination airport</para> - </listitem> - - <listitem> - <para>Constraint: origin and destination must differ. - Hint: <productname>Mysql</productname> provides a - syntactical means to implement this constraint. It will - however not be enforced at runtime. Database vendors - like Oracle, IBM/DB2, <productname>Sybase</productname>, - <productname>Informix</productname> - <abbrev>etc.</abbrev> support this type of runtime - integrity constraint enforcement.</para> - </listitem> - </itemizedlist> - </listitem> - </itemizedlist> - - <para>Provide surrogate keys for all entities and provide names - for all constraints (<abbrev>e.g.</abbrev> defining - <code>CONSTRAINT _PK_XYZ PRIMARY KEY(...)</code> etc. ).</para> - </question> - - <answer> - <programlisting language="sql">CREATE Table Airline ( - id INT NOT NULL - ,name CHAR(20) NOT NULL - ,airlineCode CHAR(5) NOT NULL - - ,CONSTRAINT _PK_Airline_id PRIMARY KEY(id) - ,CONSTRAINT _UN_Airline_name UNIQUE(name) - ,CONSTRAINT _UN_Airline_airlineCode UNIQUE(airlineCode) -); - -CREATE TABLE Destination ( - id INT NOT NULL - ,fullName CHAR(20) NOT NULL - ,airportCode CHAR(5) - - ,CONSTRAINT _PK_Destination_id PRIMARY KEY(id) - ,CONSTRAINT _UN_Destination_airportCode UNIQUE(airportCode) -); - -CREATE TABLE Flight ( - id INT NOT NULL - ,flightNumber CHAR(10) NOT NULL - ,airline INT NOT NULL REFERENCES Airline - ,origin int NOT NULL REFERENCES Destination - ,destination int NOT NULL REFERENCES Destination - - -- For yet unknown reasons the following alternative MySQL 5.1 syntax compatible - -- statements fail with message 'Cannot add foreign key constraint": - -- ,CONSTRAINT _FK_Flight_airline FOREIGN KEY(airline) REFERENCES Airline - -- ,CONSTRAINT _FK_Flight_origin FOREIGN KEY(origin) REFERENCES Destination - -- ,CONSTRAINT 
_FK_Flight_destination FOREIGN KEY(destination) REFERENCES Destination - - ,CONSTRAINT _PK_Flight_id UNIQUE(id) - ,CONSTRAINT _UN_Flight_flightNumber UNIQUE(flightNumber) - ,CONSTRAINT _CK_Flight_origin_destination CHECK(NOT(origin = destination)) -);</programlisting> - </answer> - </qandaentry> - </qandadiv> - </qandaset> - </section> - - <section xml:id="toolingConfigJdbc"> - <title>Tooling: Configuring and using the <link - xlink:href="">Eclipse database - development</link> plugin</title> - - <para>For some basic SQL communications the Eclipse environment offers a - standard plugin (Database development). Establishing connections to a - specific database server generally requires prior installation of a - <trademark - xlink:href="">JDBC</trademark> - driver on the client side as being shown in the following video:</para> - - <figure xml:id="figureConfigJdbcDriver"> - <title>Adding a <trademark - xlink:href="">JDBC</trademark> - Driver for <productname - xlink:href="">Mysql</productname> to the database - plugin.</title> - - <mediaobject> - <videoobject> - <videodata fileref="Ref/Video/jdbcDriverConfig.mp4"/> - </videoobject> - </mediaobject> - </figure> - - <para>During the exercises the eclipse database development perspective - may be used to browse and structure SQL tables and data. The following - video demonstrates the configuration of a <trademark - xlink:href="">JDBC</trademark> - connection to a local (<varname>localhost</varname> network interface) - database server. 
With respect to the introduction given in <xref - linkend="mysql"/> we assume the existence of a database <code>hdm</code> - and a corresponding account <quote>hdmuser</quote> and password - <quote><code>XYZ</code></quote> (<emphasis role="bold">capital - letters!</emphasis>) on our database server.</para> - - <figure xml:id="figureConfigJdbcConnection"> - <title>Configuring a <trademark - xlink:href="">JDBC</trademark> - connection to a (local) <productname - xlink:href="">Mysql</productname> database - server.</title> - - <mediaobject> - <videoobject> - <videodata fileref="Ref/Video/jdbcConnection.mp4"/> - </videoobject> - </mediaobject> - </figure> - - <para>We are now ready to communicate with our database server. The last - video in this section shows some basic SQL tasks:</para> - - <figure xml:id="figureEclipseBasicSql"> - <title>Executing SQL statements, browsing schema and retrieving - data</title> - - <mediaobject> - <videoobject> - <videodata fileref="Ref/Video/eclipseBasicSql.mp4"/> - </videoobject> - </mediaobject> - </figure> - </section> - </chapter> - - <chapter xml:id="xmlIntro"> - <title>Introduction to XML</title> - - <section xml:id="xmlBasic"> - <title>The XML industry standard</title> - - <para>A short question might be: <quote>What is XML?</quote> An answer - might be: The acronym XML stands for - <quote>E<emphasis>x</emphasis>tensible <emphasis>M</emphasis>arkup - <emphasis>L</emphasis><foreignphrase>anguage</foreignphrase></quote> and - is an industry standard being published by the W3C standardization - organization. 
Like other industry software standards talking about XML - leads to talk about XML based software: Applications and frameworks - supplying added values to software implementors and enhancing data - exchange between applications.</para> - - <para>Many readers are already familiar with XML without explicitly - referring to the standard itself: The world wide web's - <foreignphrase>lingua franca</foreignphrase> HTML has been ported to an - XML dialect forming the <link - xlink:href="">XHTML</link> Standard. The idea - behind this standard is to distinguish between an abstract markup - language and rendered results being generated from so called document - instances by a browser:</para> - - <figure xml:id="renderXhtmlMarkup"> - <title>Rendering XHTML markup</title> - - <mediaobject> - <imageobject> - <imagedata fileref="Ref/Fig/xhtml.fig"/> - </imageobject> - </mediaobject> - </figure> - - <para>Xhtml is actually a good example to illustrate the tree like, - hierarchical structure of XML documents:</para> - - <figure xml:id="xhtmlTree"> - <title>Xhtml tree structure</title> - - <mediaobject> - <imageobject> - <imagedata fileref="Ref/Fig/xhtmlexample.fig" scale="65"/> - </imageobject> - </mediaobject> - </figure> - - <para>We may extend this example by representing a mathematical formula - via a standard called <link - xlink:href="">Mathml</link>:</para> - - <figure xml:id="mathmlExample"> - <title>A formula in <link - xlink:href="">MathML</link> - representation.</title> - - <mediaobject> - <imageobject> - <imagedata fileref="Ref/Fig/sqrtrender.fig"/> - </imageobject> - </mediaobject> - </figure> - - <para>Again we observe a similar situation: A database like - <emphasis>representation</emphasis> of a formula on the left and a - <emphasis>rendered</emphasis> version on the right. Regarding XML we - have:</para> - - <itemizedlist> - <listitem> - <para>The <link xlink:href="">MathML</link> - standard intended to describe mathematical formulas. 
The standard - defines a set of <emphasis>tags</emphasis> like e.g. <tag - class="starttag">math:msqrt</tag> with well-defined semantics - regarding permitted attribute values and nesting rules.</para> - </listitem> - - <listitem> - <para>Informal descriptions of formatting expectations.</para> - </listitem> - - <listitem> - <para>Software transforming an XML formula representation into - visible or printable output. In other words: A rendering - engine.</para> - </listitem> - </itemizedlist> - - <para>XML documents may also be regarded as a persistence mechanism to - represent and store data. Similarities to Relational Database Systems - exist. A RDBMS - (<emphasis>R</emphasis><foreignphrase>elational</foreignphrase> - <emphasis>D</emphasis><foreignphrase>atabase</foreignphrase> - <emphasis>M</emphasis><foreignphrase>anagement</foreignphrase> - <emphasis>S</emphasis><foreignphrase>ystem</foreignphrase>) is typically - capable of holding terabytes of data organized in tables. The - arrangement of data may be subject to various constraints like - candidate- or foreign key rules. With respect to both end users and - software developers a RDBMS itself is a building block in a complete - solution. We need an application on top of it acting as a user interface - to the data being contained.</para> - - <para>In contrast to a RDBMS XML allows data to be organized - hierarchically.
The <link - xlink:href="">MathML</link> representation given - in <xref linkend="mathmlExample"/> may be graphically visualized:</para> - - <figure xml:id="mathmltree"> - <title>A tree graph representation of the <link - xlink:href="">MathML</link> example given - before.</title> - - <mediaobject> - <imageobject> - <imagedata fileref="Ref/Fig/sqrtree.fig"/> - </imageobject> - </mediaobject> - </figure> - - <para>CAD applications may use XML documents as a representation of - graphical primitives:</para> - - <informalfigure> - <mediaobject> - <imageobject> - <imagedata fileref="Ref/Fig/attributes.fig" scale="65"/> - </imageobject> - </mediaobject> - </informalfigure> - - <para>Of course RDBMS also allow the representation of tree like - structures or arbitrary graphs. But these have to be modelled by using - foreign key constraints since relational tables themselves have a - <quote>flat</quote> structure. Some RDBMS vendors provide extensions to - the SQL standard which allow <quote>native</quote> representations of - <link linkend="gloss_XML"><abbrev>XML</abbrev></link> documents.</para> - </section> - - <section xml:id="xmlHtml"> - <title>Well formed XML documents</title> - - <para>The general structure of an <link - linkend="gloss_XML"><abbrev>XML</abbrev></link> document is as - follows:</para> - - <figure xml:id="xmlbase"> - <title><link linkend="gloss_XML"><abbrev>XML</abbrev></link> basic - structure</title> - - <mediaobject> - <imageobject> - <imagedata fileref="Ref/Fig/xmlbase.fig" scale="65"/> - </imageobject> - </mediaobject> - </figure> - - <para>We explore a simple XML document representing messages like - E-mails:</para> - - <figure xml:id="memoWellFormed"> - <title>The representation of a short message.</title> - - <programlisting language="none"><?xml<co - xml:id="first_xml_code_magic"/> version="1.0"<co - xml:id="first_xml_code_version"/> encoding="UTF-8"<co - xml:id="first_xml_code_encoding"/>?> -<memo><co xml:id="first_xml_code_topelement"/> -
<from>M. Goik</from><co xml:id="first_xml_code_from"/> - <to>B. King</to> - <to>A. June</to> - <subject>Best whishes</subject> - <content>Hi all, congratulations to your splendid party</content> -</memo></programlisting> - </figure> - - <calloutlist> - <callout arearefs="first_xml_code_magic"> - <para>The very first characters <code><?xml</code> may be - regarded as a <link - xlink:href="">magic - number string</link> being used as a format indicator which allows - to distinguish between different file types i.e. GIF, JPEG, HTML and - so on.</para> - </callout> - - <callout arearefs="first_xml_code_version"> - <para>The <code>version="1.0"</code> attribute tells us that all - subsequent lines will conform to the <link - xlink:href="">XML</link> standard of version - 1.0. This way a document can express its conformance to the version - 1.0 standard even if in the future this standard evolves to a higher - version e.g. <code>version="2.1"</code>.</para> - </callout> - - <callout arearefs="first_xml_code_encoding"> - <para>The attribute <code>encoding="UTF-8"</code> tells us that all - text in the current document uses <link - xlink:href="">Unicode</link> encoding. <link - xlink:href="">Unicode</link> is a widely accepted - industry standard for font encoding. Thus European, Cyrillic and - most Asian font codes are allowed to be used in documents - <emphasis>simultaneously</emphasis>. Other encodings may limit the - set of allowed characters, e.g. <code>encoding="ISO-8859-1"</code> - will only allow characters belonging to western European languages. - However a system also needs to have the corresponding fonts (e.g. - TrueType) being installed in order to render the document - appropriately. A document containing Chinese characters is of no use - if the underlying rendering system lacks e.g. 
a set of Chinese - TrueType fonts.</para> - </callout> - - <callout arearefs="first_xml_code_topelement"> - <para>An XML document has exactly one top level - <emphasis>node</emphasis>. In contrast to the HTML standard these - nodes are commonly called elements rather than tags. In this example - the top level (root) element is <tag - class="starttag">memo</tag>.</para> - </callout> - - <callout arearefs="first_xml_code_from"> - <para>Each XML element like <tag class="starttag">from</tag> has a - corresponding counterpart <tag class="endtag">from</tag>. In terms - of XML we say each element being opened has to be closed. In - conjunction with the preceding point this is equivalent to the fact - that each XML document represents a tree structure as shown in - the <link linkend="mathmltree">tree graph</link> - representation.</para> - </callout> - </calloutlist> - - <para>As with the introductory formula example this representation - itself is of limited usefulness: In an office environment we need a - rendered version being given either as print or as some online format - like E-Mail or HTML.</para> - - <para>From a software developer's point of view we may use a piece of - software called a <emphasis>parser</emphasis> to test the document's - standard conformance. At the MI department we may simply invoke - <userinput><command>xmlparse</command> message.xml</userinput> to start - a check:</para> - - <programlisting language="none"><errortext>goik>xmlparse wellformed.xml -Parsing was successful</errortext></programlisting> - - <para>Various XML related plugins are supplied for the <productname - xlink:href="">eclipse platform</productname> like the - <productname xlink:href="">Oxygen - software</productname> supplying <quote>live</quote> conformance - checking while editing XML documents. Now we test our assumptions by - violating some of the rules stated before. 
We deliberately omit the - closing element <tag class="endtag">from</tag>:</para> - - <figure xml:id="omitFrom"> - <title>An invalid XML document due to the omission of <tag - class="endtag">from</tag>.</title> - - <programlisting language="none"><?xml version="1.0" encoding="UTF-8"?> -<memo> - <from>M. Goik <co xml:id="omitFromMissingElement"/> - <to>B. King</to> - <to>A. June</to> - <subject>Best whishes</subject> - <content>Hi all, congratulations to your splendid party</content> -</memo></programlisting> - - <calloutlist> - <callout arearefs="omitFromMissingElement"> - <para>The opening element <tag class="starttag">from</tag> is not - terminated by <tag class="endtag">from</tag>.</para> - </callout> - </calloutlist> - </figure> - - <para>Consequently the parser's output reads:</para> - - <programlisting language="none"><errortext>goik>xmlparse omitfrom.xml -file:///ma/goik/workspace/Vorlesungen/Input/Memo/omitfrom.xml:8:3: -fatal error org.xml.sax.SAXParseException: The element type "from" -must be terminated by the matching end-tag "</from>". parsing error</errortext></programlisting> - - <para>Experienced HTML authors may be confused: In fact HTML is not an - XML standard. Instead HTML belongs to the set of SGML applications. SGML - is a much older standard namely the <emphasis>Standard Generalized - Markup Language</emphasis>.</para> - - <para>Even if every XML element has a closing counterpart the resulting - XML may be invalid:</para> - - <programlisting language="none"><?xml version="1.0" encoding="UTF-8"?> -<memo> - <from>M. Goik<to>B. King</from></to> - <to>A. 
June</to> - <subject>Best whishes</subject> - <content>Hi all, congratulations to your splendid party</content> -</memo></programlisting> - - <para>The parser echoes:</para> - - <programlisting language="none"><computeroutput>file:///ma/goik/workspace/Vorlesungen/Input/Memo/nonest.xml:3:29: -fatal error org.xml.sax.SAXParseException: The element type "to" must be -terminated by the matching end-tag "</to>". parsing error</computeroutput></programlisting> - - <para>This type of error is caused by so called improper nesting of - elements: The element <tag class="starttag">from</tag> is closed before - the <quote>inner</quote> element <tag class="starttag">to</tag> has been - closed. Actually this violates the expressibility of XML documents as a - tree like structure. The situation may be resolved by choosing:</para> - - <programlisting language="none">...<from>M. Goik<to>B. King</to></from>...</programlisting> - - <para>We provide two examples illustrating proper and improper nesting - of XML elements:</para> - - <figure xml:id="fig_nestingProper"> - <title>Proper nesting of XML elements</title> - - <mediaobject> - <imageobject> - <imagedata fileref="Ref/Fig/propernest.fig" scale="65"/> - </imageobject> - </mediaobject> - </figure> - - <para>The following example violates the proper nesting constraint and thus - is not an XML document:</para> - - <figure xml:id="fig_improperNest"> - <title>Improperly nested elements</title> - - <mediaobject> - <imageobject> - <imagedata fileref="Ref/Fig/impropernest.fig"/> - </imageobject> - </mediaobject> - </figure> - - <!-- goik:later - <para>An animation showing the usage of the Oxygen plug in for the - examples given above can be found <uri - xlink:href="src/viewlet/wellformed/wellformed_viewlet_swf.html">here</uri>.</para> ---> - - <para>XML elements may have so called attributes like <tag - class="attribute">date</tag> in the following example:</para> - - <figure xml:id="memoWellAttrib"> - <title>An XML document with 
attributes.</title> - - <programlisting language="none"><?xml version="1.0" encoding="UTF-8"?> -<memo date="10.02.2006" priority="high"> - <from>M. Goik</from> - <to>B. King</to> - <to>A. June</to> - <subject>Best whishes</subject> - <content>Hi all, congratulations to your splendid party</content> -</memo></programlisting> - </figure> - - <para>The conformance of an XML document with the following rules may be - verified by invoking a parser:</para> - - <itemizedlist> - <listitem> - <para>Within the <emphasis>scope</emphasis> of a given element an - attribute name must be unique. In the example above one may not - define a second attribute <varname>date="..."</varname> within the - same element <memo ... >. This reflects the usual programming - language semantics of attributes: In a <link - linkend="gloss_Java"><trademark>Java</trademark></link> class an - attribute is represented by a unique identifier and thus cannot - appear twice.</para> - </listitem> - - <listitem> - <para>An attribute value must be enclosed either in single (') or - double (") quotes. This is different from the HTML standard which - allows attribute values without quotes provided the given attribute - value does not give rise to ambiguities. 
For example <tag - class="starttag">td align=left</tag> is allowed since the attribute - value <tag class="attvalue">left</tag> does not contain any spaces - thus allowing a parser to recognize the end of the value's - definition.</para> - </listitem> - </itemizedlist> - - <qandaset defaultlabel="qanda" xml:id="example_memoAttribTree"> - <title>A graphical representation of a memo.</title> - - <qandadiv> - <qandaentry> - <question> - <para>Draw a graphical representation similar as in <xref - linkend="mathmltree"/> of the memo document being given in <xref - linkend="memoWellAttrib"/>.</para> - </question> - - <answer> - <para>The <link linkend="memoWellAttrib">memo document's</link> - structure may be visualized as:</para> - - <informalfigure xml:id="memotreeFigure"> - <para>A graphical representation of <xref - linkend="memoWellAttrib"/>:</para> - - <informalfigure xml:id="memotreeFigureFalse"> - <mediaobject> - <imageobject> - <imagedata fileref="Ref/Fig/memotree.fig"/> - </imageobject> - </mediaobject> - </informalfigure> - - <para>The sequence of <emphasis>element</emphasis> child nodes - is important in XML and has to be preserved. Only the order of - the two attributes <tag class="attribute">date</tag> and <tag - class="attribute">priority</tag> is undefined: They actually - belong to the <tag class="starttag">memo</tag> node serving as - a dictionary with the attribute names being the keys and the - attribute values being the values of the dictionary.</para> - </informalfigure> - </answer> - </qandaentry> - - <qandaentry xml:id="example_attribInQuotes"> - <question> - <label>Attributes and quotes</label> - - <para>As stated before XML attributes have to be enclosed in - single or double quotes. Construct an XML document with mixed - quotes like <code><date day="monday'></code>. How does the - parser react? 
Find the corresponding syntax definition of legal - attribute values in the <link - xlink:href="">XML standard W3C - Recommendation</link>.</para> - </question> - - <answer> - <para>The parser flags a mixture of single and double quotes for - a given attribute as an error. The XML standard <link - xlink:href="">defines</link> - the syntax of attribute values: An attribute value has to be - enclosed <emphasis>either</emphasis> in two single - <emphasis>or</emphasis> in two double quotes as defined in - <uri - xlink:href=""></uri>.</para> - </answer> - </qandaentry> - - <qandaentry xml:id="quoteInAttributValue"> - <question> - <label>Quotes as part of an attribute's value?</label> - - <para>Single and double quotes are used to delimit an attribute - value. May quotes appear themselves as part of an attribute's - value, e.g. like in a person's name <code>Gary "King" - Mandelson</code>?</para> - </question> - - <answer> - <para>Attribute values may contain double quotes if the - attribute's value is enclosed in single quotes and vice versa. As - a limitation the value of an attribute may not contain literal single - quotes and literal double quotes at the same time:</para> - - <informalfigure xml:id="exampleSingleDoubleQuotes"> - <para>Quotes as part of attribute values.</para> - - <programlisting language="none"><?xml version="1.0" encoding="UTF-8"?> -<test> - <person name='Gary "King" Mandelson'/> <!-- o.k. --> - <person name="Gary 'King' Mandelson"/> <!-- o.k. --> - <person name="Gary 'King 'S.' "Mandelson"'/> <!-- oops! --> -</test></programlisting> - </informalfigure> - </answer> - </qandaentry> - </qandadiv> - </qandaset> - - <para>Some of the constraints imposed on XML documents by the standard - may be summarized as:</para> - - <itemizedlist> - <listitem> - <para>An XML document must have exactly one top level - element.</para> - </listitem> - - <listitem> - <para>Elements have to be properly nested. 
An element must not be - closed if an <quote>inner</quote> element is still open.</para> - </listitem> - - <listitem> - <para>Attribute names within a given element must be unique.</para> - </listitem> - - <listitem> - <para>Attribute values <emphasis>must</emphasis> be quoted - correctly.</para> - </listitem> - </itemizedlist> - - <para>These rules show several differences to the HTML - Standard: As noted above HTML attribute values need not always be quoted. - Moreover in HTML a lot of elements don't have to be closed. For example - paragraphs (<tag class="starttag">p</tag>) or images (<tag - class="starttag">img src='foo.gif'</tag>) don't have to be closed - explicitly. This is due to the fact that HTML used to be defined in - accordance with the older <emphasis><emphasis - role="bold">S</emphasis>tandard <emphasis - role="bold">G</emphasis>eneralized <emphasis - role="bold">M</emphasis>arkup <emphasis - role="bold">L</emphasis>anguage</emphasis> (SGML) Standard.</para> - - <para>These constraints are part of the definition of a <link - xlink:href="">well formed - document</link>. The specification imposes additional constraints for a - document to be well-formed.</para> - </section> - </chapter> - - <chapter xml:id="dtd"> - <title>Beyond well-formedness</title> - - <section xml:id="motivationDdt"> - <title>Motivation</title> - - <para>So far we are able to create XML documents containing - hierarchically structured data. We may nest elements and thus create - tree structures of arbitrary depth. The only restrictions being imposed - by the XML standard are the constraints of well-formedness. For many - purposes in software development this is not sufficient.</para> - - <para>A company named <productname>Softmail</productname> might - implement an email system which uses <link - linkend="memoWellAttrib">memo</link> document files as low level data - representation serving as a persistence layer. 
Now a second company - named <productname>Hardmail</productname> wants to integrate mails - generated by <productname>Softmail</productname>'s system into its own - business product. The <productname>Hardmail</productname> software - developers might <emphasis>infer</emphasis> the logical structure of - <productname>Softmail</productname>'s email representation but the - following problems arise:</para> - - <itemizedlist> - <listitem> - <para>The logical structure will in practice become more complex: - E-mails may contain attachments leading to multi part messages. - Additional header information is required for standard Internet mail - compliance. This adds additional complexity to the XML structure - being mandatory for data representation. Relying only on - well-formedness the specification of an internal E-mail format can - only be achieved <emphasis>informally</emphasis>. Thus a rule like - <quote>Each E-mail must have a subject</quote> may be written down - in the specification. A software developer will code these rules but - probably make mistakes as the set of rules grows.</para> - - <para>In contrast a RDBMS based solution offers to solve such - problems in a declarative manner: A developer may use a <code>NOT - NULL</code> constraint on a subject attribute of type - <code>VARCHAR</code> thus inhibiting empty subjects.</para> - </listitem> - - <listitem> - <para>As <productname>Softmail</productname>'s product evolves its - internal E-mail XML format is subject to change due to functional - extensions and possibly bug fixes both giving rise to - interoperability problems.</para> - </listitem> - </itemizedlist> - - <para>Generally speaking well formed XML documents lack grammar - constraints as being available for programming languages. 
In case of - RDBMS developers can impose primary-, foreign and <code>CHECK</code> - constraints in a <emphasis>declarative</emphasis> manner rather than - hard coding them into their applications (A solution bad programmers are - in favour of though...). Various XML standards exist for declarative - constraint definitions namely:</para> - - <itemizedlist> - <listitem> - <para>DTDs</para> - </listitem> - - <listitem> - <para><link xlink:href="">XML - Schema</link></para> - </listitem> - - <listitem> - <para><link - xlink:href="">RelaxNG</link></para> - </listitem> - </itemizedlist> - </section> - - <section xml:id="dtdBasic"> - <title>XML Schema</title> - - <section xml:id="dtdFirstExample"> - <title>Structural descriptions for documents</title> - - <para>As an example we choose documents of type - <emphasis>memo</emphasis> as a starting point. Documents like the - example from <xref linkend="memoWellAttrib"/> may be - <emphasis>informally</emphasis> described to be a sequence of the - following mandatory items:</para> - - <figure xml:id="figure_memo_informalconstraints"> - <title>Informal constraints on <tag class="element">memo</tag> - document instances</title> - - <itemizedlist> - <listitem> - <para><emphasis>Exactly one</emphasis> sender.</para> - </listitem> - - <listitem> - <para><emphasis>One or more</emphasis> recipients.</para> - </listitem> - - <listitem> - <para>Subject</para> - </listitem> - - <listitem> - <para>Content</para> - </listitem> - </itemizedlist> - - <para>In addition we have:</para> - - <itemizedlist> - <listitem> - <para>A date string <emphasis>must</emphasis> be supplied</para> - </listitem> - - <listitem> - <para>A priority <emphasis>may</emphasis> be supplied with - allowed values to be chosen from the set of values <tag - class="attvalue">low</tag>, <tag class="attvalue">medium</tag> - or <tag class="attvalue">high</tag>.</para> - </listitem> - </itemizedlist> - </figure> - - <para>All these fields contain ordinary text to be filled in 
by a user - and shall appear exactly in the defined order. For simplicity we do - not care about email address syntax rules being described in <link - xlink:href="">RFC based address - schemes</link>. We will see how the <emphasis>constraints</emphasis> - mentioned above can be modelled in XML by an extension to the concept - of well formed documents.</para> - </section> - - <section xml:id="section_memo_machinereadable"> - <title>A machine readable description</title> - - <para>We now introduce an example of an XML schema. It allows for the - specification of additional constraints to both element nodes and - their attributes. Our set of <link - linkend="figure_memo_informalconstraints" revision="">informal - constraints</link> on memo documents may be expressed as:</para> - - <figure xml:id="figure_memo_dtd"> - <title>A schema to describe memo documents.</title> - - <programlisting language="none"><xs:schema xmlns:xs="" - xmlns:vc="" elementFormDefault="qualified" - vc:minVersion="1.0" vc:maxVersion="1.1"> - - <xs:element name="memo"> - <xs:complexType> - <xs:sequence> <co xml:id="memodtd_memodef"/> - <xs:element name="from" type="xs:string"/> <co - xml:id="memodtd_elem_from"/> - <xs:element name="to" minOccurs="1" maxOccurs="unbounded" type="xs:string"/> - <xs:element name="subject" type="xs:string"/> - <xs:element name="content" type="xs:string"/> - </xs:sequence> - <xs:attribute name="date" type="xs:date" use="required"/> <co - xml:id="memodtd_memo_attribs"/> - <xs:attribute name="priority" type="Priority" use="optional"/> - </xs:complexType> - - </xs:element> - - <xs:simpleType name="Priority"> - <xs:restriction base="xs:string"> - <xs:enumeration value="low"/> - <xs:enumeration value="medium"/> - <xs:enumeration value="high"/> - </xs:restriction> - </xs:simpleType> - -</xs:schema></programlisting> - - <calloutlist> - <callout arearefs="memodtd_memodef"> - <para>A <tag class="element">memo</tag> consists of a sender, at - least one recipient, a subject and 
content.</para> - </callout> - - <callout arearefs="memodtd_memo_attribs"> - <para>A <tag class="element">memo</tag> has got one required - attribute <varname>date</varname> and an optional attribute - <varname>priority</varname> being restricted to the three - allowed values <tag class="attvalue">low</tag>, <tag - class="attvalue">medium</tag> and <tag - class="attvalue">high</tag> being defined by a separate <tag - class="starttag">xs:simpleType</tag> directive.</para> - </callout> - - <callout arearefs="memodtd_elem_from"> - <para>A <tag class="starttag">from</tag> element consists of - ordinary text. This disallows XML markup. For example - <code><from>Smith & partner</from></code> is - disallowed since XML uses the ampersand (&) to denote the - beginning of an entity like <tag class="genentity">auml</tag> - for the German a-umlaut (ä). The correct form is - <code><from>Smith &amp; partner</from></code> - using the predefined entity <tag class="genentity">amp</tag> as - an escape sequence for the ampersand.</para> - - <para><code>type="xs:string"</code> is a built in XML Schema - type representing a restricted version of ordinary strings. - Without digging into details a <code>xs:string</code> string - must not contain any markup code like e.g. <tag - class="starttag">msqrt</tag>. This ensures that a string does - not interfere with the document's XML markup.</para> - </callout> - </calloutlist> - </figure> - - <para>We notice our schema's syntax itself is an XML document.</para> - - <para>From the viewpoint of software modeling an XML Schema instance - is a <emphasis>schema</emphasis> describing the syntax of a class of - XML document instances adhering to it. 
In the context of XML - technologies <link xlink:href="">XML - Schema</link> is one of several language alternatives which allow for - XML document structure descriptions.</para> - - <para>Readers being familiar with <abbrev - xlink:href="">BNF</abbrev> - or <abbrev - xlink:href="">EBNF</abbrev> - will be able to understand the grammatical rules being expressed - here.</para> - - <productionset> - <title>A message of type <tag class="starttag">memo</tag></title> - - <production xml:id="memo.ebnf.memo"> - <lhs>Memo Message</lhs> - - <rhs>'<memo>' <nonterminal - def="#memo.ebnf.sender">Sender</nonterminal> [<nonterminal - def="#memo.ebnf.recipient">Recipient</nonterminal>]+ <nonterminal - def="#memo.ebnf.subject">Subject</nonterminal> <nonterminal - def="#memo.ebnf.content">Content</nonterminal> - '</memo>'</rhs> - </production> - - <production xml:id="memo.ebnf.sender"> - <lhs>Sender</lhs> - - <rhs>'<from>' <nonterminal def="#memo.ebnf.text"> Text - </nonterminal> '</from>'</rhs> - </production> - - <production xml:id="memo.ebnf.recipient"> - <lhs>Recipient</lhs> - - <rhs>'<to>' <nonterminal def="#memo.ebnf.text"> Text - </nonterminal> '</to>'</rhs> - </production> - - <production xml:id="memo.ebnf.subject"> - <lhs>Subject</lhs> - - <rhs>'<subject>' <nonterminal def="#memo.ebnf.text"> Text - </nonterminal> '</subject>'</rhs> - </production> - - <production xml:id="memo.ebnf.content"> - <lhs>Content</lhs> - - <rhs>'<content>' <nonterminal def="#memo.ebnf.text"> Text - </nonterminal> '</content>'</rhs> - </production> - - <production xml:id="memo.ebnf.text"> - <lhs>Text</lhs> - - <rhs>[a-zA-Z0-9]* <lineannotation>In real documents this is too - restrictive!</lineannotation></rhs> - </production> - </productionset> - - <para>We may as well supply a graphical representation:</para> - - <figure xml:id="extendContModelGraph"> - <title>Graphical representation of the extended <code>content</code> - model.</title> - - <mediaobject> - <imageobject> - <imagedata 
fileref="Ref/Fig/contentmixed.fig"/> - </imageobject> - </mediaobject> - </figure> - - <para>In comparison to our informal description of memo documents a - schema offers an added value: The grammar is machine readable and may - thus become input to a parser which in turn gets enabled to check - whether an XML document obeys the constraints being imposed. So the - parser must be instructed to use a schema in addition to the XML - document in question. For this purpose an XML document may define a - reference to a schema:</para> - - <figure xml:id="memo_external_dtd"> - <title>A memo document instance holding a reference to a document - external schema.</title> - - <programlisting language="none"><memo <co - xml:id="memo_external_dtd_top_element"/> xmlns:xsi="" - xsi:noNamespaceSchemaLocation="memo.xsd" <co - xml:id="memo_external_dtd_url"/> - date="2014-09-24" priority="high"> - <from>M. Goik</from> - <to>B. King</to> - <to>A. June</to> - <subject>Best whishes</subject> - <content>Hi all, congratulations to your splendid party</content> -</memo></programlisting> - - <calloutlist> - <callout arearefs="memo_external_dtd_top_element"> - <para>The element <tag class="starttag">memo</tag> is chosen to - be the top (root) element of the document's tree. It must be - defined in our schema <filename>memo.xsd</filename>. This is - really a choice since an XML schema defines a - <emphasis>set</emphasis> of elements in - <emphasis>arbitrary</emphasis> order. There is no such rule as - <quote>define before use</quote>. So an XML schema does not tell - us which element has to appear on top of a document.</para> - - <para>Suppose a given XML schema offers both <tag - class="starttag">book</tag> and <tag - class="starttag">report</tag> elements. An XML author writing a - complex document will choose <tag class="starttag">book</tag> as - top level element rather than <tag class="starttag">report</tag> - being more appropriate for a small piece of documentation. 
- Consequently it is an XML author's <emphasis>choice</emphasis> - which of the elements being defined in a schema shall appear as - <emphasis>the</emphasis> top level element.</para> - </callout> - - <callout arearefs="memo_external_dtd_url"> - <para>The address of the schema's rule set. In the given example - it is just a filename but it may as well be a <link - xlink:href="">URL</link> of type - <abbrev - xlink:href="">ftp</abbrev>, - <abbrev xlink:href="">http</abbrev> - and so on, see <xref linkend="memoDtdOnFtp"/>.</para> - </callout> - </calloutlist> - </figure> - - <para>In the presence of a schema parsing a document is actually a two - step process: First the parser will check the document for - well-formedness. Then the parser will read the referenced schema - <filename>memo.xsd</filename> and check the document for the - additional constraints being defined within.</para> - - <para>In the current example both the schema and the XML memo document - reside as text files in a common file system folder. For general use a - schema is usually kept at a centralized location. The value of the attribute - <varname>xsi:noNamespaceSchemaLocation</varname> is actually a - <emphasis>U</emphasis><foreignphrase>niform</foreignphrase> - <emphasis>R</emphasis><foreignphrase>esource</foreignphrase> - <emphasis>L</emphasis><foreignphrase>ocator</foreignphrase> <link - xlink:href="">(URL)</link>. Thus our - <filename>memo.xsd</filename> may also be supplied as an <abbrev - xlink:href="">http</abbrev> or <abbrev - xlink:href="">ftp</abbrev> - <link xlink:href="">URL</link>:</para> - - <figure xml:id="memoDtdOnFtp"> - <title>A schema reference to an FTP server.</title> - - <programlisting language="none"><?xml version="1.0" encoding="UTF-8"?> -<memo ... xsi:noNamespaceSchemaLocation=""> - <from>M. Goik</from> - ... 
-</memo></programlisting> - </figure> - - <para>Some terms are helpful in the context of schemas:</para> - - <variablelist> - <varlistentry> - <term>Validating / non-validating:</term> - - <listitem> - <para>A non-validating parser only checks a document for - well-formedness. If it also checks XML documents for conformance to a - schema it is a <emphasis>validating</emphasis> parser.</para> - </listitem> - </varlistentry> - - <varlistentry> - <term>Valid / invalid documents:</term> - - <listitem> - <para>An XML document referencing a schema may either be valid - or invalid depending on its conformance to the schema in - question.</para> - </listitem> - </varlistentry> - - <varlistentry> - <term>Document instance:</term> - - <listitem> - <para>An XML memo document may conform to the <link - linkend="figure_memo_dtd">memo schema</link>. In this case we - call it a <emphasis>document instance</emphasis> of the memo - schema.</para> - - <para>This situation is quite similar to that in typed programming - languages: A <link - linkend="gloss_Java"><trademark>Java</trademark></link> - <code>class</code> declaration is a blueprint for the <link - linkend="gloss_Java"><trademark>Java</trademark></link> runtime - system to construct <link - linkend="gloss_Java"><trademark>Java</trademark></link> objects - in memory. This is done by e.g. a statement <code>String name = - new String();</code>. The identifier <code>name</code> will hold - a reference to an <emphasis>instance of class String</emphasis>. - So in a <link - linkend="gloss_Java"><trademark>Java</trademark></link> runtime - environment a class declaration plays the same role as a schema - declaration in XML. 
See also <xref - linkend="example_memoJavaClass"/>.</para> - </listitem> - </varlistentry> - </variablelist> - - <para>For further discussions it is very useful to clearly distinguish - element definitions in a schema from their - <emphasis>realizations</emphasis> in a corresponding document - instance: Our memo schema defines an element <tag - class="starttag">from</tag> to be of content <type>xs:string</type>. - According to the schema exactly one <tag class="starttag">from</tag> - element must appear in a valid (conforming) document instance. If we - were talking about HTML document instances we would prefer to talk - about a <tag class="starttag">from</tag> <emphasis>tag</emphasis> - rather than a <tag class="starttag">from</tag> - <emphasis>element</emphasis>.</para> - - <para>In this document we will use the term <emphasis>element - type</emphasis> to denote an <code><xs:element ...</code> - definition in a schema. Thus we will talk about an element type <tag - class="element">subject</tag> being defined in - <filename>memo.xsd</filename>.</para> - - <para>An element type being defined in a <abbrev - xlink:href="">schema</abbrev> - may have realizations in document instances. For example the document - instance shown in <xref linkend="memo_external_dtd"/> has two - <emphasis>nodes</emphasis> of element type <tag - class="element">to</tag>. Thus we say that the document instance - contains two <emphasis>element nodes</emphasis> of type <tag - class="element">to</tag>. We will frequently abbreviate this by saying - the instance contains two <tag class="starttag">to</tag> element - nodes. And we may even omit the term <emphasis>nodes</emphasis> and - simply talk about two <tag class="starttag">to</tag> elements. 
But - the careful reader should always distinguish between a single type - <code>foo</code> being defined in a <abbrev - xlink:href="">schema</abbrev> - and the possibly empty set of <tag class="starttag">foo</tag> nodes - appearing in valid document instances.</para> - - <para><abbrev - xlink:href="">Schemas</abbrev> - add a layer of constraints on top of well-formed XML documents:</para> - - <figure xml:id="wellformedandvalid"> - <title>Well-formed and valid documents</title> - - <mediaobject> - <imageobject> - <imagedata fileref="Ref/Fig/wellformedandvalid.fig" scale="65"/> - </imageobject> - </mediaobject> - </figure> - - <qandaset defaultlabel="qanda" xml:id="example_memoTestValid"> - <title>Validation of memo document instances.</title> - - <qandadiv> - <qandaentry> - <question> - <para>Copy the two files <link - xlink:href="Ref/src/Memo.1/message.xml">message.xml</link> and - <link xlink:href="Ref/src/Memo.1/memo.xsd">memo.xsd</link> - into your eclipse project. Use the Oxygen XML plug-in to check - if the document is valid. Then do and undo the - following changes one at a time, each time checking the document for - validity:</para> - - <itemizedlist> - <listitem> - <para>Omit the <tag class="starttag">from</tag> - element.</para> - </listitem> - - <listitem> - <para>Change the order of the two sub elements <tag - class="starttag">subject</tag> and <tag - class="starttag">content</tag>.</para> - </listitem> - - <listitem> - <para>Erase the <varname>date</varname> attribute and its - value.</para> - </listitem> - - <listitem> - <para>Erase the <varname>priority</varname> attribute and - its value.</para> - </listitem> - </itemizedlist> - - <para>What do you observe?</para> - </question> - - <answer> - <para>The <tag class="attribute">priority</tag> attribute is - declared as <code>optional</code> and may thus be omitted. - Erasing the <tag class="attribute">priority</tag> attribute - thus leaves the document in a valid state. 
The remaining three - edit actions yield an invalid document instance.</para> - </answer> - </qandaentry> - - <qandaentry xml:id="example_memoJavaClass"> - <question> - <label>A memo implementation sketch in Java</label> - - <para>The aim of this exercise is to clarify the (abstract) - relation between XML <abbrev - xlink:href="">schemas</abbrev> - and sets of <link - linkend="gloss_Java"><trademark>Java</trademark></link> - classes rather than building a running application. We want to - model the <link xlink:href="Ref/src/Memo.1/memo.xsd">memo - schema</link> as a set of <link - linkend="gloss_Java"><trademark>Java</trademark></link> - classes.</para> - </question> - - <answer> - <para>The XML attributes <tag class="attribute">date</tag> and - <tag class="attribute">priority</tag> can be mapped to <link - linkend="gloss_Java"><trademark>Java</trademark></link> - attributes. The same applies to the Memo elements <tag - class="element">from</tag>, <tag class="element">subject</tag> - and <tag class="element">content</tag> which may be - implemented as simple Strings or alternatively as separate - classes wrapping the String content. The latter method of - implementation should be preferred if the Memo schema is - expected to grow in complexity. A simple sketch reads:</para> - - <programlisting language="none">import java.util.Date; -import java.util.SortedSet; - -public class Memo { - private Date date; - private Priority priority = Priority.standard; - private String from, subject, content; - private SortedSet<String> to; - // Accessors not yet implemented -}</programlisting> - - <para>The only thing to note here is the implementation of the - <tag class="element">to</tag> element: We want to be able to - address a <emphasis>set</emphasis> of recipients. Thus we have - to disallow duplicates. 
Note that this is an - <emphasis>informal</emphasis> constraint not being handled by - our schema: A Memo document instance <emphasis>may</emphasis> - have duplicate content in <tag class="starttag">to</tag> - nodes. This is a weakness of <abbrev - xlink:href="">schema</abbrev>s: - We are unable to impose uniqueness constraints on the content - of partial sets of document nodes.</para> - - <para>On the other hand our set of recipients has to be - ordered: In an XML document instance the order of <tag - class="starttag">to</tag> nodes is important and has to be - preserved in a <link - linkend="gloss_Java"><trademark>Java</trademark></link> - representation. Thus we choose a - <classname>java.util.SortedSet</classname> parametrized with - String type to fulfill both requirements. Note that a - <classname>java.util.SortedSet</classname> orders its elements - by their natural ordering or by a comparator rather than by - insertion order. If the document order itself has to be - retained, an insertion-ordered set like - <classname>java.util.LinkedHashSet</classname> is the safer - choice.</para> - - <para>Our schema defines:</para> - - <programlisting language="none"><!ATTLIST memo ... priority (low|medium|high) #IMPLIED></programlisting> - - <para>Starting from <link - linkend="gloss_Java"><trademark>Java</trademark></link> 1.5 we - may implement this constraint by a type safe enumeration in a - file <filename></filename>:</para> - - <programlisting language="none">public enum Priority{low, standard, high};</programlisting> - </answer> - </qandaentry> - </qandadiv> - </qandaset> - - <para>In the following chapters we will extend the memo document type - (<code><!DOCTYPE memo ... ></code>) to demonstrate various - concepts of <abbrev - xlink:href="">schema</abbrev>s - and other XML related standards. In parallel a series of exercises - deals with building a schema usable to edit books. This schema gets - extended as our knowledge about XML advances. 
We start with an initial - exercise:</para> - - <qandaset defaultlabel="qanda" xml:id="example_bookDtd"> - <title>A schema for editing books</title> - - <qandadiv> - <qandaentry> - <question> - <para>Write a schema describing book document instances with - the following features:</para> - - <itemizedlist> - <listitem> - <para>A book shall have a title to describe the book - itself.</para> - </listitem> - - <listitem> - <para>A book shall have at least one but possibly a - sequence of chapters.</para> - </listitem> - - <listitem> - <para>Each chapter shall have a title and at least one - paragraph.</para> - </listitem> - - <listitem> - <para>The titles and paragraphs shall consist of ordinary - text.</para> - </listitem> - </itemizedlist> - </question> - - <answer> - <para>A possible schema looks like:</para> - - <figure xml:id="figure_book.dtd_v1"> - <title>A first schema version for book documents</title> - - <programlisting language="none"><xs:schema xmlns:xs="" - xmlns:vc="" elementFormDefault="qualified" - vc:minVersion="1.0" vc:maxVersion="1.1"> - - <xs:element name="book"> - <xs:complexType> - <xs:sequence> - <xs:element ref="title"/> - <xs:element ref="chapter" minOccurs="1" maxOccurs="unbounded"/> - </xs:sequence> - </xs:complexType> - </xs:element> - - <xs:element name="title" type="xs:string"/> - <xs:element name="chapter"> - <xs:complexType> - <xs:sequence> - <xs:element ref="title"/> - <xs:element ref="para" minOccurs="1" maxOccurs="unbounded"/> - </xs:sequence> - </xs:complexType> - </xs:element> - - <xs:element name="para" type="xs:string"/> - -</xs:schema></programlisting> - </figure> - - <para>We supply a valid document instance:</para> - - <informalfigure xml:id="bookInitialInstance"> - <programlisting language="none"><?xml version="1.0" encoding="UTF-8"?> -<book xmlns:xsi="" - xsi:noNamespaceSchemaLocation="book.xsd"> - <title>Introduction to Java</title> - <chapter> - <title>Introduction</title> - <para>Java is a programming language</para> - 
</chapter> - <chapter> - <title>The virtual machine</title> - <para>We also call it the runtime system.</para> - </chapter> - <chapter> - <title>Annotations</title> - <para>Annotations provide a means to add meta information.</para> - <para>This is especially useful for framework authors.</para> - </chapter> -</book></programlisting> - </informalfigure> - - <para>.</para> - </answer> - </qandaentry> - </qandadiv> - </qandaset> - </section> - - <section xml:id="dtdVsSqlDdl"> - <title>Relating <abbrev - xlink:href="">schema</abbrev>'s - and <acronym - xlink:href="">SQL</acronym> - <abbrev - xlink:href="">DDL</abbrev></title> - - <para>XML <abbrev - xlink:href="">schema</abbrev>'s - and <acronym - xlink:href="">SQL</acronym> - <abbrev - xlink:href="">DDL</abbrev> - are related: They both describe data models and thus integrity - constraints. We consider a simple invoice example:</para> - - <figure xml:id="invoiceIntegrity"> - <title>Invoice integrity constraints</title> - - <mediaobject> - <imageobject> - <imagedata fileref="Ref/Fig/invoicedata.fig" scale="65"/> - </imageobject> - </mediaobject> - </figure> - - <para>A relational implementation may look like:</para> - - <figure xml:id="invoiceSqlDdl"> - <title>Relational implementation</title> - - <mediaobject> - <imageobject> - <imagedata fileref="Ref/Fig/invoicedataimplement.fig" scale="65"/> - </imageobject> - </mediaobject> - </figure> - - <qandaset defaultlabel="qanda" xml:id="qandaInvoiceSchema"> - <title>An XML schema representing invoices</title> - - <qandadiv> - <qandaentry> - <question> - <para>Represent the relational schema being described in <xref - linkend="invoiceSqlDdl"/> by an XML Schema and provide an - appropriate instance example.</para> - </question> - - <answer> - <para>A possible schema implementation:</para> - - <programlisting language="none"><xs:schema xmlns:xs="" - xmlns:vc="" elementFormDefault="qualified" - vc:minVersion="1.0" vc:maxVersion="1.1"> - - <xs:simpleType name="money"> - 
<xs:restriction base="xs:decimal"> - <xs:fractionDigits value="2"/> - </xs:restriction> - </xs:simpleType> - - <xs:element name="data"> - <xs:complexType> - <xs:sequence> - <xs:element ref="customer" maxOccurs="unbounded"/> - <xs:element ref="invoice" maxOccurs="unbounded"/> - </xs:sequence> - </xs:complexType> - <xs:key name="customerId"> - <xs:selector xpath="customer"/> - <xs:field xpath="@id"/> - </xs:key> - - <xs:keyref refer="customerId" name="customerToInvoice"> - <xs:selector xpath="invoice"/> - <xs:field xpath="@customer"></xs:field> - </xs:keyref> - </xs:element> - - <xs:element name="customer"> - <xs:complexType> - <xs:sequence> - <xs:element name="name" type="xs:string"/> - <xs:element name="phoneNumber" type="xs:string" minOccurs="0"/> - </xs:sequence> - <xs:attribute name="id" type="xs:int" use="required"/> - </xs:complexType> - </xs:element> - - <xs:element name="invoice"> - <xs:complexType> - <xs:sequence> - <xs:element name="amount" type="money"/> - <xs:element name="status"> - <xs:simpleType> - <xs:restriction base="xs:token"> - <xs:enumeration value="open"/> - <xs:enumeration value="due"/> - <xs:enumeration value="cleared"/> - </xs:restriction> - </xs:simpleType> - </xs:element> - </xs:sequence> - <xs:attribute name="customer" type="xs:int" use="required"/> - </xs:complexType> - </xs:element> - -</xs:schema></programlisting> - - <para>An example data set:</para> - - <programlisting language="none"><data xmlns:xsi="" - xsi:noNamespaceSchemaLocation="invoice.xsd"> - <customer id="5"> - <name>Clarke Jefferson</name> - </customer> - - <invoice customer="5"> - <amount>33.12</amount> - <status>due</status> - </invoice> -</data></programlisting> - </answer> - </qandaentry> - </qandadiv> - </qandaset> - </section> - - <section xml:id="airlineXsd"> - <title>The airline example revisited</title> - - <qandaset defaultlabel="qanda" xml:id="qandaAirlineXsd"> - <title>Airline meta information by XML schema</title> - - <qandadiv> - <qandaentry> - <question> - 
<para>Transform the relational schema from <xref - linkend="airlineRelationalSchema"/> into an XML schema and - supply some test data. In particular consider the following - constraints:</para> - - <itemizedlist> - <listitem> - <para>Data types</para> - - <itemizedlist> - <listitem> - <para><link - xlink:href="">ICAO - airline designator</link></para> - </listitem> - - <listitem> - <para><link - xlink:href="">ICAO - airport code</link></para> - </listitem> - </itemizedlist> - </listitem> - - <listitem> - <para>Primary / Unique key definitions</para> - </listitem> - - <listitem> - <para>Foreign key definitions</para> - </listitem> - - <listitem> - <para>CHECK constraint: Your XML schema will require <tag - class="starttag">xs:assert test="..." </tag> and thus XML - schema version 1.1. You may want to read about - co-occurrence constraints as being described in <link - xlink:href="">Listing - 6. Assertion on complex type - @height < - @width</link>.</para> - </listitem> - </itemizedlist> - - <para>The following XML example instance may guide you towards - an <filename>airline.xsd</filename> schema:</para> - - <programlisting language="none"><top xmlns:xsi="" - xsi:noNamespaceSchemaLocation="airline.xsd"> - <airlines> - <airline airlineCode="DLH" id="1"> - <name>Lufthansa</name> - </airline> - <airline airlineCode="AFR" id="2"> - <name>Air France</name> - </airline> - </airlines> - <destinations> - <destination id="1" airportCode="EDDF"> - <fullName>Frankfurt International Airport – Frankfurt am Main</fullName> - </destination> - - <destination id="3" airportCode="EBCI"> - <fullName>Brussels South Charleroi Airport – Charleroi</fullName> - </destination> - </destinations> - - <flights> - <flight id="1" airline="2" origin="1" destination="3"> - <flightNumber>LH 4234</flightNumber> - </flight> - </flights> -</top></programlisting> - - <para>Hints:</para> - - <itemizedlist> - <listitem> - <para>Identify all relational schema constraints from - solution of <xref 
linkend="airlineRelationalSchema"/> and - model them accordingly.</para> - </listitem> - - <listitem> - <para>The above example does not contain any constraint - violations. In order to test your schema for completeness, - tinkering with primary key, unique and referencing - attribute values may be helpful.</para> - </listitem> - </itemizedlist> - </question> - - <answer> - <programlisting language="none"><xs:schema xmlns:xs="" - xmlns:vc="" elementFormDefault="qualified" - vc:minVersion="1.1"> - - <xs:simpleType name="ICAOAirportCode"> - <xs:restriction base="xs:string"> - <xs:length value="4" /> - <xs:pattern value="[A-Z0-9]+"></xs:pattern> - </xs:restriction> - </xs:simpleType> - - <xs:simpleType name="ICAOAirlineCode"> - <xs:restriction base="xs:string"> - <xs:length value="3"/> - <xs:pattern value="[A-Z]+"></xs:pattern> - </xs:restriction> - </xs:simpleType> - - <xs:element name="top"> - <xs:complexType> - <xs:sequence> - <xs:element ref="airlines"/> - <xs:element ref="destinations"/> - <xs:element ref="flights"/> - </xs:sequence> - </xs:complexType> - - <xs:keyref name="_FK_Flight_airline" refer="_PK_Airline_id"> - <xs:selector xpath="flights/flight"/> - <xs:field xpath="@airline"/> - </xs:keyref> - - <xs:keyref name="_FK_Flight_origin" refer="_PK_Destination_id"> - <xs:selector xpath="flights/flight"/> - <xs:field xpath="@origin"/> - </xs:keyref> - - <xs:keyref name="_FK_Flight_destination" refer="_PK_Destination_id"> - <xs:selector xpath="flights/flight"/> - <xs:field xpath="@destination"/> - </xs:keyref> - - </xs:element> - - <xs:element name="airlines"> - <xs:complexType> - <xs:sequence> - <xs:element ref="airline" minOccurs="0" maxOccurs="unbounded"/> - </xs:sequence> - </xs:complexType> - - <xs:key name="_PK_Airline_id"> - <xs:selector xpath="airline"/> - <xs:field xpath="@id"/> - </xs:key> - - <xs:key name="_UN_Airline_name"> - <xs:selector xpath="airline"/> - <xs:field xpath="name"/> - </xs:key> - - <xs:key name="_UN_Airline_airlineCode"> - <xs:selector 
xpath="airline"/> - <xs:field xpath="@airlineCode"/> - </xs:key> - </xs:element> - - <xs:element name="airline"> - <xs:complexType> - <xs:sequence> - <xs:element name="name" type="xs:string"/> - </xs:sequence> - <xs:attribute name="id" type="xs:int" use="required"/> - <xs:attribute name="airlineCode" type="ICAOAirlineCode" use="required"/> - </xs:complexType> - </xs:element> - - <xs:element name="destinations"> - <xs:complexType> - <xs:sequence> - <xs:element ref="destination" minOccurs="0" maxOccurs="unbounded"/> - </xs:sequence> - </xs:complexType> - - <xs:key name="_PK_Destination_id"> - <xs:selector xpath="destination"/> - <xs:field xpath="@id"/> - </xs:key> - - <xs:key name="_UN_Destination_airportCode"> - <xs:selector xpath="destination"/> - <xs:field xpath="@airportCode"/> - </xs:key> - </xs:element> - - <xs:element name="destination"> - <xs:complexType> - <xs:sequence> - <xs:element name="fullName"/> - </xs:sequence> - <xs:attribute name="id" type="xs:int"/> - <xs:attribute name="airportCode" type="ICAOAirportCode"/> - </xs:complexType> - </xs:element> - - <xs:element name="flights"> - <xs:complexType> - <xs:sequence> - <xs:element ref="flight" minOccurs="0" maxOccurs="unbounded"/> - </xs:sequence> - </xs:complexType> - - <xs:key name="_PK_Flight_id"> - <xs:selector xpath="flight"/> - <xs:field xpath="@id"/> - </xs:key> - - <xs:key name="_UN_Flight_flightNumber"> - <xs:selector xpath="flight"/> - <xs:field xpath="flightNumber"/> - </xs:key> - - </xs:element> - - <xs:element name="flight"> - <xs:complexType> - <xs:sequence> - <xs:element name="flightNumber" type="xs:string"/> - </xs:sequence> - <xs:attribute name="id" type="xs:int" use="required"/> - <xs:attribute name="airline" type="xs:int" use="required"/> - <xs:attribute name="origin" type="xs:int"/> - <xs:attribute name="destination" type="xs:int"/> - <xs:assert test="not(@origin = @destination)"> - <xs:annotation> - <xs:documentation>CHECK constraint _CK_Flight_origin_destination</xs:documentation> - 
</xs:annotation> - </xs:assert> - </xs:complexType> - </xs:element> - -</xs:schema></programlisting> - </answer> - </qandaentry> - </qandadiv> - </qandaset> - </section> - - <section xml:id="xmlAndJava"> - <title>Relating <abbrev - xlink:href="">schema</abbrev>'s - and <link linkend="gloss_Java"><trademark>Java</trademark></link> - class descriptions.</title> - - <para>We may also compare XML data constraints to <link - linkend="gloss_Java"><trademark>Java</trademark></link>. A <link - linkend="gloss_Java"><trademark>Java</trademark></link> class - declaration is actually a blueprint for a <trademark - xlink:href="">JRE</trademark> - to instantiate compatible objects. Likewise an XML schema restricts - well-formed documents:</para> - - <figure xml:id="fig_XmlAndJava"> - <title>XML <abbrev - xlink:href="">schema</abbrev>'s - and <link linkend="gloss_Java"><trademark>Java</trademark></link> - class declarations.</title> - - <mediaobject> - <imageobject> - <imagedata fileref="Ref/Fig/xmlattribandjava.fig" scale="65"/> - </imageobject> - </mediaobject> - </figure> - </section> - - <section xml:id="xmlSchemaExercise"> - <title>XML schema exercises</title> - - <section xml:id="sectSchemaProductCatalog"> - <title>A product catalog</title> - - <qandaset defaultlabel="qanda" xml:id="quandaProductCatalog"> - <title>Product catalog schema</title> - - <qandadiv> - <qandaentry> - <question> - <para>Consider the following product catalog example:</para> - - <programlisting language="none"><catalog xmlns:xsi="" - xsi:noNamespaceSchemaLocation="catalog.xsd"> - <title>Outdoor products</title> - <introduction> - <para>We offer a great variety of basic stuff for mountaineering - such as ropes, harnesses and tents.</para> - <para>Our shop is proud for its large number of available - sleeping bags.</para> - </introduction> - <product id="x-223"> - <title>Multi freezing bag Nightmare camper</title> - <description> - <para>You will feel comfortable till minus 20 degrees - At - least if 
you are a penguin or a polar bear.</para> - </description> - </product> - <product id="r-334"> - <title>Rope 40m</title> - <description> - <para>Excellent for indoor climbing.</para> - </description> - </product> -</catalog></programlisting> - - <para>As you may have inferred the following rules shall - apply for arbitrary catalog documents:</para> - - <itemizedlist> - <listitem> - <para>Each <tag class="starttag">catalog</tag> shall - have exactly one <tag class="starttag">title</tag> and - <tag class="starttag">introduction</tag> element.</para> - </listitem> - - <listitem> - <para><tag class="starttag">introduction</tag> and <tag - class="starttag">description</tag> shall have at least - one <tag class="starttag">para</tag> child.</para> - </listitem> - - <listitem> - <para>Each <tag class="starttag">catalog</tag> shall - have at least one <tag - class="starttag">product</tag>.</para> - </listitem> - - <listitem> - <para>Each <tag class="starttag">product</tag> shall - have exactly one <tag class="starttag">title</tag> and - at least one <tag class="starttag">para</tag> child - element.</para> - </listitem> - - <listitem> - <para>The required <code>id</code> attribute shall not - contain whitespace and be unique with respect to all - <tag class="starttag">product</tag> elements.</para> - </listitem> - - <listitem> - <para>The attribute price shall represent money amounts - and be optional.</para> - </listitem> - </itemizedlist> - - <para>Provide a suitable <filename>catalog.xsd</filename> - schema.</para> - </question> - - <answer> - <programlisting language="none"><xs:schema xmlns:xs="" - xmlns:vc="" elementFormDefault="qualified" - vc:minVersion="1.0" vc:maxVersion="1.1"> - - <xs:simpleType name="money"> - <xs:restriction base="xs:decimal"> - <xs:fractionDigits value="2"/> - </xs:restriction> - </xs:simpleType> - - <xs:element name="title" type="xs:string"/> - <xs:element name="para" type="xs:string"/> - - <xs:element name="description" type="paraSequence"/> - 
<xs:element name="introduction" type="paraSequence"/> - - <xs:complexType name="paraSequence"> - <xs:sequence> - <xs:element ref="para" minOccurs="1" maxOccurs="unbounded"/> - </xs:sequence> - </xs:complexType> - - <xs:element name="product"> - <xs:complexType> - <xs:sequence> - <xs:element ref="title"/> - <xs:element ref="description"/> - </xs:sequence> - <xs:attribute name="id" type="xs:token" use="required"/> - <xs:attribute name="price" type="money" use="optional"/> - </xs:complexType> - </xs:element> - - <xs:element name="catalog"> - <xs:complexType> - <xs:sequence> - <xs:element ref="title"/> - <xs:element ref="introduction"/> - <xs:element ref="product" minOccurs="1" maxOccurs="unbounded"/> - </xs:sequence> - </xs:complexType> - - <xs:key name="uniqueProductId"> - <xs:selector xpath="product"></xs:selector> - <xs:field xpath="@id"/> - </xs:key> - </xs:element> - -</xs:schema></programlisting> - </answer> - </qandaentry> - </qandadiv> - </qandaset> - </section> - - <section xml:id="sectQandaBookV1"> - <title>Book like documents</title> - - <qandaset defaultlabel="qanda" xml:id="example_operatorprecedence"> - <title>Book documents with mixed content and itemized - lists</title> - - <qandadiv> - <qandaentry xml:id="example_book_v2"> - <question> - <para>Extend the first version of <link - linkend="example_bookDtd">book.xsd</link> to support the - following features:</para> - - <itemizedlist> - <listitem> - <para>Within a <tag class="starttag">chapter</tag> node - <tag class="starttag">para</tag> and <tag - class="starttag">itemizedlist</tag> elements in - arbitrary order shall be allowed.</para> - </listitem> - - <listitem> - <para><tag class="starttag">itemizedlist</tag> nodes - shall contain at least one <tag - class="starttag">listitem</tag>.</para> - </listitem> - - <listitem> - <para><tag class="starttag">listitem</tag> nodes shall - be composed of one or more para or nested list item - elements.</para> - </listitem> - - <listitem> - <para>Within a <tag 
class="starttag">para</tag> we want - to be able to emphasize text passages.</para> - </listitem> - </itemizedlist> - - <para>The following sample document instance shall be - valid:</para> - - <programlisting language="none"><?xml version="1.0" encoding="UTF-8"?> -<book xmlns:xsi="" - xsi:noNamespaceSchemaLocation="book.xsd"> - <title>Introduction to Java</title> - <chapter> - <title>Introduction</title> - <para>Java supports <emphasis>lots</emphasis> of concepts:</para> - <itemizedlist> - <listitem> - <para>Single <emphasis>implementation</emphasis> inheritance.</para> - </listitem> - <listitem> - <para>Multiple <emphasis>interface</emphasis> inheritance.</para> - <itemizedlist> - <listitem><para>Built in types</para></listitem> - <listitem><para>User defined types</para></listitem> - </itemizedlist> - </listitem> - </itemizedlist> - </chapter> -</book></programlisting> - </question> - - <answer> - <para>An extended schema looks like:</para> - - <figure xml:id="paraListEmphasize"> - <title>Version 2 of book.xsd</title> - - <programlisting language="none"><?xml version="1.0" encoding="UTF-8"?> -<xs:schema xmlns:xs="" - xmlns:vc="" elementFormDefault="qualified" - vc:minVersion="1.0" vc:maxVersion="1.1"> - - <xs:import namespace="" schemaLocation="" /> - - - <xs:include schemaLocation="table.xsd"/> - - <!-- Type definitions --> - <xs:simpleType name="languageType"> - <xs:restriction base="xs:string"> - <xs:enumeration value="en"/> - <xs:enumeration value="fr"/> - <xs:enumeration value="de"/> - <xs:enumeration value="it"/> - <xs:enumeration value="es"/> - </xs:restriction> - </xs:simpleType> - - - <!-- Elements having no inner structure --> - <xs:element name="emphasis" type="xs:string"/> - <xs:element name="title" type="xs:string"/> - <xs:element name="link"> - <xs:complexType mixed="true"> - <xs:attribute name="linkend" type="xs:IDREF" use="required"/> - </xs:complexType> - </xs:element> - - <!-- Starting the game ... 
--> - <xs:element name="book"> - <xs:complexType> - <xs:sequence> - <xs:element ref="title"/> - <xs:element ref="chapter" minOccurs="1" maxOccurs="unbounded"/> - </xs:sequence> - <xs:attribute name="lang" type="languageType" use="optional"/> - </xs:complexType> - </xs:element> - - <xs:element name="chapter"> - <xs:complexType> - <xs:sequence> <co xml:id="figure_book.dtd_v2_chapter"/> - <xs:element ref="title"/> - <xs:choice minOccurs="1" maxOccurs="unbounded"> - <xs:element ref="para"/> - <xs:element ref="itemizedlist"/> - <xs:element ref="table"/> - </xs:choice> - </xs:sequence> - <xs:attribute name="id" type="xs:ID" use="optional"/> - <xs:attribute ref="xml:base"/> <!-- This allows for <xi:include ...> --> - </xs:complexType> - </xs:element> - - <xs:element name="para"> - <xs:complexType mixed="true"> <co - xml:id="figure_book.dtd_v2_para"/> - <xs:choice minOccurs="0" maxOccurs="unbounded"> - <xs:element ref="emphasis"/> - <xs:element ref="link"/> - </xs:choice> - <xs:attribute name="id" type="xs:ID" use="optional"/> - </xs:complexType> - </xs:element> - - <xs:element name="itemizedlist"> - <xs:complexType> - <xs:sequence> - <xs:element ref="listitem" minOccurs="1" <co - xml:id="figure_book.dtd_v2_itemizedlist"/> maxOccurs="unbounded"/> - </xs:sequence> - <xs:attribute name="id" type="xs:ID" use="optional"/> - </xs:complexType> - </xs:element> - - <xs:element name="listitem"> - <xs:complexType> - <xs:choice minOccurs="1" maxOccurs="unbounded"> <co - xml:id="figure_book.dtd_v2_listitem"/> - <xs:element ref="para"/> - <xs:element ref="itemizedlist"/> - </xs:choice> - </xs:complexType> - </xs:element> - -</xs:schema></programlisting> - - <caption> - <para>This allows emphasized text in <tag - class="starttag">para</tag> nodes and <tag - class="starttag">itemizedlists</tag>.</para> - </caption> - </figure> - - <calloutlist> - <callout arearefs="figure_book.dtd_v2_chapter"> - <para>We hook into <tag class="starttag">chapter</tag> - to allow arbitrary sequences of at 
least one <tag - class="starttag">para</tag> or <tag - class="starttag">itemizedlist</tag> element node.</para> - </callout> - - <callout arearefs="figure_book.dtd_v2_para"> - <para><tag class="starttag">para</tag> nodes now allow - mixed content.</para> - </callout> - - <callout arearefs="figure_book.dtd_v2_itemizedlist"> - <para>An <tag class="starttag">itemizedlist</tag> - contains at least one list item.</para> - </callout> - - <callout arearefs="figure_book.dtd_v2_listitem"> - <para>A <tag class="starttag">listitem</tag> contains a - sequence of at least one <tag - class="starttag">para</tag> or <tag - class="starttag">itemizedlist</tag> child node. The - latter gives rise to nested lists. We find a similar - construct in HTML namely unnumbered lists defined by - <code><UL><LI>... </code>constructs.</para> - </callout> - </calloutlist> - </answer> - </qandaentry> - </qandadiv> - </qandaset> - </section> - - <section xml:id="sectQandaBookLang"> - <title>Allow different languages</title> - - <qandaset defaultlabel="qanda" xml:id="example_book.dtd_v3"> - <title>book.xsd and languages</title> - - <qandadiv> - <qandaentry> - <question> - <para>We want to extend our schema from <xref - linkend="example_book_v2"/> by allowing an author to define - the language to be used within the whole or parts of the - document in question. Add an attribute <code>lang</code> to - all relevant elements like e.g. <tag class="starttag">para - lang="es"</tag>. 
An XML editor may use this attribute to - activate corresponding dictionaries for spell - checking.</para> - - <para>The <code>lang</code> attribute shall be restricted to - the following values:</para> - - <itemizedlist> - <listitem> - <para><token>en</token></para> - </listitem> - - <listitem> - <para><token>fr</token></para> - </listitem> - - <listitem> - <para><token>de</token></para> - </listitem> - - <listitem> - <para><token>it</token></para> - </listitem> - - <listitem> - <para><token>es</token></para> - </listitem> - </itemizedlist> - </question> - - <answer> - <para>We define a suitable <tag - class="starttag">xs:attribute</tag> type:</para> - - <programlisting language="none"><xs:attribute <emphasis - role="bold">name="lang"</emphasis>> - <xs:simpleType> - <xs:restriction base="xs:string"> - <xs:enumeration value="en"/> - <xs:enumeration value="fr"/> - <xs:enumeration value="de"/> - <xs:enumeration value="it"/> - <xs:enumeration value="es"/> - </xs:restriction> - </xs:simpleType> -</xs:attribute></programlisting> - - <para>Then we add this attribute to our elements like <tag - class="starttag">chapter</tag> and others:</para> - - <programlisting language="none"> <xs:element name="chapter"> - <xs:complexType> - <xs:sequence> ... </xs:sequence> - <xs:attribute <emphasis role="bold">ref="lang"</emphasis> use="optional"/> - ... - </xs:complexType> - </xs:element></programlisting> - - <para>This allows us to set a language at any - hierarchy level. But of course we may define it at the top level - as well:</para> - - <programlisting language="none"><?xml version="1.0" encoding="UTF-8"?> -<book ... 
lang="en"> - <title>Introduction to Java</title> -...</programlisting> - </answer> - </qandaentry> - </qandadiv> - </qandaset> - </section> - - <section xml:id="sectMixQuotes"> - <title>Mixing attribute quotes</title> - - <qandaset defaultlabel="qanda" xml:id="example_quotes"> - <title>Single and double quotes reconsidered</title> - - <qandadiv> - <qandaentry> - <question> - <para>We recall the problem of nested quotes yielding - non-well-formed XML code:</para> - - <programlisting language="none"><img src="bold.gif" alt="We may use "quotes" here" /></programlisting> - - <para>The XML specification defines legal attribute value - definitions as:</para> - - <productionset> - <title><link - xlink:href="">Literals</link></title> - - <production xml:id="w3RecXml_NT-EntityValue"> - <lhs>EntityValue</lhs> - - <rhs>'"' ([^%&"] | <nonterminal - def="#w3RecXml_NT-PEReference">PEReference</nonterminal> - | <nonterminal - def="#w3RecXml_NT-Reference">Reference</nonterminal>)* - '"' | "'" ([^%&'] | <nonterminal - def="#w3RecXml_NT-PEReference">PEReference</nonterminal> - | <nonterminal - def="#w3RecXml_NT-Reference">Reference</nonterminal>)* - "'"</rhs> - </production> - - <production xml:id="w3RecXml_NT-AttValue"> - <lhs>AttValue</lhs> - - <rhs>'"' ([^<&"] | <nonterminal - def="#w3RecXml_NT-Reference">Reference</nonterminal>)* - '"' | "'" ([^<&'] | <nonterminal - def="#w3RecXml_NT-Reference">Reference</nonterminal>)* - "'"</rhs> - </production> - - <production xml:id="w3RecXml_NT-SystemLiteral"> - <lhs>SystemLiteral</lhs> - - <rhs>('"' [^"]* '"') | ("'" [^']* "'")</rhs> - </production> - - <production xml:id="w3RecXml_NT-PubidLiteral"> - <lhs>PubidLiteral</lhs> - - <rhs>'"' <nonterminal - def="#w3RecXml_NT-PubidChar">PubidChar</nonterminal>* - '"' | "'" (<nonterminal - def="#w3RecXml_NT-PubidChar">PubidChar</nonterminal> - - "'")* "'"</rhs> - </production> - - <production xml:id="w3RecXml_NT-PubidChar"> - <lhs>PubidChar</lhs> - - <rhs>#x20 | #xD | #xA | [a-zA-Z0-9] - | 
[-'()+,./:=?;!*#@$_%]</rhs> - </production> - </productionset> - - <para>Find out how it is possible to set the attribute <tag - class="attribute">alt</tag>'s value to the string <code>We - may use "quotes" here</code>.</para> - </question> - - <answer> - <para>The production rule for attribute values reads:</para> - - <productionset> - <productionrecap linkend="w3RecXml_NT-AttValue"/> - </productionset> - - <para>This allows us to use either of two alternatives to - delimit attribute values:</para> - - <glosslist> - <glossentry> - <glossterm><tag class="starttag">img ... - alt="..."/</tag></glossterm> - - <glossdef> - <para><emphasis>Well-formedness constraint:</emphasis> do not - use <code>"</code> inside the value string.</para> - </glossdef> - </glossentry> - - <glossentry> - <glossterm><tag class="starttag">img ... - alt='...'/</tag></glossterm> - - <glossdef> - <para><emphasis>Well-formedness constraint:</emphasis> do not - use <code>'</code> inside the value string.</para> - </glossdef> - </glossentry> - </glosslist> - - <para>We may take advantage of the second rule:</para> - - <programlisting language="none"><img src="bold.gif" alt='We may use "quotes" here' /></programlisting> - - <para>Notice that according to <xref - linkend="w3RecXml_NT-AttValue"/> the delimiting quotes must - not be mixed. 
The following code is thus not well - formed:</para> - - <programlisting language="none"><img src="bold.gif'/></programlisting> - </answer> - </qandaentry> - </qandadiv> - </qandaset> - </section> - - <section xml:id="qandasetInternalRef"> - <title>Internal references</title> - - <qandaset defaultlabel="qanda" xml:id="example_book.dtd_v5"> - <title>book.xsd and internal references</title> - - <qandadiv> - <qandaentry> - <question> - <para>We want to extend <xref - linkend="example_book.dtd_v3"/> schema to allow for document - internal references by:</para> - - <itemizedlist> - <listitem> - <para>Allowing each <tag class="starttag">chapter</tag>, - <tag class="starttag">para</tag> and <tag - class="starttag">itemizedlist</tag> to become reference - targets.</para> - </listitem> - - <listitem> - <para>Extending the element <tag - class="element">para</tag>'s mixed content model by a - new element <tag class="element">link</tag> with an - attribute <tag class="attribute">linkend</tag> being a - reference to a target.</para> - </listitem> - </itemizedlist> - </question> - - <answer> - <para>We extend our schema:</para> - - <programlisting language="none"><?xml version="1.0" encoding="UTF-8"?> -<xs:schema xmlns:xs="" - xmlns:vc="" elementFormDefault="qualified" - vc:minVersion="1.0" vc:maxVersion="1.1"> - - <xs:import namespace="" schemaLocation="" /> - - - <xs:include schemaLocation="table.xsd"/> - - <!-- Type definitions --> - - <xs:attribute name="lang"> - <xs:simpleType> - <xs:restriction base="xs:string"> - <xs:enumeration value="en"/> - <xs:enumeration value="fr"/> - <xs:enumeration value="de"/> - <xs:enumeration value="it"/> - <xs:enumeration value="es"/> - </xs:restriction> - </xs:simpleType> - </xs:attribute> - - <!-- Elements having no inner structure --> - <xs:element name="emphasis" type="xs:string"/> - <xs:element name="title" type="xs:string"/> - <xs:element name="link"> - <xs:complexType mixed="true"> <co - xml:id="progamlisting_book_v5_link"/> - 
<xs:attribute name="linkend" <co - xml:id="progamlisting_book_v5_link_linkend"/> type="xs:IDREF" use="required"/> - </xs:complexType> - </xs:element> - - <!-- Starting the game ... --> - <xs:element name="book"> - <xs:complexType> - <xs:sequence> - <xs:element ref="title"/> - <xs:element ref="chapter" minOccurs="1" maxOccurs="unbounded"/> - </xs:sequence> - <xs:attribute ref="lang" use="optional"/> - </xs:complexType> - </xs:element> - - <xs:element name="chapter"> - <xs:complexType> - <xs:sequence> - <xs:element ref="title"/> - <xs:choice minOccurs="1" maxOccurs="unbounded"> - <xs:element ref="para"/> - <xs:element ref="itemizedlist"/> - <xs:element ref="table"/> - </xs:choice> - </xs:sequence> - <xs:attribute ref="lang" use="optional"/> - <xs:attribute name="id" <co - xml:id="progamlisting_book_v5_chapter_id"/> type="xs:ID" use="optional"/> - <xs:attribute ref="xml:base"/> <!-- This allows for <xi:include ...> --> - </xs:complexType> - </xs:element> - - <xs:element name="para"> - <xs:complexType mixed="true"> <co - xml:id="progamlisting_book_v5_mixed_link"/> - <xs:choice minOccurs="0" maxOccurs="unbounded"> - <xs:element ref="emphasis"/> - <xs:element ref="link"/> - </xs:choice> - <xs:attribute ref="lang" use="optional"/> - <xs:attribute name="id" <co - xml:id="progamlisting_book_v5_para_id"/> type="xs:ID" use="optional"/> - </xs:complexType> - </xs:element> - - <xs:element name="itemizedlist"> - <xs:complexType> - <xs:sequence> - <xs:element ref="listitem" minOccurs="1" maxOccurs="unbounded"/> - </xs:sequence> - <xs:attribute ref="lang" use="optional"/> - <xs:attribute name="id" type="xs:ID" use="optional"/> - </xs:complexType> - </xs:element> - - <xs:element name="listitem"> - <xs:complexType> - <xs:choice minOccurs="1" maxOccurs="unbounded"> - <xs:element ref="para"/> - <xs:element ref="itemizedlist"/> - </xs:choice> - <xs:attribute ref="lang" use="optional"/> - </xs:complexType> - </xs:element> - -</xs:schema></programlisting> - - <calloutlist> - <callout 
arearefs="progamlisting_book_v5_chapter_id"> - <para>Defining an attribute <tag - class="attribute">id</tag> of type <code>ID</code> for - the elements <tag class="element">chapter</tag>, <tag - class="element">para</tag> and <tag - class="element">itemizedlist</tag>. This enables an - author to define internal reference targets.</para> - </callout> - - <callout arearefs="progamlisting_book_v5_mixed_link"> - <para>A link is part of the element <tag - class="element">para</tag>'s mixed content model. Thus - an author may define internal references along with - ordinary text.</para> - </callout> - - <callout arearefs="progamlisting_book_v5_link"> - <para>As in HTML, a link may contain text. If converted - to HTML the formatting expectation is a hypertext - link.</para> - </callout> - - <callout arearefs="progamlisting_book_v5_link_linkend"> - <para>The attribute <tag class="attribute">linkend</tag> - holds the reference to an internal target being either a - <tag class="element">chapter</tag>, a <tag - class="element">para</tag> or an <tag - class="element">itemizedlist</tag>.</para> - </callout> - </calloutlist> - </answer> - </qandaentry> - </qandadiv> - </qandaset> - </section> - </section> - </section> - </chapter> - - <chapter xml:id="xsl"> - <title>The Extensible Stylesheet Language XSL</title> - - <para>XSL is a <link xlink:href="">W3C - standard</link> which defines a language to transform XML documents into - the following output formats:</para> - - <itemizedlist> - <listitem> - <para>Ordinary text, e.g. in <link - xlink:href="">Unicode</link> encoding.</para> - </listitem> - - <listitem> - <para>XML.</para> - </listitem> - - <listitem> - <para>HTML</para> - </listitem> - - <listitem> - <para>XHTML</para> - </listitem> - </itemizedlist> - - <para>Transforming a source XML document into a target XML document may be - required if:</para> - - <itemizedlist> - <listitem> - <para>The target document expresses similar semantics but uses a - different XML dialect 
i.e. different tag names.</para> - </listitem> - - <listitem> - <para>The target document is only a view on the source document. We - may for example extract the chapter names from a <tag - class="starttag">book</tag> document to create a table of - contents.</para> - </listitem> - </itemizedlist> - - <section xml:id="xsl_helloworld"> - <title>A <quote>Hello, world</quote> <abbrev - xlink:href="">XSL</abbrev> example</title> - - <para>We start from an extended version of our - <filename>memo.xsd</filename>:</para> - - <programlisting language="none"><xs:schema xmlns:xs="" - xmlns:vc="" elementFormDefault="qualified" - vc:minVersion="1.0" vc:maxVersion="1.1"> - -<xs:element name="memo"> - <xs:complexType> - <xs:sequence> - <xs:element name="from" type="Person"/> - <xs:element name="to" type="Person" minOccurs="1" maxOccurs="unbounded"/> - <xs:element name="subject" type="xs:string"/> - <xs:element ref="content"/> - </xs:sequence> - <xs:attribute name="date" type="xs:date" use="required"/> - <xs:attribute name="priority" type="Priority" use="optional"/> - </xs:complexType> - </xs:element> - - <xs:complexType name="Person"> - <xs:simpleContent> - <xs:extension base="xs:string"> - <xs:attribute name="id" type="xs:ID"/> - </xs:extension> - </xs:simpleContent> - </xs:complexType> - - <xs:element name="content"> - <xs:complexType> - <xs:sequence> - <xs:element ref="para" minOccurs="1" maxOccurs="unbounded"/> - </xs:sequence> - </xs:complexType> - - </xs:element> - - <xs:element name="para"> - <xs:complexType mixed="true"> - <xs:sequence> - <xs:element ref="link" minOccurs="0"/> - </xs:sequence> - </xs:complexType> - </xs:element> - - <xs:element name="link"> - <xs:complexType mixed="true"> - <xs:simpleContent> - <xs:extension base="xs:string"> - <xs:attribute name="linkend" type="xs:IDREF"/> - </xs:extension> - </xs:simpleContent> - </xs:complexType> - </xs:element> - - <xs:simpleType name="Priority"> - <xs:restriction base="xs:string"> - <xs:enumeration value="low"/> - 
<xs:enumeration value="medium"/> - <xs:enumeration value="high"/> - </xs:restriction> - </xs:simpleType> - -</xs:schema></programlisting> - - <para>This schema allows a memo's document content to be structured into - paragraphs. A paragraph may contain links either to the sender or to a - recipient.</para> - - <figure xml:id="figure_memoref_instance"> - <title>A memo document instance with an internal reference.</title> - - <programlisting language="none"><memo xmlns:xsi="" - xsi:noNamespaceSchemaLocation="memo.xsd" - date="2014-09-24" priority="high" > - <from <emphasis role="bold">id="goik"</emphasis>>Martin Goik</from> - <to>Adam Hacker</to> - <to id="eve">Eve Intruder</to> - <subject>Firewall problems</subject> - <content> - <para>Thanks for your excellent work.</para> - <para>Our firewall is definitely broken! This bug has been reported by - the <link <emphasis role="bold">linkend="goik"</emphasis>>sender</link>.</para> - </content> -</memo></programlisting> - </figure> - - <para>We want to extract the sender's name from an arbitrary <tag - class="element">memo</tag> document instance. Using <abbrev - xlink:href="">XSL</abbrev> this task can be - accomplished by a script <filename>memo2sender.xsl</filename>:</para> - - <programlisting language="none"><?xml version="1.0" encoding="UTF-8"?> -<xsl:stylesheet xmlns:xsl="" - version="2.0"> - - <xsl:output method="text"/> - - <xsl:template match="/memo"> - <xsl:value-of select="from"/> - </xsl:template> - -</xsl:stylesheet></programlisting> - - <para>Before closer examining this code we first show its effect. We - need a piece of software called a <abbrev - xlink:href="">XSL</abbrev> processor. 
It - reads both a <tag>memo</tag> document instance and a style sheet and - produces the following output:</para> - - <programlisting language="none"><computeroutput>[goik@mupter Memoref]$ xml2xml message.xml memo2sender.xsl -Martin Goik</computeroutput></programlisting> - - <para>The result is the sender's name <computeroutput>Martin - Goik</computeroutput>. We may sketch the transformation - principle:</para> - - <figure xml:id="figure_xsl_principle"> - <title>An <abbrev - xlink:href="">XSL</abbrev> processor - transforming an XML document into a result using a stylesheet</title> - - <mediaobject> - <imageobject> - <imagedata fileref="Ref/Fig/xslconvert.fig"/> - </imageobject> - </mediaobject> - </figure> - - <para>The executable <filename>xml2xml</filename> defined at the MI - department is actually a script wrapping the <productname - xlink:href="">Saxon XSLT - processor</productname>. We may also use the Eclipse/Oxygen plugin - replacing the shell command by a GUI <link - xlink:href="">as - described in the corresponding documentation</link>. Next we - take a closer look at the <abbrev - xlink:href="">XSL</abbrev> example - code:</para> - - <programlisting language="none"><xsl:stylesheet <co - xml:id="programlisting_helloxsl_stylesheet"/> xmlns:xsl <co - xml:id="programlisting_helloxsl_namespace_abbv"/> ="" - version="2.0" <co xml:id="programlisting_helloxsl_xsl_version"/> > - - <xsl:output method="text" <co - xml:id="programlisting_helloxsl_method_text"/>/> - - <xsl:template <co xml:id="programlisting_helloxsl_template"/> match <co - xml:id="programlisting_helloxsl_match"/> ="/memo"> - <xsl:value-of <co xml:id="programlisting_helloxsl_value-of"/> select <co - xml:base="" xml:id="programlisting_helloxsl_valueof_select_att"/> ="from" /> - </xsl:template> - -</xsl:stylesheet></programlisting> - - <calloutlist> - <callout arearefs="programlisting_helloxsl_stylesheet"> - <para>The element stylesheet belongs to the namespace - <code></code>. 
This namespace is - <emphasis>represented</emphasis> by the literal - <literal>xsl</literal>. As an alternative we might also use <tag - class="starttag">stylesheet - xmlns=""</tag> instead of <tag - class="starttag">xsl:stylesheet ...</tag>. The value of the - namespace itself gets defined next.</para> - </callout> - - <callout arearefs="programlisting_helloxsl_namespace_abbv"> - <para>The keyword <code>xmlns</code> is reserved by the <link - xlink:href="">Namespaces in - XML</link> specification. In <quote>pure</quote> XML the whole term - <code>xmlns:xsl</code> would simply define an attribute. In presence - of a namespace aware XML parser however the literal - <literal>xsl</literal> represents the attribute value <tag - class="attvalue"></tag>. This - value <emphasis>must not</emphasis> be changed! Otherwise a XSL - converter will fail since it cannot distinguish processing - instructions from other XML elements. An element <tag - class="starttag">stylesheet</tag> belonging to a different namespace - <code>http//</code> may have to be - generated.</para> - </callout> - - <callout arearefs="programlisting_helloxsl_xsl_version"> - <para>The <link xlink:href="">XSL - standard</link> is still evolving. The version number identifies the - conformance level for the subsequent code.</para> - </callout> - - <callout arearefs="programlisting_helloxsl_method_text"> - <para>The <tag class="attribute">method</tag> attribute in the <link - xlink:href=""><xsl:output></link> - element specifies the type of output to be generated. Depending on - this type we may also define indentation depths and/or encoding. 
- Allowed <tag class="attvalue">method</tag> values are:</para> - - <glosslist> - <glossentry> - <glossterm>text</glossterm> - - <glossdef> - <para>Ordinary text.</para> - </glossdef> - </glossentry> - - <glossentry> - <glossterm>html</glossterm> - - <glossdef> - <para><link - xlink:href="">HTML</link> - markup.</para> - </glossdef> - </glossentry> - - <glossentry> - <glossterm>xhtml</glossterm> - - <glossdef> - <para><link - xlink:href="">Xhtml</link> markup - differing from the former by e.g. the closing - <quote>/></quote> in <tag><img - src="..."/></tag>.</para> - </glossdef> - </glossentry> - - <glossentry> - <glossterm>xml</glossterm> - - <glossdef> - <para>XML code. This is most commonly used to create views on - or different dialects of a XML document instance.</para> - </glossdef> - </glossentry> - </glosslist> - </callout> - - <callout arearefs="programlisting_helloxsl_template"> - <para>A <tag class="starttag">xsl:template</tag> defines the output - that will be created for document nodes being defined by a - selector.</para> - </callout> - - <callout arearefs="programlisting_helloxsl_match"> - <para>The attribute <tag class="attribute">match</tag> tells us for - which nodes of a document instance the given <tag - class="starttag">xsl:template</tag> is appropriate. In the given - example the value <code>/memo</code> tells us that the template is - only responsible for <tag class="element">memo</tag> nodes appearing - at top level i.e. being the root element of the document - instance.</para> - </callout> - - <callout arch="" - arearefs="programlisting_helloxsl_value-of programlisting_helloxsl_valueof_select_att"> - <para>A <tag class="element">value-of</tag> element writes content - to the <abbrev xlink:href="">XSL</abbrev> - process' output. 
In this example the <code>#PCDATA</code> content - from the element <tag class="element">from</tag> will be written to - the output.</para> - </callout> - </calloutlist> - </section> - - <section xml:id="xpath"> - <title><link xlink:href="">XPath</link> and - node sets</title> - - <para>The <acronym - xlink:href="">XPath</acronym> standard allows - us to retrieve node sets from XML documents by predicate based queries. - Thus its role may be compared to <acronym - xlink:href="">SQL</acronym> - <code>SELECT</code> ... <code>FROM</code> ...<code>WHERE</code> queries. - Some simple examples:</para> - - <figure xml:id="fig_Xpath"> - <title>Simple <acronym - xlink:href="">XPath</acronym> - queries</title> - - <mediaobject> - <imageobject> - <imagedata fileref="Ref/Fig/xpath.fig" scale="65"/> - </imageobject> - </mediaobject> - </figure> - - <para>We are now interested in a list of all recipients being defined in - a <tag class="element">memo</tag> element. We introduce the element <tag - class="element">xsl:for-each</tag> which iterates over a result set of - nodes:</para> - - <figure xml:id="programlisting_tolist_xpath"> - <title>Iterating over the list of recipient nodes.</title> - - <programlisting language="none"><?xml version="1.0" encoding="UTF-8"?> - -<xsl:stylesheet xmlns:xsl="" - version="2.0"> - - <xsl:output method="text"/> - - <xsl:template match="/" <co xml:id="programlisting_tolist_match_root"/>> - <xsl:for-each select="memo/to" <co - xml:id="programlisting_tolist_xpath_memo_to"/> > - <xsl:value-of select="." 
<co xml:id="programlisting_tolist_value_of"/> /> - <xsl:text>,</xsl:text> <co - xml:id="programlisting_tolist_xsl_text"/> - </xsl:for-each> - </xsl:template> - -</xsl:stylesheet></programlisting> - </figure> - - <calloutlist> - <callout arearefs="programlisting_tolist_match_root"> - <para>This template matches the XML document instance, - <emphasis>not</emphasis> the visible <tag - class="element"><memo></tag> node.</para> - </callout> - - <callout arearefs="programlisting_tolist_xpath_memo_to"> - <para>The <link xlink:href="">XPath</link> - expression <tag class="attvalue">memo/to</tag> gets evaluated - starting from the invisible top level document node being the - context node. For the given document instance this will define a - result set containing both <tag class="element"><to></tag> - recipient nodes, see <xref - linkend="figure_memo_xpath_memo_to"/>.</para> - </callout> - - <callout arearefs="programlisting_tolist_value_of"> - <para>The dot <quote>.</quote> represents the <code>#PCDATA</code> - content of the current <tag class="element">to</tag> element.</para> - </callout> - - <callout arearefs="programlisting_tolist_xsl_text"> - <para>A comma is appended. This is not quite correct since it should - be absent for the last element.</para> - </callout> - </calloutlist> - - <figure xml:id="figure_recipientlist_trailing_comma"> - <title>A list of recipients.</title> - - <para>The <abbrev - xlink:href="">XSL</abbrev> presented before - yields:</para> - - <programlisting language="none"><computeroutput>Adam Hacker,Eve Intruder</computeroutput><emphasis - role="bold">,</emphasis></programlisting> - </figure> - - <para>Right now we do not bother about the trailing <quote>,</quote> - after the last recipient. The surrounding - <code><xsl:text></code>,<code></xsl:text></code> elements - <emphasis>may</emphasis> be omitted. We encourage the reader to leave - them in place since they increase readability when a template's body - gets more complex. 
The element <tag class="starttag">xsl:text</tag> is - used to append static text to the output. This way we append a separator - after each recipient. We now discuss the role of the two attributes <tag - class="attribute">match="/"</tag> and <tag - class="attribute">select="memo/to"</tag>. Both are examples of so-called - <link xlink:href="">XPath</link> expressions. - They allow us to define <emphasis>node sets</emphasis>, i.e. subsets of - the set of all nodes of a given document instance.</para> - - <para>Conceptually <link - xlink:href="">XPath</link> expressions may be - compared to the <acronym - xlink:href="">SQL</acronym> language, the - latter allowing the retrieval of data<emphasis>sets</emphasis> from a - relational database. We illustrate the current example by a - figure:</para> - - <figure xml:id="figure_memo_xpath_memo_to"> - <title>Selecting node sets from <tag class="element">memo</tag> - document instances</title> - - <mediaobject> - <imageobject> - <imagedata fileref="Ref/Fig/memoxpath.fig"/> - </imageobject> - </mediaobject> - </figure> - - <para>This figure needs some explanation. We observe an additional node - <quote>above</quote> <tag class="starttag">memo</tag> being represented - as <quote>filled</quote>. This node represents the document instance as - a whole and has <tag>memo</tag> as its only child. We will - rediscover this additional root node when we discuss the <abbrev - xlink:href="">DOM</abbrev> - application programming interface.</para> - - <para>As already mentioned the expression <code>memo/to</code> evaluates - to a <emphasis>set</emphasis> of nodes. In our example this set consists - of two nodes of type <tag class="starttag">to</tag>, each of them - representing a recipient of the memo. 
We observe a subtle difference - between the two <abbrev - xlink:href="">XPath</abbrev> - expressions:</para> - - <glosslist> - <glossentry> - <glossterm><code>match="/"</code></glossterm> - - <glossdef> - <para>The expression starts with and actually consists of the string - <quote>/</quote>. Thus it can be called an - <emphasis>absolute</emphasis> <abbrev - xlink:href="">XPath</abbrev> expression. - Like a file specification <filename>C:\dos\myprog.exe</filename> - it starts on top level and needs no further context information to - get evaluated.</para> - - <para>An <abbrev - xlink:href="">XSL</abbrev> style sheet - <emphasis>must</emphasis> have an <link - xlink:href="">initial - context node</link> to start the transformation. This is achieved - by providing exactly one <tag class="starttag">xsl:template</tag> - with an absolute <abbrev - xlink:href="">XPath</abbrev> value for - its <tag class="attribute">match</tag> attribute like <tag - class="attvalue">/memo</tag>.</para> - </glossdef> - </glossentry> - - <glossentry> - <glossterm><code>select="memo/to"</code></glossterm> - - <glossdef> - <para>This expression can be compared to a - <emphasis>relative</emphasis> file path specification like - <filename>../images/hdm.gif</filename>. We need to add the base - (context) directory in order for a relative file specification to - become meaningful. If the base directory is - <filename>/home/goik/xml</filename> then this - <emphasis>relative</emphasis> file specification will address the - file <filename>/home/goik/images/hdm.gif</filename>.</para> - - <para>Likewise we have to define a <emphasis>context</emphasis> - node if we want to evaluate a relative <abbrev - xlink:href="">XPath</abbrev> expression. - In our example this is the root node. 
The XSL specification - introduces the term <link - xlink:href="">evaluation - context</link> for this purpose.</para> - </glossdef> - </glossentry> - </glosslist> - - <para>In order to explain relative <abbrev - xlink:href="">XPath</abbrev> expressions we - consider <code>content/para</code> starting from the (unique!) <tag - class="element">memo</tag> node:</para> - - <figure xml:id="memoXpathPara"> - <title>The node set represented by <code>content/para</code> starting - at the context node <tag class="starttag">memo</tag>.</title> - - <mediaobject> - <imageobject> - <imagedata fileref="Ref/Fig/memorelativexpath.fig"/> - </imageobject> - - <caption> - <para>The dashed lines represent the relative <abbrev - xlink:href="">XPath</abbrev> expressions - starting from the context node to each of the nodes in the result - set.</para> - </caption> - </mediaobject> - </figure> - </section> - - <section xml:id="xsl_important_elements"> - <title>Some important <abbrev - xlink:href="">XSL</abbrev> elements</title> - - <section xml:id="xsl_if"> - <title><tag class="starttag">xsl:if</tag></title> - - <para>Sometimes we need conditional processing rules. We might want - to create a list of the sender and those recipients having a defined value for the - attribute <tag class="attribute">id</tag>. In the <link - linkend="figure_memoref_instance">given example</link> this is only - true for the (unique) sender and the recipient <code><to - id="eve">Eve Intruder</to></code>. We assume this set of - persons shall be inserted into a relational database table - <code>Customer</code> consisting of two <code>NOT NULL</code> columns - <code>id</code> and <code>name</code>. 
Thus both attributes - <emphasis>must</emphasis> be specified and we must exclude <tag - class="starttag">from</tag> or <tag class="starttag">to</tag> nodes - with undefined <tag class="attribute">id</tag> attributes:</para> - - <figure xml:id="programlisting_memo_export_sql"> - <title>Exporting SQL statements.</title> - - <programlisting language="none">... -<xsl:variable name="newline" <co xml:id="programlisting_xsl_if_definevar"/>> <!-- A newline \n --> - <xsl:text> -</xsl:text> -</xsl:variable> - -<xsl:template match="/memo"> - <xsl:for-each select="from|to" <co xml:id="programlisting_xsl_if_foreach"/>> - <xsl:if <emphasis role="bold">test="@id"</emphasis> <co - xml:id="programlisting_xsl_if_test"/>> - <xsl:text>INSERT INTO Customer (id, name) VALUES ('</xsl:text> - <xsl:value-of select="@id" <co - xml:id="programlisting_xsl_if_select_idattrib"/>/> - <xsl:text>', '</xsl:text> - <xsl:value-of select="." <co - xml:id="programlisting_xsl_if_selectcontent"/>/> - <xsl:text>')</xsl:text> - <xsl:value-of select="$newline" <co - xml:id="programlisting_xsl_if_usevar"/>/> - </xsl:if> - </xsl:for-each> -</xsl:template></programlisting> - - <caption> - <para>We want to export data from XML documents to a database - server. For this purpose INSERT statements are being crafted from - a XML document containing relevant data.</para> - </caption> - </figure> - - <calloutlist> - <callout arearefs="programlisting_xsl_if_definevar"> - <para>Define a file local variable <code>newline</code>. Dealing - with text output frequently requires the insertion of newlines. 
- Due to the syntax of the <tag class="element">xsl:text</tag> - elements this tends to clutter the code.</para> - </callout> - - <callout arearefs="programlisting_xsl_if_foreach"> - <para>Iterate over the set of the sender node and all recipient - nodes.</para> - </callout> - - <callout arearefs="programlisting_xsl_if_test"> - <para>The attribute value of <tag class="attribute">test</tag> - will be <link - xlink:href="">evaluated</link> - as a boolean. In this example it evaluates to <code>true</code> - if and only if the attribute <tag class="attribute">id</tag> is defined for - the context node. Since we are inside the <tag - class="element">xsl:for-each</tag> block all context nodes are - either of type <tag class="starttag">from</tag> or <tag - class="starttag">to</tag> and thus <emphasis>may</emphasis> have - an <tag class="attribute">id</tag> attribute.</para> - </callout> - - <callout arearefs="programlisting_xsl_if_select_idattrib"> - <para>The <tag class="attribute">id</tag> attribute's value is - copied to the output. The <quote>@</quote> character in - <code>select="@id"</code> tells the <abbrev - xlink:href="">XSL</abbrev> processor to - read the value of an <emphasis>attribute</emphasis> with name <tag - class="attribute">id</tag> rather than the content of a nested - sub<emphasis>element</emphasis> like in <code><to - id="foo"><id>I am - nested!</id></to></code>.</para> - </callout> - - <callout arearefs="programlisting_xsl_if_selectcontent"> - <para>As stated earlier the dot <quote>.</quote> denotes the - current context element. 
In this example simply the - <code>#PCDATA</code> content is copied to the output.</para> - </callout> - - <callout arearefs="programlisting_xsl_if_usevar"> - <para>The <quote>$</quote> sign in front of <code>newline</code> - tells the <abbrev - xlink:href="">XSL</abbrev> processor to - access the variable <varname>newline</varname> previously defined - in <coref linkend="programlisting_xsl_if_definevar"/> rather than - interpreting it as the name of a sub element or an - attribute.</para> - </callout> - </calloutlist> - - <para>As expected the recipient entry <quote>Adam Hacker</quote> does - not appear due to the fact that no <tag class="attribute">id</tag> - attribute is defined in its <tag class="starttag">to</tag> - element:</para> - - <programlisting language="none"><computeroutput>INSERT INTO Customer (id, name) VALUES ('goik', 'Martin Goik') -INSERT INTO Customer (id, name) VALUES ('eve', 'Eve Intruder')</computeroutput></programlisting> - - <qandaset defaultlabel="qanda" xml:id="example_position_last"> - <title>The XPath functions position() and last()</title> - - <qandadiv> - <qandaentry> - <question> - <para>We return to our recipient list in <xref - linkend="figure_recipientlist_trailing_comma"/>. We are - interested in a list of recipients avoiding the trailing - comma:</para> - - <programlisting language="none"><computeroutput>Adam Hacker,Eve Intruder</computeroutput></programlisting> - - <para>We may use an <tag class="element">xsl:if</tag> to insert - a comma for all but the very last recipient node. This can be - achieved by using the <abbrev - xlink:href="">XSL</abbrev> - functions <link - xlink:href="">position()</link> - and <link - xlink:href="">last()</link>. - Hint: The comparison operator <quote><</quote> may be used - in <abbrev - xlink:href="">XSL</abbrev> to - compare two integer numbers. 
However it must be escaped as - <code>&lt;</code> in order to be XML compatible.</para> - </question> - - <answer> - <para>We have to exclude the comma for the last node of the - recipient list. If we have e.g. 10 recipients the function - <code>position()</code> will return integer values - starting at 1 and ending with 10, whereas <code>last()</code> - always returns 10. So for the last node the - comparison <code>10 < 10</code> will evaluate to - false:</para> - - <programlisting language="none"><xsl:for-each select="memo/to"> - <xsl:value-of select="."/> - <xsl:if test="position() &lt; last()"> - <xsl:text>,</xsl:text> - </xsl:if> -</xsl:for-each></programlisting> - </answer> - </qandaentry> - - <qandaentry xml:id="example_avoid_xsl_if"> - <question> - <label>Avoiding xsl:if</label> - - <para>In <xref linkend="programlisting_memo_export_sql"/> we - used the <abbrev - xlink:href="">XPath</abbrev> value - <quote>from|to</quote> to select the desired sender and - recipient nodes. Inside the <tag - class="element">xsl:for-each</tag> block we permitted only - those nodes which have an <tag class="attribute">id</tag> - attribute. 
These two steps may be combined into a single - <abbrev xlink:href="">XPath</abbrev> - expression, making the <tag - class="element">xsl:if</tag> obsolete.</para> - </question> - - <answer> - <para>We simply need a modified <abbrev - xlink:href="">XPath</abbrev> in the - <tag class="element">for-each</tag>:</para> - - <programlisting language="none"><xsl:for-each select="<emphasis - role="bold">from[@id]|to[@id]</emphasis>"> - <xsl:text>INSERT INTO Customer (id, name) VALUES ('</xsl:text> - <xsl:value-of select="@id"/> - <xsl:text>', '</xsl:text> - <xsl:value-of select="."/> - <xsl:text>')</xsl:text> - <xsl:value-of select="$newline"/> -</xsl:for-each></programlisting> - </answer> - </qandaentry> - </qandadiv> - </qandaset> - </section> - - <section xml:id="xsl_apply_templates"> - <title><tag class="starttag">xsl:apply-templates</tag></title> - - <para>We already used <tag class="element">xsl:for-each</tag> to - iterate over a list of element nodes. <abbrev - xlink:href="">XSL</abbrev> offers a - different possibility for this purpose. The idea is to define the - formatting rules at a centralized location. The solution to <xref - linkend="example_position_last"/> can thus be written in an equivalent way:</para> - - <programlisting language="none"><xsl:template match="/"> - <xsl:apply-templates select="memo/to" <co - xml:id="programlisting_apply_templates_apply"/>/> -</xsl:template> - -<xsl:template match="to" <co xml:id="programlisting_apply_templates_match"/>> - <xsl:value-of select="."/> - <xsl:if test="<emphasis role="bold">position()</emphasis> &lt; <emphasis - role="bold">last()</emphasis>"> - <xsl:text>,</xsl:text> - </xsl:if> -</xsl:template></programlisting> - - <calloutlist> - <callout arearefs="programlisting_apply_templates_apply"> - <para>Definition of the recipient node list. 
Each element of this - list shall be processed further.</para> - </callout> - - <callout arearefs="programlisting_apply_templates_match"> - <para>This template <emphasis>may</emphasis> be used by an XSL - processor to format nodes of type <tag class="starttag">to</tag>. - Since the processor is asked to do exactly this in <xref - linkend="programlisting_apply_templates_apply"/> the current - template will <emphasis>really</emphasis> be used in this - example.</para> - </callout> - </calloutlist> - - <para>The procedure outlined above may have the following - advantages:</para> - - <itemizedlist> - <listitem> - <para>Some elements may appear at different places of a given - document hierarchy. For example a <tag - class="starttag">title</tag> element is likely to appear as a - child of chapters, sections, tables, figures and so on. It may be - sufficient to define a single template with a - <code>match="title"</code> attribute which contains all rules - being required.</para> - </listitem> - - <listitem> - <para>Sometimes the body of a <tag - class="starttag">xsl:for-each</tag> ... <tag - class="endtag">xsl:for-each</tag> spans multiple screens thus - limiting code readability. Factoring out the body into a template - may avoid this obstacle.</para> - </listitem> - </itemizedlist> - - <para>This method is well known from programming languages: If the - code inside a loop is needed multiple times or reaches a painful line - count <emphasis>good</emphasis> programmers tend to define a separate - method. For example:</para> - - <programlisting language="none">for (int i = 0; i < 10; i++){ - if (a[i] < b[i]){ - max[i] = b[i]; - } else { - max[i] = a[i]; - } - ... -}</programlisting> - - <para>Inside the loop's body the maximum of two - values gets computed. This may be needed at several locations and - thus it is convenient to centralize this code into a method:</para> - - <programlisting language="none">// cf. 
<xsl:template match="..."> -static int maximum(int a, int b){ - if (a < b){ - return b; - } else { - return a; - } -} -... -// cf. <xsl:apply-templates select="..."/> -for (int i = 0; i < 10; i++){ - max[i] = maximum(a[i], b[i]); -}</programlisting> - - <para>So far calling a static method in <link - linkend="gloss_Java"><trademark>Java</trademark></link> may be - compared to a <tag class="starttag">xsl:apply-templates</tag>. There - is however one big difference. In <abbrev - xlink:href="">XSL</abbrev> the - <quote>method</quote> being called may not exist at all. A <tag - class="starttag">xsl:apply-templates</tag> instructs a processor to - format a set of nodes. It does not contain information about any rules - being defined to do this job:</para> - - <programlisting language="none"><xsl:stylesheet xmlns:xsl="" - version="2.0"> - - <xsl:output method="text"/> - - <xsl:template match="/memo"> - <xsl:apply-templates <emphasis role="bold">select="content"</emphasis>/> - </xsl:template> - -</xsl:stylesheet></programlisting> - - <para>Since no suitable template supplying rules for <tag - class="starttag">content</tag> nodes exists a <abbrev - xlink:href="">XSL</abbrev> processor uses a - default formatting rule instead:</para> - - <programlisting language="none"><computeroutput>Thanks for your excellent work.Our firewall is definitely -broken! This bug has been reported by the sender.</computeroutput></programlisting> - - <para>We observe that the <code>#PCDATA</code> content strings of the - element itself and all (recursive) sub elements get glued together - into one string. In most cases this is definitely not intended. - Omitting a necessary template is usually a programming error. 
It is - thus good programming practice during style sheet development to - define a special template catching forgotten rules:</para> - - <programlisting language="none"><xsl:template match="/memo"> - <xsl:apply-templates select="content"/> -</xsl:template> - -<xsl:template match="*"> - <xsl:message> - <xsl:text>Error: No template defined matching element '</xsl:text> - <xsl:value-of select="name(.)"/> - <xsl:text>'</xsl:text> - </xsl:message> -</xsl:template></programlisting> - - <para>The <quote>*</quote> matches any element if there is no <link - xlink:href="">better - matching</link> rule defined. Since we did not supply any template for - <tag class="starttag">content</tag> nodes at all this default template - will match nodes of type <tag class="starttag">content</tag>. The - function <code>name()</code> is predefined in <abbrev - xlink:href="">XSL</abbrev> and returns the - element type name of a node. During the formatting process we will now - see the following warning message:</para> - - <programlisting language="none"><computeroutput>Error: No template defined matching element 'content'</computeroutput></programlisting> - - <para>We note that for document nodes <tag - class="starttag">xyz</tag><code>foo</code><tag - class="endtag">xyz</tag> containing only <code>#PCDATA</code> a simple - <tag class="emptytag">xsl:apply-templates select="xyz"</tag> is - sufficient: A <abbrev - xlink:href="">XSL</abbrev> processor uses - its default rule and copies the node's content <code>foo</code> to its - output.</para> - - <qandaset defaultlabel="qanda" xml:id="example_rdbms_person"> - <title>Extending the export to a RDBMS</title> - - <qandadiv> - <qandaentry> - <question> - <para>We assume that our RDBMS table <code>Customer</code> - from <xref linkend="programlisting_memo_export_sql"/> shall be - replaced by a table <code>Person</code>. We expect the senders - of memo documents to be employees of a given company. 
- Conversely the recipients of memos are expected to be - customers. Our <code>Person</code> table shall have a - <quote>tag</quote> like column named <code>type</code> having - exactly two allowed values <code>customer</code> or - <code>employee</code> being controlled by a <code>CHECK</code> - constraint, see <xref linkend="table_person"/>. Create a style - sheet generating the necessary SQL statements from a memo - document instance. Hint: Define two different templates for - <tag class="starttag">from</tag> and <tag - class="starttag">to</tag> nodes.</para> - </question> - - <answer> - <para>We define two templates differing only in the static - string value for a person's type. The relevant <abbrev - xlink:href="">XSL</abbrev> portion - reads:<programlisting language="none"><xsl:template match="/memo"> - <xsl:apply-templates select="from|to"/> -</xsl:template> - -<xsl:template match="from"> - <xsl:text>INSERT INTO Person (name, type) VALUES('</xsl:text> - <xsl:value-of select="."/> - <xsl:text>', <emphasis role="bold">'employee'</emphasis>)</xsl:text> - <xsl:value-of select="$newline"/> -</xsl:template> - - <xsl:template match="to"> - <xsl:text>INSERT INTO Person (name, type) VALUES('</xsl:text> - <xsl:value-of select="."/> - <xsl:text>', <emphasis role="bold">'customer'</emphasis>)</xsl:text> - <xsl:value-of select="$newline"/> -</xsl:template></programlisting></para> - </answer> - </qandaentry> - </qandadiv> - </qandaset> - - <table xml:id="table_person"> - <title>The Person table</title> - - <?dbhtml table-width="30%" ?> - - <?dbfo table-width="40%" ?> - - <tgroup cols="2"> - <colspec colwidth="3*"/> - - <colspec colwidth="2*"/> - - <thead> - <row> - <entry>name</entry> - - <entry>type</entry> - </row> - </thead> - - <tbody> - <row> - <entry>Martin Goik</entry> - - <entry>employee</entry> - </row> - - <row> - <entry>Adam Hacker</entry> - - <entry>customer</entry> - </row> - - <row> - <entry>Eve intruder</entry> - - <entry>customer</entry> - </row> - 
</tbody> - </tgroup> - </table> - </section> - - <section xml:id="xsl_choose"> - <title><tag class="starttag">xsl:choose</tag></title> - - <para>We already described the <tag class="starttag">xsl:if</tag> - which can be compared to an <code>if(..){...}</code> statement in many - programming languages. The <tag class="starttag">xsl:choose</tag> - element can be compared to a chain of <code>else if</code> conditions - including an optional final <code>else</code> block being reached if - all boolean tests fail:</para> - - <programlisting language="none">if (condition a){ -... //block a -} else if (condition b){ -... //block b -} ... -... -else { - ... //code being reached when all conditions evaluate to false -}</programlisting> - - <para>We want to generate a list of memo recipient names numbered by - Roman numerals up to 10. Higher numbers shall be displayed in - ordinary decimal notation:</para> - - <programlisting language="none"><computeroutput>I:Adam Hacker -II:Eve intruder -III: ... -IV: ... 
-...</computeroutput></programlisting> - - <para>Though <abbrev - xlink:href="">XSL</abbrev> offers <link - xlink:href="">a better way</link> - we may generate these number literals by:</para> - - <programlisting language="none"><xsl:template match="/memo"> - <xsl:apply-templates select="to"/> -</xsl:template> - -<xsl:template match="to"> - <xsl:choose> - <xsl:when test="1 = position()">I</xsl:when> - <xsl:when test="2 = position()">II</xsl:when> - <xsl:when test="3 = position()">III</xsl:when> - <xsl:when test="4 = position()">IV</xsl:when> - <xsl:when test="5 = position()">V</xsl:when> - <xsl:when test="6 = position()">VI</xsl:when> - <xsl:when test="7 = position()">VII</xsl:when> - <xsl:when test="8 = position()">VIII</xsl:when> - <xsl:when test="9 = position()">IX</xsl:when> - <xsl:when test="10 = position()">X</xsl:when> - <xsl:otherwise> - <xsl:value-of select="position()"/> - </xsl:otherwise> - </xsl:choose> - - <xsl:text>:</xsl:text> - <xsl:value-of select="."/> - <xsl:value-of select="$newline"/> -</xsl:template></programlisting> - - <para>Note that this conversion is incomplete: If the number in - question is larger than 10 it will be formatted in ordinary decimal - style according to the <tag class="starttag">xsl:otherwise</tag> - clause.</para> - </section> - - <section xml:id="section_html_book"> - <title>A complete HTML formatting example</title> - - <para>We now present a series of exercises showing how to format <tag - class="starttag">book</tag> document instances to XHTML. 
This is done - in a step by step manner each time showing correspondent code snippets - for our <filename>memo.xsd</filename>.</para> - - <section xml:id="section_memo_to_list"> - <title>Listing the recipients of a memo</title> - - <para>In order to generate a XHTML <link - xlink:href="">list</link> - of all <tag class="starttag">memo</tag> recipients of a memo we have - to use <tag class="starttag">xsl:output method="xhtml"</tag> and - embed the required HTML tags in our <abbrev - xlink:href="">XSL</abbrev> style - sheet:</para> - - <programlisting language="none"><xsl:output method="xhtml" indent="yes"/> - -<xsl:template match="/memo"> - <html> - <head> - <title>Recipient list</title> - </head> - <body> - <ul> - <xsl:apply-templates select="to"/> - </ul> - </body> - </html> -</xsl:template> - -<xsl:template match="to"> - <li> - <xsl:value-of select="."/> - </li> -</xsl:template></programlisting> - - <para>Processing this style sheet for a <tag - class="starttag">memo</tag> document instance yields:</para> - - <programlisting language="none"><?xml version="1.0" encoding="UTF-8"?> -<html> - <head> - <title>Recipient list</title> - </head> - <body> - <ul> - <li>Adam Hacker</li> - <li>Eve intruder</li> - </ul> - </body> -</html></programlisting> - - <para>The generated Xhtml code does not contain a reference to a - DTD. 
We may supply this reference by modifying our <tag - class="emptytag">xsl:output</tag> directive:</para> - - <programlisting language="none"><xsl:output method="xhtml" indent="yes" - <emphasis role="bold">doctype-public</emphasis>="-//W3C//DTD XHTML 1.0 Strict//EN" - <emphasis role="bold">doctype-system</emphasis>=""/></programlisting> - - <para>This adds a corresponding header which allows us to validate the - generated HTML:</para> - - <programlisting language="none"><!DOCTYPE html - PUBLIC "<emphasis role="bold">-//W3C//DTD XHTML 1.0 Strict//EN</emphasis>" - "<emphasis role="bold"></emphasis>"> -<html><head> ...</programlisting> - - <para>This may be improved further by instructing the XSL formatter - to use <uri - xlink:href=""></uri> - as default namespace:</para> - - <programlisting language="none"><?xml version="1.0" encoding="UTF-8"?> -<xsl:stylesheet <emphasis role="bold">xmlns=""</emphasis> - xmlns:xsl="" version="2.0"> - -<xsl:output method="xhtml" indent="yes" - doctype-public="-//W3C//DTD XHTML 1.0 Strict//EN" - doctype-system=""/> - - <xsl:template match="/"> - <html><head> ... - </xsl:template> -... -</xsl:stylesheet></programlisting> - - <para>This yields the following output:</para> - - <programlisting language="none"><?xml version="1.0" encoding="UTF-8"?> -<!DOCTYPE html - PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" - ""> - -<html <emphasis role="bold">xmlns=""</emphasis>> - <head> ... -</html></programlisting> - - <para>The top level element <tag class="element">html</tag> is now - declared to belong to the namespace - <code>xmlns="</code>. 
This will be - inherited by all inner Xhtml elements.</para> - - <qandaset defaultlabel="qanda" xml:id="example_xsl_book_1_dtd"> - <title>Transforming book instances to Xhtml</title> - - <qandadiv> - <qandaentry> - <question> - <para>Create a <abbrev - xlink:href="">XSL</abbrev> style - sheet to transform instances of the first version of <link - endterm="example_bookDtd" - linkend="example_bookDtd">book.xsd</link> (<xref - linkend="example_bookDtd"/>) into <uri - xlink:href="">Xhtml - 1.0 strict</uri>.</para> - - <para>You should first construct a Xhtml document - <emphasis>manually</emphasis> before coding the XSL. After - you have a <quote>working</quote> Xhtml example document - create a <abbrev - xlink:href="">XSL</abbrev> style - sheet which transforms arbitrary - <filename>book.xsd</filename> document instances into a - corresponding Xhtml file.</para> - </question> - - <answer> - <programlisting language="none"><?xml version="1.0" encoding="UTF-8"?> -<xsl:stylesheet xmlns:xsl="" version="2.0"> - - <xsl:output indent="yes" method="xhtml"/> - - <xsl:template match="/book"> - <html> - <head> - <title><xsl:value-of select="title"/></title> - </head> - <body> - <h1><xsl:value-of select="title"/></h1> - <xsl:apply-templates select="chapter"/> - </body> - </html> - </xsl:template> - - <xsl:template match="chapter"> - <h2><xsl:value-of select="title"/></h2> - <xsl:apply-templates select="para"/> - </xsl:template> - - <xsl:template match="para"> - <p><xsl:value-of select="."/></p> - </xsl:template> - -</xsl:stylesheet></programlisting> - </answer> - </qandaentry> - </qandadiv> - </qandaset> - </section> - - <section xml:id="section_xsl_attribute"> - <title><tag class="starttag">xsl:attribute</tag></title> - - <para>Sometimes we want to set attribute values in a generated XML - document. 
For example we might want to set the background color - <quote>red</quote> if a memo has a priority value of <tag - class="attvalue">high</tag>:</para> - - <programlisting language="none"><h1 style="background:red">Firewall problems</h1></programlisting> - - <para>Regarding our memo example this may be achieved by:</para> - - <programlisting language="none"><xsl:template match="/memo"> - <html> - ... - <body> - <xsl:variable name="<emphasis role="bold">messageColor</emphasis>" <co - xml:id="programlisting_priority_lolor_vardef"/>> - <xsl:choose> - <xsl:when test="@priority = 'low'">green</xsl:when> - <xsl:when test="@priority = 'medium'">yellow</xsl:when> - <xsl:when test="@priority = 'high'">red</xsl:when> - </xsl:choose> - </xsl:variable> - <h1 style="background:{<emphasis role="bold">$messageColor</emphasis>};" <co - xml:id="programlisting_priority_lolor_usevar"/>> - <xsl:value-of select="subject"/> - </h1> - </body> - </html> -</xsl:template></programlisting> - - <calloutlist> - <callout arearefs="programlisting_priority_lolor_vardef"> - <para>Definition of a color name depending on the attribute <tag - class="attvalue">priority</tag>'s value. The set of possible - attribute values (low, medium, high) is mapped to the color names - (green, yellow, red).</para> - </callout> - - <callout arearefs="programlisting_priority_lolor_usevar"> - <para>The color variable is used to compose the attribute <tag - class="attribute">style</tag>'s value. The curly - <code>{...}</code> braces are part of the <abbrev - xlink:href="">XSL</abbrev> standard's - syntax. They are required here to instruct the <abbrev - xlink:href="">XSL</abbrev> processor - to substitute the local variable <code>messageColor</code>'s - value instead of simply copying the literal string - <quote><code>$messageColor</code></quote> itself to the output - document e.g. 
generating <tag class="starttag">h1 style = - "background:$messageColor;"</tag>.</para> - </callout> - </calloutlist> - - <para>Instead of constructing an extra variable <abbrev - xlink:href="">XSL</abbrev> offers a - slightly more compact way for the same purpose. The <tag - class="starttag">xsl:attribute</tag> element allows us to define the - name of an attribute to be added together with an attribute value - specification:</para> - - <programlisting language="none"><xsl:template match="/memo"> - <html> - ... - <h1> - <xsl:attribute name="<emphasis role="bold">style</emphasis>"> - <xsl:text>background:</xsl:text> - <xsl:choose> - <xsl:when test="@priority = 'low'">green</xsl:when> - <xsl:when test="@priority = 'medium'">yellow</xsl:when> - <xsl:when test="@priority = 'high'">red</xsl:when> - </xsl:choose> - </xsl:attribute> - <xsl:value-of select="subject"/> - </h1> - </body> - </html> -</xsl:template></programlisting> - - <qandaset defaultlabel="qanda" xml:id="example_book_toc"> - <title>Adding a table of contents (toc)</title> - - <qandadiv> - <qandaentry> - <question> - <para>For larger document instances it is convenient to add - a table of contents to the generated Xhtml document. <!-- We - demonstrate the desired result as an <uri - xlink:href="src/viewlet/bookhtmltoc/bookhtmltoc_viewlet_swf.html">animation</uri>.--></para> - - <para>For this exercise you need a unique string value for - each <tag class="starttag">chapter</tag> node. If a <tag - class="starttag">chapter</tag>'s <tag - class="attribute">id</tag> attribute had been declared as - <code>#REQUIRED</code> its value would do this job - perfectly. Unfortunately you cannot rely on its existence - since it is declared to be <code>#IMPLIED</code> and may - thus be absent.</para> - - <para>XSL offers a standard function for this purpose namely - <link - xlink:href="">generate-id(...)</link>. 
In a nutshell this function takes an XML node as an argument - (or being called without arguments it uses the context node) - and creates a string value being unique with respect to - <emphasis>all</emphasis> other nodes in the document. For a - given node the function may be called repeatedly and is - guaranteed to always return the same value during the - <emphasis>same</emphasis> transformation run. So it suffices - to add something like <tag class="starttag">a - href="#{generate-id(...)}"</tag> or use it in conjunction - with <tag class="starttag">xsl:attribute</tag>.</para> - </question> - - <answer> - <para>We use the <code>generate-id()</code> function to - create a unique identity string for each chapter node. Since - we also want to define links to the table of contents we - need another unique string value. It is tempting to simply - use a static value like <quote>__toc__</quote> for this - purpose. However we cannot be sure that this value - does not coincide with one of the <code>generate-id()</code> - function return values.</para> - - <para>A cleaner solution uses the <tag - class="starttag">book</tag> node's generated identity string - for this purpose. As stated before this value is - definitely unique:</para> - - <programlisting language="none"><xsl:template match="/book"> -... 
- <body> - <h1><xsl:value-of select="title"/></h1> - <h2 id="{generate-id(.)}" <co xml:base="" - xml:id="programlisting_book_toc_def_toc"/>>Table of contents</h2> - <ul> - <xsl:for-each select="chapter"> - <li> - <a href="#{generate-id(.)}" <co xml:base="" - xml:id="programlisting_book_toc_ref_chap"/>><xsl:value-of select="title"></xsl:value-of></a> - </li> - </xsl:for-each> - </ul> - <xsl:apply-templates select="chapter"/> - </body> - </html> -</xsl:template> - -<xsl:template match="chapter"> - <h2 id="{generate-id(.)}" <co xml:base="" - xml:id="programlisting_book_toc_def_chap"/>> - <a href="#{generate-id(/book)}" <co xml:base="" - xml:id="programlisting_book_toc_ref_toc"/>> - <xsl:value-of select="title"/> - </a> - </h2> - <xsl:apply-templates select="para"/> -</xsl:template> -...</programlisting> - - <calloutlist> - <callout arearefs="programlisting_book_toc_def_toc"> - <para>The current context node is <tag - class="starttag">book</tag>. We use it as argument to - <code>generate-id()</code> to create a unique identity - string.</para> - </callout> - - <callout arearefs="programlisting_book_toc_ref_chap"> - <para>The <tag class="starttag">xsl:for-each</tag> - iterates over all <tag class="starttag">chapter</tag> - nodes. We reference the corresponding target nodes being - created in <xref - linkend="programlisting_book_toc_def_chap"/>.</para> - </callout> - - <callout arearefs="programlisting_book_toc_def_chap"> - <para>Each <tag class="starttag">chapter</tag>'s heading - is supplied with a unique identity string being - referenced from <xref - linkend="programlisting_book_toc_ref_chap"/>.</para> - </callout> - - <callout arearefs="programlisting_book_toc_ref_toc"> - <para>Clicking on a chapter's title shall take us back - to the table of contents (toc). 
So we create a hypertext - link referencing our toc heading's identity string being - defined in <xref - linkend="programlisting_book_toc_def_toc"/>.</para> - </callout> - </calloutlist> - </answer> - </qandaentry> - </qandadiv> - </qandaset> - </section> - - <section xml:id="section_xsl_mixed"> - <title>XSL and mixed content</title> - - <para>The subsequent example shows an element <tag - class="starttag">content</tag> having a mixed content model possibly - containing <tag class="starttag">url</tag> and <tag - class="starttag">emphasis</tag> child nodes:</para> - - <programlisting language="none"><content>The <emphasis - role="bold"><url href="">XML</url></emphasis> language - is <emphasis role="bold"><emphasis>easy</emphasis></emphasis> to learn. However you need - some <emphasis role="bold"><emphasis>time</emphasis></emphasis>.</content></programlisting> - - <para>Embedded element nodes have been set to bold style in order to - distinguish them from <code>xs:text</code> nodes. A possible - <acronym>XHtml</acronym> output might look like:</para> - - <programlisting language="none"><p>The <emphasis role="bold"><a href="">XML</a>language is<em>easy</em></emphasis> to learn. However you -need some <emphasis role="bold"><em>time</em></emphasis>.</p></programlisting> - - <para>We start with a first version of an <abbrev - xlink:href="">XSL</abbrev> - template:</para> - - <programlisting language="none"> <xsl:template match="content"> - <p> - <xsl:value-of select="."/> - </p> - </xsl:template></programlisting> - - <para>As mentioned earlier all <code>#PCDATA</code> text nodes of - the whole subtree are glued together leading to:</para> - - <programlisting language="none"><p>The XML language is easy to learn. However you need some time.</p></programlisting> - - <para>Our next attempt is to define templates to format the elements - <tag class="starttag">url</tag> and <tag - class="starttag">emphasis</tag>:</para> - - <programlisting language="none">... 
-<xsl:template match="content"> - <p> - <xsl:apply-templates select="emphasis|url"/> - </p> -</xsl:template> - -<xsl:template match="url"> - <a href="{@href}"><xsl:value-of select="."/></a> -</xsl:template> - -<xsl:template match="emphasis"> - <em><xsl:value-of select="."/></em> -</xsl:template> -...</programlisting> - - <para>As expected the sub elements are formatted correctly. - Unfortunately the <code>#PCDATA</code> text nodes between the - element nodes are lost:</para> - - <programlisting language="none"><p> - <a href="">XML</a> - <em>easy</em> - <em>time</em> -</p></programlisting> - - <para>To correct this transformation script we have to tell the - formatting processor to include bare text nodes into the output. The - <abbrev xlink:href="">XPath</abbrev> - standard defines the node test <link - xlink:href="">text()</link> - for this purpose. Used as a step in an expression it selects the - text node children of the context node:</para> - - <programlisting language="none">... -<xsl:template match="content"> - <p> - <xsl:apply-templates select="<emphasis role="bold">text()</emphasis>|emphasis|url"/> - </p> -</xsl:template> -...</programlisting> - - <para>This yields the desired output. The text node result elements - are shown in bold style:</para> - - <programlisting language="none"><p><emphasis role="bold">The</emphasis> <a href="">XML</a><emphasis - role="bold"> language is </emphasis><em>easy</em><emphasis - role="bold"> to learn. 
However -you need some </emphasis><em>time</em><emphasis role="bold">.</emphasis></p></programlisting> - - <para>Some remarks:</para> - - <orderedlist> - <listitem> - <para>The <abbrev - xlink:href="">XPath</abbrev> - expression <code>select="text()|emphasis|url"</code> corresponds - nicely to the schema's content model definition:</para> - - <programlisting language="none"><xs:element name="content"> - <xs:complexType <emphasis role="bold">mixed="true"</emphasis>> - <xs:choice minOccurs="0" maxOccurs="unbounded"> - <xs:element <emphasis role="bold">ref="emphasis"</emphasis>/> - <xs:element <emphasis role="bold">ref="url"</emphasis>/> - </xs:choice> - ... - </xs:complexType> -</xs:element></programlisting> - </listitem> - - <listitem> - <para>In most mixed content models <emphasis>all</emphasis> sub - elements of e.g. <tag class="starttag" role="">content</tag> - have to be formatted. During development some of the elements - defined in a schema are likely to be omitted by accident. For - this reason the <quote>typical</quote> <abbrev - xlink:href="">XPath</abbrev> - expression acting on mixed content models is written to match - <emphasis>any</emphasis> sub element node:</para> - - <programlisting language="none">select="text()|<emphasis - role="bold">*</emphasis>"</programlisting> - </listitem> - - <listitem> - <para>Regarding <code>select="text()|emphasis|url"</code> we - have defined two templates for element nodes <tag - class="starttag">emphasis</tag> and <tag - class="starttag">url</tag>. What happens to those text nodes - being matched by <code>text()</code>? These are subject to a - default rule: The content of bare text nodes is written to the - output. 
We may however redefine this default rule by adding a - template:</para> - - <programlisting language="none"><xsl:template match="text()"> - <emphasis role="bold"><span style="color:red"> - <xsl:value-of select="."/> - </span></emphasis> -</xsl:template></programlisting> - - <para>This yields:</para> - - <programlisting language="none"><p> - <emphasis role="bold"><span style="color:red">The </span></emphasis> - <a href="">XML</a> - <emphasis role="bold"><span style="color:red"> language is </span></emphasis> - <em>easy</em> - <emphasis role="bold"><span style="color:red"> to learn. However you need some </span></emphasis> - <em>time</em> - <emphasis role="bold"><span style="color:red">.</span></emphasis> -</p></programlisting> - - <para>In most cases it is not desired to replace all text nodes - throughout the whole document. In the current example we might - only format text nodes being <emphasis>immediate</emphasis> - children of <tag class="starttag">content</tag>. This may be - achieved by restricting the <abbrev - xlink:href="">XPath</abbrev> - expression to <tag class="starttag">xsl:template - match="content/text()"</tag>.</para> - </listitem> - </orderedlist> - </section> - - <section xml:id="section_xsl_functionid"> - <title>The function <code>id()</code></title> - - <para>In <abbrev - xlink:href="">XSL</abbrev> we sometimes - want to lookup nodes by an attribute value of type <link - xlink:href="???">ID</link>. We consider our product catalog from - <xref linkend="sectSchemaProductCatalog"/>. 
The following <abbrev - xlink:href="">XSL</abbrev> style sheet may be used to - create <acronym>XHtml</acronym> documents from <tag - class="starttag">catalog</tag> instances:</para> - - <programlisting language="none" xml:lang=""><xsl:template match="/catalog"> - <html> - <head><title>Product catalog</title></head> - <body> - <h1>List of Products</h1> - <xsl:apply-templates select="product"/> - </body> - </html> -</xsl:template> - -<xsl:template match="product"> - <h2 id="{@id}" <co xml:base="" - xml:id="programlisting_catalog2html_v1_defid"/>><xsl:value-of select="title"/></h2> - <xsl:apply-templates select="para"/> -</xsl:template> - -<xsl:template match="para"> - <p><xsl:apply-templates select="text()|*" <co - xml:id="programlisting_catalog2html_v1_mixed"/>/></p> -</xsl:template> - -<xsl:template match="link"> - <a href="#{@ref}" <co xml:id="programlisting_catalog2html_v1_refid"/>><xsl:value-of select="."/></a> -</xsl:template></programlisting> - - <calloutlist> - <callout arearefs="programlisting_catalog2html_v1_defid"> - <para>The <code>ID</code> attribute <tag - class="starttag">product id="foo"</tag> is unique within the - document instance. 
We may thus use it as a unique string value - in the generated Xhtml, too.</para> - </callout> - - <callout arearefs="programlisting_catalog2html_v1_mixed"> - <para>Mixed content consisting of text and <tag - class="starttag">link</tag> nodes.</para> - </callout> - - <callout arearefs="programlisting_catalog2html_v1_refid"> - <para>We define a file local Xhtml reference to a - product.</para> - </callout> - </calloutlist> - - <para>The <tag class="starttag">para</tag> element from the example - document instance containing a <tag class="starttag">link - ref="homeTrainer"</tag> reference will be formatted as:</para> - - <programlisting language="none"><p>If you hate rain look <a href="#homeTrainer">here</a>.</p></programlisting> - - <para>Now suppose we want to add the product's title <emphasis>Home - trainer</emphasis> here to give the reader an idea about the product - without clicking the hypertext link:</para> - - <programlisting language="none"><p>If you hate rain look <a href="#homeTrainer">here</a> <emphasis - role="bold">(Home trainer)</emphasis>.</p></programlisting> - - <para>This title text node is part of the <tag - class="starttag">product</tag> node being referenced from the current - <tag class="starttag">para</tag>:</para> - - <figure xml:id="linkIdrefProduct"> - <title>A graphical representation of our <tag - class="starttag">catalog</tag>.</title> - - <mediaobject> - <imageobject> - <imagedata fileref="Ref/Fig/xsl_id.fig"/> - </imageobject> - - <caption> - <para>The dashed line shows the <code>IDREF</code> based - reference from the <tag class="starttag">link</tag> to the - <tag class="starttag">product</tag> node.</para> - </caption> - </mediaobject> - </figure> - - <para>In <abbrev - xlink:href="">XSL</abbrev> we may follow - <code>ID</code> references by means of the built in function <link - xlink:href="">id(...)</link>:</para> - - <programlisting language="none"><xsl:template match="link"> - <a href="#{@ref}"><xsl:value-of select="."/></a> - 
<xsl:text> (</xsl:text> - <xsl:value-of select="<emphasis role="bold">id(@ref)</emphasis>/title" <co - xml:id="programlisting_xsl_id_follow"/>/> - <xsl:text>)</xsl:text> -</xsl:template></programlisting> - - <para>Evaluating <code>id(@ref)</code> at <xref - linkend="programlisting_xsl_id_follow"/> returns the referenced <tag - class="starttag">product</tag> <emphasis>node</emphasis>. We simply - take its <tag class="starttag">title</tag> value and embed it into a - pair of parentheses. This way the desired text portion <emphasis - role="bold">(Home trainer)</emphasis> gets added after the hypertext - link.</para> - - <qandaset defaultlabel="qanda" xml:id="example_book_xsl_mixed"> - <title>Extending the memo style sheet by mixed content and - itemized lists</title> - - <qandadiv> - <qandaentry> - <question> - <para>In <xref linkend="example_book.dtd_v5"/> we - constructed a schema allowing itemized lists and mixed - content for <tag class="starttag">book</tag> instances. This - schema also allowed us to define <tag - class="starttag">emphasis</tag>, <tag - class="starttag">table</tag> and <tag - class="starttag">link</tag> elements being part of a mixed - content definition. Extend the current book2html.xsl to - account for these extensions.</para> - - <para - xlink:href="">As - we already saw in our memo example itemized lists in Xhtml - are represented by the element <tag - class="starttag">ul</tag> containing <tag - class="starttag">li</tag> elements. Since <tag - class="starttag">p</tag> elements are also allowed to appear - as children our itemized lists can be easily mapped to Xhtml - tags. A <tag class="starttag">link</tag> node may be - transformed into an <tag class="starttag">a href="..."</tag> - Xhtml node.</para> - - <para>The table model is a simplified version of the Xhtml - table model. 
Read the <abbrev - xlink:href="">XSL</abbrev> - documentation of the element <tag - class="emptytag">xsl:copy-of</tag> at <link - xlink:href="">copy-of</link> - for processing tables.</para> - </question> - - <answer> - <para>The full source code of the solution is available at - <link - xlink:href="Ref/src/Dtd/book/v5/book2html.1.xsl">(Online - HTML version) ... book2html.1.xsl</link>. We discuss some - important aspects. The following table provides mapping - rules from <filename>book.xsd</filename> to Xhtml:</para> - - <table xml:id="table_book2xhtml_element_mappings"> - <title>Mapping elements from <filename>book.xsd</filename> - to Xhtml</title> - - <?dbhtml table-width="50%" ?> - - <?dbfo table-width="50%" ?> - - <tgroup cols="2"> - <colspec colwidth="3*"/> - - <colspec colwidth="2*"/> - - <thead> - <row> - <entry>book.xsd</entry> - - <entry>Xhtml</entry> - </row> - </thead> - - <tbody> - <row> - <entry><tag class="starttag">book</tag>/<tag - class="starttag">title</tag></entry> - - <entry><tag class="starttag">h1</tag></entry> - </row> - - <row> - <entry><tag class="starttag">chapter</tag>/<tag - class="starttag">title</tag></entry> - - <entry><tag class="starttag">h2</tag></entry> - </row> - - <row> - <entry><tag class="starttag">para</tag> (mixed - content)</entry> - - <entry><tag class="starttag">p</tag></entry> - </row> - - <row> - <entry><tag class="starttag">link - href="foo"</tag></entry> - - <entry><tag class="starttag">a - href="foo"</tag></entry> - </row> - - <row> - <entry><tag class="starttag">emphasis</tag></entry> - - <entry><tag class="starttag">em</tag></entry> - </row> - - <row> - <entry><tag - class="starttag">itemizedlist</tag></entry> - - <entry><tag class="starttag">ul</tag></entry> - </row> - - <row> - <entry><tag class="starttag">listitem</tag></entry> - - <entry><tag class="starttag">li</tag></entry> - </row> - - <row> - <entry><tag class="starttag">table</tag>, <tag - class="starttag">caption</tag>,<tag - class="starttag">tr</tag>, 
<tag - class="starttag">td</tag> along with all - attributes</entry> - - <entry>Identity copy</entry> - </row> - </tbody> - </tgroup> - </table> - - <para>Since our table model is a subset of the HTML table - model we may simply copy corresponding nodes to the - output:</para> - - <programlisting language="none"><xsl:template match="table"> - <xsl:copy-of select="."/> -</xsl:template></programlisting> - - <para>Next we need rules for itemized lists and paragraphs. - Our model already implements lists in a way that closely - resembles XHTML lists. Since the structures are compatible we - only have to provide a mapping:</para> - - <programlisting language="none"><xsl:template match="para"> - <p id="{generate-id(.)}"><xsl:apply-templates select="text()|*" /></p> -</xsl:template> - -<xsl:template match="itemizedlist"> - <ul><xsl:apply-templates select="listitem"/></ul> -</xsl:template> - -<xsl:template match="listitem"> - <li><xsl:apply-templates select="*"/></li> -</xsl:template></programlisting> - - <para>Since <emphasis>all</emphasis> chapters are reachable - via hypertext links from the table of contents we - <emphasis>must</emphasis> supply a unique <code>id</code> - value <xref - linkend="programlisting_book2html_single_chapterid"/> for - <emphasis>all</emphasis> of them. Chapters and paragraphs - may be referenced by <tag class="starttag">link</tag> - elements and thus <emphasis>both</emphasis> need a unique - identity value. For simplicity we create both of them via - <code>generate-id()</code>. 
In a more sophisticated solution - the strategy would be slightly different:</para> - - <itemizedlist> - <listitem> - <para>If a <tag class="starttag">chapter</tag> node does - have an <code>id</code> attribute defined then take its - value.</para> - </listitem> - - <listitem> - <para>If a <tag class="starttag">chapter</tag> node does - <emphasis>not</emphasis> have an <code>id</code> - attribute defined then use - <code>generate-id()</code>.</para> - </listitem> - - <listitem> - <para><tag class="starttag">para</tag> nodes only get - <code>id</code> values in XHTML if they do have an <code>id</code> - attribute defined. This is consistent since these nodes - are never referenced from the table of contents. Thus an - identity is only required if the <tag - class="starttag">para</tag> node is referenced by a <tag - class="starttag">link</tag>. If that is the case the <tag - class="starttag">para</tag> surely does have a defined - identity value.</para> - </listitem> - </itemizedlist> - - <para>We also have to provide a hypertext link <xref - linkend="programlisting_book2html_single_toclink"/> to the - table of contents:</para> - - <programlisting language="none"><xsl:template match="chapter"> - <h2 id="{<emphasis role="bold">generate-id(.)</emphasis>}" <co - xml:base="" - xml:id="programlisting_book2html_single_chapterid"/>> - <a href="#{<emphasis role="bold">generate-id(/book)</emphasis>}" <co - xml:base="" - xml:id="programlisting_book2html_single_toclink"/>><xsl:value-of select="title"/></a> - </h2> - <xsl:apply-templates select="para|itemizedlist|table"/> -</xsl:template></programlisting> - - <para>Implementing the <tag class="starttag">link</tag> - element is somewhat more complicated. We cannot use the - <code>@ref</code> attribute value itself as the <tag - class="starttag">a href="..."</tag> attribute value since - the target's identity string is generated via - <code>generate-id()</code>. 
But we may follow the reference - via the <abbrev - xlink:href="">XPath</abbrev> <link - linkend="section_xsl_functionid">id()</link> function and - then use the target's identity value:</para> - - <programlisting language="none"><xsl:template match="link"> - <a href="#{generate-id(id(@linkend))}"> - <xsl:value-of select="."/> - </a> -</xsl:template></programlisting> - - <para>The call to <code>id(@linkend)</code> returns either a - <tag class="starttag">chapter</tag> or a <tag - class="starttag">para</tag> node since attributes of type - <code>ID</code> are only defined for these two elements. - Using this node as input to <code>generate-id()</code> - returns the desired identity value to be used in the - generated Xhtml.</para> - </answer> - </qandaentry> - </qandadiv> - </qandaset> - </section> - - <section xml:id="xslAxis"> - <title>XSL axis definitions</title> - - <para>XSL allows us to traverse a document instance's graph in - different directions. We start with a memo document instance:</para> - - <programlisting language="none"><memo xmlns:xsi="" - xsi:noNamespaceSchemaLocation="memo.xsd" date="9.9.2099"> - <from>Joe</from> - <to>Jack</to> - <to>Eve</to> - <to>Jude</to> - <to>Tolstoi</to> - <subject>Ignore me!</subject> - <content> - <para>Dumb text.</para> - </content> -</memo></programlisting> - - <para>This instance defines four nodes of type <tag - class="starttag">to</tag>. 
For each of these we want to create a - line of text showing also the preceding and the following - recipients:</para> - - <programlisting language="none"> <----Jack----> Eve Jude Tolstoi <co - xml:id="programlisting_axis_jack"/> -Jack <----Eve----> Jude Tolstoi <co xml:id="programlisting_axis_eve"/> -Jack Eve <----Jude----> Tolstoi <co xml:id="programlisting_axis_jude"/> -Jack Eve Jude <----Tolstoi----> <co - xml:id="programlisting_axis_tolstoi"/></programlisting> - - <calloutlist> - <callout arearefs="programlisting_axis_jack"> - <para>Jack has no predecessor and 3 successors</para> - </callout> - - <callout arearefs="programlisting_axis_eve"> - <para>Eve has 1 predecessor and 2 successors</para> - </callout> - - <callout arearefs="programlisting_axis_jude"> - <para>Jude has 2 predecessors and 1 successor</para> - </callout> - - <callout arearefs="programlisting_axis_tolstoi"> - <para><personname>Tolstoi</personname> has 3 predecessors and no - successor</para> - </callout> - </calloutlist> - - <para>XSL supports this type of transformation by supplying <acronym - xlink:href="">XPath</acronym> axis - definitions. We consider a memo document with 9 <tag - class="starttag">to</tag> nodes:</para> - - <figure xml:id="memo9recipients"> - <title>A memo with 9 recipients</title> - - <mediaobject> - <imageobject> - <imagedata fileref="Ref/Fig/memofour.fig"/> - </imageobject> - </mediaobject> - </figure> - - <para>We marked the 4-th recipient to represent the context node. - All three <tag class="starttag">to</tag> nodes to the - <quote>left</quote> belong to the <emphasis>set</emphasis> of - preceding siblings with respect to the context node. Likewise the 5 - neighbours to the right are called following siblings. 
Returning to - our <quote>four recipient</quote> example we may create the desired - output by:</para> - - <programlisting language="none"><xsl:template match="/"> - <xsl:apply-templates select="memo/to"/> -</xsl:template> - -<xsl:template match="to"> - - <xsl:for-each select="preceding-sibling::to" <co - xml:id="programlisting_memo_four_xsl_preceding"/>> - <xsl:value-of select="."/> - <xsl:text> </xsl:text> - </xsl:for-each> - - <xsl:text> &lt;----</xsl:text> - <xsl:value-of select="."/> <co - xml:id="programlisting_memo_four_xsl_context"/> - <xsl:text>----&gt; </xsl:text> - - <xsl:for-each select="following-sibling::to"> <co - xml:id="programlisting_memo_four_xsl_following"/> - <xsl:value-of select="."/> - <xsl:text> </xsl:text> - </xsl:for-each> - <xsl:value-of select="$newline"/> -</xsl:template></programlisting> - - <calloutlist> - <callout arearefs="programlisting_memo_four_xsl_preceding"> - <para>Iterate on the set of recipients <quote>left</quote> of - the context node.</para> - </callout> - - <callout arearefs="programlisting_memo_four_xsl_context"> - <para>Taking the context node's value embedded in <code><---- - ... ----></code>.</para> - </callout> - - <callout arearefs="programlisting_memo_four_xsl_following"> - <para>Iterate on the set of recipients <quote>right</quote> of - the context node.</para> - </callout> - </calloutlist> - - <para>More formally the set of preceding siblings is defined to be - the set of all nodes having the same parent as the context node and - appearing <quote>before</quote> the context node. The notion - <quote>before</quote> is meant in the sense of a <link - xlink:href="">depth-first</link> - traversal of the document tree. <abbrev - xlink:href="">XPath</abbrev> provides - different axis definitions, see <uri - xlink:href=""></uri> - for details. 
We provide an illustration here:</para> - - <figure xml:id="disjointAxeSets"> - <title>Disjoint <acronym - xlink:href="">XPath</acronym> axis - definitions.</title> - - <mediaobject> - <imageobject> - <imagedata fileref="Ref/Fig/preceding.fig"/> - </imageobject> - - <caption> - <para>The sets defined by ancestor, descendant, following, - preceding and self are disjoint. Their union forms the set of - all document nodes.</para> - </caption> - </mediaobject> - </figure> - - <para>Some remarks:<itemizedlist> - <listitem> - <para>If the context node is already the topmost node i.e. the - root node then the sets defined by <code>ancestor</code> and - <code>parent</code> are empty.</para> - </listitem> - - <listitem> - <para>The <code>parent</code> set <emphasis>always</emphasis> - contains zero or one node.</para> - </listitem> - </itemizedlist></para> - </section> - - <section xml:id="xslChunking"> - <title>Splitting documents into chunks</title> - - <para>Sometimes we want to generate multiple output documents from a - single XML source. It may for example be a bad idea to transform a - book of 200 printed pages into a <emphasis>single</emphasis> online - HTML page. Instead we may split each chapter into a separate HTML - file and create navigation links between them.</para> - - <para>We consider a memo document instance. 
We want to generate one - text file for each memo recipient containing just the recipient's - name using the <abbrev - xlink:href="">XSL</abbrev> element <link - xlink:href=""><xsl:result-document></link>:</para> - - <programlisting language="none"><xsl:template match="/memo"> - <xsl:apply-templates select="to"/> -</xsl:template> - -<xsl:template match="to"> - <emphasis role="bold"><xsl:result-document</emphasis> - <co xml:id="programlisting_xsl_result_document_main"/> - <emphasis role="bold">href="file_{position()}.txt"</emphasis> - <co xml:id="programlisting_xsl_result_document_href"/> - <emphasis role="bold">method="text"</emphasis> - <co xml:id="programlisting_xsl_result_document_method"/>> - <xsl:value-of select="."/> <co - xml:id="programlisting_xsl_result_document_content"/> - - <emphasis role="bold"></xsl:result-document></emphasis> -</xsl:template></programlisting> - - <calloutlist> - <callout arearefs="programlisting_xsl_result_document_main"> - <para>The output from all generating <abbrev - xlink:href="">XSL</abbrev> directives - will be redirected from standard output to another output - channel.</para> - </callout> - - <callout arearefs="programlisting_xsl_result_document_href"> - <para>The output will be written to a file named - <filename>file_i.txt</filename> with decimal number - <code>i</code> ranging from value 1 up to the number of - recipients.</para> - </callout> - - <callout arearefs="programlisting_xsl_result_document_method"> - <para>The <code>method</code> attribute possibly overrides a - value being given in the <tag class="starttag">xsl:output</tag> - element. 
We may also redefine <link - xlink:href="">other - attributes</link> from <tag class="starttag">xsl:output</tag> - like <code>doctype-{public.system}</code> and the generated - file's <code>encoding</code>.</para> - </callout> - - <callout arearefs="programlisting_xsl_result_document_content"> - <para>All output being generated in this region gets redirected - to the channel specified in <xref - linkend="programlisting_xsl_result_document_href"/>.</para> - </callout> - </calloutlist> - - <qandaset defaultlabel="qanda" xml:id="example_book_chunk"> - <title>Splitting book into chapter files</title> - - <qandadiv> - <qandaentry> - <question> - <para>Extend your solution of <xref - linkend="example_book_xsl_mixed"/> by writing each <tag - class="starttag">chapter</tag>'s content into a separate - Xhtml file. In addition create a file - <filename>index.html</filename> which contains references to - the corresponding <tag class="starttag">chapter</tag> - documents. Thus for a document instance with two chapters - the overall navigation structure is illustrated by <xref - linkend="figure_book_navigation"/>.</para> - - <para>Implementing the <tag class="starttag">link</tag> tag - may cause a problem: An internal link may reference a <tag - class="starttag">para</tag>. You need to identify the <tag - class="starttag">chapter</tag> node embedding this para. - This may be done by using a suitable <abbrev - xlink:href="">XPath</abbrev> axis - direction.</para> - </question> - - <answer> - <para>The full source code of the solution is available at - <link - xlink:href="Ref/src/Dtd/book/v5/book2chunks.1.xsl">(Online - HTML version) ... book2chunks.1.xsl</link>. 
First we - generate the table of contents file - <filename>index.html</filename>:</para> - - <programlisting language="none"><xsl:template match="/"> - <xsl:result-document href="index.html"> - <xsl:apply-templates select="book"/> - </xsl:result-document> - - <xsl:for-each select="book/chapter"> - <xsl:result-document href="{generate-id(.)}.html"> - <xsl:apply-templates select="."/> - </xsl:result-document> - </xsl:for-each> -</xsl:template> - -<xsl:template match="book"> - <html> - <head><title><xsl:value-of select="title"/></title></head> - <body> - <h1><xsl:value-of select="title"/></h1> - <h2>Table of contents</h2> - <ul> - <xsl:for-each select="<emphasis role="bold">chapter</emphasis>"> - <li><a href="{<emphasis role="bold">generate-id(.)</emphasis>}.html"><xsl:value-of select="title"/></a></li> - </xsl:for-each> - </ul> - </body> - </html> -</xsl:template></programlisting> - - <para>The <tag class="starttag">link ref="..."</tag> may - reference a <tag class="starttag">chapter</tag> or a <tag - class="starttag">para</tag>. 
So we may need to <quote>step - up</quote> from a paragraph to the corresponding chapter - node:</para> - - <programlisting language="none"><xsl:template match="link"> - <xsl:variable name="reftargetNode" select="id(@linkend)"/> - <xsl:variable name="reftargetParentChapter" - select="$reftargetNode/ancestor-or-self::chapter"/> - - <a href="{generate-id($reftargetParentChapter)}.html#{ - generate-id($reftargetNode)}"> - <xsl:value-of select="."/> - </a> -</xsl:template></programlisting> - - <para>This is consistent since <emphasis>all</emphasis> <tag - class="starttag">p</tag> nodes in the generated Xhtml - receive a unique <code>id</code> value regardless of whether - the originating <tag class="starttag">para</tag> node does - have one.</para> - </answer> - </qandaentry> - </qandadiv> - </qandaset> - - <figure xml:id="figure_book_navigation"> - <title>A <tag class="starttag">book</tag> document with two - chapters</title> - - <mediaobject> - <imageobject> - <imagedata fileref="Ref/Fig/booknavigate.fig"/> - </imageobject> - </mediaobject> - </figure> - </section> - </section> - </section> - </chapter> - - <chapter xml:id="xmlApis"> - <title><abbrev xlink:href="">API</abbrev>s - for XML document processing</title> - - <section xml:id="sax"> - <title>The Simple API for XML</title> - - <section xml:id="saxPrinciple"> - <title>The principle of a <acronym - xlink:href="">SAX</acronym> - application</title> - - <para>We are already familiar with transformations of XML document - instances to other formats. Sometimes the capabilities being offered - by a given transformation approach do not suffice for a given problem. - Obviously a general purpose programming language like <link - linkend="gloss_Java"><trademark>Java</trademark></link> offers - superior means to perform advanced manipulations of XML document - trees.</para> - - <para>Before diving into technical details we present an example - exceeding the limits of our present transformation capabilities. 
We - want to format an XML catalog document with article descriptions to - HTML. The price information however shall reside in a database - external to the XML document, namely an RDBMS:</para> - - <figure xml:id="saxRdbmsAccessPrinciple"> - <title>Generating HTML from an XML document and an RDBMS.</title> - - <mediaobject> - <imageobject> - <imagedata fileref="Ref/Fig/saxxmlrdbms.fig" scale="65"/> - </imageobject> - </mediaobject> - </figure> - - <para>Our catalog might look like:</para> - - <figure xml:id="simpleCatalog"> - <title>An <link linkend="gloss_XML"><abbrev>XML</abbrev></link> based - catalog.</title> - - <programlisting language="none"><catalog> - <item orderNo="<emphasis role="bold">3218</emphasis>">Swinging headset</item> - <item orderNo="<emphasis role="bold">9921</emphasis>">200W Stereo Amplifier</item> -</catalog></programlisting> - </figure> - - <para>The RDBMS may hold some relation with a field - <code>orderNo</code> as primary key and a corresponding attribute like - <code>price</code>. 
In a real world application <code>orderNo</code> - should probably be an integer typed <code>IDENTITY</code> - attribute.</para> - - <figure xml:id="saxRdbmsSchema"> - <title>A Relation containing price information.</title> - - <programlisting language="none">CREATE TABLE Product ( - orderNo CHAR(10) PRIMARY KEY - ,price Money -) - -INSERT INTO Product VALUES('<emphasis role="bold">3218</emphasis>', 42.57) -INSERT INTO Product VALUES('<emphasis role="bold">9921</emphasis>', 121.50)</programlisting> - - <caption> - <para>Prices depend on article numbers.</para> - </caption> - </figure> - - <para>The intended HTML output with order numbers being highlighted - looks like:</para> - - <figure xml:id="saxPriceOut"> - <title>HTML generated output.</title> - - <programlisting language="none"><!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"> - <html> - <head><title>Available products</title></head> - <body> - <table border="1"> - <tbody> - <tr> - <th><emphasis role="bold">Order number</emphasis></th> - <th>Price</th> - <th>Product</th> - </tr> - <tr> - <td><emphasis role="bold">3218</emphasis></td> - <td>42,57</td> - <td>Swinging headset</td> - </tr> - <tr> - <td><emphasis role="bold">9921</emphasis></td> - <td>121,50</td> - <td>200W Stereo Amplifier</td> - </tr> - </tbody> - </table> - </body> - </html></programlisting> - - <caption> - <para>This result HTML document contains content both from our XML - document and from the database table <code>Product</code>.</para> - </caption> - </figure> - - <para>The intended transformation is beyond the XSLT standard's - processing capabilities: XSLT does not enable us to access RDBMS content. - However some XSLT processors provide extensions for this task.</para> - - <para>It is tempting to write a <link - linkend="gloss_Java"><trademark>Java</trademark></link> application - which might use e.g. <trademark - xlink:href="">JDBC</trademark> - for database access. But how do we actually read and parse an XML file? 
 - Sticking to the <link - linkend="gloss_Java"><trademark>Java</trademark></link> standard we - might use a <link - xlink:href="">FileInputStream</link> - instance to read from <code>catalog.xml</code> and write an XML parser - ourselves. Fortunately <orgname>SUN</orgname>'s <trademark - xlink:href="">JDK</trademark> - already includes an API denoted <acronym - xlink:href="">SAX</acronym>, the - <emphasis>S</emphasis>imple <emphasis>A</emphasis>pi for - <emphasis>X</emphasis>ml. The <productname - xlink:href="">JDK</productname> - also includes a corresponding parser implementation. In addition there - are third party <acronym - xlink:href="">SAX</acronym> parser - implementations available like <productname - xlink:href="">Xerces</productname> from the - <orgname xlink:href="">Apache - Foundation</orgname>.</para> - - <para>The <acronym - xlink:href="">SAX</acronym> API is event - based and will be illustrated by the relationship between customers - and a software vendor company:</para> - - <mediaobject> - <imageobject> - <imagedata fileref="Ref/Fig/updateinfo.fig"/> - </imageobject> - </mediaobject> - - <para>After purchasing software customers are asked to register their - software. This way the vendor receives the customer's address. Each - time a new release is completed all registered customers will - receive a notification typically including a <quote>special - offer</quote> to upgrade their software. 
From an abstract point of - view the following two actions take place:</para> - - <variablelist> - <varlistentry> - <term>Registration</term> - - <listitem> - <para>The customer registers at the company's site - indicating its interest in updated versions.</para> - </listitem> - </varlistentry> - - <varlistentry> - <term>Notification</term> - - <listitem> - <para>Upon completion of each new software release (considered - to be an <emphasis>event</emphasis>) a message is sent to all - registered customers.</para> - </listitem> - </varlistentry> - </variablelist> - - <para>The same principle applies to GUI applications in software - development. A key press <emphasis>event</emphasis> for example will - be forwarded by an application's <emphasis>event handler</emphasis> to - a callback function (sometimes called a <emphasis>handler</emphasis> - method) being implemented by an application developer. The <acronym - xlink:href="">SAX</acronym> API works the - same way: A parser reads an XML document generating events which - <emphasis>may</emphasis> be handled by an application. During document - parsing the XML tree structure gets <quote>flattened</quote> to a - sequence of events:</para> - - <figure xml:id="saxFlattenEvent"> - <title>Parsing an XML document creates a corresponding sequence of - events.</title> - - <mediaobject> - <imageobject> - <imagedata fileref="Ref/Fig/saxmodel.pdf"/> - </imageobject> - </mediaobject> - </figure> - - <para>An application may register components with the parser:</para> - - <figure xml:id="figureSax"> - <title><acronym xlink:href="">SAX</acronym> - Principle</title> - - <mediaobject> - <imageobject> - <imagedata fileref="Ref/Fig/saxapparch.pdf"/> - </imageobject> - - <caption> - <para>A <acronym - xlink:href="">SAX</acronym> application - consists of a <acronym - xlink:href="">SAX</acronym> parser and - an implementation of event handlers being specific to the - application. 
The application is developed by implementing the - two handlers.</para> - </caption> - </mediaobject> - </figure> - - <para>An Error Handler is required since the XML stream may contain - errors. In order to implement a <acronym - xlink:href="">SAX</acronym> application we - have to:</para> - - <orderedlist> - <listitem> - <para>Instantiate required objects:</para> - - <itemizedlist> - <listitem> - <para>Parser</para> - </listitem> - - <listitem> - <para>Event Handler</para> - </listitem> - - <listitem> - <para>Error Handler</para> - </listitem> - </itemizedlist> - </listitem> - - <listitem> - <para>Register handler instances</para> - - <itemizedlist> - <listitem> - <para>register Event Handler to Parser</para> - </listitem> - - <listitem> - <para>register Error Handler to Parser</para> - </listitem> - </itemizedlist> - </listitem> - - <listitem> - <para>Start the parsing process by calling the parser's - appropriate method.</para> - </listitem> - </orderedlist> - </section> - - <section xml:id="saxIntroExample"> - <title>First steps</title> - - <para>Our first <acronym - xlink:href="">SAX</acronym> toy application - <classname>sax.stat.v1.ElementCount</classname> shall simply count the - number of elements it finds in an arbitrary XML document. In addition - the <acronym xlink:href="">SAX</acronym> - events shall be written to standard output generating output sketched - in <xref linkend="saxFlattenEvent"/>. The application's central - implementation reads:</para> - - <figure xml:id="saxElementCount"> - <title>Counting XML elements.</title> - - <programlisting language="none">package sax.stat.v1; -... 
- -public class ElementCount { - - public void parse(final String uri) { - try { - final SAXParserFactory saxPf = SAXParserFactory.newInstance(); - final SAXParser saxParser = saxPf.newSAXParser(); - saxParser.parse(uri, eventHandler); - } catch (ParserConfigurationException e){ - e.printStackTrace(System.err); - } catch (org.xml.sax.SAXException e) { - e.printStackTrace(System.err); - } catch (IOException e){ - e.printStackTrace(System.err); - } - } - - public int getElementCount() { - return eventHandler.getElementCount(); - } - private final MyEventHandler eventHandler = new MyEventHandler(); -}</programlisting> - - <caption> - <para>This application works for arbitrary well-formed XML - documents.</para> - </caption> - </figure> - - <para>We now explain this application in detail. The first part deals - with the instantiation of a parser:</para> - - <programlisting language="none">try { - final SAXParserFactory saxPf = <emphasis role="bold">SAXParserFactory</emphasis>.newInstance(); - final SAXParser saxParser = saxPf.newSAXParser(); - saxParser.parse(uri, eventHandler); -} catch (ParserConfigurationException e){ - e.printStackTrace(System.err); -} ...</programlisting> - - <para>In order to keep an application independent from a specific - parser implementation the <acronym - xlink:href="">SAX</acronym> uses the so - called <link - xlink:href="">Abstract - Factory Pattern</link> instead of simply calling a constructor from a - vendor specific parser class.</para> - - <para>In order to be useful the parser has to be instructed to do - something meaningful when a XML document gets parsed. 
For this purpose - our application supplies an event handler instance:</para> - - <programlisting language="none">public void parse(final String uri) { - try { - final SAXParserFactory saxPf = SAXParserFactory.newInstance(); - final SAXParser saxParser = saxPf.newSAXParser(); - saxParser.parse(uri, <emphasis role="bold">eventHandler</emphasis>); - } catch (org.xml.sax.SAXException e) { - ... - private final MyEventHandler <emphasis role="bold">eventHandler = new MyEventHandler()</emphasis>; -}</programlisting> - - <para>What does the event handler actually do? It offers methods to - the parser being callable during the parsing process:</para> - - <programlisting language="none">package sax.stat.v1; -... -public class MyEventHandler extends <classname>org.xml.sax.helpers.DefaultHandler</classname> { - - public void <emphasis role="bold"><emphasis role="bold">startDocument()</emphasis></emphasis><co - xml:id="programlisting_eventhandler_startDocument"/> { - System.out.println("Opening Document"); - } - public void <emphasis role="bold">endDocument()</emphasis><co - xml:id="programlisting_eventhandler_endDocument"/> { - System.out.println("Closing Document"); - } - public void <emphasis role="bold">startElement(String namespaceUri, String localName, String rawName, - Attributes attrs)</emphasis> <co - xml:id="programlisting_eventhandler_startElement"/>{ - System.out.println("Opening \"" + rawName + "\""); - elementCount++; - } - public void <emphasis role="bold">endElement(String namespaceUri, String localName, - String rawName)</emphasis><co - xml:id="programlisting_eventhandler_endElement"/>{ - System.out.println("Closing \"" + rawName + "\""); - } - public void <emphasis role="bold">characters(char[] ch, int start, int length)</emphasis><co - xml:id="programlisting_eventhandler_characters"/>{ - System.out.println("Content \"" + new String(ch, start, length) + '"'); - } - public int getElementCount() <co - xml:id="programlisting_eventhandler_getElementCount"/>{ - 
return elementCount; - } - private int elementCount = 0; -}</programlisting> - - <calloutlist> - <callout arearefs="programlisting_eventhandler_startDocument"> - <para>This method gets called exactly once namely when opening the - XML document as a whole.</para> - </callout> - - <callout arearefs="programlisting_eventhandler_endDocument"> - <para>After successfully parsing the whole document instance this - method will finally be called.</para> - </callout> - - <callout arearefs="programlisting_eventhandler_startElement"> - <para>This method gets called each time a new element is parsed. - In the given catalog.xml example it will be called three times: - First when the <tag class="starttag">catalog</tag> appears and - then once for each of the two <tag class="starttag">item - ...</tag> elements. The supplied - parameters depend on whether or not namespace processing is - enabled.</para> - </callout> - - <callout arearefs="programlisting_eventhandler_endElement"> - <para>Called each time an element like <tag class="starttag">item - ...</tag> gets closed by its counterpart <tag - class="endtag">item</tag>.</para> - </callout> - - <callout arearefs="programlisting_eventhandler_characters"> - <para>This method is responsible for the treatment of textual - content i.e. handling <code>#PCDATA</code> element content. 
We - will explain its uncommon signature a little bit later.</para> - </callout> - - <callout arearefs="programlisting_eventhandler_getElementCount"> - <para><function>getElementCount()</function> is a getter method to - read only access the private field <varname>elementCount</varname> - which gets incremented in <coref - linkend="programlisting_eventhandler_startElement"/> each time an - XML element opens.</para> - </callout> - </calloutlist> - - <para>The call <code>saxParser.parse(uri, eventHandler)</code> - actually initiates the parsing process and tells the parser to:</para> - - <itemizedlist> - <listitem> - <para>Open the XML document being referenced by the URI - argument.</para> - </listitem> - - <listitem> - <para>Forward XML events to the event handler instance supplied by - the second argument.</para> - </listitem> - </itemizedlist> - - <para>A driver class containing a <code>main(...)</code> method may - start the whole process and print out the desired number of elements - upon completion of a parsing run:</para> - - <programlisting language="none">package sax.stat.v1; - -public class ElementCountDriver { - public static void main(String argv[]) { - ElementCount xmlStats = new ElementCount(); - xmlStats.parse("<emphasis role="bold">Input/Sax/catalog.xml</emphasis>"); - System.out.println("Document contains " + xmlStats.<emphasis role="bold">getElementCount()</emphasis> + " elements"); - } -}</programlisting> - - <para>Processing the catalog example instance yields:</para> - - <programlisting language="none">Opening Document -<emphasis role="bold">Opening "catalog"</emphasis> <co - xml:id="programlisting_catalog_output"/> -Content " - " -<emphasis role="bold">Opening "item"</emphasis> <co - xml:id="programlisting_catalog_item1"/> -Content "Swinging headset" -Closing "item" -Content " - " -<emphasis role="bold">Opening "item"</emphasis> <co - xml:id="programlisting_catalog_item2"/> -Content "200W Stereo Amplifier" -Closing "item" -Content " -" -Closing 
"catalog" -Closing Document -<emphasis role="bold">Document contains 3 elements</emphasis> <co - xml:id="programlisting_catalog_elementcount"/></programlisting> - - <calloutlist> - <callout arearefs="programlisting_catalog_output"> - <para>Start parsing element <tag - class="starttag">catalog</tag>.</para> - </callout> - - <callout arch="" arearefs="programlisting_catalog_item1"> - <para>Start parsing element <tag class="starttag">item - orderNo="3218"</tag>Swinging headset<tag class="endtag" - role="">item</tag>.</para> - </callout> - - <callout arch="" arearefs="programlisting_catalog_item2"> - <para>Start parsing element <tag class="starttag">item - orderNo="9921"</tag>200W Stereo Amplifier<tag class="endtag" - role="">item</tag>.</para> - </callout> - - <callout arearefs="programlisting_catalog_elementcount"> - <para>After the parsing process has completed the application - outputs the number of elements being counted so far.</para> - </callout> - </calloutlist> - - <para>The output contains some lines of <quote>empty</quote> content. - This content is due to whitespace being located between elements. For - example a newline appears between the the <tag - class="starttag">catalog</tag> and the first <tag - class="starttag">item</tag> element. The parser encapsulates this - whitespace in a call to the <link - xlink:href="[],%20int,%20int)">characters</link> - method. In an application this call will typically be ignored. XML - document instances in a professional context will typically not - contain any newline characters at all. Instead the whole document is - represented as a single line. This inhibits human readability which is - not required if the processing applications work well. In this case - empty content as above will not appear.</para> - - <para>The <code>characters(char[] ch, int start, int length)</code> - method's signature looks somewhat strange regarding <link - linkend="gloss_Java"><trademark>Java</trademark></link> conventions. 
 - One might expect <code>characters(String s)</code>. But this way the - <acronym xlink:href="">SAX</acronym> API - allows efficient parser implementations: A parser may initially - allocate a reasonably large <code>char</code> array of say 128 bytes - sufficient to hold 64 (<link - xlink:href="">Unicode</link>) characters. If this - buffer gets exhausted the parser might allocate a second buffer of - double size thus implementing an <quote>amortized doubling</quote> - algorithm:</para> - - <mediaobject> - <imageobject> - <imagedata fileref="Ref/Fig/saxcharacter.pdf"/> - </imageobject> - </mediaobject> - - <para>In this example the first element content fits in the first - buffer. The second content <code>200W Stereo Amplifier</code> and the - third content <code>Earphone</code> both fit in the second buffer. - Subsequent content may require further buffer allocations. Such a - strategy minimizes the number of time consuming <code>new </code> - <link - xlink:href="">String</link> - <code>(...)</code> constructor calls that the more - convenient API variant <code>characters(String s)</code> would require.</para> - </section> - - <section xml:id="saxRegistry"> - <title>Event- and error handler registration</title> - - <para>Our first <acronym - xlink:href="">SAX</acronym> application - suffers from the following deficiencies:</para> - - <itemizedlist> - <listitem> - <para>The error handling is very sparse. It relies entirely on - exceptions such as <link - xlink:href="">SAXException</link> - which frequently do not supply meaningful error - information.</para> - </listitem> - - <listitem> - <para>The application is not aware of namespaces. Thus reading - e.g. 
<abbrev xlink:href="">XSL</abbrev> - document instances will not allow distinguishing between elements - from different namespaces like HTML.</para> - </listitem> - - <listitem> - <para>The parser will not validate a document instance against a - schema being present.</para> - </listitem> - </itemizedlist> - - <para>We now incrementally add these features to the <acronym - xlink:href="">SAX</acronym> parsing process. - <acronym xlink:href="">SAX</acronym> offers - an interface <link - xlink:href="">XMLReader</link> - to conveniently <emphasis>register</emphasis> event- and error handler - instances independently instead of passing both interfaces as a single - argument to the <link - xlink:href=",%20org.xml.sax.helpers.DefaultHandler)">parse</link> - method. We first code an error handler class by implementing the - interface <classname>org.xml.sax.ErrorHandler</classname> being part - of the <acronym xlink:href="">SAX</acronym> - API:</para> - - <programlisting language="none">package sax.stat.v2; -... -public class MyErrorHandler implements ErrorHandler { - - <emphasis role="bold">public void warning(SAXParseException e)</emphasis> { - System.err.println("[Warning]" + getLocationString(e)); - } - <emphasis role="bold">public void error(SAXParseException e)</emphasis> { - System.err.println("[Error]" + getLocationString(e)); - } - <emphasis role="bold">public void fatalError(SAXParseException e)</emphasis> throws SAXException{ - System.err.println("[Fatal Error]" + getLocationString(e)); - } - private String getLocationString(SAXParseException e) { - return " line " + e.getLineNumber() + - ", column " + e.getColumnNumber()+ ":" + e.getMessage(); - } -}</programlisting> - - <para>These three public methods constitute the - <classname>org.xml.sax.ErrorHandler</classname> interface. The method - <function>getLocationString</function> is used to supply precise - parsing error locations by means of line- and column numbers within a - document instance. 
If errors or warnings are encountered the parser - will call the appropriate public method:</para> - - <figure xml:id="saxMissItem"> - <title>A non well formed document.</title> - - <programlisting language="none"><?xml version="1.0" encoding="UTF-8"?> -<catalog> - <item orderNo="3218">Swinging headset</item> - <item orderNo="9921">200W Stereo Amplifier -</catalog></programlisting> - - <caption> - <para>This document is not well formed since a - closing <tag class="endtag">item</tag> tag is missing.</para> - </caption> - </figure> - - <para>Our error handler method gets called yielding an informative - message:</para> - - <programlisting language="none">[Fatal Error] line 5, column -1:Expected "</item>" to terminate -element starting on line 4.</programlisting> - - <para>This error output is achieved by - <emphasis>registering</emphasis> an instance of - <classname>sax.stat.v2.MyErrorHandler</classname> to the parser prior - to starting the parsing process. In the following code snippet we also - register a content handler instance to the parser and thus separate - the parser's configuration from its invocation:</para> - - <programlisting language="none">package sax.stat.v2; -... 
-public class ElementCount { - public ElementCount() - throws SAXException, ParserConfigurationException{ - final SAXParserFactory saxPf = SAXParserFactory.newInstance(); - final SAXParser saxParser = saxPf.newSAXParser(); - xmlReader = saxParser.getXMLReader(); - xmlReader.setContentHandler(eventHandler); <co - xml:id="programlisting_assemble_parser_setcontenthandler"/> - xmlReader.setErrorHandler(errorHandler); <co - xml:id="programlisting_assemble_parser_seterrorhandler"/> - } - public void parse(final String uri) - throws IOException, SAXException{ - xmlReader.parse(uri); <co - xml:id="programlisting_assemble_parser_invokeparse"/> - } - public int getElementCount() { - return eventHandler.getElementCount(); <co - xml:id="programlisting_assemble_parser_getelementcount"/> - } - private final XMLReader xmlReader; - private final MyEventHandler eventHandler = new MyEventHandler(); <co - xml:id="programlisting_assemble_parser_createeventhandler"/> - private final MyErrorHandler errorHandler = new MyErrorHandler(); <co - xml:id="programlisting_assemble_parser_createerrorhandler"/> -}</programlisting> - - <calloutlist> - <callout arearefs="programlisting_assemble_parser_setcontenthandler programlisting_assemble_parser_seterrorhandler"> - <para>Referring to <xref linkend="figureSax" os=""/> these two - calls attach the event- and error handler objects to the parser - thus implementing the two arrows from the parser to the - application's implementation.</para> - </callout> - - <callout arearefs="programlisting_assemble_parser_invokeparse"> - <para>The parser is invoked. 
Note that in this example we only - pass a document's URI but no reference to a handler object.</para> - </callout> - - <callout arearefs="programlisting_assemble_parser_getelementcount"> - <para>The method <function>getElementCount()</function> is needed - to allow a calling object to access the private - <varname>eventHandler</varname> object's - <function>getElementCount()</function> method.</para> - </callout> - - <callout arearefs="programlisting_assemble_parser_createeventhandler programlisting_assemble_parser_createerrorhandler"> - <para>An event handling and an error handling object are created - to handle events during the parsing process.</para> - </callout> - </calloutlist> - - <para>The careful reader might notice a subtle difference between the - content- and the error handler implementation: The class - <classname>sax.stat.v2.MyErrorHandler</classname> implements the - interface <classname>org.xml.sax.ErrorHandler</classname>. But - <classname>sax.stat.v2.MyEventHandler</classname> is derived from - <classname>org.xml.sax.helpers.DefaultHandler</classname> which itself - implements the <classname>org.xml.sax.ContentHandler</classname> - interface. Actually one might as well start from the latter interface - which requires implementing all of its 11 methods. In most circumstances - this only complicates the application's code since it is unnecessary - to react to events belonging for example to processing instructions. 
- For this reason it is good coding practice to use the empty default - implementations in - <classname>org.xml.sax.helpers.DefaultHandler</classname> and to - redefine only those methods corresponding to events actually being - handled by the application in question.</para> - - <qandaset defaultlabel="qanda" xml:id="sda1SaxReadAttributes"> - <title>SAX and attribute values</title> - - <qandadiv> - <qandaentry> - <question> - <label>Reading an element's set of attributes.</label> - - <para>The example document instance does include <tag - class="attribute">orderNo</tag> attribute values for each <tag - class="starttag">item</tag> element. The parser does not yet - show these attribute keys and their corresponding values. Read - the documentation for <classname - xlink:href="">org.xml.sax.Attributes</classname> - and extend the given code to use it.</para> - - <para>You should start from the <xref linkend="glo_MIB"/> - Maven archetype <code>mi-maven-archetype-sax</code>. - Configuration hints are available at <uri - xlink:href=""></uri>.</para> - </question> - - <answer> - <para>For the given example it would suffice to read the known - <tag class="attribute">orderNo</tag> attributes value. 
A - generic solution may ask for the set of all defined attributes - and show their values:</para> - - <programlisting language="none">package sax; - -public class AttribEventHandler extends DefaultHandler { - - public void startElement(String namespaceUri, String localName, - String rawName, Attributes attrs) { - System.out.println("Opening Element " + rawName); - for (int i = 0; i < attrs.getLength(); i++){ - System.out.println(attrs.getQName(i) + "=" + attrs.getValue(i) + "\n"); - } - } -}</programlisting> - </answer> - </qandaentry> - </qandadiv> - </qandaset> - - <section xml:id="sda1SecElementLists"> - <title>The set of element names</title> - - <qandaset defaultlabel="qanda" xml:id="sda1QandaElementNames"> - <title>Element lists of arbitrary XML documents.</title> - - <qandadiv> - <qandaentry> - <question> - <para>We reconsider the simple application reading arbitrary - XML documents and providing a list of XML Elements being - contained within:</para> - - <programlisting language="none">Opening Document -<emphasis role="bold">Opening "catalog"</emphasis> -Content " - " -<emphasis role="bold">Opening "item"</emphasis> -Content "Swinging headset" -Closing "item" -Content " ...</programlisting> - - <para>If an element like e.g. <tag - class="starttag">item</tag> appears multiple times it will - also be written to standard output multiple times.</para> - - <para>We are now interested to get the list of all elements - names being present in an arbitrary XML document. 
Consider - the following example:</para> - - <programlisting language="none"><memo> - <from> - <name>Martin</name> - <surname>Goik</surname> - </from> - <to> - <name>Adam</name> - <surname>Hacker</surname> - </to> - <to> - <name>Eve</name> - <surname>Intruder</surname> - </to> - <date year="2005" month="1" day="6"/> - <subject>Firewall problems</subject> - <content> - <para>Thanks for your excellent work.</para> - <para>Our firewall is definitely broken!</para> - </content> -</memo></programlisting> - - <para>The elements <tag class="starttag">to</tag>, <tag - class="starttag">name</tag>, <tag - class="starttag">surname</tag> and <tag - class="starttag">para</tag> all appear multiple times. - Write a SAX application which processes arbitrary XML - documents and creates an alphabetically sorted list of - the element names being contained <emphasis role="bold">excluding - duplicates</emphasis>. The intended output for the above - example is:</para> - - <programlisting language="none">List of elements: {content date from memo name para subject surname to }</programlisting> - - <para>The corresponding handler should be implemented in a - re-usable way. Thus if different XML documents are being - handled in succession the list of elements should be erased - prior to processing the current document. 
Hints:</para> - - <itemizedlist> - <listitem> - <para>Use a <classname>java.util.SortedSet</classname> - instance to collect element names thereby excluding - duplicates.</para> - </listitem> - - <listitem> - <para>The method - <methodname>sax.count.ListTagNamesHandler.startDocument()</methodname> - may be used to initialize your handler.</para> - </listitem> - </itemizedlist> - </question> - - <answer> - <para>A suitable handler reads:</para> - - <programlisting language="none">package sax.count; - -import java.util.SortedSet; -import java.util.TreeSet; - -import org.xml.sax.Attributes; -import org.xml.sax.SAXException; -import org.xml.sax.helpers.DefaultHandler; - -/** Collecting the sorted set of distinct element names of a document */ -public class ListTagNamesHandler extends DefaultHandler { - - // A SortedSet by definition does not contain any duplicates. - private final SortedSet<String> elementNames = new TreeSet<>(); - - @Override - public void startDocument() throws SAXException { - elementNames.clear(); // May contain elements from a previous run. - } - - @Override - public void startElement(String namespaceUri, String localName, - String rawName, Attributes attrs) { - // In case the current element name has already been inserted - // this method call will be silently ignored. - elementNames.add(rawName); - } - - /** - * @return A sorted list of element names of the currently processed XML - * document without duplicates. 
- */ - public String[] getTagNames() { - return elementNames.toArray(new String[0]); - } -}</programlisting> - - <para>A complete application requires a driver:</para> - - <programlisting language="none">package sax.count; - -import javax.xml.parsers.SAXParser; -import javax.xml.parsers.SAXParserFactory; - -import org.xml.sax.XMLReader; - -import sax.stat.v2.MyErrorHandler; - -public class Driver { - - public static void main(String argv[]) throws Exception { - - final SAXParserFactory saxPf = SAXParserFactory.newInstance(); - final SAXParser saxParser = saxPf.newSAXParser(); - final XMLReader xmlReader = saxParser.getXMLReader(); - final ListTagNamesHandler handler = new ListTagNamesHandler(); - xmlReader.setContentHandler(handler); - xmlReader.setErrorHandler(new MyErrorHandler()); - xmlReader.parse("Input/Xml/Memo/message.xml"); - - System.out.print("List of elements: {"); - for (String elementName : handler.getTagNames()) { - System.out.print(elementName + " "); - } - System.out.println("}"); - } -}</programlisting> - </answer> - </qandaentry> - </qandadiv> - </qandaset> - </section> - - <section xml:id="sda1SaxView"> - <title>A limited view on a given XML document instance</title> - - <qandaset defaultlabel="qanda" xml:id="sda1QandamemoView"> - <title>A specific view on memo documents</title> - - <qandadiv> - <qandaentry> - <question> - <para>We reconsider the following memo instance:</para> - - <programlisting language="none"><memo> - <from> - <name>Martin</name> - <surname>Goik</surname> - </from> - <to> - <name>Adam</name> - <surname>Hacker</surname> - </to> - <to> - <name>Eve</name> - <surname>Intruder</surname> - </to> - <date year="2005" month="1" day="6"/> - <subject>Firewall problems</subject> - <content> - <para>Thanks for your excellent work.</para> - <para>Our firewall is definitely broken!</para> - </content> -</memo></programlisting> - - <para>Every memo instance does have exactly one sender and - one subject. 
Write a SAX application to achieve the - following output:</para> - - <programlisting language="none">Sender: Martin Goik -Subject: Firewall problems</programlisting> - - <para>Hint: The callback implementation of - <methodname>org.xml.sax.helpers.DefaultHandler.characters(char[],int,int)</methodname> - may be used to filter the desired output. You have to limit - its output to <tag class="starttag">from</tag> and <tag - class="starttag">subject</tag> descendant content. Taking - the <tag class="starttag">subject</tag>Firewall problems<tag - class="endtag">subject</tag> element as an example the - corresponding event sequence reads:</para> - - <informaltable border="1"> - <tr> - <th>Event</th> - - <th>Corresponding callback</th> - </tr> - - <tr> - <td>...</td> - - <td>...</td> - </tr> - - <tr> - <td>Opening <tag class="starttag">subject</tag> - element</td> - - <td>startElement(...)</td> - </tr> - - <tr> - <td>Firewall problems</td> - - <td>characters(...)</td> - </tr> - - <tr> - <td>Closing <tag class="endtag">subject</tag> - element</td> - - <td>endElement(...)</td> - </tr> - - <tr> - <td>...</td> - - <td>...</td> - </tr> - </informaltable> - - <para>Limiting the output of our - <methodname>org.xml.sax.helpers.DefaultHandler.characters(char[],int,int)</methodname> - callback method can be achieved by introducing instance - scope boolean variables being set to true or false inside - your - <methodname>org.xml.sax.helpers.DefaultHandler.startElement(String - uri,String localName,String qName,org.xml.sax.Attributes - attributes)</methodname> and - <methodname>org.xml.sax.helpers.DefaultHandler.endElement(String - uri, String localName, String qName)</methodname> - implementations accordingly to keep track of the current - event state.</para> - </question> - - <answer> - <programlisting language="none">package sax.view; -... -/** A view on memo documents restricted to sender name and subject. 
*/ -public class MemoViewHandler extends DefaultHandler { - - // These variables help us to keep track of the current event state spanning - // each startElement(...) -- character(...) -- endElement(...) event sequence - boolean inFromContext = false, - inSubjectContext = false; - - public void startElement(String namespaceUri, String localName, - String rawName, Attributes attrs) { - switch(rawName) { - case "from": - inFromContext = true; - System.out.print("Sender: "); - break; - case "subject": - inSubjectContext = true; - System.out.print("Subject: "); - break; - case "surname": - if (inFromContext) { - System.out.print(" "); // Adding additional space between <name> and <surname> content. - } - break; - } - } - - @Override - public void endElement(String uri, String localName, String rawName) - throws SAXException { - switch(rawName) { - case "from": - inFromContext = false; - System.out.println(); - break; - case "subject": - inSubjectContext = false; - System.out.println(); - break; - } - } - - @Override - public void characters(char[] ch, int start, int length) throws SAXException { - if (inFromContext || inSubjectContext) { - System.out.print(new String(ch, start, length)); - } - } -}</programlisting> - </answer> - </qandaentry> - </qandadiv> - </qandaset> - </section> - </section> - - <section xml:id="saxValidate"> - <title><acronym xlink:href="">SAX</acronym> - validation</title> - - <para>So far we only parsed well formed document instances. 
Our - current parser may operate on valid XML instances:</para> - - <figure xml:id="saxNotValid"> - <title>An invalid XML document.</title> - - <programlisting language="none"><xs:element name="catalog"> - <xs:complexType> - <xs:sequence> - <xs:element ref="item"/> - </xs:sequence> - </xs:complexType> -</xs:element> - -<xs:element name="item"> - <xs:complexType mixed="true"> - <xs:attribute name="orderNo" type="xs:int" use="required"/> - </xs:complexType> -</xs:element></programlisting> - - <programlisting language="none"><catalog> - <item orderNo="3218">Swinging headset</item> - <item orderNo="9921">200W Stereo Amplifier</item> <emphasis - role="bold"><!-- second entry forbidden by schema --></emphasis> -</catalog></programlisting> - - <caption> - <para>In contrast to <xref linkend="saxMissItem"/> this document - is well formed. But it is not <emphasis - role="bold">valid</emphasis> with respect to the schema since more - than one <tag class="starttag">item</tag> elements are - present.</para> - </caption> - </figure> - - <para>This document instance is well-formed but not valid: Only one - element <tag class="starttag">item</tag> is allowed due to an - ill-defined schema. The parser will not report any error or warning. - In order to enable validation we need to configure our parser:</para> - - <programlisting language="none">xmlReader.setFeature("", true);</programlisting> - - <para>The string <code></code> - serves as a key. Since this is an ordinary string value a parser may - or may not implement it. 
The <acronym - xlink:href="">SAX</acronym> standard defines - two exception classes for dealing with feature related errors:</para> - - <variablelist> - <varlistentry> - <term><link - xlink:href="">SAXNotRecognizedException</link></term> - - <listitem> - <para>The feature is not known to the parser.</para> - </listitem> - </varlistentry> - - <varlistentry> - <term><link - xlink:href="">SAXNotSupportedException</link></term> - - <listitem> - <para>The feature is known to the parser but the parser does not - support it or it does not support the specific value being - set.</para> - </listitem> - </varlistentry> - </variablelist> - - <para>The <productname - xlink:href="">xml-commons - resolver project</productname> offers an implementation being able to - process various catalog file formats. Maven based projects allow the - corresponding library import by adding the following - dependency:</para> - - <programlisting language="none"><dependency> - <groupId>xml-resolver</groupId> - <artifactId>xml-resolver</artifactId> - <version>1.2</version> -</dependency></programlisting> - - <para>We need a properties file <link - xlink:href=""></link> - defining XML catalogs to be used and additional parameters:</para> - - <programlisting language="none"># Catalogs are relative to this properties file -relative-catalogs=false -# Catalog list - -catalogs=\ -/usr/share/eclipse/dropins/oxygenxml.oxygen_14.2/plugins/com.oxygenxml.editor_14.2.0.v2013021115/frameworks/xhtml/dtd/xhtmlcatalog.xml;\ -/usr/share/eclipse/dropins/oxygenxml.oxygen_14.2/plugins/com.oxygenxml.editor_14.2.0.v2013021115/frameworks/xhtml11/dtd/xhtmlcatalog.xml -# PUBLIC in favour of SYSTEM -prefer=public</programlisting> - - <para>This configuration uses some catalogs from the - <trademark>Oxygen</trademark> <trademark>Eclipse</trademark> plugin. 
- We may now add a resolver to our SAX application by referencing the - above configuration file <coref linkend="resolverPropertyFile"/> and - registering the resolver to our SAX parser instance <coref - linkend="resolverRegister"/>:</para> - - <programlisting language="none">xmlReader = saxParser.getXMLReader(); - - // Set up resolving PUBLIC identifier - final CatalogManager cm = new CatalogManager("<emphasis role="bold"></emphasis>" <co - xml:id="resolverPropertyFile"/> ); - final CatalogResolver resolver = new CatalogResolver(cm); - xmlReader.setEntityResolver(resolver) <co xml:id="resolverRegister"/>;</programlisting> - </section> - - <section xml:id="saxNamespace"> - <title>Namespaces</title> - - <para>In order to make a <acronym - xlink:href="">SAX</acronym> parser - application namespace aware we have to activate two <acronym - xlink:href="">SAX</acronym> parsing - features:</para> - - <programlisting language="none">xmlReader = saxParser.getXMLReader(); -xmlReader.setFeature("", true); -xmlReader.setFeature("", true);</programlisting> - - <para>This instructs the parser to pass the namespace's name for each - element. Namespace prefixes like <code>xsl</code> in <tag - class="starttag">xsl:for-each</tag> are also passed and may be used by - an application:</para> - - <programlisting language="none">package sax; -... -public class NamespaceEventHandler extends DefaultHandler { -... 
- public void startElement(String <emphasis role="bold">namespaceUri</emphasis>, String localName, - String rawName, Attributes attrs) { - System.out.println("Opening Element rawName='" + rawName + "'\n" - + "namespaceUri='" + <emphasis role="bold">namespaceUri</emphasis> + "'\n" - + "localName='" + localName - + "'\n--------------------------------------------"); - } -}</programlisting> - - <para>As an example we take an XSLT script:</para> - - <programlisting language="none"><?xml version="1.0" encoding="utf-8"?> -<xsl:stylesheet version="1.0" - xmlns:xsl='' - xmlns:fo=''> - - <xsl:template match="/"> - <fo:block>A block</fo:block> - <HTML/> - </xsl:template> - -</xsl:stylesheet></programlisting> - - <para>This XSLT script being conceived as an XML document instance - contains elements belonging to two different namespaces namely - <code></code> and - <code></code>. The script also - contains a <quote>raw</quote> <tag audience="" - class="emptytag">HTML</tag> element being introduced only for - demonstration purposes. It does not belong to any namespace, so its - namespace name is reported as an empty string. The result - reads:</para> - - <programlisting language="none">Opening Element rawName='xsl:stylesheet' -namespaceUri='' -localName='stylesheet' --------------------------------------------- -Opening Element rawName='xsl:template' -namespaceUri='' -localName='template' --------------------------------------------- -Opening Element rawName='fo:block' -namespaceUri='' -localName='block' --------------------------------------------- -Opening Element rawName='HTML' -namespaceUri='' -localName='HTML'</programlisting> - - <para>Now the parser tells us which namespace a given element node - belongs to. 
An XSLT engine for example uses this information to distinguish - two classes of elements:</para> - - <itemizedlist> - <listitem> - <para>Elements belonging to the namespace - <code></code> like <tag - class="emptytag">xsl:value-of select="..."</tag> have to be - interpreted as instructions by the processor.</para> - </listitem> - - <listitem> - <para>Elements <emphasis role="bold">not</emphasis> belonging to - the namespace <code></code> - like <tag class="emptytag">html</tag> or <tag - class="starttag">fo:block</tag> are copied <quote>as is</quote> to - the output.</para> - </listitem> - </itemizedlist> - - <qandaset defaultlabel="qanda" xml:id="quandaentry_SqlFromXml"> - <title>Generating SQL INSERT statements from XML data</title> - - <qandadiv> - <qandaentry> - <question> - <para>Consider the following schema and document instance - example:</para> - - <figure xml:id="catalogProductDescriptionsExample"> - <title>A sample catalog containing products and - corresponding descriptions.</title> - - <programlisting language="none"><xs:element name="catalog"> - <xs:complexType> - <xs:sequence> - <xs:element ref="product" minOccurs="0" maxOccurs="unbounded"/> - </xs:sequence> - </xs:complexType> -</xs:element> - -<xs:element name="product"> - <xs:complexType> - <xs:sequence> - <xs:element name="name" type="xs:string"/> - <xs:element name="description" type="xs:string" minOccurs="0" maxOccurs="unbounded"/> - <xs:element name="age" type="xs:int" minOccurs="0" maxOccurs="1"/> - </xs:sequence> - <xs:attribute name="id" type="xs:ID" use="required"/> - </xs:complexType> -</xs:element></programlisting> - - <programlisting language="none"><catalog ... 
xsi:noNamespaceSchemaLocation="catalog.xsd"> - <product id="mpt"> - <name>Monkey Picked Tea</name> - <description>Rare wild Chinese tea</description> - <description>Picked only by specially trained monkeys</description> - </product> - <product id="instantTent"> - <name>4-Person Instant Tent</name> - <description>4-person, 1-room tent</description> - <description>Pre-attached tent poles</description> - <description>Exclusive WeatherTec system.</description> - <age>15</age> - </product> -</catalog></programlisting> - </figure> - - <para>Data being contained in catalog instances shall be - transferred to a relational database system. Implement and - test a <link linkend="gloss_SAX"><abbrev>SAX</abbrev></link> - application by following the subsequently described - steps:</para> - - <glosslist> - <glossentry> - <glossterm>Database schema</glossterm> - - <glossdef> - <para>Create a database schema matching a product of - your choice (<productname>MySQL</productname>, - <productname>Oracle</productname>, ...). Your schema - should map type and integrity constraints of the given - XML schema. In particular:</para> - - <itemizedlist> - <listitem> - <para>The element <tag class="starttag">age</tag> is - optional.</para> - </listitem> - - <listitem> - <para><tag class="starttag">description</tag> - elements are children of <tag class="starttag">product</tag> elements - and should thus be modeled by a 1:n relation.</para> - </listitem> - - <listitem> - <para>In a catalog the order of descriptions of a - given product matters. 
Thus your schema should allow - for descriptions being ordered.</para> - </listitem> - </itemizedlist> - </glossdef> - </glossentry> - - <glossentry> - <glossterm>SAX Application</glossterm> - - <glossdef> - <para>The order of appearance of the XML elements <tag - class="starttag">product</tag>, <tag - class="starttag">name</tag> and <tag - class="starttag">age</tag> does not permit a linear - generation of suitable SQL <code>INSERT</code> - statements by a <link - linkend="gloss_SAX"><abbrev>SAX</abbrev></link> content - handler. Instead you will have to keep copies of local - element values when implementing - <methodname>org.xml.sax.ContentHandler.startElement(String,String,String,org.xml.sax.Attributes)</methodname> - and related callback methods. The following sequence of - insert statements corresponds to the XML data being - contained in <xref - linkend="catalogProductDescriptionsExample"/>. You may - use these statements as a blueprint to be generated by - your <link - linkend="gloss_SAX"><abbrev>SAX</abbrev></link> - application:</para> - - <programlisting language="none"><emphasis role="bold">INSERT INTO Product VALUES ('mpt', 'Monkey picked tea', NULL);</emphasis> -INSERT INTO Description VALUES('mpt', 0, 'Picked only by specially trained monkeys'); -INSERT INTO Description VALUES('mpt', 1, 'Rare wild Chinese tea'); - -<emphasis role="bold">INSERT INTO Product VALUES ('instantTent', '4-person instant tent', 15);</emphasis> -INSERT INTO Description VALUES('instantTent', 0, 'Exclusive WeatherTec system.'); -INSERT INTO Description VALUES('instantTent', 1, '4-person, 1-room tent'); -INSERT INTO Description VALUES('instantTent', 2, 'Pre-attached tent poles');</programlisting> - - <para>Provide a suitable <xref linkend="glo_Junit"/> - test.</para> - </glossdef> - </glossentry> - </glosslist> - </question> - - <answer> - <annotation role="make"> - <para role="eclipse">P/catalog2sql</para> - </annotation> - - <para>Running this project and executing tests 
requires the - following Maven project dependency to be installed (e.g. - locally via <command>mvn</command> <option>install</option>) - to satisfy a dependency:</para> - - <annotation role="make"> - <para role="eclipse">P/saxerrorhandler</para> - </annotation> - - <para>Some remarks are in order here:</para> - - <orderedlist> - <listitem> - <para>The <xref linkend="glo_SQL"/> database schema might - read:</para> - - <programlisting language="sql">CREATE TABLE Product ( - id CHAR(20) NOT NULL PRIMARY KEY <co linkends="catalog2sqlSchema-1" - xml:id="catalog2sqlSchema-1-co"/> - ,name VARCHAR(255) NOT NULL - ,age SMALLINT <co linkends="catalog2sqlSchema-2" - xml:id="catalog2sqlSchema-2-co"/> -); - -CREATE TABLE Description ( - product CHAR(20) NOT NULL REFERENCES Product <co - linkends="catalog2sqlSchema-3" - xml:id="catalog2sqlSchema-3-co"/> - ,orderIndex int NOT NULL <co linkends="catalog2sqlSchema-4" - xml:id="catalog2sqlSchema-4-co"/> -- preserving the order of descriptions belonging to a given product - ,text VARCHAR(255) NOT NULL - ,UNIQUE(product, orderIndex) <co linkends="catalog2sqlSchema-5" - xml:id="catalog2sqlSchema-5-co"/> -);</programlisting> - - <calloutlist> - <callout arearefs="catalog2sqlSchema-1-co" - xml:id="catalog2sqlSchema-1"> - <para>The primary key constraint implements the - uniqueness of <tag class="starttag">product - id='xyz'</tag> values</para> - </callout> - - <callout arearefs="catalog2sqlSchema-2-co" - xml:id="catalog2sqlSchema-2"> - <para>Nullability of <code>age</code> implements <tag - class="starttag">age</tag> elements being - optional.</para> - </callout> - - <callout arearefs="catalog2sqlSchema-3-co" - xml:id="catalog2sqlSchema-3"> - <para><tag class="starttag">description</tag> elements - being children of <tag class="starttag">product</tag> - are being implemented by a foreign key to its - identifying owner thus forming weak entities.</para> - </callout> - - <callout arearefs="catalog2sqlSchema-4-co" - 
xml:id="catalog2sqlSchema-4"> - <para>The attribute <code>orderIndex</code> allows - descriptions to be sorted thus maintaining the - original order of appearance of <tag - class="starttag">description</tag> elements.</para> - </callout> - - <callout arearefs="catalog2sqlSchema-5-co" - xml:id="catalog2sqlSchema-5"> - <para>The <code>orderIndex</code> attribute is unique - within the set of descriptions belonging to the same - product.</para> - </callout> - </calloutlist> - </listitem> - - <listitem> - <para>The result of the given input XML sample file should - be similar to the content of the supplied reference file - <filename>products.reference.xml</filename>:</para> - - <programlisting language="sql">INSERT INTO Product (id, name) VALUES ('mpt', 'Monkey Picked Tea'); -INSERT INTO Description VALUES('mpt', 0, 'Rare wild Chinese tea'); -INSERT INTO Description VALUES('mpt', 1, 'Picked only by specially trained monkeys'); --- end of current product entry -- - -INSERT INTO Product VALUES ('instantTent', '4-Person Instant Tent', 15); -INSERT INTO Description VALUES('instantTent', 0, '4-person, 1-room tent'); -INSERT INTO Description VALUES('instantTent', 1, 'Pre-attached tent poles'); -INSERT INTO Description VALUES('instantTent', 2, 'Exclusive WeatherTec system.'); --- end of current product entry --</programlisting> - - <para>So a <xref linkend="glo_Junit"/> test may just - execute the XML to SQL converter and then compare the - effective output to the above reference file.</para> - </listitem> - </orderedlist> - </answer> - </qandaentry> - </qandadiv> - </qandaset> - - <qandaset defaultlabel="qanda" xml:id="quandaentry_NumElemByNs"> - <title>Counting element names grouped by namespaces</title> - - <qandadiv> - <qandaentry> - <question> - <para>We want to extend the SAX examples counting <link - linkend="saxElementCount">elements</link> and <link - linkend="exercise_saxAttrib">attributes</link> of arbitrary - document instances. 
Consider the following XSL sample document - containing <link linkend="gloss_XHTML">XHTML</link> :</para> - - <programlisting language="none"><?xml version="1.0" encoding="UTF-8"?> -<xsl:stylesheet xmlns:xsl="" - xmlns:xs="" <co - xml:id="xhtmlCombinedNs_Svg"/> - xmlns:h="" <co xml:id="xhtmlCombinedNs_Xhtml"/> - exclude-result-prefixes="xs" version="2.0"> - - <xsl:template match="/"> - <h:html> - <h:head> - <h:title></h:title> - </h:head> - <h:body> - <h:h1>A heading</h:h1> - <h:p>A paragraph</h:p> - <h:h1>Yet another heading</h:h1> - <xsl:apply-templates/> - </h:body> - </h:html> - </xsl:template> - - <xsl:template match="*"> - <xsl:message> - <xsl:text>No template defined for element '</xsl:text> - <xsl:value-of select="name(.)"/> - <xsl:text>'</xsl:text> - </xsl:message> - </xsl:template> - -</xsl:stylesheet></programlisting> - - <para>This XSL stylesheet defines two different namespaces - <coref linkend="xhtmlCombinedNs_Svg"/> and <coref - linkend="xhtmlCombinedNs_Xhtml"/>.</para> - - <para>Implement a <link linkend="gloss_SAX">SAX</link> - application being able to group elements from arbitrary XML - documents by namespaces along with their corresponding - frequencies of occurrence. The intended output for the - previous <xref linkend="glo_XSL"/> example shall look - like:</para> - - <programlisting language="none">Namespace '<emphasis - role="bold"></emphasis>' contains: -<head> (1 occurrence) -<p> (1 occurrence) -<h1> (2 occurrences) -<html> (1 occurrence) -<title> (1 occurrence) -<body> (1 occurrence) - -Namespace '<emphasis role="bold"></emphasis>' contains: -<stylesheet> (1 occurrence) -<template> (2 occurrences) -<value-of> (1 occurrence) -<apply-templates> (1 occurrence) -<text> (2 occurrences) -<message> (1 occurrence)</programlisting> - - <para>Hint: Counting frequencies and grouping by namespaces - may be achieved by using standard Java container - implementations of <classname>java.util.Map</classname>. 
You - may for example define sets of related XML elements and group - them by their corresponding namespaces. Thus nested maps are - required.</para> - </question> - - <answer> - <annotation role="make"> - <para role="eclipse">P/xmlstatistics</para> - </annotation> - - <para>Running this project and executing its tests requires the - following Maven project to be installed (e.g. - locally via <command>mvn</command> <option>install</option>) - to satisfy a dependency:</para> - - <annotation role="make"> - <para role="eclipse">P/saxerrorhandler</para> - </annotation> - - <para>The above solution contains both a running application - and an (incomplete) <xref linkend="glo_Junit"/> test.</para> - </answer> - </qandaentry> - </qandadiv> - </qandaset> - </section> - </section> - - <section xml:id="dom"> - <title>The Document Object Model (<acronym - xlink:href="">DOM</acronym>)</title> - - <titleabbrev><acronym - xlink:href="">DOM</acronym></titleabbrev> - - <section xml:id="domBase"> - <title>Language independent specification</title> - - <titleabbrev>Language independence</titleabbrev> - - <para>XML documents allow for automated content processing. We already - discussed the <acronym - xlink:href="">SAX</acronym> API to access XML - documents from <link - linkend="gloss_Java"><trademark>Java</trademark></link> applications. - There are however situations where <acronym - xlink:href="">SAX</acronym> is not - appropriate:</para> - - <itemizedlist> - <listitem> - <para>The <acronym - xlink:href="">SAX</acronym> API is event - based. XML node elements are passed to handler methods. Sometimes - we want to access neighbouring nodes from a context node in our - handler methods, for example a <tag class="starttag">title</tag> - following a <tag class="starttag">chapter</tag> node. <acronym - xlink:href="">SAX</acronym> does not - offer any support for this. 
If we need references to neighbouring - nodes we have to create them ourselves during a <acronym - xlink:href="">SAX</acronym> parsing run. - This is tedious and leads to code being hard to understand.</para> - </listitem> - - <listitem> - <para>Some applications may want to select node sets by <acronym - xlink:href="">XPath</acronym> - expressions which is completely impossible in a <acronym - xlink:href="">SAX</acronym> - application.</para> - </listitem> - - <listitem> - <para>We may want to move subtrees within a document itself (for - example exchanging two <tag class="starttag">chapter</tag> nodes) - or even transferring them to a different document.</para> - </listitem> - </itemizedlist> - - <para>The greatest deficiency of the <acronym - xlink:href="">SAX</acronym> is the fact that - an XML instance is not represented as a tree like structure but as a - succession of events. The <acronym - xlink:href="">DOM</acronym> allows us to - represent XML document instances as tree like structures and thus - enables navigational operations between nodes.</para> - - <para>In order to achieve language <emphasis>and</emphasis> software - vendor independence the <acronym - xlink:href="">DOM</acronym> approach uses two - stages:</para> - - <itemizedlist> - <listitem> - <para>The <acronym - xlink:href="">DOM</acronym> is formulated in - an Interface Definition Language (<abbrev - xlink:href="">IDL</abbrev>)</para> - </listitem> - - <listitem> - <para>In order to use the <acronym - xlink:href="">DOM</acronym> API by a concrete - programming language a so called <emphasis>language - binding</emphasis> is required. In languages like <link - linkend="gloss_Java"><trademark>Java</trademark></link> the - language binding will still be a set of (<link - linkend="gloss_Java"><trademark>Java</trademark></link>) - interfaces. 
Thus for actually coding an application an - implementation of these interfaces is needed</para> - </listitem> - </itemizedlist> - - <para>So what exactly may an <abbrev - xlink:href="">IDL</abbrev> - be? The programming language <link - linkend="gloss_Java"><trademark>Java</trademark></link> already allows - pure interface definitions without any implementation. In C++ the same - result can be achieved by so called <emphasis>pure virtual - classes</emphasis>. An <abbrev - xlink:href="">IDL</abbrev> - offers extended features to describe such interfaces. For <acronym - xlink:href="">DOM</acronym> the <productname - xlink:href="">CORBA - 2.2</productname> <abbrev - xlink:href="">IDL</abbrev> - had been chosen to describe an XML document programming interface. As - a first example we take an excerpt from the <acronym - xlink:href="">DOM</acronym>'s <link - xlink:href="">Node</link> - interface definition:</para> - - <programlisting language="none">interface Node { - // NodeType - const unsigned short ELEMENT_NODE = 1; - const unsigned short ATTRIBUTE_NODE = 2; - const unsigned short TEXT_NODE = 3; - ... - - readonly attribute DOMString nodeName; - attribute DOMString nodeValue; - // raises(DOMException) on setting - // raises(DOMException) on retrieval - readonly attribute unsigned short nodeType; - readonly attribute Node parentNode; - ... - readonly attribute NodeList childNodes; - readonly attribute Node firstChild; - ... - Node insertBefore(in Node newChild, - in Node refChild) - raises(DOMException); - ...</programlisting> - - <para>If we want to implement the <abbrev - xlink:href="">IDL</abbrev> - <classname>org.w3c.dom.Node</classname> specification in e.g. <link - linkend="gloss_Java"><trademark>Java</trademark></link> a language - binding has to be defined. This means writing <link - linkend="gloss_Java"><trademark>Java</trademark></link> code which - closely resembles the <abbrev - xlink:href="">IDL</abbrev> - specification. 
Obviously this task depends on and is restricted by the - constructs being offered by the target programming language. The W3C - <link - xlink:href="">defines</link> - the <link linkend="gloss_Java"><trademark>Java</trademark></link> - <classname>org.w3c.dom.Node</classname> interface by:</para> - - <programlisting language="none">package org.w3c.dom; - -public interface Node { - public static final short ELEMENT_NODE = 1; // Node Types - public static final short ATTRIBUTE_NODE = 2; - public static final short TEXT_NODE = 3; - ... - public String getNodeName(); - public String getNodeValue() throws DOMException; - public void setNodeValue(String nodeValue) throws DOMException; - public short getNodeType(); - public Node getParentNode(); - public NodeList getChildNodes(); - public Node getFirstChild(); - ... - public Node insertBefore(Node newChild, - Node refChild) - throws DOMException; - ... - }</programlisting> - - <para>We take - <methodname>org.w3c.dom.Node.getChildNodes()</methodname> as an - example:</para> - - <figure xml:id="domRetrieveChildren"> - <title>Retrieving child nodes of a given context node</title> - - <mediaobject> - <imageobject> - <imagedata fileref="Ref/Fig/domtree.fig" scale="65"/> - </imageobject> - </mediaobject> - </figure> - - <para>The <classname>org.w3c.dom.Node</classname> interface offers a - set of common operations for objects being part of a XML document. But - a XML document tree contains different types of nodes such as:</para> - - <itemizedlist> - <listitem> - <para>Elements</para> - </listitem> - - <listitem> - <para>Attributes</para> - </listitem> - - <listitem> - <para>Entities</para> - </listitem> - </itemizedlist> - - <para>An XML API may address this issue by offering data types to - represent these different kinds of nodes. 
The <acronym - xlink:href="">DOM</acronym> <link - linkend="gloss_Java"><trademark>Java</trademark></link> binding - defines an inheritance hierarchy of interfaces for this - purpose:</para> - - <figure xml:id="domJavaNodeInterfaces"> - <title>Interface inheritance hierarchy in the <acronym - xlink:href="">DOM</acronym> <link - linkend="gloss_Java"><trademark>Java</trademark></link> - binding</title> - - <mediaobject> - <imageobject> - <imagedata fileref="Ref/Fig/nodeHierarchy.svg"/> - </imageobject> - </mediaobject> - </figure> - - <para>Two commonly used <link - linkend="gloss_Java"><trademark>Java</trademark></link> - implementations of these interfaces are:</para> - - <variablelist> - <varlistentry> - <term>Xerces</term> - - <listitem> - <para><orgname - xlink:href="">Apache Software - Foundation</orgname></para> - </listitem> - </varlistentry> - - <varlistentry> - <term>JAXP</term> - - <listitem> - <para><orgname xlink:href="">Sun - Microsystems</orgname></para> - </listitem> - </varlistentry> - </variablelist> - - <para>Both implementations offer additional interfaces beyond the - <acronym xlink:href="">DOM</acronym>'s - scope.</para> - - <para>Going back to the <acronym - xlink:href="">DOM</acronym> itself, the - specification is divided into <link - xlink:href="">modules</link>:</para> - - <figure xml:id="figureDomModules"> - <title><acronym xlink:href="">DOM</acronym> - modules.</title> - - <mediaobject> - <imageobject> - <imagedata fileref="Ref/Screen/dom-architecture.screen.png"/> - </imageobject> - </mediaobject> - </figure> - </section> - - <section xml:id="domCreate"> - <title>Creating a new document from scratch</title> - - <titleabbrev>New document</titleabbrev> - - <para>If we want to export non-XML content (e.g. 
from an RDBMS) into - XML we may achieve this by the following recipe:</para> - - <orderedlist> - <listitem> - <para>Create a document builder instance.</para> - </listitem> - - <listitem> - <para>Create an empty <link - xlink:href="">Document</link> - instance.</para> - </listitem> - - <listitem> - <para>Fill in the desired elements and attributes.</para> - </listitem> - - <listitem> - <para>Create a serializer.</para> - </listitem> - - <listitem> - <para>Serialize the resulting tree to a stream.</para> - </listitem> - </orderedlist> - - <para>An introductory piece of code illustrates these steps:</para> - - <figure xml:id="simpleDomCreate"> - <title>Creation of an XML document instance from scratch.</title> - - <programlisting language="none">package dom; - -import org.jdom2.Element; -import org.jdom2.Text; -import org.jdom2.output.Format; -import org.jdom2.output.XMLOutputter; - -public class CreateDoc { - public static void main(String[] args) throws Exception { - - // Create the root element - <emphasis role="bold">final Element titel = new Element("titel"); -</emphasis> - // Set a date attribute - <emphasis role="bold">titel.setAttribute("date", "23.02.2000");</emphasis> - - // Append a text node as child - <emphasis role="bold">titel.addContent(new Text("Versuch 1"));</emphasis> - - - // Set formatting for the XML output - <emphasis role="bold">final Format outFormat = Format.getPrettyFormat();</emphasis> - - // Serialize to console - <emphasis role="bold">final XMLOutputter printer = new XMLOutputter(outFormat); - printer.output(titel, System.out);</emphasis> - } -}</programlisting> - </figure> - - <para>We get the following result:</para> - - <programlisting language="none"><?xml version="1.0" encoding="UTF-8"?> -<titel date="23.02.2000">Versuch 1</titel></programlisting> - </section> - - <section xml:id="domCreateExercises"> - <title>Exercises</title> - - <qandaset defaultlabel="qanda" xml:id="createDocModify"> - <title>A sub structured <tag class="starttag">title</tag></title> - - <qandadiv> - <qandaentry> - <question> - <label>Creation of an extended XML document instance</label> - - 
<para>In order to run the examples given during the lecture - the <filename - xlink:href="">jdom2.jar</filename> - library must be added to the <envar>CLASSPATH</envar>.</para> - - <para>The <acronym - xlink:href="">DOM</acronym> creation - example given before may be used as a starting point. Extend - the <acronym xlink:href="">DOM</acronym> - tree created in <xref linkend="simpleDomCreate"/> to produce - an extended XML document:</para> - - <programlisting language="none"><title> - <long>The long version of this title</long> - <short>Short version</short> -</title></programlisting> - </question> - - <answer> - <programlisting language="none">package dom; -... -public class CreateExtended { - /** - * @param args - * @throws IOException - */ - public static void main(String[] args) throws IOException { - - // The element name must be "title" to match the requested output - final Element titel = new Element("title"), - tLong = new Element("long"), - tShort = new Element("short"); - - <emphasis role="bold">// Append <long> and <short> to parent <title></emphasis> - titel.addContent(tLong).addContent(tShort); - - <emphasis role="bold">// Append text to <long> and <short></emphasis> - tLong.addContent(new Text("The long version of this title")); - tShort.addContent(new Text("Short version")); - - <emphasis role="bold">// Set formatting for the XML output</emphasis> - Format outFormat = Format.getPrettyFormat(); - - <emphasis role="bold">// Serialize to console</emphasis> - final XMLOutputter printer = new XMLOutputter(outFormat); - printer.output(titel, System.out); - } -}</programlisting> - </answer> - </qandaentry> - </qandadiv> - </qandaset> - </section> - - <section xml:id="domParse"> - <title>Parsing existing XML documents</title> - - <titleabbrev>Parsing</titleabbrev> - - <para>We already used a <acronym - xlink:href="">SAX</acronym> parser to parse an XML - document. 
Rather than handling <acronym - xlink:href="">SAX</acronym> events ourselves - these events may be used to construct a <acronym - xlink:href="">DOM</acronym> representation of our - document. This work is done by an instance of <classname>org.jdom2.input.SAXBuilder</classname>. We use our catalog - example from <xref linkend="simpleCatalog"/> as an introductory - example.</para> - - <para>We already noticed the need for an - <classname>org.xml.sax.ErrorHandler</classname> object during <acronym - xlink:href="">SAX</acronym> processing. A - <acronym xlink:href="">DOM</acronym> parser - requires a similar type of object in order to react to parsing errors - in a meaningful way. In principle a <acronym - xlink:href="">DOM</acronym> parser implementor is - free to choose any implementation strategy but most implementations are built - on top of a <acronym - xlink:href="">SAX</acronym> parser. For this - reason it was natural to choose a <acronym - xlink:href="">DOM</acronym> error handling - interface which is similar to a <acronym - xlink:href="">SAX</acronym> - <classname>org.xml.sax.ErrorHandler</classname>. The following code - serves the needs described before:</para> - - <figure xml:id="domTreeTraversal"> - <title>Accessing an XML tree purely by <acronym - xlink:href="">DOM</acronym> methods.</title> - - <programlisting language="none">package dom; -... -public class ArticleOrder { - -<emphasis role="bold"> // Though we are playing DOM here, a <acronym - xlink:href="">SAX</acronym> parser still - // assembles our DOM tree.</emphasis> - private SAXBuilder builder = new SAXBuilder(); - - public ArticleOrder() { - <emphasis role="bold">// Though an ErrorHandler is not strictly required it allows - // for easier localization of XML document errors</emphasis> - builder.setErrorHandler(new MySaxErrorHandler(System.out));<co - linkends="domSetSaxErrorHandler-co" - xml:id="domSetSaxErrorHandler"/> - } - - /** Descending a catalog down to its <item> elements. 
For each product - * its name and order number are written to the output. - * @throws ... - */ - public void process(final String filename) throws JDOMException, IOException { - - <emphasis role="bold">// Parsing our XML file</emphasis> - final Document docInput = builder.build(filename); - - <emphasis role="bold">// Accessing the document's root element</emphasis> - final Element docRoot = docInput.getRootElement(); - - <emphasis role="bold">// Accessing the <item> children of parent element <catalog></emphasis> - final List<Element> items = docRoot.getChildren(); // Element nodes only - for (final Element item : items) { - System.out.println("Article: " + item.getText() - + ", order number: " + item.getAttributeValue("orderNo")); - } ...</programlisting> - - <para>Note <coref linkend="domSetSaxErrorHandler" - xml:id="domSetSaxErrorHandler-co"/>: This is our standard <acronym - xlink:href="">SAX</acronym> error handler - implementing the <classname>org.xml.sax.ErrorHandler</classname> - interface.</para> - </figure> - - <para>Executing this method needs a driver instance providing an input - XML filename:</para> - - <programlisting language="none">package dom; -... 
-public class ArticleOrderDriver { - public static void main(String[] argv) throws Exception { - final ArticleOrder ao = new ArticleOrder(); - ao.process("<emphasis role="bold">Input/article.xml</emphasis>"); - } -}</programlisting> - - <para>This yields:</para> - - <programlisting language="none">Article: Swinging headset, order number: 3218 -Article: 200W Stereo Amplifier, order number: 9921</programlisting> - - <para>To illustrate the internal processes we take a look at the - sequence diagram:</para> - - <mediaobject> - <imageobject> - <imagedata fileref="Ref/Fig/sequenceDomParser.svg"/> - </imageobject> - </mediaobject> - - <qandaset defaultlabel="qanda" xml:id="exercise_domHtmlSimple"> - <title>Creating HTML output</title> - - <qandadiv> - <qandaentry> - <question> - <label>Simple HTML output</label> - - <para>Instead of exporting simple text output as in <xref - linkend="domTreeTraversal"/> we may also create HTML pages - like:</para> - - <programlisting language="none"><!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"> -<html> - <head> - <title>Available articles</title> - </head> - <body> - <h1>Available articles</h1> - <table> - <tbody> - <tr> - <th align="left">Article Description</th><th>Order Number</th> - </tr> - <tr> - <td align="left"><emphasis role="bold">Swinging headset</emphasis></td><td><emphasis - role="bold">3218</emphasis></td> - </tr> - <tr> - <td align="left"><emphasis role="bold">200W Stereo Amplifier</emphasis></td><td><emphasis - role="bold">9921</emphasis></td> - </tr> - </tbody> - </table> - </body> -</html></programlisting> - - <para>Instead of simply writing - <code>...println(<html>\n\t<head>...)</code> - statements you are expected to code a more sophisticated - solution. We may combine <xref linkend="createDocModify"/> and - <xref linkend="domTreeTraversal"/>. The idea is to read the XML - catalog instance as a <acronym - xlink:href="">DOM</acronym> as before. 
- Then construct a <emphasis>second</emphasis> <acronym - xlink:href="">DOM</acronym> tree for the - desired HTML output and fill in the article information from - the first <acronym - xlink:href="">DOM</acronym> tree - accordingly.</para> - </question> - - <answer> - <para>We introduce a class - <classname>solve.dom.HtmlTree</classname>:</para> - - <programlisting language="none">package solve.dom; - -import java.io.IOException; -import java.io.PrintStream; - -import org.jdom2.DocType; -import org.jdom2.Document; -import org.jdom2.Element; -import org.jdom2.Text; -import org.jdom2.output.Format; -import org.jdom2.output.XMLOutputter; - -/** - * Holding a HTML DOM to produce output. - * @author goik - */ -public class HtmlTree { - - private Document htmlOutput; - private Element tableBody; - - public HtmlTree(final String titleText, - final String[] tableHeaderFields) { <co - linkends="programlisting_catalog2html_htmlskel_co" - xml:id="programlisting_catalog2html_htmlskel"/> - - DocType doctype = new DocType("html", - "-//W3C//DTD XHTML 1.0 Strict//EN", - ""); - - final Element htmlRoot = new Element("html"); <co - linkends="programlisting_catalog2html_tablehead_co" - xml:id="programlisting_catalog2html_tablehead"/> - htmlOutput = new Document(htmlRoot); - htmlOutput.setDocType(doctype); - - // We create a HTML skeleton including an "empty" table - final Element head = new Element("head"), - body = new Element("body"), - table = new Element("table"); - - htmlRoot.addContent(head).addContent(body); - - head.addContent(new Element("title").addContent(new Text(titleText))); - - body.addContent(new Element("h1").addContent(new Text(titleText))); - - body.addContent(table); - - - tableBody = new Element("tbody"); - table.addContent(tableBody); - - // addContent() returns the parent element, so we keep our own <tr> reference - final Element tr = new Element("tr"); - tableBody.addContent(tr); - for (final String headerField: tableHeaderFields) { - tr.addContent(new Element("th").addContent(new Text(headerField))); - } - } - - public void appendItem(final String itemName, 
final String orderNo) {<co - linkends="programlisting_catalog2html_insertproduct_co" - xml:id="programlisting_catalog2html_insertproduct"/> - final Element tr = new Element("tr"); - tableBody.addContent(tr); - tr.addContent(new Element("td").addContent(new Text(itemName))); - tr.addContent(new Element("td").addContent(new Text(orderNo))); - } - public void serialize(PrintStream out){ - - // Set formatting for the XML output - final Format outFormat = Format.getPrettyFormat(); - - // Serialize to the supplied stream - final XMLOutputter printer = new XMLOutputter(outFormat); - try { - printer.output(htmlOutput, out); - } catch (IOException e) { - e.printStackTrace(); - System.exit(1); - } - } - /** - * @return the table's <tbody> element - */ - public Element getTable() { - return tableBody; - } -} - - </programlisting> - - <calloutlist> - <callout arearefs="programlisting_catalog2html_htmlskel" - xml:id="programlisting_catalog2html_htmlskel_co"> - <para>A basic HTML skeleton is being created:</para> - - <programlisting language="none"><?xml version="1.0" encoding="UTF-8"?> -<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" - ""> -<html xmlns=""> - <head> - <title>Available articles</title> - </head> - <body> - <h1>Available articles</h1> - <table> - <emphasis role="bold"><tbody></emphasis> <!-- Data to be inserted here in next step --> - <emphasis role="bold"></tbody></emphasis> - </table> - </body> -</html></programlisting> - - <para>The table containing the product's data is empty at - this point and thus invalid.</para> - </callout> - - <callout arearefs="programlisting_catalog2html_tablehead" - xml:id="programlisting_catalog2html_tablehead_co"> - <para>The table's header is appended but the actual data - from our two products is still missing:</para> - - <programlisting language="none">... 
<h1>Available articles</h1> - <table> - <tbody> - <tr> - <th>Article Description</th> - <th>Order Number</th> - <emphasis role="bold"></tr></emphasis><!-- Data to be appended after this row in next step --> - <emphasis role="bold"></tbody></emphasis> - </table> ...</programlisting> - </callout> - - <callout arearefs="programlisting_catalog2html_insertproduct" - xml:id="programlisting_catalog2html_insertproduct_co"> - <para>Calling - <methodname>solve.dom.HtmlTree.appendItem(String,String)</methodname> - once per product completes the creation of our HTML DOM - tree:</para> - - <programlisting language="none">... </tr> - <tr> - <td>Swinging headset</td> - <td>3218</td> - </tr> - <tr> - <td>200W Stereo Amplifier</td> - <td>9921</td> - </tr> - </tbody> ...</programlisting> - </callout> - </calloutlist> - - <para>The class <classname>solve.dom.Article2Html</classname> - reads the catalog data:</para> - - <programlisting language="none">package solve.dom; -... -public class Article2Html { - - private final SAXBuilder builder = new SAXBuilder(); - private final HtmlTree htmlResult; - - public Article2Html() { - - builder.setErrorHandler(new MySaxErrorHandler(System.out)); - - htmlResult = new HtmlTree("Available articles", new String[] { <co - linkends="programlisting_catalog2html_glue_createhtmldom_co" - xml:id="programlisting_catalog2html_glue_createhtmldom"/> - "Article Description", "Order Number" }); - } - - /** Read an XML catalog instance and insert product names along with their - * order numbers into the HTML DOM. Then serialize the HTML tree to a stream. - * - * @param filename - * Name of the XML source file. - * @param out - * The output stream for HTML serialization. 
- * @throws IOException - * @throws JDOMException - */ - public void process(final String filename, final PrintStream out) throws JDOMException, IOException{ - final List<Element> items = - builder.build(filename).getRootElement().getChildren(); - - for (final Element item : items) { <co - linkends="programlisting_catalog2html_glue_prodloop_co" - xml:id="programlisting_catalog2html_glue_prodloop"/> - htmlResult.appendItem(item.getText(), item.getAttributeValue("orderNo")); <co - linkends="programlisting_catalog2html_glue_insertprod_co" - xml:id="programlisting_catalog2html_glue_insertprod"/> - } - htmlResult.serialize(out); <co - linkends="programlisting_catalog2html_glue_serialize_co" - xml:id="programlisting_catalog2html_glue_serialize"/> - } -}</programlisting> - - <calloutlist> - <callout arearefs="programlisting_catalog2html_glue_createhtmldom" - xml:id="programlisting_catalog2html_glue_createhtmldom_co"> - <para>Create an instance holding a HTML <acronym - xlink:href="">DOM</acronym> with a - table header containing the strings <emphasis>Article - Description</emphasis> and <emphasis>Order - Number</emphasis>.</para> - </callout> - - <callout arearefs="programlisting_catalog2html_glue_prodloop" - xml:id="programlisting_catalog2html_glue_prodloop_co"> - <para>Iterate over all product nodes.</para> - </callout> - - <callout arearefs="programlisting_catalog2html_glue_insertprod" - xml:id="programlisting_catalog2html_glue_insertprod_co"> - <para>Insert the product's name and order number into the - HTML <acronym - xlink:href="">DOM</acronym>.</para> - </callout> - - <callout arearefs="programlisting_catalog2html_glue_serialize" - xml:id="programlisting_catalog2html_glue_serialize_co"> - <para>Serialize the completed HTML <acronym - xlink:href="">DOM</acronym> tree to - the output stream.</para> - </callout> - </calloutlist> - </answer> - </qandaentry> - </qandadiv> - </qandaset> - </section> - - <section xml:id="domJavaScript"> - <title>Using <acronym xlink:href="">DOM</acronym> - with HTML/Javascript</title> - - 
<para>Due to script language support in a variety of browsers we may - also use the <acronym xlink:href="">DOM</acronym> - to implement client side event handling. As an example we <link - xlink:href="Ref/src/tablesort.html">demonstrate</link> how an HTML - table can be made sortable by clicking on a column's header. The - example code along with its description can be found at <uri - xlink:href=""></uri>.</para> - - <para>Quite remarkably only a few ingredients are required to - enrich an ordinary static HTML table with this functionality:</para> - - <itemizedlist> - <listitem> - <para>An external JavaScript library has to be included via - <code><script type="text/javascript" - src="sorttable.js"></code></para> - </listitem> - - <listitem> - <para>Each sortable HTML table needs:</para> - - <itemizedlist> - <listitem> - <para>A unique <code>id</code> attribute</para> - </listitem> - - <listitem> - <para>A <code>class="sortable"</code> attribute</para> - </listitem> - </itemizedlist> - </listitem> - </itemizedlist> - </section> - - <section xml:id="domXpath"> - <title>Using <acronym - xlink:href="">XPath</acronym></title> - - <para><xref linkend="domTreeTraversal"/> demonstrated the possibility - to traverse trees solely by using <acronym - xlink:href="">DOM</acronym> method calls. Though - this approach is possible it will in general not lead to stable - applications. Real world examples are often based on large XML - documents with complex hierarchical structures. Thus this rather - primitive approach requires deeply nested method calls - to access the desired node sets. In addition changing the - conceptual schema will require rewriting large code - portions.</para> - - <para>As we already know from <abbrev - xlink:href="">XSL</abbrev> transformations - <code>XPath</code> allows us to address node sets inside an XML tree. 
The - role of <acronym - xlink:href="">XPath</acronym> can be - compared to SQL queries when working with relational databases. - <acronym xlink:href="">XPath</acronym> may - also be used within <link - linkend="gloss_Java"><trademark>Java</trademark></link> code. As a - first example we show an image filename extracting application - operating on XHTML documents. The following example contains three - <tag class="starttag">img</tag> elements:</para> - - <figure xml:id="htmlGallery"> - <title>A HTML document containing <code>IMG</code> tags.</title> - - <programlisting language="none"><?xml version="1.0"?> -<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" - ""> -<html> - <head> - <title>Picture gallery</title> - </head> - <body> - <h1>Picture gallery</h1> - <p>Images may appear inline:<emphasis role="bold"><img src="inline.gif" alt="none"/></emphasis></p> - <table> - <tbody> - <tr> - <td>Number one:</td> - <td><emphasis role="bold"><img src="one.gif" alt="none"/></emphasis></td> - </tr> - <tr> - <td>Number two:</td> - <td><emphasis role="bold"><img src="" alt="none"/></emphasis></td> - </tr> - </tbody> - </table> - </body> -</html> -</programlisting> - </figure> - - <para>A given HTML document may contain <tag - class="starttag">img</tag> elements at <emphasis>arbitrary</emphasis> - positions. It is sometimes desirable to check for existence and - accessibility of such external objects being necessary for the page's - correct rendering. 
A simple XSL script will do the first part of the job, - namely extracting the <tag class="starttag">img</tag> elements:</para> - - <figure xml:id="gallery2imagelist"> - <title>An <abbrev - xlink:href="">XSL</abbrev> script for - image name extraction.</title> - - <programlisting language="none"><xsl:stylesheet version="1.0" xmlns:xsl="" - xmlns:html=""> - <xsl:output method="text"/> - - <xsl:template match="/"> - <xsl:for-each select="//html:img"> - <xsl:value-of select="@src"/> - <xsl:text> </xsl:text> - </xsl:for-each> - </xsl:template> - -</xsl:stylesheet></programlisting> - </figure> - - <para>Note that the <code>html</code> namespace prefix must be included - in the <acronym - xlink:href="">XPath</acronym> expression in - <code><xsl:for-each select="//html:img"></code>. A simple - <code>select="//img"</code> results in an empty node set. - Executing the <abbrev - xlink:href="">XSL</abbrev> script yields a - list of the image filenames contained in the HTML page i.e. - <code>inline.gif one.gif two.gif</code>.</para> - - <para>Now we want to write a <link - linkend="gloss_Java"><trademark>Java</trademark></link> application - which checks whether these referenced image files exist - and have sufficient permissions to be accessed. A simple approach may - pipe the <abbrev xlink:href="">XSL</abbrev> - output to our application which then executes the readability checks. - Instead we want to incorporate the <acronym - xlink:href="">XPath</acronym> based search - into the application. 
Ignoring namespaces and mirroring the - <abbrev xlink:href="">XSL</abbrev> actions - as closely as possible, our application will have to search for <link - xlink:href="">Element</link> - nodes by the <acronym - xlink:href="">XPath</acronym> expression - <code>//img</code>:</para> - - <figure xml:id="domFindImages"> - <title>Extracting <tag class="emptytag">img</tag> element image - references from an HTML document.</title> - - <programlisting language="none">package dom.xpath; -... -public class DomXpath { - private final SAXBuilder builder = new SAXBuilder(); - - public DomXpath() { - builder.setErrorHandler(new MySaxErrorHandler(System.err)); - } - public void process(final String xhtmlFilename) throws JDOMException, IOException { - - final Document htmlInput = builder.build(xhtmlFilename);<co - linkends="programlisting_java_searchimg_parse_co" - xml:id="programlisting_java_searchimg_parse"/> - final XPathExpression<Object> xpath = XPathFactory.instance().compile( "//img" ); <co - linkends="programlisting_java_searchimg_pf_co" - xml:id="programlisting_java_searchimg_pf"/> <co - linkends="programlisting_java_searchimg_newxpath_co" - xml:id="programlisting_java_searchimg_newxpath"/> - final List<Object> images = xpath.evaluate(htmlInput);<co - linkends="programlisting_java_searchimg_execquery_co" - xml:id="programlisting_java_searchimg_execquery"/> - - for (Object o: images) { <co - linkends="programlisting_java_searchimg_loop_co" - xml:id="programlisting_java_searchimg_loop"/> - final Element image = (Element) o;<co - linkends="programlisting_java_searchimg_cast_co" - xml:id="programlisting_java_searchimg_cast"/> - System.out.print(image.getAttributeValue("src") + " "); - } - } -}</programlisting> - - <caption> - <para>This application searches for <tag - class="emptytag">img</tag> elements and shows their - <code>src</code> attribute value.</para> - </caption> - </figure> - - <calloutlist> - <callout arearefs="programlisting_java_searchimg_parse" - 
xml:id="programlisting_java_searchimg_parse_co"> - <para>Parse a XHTML document instance into a DOM tree.</para> - </callout> - - <callout arearefs="programlisting_java_searchimg_pf" - xml:id="programlisting_java_searchimg_pf_co"> - <para>Create a <acronym - xlink:href="">XPath</acronym> - factory.</para> - </callout> - - <callout arearefs="programlisting_java_searchimg_newxpath" - xml:id="programlisting_java_searchimg_newxpath_co"> - <para>Create a <acronym - xlink:href="">XPath</acronym> query - instance. This may be used to search for a set of nodes starting - from a context node.</para> - </callout> - - <callout arearefs="programlisting_java_searchimg_execquery" - xml:id="programlisting_java_searchimg_execquery_co"> - <para>Using the document's root node as the context node we search - for <tag class="starttag">img</tag> elements appearing at - arbitrary positions in our document.</para> - </callout> - - <callout arearefs="programlisting_java_searchimg_loop" - xml:id="programlisting_java_searchimg_loop_co"> - <para>We iterate over the retrieved list of images.</para> - </callout> - - <callout arearefs="programlisting_java_searchimg_cast" - xml:id="programlisting_java_searchimg_cast_co"> - <para>Casting to the correct type.</para> - </callout> - </calloutlist> - - <para>The result is a list of image filename references:</para> - - <programlisting language="none">inline.gif one.gif </programlisting> - - <qandaset defaultlabel="qanda" xml:id="quandaentry_CastAlwaysLegal"> - <title>Legal casting?</title> - - <qandadiv> - <qandaentry> - <question> - <para>Why is the cast in <coref - linkend="programlisting_java_searchimg_cast"/> in <xref - linkend="domFindImages"/> guaranteed to never cause a - <classname>java.lang.ClassCastException</classname>?</para> - </question> - - <answer> - <para>The <acronym - xlink:href="">XPath</acronym> - <code>//img</code> expression is guaranteed to return only - <tag class="starttag">img</tag> elements. 
Thus within our - <link linkend="gloss_Java"><trademark>Java</trademark></link> - context we are sure to find only - <classname>org.jdom2.Element</classname> instances.</para> - </answer> - </qandaentry> - </qandadiv> - </qandaset> - - <qandaset defaultlabel="qanda" xml:id="exercise_htmlImageVerify"> - <title>Verification of referenced images readability</title> - - <qandadiv> - <qandaentry> - <question> - <para>We want to extend the example given in <xref - linkend="domFindImages"/> by testing the existence and - checking for readability of referenced images. The following - HTML document contains <quote>dead</quote> image - references:</para> - - <programlisting language="none" - xml:id="domCheckImageAccessibility"><!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" - ""> -<html xmlns=""> ... - <body> - <h1>External Pictures</h1> - <p>A local image reference:<img src="inline.gif" alt="none"/></p> - <table> - <tbody> - <tr> - <td>An existing picture:</td> - <td><img - src="" - alt="none"/></td> - </tr> - <tr> - <td>A non-existing picture:</td> - <td><img src="<emphasis role="bold"></emphasis>" alt="none"/></td> - </tr> - </tbody> - </table> - </body> -</html></programlisting> - - <para>Write an application which checks for readability of - <abbrev - xlink:href="">URL</abbrev> - image references to <emphasis>external</emphasis> Servers - starting either with <code>http://</code> or - <code>ftp://</code> ignoring other protocol types. Internal - image references referring to the <quote>current</quote> - server typically look like <code><img - src="/images/test.gif"</code>. So in order to distinguish - these two types of references we may use the XSL built in - function <link - xlink:href="">starts-with()</link> - testing for the <code>http</code> or <code>ftp</code> protocol - definition part of an <abbrev - xlink:href="">URL</abbrev>. 
- A possible output for the given example is:</para> - - <programlisting language="none">Received 'sun.awt.image.URLImageSource' from - -Unable to open ''</programlisting> - - <para>The following code snippet shows a helpful class method - to check for both correctness of <abbrev - xlink:href="">URL</abbrev>s - and accessibility of referenced objects:</para> - - <programlisting language="none">package dom.xpath; -... -public class CheckUrl { - public static void checkReadability(final String urlRef) { - try { - final URL url = new URL(urlRef); - try { - final Object imgCandidate = url.getContent(); - if (null == imgCandidate) { - System.err.println("Unable to open '" + urlRef + "'"); - } else { - System.out.println("Received '" - + imgCandidate.getClass().getName() + "' from " - + urlRef); - } - } catch (IOException e) { - System.err.println("Unable to open '" + urlRef + "'"); - } - } catch (MalformedURLException e) { - System.err.println("Address '" + urlRef + "' is malformed"); - } - } -}</programlisting> - </question> - - <answer> - <para>We are interested in the set of images within a given - HTML document containing a <link - xlink:href="">URL</link> reference - starting either with <code>http://</code> or - <code>ftp://</code>. This is achieved by the following - <acronym - xlink:href="">XPath</acronym> - expression:</para> - - <programlisting language="none">//html:img[starts-with(@src, 'http://') or starts-with(@src, 'ftp://')]</programlisting> - - <para>The application only needs to pass the corresponding - <abbrev - xlink:href="">URL</abbrev>s - to the method <link - xlink:href="domCheckUrlObjectExistence">CheckUrl.checkReadability()</link>. - The rest of the code is identical to the <link - linkend="domFindImages">introductory example</link>:</para> - - <informalfigure xml:id="solutionFintExtImgRef"> - <programlisting language="none">package dom.xpath; -... 
-public class CheckExtImage { - private final SAXBuilder builder = new SAXBuilder(); - - public CheckExtImage() { - builder.setErrorHandler(new MySaxErrorHandler(System.err)); - } - public void process(final String xhtmlFilename) throws JDOMException, IOException { - - final Document htmlInput = builder.build(xhtmlFilename); - final XPathExpression<Object> xpath = XPathFactory.instance().compile( - "<emphasis role="bold">//img[starts-with(@src, 'http://') or starts-with(@src, 'ftp://')]</emphasis>"); - final List<Object> images = xpath.evaluate(htmlInput); - - for (Object o: images) { - final Element image = (Element) o; - <emphasis role="bold">CheckUrl.checkReadability(image.getAttributeValue("src"));</emphasis> - } - } -}</programlisting> - </informalfigure> - </answer> - </qandaentry> - </qandadiv> - </qandaset> - </section> - - <section xml:id="domXsl"> - <title><acronym xlink:href="">DOM</acronym> and - <abbrev xlink:href="">XSL</abbrev></title> - - <para><link linkend="gloss_Java"><trademark>Java</trademark></link> - based <link linkend="gloss_XML"><abbrev>XML</abbrev></link> - applications may use XSL style sheets for processing. A <acronym - xlink:href="">DOM</acronym> tree may for example - be transformed into another tree. The package <link - xlink:href="">javax.xml.transform</link> - provides interfaces and classes for this purpose.
We consider the - following product catalog example:</para> - - <figure xml:id="climbingCatalog"> - <title>A simplified <link - linkend="gloss_XML"><abbrev>XML</abbrev></link> product - catalog</title> - - <programlisting language="none"><catalog xmlns:xsi="" - xsi:noNamespaceSchemaLocation="catalog.xsd"> - <title>Outdoor products</title> - <introduction> - <para>We offer a great variety of basic stuff for mountaineering - such as ropes, harnesses and tents.</para> - <para>Our shop is proud for its large number of available - sleeping bags.</para> - </introduction> - <product id="x-223"> - <title>Multi freezing bag Nightmare camper</title> - <description> - <para>You will feel comfortable till minus 20 degrees - At - least if you are a penguin or a polar bear.</para> - </description> - </product> - <product id="r-334"> - <title>Rope 40m</title> - <description> - <para>Excellent for indoor climbing.</para> - </description> - </product> -</catalog></programlisting> - - <para>A corresponding schema file <filename>catalog.xsd</filename> - is straightforward:</para> - - <programlisting language="none"><xs:schema xmlns:xs="" - xmlns:vc="" elementFormDefault="qualified" - vc:minVersion="1.0" vc:maxVersion="1.1"> - - <xs:simpleType name="money"> - <xs:restriction base="xs:decimal"> - <xs:fractionDigits value="2"/> - </xs:restriction> - </xs:simpleType> - - <xs:element name="title" type="xs:string"/> - <xs:element name="para" type="xs:string"/> - - <xs:element name="description" type="paraSequence"/> - <xs:element name="introduction" type="paraSequence"/> - - <xs:complexType name="paraSequence"> - <xs:sequence> - <xs:element ref="para" minOccurs="1" maxOccurs="unbounded"/> - </xs:sequence> - </xs:complexType> - - <xs:element name="product"> - <xs:complexType> - <xs:sequence> - <xs:element ref="title"/> - <xs:element ref="description"/> - </xs:sequence> - <xs:attribute name="id" type="xs:ID" use="required"/> - <xs:attribute name="price" type="money" use="optional"/> - 
</xs:complexType> - </xs:element> - - <xs:element name="catalog"> - <xs:complexType> - <xs:sequence> - <xs:element ref="title"/> - <xs:element ref="introduction"/> - <xs:element ref="product" minOccurs="1" maxOccurs="unbounded"/> - </xs:sequence> - </xs:complexType> - </xs:element> - -</xs:schema> -</programlisting> - </figure> - - <para>An <abbrev xlink:href="">XSL</abbrev> - style sheet may be used to transform this document into - HTML:</para> - - <figure xml:id="catalog2html"> - <title>An <abbrev - xlink:href="">XSL</abbrev> style sheet - for catalog transformation to HTML.</title> - - <programlisting language="none"><?xml version="1.0" encoding="utf-8"?> -<xsl:stylesheet xmlns:xsl="" - version="2.0" xmlns=""> - - <xsl:template match="/catalog"> - <html> - <head><title><xsl:value-of select="title"/></title></head> - <body style="background-color:#FFFFFF"> - <h1><xsl:value-of select="title"/></h1> - <xsl:apply-templates select="product"/> - </body> - </html> - </xsl:template> - - <xsl:template match="product"> - <h3><xsl:value-of select="title"/></h3> - <xsl:for-each select="description/para"> - <p><xsl:value-of select="."/></p> - </xsl:for-each> - <xsl:if test="@price"> - <p> - <xsl:text>Price:</xsl:text> - <xsl:value-of select="@price"/> - </p> - </xsl:if> - </xsl:template> -</xsl:stylesheet></programlisting> - </figure> - - <para>As a preparation for <xref linkend="exercise_catalogRdbms"/> we - now demonstrate the usage of <abbrev - xlink:href="">XSL</abbrev> within a <link - linkend="gloss_Java"><trademark>Java</trademark></link> application. - This is done by a <link - xlink:href="">Transformer</link> - instance:</para> - - <figure xml:id="xml2xml"> - <title>Transforming an XML document instance to HTML by an XSL style - sheet.</title> - - <programlisting language="none">package dom.xsl; -... 
-public class Xml2Html { - private final SAXBuilder builder = new SAXBuilder(); - - final XSLTransformer transformer; - - public Xml2Html(final String xslFilename) throws XSLTransformException { - builder.setErrorHandler(new MySaxErrorHandler(System.err)); - transformer = new XSLTransformer(xslFilename); - } - public void transform(final String xmlInFilename, - final String resultFilename) throws JDOMException, IOException { - - final Document inDoc = builder.build(xmlInFilename); - final Document result = transformer.transform(inDoc); - - // Set formatting for the XML output - final Format outFormat = Format.getPrettyFormat(); - - // Serialize to console - final XMLOutputter printer = new XMLOutputter(outFormat); - printer.output(result, System.out); - - } -}</programlisting> - </figure> - - <para>A corresponding driver class is needed to invoke a - transformation:</para> - - <figure xml:id="xml2xmlDriver"> - <title>A driver class for the xml2xml transformer.</title> - - <programlisting language="none">package dom.xsl; -... -public class Xml2HtmlDriver { -... - public static void main(String[] args) { - final String - inFilename = "Input/Dom/climbing.xml", - xslFilename = "Input/Dom/catalog2html.xsl", - htmlOutputFilename = "Input/Dom/climbing.html"; - try { - final Xml2Html converter = new Xml2Html(xslFilename); - converter.transform(inFilename, htmlOutputFilename); - } catch (Exception e) { - System.err.println("The conversion of '" + inFilename - + "' by stylesheet '" + xslFilename - + "' to output HTML file '" + htmlOutputFilename - + "' failed with the following error:" + e); - e.printStackTrace(); - } - } -}</programlisting> - </figure> - - <qandaset defaultlabel="qanda" xml:id="exercise_catalogRdbms"> - <title>HTML from XML and relational data</title> - - <qandadiv> - <qandaentry> - <question> - <label>Catalogs and RDBMS</label> - - <para>We want to extend the transformation described - in <xref linkend="xml2xml"/> by reading price - information from an RDBMS.
Consider the following schema and - <code>INSERT</code>s:</para> - - <programlisting language="none">CREATE TABLE Product( - orderNo CHAR(10) - ,price NUMERIC(10,2) -); - -INSERT INTO Product VALUES('x-223', 330.20); -INSERT INTO Product VALUES('w-124', 110.40);</programlisting> - - <para>Adding prices may be implemented in the following - way:</para> - - <mediaobject> - <imageobject> - <imagedata fileref="Ref/Fig/xml2html.fig"/> - </imageobject> - </mediaobject> - - <para>You may implement this by following these steps:</para> - - <orderedlist> - <listitem> - <para>You may reuse class - <classname>sax.rdbms.RdbmsAccess</classname> from <xref - linkend="saxRdbms"/>.</para> - </listitem> - - <listitem> - <para>Use the previous class to modify <xref - linkend="xml2xml"/> by introducing a new method - <code>addPrices(final Document catalog)</code> which adds - prices to the <acronym - xlink:href="">DOM</acronym> tree - accordingly. The insertion points may be reached by an - <acronym - xlink:href="">XPath</acronym> - expression.</para> - </listitem> - </orderedlist> - </question> - - <answer> - <para>The additional functionality on top of <xref - linkend="xml2xml"/> is provided by the method - <methodname>dom.xsl.XmlRdbms2Html.addPrices()</methodname>. - This method modifies the <acronym - xlink:href="">DOM</acronym> input tree - prior to applying the XSL style sheet. Prices are inserted based on - data received from an RDBMS via <trademark - xlink:href="">JDBC</trademark>:</para> - - <programlisting language="none">package dom.xsl; -... -public class XmlRdbms2Html { - private final SAXBuilder builder = new SAXBuilder(); - - DbAccess db = new DbAccess(); - - final XSLTransformer transformer; - Document catalog; - - final org.jdom2.xpath.XPathExpression<Object> selectProducts = - XPathFactory.instance().compile("/catalog/product"); - - /** - * @param xslFilename the stylesheet being used for subsequent - * transformations by {@link #transform(String, String)}. 
- * - * @throws XSLTransformException - */ - public XmlRdbms2Html(final String xslFilename) throws XSLTransformException { - builder.setErrorHandler(new MySaxErrorHandler(System.err)); - transformer = new XSLTransformer(xslFilename); - } - - /** - * The actual workhorse carrying out the transformation - * and adding prices from the database table. - * - * @param xmlInFilename input file to be transformed - * @param resultFilename the result file holding the generated HTML document - * @throws JDOMException The transformation may fail for various reasons. - * @throws IOException - */ - public void transform(final String xmlInFilename, - final String resultFilename) throws JDOMException, IOException { - - catalog = builder.build(xmlInFilename); - - addPrices(); - - final Document htmlResult = transformer.transform(catalog); - - // Set formatting for the XML output - final Format outFormat = Format.getPrettyFormat(); - - // Serialize to console - final XMLOutputter printer = new XMLOutputter(outFormat); - printer.output(htmlResult, System.out); - - } - private void addPrices() { - final List<Object> products = selectProducts.evaluate(catalog.getRootElement()); - - db.connect("jdbc:mysql://localhost:3306/hdm", "hdmuser", "XYZ"); - for (Object p: products) { - final Element product = (Element) p; - final String productId = product.getAttributeValue("id"); - product.setAttribute("price", db.readPrice(productId)); - } - db.close(); - } -}</programlisting> - - <para>The method <code>addPrices()</code> utilizes our - RDBMS access class:</para> - - <programlisting language="none">package dom.xsl; -... 
-public class DbAccess { - public void connect(final String jdbcUrl, - final String userName, final String password) { - try { - conn = DriverManager.getConnection(jdbcUrl, userName, password); - priceQuery = conn.prepareStatement(sqlPriceQuery); - } catch (SQLException e) { - System.err.println("Unable to open connection to database:" + e); - } - } - public String readPrice(final String articleNumber) { - String result; - try { - priceQuery.setString(1, articleNumber); - final ResultSet rs = priceQuery.executeQuery(); - if (rs.next()) { - result = rs.getString("price"); - } else { - result = "No price available for article '" + articleNumber + "'"; - } - } catch (SQLException e) { - result = "Error reading price for article '" + articleNumber + "':" + e; - } - return result; - } - ... -}</programlisting> - - <para>Of course the connection details should be moved to a - configuration file.</para> - </answer> - </qandaentry> - </qandadiv> - </qandaset> - </section> - </section> - </chapter> - - <chapter xml:id="introPersistence"> - <title>Accessing Relational Data</title> - - <section xml:id="persistence"> - <title>Persistence in Object Oriented languages</title> - - <para>Following <xref linkend="bib_Bauer05"/> we may define persistence - by:</para> - - <blockquote> - <para>persistence allows an object to outlive the process that created - it. The state of the object may be stored to disk and an object with - the same state re-created at some point in the future.</para> - </blockquote> - - <para>The notion of <quote>process</quote> refers to operating systems. - Let us start with a simple example assuming a <link - linkend="gloss_Java"><trademark>Java</trademark></link> class - User:</para> - - <programlisting language="none">public class User { - String cname; //The user's common name e.g. 'Joe Bix' - String uid; //The user's unique system ID (login name) e.g. 'bix' - -// getters, setters and other stuff - ... 
-}</programlisting> - - <para>A relational implementation might look like:</para> - - <programlisting language="none">CREATE TABLE User( - cname CHAR(80) - ,uid CHAR(10) PRIMARY KEY -)</programlisting> - - <para>Now a <link - linkend="gloss_Java"><trademark>Java</trademark></link> application may - create instances of class <code>User</code> and save these to a - database:</para> - - <figure xml:id="processObjPersist"> - <title>Persistence across process boundaries</title> - - <mediaobject> - <imageobject> - <imagedata fileref="Ref/Fig/persistence.fig" scale="65"/> - </imageobject> - </mediaobject> - </figure> - - <para>Both the <trademark - xlink:href="">JRE</trademark> - instances and the RDBMS database server are processes (or sets of - processes) typically existing in different address spaces. The two - <trademark - xlink:href="">JRE</trademark> - processes mentioned here may as well be started in disjoint address - spaces. In fact we might even run two entirely different applications - implemented in different programming languages like <abbrev - xlink:href="">PHP</abbrev>.</para> - - <para>It is important to mention that the two arrows - <quote>save</quote> and <quote>load</quote> thus typically denote a - communication across machine boundaries.</para> - </section> - - <section xml:id="jdbcIntro"> - <title>Introduction to <trademark - xlink:href="">JDBC</trademark></title> - - <section xml:id="jdbcWrite"> - <title>Write access, principles</title> - - <para>Connecting an application to a database means establishing a - connection from a client to a database server:</para> - - <figure xml:id="jdbcClientServer"> - <title>Networking between clients and database servers</title> - - <mediaobject> - <imageobject> - <imagedata fileref="Ref/Fig/clientserv.fig"/> - </imageobject> - </mediaobject> - </figure> - - <para>So <trademark - xlink:href="">JDBC</trademark> - is just one among many protocol implementations connecting - database servers and
applications. Consequently <trademark - xlink:href="">JDBC</trademark> - is expected to appear in the lower layer of multi-tier applications. - We take a three-tier application as a starting point:</para> - - <figure xml:id="jdbcThreeTier"> - <title>The role of <trademark - xlink:href="">JDBC</trademark> - in a three-tier application</title> - - <mediaobject> - <imageobject> - <imagedata fileref="Ref/Fig/jdbcThreeTier.fig"/> - </imageobject> - </mediaobject> - </figure> - - <para>We may add an additional layer. Web applications are typically - built on top of an application server (<productname - xlink:href="">WebSphere</productname>, - <productname - xlink:href="">Glassfish</productname>, - <productname - xlink:href="">Jboss</productname>, ...) - providing additional services:</para> - - <figure xml:id="jdbcFourTier"> - <title><trademark - xlink:href="">JDBC</trademark> - connecting application server and database.</title> - - <mediaobject> - <imageobject> - <imagedata fileref="Ref/Fig/jdbcFourTier.fig"/> - </imageobject> - </mediaobject> - </figure> - - <para>So what is actually required to connect to a database server? A - client requires the following parameter values to open a - connection:</para> - - <orderedlist> - <listitem xml:id="ItemJdbcProtocol"> - <para>The type of database server e.g. <productname - xlink:href="">Oracle</productname>, - <productname - xlink:href="">DB2</productname>, - <productname - xlink:href="">Informix</productname>, - <productname xlink:href="">Mysql</productname> - etc. This information is needed because of vendor dependent - <trademark - xlink:href="">JDBC</trademark> - protocol implementations.</para> - </listitem> - - <listitem> - <para>The server's <link - xlink:href="">DNS</link> - name or IP address</para> - </listitem> - - <listitem> - <para>The database service's port number at the previously defined - host.
The database server process listens for connections on this - port number.</para> - </listitem> - - <listitem xml:id="itemJdbcDatabaseName"> - <para>The database name within the given database server</para> - </listitem> - - <listitem> - <para>Optional: A database user's account name and - password.</para> - </listitem> - </orderedlist> - - <para>Items <xref linkend="ItemJdbcProtocol"/> - <xref - linkend="itemJdbcDatabaseName"/> will be encapsulated in a so-called - <trademark - xlink:href="">JDBC</trademark> - <link - xlink:href="">URL</link>. - We consider a typical example corresponding to the previous parameter - list:</para> - - <figure xml:id="jdbcUrlComponents"> - <title>Components of a <trademark - xlink:href="">JDBC</trademark> - URL</title> - - <mediaobject> - <imageobject> - <imagedata fileref="Ref/Fig/jdbcurl.fig" scale="65"/> - </imageobject> - </mediaobject> - </figure> - - <para>In fact this <trademark - xlink:href="">JDBC</trademark> - URL example closely resembles other types of URL strings as - defined in <uri - xlink:href=""></uri>. - Look for <code>opaque_part</code> to understand the second - <quote>:</quote> in the protocol definition part of a <trademark - xlink:href="">JDBC</trademark> - URL. Common examples of <abbrev - xlink:href="">URL</abbrev>s - are:</para> - - <itemizedlist> - <listitem> - <para><code></code></para> - </listitem> - - <listitem> - <para><code></code></para> - </listitem> - - <listitem> - <para><code></code></para> - </listitem> - </itemizedlist> - - <para>We notice the explicit mention of port number 8080 in the - second example; the default <abbrev - xlink:href="">http</abbrev> protocol port - number is 80. So if a web server accepts connections at port 80 we do - not have to specify this value.
A web browser will automatically use - this default port.</para> - - <para>Actually the notation <quote><code>jdbc:mysql</code></quote> - denotes a sub-protocol implementation, namely <orgname>Mysql</orgname>'s - implementation of <trademark - xlink:href="">JDBC</trademark>. - Connecting to an IBM DB2 server would require <code>jdbc:db2</code> for this - protocol part.</para> - - <para>In contrast to <abbrev - xlink:href="">http</abbrev> no standard - ports are <quote>officially</quote> assigned for <trademark - xlink:href="">JDBC</trademark> - protocol variants. Since the implementations are vendor specific such an - assignment would not make sense. Thus we <emphasis role="bold">always</emphasis> - have to specify the port number when opening <trademark - xlink:href="">JDBC</trademark> - connections.</para> - - <para>Writing <trademark - xlink:href="">JDBC</trademark> - based applications follows a simple scheme:</para> - - <figure xml:id="jdbcArchitecture"> - <title>Architecture of JDBC</title> - - <mediaobject> - <imageobject> - <imagedata fileref="Ref/Fig/jdbcarch.fig"/> - </imageobject> - </mediaobject> - </figure> - - <para>From a programmer's point of view the - <classname>java.sql.DriverManager</classname> is a bootstrapping - object: Other objects like Statement instances are created from this - central and unique object.</para> - - <para>The first instance being created by the - <classname>java.sql.DriverManager</classname> is an object of type - <classname>java.sql.Connection</classname>. In <xref - linkend="exerciseJdbcWhyInterface"/> we discuss the way vendor - specific implementation details are hidden by interfaces. We can - distinguish between:</para> - - <orderedlist> - <listitem> - <para>Vendor neutral parts of a <trademark - xlink:href="">JDBC</trademark> - environment. These are the components shipped by Oracle or - other organizations providing <link - linkend="gloss_Java"><trademark>Java</trademark></link> runtimes.
- The class <classname>java.sql.DriverManager</classname> belongs to - this domain.</para> - </listitem> - - <listitem> - <para>Vendor specific parts. In <xref linkend="jdbcArchitecture"/> - this starts with the <classname>java.sql.Connection</classname> - object.</para> - </listitem> - </orderedlist> - - <para>The <classname>java.sql.Connection</classname> object thus marks - the boundary between a <trademark - xlink:href="">JDK</trademark> - / <trademark - xlink:href="">JRE</trademark> - and a <trademark - xlink:href="">JDBC</trademark> - Driver implementation from e.g. Oracle or other institutions.</para> - - <para><xref linkend="jdbcArchitecture"/> does not show details about - the relations between <classname>java.sql.Connection</classname>, - <classname>java.sql.Statement</classname> and - <classname>java.sql.ResultSet</classname> objects. We start by giving - a rough description of the tasks and responsibilities these three - types have:</para> - - <glosslist> - <glossentry> - <glossterm><classname>java.sql.Connection</classname></glossterm> - - <glossdef> - <para>Holding a permanent connection to a database server. Both - client and server can contact each other. The database server - may for example terminate a transaction if problems like - deadlocks occur.</para> - </glossdef> - </glossentry> - - <glossentry> - <glossterm><classname>java.sql.Statement</classname></glossterm> - - <glossdef> - <para>We have two distinct classes of actions:</para> - - <orderedlist> - <listitem> - <para>Instructions to modify data on the database server. - These include <code>INSERT</code>, <code>UPDATE</code> and - <code>DELETE</code> operations as far as - <abbrev>SQL-DML</abbrev> is concerned. <trademark - xlink:href="">JDBC</trademark> - acts as a means of transport and merely returns integer - values back to the client like the number of rows being - affected by an UPDATE.</para> - </listitem> - - <listitem> - <para>Instructions reading data from the server. 
This is - done by sending SELECT statements. It is not sufficient to - just return integer values: Instead <trademark - xlink:href="">JDBC</trademark> - needs to copy complete datasets back to the client to fill - containers being accessible by applications. This is being - discussed in <xref linkend="jdbcRead"/>.</para> - </listitem> - </orderedlist> - </glossdef> - </glossentry> - </glosslist> - - <para>We shed some light on the relationship between these important - <trademark - xlink:href="">JDBC</trademark> - components and their respective creation:<figure - xml:id="jdbcObjectCreation"> - <title>Important <trademark - xlink:href="">JDBC</trademark> - instances and relationships.</title> - - <mediaobject> - <imageobject> - <imagedata fileref="Ref/Fig/jdbcObjectRelation.fig"/> - </imageobject> - </mediaobject> - </figure></para> - </section> - - <section xml:id="writeAccessCoding"> - <title>Write access, coding!</title> - - <para>So how does it actually work with respect to coding? You may - want to read <xref linkend="toolingConfigJdbc"/> before starting your - exercises. We first prepare a database table using Eclipse's database - tools:</para> - - <figure xml:id="figSchemaPerson"> - <title>A relation <code>Person</code> containing names and email - addresses</title> - - <programlisting language="none"><emphasis role="strong">CREATE</emphasis> <emphasis - role="strong">TABLE</emphasis> Person ( - name CHAR(20) - ,email CHAR(20) <emphasis>UNIQUE</emphasis>)</programlisting> - </figure> - - <para>Our actual (toy) <trademark - xlink:href="">JDBC</trademark> - application will insert a single object ('Jim', '') into - the <code>Person</code> relation. 
This is simpler than reading data - since no client <classname>java.sql.ResultSet</classname> container is - needed:</para> - - <figure xml:id="figJdbcSimpleWrite"> - <title>A simple <trademark - xlink:href="">JDBC</trademark> - application inserting data into a relational table.</title> - - <programlisting language="none">package sda.jdbc.intro.v1; - -import java.sql.Connection; -import java.sql.DriverManager; -import java.sql.SQLException; -import java.sql.Statement; - -public class SimpleInsert { - - public static void main(String[] args) throws SQLException { - // Step 1: Open a connection to the database server - final Connection conn = DriverManager.getConnection( - "jdbc:mysql://localhost:3306/hdm", "hdmuser", "XYZ"); - // Step 2: Create a Statement instance - final Statement stmt = conn.createStatement(); - // Step 3: Execute the desired INSERT - final int updateCount = stmt.executeUpdate( - "INSERT INTO Person VALUES('Jim', '')"); - // Step 4: Give feedback to the enduser - System.out.println("Successfully inserted " + updateCount + " dataset(s)"); - } -}</programlisting> - </figure> - - <para>Looks simple? Unfortunately it does not (yet) work:</para> - - <programlisting language="none">Exception in thread "main" java.sql.SQLException: <emphasis - role="bold"> - No suitable driver found for jdbc:mysql://localhost:3306/hdm</emphasis> - at java.sql.DriverManager.getConnection( - at java.sql.DriverManager.getConnection( - at sda.jdbc.intro.SimpleInsert.main(</programlisting> - - <para>What's wrong here? In <xref linkend="figureConfigJdbcDriver"/> - we needed a <productname - xlink:href="">Mysql</productname> <trademark - xlink:href="">JDBC</trademark> - Driver implementation <filename>mysql-connector-java.jar</filename> as - a prerequisite to open connections to a database server. This - implementation is mandatory for our toy application as well. All we
All we - have to do is add <filename>mysql-connector-java.jar</filename> to - our <link linkend="gloss_Java"><trademark>Java</trademark></link> - <varname>CLASSPATH</varname> at <emphasis - role="bold">runtime</emphasis>.</para> - - <para>Depending on our <link - linkend="gloss_Java"><trademark>Java</trademark></link> environment - this is achieved by different means. Eclipse requires the - definition of a run configuration as described in <uri - xlink:href=""></uri>. - When creating a run configuration for - <classname>sda.jdbc.intro.v1.SimpleInsert</classname> we have to add - <filename>mysql-connector-java.jar</filename> to the - <varname>Classpath</varname> tab. The following screenshot shows a - working configuration:</para> - - <figure xml:id="figureConfigRunExtJar"> - <title>Creating an Eclipse run time configuration containing a - <productname xlink:href="">Mysql</productname> - <trademark - xlink:href="">JDBC</trademark> - Driver Jar marked red.</title> - - <screenshot> - <mediaobject> - <imageobject> - <imagedata fileref="Ref/Screen/runConfigJarAnnot.screen.png" - scale="70"/> - </imageobject> - </mediaobject> - </screenshot> - </figure> - - <para>This time execution works as expected:</para> - - <programlisting language="none">Successfully inserted 1 dataset(s)</programlisting> - - <qandaset defaultlabel="qanda" xml:id="quandaentry_DupInsert"> - <title>Exception on inserting objects</title> - - <qandadiv> - <qandaentry> - <question> - <para>A second invocation of - <classname>sda.jdbc.intro.v1.SimpleInsert</classname> yields - the following runtime error:</para> - - <programlisting language="none">Exception in thread "main" - com.mysql.jdbc.exceptions.jdbc4.MySQLIntegrityConstraintViolationException: - <emphasis role="bold">Duplicate entry '' for key 'email'</emphasis> -... 
- at com.mysql.jdbc.StatementImpl.executeUpdate( - at sda.jdbc.intro.SimpleInsert.main(</programlisting> - </question> - - <answer> - <para>This expected error is easy to understand: The - exception's message text <emphasis role="bold">Duplicate entry - '' for key 'email'</emphasis> informs us about a UNIQUE - key constraint violation with respect to the attribute - <code>email</code> in our schema definition in <xref - linkend="figSchemaPerson"/>. We cannot add a second entry with - the same value <code>''</code>.</para> - </answer> - </qandaentry> - </qandadiv> - </qandaset> - - <para>It is worth mentioning that the <productname - xlink:href="">Mysql</productname> driver - implementation does not have to be available at compile time. - <trademark - xlink:href="">JDBC</trademark> - defines interfaces rather than (concrete) classes. The latter are - only required at runtime.</para> - - <para>When working with Eclipse we need a separate run - configuration for each runnable <link - linkend="gloss_Java"><trademark>Java</trademark></link> application to - add the <trademark - xlink:href="">JDBC</trademark> - driver implementation to the runtime <envar>CLASSPATH</envar>. This - may become tedious. Weighing the pros and cons you may simply add - <filename>mysql-connector-java.jar</filename> to your compile time - <envar>CLASSPATH</envar> as well. As a drawback all <trademark - xlink:href="">JDBC</trademark> - implementing classes will now become visible when e.g. 
hitting - auto-completion.</para> - - <para>We now discuss some important methods being defined in the - <trademark - xlink:href="">JDBC</trademark> - interfaces:</para> - - <glosslist> - <glossentry> - <glossterm><classname>java.sql.Connection</classname></glossterm> - - <glossdef> - <itemizedlist> - <listitem> - <para><link - xlink:href="">createStatement()</link></para> - </listitem> - - <listitem> - <para><link - xlink:href="">setAutoCommit()</link>, - <link - xlink:href="">getAutoCommit()</link></para> - </listitem> - - <listitem> - <para><link - xlink:href="">getWarnings()</link></para> - </listitem> - - <listitem> - <para><link - xlink:href="">isClosed()</link>, - <link - xlink:href="">isValid(int - timeout)</link></para> - </listitem> - - <listitem> - <para><link - xlink:href="">rollback()</link>, - <link - xlink:href="">commit()</link> - and .</para> - </listitem> - - <listitem> - <para><link - xlink:href="">close()</link></para> - </listitem> - </itemizedlist> - </glossdef> - </glossentry> - - <glossentry> - <glossterm><classname>java.sql.Statement</classname></glossterm> - - <glossdef> - <itemizedlist> - <listitem> - <para><link - xlink:href="">executeUpdate(String - sql)</link></para> - </listitem> - - <listitem> - <para><link - xlink:href="">getConnection()</link></para> - </listitem> - - <listitem> - <para><link - xlink:href="">getResultSet()</link></para> - </listitem> - - <listitem> - <para><link - xlink:href="">close()</link> - and <link - xlink:href="">isClosed()</link></para> - </listitem> - </itemizedlist> - </glossdef> - </glossentry> - </glosslist> - - <qandaset defaultlabel="qanda" xml:id="quandaentry_AutoCommit"> - <title><trademark - xlink:href="">JDBC</trademark> - and transactions</title> - - <qandadiv> - <qandaentry> - <question> - <para><link - xlink:href="">How - does the method setAutoCommit()</link> relate to <link - xlink:href="">commit()</link> - and <link - xlink:href="">rollback()</link>?</para> - </question> - - <answer> - 
<para>A connection's default state is <code>autocommit == - true</code>. This means that individual SQL statements are - executed as separate transactions.</para> - - <para>If we want to group two or more statements into a - transaction we have to:</para> - - <orderedlist> - <listitem> - <para>Call - <code>connection.setAutoCommit(false)</code></para> - </listitem> - - <listitem> - <para>From now on subsequent SQL statements will - implicitly become part of a transaction until one of the - following three events occurs:</para> - - <orderedlist numeration="loweralpha"> - <listitem> - <para><code>connection.commit()</code></para> - </listitem> - - <listitem> - <para><code>connection.rollback()</code></para> - </listitem> - - <listitem> - <para>The transaction gets aborted by the database - server. This may for example happen in case of a - deadlock conflict with a second transaction.</para> - </listitem> - </orderedlist> - - <para>Note that the first two events are initiated by our - client software. The third possible action is - carried out by the database server.</para> - </listitem> - </orderedlist> - </answer> - </qandaentry> - </qandadiv> - </qandaset> - - <qandaset defaultlabel="qanda" xml:id="quandaentry_Close"> - <title>Closing <trademark - xlink:href="">JDBC</trademark> - connections</title> - - <qandadiv> - <qandaentry> - <question> - <para>Why is it very important to call the close() method for - <classname>java.sql.Connection</classname> and / or - <classname>java.sql.Statement</classname> instances?</para> - </question> - - <answer> - <para>A <trademark - xlink:href="">JDBC</trademark> - connection ties up network resources (socket connections). These - may be used up if e.g. 
new connections get established within - a loop without being closed.</para> - - <para>The situation is comparable to memory leaks when using - programming languages lacking a garbage collector.</para> - </answer> - </qandaentry> - </qandadiv> - </qandaset> - - <qandaset defaultlabel="qanda" xml:id="quandaentry_AbortTran"> - <title>Aborted transactions</title> - - <qandadiv> - <qandaentry> - <question> - <para>In the previous exercise we mentioned the possibility of - a transaction abort issued by the database server. Which - responsibility arises for an application programmer? Hint: How - may an implementation become aware of such an abort - transaction event?</para> - </question> - - <answer> - <para>If a database server aborts a transaction a - <classname>java.sql.SQLException</classname> will be thrown. - An application must be aware of this possibility and thus - implement a sensible <code>catch(...)</code> clause - accordingly.</para> - </answer> - </qandaentry> - </qandadiv> - </qandaset> - - <qandaset defaultlabel="qanda" xml:id="exerciseJdbcWhyInterface"> - <title>Interfaces and classes in <trademark - xlink:href="">JDBC</trademark></title> - - <qandadiv> - <qandaentry> - <question> - <para>The <trademark - xlink:href="">JDBC</trademark> - standard mostly defines interfaces as - <classname>java.sql.Connection</classname> and - <classname>java.sql.Statement</classname>. Why are these not - being defined as classes? Moreover why is - <classname>java.sql.DriverManager</classname> being defined as - a class rather than an interface?</para> - - <para>You may want to supply code examples to explain your - argumentation.</para> - </question> - - <answer> - <para>Figure <xref linkend="jdbcArchitecture"/> tells us about - the vendor independent architecture of <trademark - xlink:href="">JDBC</trademark>. 
- Oracle for example may implement a class - <code></code>:</para> - - <programlisting annotations="nojavadoc" language="java">package; - -import java.sql.Connection; -import java.sql.Statement; -import java.sql.SQLException; - -public class OracleConnection implements Connection { - -... - -public Statement createStatement(int resultSetType, - int resultSetConcurrency) - throws SQLException { - // Implementation omitted here due to - // limited personal hacking capabilities - ... -} -... -}</programlisting> - - <para>If a programmer only uses the <trademark - xlink:href="">JDBC</trademark> - interfaces rather than a vendor's classes it is much easier to - make the resulting application work with different databases - from other vendors. This way a company's implementation is not - exposed to our own <link - linkend="gloss_Java"><trademark>Java</trademark></link> - code.</para> - - <para>Regarding the special role of - <classname>java.sql.DriverManager</classname> we notice the - need for a starting point: We have to create an initial - instance of some class. In theory (<emphasis role="bold">BUT - NOT IN PRACTICE!!!</emphasis>) the following (ugly code) might - be possible:</para> - - <programlisting language="none">package my.personal.application; - -import java.sql.Connection; -import java.sql.Statement; -import java.sql.SQLException; - -public class someClass { - - public void someMethod(){ - - Connection conn = <emphasis role="bold">new OracleConnection()</emphasis>; // bad idea! - ... - } - ... 
-}</programlisting> - - <para>The problem with this approach is the explicit - constructor call: Whenever we want to use another database we - have two possibilities:</para> - - <itemizedlist> - <listitem> - <para>Rewrite our code.</para> - </listitem> - - <listitem> - <para>Introduce some sort of switch statement to provide a - fixed number of databases beforehand:</para> - - <programlisting language="none">public void someMethod(final String vendor){ - - final Connection conn; - - switch(vendor) { - case "ORACLE": - conn = new OracleConnection(); - break; - - case "DB2": - conn = new Db2Connection(); - break; - - default: - conn = null; - break; - } - ... -}</programlisting> - - <para>Adding a new database still requires code - rewriting.</para> - </listitem> - </itemizedlist> - </answer> - </qandaentry> - </qandadiv> - </qandaset> - - <qandaset defaultlabel="qanda" xml:id="quandaentry_DriverDispatch"> - <title>Driver dispatch mechanism</title> - - <qandadiv> - <qandaentry> - <question> - <para>In exercise <xref linkend="exerciseJdbcWhyInterface"/> - we saw a hypothetic way to resolve the interface/class - resolution problem by using a switch clause. How is this - <code>switch</code> clause's logic actually realized in a - <trademark - xlink:href="">JDBC</trademark> - based application? (<quote>behind the scenes</quote>)</para> - - <para>Hint: Read the documentation of - <classname>java.sql.DriverManager</classname>.</para> - </question> - - <answer> - <para>Prior to opening a Connection a <trademark - xlink:href="">JDBC</trademark> - driver registers itself at the - <classname>java.sql.DriverManager</classname> singleton - instance. For this purpose the standard defined the method - <link - xlink:href="">registerDriver(Driver)</link>. 
- On success the <classname>java.sql.DriverManager</classname> - adds the driver to an internal dictionary:</para> - - <informaltable border="1"> - <col width="20%"/> - - <col width="30%"/> - - <tr> - <th>protocol</th> - - <th>driver instance</th> - </tr> - - <tr> - <td>jdbc:mysql</td> - - <td>mysqlDriver instance</td> - </tr> - - <tr> - <td>jdbc:oracle</td> - - <td>oracleDriver instance</td> - </tr> - - <tr> - <td>...</td> - - <td>...</td> - </tr> - </informaltable> - - <para>So whenever the method <link - xlink:href=",%20java.lang.String,%20java.lang.String)">getConnection()</link> - is being called the - <classname>java.sql.DriverManager</classname> will scan the - <trademark - xlink:href="">JDBC</trademark> - URL and isolate the protocol part. If we start with - <code>jdbc:mysql://</code> - this is just <code>jdbc:mysql</code>. The value is then being - looked up in the above table of registered drivers to choose - an appropriate instance or null otherwise. This way our - hypothetic switch including the default value null is actually - implemented.</para> - </answer> - </qandaentry> - </qandadiv> - </qandaset> - </section> - - <section xml:id="propertiesFile"> - <title>Connection properties</title> - - <para>So far our application depicted in <xref - linkend="figJdbcSimpleWrite"/> suffers both from missing error - handling and hard-coded parameters.</para> - - <para>Professional applications must be configurable. Changing the - password currently requires source code modification and - recompilation. 
<link - linkend="gloss_Java"><trademark>Java</trademark></link> offers a - standard procedure to externalize parameters like - <varname>username</varname>, <varname>password</varname> and the <trademark - xlink:href="">JDBC</trademark> - connection URL as present in <xref - linkend="figJdbcSimpleWrite"/>: We may move these parameters to - so-called properties files:</para> - - <figure xml:id="propertyExternalization"> - <title>Externalize a single string <code>"User name"</code> to a - separate file <filename></filename>.</title> - - <mediaobject> - <imageobject> - <imagedata fileref="Ref/Fig/externalize.fig"/> - </imageobject> - </mediaobject> - </figure> - - <para>The current figure shows the externalization of just a single - property. The file <filename></filename> contains - key-value pairs. The key <code>PropHello.uname</code> maps to the - value <code>User name</code>. Multiple strings may be externalized to - the same properties file.</para> - - <para>Eclipse offers tool support for externalization. Simply choose - Source --> Externalize Strings from the context menu. This - activates a wizard to define property keys, rename the generated - helper class and finally create the actual - <filename></filename> file.</para> - - <qandaset defaultlabel="qanda" xml:id="quandaentry_WritProps"> - <title>Moving <trademark - xlink:href="">JDBC</trademark> - <abbrev - xlink:href="">URL</abbrev> and - credentials to a property file</title> - - <qandadiv> - <qandaentry> - <question> - <para>Start by executing the code given in <xref - linkend="figJdbcSimpleWrite"/>. 
Then extend this example by - externalizing all <trademark - xlink:href="">JDBC</trademark> - related connection parameters to a - <filename></filename> file like:</para> - - <programlisting language="none">SimpleInsert.jdbcUrl=jdbc:mysql://localhost:3306/hdm -SimpleInsert.password=XYZ -SimpleInsert.username=hdmuser</programlisting> - - <para>As stated earlier the Eclipse wizard assists you - by generating both the properties file and a helper class - reading that file at runtime.</para> - </question> - - <answer> - <para>The current exercise is mostly related to tooling. From - our <link - linkend="gloss_Java"><trademark>Java</trademark></link> code - the context menu allows us to choose the desired - wizard:</para> - - <informalfigure> - <mediaobject> - <imageobject> - <imagedata fileref="Ref/Screen/externalize.screen.png"/> - </imageobject> - </mediaobject> - </informalfigure> - - <para>We may now:</para> - - <itemizedlist> - <listitem> - <para>Select the strings to be externalized.</para> - </listitem> - - <listitem> - <para>Supply key names. In the subsequent screenshot this - task has already been started by manually replacing the - default <code>SimpleInsert.1</code> by - <code>SimpleInsert.jdbc</code>.</para> - </listitem> - - <listitem> - <para>Redefine other parameters like prefix, properties - file name etc. In the following screenshot only the first - of three keys has been manually renamed to the sensible - value <varname>SimpleInsert.jdbc</varname>.</para> - </listitem> - </itemizedlist> - - <informalfigure> - <mediaobject> - <imageobject> - <imagedata fileref="Ref/Screen/externalize2.screen.png"/> - </imageobject> - </mediaobject> - </informalfigure> - - <para>The wizard also generates a class - <classname>sda.jdbc.intro.v1.DbProps</classname> to actually - access our properties:</para> - - <programlisting language="none">package sda.jdbc.intro.v1; -... 
-public class DbProps { - private static final String BUNDLE_NAME = "sda.jdbc.intro.v1.database"; - - private static final ResourceBundle RESOURCE_BUNDLE = ResourceBundle - .getBundle(BUNDLE_NAME); - - private DbProps() { - } - - public static String getString(String key) { - try { - return RESOURCE_BUNDLE.getString(key); - } catch (MissingResourceException e) { - return '!' + key + '!'; - } - } -}</programlisting> - - <para>Our <trademark - xlink:href="">JDBC</trademark> - related code now contains three references to external - properties:</para> - - <programlisting language="none">package sda.jdbc.intro.v1; -... -public class SimpleInsert { - - - public static void main(String[] args) throws SQLException { - // Step 1: Open a connection to the database server - final Connection conn = DriverManager.getConnection ( - <emphasis role="bold">DbProps.getString("PersistenceHandler.jdbcUrl"), </emphasis> - <emphasis role="bold">DbProps.getString("PersistenceHandler.username")</emphasis>, - <emphasis role="bold">DbProps.getString("PersistenceHandler.password")</emphasis>); - // Step 2: Create a Statement instance - final Statement stmt = conn.createStatement(); - // Step 3: Execute the desired INSERT - final int updateCount = stmt.executeUpdate( - "INSERT INTO Person VALUES('Jim', '')"); - // Step 4: Give feedback to the enduser - System.out.println("Successfully inserted " + updateCount + " dataset(s)"); - } -}</programlisting> - - <para>The current base name - <classname>sda.jdbc.intro.v1.PersistenceHandler</classname> is - related to a later exercise.</para> - </answer> - </qandaentry> - </qandadiv> - </qandaset> - </section> - - <section xml:id="xmldata2rdbms"> - <title>Moving data from XML to relational systems</title> - - <qandaset defaultlabel="qanda" xml:id="qandaXmldata2relational"> - <title>Avoiding intermediate <xref linkend="glo_SQL"/> file - export</title> - - <qandadiv> - <qandaentry> - <question> - <para>In <xref linkend="quandaentry_SqlFromXml"/> you - 
implemented a <xref linkend="glo_SAX"/> application - transforming XML product catalog instances into a series of - SQL statements. Modify your solution by directly inserting - corresponding data by means of <xref linkend="glo_JDBC"/> into - a relational database.</para> - - <para>Error handling may be implemented by simply issuing a - corresponding message before exiting the application. In order - to ensure data integrity the data transfer shall be realized - in an all-or-nothing fashion by grouping all - <code>INSERT</code>s into a single transaction. You may want - to read about <link - xlink:href="">setAutoCommit(boolean - autoCommit)</link> and <link - xlink:href="">commit()</link> - for this purpose.</para> - </question> - - <answer> - <annotation role="make"> - <para role="eclipse">P/catalog2rdbms</para> - </annotation> - - <para>This solution requires a <command>mvn</command> - <option>install</option> on the dependent project - <quote>saxerrorhandler</quote>:</para> - - <annotation role="make"> - <para role="eclipse">P/saxerrorhandler</para> - </annotation> - </answer> - </qandaentry> - </qandadiv> - </qandaset> - </section> - - <section xml:id="sectSimpleInsertGui"> - <title>A first GUI sketch</title> - - <para>So far all data records transferred to the database server - are still hard-coded in our application. In practice a user wants to - enter data of persons to be submitted to the database.</para> - - <para>We now guide you to develop a first version of a simple GUI for - this task. A more <link linkend="figureDataInsert2">elaborate - version</link> will be presented in a follow-up exercise. 
The - screenshot illustrates the intended application behaviour:</para> - - <figure xml:id="simpleInsertGui"> - <title>A simple GUI to insert data into a database server.</title> - - <mediaobject> - <imageobject> - <imagedata fileref="Ref/Screen/simpleInsertGui.screen.png"/> - </imageobject> - - <caption> - <para>After clicking <quote>Insert</quote> a message is - presented to the user. This message may also indicate a - failure.</para> - </caption> - </mediaobject> - </figure> - - <para>Implementing Swing GUI applications requires knowledge as - taught e.g. in <link - xlink:href="">113300 - Entwicklung von Web-Anwendungen</link>. If you do not (yet) feel - comfortable writing <productname - xlink:href="">Swing</productname> - applications you may want to read <uri - xlink:href=""></uri> - and <emphasis role="bold">really</emphasis> understand the examples - presented therein.</para> - - <qandaset defaultlabel="qanda" xml:id="quandaentry_GuiDb"> - <title>GUI for inserting Person data to a database server</title> - - <qandadiv> - <qandaentry> - <question> - <para>Write a GUI application as outlined in <xref - linkend="simpleInsertGui"/>. You may proceed as - follows:</para> - - <orderedlist> - <listitem> - <para>Write a dummy GUI without any database - functionality. Only present the two labels and input fields - and the Insert button.</para> - </listitem> - - <listitem> - <para>Add a - <classname>java.awt.event.ActionListener</classname> which - generates an SQL INSERT statement when the Insert - button is clicked. Present this string to the user as shown in - the message window of <xref - linkend="simpleInsertGui"/>.</para> - - <para>At this point you still do not need a database - connection. 
The message shown to the user is just a fake, - so the GUI <emphasis role="bold">appears</emphasis> to be - working.</para> - </listitem> - - <listitem> - <para>Establish a - <classname>java.sql.Connection</classname> and create a - <classname>java.sql.Statement</classname> instance when - launching your application. Use the latter in your - <classname>java.awt.event.ActionListener</classname> to - actually insert datasets into your database.</para> - </listitem> - </orderedlist> - </question> - - <answer> - <para>The complete implementation resides in - <classname>sda.jdbc.intro.v01.InsertPerson</classname>:</para> - - <programlisting language="none">package sda.jdbc.intro.v01; - -import ... - -public class InsertPerson extends JFrame { - - ... - - public InsertPerson () throws SQLException{ - super ("Add a person's data"); - - setDefaultCloseOperation(JFrame.EXIT_ON_CLOSE); - - final JPanel databaseFieldPanel = new JPanel(); - databaseFieldPanel.setLayout(new GridLayout(0,2)); - add(databaseFieldPanel, BorderLayout.CENTER); - - databaseFieldPanel.add(new JLabel("Name:")); - final JTextField nameField = new JTextField(15); - databaseFieldPanel.add(nameField); - - databaseFieldPanel.add(new JLabel("E-mail:")); - final JTextField emailField = new JTextField(15); - databaseFieldPanel.add(emailField); - - final JButton insertButton = new JButton("Insert"); - add(insertButton, BorderLayout.SOUTH); - - final Connection conn = DriverManager.getConnection( - "jdbc:mysql://localhost:3306/hdm", "hdmuser", "XYZ"); - final Statement stmt = conn.createStatement(); - - insertButton.addActionListener(new ActionListener() { - // Linking the GUI to the database server. 
We assume an open - // connection and a correctly initialized Statement instance - @Override - public void actionPerformed(ActionEvent event) { - final String sql = "INSERT INTO Person VALUES('" + nameField.getText()+ "', '" - + emailField.getText() + "')"; - // We have to catch this Exception because an ActionListener's signature - // prohibits the existence of a "throws" clause. - try { - final int updateCount = stmt.executeUpdate(sql); - JOptionPane.showMessageDialog(null, "Successfully executed \n'" + sql + "'\nand inserted " - + updateCount + " dataset"); - } catch (SQLException e) { - e.printStackTrace(); - } - } - }); - pack(); - } -}</programlisting> - </answer> - </qandaentry> - </qandadiv> - </qandaset> - </section> - - <section xml:id="jdbcExceptions"> - <title>Handling possible exceptions</title> - - <para>Our current code lacks any kind of error handling: Exceptions - will not be caught at all and invariably lead to program termination. - This is of course inadequate regarding professional software. In case - of problems we have to:</para> - - <itemizedlist> - <listitem> - <para>Gracefully recover or shut down our application. We may for - example show a pop up window <quote>Terminating due to an internal - error</quote>.</para> - </listitem> - - <listitem> - <para>Enable the customer to supply the development team with - helpful information. The user may for example be asked to submit a - log file in case of errors.</para> - </listitem> - </itemizedlist> - - <para>In addition the solution - <classname>sda.jdbc.intro.v01.InsertPerson</classname> contains an - ugly mix of GUI components and database related code. 
We take a first - step to decouple these two distinct concerns:</para> - - <qandaset defaultlabel="qanda" xml:id="quandaentry_DbLayer"> - <title>Handling the database layer</title> - - <qandadiv> - <qandaentry> - <question> - <para>Implement a class <code>PersistenceHandler</code> to be - later used as a component of our next step GUI application - prototype. This class should have the following - methods:</para> - - <programlisting language="none">... -/** - * Handle database communication. There are two - * distinct internal states <q>disconnected</q> and <q>connected</q>, see - * {@link #isConnected()}. These two states may be toggled by invoking - * {@link #connect()} and {@link #disconnect()} respectively. - * - * The following snippet illustrates the intended usage: - * <pre> public static void main(String[] args) { - final PersistenceHandler ph = new PersistenceHandler(); - if (ph.connect()) { - if (!ph.add("Jim", "")) { - System.err.println("Insert Error:" + ph.getErrorMessage()); - } - } else { - System.err.println("Connect error:" + ph.getErrorMessage()); - } - }</pre> - * - * @author goik - */ -public class PersistenceHandler { - ... - /** - * Instance in <q>disconnected</q> state. See {@link #isConnected()} - */ - public PersistenceHandler() {/* only present here to supply Javadoc comment */} - - /** - * Inserting a (name, email) record into the database server. In case of - * errors corresponding messages may subsequently be retrieved by calling - * {@link #getErrorMessage()}. - * - * <dt><b>Precondition:</b></dt> <dd>must be in - * <q>connected</q> state, see {@link #isConnected()}</dd> - * - * @param name - * A person's name - * @param email - * A person's email address - * - * @return true if the current data record has been successfully inserted - * into the database server. false in case of error(s). - */ - public boolean add(final String name, final String email){ - ... 
- } - - /** - * Retrieving error messages in case a call to {@link #add(String, String)}, - * {@link #connect()}, or {@link #disconnect()} yields an error. - * - * @return the error explanation corresponding to the latest failed - * operation, null if no error yet occurred. - */ - public String getErrorMessage() { - return ...; - } - - /** - * Open a connection to a database server. - * - * <dt><b>Precondition:</b></dt> - * <dd>must be in <q>disconnected</q> state, see {@link #isConnected()}</dd> - * - * <dt><b>Precondition:</b></dt> - * <dd>The following properties must be set: - * <pre>PersistenceHandler.jdbcUrl=jdbc:mysql://localhost:3306/hdm -PersistenceHandler.password=XYZ -PersistenceHandler.username=foo</pre> - * </dd> - * - * @return true if connecting was successful, false otherwise - */ - public boolean connect () { - ... - } - - /** - * Close a connection to a database server and clean up JDBC related resources - * - * Error messages in case of failure may subsequently be retrieved by - * calling {@link #getErrorMessage()}. - * - * <dt><b>Precondition:</b></dt> - * <dd>must be in <q>connected</q> state, see {@link #isConnected()}</dd> - * - * @return true if disconnecting was successful, false in case error(s) occur. - */ - public boolean disconnect() { - ... - } - - /** - * An instance can either be in <q>connected</q> or <q>disconnected</q> state. The - * state can be toggled by invoking {@link #connect()} or - * {@link #disconnect()} respectively. 
- * - * @return true if connected, false otherwise - */ - public boolean isConnected() { - return ...; - } -}</programlisting> - - <para>Notice the two internal states - <quote>disconnected</quote> and - <quote>connected</quote>:</para> - - <figure xml:id="figPersistenceHandlerStates"> - <title>Possible states and transitions for instances of - <code>PersistenceHandler</code>.</title> - - <mediaobject> - <imageobject> - <imagedata fileref="Ref/Fig/persistHandlerStates.fig"/> - </imageobject> - </mediaobject> - </figure> - - <para>According to the above documentation a newly created - <code>PersistenceHandler</code> instance should be in - disconnected state. As being shown in the <link - linkend="gloss_Java"><trademark>Java</trademark></link> class - description you may test your implementation without any GUI - code. If you are already familiar with unit testing this might - be a good start as well.</para> - </question> - - <answer> - <para>We show a possible implementation of - <classname>sda.jdbc.intro.v1.PersistenceHandler</classname>:</para> - - <programlisting language="none">package sda.jdbc.intro.v1; -... - -public class PersistenceHandler { - - Connection conn = null; - Statement stmt = null; - - String errorMessage = null; - - /** - * New instances are in <q>disconnected</q> state. See {@link #isConnected()} - */ - public PersistenceHandler() {/* only present here to supply Javadoc comment */} - - /** - * Inserting a (name, email) record into the database server. In case of - * errors corresponding messages may subsequently be retrieved by calling - * {@link #getErrorMessage()}. - * - * <dt><b>Precondition:</b></dt> <dd>must be in - * <q>connected</q> state, see {@link #isConnected()}</dd> - * - * @param name - * A person's name - * @param email - * A person's email address - * - * @return true if the current data record has been successfully inserted - * into the database server. false in case of error(s). 
- */ - public boolean add(final String name, final String email){ - final String sql = "INSERT INTO Person VALUES('" + name + "', '" + - email + "')"; - try { - stmt.executeUpdate(sql); - return true; - } catch (SQLException e) { - errorMessage = "Unable to execute '" + sql + "': '" + e.getMessage() + "'"; - return false; - } - } - - /** - * Retrieving error messages in case a call to {@link #add(String, String)}, - * {@link #connect()}, or {@link #disconnect()} yields an error. - * - * @return the error explanation corresponding to the latest failed - * operation, null if no error yet occurred. - */ - public String getErrorMessage() { - return errorMessage; - } - - /** - * Open a connection to a database server. - * - * <dt><b>Precondition:</b></dt> - * <dd>must be in <q>disconnected</q> state, see {@link #isConnected()}</dd> - * - * <dt><b>Precondition:</b></dt> - * <dd>The following properties must be set: - * <pre>PersistenceHandler.jdbcUrl=jdbc:mysql://localhost:3306/hdm -PersistenceHandler.password=XYZ -PersistenceHandler.username=foo</pre> - * </dd> - * - * @return true if connecting was successful, false otherwise - */ - public boolean connect () { - try { - conn = DriverManager.getConnection( - DbProps.getString("PersistenceHandler.jdbcUrl"), - DbProps.getString("PersistenceHandler.username"), - DbProps.getString("PersistenceHandler.password")); - try { - stmt = conn.createStatement(); - return true; - } catch (SQLException e) { - errorMessage = "Connection opened but Statement creation failed:\"" + e.getMessage() + "\"."; - try { - conn.close(); - } catch (SQLException ee) { - // Report the exception raised by close(), not the outer one - errorMessage += "Closing connection failed:\"" + ee.getMessage() + "\"."; - } - conn = null; - } - - } catch (SQLException e) { - errorMessage = "Unable to open connection:\"" + e.getMessage() + "\"."; - } - return false; - } - - /** - * Close a connection to a database server and clean up JDBC related resources - * - * Error messages in case of failure may subsequently be retrieved by - * calling 
{@link #getErrorMessage()}. - * - * <dt><b>Precondition:</b></dt> - * <dd>must be in <q>connected</q> state, see {@link #isConnected()}</dd> - * - * @return true if disconnecting was successful, false in case error(s) occur. - */ - public boolean disconnect() { - boolean resultStatus = true; - final StringBuffer messageCollector = new StringBuffer(); - try { - stmt.close(); - } catch (SQLException e) { - resultStatus = false; - messageCollector.append("Unable to close Statement:\"" + e.getMessage() + "\"."); - } - stmt = null; - try { - conn.close(); - } catch (SQLException e) { - resultStatus = false; - messageCollector.append("Unable to close connection:\"" + e.getMessage() + "\"."); - } - conn = null; - if (!resultStatus) { - errorMessage = messageCollector.toString(); - } - return resultStatus; - } - - /** - * An instance can either be in <q>connected</q> or <q>disconnected</q> state. The - * state can be toggled by invoking {@link #connect()} or - * {@link #disconnect()} respectively. - * - * @return true if connected, false otherwise - */ - public boolean isConnected() { - return null != conn; - } -}</programlisting> - </answer> - </qandaentry> - </qandadiv> - </qandaset> - - <para>We may now complete the next enhancement step of our GUI - database client.</para> - - <qandaset defaultlabel="qanda" xml:id="exerciseGuiWriteTakeTwo"> - <title>Connection on user action</title> - - <qandadiv> - <qandaentry> - <question> - <label>An application writing records to a database - server</label> - - <para>Our aim is to enhance the first GUI prototype being - described in <xref linkend="simpleInsertGui"/>. The - application shall start being disconnected from the database - server. Prior to entering data the user shall be guided to - open a connection. 
The following video illustrates the desired - user interface:</para> - - <figure xml:id="figureDataInsert2"> - <title>A GUI frontend for adding personal data to a - server.</title> - - <mediaobject> - <videoobject> - <videodata fileref="Ref/Video/dataInsert.mp4"/> - </videoobject> - </mediaobject> - </figure> - - <para>In case a user closes the main window while still being - connected a disconnect from the database server shall be - enforced. For this purpose we must handle the event when the - user clicks on the closing button within the window - decoration. An exit handler method is being required to - terminate a potentially open database connection.</para> - </question> - - <answer> - <para>Our implementation uses the class - <classname>sda.jdbc.intro.v1.PersistenceHandler</classname> - for handling all database communication. The GUI needs to - visualize the two different states <quote>disconnected</quote> - and <quote>connected</quote>. In <quote>disconnected</quote> - state the whole input pane for entering datasets and clicking - the <quote>Insert</quote> button is locked. So the user is - forced to actively open a database connection.</para> - - <para>Notice also the - <classname>java.awt.event.WindowAdapter</classname> - implementation being executed when closing the application's - main window. The - <methodname>java.awt.event.WindowAdapter.windowClosing(java.awt.event.WindowEvent)</methodname> - method disconnects any existing database connection thus - freeing resources.</para> - - <programlisting language="none">package sda.jdbc.intro.v1; - -import ... 
- -public class InsertPerson extends JFrame { - - private static final long serialVersionUID = 6815975741605247675L; - - final PersistenceHandler persistenceHandler = new PersistenceHandler(); - - final JTextField nameField = new JTextField(15), - emailField = new JTextField(20); - - final JButton toggleConnectButton = new JButton(), - insertButton = new JButton("Insert"); - - final JPanel databaseFieldPanel = new JPanel(); - - private void setGuiConnectionState(final boolean state) { - if (state) { - toggleConnectButton.setText("Disconnect"); - } else { - toggleConnectButton.setText("Connect"); - } - for (final Component c: databaseFieldPanel.getComponents()){ - c.setEnabled(state); - } - } - - public static void main(String[] args) throws SQLException { - InsertPerson app = new InsertPerson(); - app.setVisible(true); - } - - public InsertPerson (){ - super ("Add a person's data"); - - setSize(500, 500); - - addWindowListener(new WindowAdapter() { - // In case a user closes our application window while still being connected - // we have to close the database connection. 
- @Override - public void windowClosing(WindowEvent e) { - super.windowClosing(e); - if (persistenceHandler.isConnected() && !persistenceHandler.disconnect()) { - System.exit(1); - } else { - System.exit(0); - } - } - }); - Box top = Box.createHorizontalBox(); - add(top, BorderLayout.NORTH); - top.add(toggleConnectButton); - - toggleConnectButton.addActionListener(new ActionListener() { - - @Override - public void actionPerformed(ActionEvent e) { - if (persistenceHandler.isConnected()) { - if (persistenceHandler.disconnect()){ - setGuiConnectionState(false); - } else { - JOptionPane.showMessageDialog(null, persistenceHandler.getErrorMessage()); - } - } else { - if (persistenceHandler.connect()){ - setGuiConnectionState(true); - } else { - JOptionPane.showMessageDialog(null, persistenceHandler.getErrorMessage()); - } - } - } - }); - - databaseFieldPanel.setLayout(new GridLayout(0,2)); - add(databaseFieldPanel); - - databaseFieldPanel.add(new JLabel("Name:")); - databaseFieldPanel.add(nameField); - - databaseFieldPanel.add(new JLabel("E-mail:")); - databaseFieldPanel.add(emailField); - - insertButton.addActionListener(new ActionListener() { - @Override - public void actionPerformed(ActionEvent e) { - if (persistenceHandler.add(nameField.getText(), emailField.getText())) { - nameField.setText(""); - emailField.setText(""); - JOptionPane.showMessageDialog(null, "Successfully inserted dataset"); - } else { - JOptionPane.showMessageDialog(null, persistenceHandler.getErrorMessage()); - } - } - }); - databaseFieldPanel.add(Box.createGlue()); - databaseFieldPanel.add(insertButton); - setGuiConnectionState(false); - pack(); - } -}</programlisting> - </answer> - </qandaentry> - </qandadiv> - </qandaset> - </section> - - <section xml:id="jdbcSecurity"> - <title><trademark - xlink:href="">JDBC</trademark> - and security</title> - - <section xml:id="jdbcSecurityNetwork"> - <title>Network sniffing</title> - - <para>Sniffing <trademark - xlink:href="">JDBC</trademark> - network traffic 
is one possibility for intruders to compromise - database applications. This requires physical access to either - of:</para> - - <itemizedlist> - <listitem> - <para>Server host</para> - </listitem> - - <listitem> - <para>Client host</para> - </listitem> - - <listitem> - <para>intermediate hub, switch or router.</para> - </listitem> - </itemizedlist> - - <figure xml:id="figJdbcSniffing"> - <title>Sniffing a <trademark - xlink:href="">JDBC</trademark> - connection by an intruder.</title> - - <mediaobject> - <imageobject> - <imagedata fileref="Ref/Fig/jdbcSniffing.fig"/> - </imageobject> - </mediaobject> - </figure> - - <para>We demonstrate a possible attack by analyzing the network - traffic between our application shown in <xref - linkend="figJdbcSimpleWrite"/> and the <productname - xlink:href="">Mysql</productname> database - server. Prior to starting the application we set up <productname - xlink:href="">Wireshark</productname> for - filtered capturing:</para> - - <itemizedlist> - <listitem> - <para>Connecting to the <varname>loopback</varname> (lo) - interface only. This is sufficient since our client connects to - <varname>localhost</varname>.</para> - </listitem> - - <listitem> - <para>Filtering packets if not of type <acronym - xlink:href="">TCP</acronym> - and having port number 3306</para> - </listitem> - </itemizedlist> - - <para>This yields the following capture being shortened for the sake - of brevity:</para> - - <programlisting language="none">[... -5.5.24-0ubuntu0.12.04.1.%...X*e?I1ZQ...................e,F[yoA5$T[N.mysql_native_password. - A...........!.......................hdmuser <co xml:id="tcpCaptureUsername"/>......U.>S.%..~h...!.xhdm............j..../* - - ... INSERT INTO Person VALUES('Jim', '') <co - xml:id="tcpCaptureSqlInsert"/>6... 
- .&.#23000Duplicate entry '' for key 'email' <co - xml:id="tcpCaptureErrmsg"/></programlisting> - - <calloutlist> - <callout arearefs="tcpCaptureUsername"> - <para>The <varname>username</varname> initiating the connection - to the database server.</para> - </callout> - - <callout arearefs="tcpCaptureSqlInsert"> - <para>The <code>INSERT ...</code> statement.</para> - </callout> - - <callout arearefs="tcpCaptureErrmsg"> - <para>The resulting error message being sent back to the - client.</para> - </callout> - </calloutlist> - - <para>Something seems to be missing here: The user's password. Our - code in <xref linkend="figJdbcSimpleWrite"/> contains the password - <quote><varname>XYZ</varname></quote> in clear text. But even using - the search function of <productname - xlink:href="">Wireshark</productname> does - not show any such string within the above capture. The <productname - xlink:href="">Mysql</productname> documentation - however <link - xlink:href="">reveals</link> - that everything but the password is transmitted in clear text. So - all we might identify is a hash of <code>XYZ</code>.</para> - - <para>So regarding our (current) <productname - xlink:href="">Mysql</productname> implementation - the impact of this attack type is somewhat limited but still severe: - All data being transmitted between client and server may be - disclosed. This typically comprises sensitive data as well. Possible - solutions:</para> - - <itemizedlist> - <listitem> - <para>Create an encrypted tunnel between client and server, - e.g. <link - xlink:href="">ssh - port forwarding</link> or <link - xlink:href="">VPN</link>.</para> - </listitem> - - <listitem> - <para>Many database vendors <link - xlink:href="">supply - SSL</link> or similar <trademark - xlink:href="">JDBC</trademark> - protocol encryption extensions. This requires additional - configuration procedures like setting up server side - certificates. 
Moreover similar to the http/https protocols - encryption generally slows down data traffic.</para> - </listitem> - </itemizedlist> - - <para>Of course this is only relevant if the transport layer is - considered to be insecure. If both server and client reside within - the same trusted infrastructure no action has to be taken. We also - note that this kind of problem is not limited to <trademark - xlink:href="">JDBC</trademark>. - In fact all protocols lacking encryption are subject to this type of - attack.</para> - </section> - - <section xml:id="sqlInjection"> - <title>SQL injection</title> - - <para>Before diving into technical details we shed some light on the - possible impact of this common attack type being described in this - chapter. Our example is the well known Heartland Payment Systems - data breach:</para> - - <figure xml:id="figHeartlandSecurityBreach"> - <title>Summary about possible SQL injection impact based on the - Heartland security breach</title> - - <mediaobject> - <imageobject> - <imagedata fileref="Ref/Fig/heartland.fig"/> - </imageobject> - </mediaobject> - </figure> - - <para>Why should we be concerned with SQL injection? In the - introduction of <xref linkend="bib_Clarke09"/> a compelling argument - is being given:</para> - - <blockquote> - <para>Many people say they know what SQL injection is, but all - they have heard about or experienced are trivial examples. SQL - injection is one of the most devastating vulnerabilities to impact - a business, as it can lead to exposure of all of the sensitive - information stored in an application's database, including handy - information such as usernames, passwords, names, addresses, phone - numbers, and credit card details.</para> - </blockquote> - - <para>In this lecture due to limited resources we only deal with - trivial examples mentioned above. 
One possible way SQL injection - attacks work is by inserting SQL code into fields designed for - end user input:</para> - - <figure xml:id="figSqlInject"> - <title>SQL injection triggered by ordinary user input.</title> - - <mediaobject> - <imageobject> - <imagedata fileref="Ref/Fig/sqlinject.fig"/> - </imageobject> - </mediaobject> - </figure> - - <qandaset defaultlabel="qanda" xml:id="sqlInjectDropTable"> - <title>Attack from the dark side</title> - - <qandadiv> - <qandaentry> - <question> - <para>Use the application from <xref - linkend="exerciseGuiWriteTakeTwo"/> and <xref - linkend="figSqlInject"/> to launch a SQL injection attack. - We provide some hints:</para> - - <orderedlist> - <listitem> - <para>The <productname - xlink:href="">Mysql</productname> - <trademark - xlink:href="">JDBC</trademark> - driver implementation already provides precautions to - hamper SQL injection attacks. In its default - configuration a sequence of SQL commands separated by - semicolons (<quote>;</quote>) will not be executed but - flagged as a SQL syntax error. We take an - example:</para> - - <programlisting language="none">INSERT INTO Person VALUES (...);DROP TABLE Person</programlisting> - - <para>In order to execute these so-called multi-statement - queries we explicitly have to enable a <productname - xlink:href="">Mysql</productname> - property. 
This may be achieved by extending our - <trademark - xlink:href="">JDBC</trademark> - URL:</para> - - <programlisting language="none">jdbc:mysql://localhost:3306/hdm?<emphasis - role="bold">allowMultiQueries=true</emphasis></programlisting> - - <para>The <productname - xlink:href="">Mysql</productname> - manual <link - xlink:href="">contains</link> - a remark regarding this parameter:</para> - - <remark>Notice that this has the potential for SQL - injection if using plain java.sql.Statements and your - code doesn't sanitize input correctly.</remark> - - <para>In other words: You have been warned!</para> - </listitem> - - <listitem> - <para>You may now use either of the two input fields - <quote>name</quote> or <quote>email</quote> to inject - arbitrary SQL code.</para> - </listitem> - </orderedlist> - </question> - - <answer> - <para>We construct a string which, once injected, drops - our <code>Person</code> table:</para> - - <programlisting language="none">Jim', '');DROP TABLE Person;INSERT INTO Person VALUES('Joe</programlisting> - - <para>Entering this string into the name field effectively - destroys our <code>Person</code> relation. As the error - message shows two INSERT statements are separated by a DROP - TABLE statement. So after executing the first INSERT our - database server drops the whole table. Finally the second - INSERT statement fails giving rise to an error message no - end user will ever understand:</para> - - <figure xml:id="figSqlInjectDropPerson"> - <title>Dropping the <code>Person</code> table by SQL - injection</title> - - <mediaobject> - <imageobject> - <imagedata fileref="Ref/Screen/sqlInject.screen.png"/> - </imageobject> - </mediaobject> - </figure> - - <para>According to the message text the table - <code>Person</code> gets dropped as expected. Thus the - subsequent (second) <code>INSERT</code> action is bound to - fail.</para> - - <para>In practice this result may be avoided. The database - user will (hopefully!) 
not have sufficient permissions to - drop the whole table. Malicious modifications by INSERT, - UPDATE or DELETE statements are still possible.</para> - </answer> - </qandaentry> - </qandadiv> - </qandaset> - </section> - - <section xml:id="sanitizeUserInput"> - <title>Sanitizing user input</title> - - <para>There are at least two general ways to deal with the - disastrous result of <xref linkend="sqlInjectDropTable"/>:</para> - - <itemizedlist> - <listitem> - <para>Keep the database server from interpreting user input - as SQL altogether. This is probably the best way and will be discussed - in <xref linkend="sectPreparedStatements"/>.</para> - </listitem> - - <listitem> - <para>Let the application check and process user input. - Dangerous user input may be modified prior to being embedded in - SQL statements or be rejected completely.</para> - </listitem> - </itemizedlist> - - <para>The first method is definitely superior in most cases. There - are however cases where the restrictions being implied are too - severe. We may for example choose dynamically which tables shall be - accessed. So an SQL statement's structure rather than just its - predicates is affected by user input. There are at least two - standard procedures dealing with this problem:</para> - - <glosslist> - <glossentry> - <glossterm>Input Filtering</glossterm> - - <glossdef> - <para>In the simplest case we check a user's input by regular - expressions. An example is an input field in a login window - representing a system user name. Legal input may allow - letters and digits only. Special characters, whitespace etc. - are typically prohibited. The input has a minimum length - of one character. A maximum length may be imposed as well. 
So - we may choose the regular expression <code>[A-Za-z0-9]+</code> - to check valid user names.</para> - </glossdef> - </glossentry> - - <glossentry> - <glossterm><foreignphrase>Whitelisting</foreignphrase></glossterm> - - <glossdef> - <para>In many cases input fields only allow a restricted set - of values. Consider an input field for names of planets. An - application may keep a dictionary table to validate user - input:</para> - - <informaltable border="1"> - <col width="10%"/> - - <col width="5%"/> - - <tr> - <td>Mercury</td> - - <td>1</td> - </tr> - - <tr> - <td>Venus</td> - - <td>2</td> - </tr> - - <tr> - <td>Earth</td> - - <td>3</td> - </tr> - - <tr> - <td>...</td> - - <td>...</td> - </tr> - - <tr> - <td>Neptune</td> - - <td>8</td> - </tr> - - <tr> - <td><emphasis role="bold">Default:</emphasis></td> - - <td><emphasis role="bold">0</emphasis></td> - </tr> - </informaltable> - - <para>So if a user enters a valid planet name a corresponding - number representing this particular planet will be sent to the - database. If the user enters an invalid string an error - message may be raised.</para> - - <para>In a GUI in many situations this may be better - accomplished by presenting the list of planets to choose from. 
- In this case a user has no chance to enter invalid or even - malicious code.</para> - </glossdef> - </glossentry> - </glosslist> - - <para>So we have an <quote>interceptor</quote> sitting between user - input fields and SQL generating code:</para> - - <figure xml:id="figInputFiltering"> - <title>Validating user input prior to dynamically composing SQL - statements.</title> - - <mediaobject> - <imageobject> - <imagedata fileref="Ref/Fig/filtering.fig"/> - </imageobject> - </mediaobject> - </figure> - - <qandaset defaultlabel="qanda" xml:id="quandaentry_RegexpUse"> - <title>Using regular expressions in <link - linkend="gloss_Java"><trademark>Java</trademark></link></title> - - <qandadiv> - <qandaentry> - <question> - <para>This exercise is a preparation for <xref - linkend="exercisefilterUserInput"/>. The aim is to deal with - regular expressions and to use them in <link - linkend="gloss_Java"><trademark>Java</trademark></link>. If - you don't know yet about regular expressions / pattern - matching you may want to read either of:</para> - - <itemizedlist> - <listitem> - <para><link - xlink:href="">Regular - expressions - An introduction</link></para> - </listitem> - - <listitem> - <para><link - xlink:href="">An - Introduction to Regular Expressions</link></para> - </listitem> - - <listitem> - <para><link - xlink:href="">Regular - Expression Tutorial</link></para> - </listitem> - </itemizedlist> - - <para>Complete the implementation of the following - skeleton:</para> - - <programlisting language="none">... -import java.util.regex.Matcher; -import java.util.regex.Pattern; - -public static void main(String[] args) { - final String [] wordList = new String [] {"Eric", "126653BBb", "_login","some text"}; - final String [] regexpList = new String[] {"[A-K].*", "[^0-9]+.*", "_[a-z]+", ""}; - - for (final String word: wordList) { - for (final String regexp: regexpList) { - testMatch(word, regexp); - } - } -} - -/** - * Matching a given word by a regular expression. 
A log message is - * written to stdout. - * - * Hint: The implementation is based on the explanation being given in the - * introduction to {@link Pattern} - * - * @param word This string will be matched by the subsequent argument. - * @param regexp The regular expression tested to match the previous argument. - * @return true if regexp matches word, false otherwise. - */ -public static boolean testMatch(final String word, final String regexp) { -.../* to be implemented by <emphasis role="bold">**YOU**</emphasis> */ -}</programlisting> - - <para>As noted in the <link - linkend="gloss_Java"><trademark>Java</trademark></link> - documentation comment above, you may want to read the documentation of class - <classname>java.util.regex.Pattern</classname>. The intended - output of the above application is:</para> - - <programlisting language="none">The expression '[A-K].*' matches 'Eric' -The expression '[^0-9]+.*' ... -...</programlisting> - </question> - - <answer> - <para>A possible implementation is given by - <classname>sda.regexp.RegexpPrimer</classname>.</para> - </answer> - </qandaentry> - </qandadiv> - </qandaset> - - <qandaset defaultlabel="qanda" xml:id="exercisefilterUserInput"> - <title>Input validation by regular expressions</title> - - <qandadiv> - <qandaentry> - <question> - <para>The application of <xref - linkend="sqlInjectDropTable"/> proved to be vulnerable to - SQL injection. Sanitize the values of both user input fields to - prevent such behaviour.</para> - - <itemizedlist> - <listitem> - <para>Find appropriate regular expressions to check both - username and email. Some hints:</para> - - <glosslist> - <glossentry> - <glossterm>username</glossterm> - - <glossdef> - <para>Regarding SQL injection the <quote>;</quote> - character is among the most critical. You may want - to exclude certain special characters. 
This doesn't - harm since their presence in a user's name is likely - to be a typo rather than any meaningful input.</para> - </glossdef> - </glossentry> - - <glossentry> - <glossterm>email</glossterm> - - <glossdef> - <para>There are tons of <quote>ultimate</quote> - regular expressions available to check email - addresses. Remember that rather than rejecting - <quote>wrong</quote> email addresses the present - task is to avoid SQL injection. So find a reasonable - one which may be too permissive regarding RFC email - syntax rules but sufficient to secure your - application.</para> - - <para>A formal definition of an email's syntax is - given in <link - xlink:href="">RFC5322</link>. - Its implementation is beyond the scope of the current - lecture. Moreover it is questionable whether E-mail - clients and mail transfer agents implement strict - RFC compliance.</para> - </glossdef> - </glossentry> - </glosslist> - - <para>Both regular expressions must cover the whole user - input from the beginning to the end. This can be - achieved by using <code>^ ... $</code>.</para> - </listitem> - - <listitem> - <para>The <link - linkend="gloss_Java"><trademark>Java</trademark></link> - standard class - <classname>javax.swing.InputVerifier</classname> may - help you validate user input.</para> - </listitem> - - <listitem> - <para>The following screenshot may provide an idea for - GUI realization and user interaction in case of errors. - Of course the submit button's action should be disabled - in case of erroneous input. 
The user should receive a - helpful error message instead.</para> - - <figure xml:id="figInsertValidate"> - <title>Error message being presented to the - user.</title> - - <mediaobject> - <imageobject> - <imagedata fileref="Ref/Screen/insertValidate.screen.png"/> - </imageobject> - - <caption> - <para>In the current example the trailing - <quote>;</quote> within the E-Mail field is - invalid.</para> - </caption> - </mediaobject> - </figure> - </listitem> - </itemizedlist> - </question> - - <answer> - <para>Extending - <classname>javax.swing.InputVerifier</classname> allows us - to build a generic class to filter user text input by - arbitrary regular expressions:</para> - - <programlisting language="none">package sda.jdbc.intro.v1.sanitize; -... -public class RegexpVerifier extends InputVerifier { - - final Pattern syntaxPattern; - final JLabel validationLabel; - private boolean inputValid = false; - private final String errMsg; -... - public RegexpVerifier (final String regex, final JLabel validationLabel, final String errMsg) { - this.validationLabel = validationLabel; - this.errMsg = errMsg; - syntaxPattern = Pattern.compile(regex); - } - - @Override - public boolean verify(JComponent input) { - if (input instanceof JTextField) { - final String userInput = ((JTextField) input).getText(); - if (syntaxPattern.matcher(userInput).find()) { - validationLabel.setText(""); - inputValid = true; - } else { - validationLabel.setText(errMsg); - inputValid = false; - } - } - return inputValid; - } - public boolean inputIsValid () { - return inputValid; - } -}</programlisting> - - <para>Instances of - <classname>sda.jdbc.intro.v1.sanitize.RegexpVerifier</classname> - <coref linkend="emailVerifier"/> <coref - linkend="nameVerifier"/> may now be used to validate our two - input data fields <coref linkend="setNameValidation"/> - <coref linkend="setEmailValidation"/>. 
We put emphasis on - the changes with respect to - <classname>sda.jdbc.intro.v1.InsertPerson</classname>:</para> - - <programlisting language="none">package sda.jdbc.intro.v1.sanitize; -... -public class InsertPerson extends JFrame { - - final JTextField nameField = new JTextField(15); - final JLabel nameFieldValidationLabel <co xml:id="nameVerifier"/> = new JLabel(); - final RegexpVerifier nameFieldVerifier = new RegexpVerifier( - "^[^;'\"]+$", - nameFieldValidationLabel, - "No special characters"); - - final JTextField emailField = new JTextField(20); - final JLabel emailFieldValidationLabel <co xml:id="emailVerifier"/> = new JLabel(); - final RegexpVerifier emailFieldVerifier = - new RegexpVerifier("^[\\w\\-\\.\\_]+@[\\w\\-\\.]*[a-zA-Z]{2,4}$", - emailFieldValidationLabel, - "email not valid"); -... - public static void main(String[] args) throws SQLException { - InsertPerson app = new InsertPerson(); - app.setVisible(true); - } - public InsertPerson (){ -... - databaseFieldPanel.add(nameField); - <emphasis role="bold">nameFieldValidationLabel.setForeground(Color.RED); - databaseFieldPanel.add(nameFieldValidationLabel); - nameField.setInputVerifier(nameFieldVerifier);</emphasis> <co - xml:id="setNameValidation"/> - - databaseFieldPanel.add(new JLabel("E-mail:")); - databaseFieldPanel.add(emailField); - <emphasis role="bold">databaseFieldPanel.add(emailFieldValidationLabel); - emailFieldValidationLabel.setForeground(Color.RED); - emailField.setInputVerifier(emailFieldVerifier);</emphasis> <co - xml:id="setEmailValidation"/> - - insertButton.addActionListener(new ActionListener() { - @Override - public void actionPerformed(ActionEvent e) { - <emphasis role="bold">if (!nameFieldVerifier.inputIsValid() || !emailFieldVerifier.inputIsValid()) { - JOptionPane.showMessageDialog(null, "Invalid input value(s)"); - }</emphasis> else { -...</programlisting> - </answer> - </qandaentry> - </qandadiv> - </qandaset> - </section> - - <section xml:id="sectPreparedStatements"> - 
<title><classname>java.sql.PreparedStatement</classname> - objects</title> - - <para>Sanitizing user input is an essential means to secure an - application. The <trademark - xlink:href="">JDBC</trademark> - standard however provides a mechanism being superior regarding the - purpose of protecting applications against SQL injection attacks. We - shed some light on our current mechanism sending SQL statements to a - database server:</para> - - <figure xml:id="sqlTransport"> - <title>SQL statements in <link - linkend="gloss_Java"><trademark>Java</trademark></link> - applications get parsed at the database server</title> - - <mediaobject> - <imageobject> - <imagedata fileref="Ref/Fig/sqlTransport.fig"/> - </imageobject> - </mediaobject> - </figure> - - <para>This architecture raises two questions:</para> - - <orderedlist> - <listitem> - <para>What happens in case identical SQL statements are executed - repeatedly? This may happen inside a loop when thousands of - records with identical structure are being sent to a - database.</para> - </listitem> - - <listitem> - <para>Is this architecture adequate with respect to security - concerns?</para> - </listitem> - </orderedlist> - - <para>The first question is related to performance: Parsing - statements being identical despite the properties being contained - within is a waste of resources. We consider the transfer of records - between different databases:</para> - - <programlisting language="none">INSERT INTO Person VALUES ('Jim', '') -INSERT INTO Person VALUES ('Eve', '') -INSERT INTO Person VALUES ('Pete', '') -...</programlisting> - - <para>In this case it does not make sense to repeatedly parse - identical SQL statements. 
Using single <code>INSERT</code> - statements with multiple data records may not be an option when the - number of records grows.</para> - - <para>The second question is related to our current security topic: - The database server's interpreter may be so <quote>kind</quote> as to - interpret an attacker's malicious code as well.</para> - - <para>Both topics are addressed by - <classname>java.sql.PreparedStatement</classname> objects. Basically - these objects allow for separation of an SQL statement's structure - from parameter values contained within. The scenario given in <xref - linkend="sqlTransport"/> may be implemented as:</para> - - <figure xml:id="sqlTransportPrepare"> - <title>Using <classname>java.sql.PreparedStatement</classname> - objects.</title> - - <mediaobject> - <imageobject> - <imagedata fileref="Ref/Fig/sqlTransportPrepare.fig"/> - </imageobject> - </mediaobject> - </figure> - - <para>Prepared statements are an example of parameterized SQL - statements which exist in various programming languages. When using - <classname>java.sql.PreparedStatement</classname> instances we - actually have three distinct phases:</para> - - <orderedlist> - <listitem> - <para xml:id="exerciseGuiWritePrepared">Creating an instance of - <classname>java.sql.PreparedStatement</classname>. The SQL - statement possibly containing placeholders gets parsed.</para> - </listitem> - - <listitem> - <para>Setting all placeholder values. This does not involve any - further SQL syntax parsing.</para> - </listitem> - - <listitem> - <para>Executing the statement.</para> - </listitem> - </orderedlist> - - <para>Steps 2. and 3. 
may be repeated as often as desired without - any re-parsing of SQL statements thus saving resources on the - database server side.</para> - - <para>Our introductory toy application <xref - linkend="figJdbcSimpleWrite"/> may be rewritten using - <classname>java.sql.PreparedStatement</classname> objects:</para> - - <programlisting language="none">package sda.jdbc.intro.v1; -... -public class SimpleInsert { - - public static void main(String[] args) throws SQLException { - - final Connection conn = DriverManager.getConnection (... - - // Step 2: Create a PreparedStatement instance - final PreparedStatement pStmt = conn.prepareStatement( - "INSERT INTO Person VALUES(<emphasis role="bold">?, ?</emphasis>)");<co - xml:id="listPrepCreate"/> - - // Step 3a: Fill in desired attribute values - pStmt.setString(1, "Jim");<co xml:id="listPrepSet1"/> - pStmt.setString(2, "");<co xml:id="listPrepSet2"/> - - // Step 3b: Execute the desired INSERT - final int updateCount = pStmt.executeUpdate();<co xml:id="listPrepExec"/> - - // Step 4: Give feedback to the end user - System.out.println("Successfully inserted " + updateCount + " dataset(s)"); - } -}</programlisting> - - <calloutlist> - <callout arearefs="listPrepCreate"> - <para>An instance of - <classname>java.sql.PreparedStatement</classname> is being - created. Notice the two question marks representing two - placeholders for string values to be inserted in the next - step.</para> - </callout> - - <callout arearefs="listPrepSet1 listPrepSet2"> - <para>Fill in the two placeholder values being defined at <coref - linkend="listPrepCreate"/>.</para> - - <caution> - <para>Although half the world of programming folks will index a - list of n elements starting from 0 to n-1, <trademark - xlink:href="">JDBC</trademark> - counts placeholders from 1 to n. Working with <trademark - xlink:href="">JDBC</trademark> - would have been too easy otherwise.</para> - </caution> - </callout> - - <callout arearefs="listPrepExec"> - <para>Execute the beast! 
Notice the empty parameter list. No SQL - is required since we already prepared it in <coref - linkend="listPrepCreate"/>.</para> - </callout> - </calloutlist> - - <para>The problem of SQL injection disappears completely when using - <classname>java.sql.PreparedStatement</classname> instances. An - attacker may safely enter offending strings like:</para> - - <programlisting language="none">Jim', '');DROP TABLE Person;INSERT INTO Person VALUES('Joe</programlisting> - - <para>The above string will be taken <quote>as is</quote> and thus - simply becomes part of the database server's content.</para> - - <qandaset defaultlabel="qanda" xml:id="exerciseSqlInjectPrepare"> - <title>Prepared Statements to keep the barbarians at the - gate</title> - - <qandadiv> - <qandaentry> - <question> - <para>In <xref linkend="sqlInjectDropTable"/> we found our - implementation in <xref linkend="exerciseGuiWriteTakeTwo"/> - to be vulnerable with respect to SQL injection. Rather than - sanitizing user input you shall use - <classname>java.sql.PreparedStatement</classname> objects to - secure the application.</para> - </question> - - <answer> - <para>Due to our separation of GUI and persistence handling - we only need to re-implement - <classname>sda.jdbc.intro.sqlinject.PersistenceHandler</classname>. - We have to replace <classname>java.sql.Statement</classname> - by <classname>java.sql.PreparedStatement</classname> - instances. A possible implementation is - <classname>sda.jdbc.intro.v1.prepare.PersistenceHandler</classname>. 
- We may now safely enter offending strings like:</para> - - <programlisting language="none">Jim', '');DROP TABLE Person;INSERT INTO Person VALUES('Joe</programlisting> - - <para>This time the input value is taken <quote>as - is</quote> and yields the following error message:</para> - - <informalfigure> - <mediaobject> - <imageobject> - <imagedata fileref="Ref/Screen/sqlInjectPrepare.screen.png"/> - </imageobject> - </mediaobject> - </informalfigure> - - <para>The offending string exceeds the length of the - attribute <code>name</code> within the database table - <code>Person</code>. We may enlarge this value to allow the - <code>INSERT</code> operation:</para> - - <programlisting language="none">CREATE TABLE Person ( - name char(<emphasis role="bold">80</emphasis>) <emphasis role="bold">-- a little bit longer --</emphasis> - ,email CHAR(20) UNIQUE -);</programlisting> - </answer> - </qandaentry> - </qandadiv> - </qandaset> - - <para>We may have followed the track of test-driven development. In - that case we would have written tests before actually implementing - our application. In the current lecture we will do this the other - way round in the following exercise. The idea is to assure software - quality when fixing bugs or extending an application.</para> - - <para>The subsequent exercise requires the <productname - xlink:href="">TestNG</productname> - plugin for Eclipse to be installed. This should already be the case - both in the MI exercise classrooms and in the Virtualbox image - provided at <uri - xlink:href=""></uri>. - If you use a private Eclipse installation you may want to follow - <xref linkend="testngInstall"/>.</para> - - <qandaset defaultlabel="qanda" xml:id="quandaentry_DbLayerUnitTest"> - <title>Testing - <classname>sda.jdbc.intro.v1.PersistenceHandler</classname> using - <productname - xlink:href="">TestNG</productname></title> - - <qandadiv> - <qandaentry> - <question> - <para>Read <xref linkend="chapUnitTesting"/>. 
Then - test:</para> - - <itemizedlist> - <listitem> - <para>Proper behaviour when opening and closing - connections.</para> - </listitem> - - <listitem> - <para>Proper behaviour when inserting data.</para> - </listitem> - - <listitem> - <para>Expected behaviour when entering duplicate values - violating integrity constraints. Look for error messages - as well.</para> - </listitem> - </itemizedlist> - - <para>You may write code to initialize the database state - appropriately prior to starting the tests.</para> - </question> - - <answer> - <para><productname - xlink:href="">TestNG</productname> may be - directed by - <classname>sda.jdbc.intro.v1.prepare.PersistenceHandlerTest</classname>.</para> - </answer> - </qandaentry> - </qandadiv> - </qandaset> - </section> - </section> - - <section xml:id="jdbcRead"> - <title>Read Access</title> - - <para>So far we've sent records to a database server. Applications - however need both directions: Pushing data to a server and receiving - data as well. The overall process looks like this:</para> - - <figure xml:id="jdbcReadWrite"> - <title>Server / client object's life cycle</title> - - <mediaobject> - <imageobject> - <imagedata fileref="Ref/Fig/jdbcReadWrite.fig"/> - </imageobject> - </mediaobject> - </figure> - - <para>So far we've only covered the second (<code>UPDATE</code>) part - of this picture. Reading objects from a database server into a - client's (transient) address space requires a container object to hold - the data in question. Though <link - linkend="gloss_Java"><trademark>Java</trademark></link> offers - standard container interfaces like - <classname>java.util.List</classname> the <trademark - xlink:href="">JDBC</trademark> - standard has created separate specifications like - <classname>java.sql.ResultSet</classname>. Instances of - <classname>java.sql.ResultSet</classname> will hold transient copies - of (database) objects. 
The next figure outlines the basic - approach:</para> - - <figure xml:id="figJdbcRead"> - <title>Reading data from a database server.</title> - - <mediaobject> - <imageobject> - <imagedata fileref="Ref/Fig/jdbcread.fig" scale="65"/> - </imageobject> - </mediaobject> - </figure> - - <para>We take an example. Suppose our database contains a table of our - friends' nicknames and their respective birth dates:</para> - - <table border="1" xml:id="figRelationFriends"> - <caption>Names and birth dates of friends.</caption> - - <tr> - <td><programlisting language="none">CREATE TABLE Friends ( - id INTEGER NOT NULL PRIMARY KEY - ,nickname char(10) - ,birthdate DATE -);</programlisting></td> - - <td><programlisting language="none">INSERT INTO Friends VALUES - (1, 'Jim', '1991-10-10') - ,(2, 'Eve', '2003-05-24') - ,(3, 'Mick','2001-12-30') - ;</programlisting></td> - </tr> - </table> - - <para>Following the outline in <xref linkend="figJdbcRead"/> we may - access our data by:</para> - - <figure xml:id="listingJdbcRead"> - <title>Accessing relational data</title> - - <programlisting language="none">package sda.jdbc.intro; -... 
-public class SimpleRead { - - public static void main(String[] args) throws SQLException { - - // Step 1: Open a connection to the database server - final Connection conn = DriverManager.getConnection ( - DbProps.getString("PersistenceHandler.jdbcUrl"), - DbProps.getString("PersistenceHandler.username"), - DbProps.getString("PersistenceHandler.password")); - - // Step 2: Create a Statement instance - final Statement stmt = conn.createStatement(); - - <emphasis role="bold">// Step 3: Creating the client side JDBC container holding our data records</emphasis> - <emphasis role="bold">final ResultSet data = stmt.executeQuery("SELECT * FROM Friends");</emphasis> <co - linkends="listingJdbcRead-1" xml:id="listingJdbcRead-1-co"/> - - <emphasis role="bold">// Step 4: Dataset iteration - while (data.next()) {</emphasis> <co linkends="listingJdbcRead-2" - xml:id="listingJdbcRead-2-co"/> - <emphasis role="bold">System.out.println(data.getInt("id")</emphasis> <co - linkends="listingJdbcRead-3" xml:id="listingJdbcRead-3-co"/> - <emphasis role="bold">+ ", " + data.getString("nickname")</emphasis> <co - linkends="listingJdbcRead-3" xml:id="listingJdbcRead-4-co"/> - <emphasis role="bold">+ ", " + data.getString("birthdate"));</emphasis> <co - linkends="listingJdbcRead-3" xml:id="listingJdbcRead-5-co"/> - } - } -}</programlisting> - </figure> - - <para>The marked code segment above shows the differences with respect to - our data insertion application - <classname>sda.jdbc.intro.SimpleInsert</classname>. 
Some remarks are - in order:</para> - - <calloutlist> - <callout arearefs="listingJdbcRead-1-co" xml:id="listingJdbcRead-1"> - <para>As mentioned in the introduction to this section the - <trademark - xlink:href="">JDBC</trademark> - standard comes with its own container interface rather than - <classname>java.util.List</classname> or similar.</para> - </callout> - - <callout arearefs="listingJdbcRead-2-co" xml:id="listingJdbcRead-2"> - <para>Calling <link - xlink:href="">next()</link> - prior to actually accessing data on the client side is mandatory! - The <link - xlink:href="">next()</link> - method places the internal iterator on the first element of our - dataset, provided it is not empty. Follow the link address and <emphasis role="bold">read</emphasis> the - documentation.</para> - </callout> - - <callout arearefs="listingJdbcRead-3-co listingJdbcRead-4-co listingJdbcRead-5-co" - xml:id="listingJdbcRead-3"> - <para>The access methods have to be chosen according to matching - types. An overview of database/<link - linkend="gloss_Java"><trademark>Java</trademark></link> type - mappings is given in <uri - xlink:href=""></uri>.</para> - </callout> - </calloutlist> - - <para>We now present a series of exercises exploring important - aspects of <xref linkend="glo_JDBC"/> read access.</para> - - <section xml:id="sectGetterTypeConversion"> - <title>Getter methods and type conversion</title> - - <qandaset defaultlabel="qanda" - xml:id="quandaentry_JdbcTypeConversion"> - <qandadiv> - <qandaentry> - <question> - <para>Apart from type mappings the <xref - linkend="glo_JDBC"/> access methods like <link - xlink:href="">getString()</link> - may also be used for type conversion. 
Modify <xref - linkend="listingJdbcRead"/> as follows:</para> - - <itemizedlist> - <listitem> - <para>Read the database attribute <code>id</code> by - <link - xlink:href="">getString(String)</link>.</para> - </listitem> - - <listitem> - <para>Read the database attribute <code>nickname</code> by <link - xlink:href="">getInt(String)</link>.</para> - </listitem> - </itemizedlist> - - <para>What do you observe?</para> - </question> - - <answer> - <para>Modifying our iteration loop:</para> - - <programlisting language="none">// Step 4: Dataset iteration -while (data.next()) { - System.out.println(data.<emphasis role="bold">getString</emphasis>("id") <co - linkends="jdbcReadWrongType-1" - xml:id="jdbcReadWrongType-1-co"/> - + ", " + data.<emphasis role="bold">getInt</emphasis>("nickname") <co - linkends="jdbcReadWrongType-2" - xml:id="jdbcReadWrongType-2-co"/> - + ", " + data.getString("birthdate")); -}</programlisting> - - <para>We observe:</para> - - <calloutlist> - <callout arearefs="jdbcReadWrongType-1-co" - xml:id="jdbcReadWrongType-1"> - <para>Calling <link - xlink:href="">getString()</link> - for a database attribute of type INTEGER does not cause - any trouble: The value gets silently converted to a - string value.</para> - </callout> - - <callout arearefs="jdbcReadWrongType-2-co" - xml:id="jdbcReadWrongType-2"> - <para>Calling <link - xlink:href="">getInt(String)</link> - for the database field of type CHAR yields an (expected) - exception:</para> - </callout> - </calloutlist> - - <programlisting language="none">Exception in thread "main" java.sql.SQLException: Invalid value for getInt() - 'Jim' - at com.mysql.jdbc.SQLError.createSQLException( -...</programlisting> - - <para>We may however provide <quote>compatible</quote> data - records:</para> - - <programlisting language="none">DELETE FROM Friends; -INSERT INTO Friends VALUES (1, <emphasis role="bold">'31'</emphasis>, '1991-10-10');</programlisting> - - <para>This time our application executes perfectly - 
well:</para> - - <programlisting 
language="none">1, 31, 1991-10-10</programlisting> - - <para>Conclusion: The <xref linkend="glo_JDBC"/> driver - performs a conversion from a string type to an integer - similar to the <link - xlink:href="">parseInt(String)</link> - method.</para> - - <para>The next series of exercises aims at a more powerful - implementation of our person data insertion application in - <xref linkend="exerciseInsertLoginCredentials"/>.</para> - </answer> - </qandaentry> - </qandadiv> - </qandaset> - </section> - - <section xml:id="sectHandlingNullValues"> - <title>Handling NULL values.</title> - - <qandaset defaultlabel="qanda" xml:id="quandaentry_HandlingNull"> - <qandadiv> - <qandaentry> - <question> - <para>The attribute <code>nickname</code> in our database - table <code>Friends</code> allows <code>NULL</code> values:</para> - - <programlisting language="none">INSERT INTO Friends VALUES - (1, 'Jim', '1991-10-10') - ,(2, <emphasis role="bold"> NULL</emphasis>, '2003-5-24') - ,(3, 'Mick', '2001-12-30');</programlisting> - - <para>Starting our current application yields:</para> - - <programlisting language="none">1, Jim, 1991-10-10 -2, null, 2003-05-24 -3, Mick, 2001-12-30</programlisting> - - <para>This might be confused with a person having the - nickname <quote>null</quote>. 
Instead we would like to - have:</para> - - <programlisting language="none">1, Jim, 1991-10-10 -2, -Name unknown- , 2003-05-24 -3, Mick, 2001-12-30</programlisting> - - <para>Extend the current code of - <classname>sda.jdbc.intro.SimpleRead</classname> to produce - the above result in case of nickname <code>NULL</code> - values.</para> - - <para>Hint: Read the documentation of <link - xlink:href="">wasNull()</link>.</para> - </question> - - <answer> - <para>A possible implementation is being given in - <classname>sda.jdbc.intro.v1.SimpleRead</classname>.</para> - </answer> - </qandaentry> - </qandadiv> - </qandaset> - </section> - - <section xml:id="sectUserAuthStrategy"> - <title>A user authentication <quote>strategy</quote></title> - - <qandaset defaultlabel="qanda" xml:id="exerciseInsecureAuth"> - <qandadiv> - <qandaentry> - <question> - <para>Our current application for entering - <code>Person</code> records lacks authentication: A user - simply connects to the database using credentials being hard - coded in a properties file. 
A programmer suggests - implementing authentication based on the following extension of - the <code>Person</code> table:</para> - - <programlisting language="none">CREATE TABLE Person ( - name char(80) NOT NULL - ,email CHAR(20) NOT NULL UNIQUE - ,login CHAR(10) UNIQUE -- login names must be unique -- - ,password CHAR(20) -);</programlisting> - - <para>On clicking <quote>Connect</quote> a user may enter - his login name and password, <quote>fred</quote> and - <quote>12345678</quote> in the following example:</para> - - <figure xml:id="figLogin"> - <title>Login credentials for database connection</title> - - <mediaobject> - <imageobject> - <imagedata fileref="Ref/Screen/login.screen.png" - scale="90"/> - </imageobject> - </mediaobject> - </figure> - - <para>Based on these input values the following SQL query is - executed by a - <classname>java.sql.Statement</classname> object:</para> - - <programlisting language="none">SELECT * FROM Person WHERE login='<emphasis - role="bold">fred</emphasis>' and password = '<emphasis - role="bold">12345678</emphasis>'</programlisting> - - <para>Since the login attribute is UNIQUE we are sure to - receive either 0 or 1 dataset. Our programmer proposes to - grant login if the query returns at least one - dataset.</para> - - <para>Discuss this implementation sketch with a colleague. - Do you think this is a sensible approach? <emphasis - role="bold">Write down</emphasis> your results.</para> - </question> - - <answer> - <para>The approach is essentially unusable due to severe - security implications. Since it is based on - <classname>java.sql.Statement</classname> rather than on - <classname>java.sql.PreparedStatement</classname> objects it - is vulnerable to SQL injection attacks. 
A user may enter the - following password value in the GUI:</para> - - <programlisting language="none">sd' OR '1' = '1</programlisting> - - <para>Based on the login name <quote>fred</quote> the - following SQL string is crafted:</para> - - <programlisting language="none">SELECT * FROM Person WHERE login='fred' and password = 'sd' OR <emphasis - role="bold">'1' = '1'</emphasis>;</programlisting> - - <para>Since <code>AND</code> binds more tightly than <code>OR</code> and the WHERE clause's last component always - evaluates to true, the condition holds for every row: all objects from the <code>Person</code> - relation are returned thus permitting login.</para> - - <para>The implementation approach suffers from a second - deficiency: The passwords are stored in clear text. If an - attacker gains access to the <code>Person</code> table he'll - immediately retrieve the passwords of all users. This - problem can be solved by storing hash values of passwords - rather than the clear text values themselves.</para> - </answer> - </qandaentry> - </qandadiv> - </qandaset> - </section> - - <section xml:id="sectPasswordsHashed"> - <title>Passwords and hash values</title> - - <qandaset defaultlabel="qanda" xml:id="exerciseHashTraining"> - <qandadiv> - <qandaentry> - <question> - <para>In exercise <xref linkend="exerciseInsecureAuth"/> we - discarded the idea of clear text passwords in favour of - password hashes. In order to resist rainbow table attacks - so-called salted hashes are superior. You should read <uri - xlink:href=""></uri> - for overview purposes. 
The article contains further - references at the bottom of the page.</para> - - <para>With respect to an implementation <uri - xlink:href=""></uri> - provides a simple example for:</para> - - <itemizedlist> - <listitem> - <para>Creating a salted hash from a given password - string.</para> - </listitem> - - <listitem> - <para>Verifying whether a hash string matches a given clear text - password.</para> - </listitem> - </itemizedlist> - - <para>The example uses an external library. 
On <productname - xlink:href="">Ubuntu</productname> - Linux this may be installed by issuing - <command>aptitude</command> <option>install</option> - <option>libcommons-codec-java</option>. After successful - installation the file - <filename>/usr/share/java/commons-codec-1.5.jar</filename> - may be appended to your <envar>CLASSPATH</envar>.</para> - - <para>You may as well use <uri - xlink:href=""></uri> - as a starting point. This example works standalone without - needing an external library. Note: This example produces - different (incompatible) hash values.</para> - - <para>Create a simple main() method to experiment with the - two class methods.</para> - </question> - - <answer> - <para>Starting from <uri - xlink:href=""></uri> - we create a slightly modified class - <classname>sda.jdbc.intro.auth.HashProvider</classname> - offering both hash providing <coref - linkend="hashProviderMethod"/> and verifying <coref - linkend="hashVerifyMethod"/> methods:</para> - - <programlisting language="none">package sda.jdbc.intro.auth; -... -public class HashProvider { -... - /** Computes a salted PBKDF2 hash of given plaintext password - suitable for storing in a database. */ - public static <emphasis role="bold">String getSaltedHash</emphasis> <co - xml:id="hashProviderMethod"/>(char [] password) { - byte[] salt; - try { - salt = SecureRandom.getInstance("SHA1PRNG").generateSeed(saltLen); - // store the salt with the password - return Base64.encodeBase64String(salt) + "$" + hash(password, salt); - } catch (NoSuchAlgorithmException e) { - e.printStackTrace(); - } - System.exit(1); - return null; - } - - /** Checks whether given plaintext password corresponds - to a stored salted hash of the password. 
*/ - public static <emphasis role="bold">boolean check</emphasis> <co - xml:id="hashVerifyMethod"/>(char[] password, String stored){ - String[] saltAndPass = stored.split("\\$"); - if (saltAndPass.length != 2) - return false; - String hashOfInput = hash(password, Base64.decodeBase64(saltAndPass[0])); - return hashOfInput.equals(saltAndPass[1]); - } -...}</programlisting> - - <para>We may test the two class methods - <methodname>sda.jdbc.intro.auth.HashProvider.getSaltedHash(char[])</methodname>(...) - and - <methodname>sda.jdbc.intro.auth.HashProvider.check(char[],String)</methodname> - by a separate driver class. Notice the <quote>$</quote> sign - <coref linkend="saltPwhashSeparator"/> separating salt and - password hash:</para> - - <programlisting language="none">package sda.jdbc.intro.auth; - -public class TestHashProvider { - - public static void main(String [] args) throws Exception { - final char [] clearText = {'s', 'e', 'c'}; - final String hash = <emphasis role="bold">HashProvider.getSaltedHash(clearText)</emphasis>; - System.out.println("Hash:" + hash); - if (HashProvider.check(clearText, <co - xml:id="saltPwhashSeparator"/> - "<emphasis role="bold">HwX2DkuYiwp7xogm3AGndza8DKRVvCMntxRvCrCGFPw=</emphasis>$<emphasis - role="bold">6Ix11yHNB4uPZuF2IQYxVV/MYragJwTDE33OIFR9a24=</emphasis>")) { - System.out.println("hash matches"); - } else { - System.out.println("hash does not match"); ...</programlisting> - </answer> - </qandaentry> - </qandadiv> - </qandaset> - </section> - - <section xml:id="guiAuthenticateTheRealMcCoy"> - <title>Gui authentication: The real McCoy</title> - - <qandaset defaultlabel="qanda" - xml:id="exerciseInsertLoginCredentials"> - <qandadiv> - <qandaentry> - <question> - <para>We now implement a refined version to enter - <code>Person</code> records based on the solutions of two - related exercises:</para> - - <glosslist> - <glossentry> - <glossterm><xref - linkend="exercisefilterUserInput"/></glossterm> - - <glossdef> - <para>Avoiding 
SQL injection by sanitizing user - input</para> - </glossdef> - </glossentry> - - <glossentry> - <glossterm><xref - linkend="exerciseSqlInjectPrepare"/></glossterm> - - <glossdef> - <para>Avoiding SQL injection by using - <classname>java.sql.PreparedStatement</classname> - objects.</para> - </glossdef> - </glossentry> - </glosslist> - - <para>A better solution should combine both techniques. - Immunity to SQL injection is a basic requirement. Checking an e-mail address - for minimal conformance is an added value.</para> - - <para>In order to address authentication the relation <code>Person</code> - has to be extended appropriately. The GUI needs two - additional fields for login name and password as well. The - following video demonstrates the intended behaviour:</para> - - <figure xml:id="videoConnectAuth"> - <title>Intended usage behaviour for insertion of data - records.</title> - - <mediaobject> - <videoobject> - <videodata fileref="Ref/Video/connectauth.mp4"/> - </videoobject> - </mediaobject> - </figure> - - <para>Don't forget to use password hashes like those from - <xref linkend="exerciseHashTraining"/>. Due to their length - you may want to consider the data type - <code>TEXT</code>.</para> - </question> - - <answer> - <para>In comparison to earlier versions it does make sense - to add some internal container structures. 
First we note, - that each GUI input field requires:</para> - - <itemizedlist> - <listitem> - <para>A label like <quote>Enter password</quote>.</para> - </listitem> - - <listitem> - <para>A corresponding field object to hold user entered - input.</para> - </listitem> - - <listitem> - <para>A validator checking for correctness of entered - data.</para> - </listitem> - - <listitem> - <para>A label or text field for warning messages in case - of invalid user input.</para> - </listitem> - </itemizedlist> - - <para>First we start by grouping label <coref - linkend="uiuLabel"/>, input field's verifier <coref - linkend="uiuVerifier"/> and the error message label <coref - linkend="uiuErrmsg"/> in - <classname>sda.jdbc.intro.auth.UserInputUnit</classname>:</para> - - <programlisting language="none">package sda.jdbc.intro.auth; -... -public class UserInputUnit { - - final JLabel label; <co xml:id="uiuLabel"/> - final InputVerifierNotify verifier; <co xml:id="uiuVerifier"/> - final JLabel errorMessage; <co xml:id="uiuErrmsg"/> - - public UserInputUnit(final String guiText, final InputVerifierNotify verifier) { - this.label = new JLabel(guiText); - this.verifier = verifier; - errorMessage = new JLabel(); - } ...</programlisting> - - <para>The actual GUI text field is being defined <coref - linkend="verfierGuiField"/> in class - <classname>sda.jdbc.intro.auth.InputVerifierNotify</classname>:</para> - - <programlisting language="none">package sda.jdbc.intro.auth; -... 
-public abstract class InputVerifierNotify extends InputVerifier { - - protected final String errorMessage; - public final JLabel validationLabel; - public final JTextField field; <co xml:id="verfierGuiField"/> - - public InputVerifierNotify(final JTextField field, final String errorMessage) { ...</programlisting> - - <para>We need two field verifier classes derived from - <classname>sda.jdbc.intro.auth.InputVerifierNotify</classname>:</para> - - <glosslist> - <glossentry> - <glossterm><classname>sda.jdbc.intro.auth.RegexpVerifier</classname></glossterm> - - <glossdef> - <para>This one is well known from earlier versions and - is used to validate text input fields by regular - expressions.</para> - </glossdef> - </glossentry> - - <glossentry> - <glossterm><classname>sda.jdbc.intro.auth.EqualValueVerifier</classname></glossterm> - - <glossdef> - <para>This verifier class is responsible for checking - that our two password fields hold identical - values.</para> - </glossdef> - </glossentry> - </glosslist> - - <para>All these components get assembled in - <classname>sda.jdbc.intro.auth.InsertPerson</classname>. We - remark some important points:</para> - - <programlisting language="none">package sda.jdbc.intro.auth; -... -public class InsertPerson extends JFrame { -... // GUI attributes for user input - final UserInputUnit name = <co linkends="listingInsertUserAuth-1" - xml:id="listingInsertUserAuth-1-co"/> - new UserInputUnit( - "Name", - new RegexpVerifier(new JTextField(15), "^[^;'\"]+$", "No special characters allowed")); - - // We need a reference to the password field to avoid - // casting from JTextField later. - private final JPasswordField passwordField = new JPasswordField(10); <co - linkends="listingInsertUserAuth-2" - xml:id="listingInsertUserAuth-2-co"/> - private final UserInputUnit password = - new UserInputUnit( - "Password", - new RegexpVerifier(passwordField, "^.{6,20}$", "length from 6 to 20 characters")); -... 
- private final UserInputUnit passwordRepeat = - new UserInputUnit( - "repeat pass.", - new EqualValueVerifier <co linkends="listingInsertUserAuth-3" - xml:id="listingInsertUserAuth-3-co"/> (new JPasswordField(10), passwordField, "Passwords do not match")); - - private final UserInputUnit [] userInputUnits = <co - linkends="listingInsertUserAuth-4" - xml:id="listingInsertUserAuth-4-co"/> - {name, email, login, password, passwordRepeat}; -... - private void userLoginDialog() {...} -... - public InsertPerson (){ -... - databaseFieldPanel.setLayout(new GridLayout(0, 3)); //Third column for validation label - add(databaseFieldPanel); - - for (UserInputUnit unit: userInputUnits) { <co - linkends="listingInsertUserAuth-5" - xml:id="listingInsertUserAuth-5-co"/> - databaseFieldPanel.add(unit.label); - databaseFieldPanel.add(unit.verifier.field); - databaseFieldPanel.add(unit.verifier.validationLabel); - } - insertButton.addActionListener(new ActionListener() { - @Override public void actionPerformed(ActionEvent e) { - if (inputValuesAllValid()) { - if (persistenceHandler.add( <co - linkends="listingInsertUserAuth-6" - xml:id="listingInsertUserAuth-6-co"/> - name.getText(), - email.getText(), - login.getText(), - passwordField.getPassword())) { - clearMask(); -...} - private void clearMask() { <co linkends="listingInsertUserAuth-7" - xml:id="listingInsertUserAuth-7-co"/> - for (UserInputUnit unit: userInputUnits) { - unit.verifier.field.setText(""); - unit.verifier.clear(); - } - } - private boolean inputValuesAllValid() {<co - linkends="listingInsertUserAuth-8" - xml:id="listingInsertUserAuth-8-co"/> - for (UserInputUnit unit: userInputUnits) { - if (!unit.verifier.verify(unit.verifier.field)){ - return false; - } - } - return true; - } -}</programlisting> - - <calloutlist> - <callout arearefs="listingInsertUserAuth-1-co" - xml:id="listingInsertUserAuth-1"> - <para>All GUI related stuff for entering a user's - name</para> - </callout> - - <callout 
arearefs="listingInsertUserAuth-2-co" - xml:id="listingInsertUserAuth-2"> - <para>Password fields need special treatment: - <code>getText()</code> is superseded by - <code>getPassword()</code>. In order to avoid casts from - <classname>javax.swing.JTextField</classname> to - <classname>javax.swing.JPasswordField</classname> we - simply keep an extra reference.</para> - </callout> - - <callout arearefs="listingInsertUserAuth-3-co" - xml:id="listingInsertUserAuth-3"> - <para>In order to check both password fields for - identical values we need a different validator - <classname>sda.jdbc.intro.auth.EqualValueVerifier</classname> - expecting both password fields in its - constructor.</para> - </callout> - - <callout arearefs="listingInsertUserAuth-4-co" - xml:id="listingInsertUserAuth-4"> - <para>All 5 user input elements get grouped by an array. - This allows for iterations like in <coref - linkend="listingInsertUserAuth-7-co"/> or <coref - linkend="listingInsertUserAuth-8-co"/>.</para> - </callout> - - <callout arearefs="listingInsertUserAuth-5-co" - xml:id="listingInsertUserAuth-5"> - <para>Adding all GUI elements to the base pane in a - loop.</para> - </callout> - - <callout arearefs="listingInsertUserAuth-6-co" - xml:id="listingInsertUserAuth-6"> - <para>Providing user entered values to the persistence - provider.</para> - </callout> - - <callout arearefs="listingInsertUserAuth-7-co" - xml:id="listingInsertUserAuth-7"> - <para>Whenever a dataset has been successfully sent to - the database we have to clean our GUI to possibly enter - another record.</para> - </callout> - - <callout arearefs="listingInsertUserAuth-8-co" - xml:id="listingInsertUserAuth-8"> - <para>Thanks to our grouping aggregation of individual - input GUI field validation states becomes easy.</para> - </callout> - </calloutlist> - </answer> - </qandaentry> - </qandadiv> - </qandaset> - </section> - - <section xml:id="sectArchitectSecurityConsiderations"> - <title>Architectural security 
considerations</title> - - <qandaset defaultlabel="qanda" xml:id="quandaentry_ArchSecurity"> - <qandadiv> - <qandaentry> - <question> - <para>In <xref linkend="exerciseInsertLoginCredentials"/> we - achieved end user credential protection. How about the - overall application security? Provide improvement proposals - if appropriate. Hint: Consider the way credentials are being - supplied.</para> - </question> - - <answer> - <para>Connecting the client to our database server solely - depends on credentials <coref - linkend="databaseUserHdmPassword"/> being stored in a - properties file - <filename></filename>:</para> - - <programlisting language="none">PersistenceHandler.jdbcUrl=jdbc:mysql://localhost:3306/hdm -PersistenceHandler.username=hdmuser <co xml:id="databaseUserHdmUsername"/> -PersistenceHandler.password=<emphasis role="bold">XYZ</emphasis> <co - xml:id="databaseUserHdmPassword"/></programlisting> - - <para>This properties file is user accessible and contains - the password in clear text. Arbitrary applications - connecting to the database server using this account have - all permissions granted to <code>hdmuser</code> <coref - linkend="databaseUserHdmUsername"/>. In order for our - application to work correctly the set of granted permissions - must include at least the right to insert datasets. Thus new users e.g. - <code>smith</code> including credentials may be inserted. - Afterwards the original application can be started by - logging in as <code>smith</code>.</para> - - <para>Conclusion: The current application architecture is - seriously flawed with respect to security.</para> - - <para>Rather than using a common database account - <code>hdmuser</code> we may configure per-user accounts on - the database server having individual user credentials. This - way user credentials are no longer stored in our - <code>Person</code> table but are managed by the - database server's user management and privilege facilities. 
- This completely avoids storing credentials on the client - side.</para> - </answer> - </qandaentry> - </qandadiv> - </qandaset> - </section> - - <section xml:id="sectRelationadatal2Xml"> - <title>Reversing <xref linkend="glo_XML"/> to Rdbms</title> - - <qandaset defaultlabel="qanda" xml:base="qandaRelationaldata2Xml" - xml:id="qandaRelationaldata2Xml"> - <qandadiv> - <qandaentry> - <question> - <para>Reverse exercise <xref - linkend="qandaXmldata2relational"/> to read Rdbms data via - <xref linkend="glo_JDBC"/> and export corresponding XML data - using Jdom.</para> - </question> - - <answer> - <annotation role="make"> - <para role="eclipse">P/rdbms2catalog</para> - </annotation> - </answer> - </qandaentry> - </qandadiv> - </qandaset> - </section> - - <section xml:id="sda1SaxRdbms"> - <title>Generating HTML from XML and Rdbms data using SAX and <xref - linkend="glo_JDBC"/>.</title> - - <qandaset defaultlabel="qanda" xml:id="exercise_saxAttrib"> - <qandadiv> - <qandaentry xml:id="saxRdbms"> - <question> - <para>Implement the example given in <xref - linkend="saxRdbmsAccessPrinciple"/> to produce the output - sketched in <xref linkend="saxPriceOut"/>. 
You may start by - implementing <emphasis>and testing</emphasis> the following - methods of an RDBMS interfacing class using <xref - linkend="glo_JDBC"/>:</para> - - <programlisting language="none">package sax.rdbms; - -public class RdbmsAccess { - - public void connect(final String host, final int port, - final String userName, final String password) { - // <emphasis role="bold">open connection to a database</emphasis> - } - public String readPrice(final String articleNumber) { - return "0"; // <emphasis role="bold">To be implemented as access to a ResultSet object</emphasis> - } - public void close() { - // <emphasis role="bold">close database connection</emphasis> - } -}</programlisting> - - <para>You may find it helpful to write a small testbed for - the RDBMS access functionality prior to integrating it into - your <acronym - xlink:href="">SAX</acronym> - application producing HTML output.</para> - </question> - - <answer> - <para>We start by creating a suitable RDBMS table:</para> - - <programlisting language="none">CREATE SCHEMA AUTHORIZATION midb2 -CREATE TABLE Product( - orderNo CHAR(10) NOT NULL PRIMARY KEY - ,price DECIMAL (9,2) NOT NULL -)</programlisting> - - <para>Next we insert some toy data:</para> - - <programlisting language="none">INSERT INTO Product VALUES('x-223', 330.20); -INSERT INTO Product VALUES('w-124', 110.40);</programlisting> - - <para>Now we implement our RDBMS access class:</para> - - <programlisting language="none">package dom.xsl; -...
-public class DbAccess { - - public void connect(final String jdbcUrl, - final String userName, final String password) { - try { - conn = DriverManager.getConnection(jdbcUrl, userName, password); - priceQuery = conn.prepareStatement(sqlPriceQuery); - } catch (SQLException e) { - System.err.println("Unable to open connection to database:" + e);} - } - public String readPrice(final String articleNumber) { - String result; - try { - priceQuery.setString(1, articleNumber); - final ResultSet rs = priceQuery.executeQuery(); - if (rs.next()) { - result = rs.getString("price"); - } else { - result = "No price available for article '" + articleNumber + "'"; - } - } catch (SQLException e) { - result = "Error reading price for article '" + articleNumber + "':" + e; - } - return result; - } - public void close() { - try {conn.close();} catch (SQLException e) { - System.err.println("Error closing database connection:" + e); - } - } - static { - try { Class.forName(""); - } catch (ClassNotFoundException e) { - System.err.println("Unable to register Driver:" + e);} - } - private static final String sqlPriceQuery = - "SELECT price FROM Product WHERE orderNo = ?"; - private PreparedStatement priceQuery = null; - private Connection conn = null; -}</programlisting> - - <para>This access layer may be tested independently from - handling catalog instances:</para> - - <programlisting language="none">package dom.xsl; - -public class DbAccessDriver { - - public static void main(String[] args) { - final DbAccess dbaccess = new DbAccess(); - dbaccess.connect("jdbc:db2://", - "midb2", "password"); - System.out.println(dbaccess.readPrice("x-223")); - System.out.println(dbaccess.readPrice("..aaargh!")); - dbaccess.close(); - } -}</programlisting> - - <para>If the above test succeeds we may embed the RDBMS - access layer into our <acronym - xlink:href="">SAX</acronym> - handler:</para> - - <programlisting language="none">package sax.rdbms; -...
-public class HtmlEventHandler extends DefaultHandler{ - public void startDocument() { - dbaccess.connect("jdbc:db2://", - "midb2", "password"); - System.out.println("<html><head><title>Catalog</title></head>"); - } - public void endDocument() { - System.out.println("</html>"); - dbaccess.close(); - } - public void startElement(String namespaceUri, String localName, - String rawName, Attributes attrs){ - if (rawName.equals("catalog")){ - System.out.println("<body><H1>A catalog</H1>" - +"<table border='1'><tbody>"); - System.out.println("<tr><th>Order number</th>\n" - + "<th>Price</th>\n" - +" <th>Product</th></tr>"); - } else if (rawName.equals("item")){ - final String orderNo = attrs.getValue("orderNo"); - System.out.print("<tr><td>" + orderNo - + "</td>\n<td>" + dbaccess.readPrice(orderNo) - + "</td>\n<td>"); - } else { - System.err.println("Element '" + rawName + "' unknown"); - } - } - public void endElement(String namespaceUri, String localName, - String rawName) { - if (rawName.equals("catalog")){ - System.out.println("</tbody></table>"); - } else if (rawName.equals("item")){ - System.out.println("</td></tr>\n"); - } - } - public void characters(char[] ch, int start, int length) { - System.out.print(new String(ch, start, length)); - } - private DbAccess dbaccess = new DbAccess(); -}</programlisting> - </answer> - </qandaentry> - </qandadiv> - </qandaset> - </section> - </section> - </section> - </chapter> - - <chapter xml:id="chapUnitTesting"> - <title>Unit testing with <productname - xlink:href="">TestNG</productname></title> - - <para>This chapter presents a very short introduction to the basic usage - of unit testing. 
We start with a simple stack implementation:</para> - - <programlisting language="none">package sda.unittesting; - -public class MyStack { - int [] data = new int[5]; - int numElements = 0; - - public void push(final int n) { - data[numElements] = n; - numElements++; - } - public int pop() { - numElements--; - return data[numElements]; - } - public int top() { - return data[numElements - 1]; - } - public boolean empty() { - return 0 == numElements; - } -}</programlisting> - - <para>Readers being familiar with stacks will immediately notice a - deficiency in the above code: This stack is actually bounded. It only - allows us to store a maximum number of five integer values.</para> - - <para>The following implementation allows us to functionally test our - <classname>sda.unittesting.MyStack</classname> implementation with respect - to the usual stack behaviour:</para> - - <programlisting language="none" linenumbering="numbered">package sda.unittesting; - -public class MyStackFuncTest { - - private static void assertTrue(boolean status) { - if (!status) { - throw new RuntimeException("Assert failed"); - } - } - public static void main(String[] args) { - final MyStack stack = new MyStack(); - // Test 1: A new MyStack instance should not contain any elements. 
- assertTrue(stack.empty()); - - // Test 2: Adding and removal - stack.push(4); - assertTrue (!stack.empty()); - assertTrue (4 == stack.top()); - assertTrue (4 == stack.pop()); - assertTrue (stack.empty()); - - // Test 3: Trying to add more than five values - stack.push(1);stack.push(2);stack.push(3);stack.push(4); - stack.push(5); - stack.push(6); - assertTrue(6 == stack.pop()); - } -}</programlisting> - - <para>Execution yields a runtime exception which is due to the attempted - insert operation <code>stack.push(6)</code>:</para> - - <programlisting language="none">Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 5 - at sda.unittesting.MyStack.push( - at sda.unittesting.MyStackFuncTest.main(</programlisting> - - <para>The execution result is easy to understand since our - <classname>sda.unittesting.MyStack</classname> implementation only allows - storing five values.</para> - - <para>Our testing application is fine so far. It does however lack some - features:</para> - - <itemizedlist> - <listitem> - <para>Automatic initialization before starting tests and finalization - at the end</para> - </listitem> - - <listitem> - <para>Our test is monolithic: We used comments to document different - tests. This knowledge is implicit and thus invisible to testing - frameworks. Test results (failure/success) cannot be assigned to - individual tests such as test 1 or test 2.</para> - </listitem> - - <listitem> - <para>Aggregation and visualization of test results</para> - </listitem> - - <listitem> - <para>Dependencies between individual tests</para> - </listitem> - - <listitem> - <para>Ability to enable and disable tests according to a project's - maturity level.
In our example test 3 might be disabled until an - unbounded implementation gets completed.</para> - </listitem> - </itemizedlist> - - <para>Testing frameworks like <productname - xlink:href="">JUnit</productname> or <productname - xlink:href="">TestNG</productname> provide means for - efficient and flexible test organization. Using <productname - xlink:href="">TestNG</productname> our current test - application including only test 1 and test 2 reads:</para> - - <programlisting language="none">package sda.unittesting; - -import org.testng.annotations.Test; - -public class MyStackTestSimple { - - final MyStack stack = new MyStack(); - - @Test - public void empty() { - assert(stack.empty()); - } - @Test - public void pushPopEmpty() { - assert (stack.empty()); - stack.push(4); - assert (!stack.empty()); - assert (4 == stack.top()); - assert (4 == stack.pop()); - assert (stack.empty()); - } -}</programlisting> - - <para>We notice the absence of a <function>main()</function> method. Our - testing framework uses the above code for test definitions. In contrast to - our homebrew solution the individual tests are now defined in a machine - readable fashion. This allows for sophisticated statistics. Executing - the tests with <productname xlink:href="">TestNG</productname> - produces the following results:</para> - - <programlisting language="none">PASSED: empty -PASSED: pushPopEmpty - -=============================================== - Default test - Tests run: 2, Failures: 0, Skips: 0 -=============================================== - - -=============================================== -Default suite -Total tests run: 2, Failures: 0, Skips: 0 -===============================================</programlisting> - - <para>Both tests run successfully. So why did we omit test 3 which is - bound to fail? We now add it to the test suite:</para> - - <programlisting language="none">package sda.unittesting; -... -public class MyStackTestSimple1 { -...
- @Test - public void empty() { - assert(stack.empty()); -... - - @Test - public void push6() { - stack.push(1); - stack.push(2); - stack.push(3); - stack.push(4); - stack.push(5); - stack.push(6); - assert (6 == stack.pop()); - } ...</programlisting> - - <para>As expected test 3 fails. But the result shows test 2 failing as - well:</para> - - <programlisting language="none">PASSED: empty -FAILED: push6 -java.lang.ArrayIndexOutOfBoundsException: 5 - at sda.unittesting.MyStack.push( - at sda.unittesting.MyStackTestSimple1.push6( - at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) - ... - -FAILED: pushPopEmpty -java.lang.AssertionError - at sda.unittesting.MyStackTestSimple1.pushPopEmpty( - at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) - ... - -=============================================== - Default test - Tests run: 3, Failures: 2, Skips: 0 -===============================================</programlisting> - - <para>This unexpected result is due to the execution order of the three - individual tests. Within our class - <classname>sda.unittesting.MyStackTestSimple1</classname> the three tests - appear in the sequence test 1, test 2 and test 3. This however is just the - order of source code. The testing framework will not infer any order and - thus execute our three tests in <emphasis role="bold">arbitrary</emphasis> - order. The execution log shows the actual order:</para> - - <orderedlist> - <listitem> - <para>Test <quote><code>empty</code></quote></para> - </listitem> - - <listitem> - <para>Test <quote><code>push6</code></quote></para> - </listitem> - - <listitem> - <para>Test <quote><code>pushPopEmpty</code></quote></para> - </listitem> - </orderedlist> - - <para>So the second test will raise an exception and leave the stack - filled with the maximum possible five elements. 
Thus it is not empty and - the <quote><code>pushPopEmpty</code></quote> test fails as well.</para> - - <para>If we want to avoid this type of error we may:</para> - - <itemizedlist> - <listitem> - <para>Declare tests in separate test classes.</para> - </listitem> - - <listitem> - <para>Define dependencies such as <quote>test X can only be executed after test - Y</quote>.</para> - </listitem> - </itemizedlist> - - <para>The <productname xlink:href="">TestNG</productname> - framework offers a feature which allows the definition of test groups and - dependencies between them. We use this feature to refine our test - definition:</para> - - <programlisting language="none">package sda.unittesting; -... -public class MyStackTest { - ... - @Test (<emphasis role="bold">groups = "basic"</emphasis>) - public void empty() { - assert(stack.empty()); - } - @Test (<emphasis role="bold">groups = "basic"</emphasis>) - public void pushPopEmpty() { - ... - } - - @Test (<emphasis role="bold">dependsOnGroups = "basic"</emphasis>) - public void push6() { - ... - }</programlisting> - - <para>The first two tests now belong to the same test group - <quote>basic</quote>. The <emphasis role="bold"><code>dependsOnGroups = - "basic"</code></emphasis> declaration guarantees that our - <code>push6</code> test is launched only after all tests of the - <quote>basic</quote> group have completed. So we get the - expected result:</para> - - <programlisting language="none">PASSED: empty -PASSED: pushPopEmpty -FAILED: push6 -java.lang.ArrayIndexOutOfBoundsException: 5 - at sda.unittesting.MyStack.push( - at sda.unittesting.MyStackTest.push6( - at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) -... - - -=============================================== - Default test - Tests run: 3, Failures: 1, Skips: 0 -===============================================</programlisting> - - <para>In fact the order between the first two tests might be critical as - well.
The <quote><code>pushPopEmpty</code></quote> test leaves our stack - in an empty state. If this is not the case reversing the execution order - of <quote><code>pushPopEmpty</code></quote> and - <quote><code>empty</code></quote> would cause an error as well.</para> - - <para>Programming <abbrev - xlink:href="">IDE</abbrev>s - like eclipse provide elements for test result visualization. Our last test - gets summarized as:</para> - - <screenshot> - <info> - <title><productname - xlink:href="">TestNG</productname> result - presentation in eclipse</title> - </info> - - <mediaobject> - <imageobject> - <imagedata fileref="Ref/Screen/eclipseTestngResult.screen.png" - scale="75"/> - </imageobject> - </mediaobject> - </screenshot> - - <para>We can drill down from a result of type failure to its occurrence - within the corresponding code.</para> - </chapter> - - <chapter xml:id="fo"> - <title>Generating printed output</title> - - <titleabbrev>Print</titleabbrev> - - <section xml:id="foIntro"> - <title>Online and print versions</title> - - <titleabbrev>online / print</titleabbrev> - - <para>We already learned how to transform XML documents into HTML by - means of a <abbrev xlink:href="">XSL</abbrev> - style sheet processor. In principle we may create printed output by - using a HTML Browser's print function. However the result will not meet - reasonable typographical standards. A list of commonly required features - for printed output includes:</para> - - <variablelist> - <varlistentry> - <term>Line breaks</term> - - <listitem> - <para>Text paragraphs have to be divided into lines. To achieve - best results the processor must implement the hyphenation rules of - the language in question in order to automatically hyphenate long - words. 
This is especially important for text columns of limited - width as appearing in newspapers.</para> - </listitem> - </varlistentry> - - <varlistentry> - <term>Page breaks</term> - - <listitem> - <para>Since printed pages are limited in height the content has to - be broken into pages. This may be difficult to achieve:</para> - - <itemizedlist> - <listitem> - <para>Large images being indivisible may have to be deferred - to the following page leaving large amounts of empty - space.</para> - </listitem> - - <listitem> - <para>Long tables may have to be subdivided into smaller - blocks. Thus it may be required to define sets of additional - footers like <quote>to be continued on the next page</quote> - and additional table headers containing column descriptions on - subsequent pages.</para> - </listitem> - </itemizedlist> - </listitem> - </varlistentry> - - <varlistentry> - <term>Page references</term> - - <listitem> - <para>Document internal references via <link - xlink:href="">ID</link> / <link - xlink:href="">IDREF</link> pairs may - be represented as page references like <quote>see page - 32</quote>.</para> - </listitem> - </varlistentry> - - <varlistentry> - <term>Left and right pages</term> - - <listitem> - <para>Books usually have a different layout for - <quote>left</quote> and <quote>right</quote> pages. Page numbers - usually appear on the left side of a <quote>left</quote> page and - vice versa.</para> - - <para>Very often the head of each page contains additional - information e.g. a chapter's name on each <quote>left</quote> page - head and the actual section's name on each <quote>right</quote> - page's head.</para> - - <para>In addition chapters usually start on a <quote>right</quote> - page. Sometimes a chapter's starting page has special layout - features e.g. 
a missing description in the page's head which will - only be given on subsequent pages.</para> - </listitem> - </varlistentry> - - <varlistentry> - <term>Footnotes</term> - - <listitem> - <para>Footnotes have to be numbered on a per page basis and have - to appear on the current page.</para> - </listitem> - </varlistentry> - </variablelist> - </section> - - <section xml:id="foStart"> - <title>A simple <abbrev - xlink:href="">FO</abbrev> - document</title> - - <titleabbrev>Simple <abbrev - xlink:href="">FO</abbrev></titleabbrev> - - <para>A renderer for printed output from XML content also needs - instructions on how to format the different elements. A common way to - define these formatting properties is by using the <emphasis>Formatting - Objects</emphasis> (<abbrev - xlink:href="">FO</abbrev>) - standard. <abbrev - xlink:href="">FO</abbrev> - documents may be compared to HTML. An HTML document has to be rendered by - a piece of software called a browser in order to be viewed as an image. - Likewise <abbrev - xlink:href="">FO</abbrev> - documents have to be rendered by a piece of software called a formatting - objects processor which typically yields PostScript or PDF output.
As a - starting point we take a simple example:</para> - - <figure xml:id="foHelloWorld"> - <title>The simplest <abbrev - xlink:href="">FO</abbrev> - document</title> - - <programlisting language="none"><?xml version="1.0" encoding="utf-8"?> -<fo:root xmlns:fo=""> - - <fo:layout-master-set> - <!-- Define a simple page layout --> - <fo:simple-page-master master-name="simplePageLayout" - page-width="60mm" page-height="100mm"> - <fo:region-body/> - </fo:simple-page-master> - </fo:layout-master-set> - <!-- Print a set of pages using the previously defined layout --> - <fo:page-sequence master-reference="simplePageLayout"> - <fo:flow flow-name="xsl-region-body"> - <emphasis role="bold"><fo:block>Hello, World ...</fo:block></emphasis> - </fo:flow> - </fo:page-sequence> -</fo:root></programlisting> - </figure> - - <para>PDF generation is initiated by executing an <abbrev - xlink:href="">FO</abbrev> - processor. At the MI department the script <code>fo2pdf</code> invokes - <orgname>RenderX</orgname>'s <productname - xlink:href="">xep</productname> processor:</para> - - <programlisting language="none">fo2pdf -fo -pdf hello.pdf</programlisting> - - <para>This creates a PDF file which may be printed or previewed by e.g. - <productname xlink:href="">Adobe</productname>'s - Acrobat Reader or evince under Linux. For a list of command line options - see <productname - xlink:href="">xep's - documentation</productname>.</para> - </section> - - <section xml:id="layoutParam"> - <title>Page layout</title> - - <para>The result of our <quote>Hello, World ...</quote> code is not - very impressive. In order to develop more elaborate examples we have to - understand the underlying layout model being defined in a <link - xlink:href="">fo:simple-page-master</link> - element.
First of all <abbrev - xlink:href="">FO</abbrev> - allows to subdivide a physical page into different regions:</para> - - <figure xml:id="foRegionList"> - <title>Regions being defined in a page.</title> - - <mediaobject> - <imageobject> - <imagedata fileref="Ref/Fig/regions.fig"/> - </imageobject> - </mediaobject> - </figure> - - <para>The most important area in this model is denoted by <link - xlink:href="">fo:region-body</link>. - Other regions like <link - xlink:href="">fo:region-before</link> - are typically used as containers for meta information such as chapter - headings and page numbering. We take a closer look to the <link - xlink:href="">fo:region-body</link> - area and supply an example of parameterization:</para> - - <figure xml:id="foParamRegBody"> - <title>A complete <abbrev - xlink:href="">FO</abbrev> - parameterizing of a physical page and the <link - xlink:href="">fo:region-body</link>.</title> - - <programlisting language="none"><?xml version="1.0" encoding="utf-8"?> -<fo:root xmlns:fo="" - font-size="6pt"> - - <fo:layout-master-set> <co xml:id="programlisting_fobodyreg_masterset"/> - <fo:simple-page-master master-name="<emphasis role="bold">simplePageLayout</emphasis>" <co - xml:id="programlisting_fobodyreg_simplepagelayout"/> - page-width = "50mm" page-height = "80mm" - margin-top = "5mm" margin-bottom = "20mm" - margin-left = "5mm" margin-right = "10mm"> - - <fo:region-body <co xml:id="programlisting_fobodyreg_regionbody"/> - margin-top = "10mm" margin-bottom = "5mm" - margin-left = "10mm" margin-right = "5mm"/> - </fo:simple-page-master> - </fo:layout-master-set> - - <fo:page-sequence master-reference="<emphasis role="bold">simplePageLayout</emphasis>"> <co - xml:id="programlisting_fobodyreg_pagesequence"/> - <fo:flow flow-name="xsl-region-body"> <co - xml:id="programlisting_fobodyreg_flow"/> - <fo:block space-after="2mm">Dumb text .. 
dumb text.</fo:block> <co - xml:id="programlisting_fobodyreg_block"/> - <fo:block space-after="2mm">Dumb text .. dumb text.</fo:block> <coref - linkend="programlisting_fobodyreg_block"/> - <fo:block space-after="2mm">Dumb text .. dumb text.</fo:block> <coref - linkend="programlisting_fobodyreg_block"/> - <fo:block space-after="2mm">Dumb text .. dumb text.</fo:block> <coref - linkend="programlisting_fobodyreg_block"/> - </fo:flow> - </fo:page-sequence> -</fo:root></programlisting> - </figure> - - <calloutlist> - <callout arearefs="programlisting_fobodyreg_masterset"> - <para>As the name suggests multiple layout definitions can appear - here. In this example only one layout is defined.</para> - </callout> - - <callout arearefs="programlisting_fobodyreg_simplepagelayout"> - <para>Each layout definition carries a key attribute <code>master-name</code> - being unique with respect to all defined layouts appearing in - <emphasis>the</emphasis> <tag - class="starttag">fo:layout-master-set</tag>. We may thus call it a - <emphasis>primary key</emphasis> attribute. The current layout - definition's key has the value <code>simplePageLayout</code>. The - length specifications appearing here are visualized in <xref - linkend="paramRegBodyVisul"/> and correspond to the white - rectangle.</para> - </callout> - - <callout arearefs="programlisting_fobodyreg_regionbody"> - <para>Each layout definition <emphasis>must</emphasis> have a region - body being the region in which the document's main text flow will - appear. A layout definition <emphasis>may</emphasis> also define - top, bottom and side regions as we will see <link - linkend="paramHeadFoot">later</link>. The body region is shown with - pink background in <xref linkend="paramRegBodyVisul"/>.</para> - </callout> - - <callout arearefs="programlisting_fobodyreg_pagesequence"> - <para>An <abbrev - xlink:href="">FO</abbrev> - document may have multiple page sequences for example one per - chapter of a book.
It <emphasis>must</emphasis> reference an - <emphasis>existing</emphasis> layout definition via its - <code>master-reference</code> attribute. So we may regard this - attribute as a foreign key targeting the set of all defined layout - definitions.</para> - </callout> - - <callout arearefs="programlisting_fobodyreg_flow"> - <para>A flow allows us to define in which region output shall - appear. In the current example only one layout containing one region - of type body definition being able to receive text output - exists.</para> - </callout> - - <callout arearefs="programlisting_fobodyreg_block"> - <para>A <tag class="starttag">fo:block</tag> element may be compared - to a paragraph element <tag class="starttag">p</tag> in HTML. The - attribute <link - xlink:href="">space-after</link>="2mm" - adds a space of two mm after each <link - xlink:href="">fo:block</link> - container.</para> - </callout> - </calloutlist> - - <para>The result looks like:</para> - - <figure xml:id="paramRegBodyVisul"> - <title>Parameterizing page- and region view port. All length - dimensions are in mm.</title> - - <mediaobject> - <imageobject> - <imagedata fileref="Ref/Fig/overlay.fig"/> - </imageobject> - </mediaobject> - </figure> - </section> - - <section xml:id="headFoot"> - <title>Headers and footers</title> - - <titleabbrev>Header/footer</titleabbrev> - - <para>Referring to <xref linkend="foRegionList"/> we now want to add - fixed headers and footers frequently being used for page numbers. In a - textbook each page might have the actual chapter's name in its header. - This name should not change as long as the text below <link - xlink:href="">fo:region-body</link> - still belongs to the same chapter. 
In <abbrev - xlink:href="">FO</abbrev> - this is achieved by:</para> - - <itemizedlist> - <listitem> - <para>Encapsulating each chapter's content in a <link - xlink:href="">fo:page-sequence</link> - of its own.</para> - </listitem> - - <listitem> - <para>Defining the desired header text below <link - xlink:href="">fo:static-content</link> - in the area defined by <link - xlink:href="">fo:region-before</link>.</para> - </listitem> - </itemizedlist> - - <para>The notion <link - xlink:href="">fo:static-content</link> - refers to the fact that the content is constant (static) within the - given page sequence. The new version reads:</para> - - <figure xml:id="paramHeadFoot"> - <title>Parameterizing header and footer.</title> - - <programlisting language="none"><?xml version="1.0" encoding="utf-8"?> -<fo:root xmlns:fo="" - font-size="6pt"> - - <fo:layout-master-set> - <fo:simple-page-master master-name="simplePageLayout" - page-width = "50mm" page-height = "80mm" - margin-top = "5mm" margin-bottom = "20mm" - margin-left = "5mm" margin-right = "10mm"> - - <fo:region-body margin-top = "10mm" margin-bottom = "5mm" <co - xml:id="programlisting_head_foot_bodydef"/> - margin-left = "10mm" margin-right = "5mm"/> - - <fo:region-before extent="5mm"/> <co - xml:id="programlisting_head_foot_beforedef"/> - <fo:region-after extent="5mm"/> <co - xml:id="programlisting_head_foot_afterdef"/> - - </fo:simple-page-master> - </fo:layout-master-set> - - <fo:page-sequence master-reference="simplePageLayout"> - - <fo:static-content flow-name="xsl-region-before"> <co - xml:id="programlisting_head_foot_beforeflow"/> - <fo:block - font-weight="bold" - font-size="8pt">Headertext</fo:block> - </fo:static-content> - - <fo:static-content flow-name="xsl-region-after"> <co - xml:id="programlisting_head_foot_afterflow"/> - <fo:block> - <fo:page-number/> - </fo:block> - </fo:static-content> - - <fo:flow flow-name="xsl-region-body"> - <fo:block space-after="8mm">Dumb text .. 
dumb text.</fo:block> - <fo:block space-after="8mm">Dumb text .. dumb text.</fo:block> - <fo:block space-after="8mm">More text .. more text.</fo:block> - <fo:block space-after="8mm">More text .. more text.</fo:block> - <fo:block space-after="8mm">More text .. more text.</fo:block> - </fo:flow> - </fo:page-sequence> -</fo:root></programlisting> - </figure> - - <calloutlist> - <callout arearefs="programlisting_head_foot_bodydef"> - <para>Defining the body region.</para> - </callout> - - <callout arearefs="programlisting_head_foot_beforedef programlisting_head_foot_afterdef"> - <para>Defining two regions at the top and bottom of each page. The - <code>extent</code> attribute denotes the height of these regions. - <emphasis>Caveat</emphasis>: The attribute <code>extent</code>'s - value gets subtracted from the <code>margin-top</code> or - <code>margin-bottom</code> value being defined in the corresponding - <tag class="starttag">fo:region-body</tag> element. So if we - consider for example the <tag>fo:region-before</tag> we have to - ensure:</para> - - <para>extent <= margin-top</para> - - <para>Otherwise we may not even see any output.</para> - </callout> - - <callout arearefs="programlisting_head_foot_beforeflow"> - <para>A <code>fo:static-content</code> denotes text portions which - are decoupled from the <quote>usual</quote> text flow. For example - as a book's chapter advances over multiple pages we expect the - constant chapter's title to appear on top of each page. In the - current example the static string <code>Headertext</code> will - appear on each page's top for the whole <tag - class="starttag">fo:page-sequence</tag> in which it is defined. - Notice the <code>flow-name="xsl-region-before"</code> reference to - the region being defined in <coref - linkend="programlisting_head_foot_beforedef"/>.</para> - </callout> - - <callout arearefs="programlisting_head_foot_afterflow"> - <para>We do the same here for the page's footer.
Instead of static - text we output <tag>fo:page-number</tag> yielding the current page's - number.</para> - - <para>This time <code>flow-name="xsl-region-after"</code> references - the region definition in <coref - linkend="programlisting_head_foot_afterdef"/>. With the default region names the - attribute <code>flow-name</code> takes one of the following five - values corresponding to all possible region definitions within a - layout:</para> - - <informaltable> - <?dbhtml table-width="50%" ?> - - <?dbfo table-width="50%" ?> - - <tgroup cols="2"> - <colspec align="left" colwidth="1*"/> - - <colspec align="left" colwidth="1*"/> - - <tbody> - <row> - <entry><tag class="starttag">fo:region-body</tag></entry> - - <entry>xsl-region-body</entry> - </row> - - <row> - <entry><tag class="starttag">fo:region-before</tag></entry> - - <entry>xsl-region-before</entry> - </row> - - <row> - <entry><tag class="starttag">fo:region-after</tag></entry> - - <entry>xsl-region-after</entry> - </row> - - <row> - <entry><tag class="starttag">fo:region-start</tag></entry> - - <entry>xsl-region-start</entry> - </row> - - <row> - <entry><tag class="starttag">fo:region-end</tag></entry> - - <entry>xsl-region-end</entry> - </row> - </tbody> - </tgroup> - </informaltable> - </callout> - </calloutlist> - - <para>This results in two pages with page numbers 1 and 2:</para> - - <mediaobject> - <imageobject> - <imagedata fileref="Ref/Fig/headfoot.fig"/> - </imageobject> - </mediaobject> - - <para>The free chapter from <xref linkend="bib_Harold04"/> book contains - additional information on extended <link - xlink:href="">layout - definitions</link>.
The <orgname - xlink:href="">W3C</orgname> as the holder of the FO - standard defines the elements <link - xlink:href="">fo:layout-master-set</link>, - <link - xlink:href="">fo:simple-page-master</link> - and <link - xlink:href="">fo:page-sequence</link></para> - </section> - - <section xml:id="foContainer"> - <title>Important Objects</title> - - <section xml:id="fo_block"> - <title><code>fo:block</code></title> - - <para>The FO standard borrows a lot from the CSS standard. Most - formatting objects may have <link - xlink:href="">CSS - like properties</link> with similar semantics, some properties have - been added. We take a <link - xlink:href="">fo:block</link> - container as an example:</para> - - <figure xml:id="blockInline"> - <title>A <link - xlink:href="">fo:block</link> with - a <link - xlink:href="">fo:inline</link> - descendant.</title> - - <mediaobject> - <imageobject> - <imagedata fileref="Ref/Fig/"/> - </imageobject> - </mediaobject> - - <programlisting language="none">... -<fo:block font-weight='bold' - border-bottom-style='dashed' - border-style='solid' - border='1mm'>A lot of attributes and <fo:inline background-color='black' - color='white'>inverted</fo:inline> text.</fo:block> ...</programlisting> - </figure> - - <para>The <link - xlink:href="">fo:inline</link> - descendant serves as a means to change the <quote>current</quote> - property set. 
In HTML/CSS this may be achieved by using the - <code>SPAN</code> tag:</para> - - <programlisting language="none"><!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"> -<html> - <head> - <title>Blocks/spans and CSS</title> - </head> - <body> - <h1>Blocks/spans and CSS</h1> - <p style="font-weight: bold; border: 1mm; - border-style: solid; border-bottom-style: dashed;" - >A lot of attributes and - <span style="color: white;background-color: black;" - >inverted</span> text.</p> - </body> -</html></programlisting> - - <para>Though the CSS properties are encapsulated in a <code>style</code> attribute, we - find a one-to-one correspondence between FO and CSS in this case. The - HTML rendering works as expected:<mediaobject> - <imageobject> - <imagedata fileref="Ref/Screen/mozparaspancss.screen.png"/> - </imageobject> - </mediaobject></para> - </section> - - <section xml:id="fo_list"> - <title>Lists</title> - - <para>The simplest type of list is the unlabeled (itemized) list as - expressed by the <code>UL</code>/<code>LI</code> tags in HTML. - FO allows a much more detailed parametrization regarding indents and - distances between labels and item content. Relevant elements are <link - xlink:href="">fo:list-block</link>, - <link - xlink:href="">fo:list-item</link> - and <link - xlink:href="">fo:list-item-body</link>. - The drawback is a more complex setup for <quote>default</quote> - lists:</para> - - <figure xml:id="listItemize"> - <title>An itemized list and result.</title> - - <programlisting language="none">... 
-<fo:list-block - provisional-distance-between-starts="2mm"> - <fo:list-item> - <fo:list-item-label end-indent="label-end()"> - <fo:block>&#8226;</fo:block> - </fo:list-item-label> - <fo:list-item-body start-indent="body-start()"> - <fo:block>Flowers</fo:block> - </fo:list-item-body> - </fo:list-item> - - <fo:list-item> - <fo:list-item-label end-indent="label-end()"> - <fo:block>&#8226;</fo:block> - </fo:list-item-label> - <fo:list-item-body start-indent="body-start()"> - <fo:block>Animals</fo:block> - </fo:list-item-body> - </fo:list-item> -</fo:list-block> ...</programlisting> - - <mediaobject> - <imageobject> - <imagedata align="left" fileref="Ref/Fig/"/> - </imageobject> - </mediaobject> - </figure> - - <para>The result looks somewhat primitive in relation to the amount of - source code it necessitates. The power of these constructs shows up - when trying to format nested lists of possibly different types like - enumerations or definition lists under the requirement of - typographical excellence. More complex examples are presented in <link - xlink:href="">Xmlbible - book</link> of <xref linkend="bib_Harold04"/>.</para> - </section> - - <section xml:id="leaderRule"> - <title>Leaders and rules</title> - - <titleabbrev>Leaders/rules</titleabbrev> - - <para>Sometimes adjustable horizontal space between two neighbouring - objects has to be filled e.g. in a book's table of contents. The <link - xlink:href="">fo:leader</link> - serves this purpose:</para> - - <figure xml:id="leaderToc"> - <title>Two simulated entries in a table of contents.</title> - - <programlisting language="none">... 
-<fo:block text-align-last='justify'>Valid - XML<fo:leader leader-pattern="dots"/> -page 7</fo:block> - -<fo:block text-align-last='justify'>XSL -<fo:leader leader-pattern='dots'/> -page 42</fo:block> ...</programlisting> - - <mediaobject> - <imageobject> - <imagedata align="left" fileref="Ref/Fig/"/> - </imageobject> - </mediaobject> - </figure> - - <para>The attribute value <link - xlink:href="">text-align-last</link> - = <code>'justify'</code> forces the <link - xlink:href="">fo:block</link> to - extend to the available width of the current <link - xlink:href="">fo:region-body</link> - area. The <link - xlink:href="">fo:leader</link> - inserts the necessary amount of content of the type specified - in <link - xlink:href="">leader-pattern</link> - to fill up the gap between its neighbouring components. This principle - can be extended to multiple objects:</para> - - <figure xml:id="leaderMulti"> - <title>Four entries separated by equal amounts of dotted - space.</title> - - <programlisting language="none"><fo:block text-align-last='justify'>A<fo:leader -leader-pattern="dots"/>B<fo:leader -leader-pattern="dots"/>C<fo:leader leader-pattern="dots"/>D</fo:block></programlisting> - - <mediaobject> - <imageobject> - <imagedata fileref="Ref/Fig/"/> - </imageobject> - </mediaobject> - </figure> - - <para>A <link - xlink:href="">fo:leader</link> may - also be used to draw horizontal lines to separate objects. In this - case there are no neighbouring components within the - <quote>current</quote> line in which the <link - xlink:href="">fo:leader</link> - appears. This is frequently used to draw a border between - <code>xsl-region-body</code> and <code>xsl-region-before</code> and/or - <code>xsl-region-after</code>:</para> - - <figure xml:id="leaderSeparate"> - <title>A horizontal line separator between header and body of a - page.</title> - - <programlisting language="none">... 
-<fo:page-sequence master-reference="simplePageLayout"> - <fo:static-content flow-name="xsl-region-before"> - <fo:block text-align-last='justify'>FO<fo:leader/>page 5</fo:block> - <fo:block text-align-last='justify'> - <fo:leader leader-pattern="rule" leader-length="100%"/> - </fo:block> - </fo:static-content> - <fo:flow flow-name="xsl-region-body"> - <fo:block>Some body text ...</fo:block> - </fo:flow> -</fo:page-sequence>...</programlisting> - - <mediaobject> - <imageobject> - <imagedata align="left" fileref="Ref/Fig/"/> - </imageobject> - </mediaobject> - </figure> - - <para>Note the empty leader <code><</code> <link - xlink:href="">fo:leader</link> - <code>/></code> between the <quote> <code>FO</code> </quote> and - the <quote>page 5</quote> text nodes. It inserts horizontal whitespace which - pushes the page number to the header's right edge. This is in - accordance with the <link - xlink:href="">leader-pattern</link> - attribute's default value <code>space</code>.</para> - </section> - - <section xml:id="pageNumbering"> - <title>Page numbers</title> - - <para>We already saw an example of page numbering via <link - xlink:href="">fo:page-number</link> - in <xref linkend="paramHeadFoot"/>. Sometimes a different style for - page numbering is desired. The default page numbering style may be - changed by means of the <link - xlink:href="">fo:page-sequence</link> - element's attribute <link - xlink:href="">format</link>. For a - closer explanation the <link - xlink:href="">W3C - XSLT standards documentation</link> may be consulted:</para> - - <figure xml:id="pageNumberingRoman"> - <title>Roman style page numbers.</title> - - <programlisting language="none">... 
-<fo:page-sequence format="i" - master-reference="simplePageLayout"> - <fo:static-content - flow-name="xsl-region-after"> - <fo:block text-align-last='justify'> - <fo:leader leader-pattern="rule" - leader-length="100%"/> - </fo:block> - <fo:block font-weight="bold"> - <fo:page-number/> - </fo:block> - </fo:static-content> - - <fo:flow flow-name="xsl-region-body"> - <fo:block>Some text...</fo:block> - <fo:block>More text, more text, - more text.</fo:block> - <fo:block>More text, more text, - more text.</fo:block> - <fo:block>Enough text.</fo:block> - </fo:flow> -</fo:page-sequence> ...</programlisting> - - <mediaobject> - <imageobject> - <imagedata align="left" fileref="Ref/Fig/pageStack.fig"/> - </imageobject> - </mediaobject> - </figure> - </section> - - <section xml:id="foMarker"> - <title>Marker</title> - - <figure xml:id="dictionary"> - <title>A dictionary with running page headers.</title> - - <programlisting language="none">... -<fo:page-sequence - master-reference="simplePageLayout"> - <fo:static-content flow-name="xsl-region-before"> - <fo:block font-weight="bold"> - <fo:retrieve-marker retrieve-class-name="alpha" - retrieve-position="first-starting-within-page" - />-<fo:retrieve-marker - retrieve-position="last-starting-within-page" - retrieve-class-name="alpha"/> - </fo:block> - <fo:block text-align-last='justify'> - <fo:leader leader-pattern="rule" leader-length="100%"/></fo:block> - </fo:static-content> - - <fo:flow flow-name="xsl-region-body"> - <fo:block> - <fo:marker marker-class-name="alpha">A - </fo:marker>Ant</fo:block> - <fo:block> - <fo:marker marker-class-name="alpha">B - </fo:marker>Bug</fo:block> - <fo:block> - <fo:marker marker-class-name="alpha">L - </fo:marker>Lion</fo:block> - <fo:block> - <fo:marker marker-class-name="alpha">N - </fo:marker>Nose</fo:block> - <fo:block> - <fo:marker marker-class-name="alpha">P - </fo:marker>Peg</fo:block> - </fo:flow> -</fo:page-sequence> ...</programlisting> - - <mediaobject> - <imageobject> - <imagedata 
align="left" fileref="Ref/Fig/dictionaryStack.fig"/> - </imageobject> - </mediaobject> - </figure> - </section> - - <section xml:id="foIntRef"> - <title>Internal references</title> - - <titleabbrev>References</titleabbrev> - - <para>Regarding printed documents we may define two categories of - document internal references:</para> - - <variablelist> - <varlistentry> - <term><emphasis>Page number references</emphasis></term> - - <listitem> - <para>This is the <quote>classical</quote> type of reference - e.g. in books. An author refers the reader to a distant location - by writing <quote>... see further explanation in section 4.5 on - page 234</quote>. A book's table of contents assigning page - numbers to topics is another example. This way the - implementation of a reference relies solely on the features a - printed document offers.</para> - </listitem> - </varlistentry> - - <varlistentry> - <term><emphasis>Hypertext references</emphasis></term> - - <listitem> - <para>This way of implementing references utilizes features of - (online) viewers for printable documents. For example PDF - viewers like <productname - xlink:href="">Adobe's Acrobat - reader</productname> or the evince application are able to - follow hypertext links in a fashion known from HTML browsers. - This viewer feature is based on hypertext capabilities defined - in Adobe's PDF de facto standard.</para> - </listitem> - </varlistentry> - </variablelist> - - <para>Of course the second type of reference is limited to people who - use an online viewer application instead of reading a document from - physical paper.</para> - - <para>We now show the implementation of <abbrev - xlink:href="">FO</abbrev> - based page references. As already discussed for <link - xlink:href="">ID</link> / <link - xlink:href="">IDREF</link> pairs we need - a link destination (anchor) and a link source. 
The <abbrev - xlink:href="">FO</abbrev> - standard uses the same anchor implementation as in XML for <link - xlink:href="">ID</link> typed attributes: - <abbrev - xlink:href="">FO</abbrev> - objects <emphasis>may</emphasis> have an attribute <link - xlink:href="">id</link> with a document - wide unique value. The <abbrev - xlink:href="">FO</abbrev> - element <link - xlink:href="">fo:page-number-citation</link> - is used to actually create a page reference via its attribute <link - xlink:href="">ref-id</link>:</para> - - <figure xml:id="refJavaXml"> - <title>Two blocks with mutual page references.</title> - - <programlisting language="none">... - <fo:flow flow-name='xsl-region-body'> - <fo:block id='xml'>Java section see page - <fo:page-number-citation ref-id='java'/>. - </fo:block> - - <fo:block id='java'>XML section see page - <fo:page-number-citation ref-id='xml'/>. - </fo:block> - </fo:flow> ...</programlisting> - - <mediaobject> - <imageobject> - <imagedata align="left" fileref="Ref/Fig/pagerefStack.fig"/> - </imageobject> - </mediaobject> - </figure> - - <para>NB: Be careful defining <link - xlink:href="">id</link> attributes for - objects being descendants of <link - xlink:href="">fo:static-content</link> - nodes. Such objects typically appear on multiple pages and are - therefore not unique anchors. A reference carrying such an id value - thus actually refers to n >= 1 objects on n different pages. - Typically a user agent will choose the first object of this set when - the link is clicked. So in effect the parent <link - xlink:href="">fo:page-sequence</link> - is chosen as the effective link target.</para> - - <para>The element <link - xlink:href="">fo:basic-link</link> - creates PDF hypertext links. 
We extend the previous example:</para> - - <figure xml:id="refJavaXmlHyper"> - <title>Two blocks with mutual page- and hypertext - references.</title> - - <programlisting language="none"><fo:flow flow-name='xsl-region-body'> - <fo:block id='xml'>Java section see <fo:basic-link color="blue" - internal-destination="java">page <fo:page-number-citation - ref-id='java'/>.</fo:basic-link></fo:block> - -<fo:block id='java'>XML section see - <fo:basic-link color="blue" - internal-destination="xml">page <fo:page-number-citation - ref-id='xml'/>.</fo:basic-link></fo:block> -</fo:flow></programlisting> - - <mediaobject> - <imageobject> - <imagedata fileref="Ref/Fig/pagerefhyperStack.fig"/> - </imageobject> - </mediaobject> - </figure> - </section> - - <section xml:id="pdfBookmarks"> - <title>PDF bookmarks</title> - - <titleabbrev>Bookmarks</titleabbrev> - - <para>The PDF specification allows defining so-called bookmarks - offering explorer-like navigation:</para> - - <mediaobject> - <imageobject> - <imagedata fileref="Ref/Screen/pdfbookmarks.screen.png"/> - </imageobject> - </mediaobject> - - <para>PDF bookmarks are <link - xlink:href="">part - of the XSL-FO 1.1</link> standard. Some <abbrev - xlink:href="">FO</abbrev> - processors still continue to use proprietary solutions for bookmark - creation dating from the older <abbrev - xlink:href="">FO</abbrev> - 1.0 standard. For details of bookmark extensions by - <orgname>RenderX</orgname>'s processor see <link - xlink:href="">XEP's - documentation</link>.</para> - </section> - </section> - - <section xml:id="xml2fo"> - <title>Constructing <abbrev - xlink:href="">FO</abbrev> - from XML documents</title> - - <titleabbrev><abbrev - xlink:href="">FO</abbrev> - from XML</titleabbrev> - - <para>So far we have learnt some basic <abbrev - xlink:href="">FO</abbrev> - elements. As with HTML we typically generate FO code from other sources - rather than crafting it by hand. 
The general picture is:</para> - - <figure xml:id="htmlFoProduction"> - <title>Different target formats from common source.</title> - - <mediaobject> - <imageobject> - <imagedata fileref="Ref/Fig/crossmedia.fig" scale="65"/> - </imageobject> - - <caption> - <para>We may generate both online and printed documentation from a - common source. This requires a style sheet for each desired - destination format.</para> - </caption> - </mediaobject> - </figure> - - <para>We discussed the <abbrev - xlink:href="">FO</abbrev> - standard as an input format for printable output production by a - renderer. In this way an <abbrev - xlink:href="">FO</abbrev> - document is similar to HTML, which is a format to be rendered by a web - browser for visual (screen oriented) output production. The - transformation from an XML source (e.g. a memo document) to <abbrev - xlink:href="">FO</abbrev> - is still missing. As for HTML we may use <abbrev - xlink:href="">XSL</abbrev> as a - transformation means. We generate the sender's surname from a memo - document instance:</para> - - <figure xml:id="memo2fosurname"> - <title>Generating a sender's surname for printing.</title> - - <programlisting language="none"><?xml version="1.0" encoding="utf-8"?> -<xsl:stylesheet version="1.0" - xmlns:fo="" - xmlns:xsl=""> - - <xsl:output method="xml" indent="yes"/> - - <xsl:template match="/"> - <fo:root> - <fo:layout-master-set> - <fo:simple-page-master master-name="simplePageLayout" - page-width="294mm" page-height="210mm" margin="5mm"> - <fo:region-body margin="15mm"/> - </fo:simple-page-master> - </fo:layout-master-set> - <fo:page-sequence master-reference="simplePageLayout"> - <fo:flow flow-name="xsl-region-body"> - <fo:block font-size="20pt"> - <xsl:text>Sender:</xsl:text> - <fo:inline font-weight='bold'> - <xsl:value-of select="memo/from/surname"/> - </fo:inline> - </fo:block> - </fo:flow> - </fo:page-sequence> - </fo:root> - </xsl:template> -</xsl:stylesheet></programlisting> - </figure> - - 
<para>A suitable XML document instance reads:</para> - - <figure xml:id="memoMessage"> - <title>A <code>memo</code> document instance.</title> - - <programlisting language="none"><memo ...="memo.xsd"> - <from> - <name>Martin</name> - <surname>Goik</surname> - </from> - <to> - <name>Adam</name> - <surname>Hacker</surname> - </to> - <to> - <name>Eve</name> - <surname>Intruder</surname> - </to> - <date year="2005" month="1" day="6"/> - <subject>Firewall problems</subject> - <content> - <para>Thanks for your excellent work.</para> - <para>Our firewall is definitely broken!</para> - </content> -</memo></programlisting> - </figure> - - <para>Some remarks:</para> - - <orderedlist> - <listitem> - <para>The <link - xlink:href="">xsl:stylesheet</link> - element contains a namespace definition for the target FO document's - namespace, namely:</para> - - <programlisting language="none">xmlns:fo=""</programlisting> - - <para>This is required to use elements like <link - xlink:href="">fo:block</link> - belonging to the FO namespace.</para> - </listitem> - - <listitem> - <para>The option value <code>indent="yes"</code> in <link - xlink:href="">xsl:output</link> - is usually set to "no" in a production environment to avoid - whitespace related problems.</para> - </listitem> - - <listitem> - <para>The generation of a print format like PDF is actually a two - step process. 
To generate message.pdf from message.xml by a - stylesheet memo2fo.xsl we need the following calls:</para> - - <variablelist> - <varlistentry> - <term><emphasis>XML document instance to FO</emphasis></term> - - <listitem> - <programlisting language="none">xml2xml message.xml memo2fo.xsl -o</programlisting> - </listitem> - </varlistentry> - - <varlistentry> - <term><emphasis>FO to PDF</emphasis></term> - - <listitem> - <programlisting language="none">fo2pdf -fo -pdf message.pdf</programlisting> - </listitem> - </varlistentry> - </variablelist> - - <mediaobject> - <imageobject> - <imagedata fileref="Ref/Fig/xml2fo2pdf.fig"/> - </imageobject> - </mediaobject> - - <para>When debugging of the intermediate <abbrev - xlink:href="">FO</abbrev> - file is not required both steps may be combined into a single - call:</para> - - <programlisting language="none">fo2pdf -xml message.xml -xsl memo2fo.xsl -pdf message.pdf</programlisting> - </listitem> - </orderedlist> - </section> - - <section xml:id="foCatalog"> - <title>Formatting a catalog.</title> - - <titleabbrev>A catalog</titleabbrev> - - <para>We now take the <link linkend="climbingCatalog">climbing catalog - example</link> with prices being added and incrementally create a series - of PDF versions improving from one version to another.</para> - - <qandaset defaultlabel="qanda" xml:id="idCatalogStart"> - <title>A first PDF version of the catalog</title> - - <qandadiv> - <qandaentry> - <question> - <para>Write a <abbrev - xlink:href="">XSL</abbrev> script to - generate a starting version <filename - xlink:href="Ref/src/Dom/climbenriched.start.pdf">climbenriched.start.pdf</filename>.</para> - </question> - - <answer> - <programlisting language="none"><?xml version="1.0" encoding="utf-8"?> -<xsl:stylesheet version="1.0" - xmlns:fo="" - xmlns:xsl=""> - - <xsl:output method="xml" indent="yes"/> - - <xsl:template match="/"> - <fo:root font-size="10pt"> - <fo:layout-master-set> - <fo:simple-page-master master-name="productPage" - 
page-width="80mm" page-height="110mm" margin="5mm"> - <fo:region-body margin="15mm"/> - <fo:region-before extent="10mm"/> - </fo:simple-page-master> - </fo:layout-master-set> - <xsl:apply-templates select="catalog/product" /> - </fo:root> - </xsl:template> - - <xsl:template match="product"> - <fo:page-sequence master-reference="productPage"> - <fo:static-content flow-name="xsl-region-before"> - <fo:block font-weight="bold"> - <xsl:value-of select="title"/> - </fo:block> - </fo:static-content> - <fo:flow flow-name="xsl-region-body"> - <xsl:apply-templates select="description/para"/> - - <fo:block>Price:<xsl:value-of select="@price"/></fo:block> - <fo:block>Order no:<xsl:value-of select="@id"/></fo:block> - </fo:flow> - </fo:page-sequence> - </xsl:template> - - <xsl:template match="para"> - <fo:block space-after="10px"> - <xsl:value-of select="."/> - </fo:block> - </xsl:template> - -</xsl:stylesheet></programlisting> - </answer> - </qandaentry> - - <qandaentry xml:id="idCatalogProduct"> - <question> - <label>Header, page numbers and table formatting</label> - - <para>Extend <xref linkend="idCatalogStart"/> by adding page - numbers. The order number and prices shall be formatted as - tables. Add a ruler to each page's head. 
The result should look - like <filename - xlink:href="Ref/src/Dom/climbenriched.product.pdf">climbenriched.product.pdf</filename></para> - </question> - - <answer> - <para>Solution see <filename - xlink:href="Ref/src/Dom/catalog2fo.product.xsl">catalog2fo.product.xsl</filename>.</para> - </answer> - </qandaentry> - - <qandaentry xml:id="idCatalogToc"> - <question> - <label>A table of contents.</label> - - <para>Each product description's page number shall appear in a - table of contents together with the product's <code>title</code> - as in <filename - xlink:href="Ref/src/Dom/climbenriched.toc.pdf">climbenriched.toc.pdf</filename>.</para> - </question> - - <answer> - <para>Solution see <filename - xlink:href="Ref/src/Dom/catalog2fo.toc.xsl">catalog2fo.toc.xsl</filename>.</para> - </answer> - </qandaentry> - - <qandaentry xml:id="idCatalogToclink"> - <question> - <label>A table of contents with hypertext links.</label> - - <para>The table of contents' entries may offer hypertext - features to supporting browsers as in <filename - xlink:href="Ref/src/Dom/climbenriched.toclink.pdf">climbenriched.toclink.pdf</filename>. - In addition include the document's <tag - class="starttag">introduction</tag>.</para> - </question> - - <answer> - <para>Solution see <filename - xlink:href="Ref/src/Dom/catalog2fo.toclink.xsl">catalog2fo.toclink.xsl</filename>.</para> - </answer> - </qandaentry> - - <qandaentry xml:id="idCatalogFinal"> - <question> - <label>A final version.</label> - - <para>Add the following features:</para> - - <orderedlist> - <listitem> - <para>Number the table of contents starting with page i, ii, - iii, iv and so on. Start the product descriptions with page - 1. On each page's footer a text <quote>page xx of yy</quote> - shall be displayed. 
This requires the definition of an - anchor <code>id</code> on the <abbrev - xlink:href="">FO</abbrev> - document's last page.</para> - </listitem> - - <listitem> - <para>Add PDF bookmarks by using <orgname>XEP</orgname>'s - <abbrev - xlink:href="">FO</abbrev> - extensions. This requires the namespace declaration - <code>xmlns:rx=""</code> - in the XSLT script's header.</para> - </listitem> - </orderedlist> - - <para>The result may look like <filename - xlink:href="Ref/src/Dom/"></filename>. - N.B.: It may take some effort to achieve this result. This - effort is left to the <emphasis>interested</emphasis> - participants.</para> - </question> - - <answer> - <para>Solution see <filename - xlink:href="Ref/src/Dom/catalog2fo.toclink.xsl">catalog2fo.toclink.xsl</filename>.</para> - </answer> - </qandaentry> - </qandadiv> - </qandaset> - </section> - </chapter> - - <appendix> - <title>W3C production rules</title> - - <productionset> - <title><link - xlink:href="">Characters</link></title> - - <production xml:id="w3RecXml_NT-Letter"> - <lhs>Letter</lhs> - - <rhs><nonterminal def="#w3RecXml_NT-BaseChar">BaseChar</nonterminal> | - <nonterminal - def="#w3RecXml_NT-Ideographic">Ideographic</nonterminal></rhs> - </production> - - <production xml:id="w3RecXml_NT-BaseChar"> - <lhs>BaseChar</lhs> - - <rhs>[#x0041-#x005A] | [#x0061-#x007A] | [#x00C0-#x00D6] - | [#x00D8-#x00F6] | [#x00F8-#x00FF] | [#x0100-#x0131] - | [#x0134-#x013E] |...(values omitted here, see W3C - documentation)</rhs> - </production> - - <production xml:id="w3RecXml_NT-Ideographic"> - <lhs>Ideographic</lhs> - - <rhs>[#x4E00-#x9FA5] | #x3007 | [#x3021-#x3029]</rhs> - </production> - - <production xml:id="w3RecXml_NT-CombiningChar"> - <lhs>CombiningChar</lhs> - - <rhs>[#x0300-#x0345] | ...(values omitted here)</rhs> - </production> - - <production xml:id="w3RecXml_NT-Digit"> - <lhs>Digit</lhs> - - <rhs>[#x0030-#x0039] | [#x0660-#x0669] | [#x06F0-#x06F9] - | [#x0966-#x096F] | [#x09E6-#x09EF] | [#x0A66-#x0A6F] - | 
[#x0AE6-#x0AEF] | [#x0B66-#x0B6F] | [#x0BE7-#x0BEF] - | [#x0C66-#x0C6F] | [#x0CE6-#x0CEF] | [#x0D66-#x0D6F] - | [#x0E50-#x0E59] | [#x0ED0-#x0ED9] | [#x0F20-#x0F29]</rhs> - </production> - - <production xml:id="w3RecXml_NT-Extender"> - <lhs>Extender</lhs> - - <rhs>#x00B7 | #x02D0 | #x02D1 | #x0387 | #x0640 | #x0E46 | #x0EC6 - | #x3005 | [#x3031-#x3035] | [#x309D-#x309E] | [#x30FC-#x30FE]</rhs> - </production> - </productionset> - </appendix> - - <appendix> - <title>Glossary</title> - - <para/> - - <glossary> - <glossentry xml:id="gloss_API"> - <glossterm><abbrev xlink:href="" - xml:id="abbr_api">API</abbrev></glossterm> - - <glossdef> - <para>Application programming interface</para> - </glossdef> - </glossentry> - - <glossentry xml:id="gloss_SqlDdl"> - <glossterm><abbrev - xlink:href="" - xml:id="abbr_Ddl">DDL</abbrev> <link - linkend="gloss_SQL">(SQL)</link></glossterm> - - <glossdef> - <para>Data definition language. The subset of <link - linkend="gloss_SQL">SQL</link> dealing with the creation of tables, - views etc.</para> - </glossdef> - </glossentry> - - <glossentry xml:id="gloss_DOM"> - <glossterm><acronym xlink:href="" - xml:id="abbr_Dom">DOM</acronym></glossterm> - - <glossdef> - <para>The <link linkend="gloss_W3C">W3C</link> <link - xlink:href="">Document Object Model</link> - standard</para> - </glossdef> - </glossentry> - - <glossentry xml:id="gloss_DTD"> - <glossterm><abbrev - xlink:href="" - xml:id="abbr_Dtd">DTD</abbrev></glossterm> - - <glossdef> - <para>Document Type Definition. 
An older standard with respect to - <link linkend="gloss_RelaxNG">RelaxNG</link> and <link - linkend="gloss_XmlSchema">XML Schema</link> to define an XML document's - grammar.</para> - </glossdef> - </glossentry> - - <glossentry xml:id="gloss_EBNF"> - <glossterm><abbrev>EBNF</abbrev></glossterm> - - <glossdef> - <para>Extended Backus-Naur form.</para> - </glossdef> - </glossentry> - - <glossentry xml:id="gloss_ftp"> - <glossterm><abbrev - xlink:href="" - xml:id="abbr_Ftp">ftp</abbrev></glossterm> - - <glossdef> - <para>File Transfer Protocol</para> - </glossdef> - </glossentry> - - <glossentry xml:id="gloss_FO"> - <glossterm><abbrev - xlink:href="" - xml:id="abbr_Fo">FO</abbrev></glossterm> - - <glossdef> - <para>The Formatting Objects Standard for printable output - generation</para> - </glossdef> - </glossentry> - - <glossentry xml:id="gloss_HDM"> - <glossterm><orgname xlink:href="" - xml:id="org_Hdm">Hdm</orgname></glossterm> - - <glossdef> - <para xml:lang="de">Hochschule der Medien.</para> - </glossdef> - </glossentry> - - <glossentry xml:id="gloss_Hql"> - <glossterm><abbrev - xlink:href="" - xml:id="abbr_Hql">HQL</abbrev></glossterm> - - <glossdef> - <para>The <link - xlink:href="">Hibernate - Query Language</link>.</para> - </glossdef> - </glossentry> - - <glossentry xml:id="gloss_http"> - <glossterm><abbrev xlink:href="" - xml:id="abbr_Http">http</abbrev></glossterm> - - <glossdef> - <para>The Hypertext Transfer Protocol</para> - </glossdef> - </glossentry> - - <glossentry xml:id="gloss_IDE"> - <glossterm><abbrev - xlink:href="" - xml:id="abbr_Ide">IDE</abbrev></glossterm> - - <glossdef> - <para>Integrated Development Environment</para> - </glossdef> - </glossentry> - - <glossentry xml:id="gloss_J2EE"> - <glossterm><trademark - xlink:href="" - xml:id="tm_J2ee">J2EE</trademark></glossterm> - - <glossdef> - <para>Java Platform, Enterprise Edition</para> - </glossdef> - </glossentry> - - <glossentry xml:id="gloss_Java"> - <glossterm><trademark - 
xlink:href="">Java</trademark></glossterm> - - <glossdef> - <para>General purpose programming language with support for object - oriented concepts.</para> - </glossdef> - </glossentry> - - <glossentry xml:id="gloss_Javadoc"> - <glossterm><trademark - xlink:href="">Javadoc</trademark></glossterm> - - <glossdef> - <para>Tool for extracting documentation embedded in <link - linkend="gloss_Java"><trademark>Java</trademark></link> source - code.</para> - </glossdef> - </glossentry> - - <glossentry xml:id="gloss_JDBC"> - <glossterm><trademark - xlink:href="" - xml:id="tm_Jdbc">JDBC</trademark></glossterm> - - <glossdef> - <para>Java Database Connectivity.</para> - </glossdef> - </glossentry> - - <glossentry xml:id="gloss_JDK"> - <glossterm><trademark - xlink:href="" - xml:id="tm_Jdk">JDK</trademark></glossterm> - - <glossdef> - <para>Java Development Kit.</para> - </glossdef> - </glossentry> - - <glossentry xml:id="gloss_JPA"> - <glossterm><abbrev - xlink:href="" - xml:id="abbr_Jpa">JPA</abbrev></glossterm> - - <glossdef> - <para><link - xlink:href="">Java - Persistence API</link></para> - </glossdef> - </glossentry> - - <glossentry xml:id="gloss_JRE"> - <glossterm><trademark - xlink:href="" - xml:id="tm_Jre">JRE</trademark></glossterm> - - <glossdef> - <para>Java Runtime Environment</para> - </glossdef> - </glossentry> - - <glossentry xml:id="gloss_MathML"> - <glossterm><abbrev>MathML</abbrev></glossterm> - - <glossdef> - <para><link xlink:href="">Mathematical Markup - Language</link></para> - </glossdef> - </glossentry> - - <glossentry xml:id="gloss_MIB"> - <glossterm><orgname xlink:href="" - xml:id="org_Mib">MIB</orgname></glossterm> - - <glossdef> - <para xml:lang="de">Bachelor Studiengang Medieninformatik</para> - </glossdef> - </glossentry> - - <glossentry xml:id="gloss_Mysql"> - <glossterm><trademark - xlink:href="" - xml:id="tm_Mysql">Mysql</trademark></glossterm> - - <glossdef> - <para>Open source Oracle database product</para> - </glossdef> - </glossentry> - - <glossentry 
xml:id="gloss_MP3"> - <glossterm><abbrev>MP3</abbrev></glossterm> - - <glossdef> - <para>Audio codec.</para> - </glossdef> - </glossentry> - - <glossentry xml:id="gloss_ORM"> - <glossterm><abbrev>ORM</abbrev></glossterm> - - <glossdef> - <para>Object relational mapping.</para> - </glossdef> - </glossentry> - - <glossentry xml:id="gloss_PHP"> - <glossterm><abbrev - xlink:href="">PHP</abbrev></glossterm> - - <glossdef> - <para>Hypertext preprocessor</para> - </glossdef> - </glossentry> - - <glossentry xml:id="gloss_RelaxNG"> - <glossterm><acronym - xlink:href="">RelaxNG</acronym></glossterm> - - <glossdef> - <para>An <link - xlink:href="">ISO</link> - standard to define the grammar of XML documents. Primary use for - document oriented applications.</para> - </glossdef> - </glossentry> - - <glossentry xml:id="gloss_SAX"> - <glossterm><acronym - xlink:href="">SAX</acronym></glossterm> - - <glossdef> - <para><link xlink:href="">Simple API for - XML</link>.</para> - </glossdef> - </glossentry> - - <glossentry xml:id="gloss_SQL"> - <glossterm><acronym - xlink:href="">SQL</acronym></glossterm> - - <glossdef> - <para><link xlink:href="">Structured - query language</link>.</para> - </glossdef> - </glossentry> - - <glossentry xml:id="gloss_SVG"> - <glossterm><abbrev>SVG</abbrev></glossterm> - - <glossdef> - <para><link xlink:href="">Scalable - Vector Graphics</link>.</para> - </glossdef> - </glossentry> - - <glossentry xml:id="gloss_TCP"> - <glossterm><acronym - xlink:href="" - xml:id="abbr_Tcp">TCP</acronym></glossterm> - - <glossdef> - <para>Transmission Control Protocol</para> - </glossdef> - </glossentry> - - <glossentry xml:id="gloss_URL"> - <glossterm><abbrev xlink:href="" - xml:id="abbr_Url">URL</abbrev></glossterm> - - <glossdef> - <para>Uniform Resource Locator</para> - </glossdef> - </glossentry> - - <glossentry xml:id="gloss_W3C"> - <glossterm><orgname - xlink:href="">W3C</orgname></glossterm> - - <glossdef> - <para>World Wide Web Consortium</para> - </glossdef> - 
</glossentry> - - <glossentry xml:id="gloss_XHTML"> - <glossterm><abbrev>XHTML</abbrev></glossterm> - - <glossdef> - <para>Html as <link linkend="gloss_XML">XML</link> <link - xlink:href="">standard</link>.</para> - </glossdef> - </glossentry> - - <glossentry xml:id="gloss_XML"> - <glossterm><abbrev - xlink:href="">Xml</abbrev></glossterm> - - <glossdef> - <para>The <link xlink:href="">Extensible Markup - Language</link>.</para> - </glossdef> - </glossentry> - - <glossentry xml:id="gloss_XmlSchema"> - <glossterm>XML Schema</glossterm> - - <glossdef> - <para>A W3C standard to define grammars for XML documents. Rich set - of features with respect to data modeling.</para> - </glossdef> - </glossentry> - - <glossentry xml:id="gloss_XPath"> - <glossterm><acronym xlink:href="" - xml:id="abbr_Xpath">XPath</acronym></glossterm> - - <glossdef> - <para>XML Path Language</para> - </glossdef> - </glossentry> - - <glossentry xml:id="gloss_XSD"> - <glossterm><abbrev - xlink:href="">XSD</abbrev></glossterm> - - <glossdef> - <para>XML Schema description Language</para> - </glossdef> - </glossentry> - - <glossentry xml:id="gloss_XSL"> - <glossterm><abbrev xlink:href="" - xml:id="abbr_Xsl">XSL</abbrev></glossterm> - - <glossdef> - <para>Extensible Stylesheet Language</para> - </glossdef> - </glossentry> - </glossary> - </appendix> - - <xi:include href="../glossary.xml" xpointer="element(/1)"/> - - <xi:include href="../bibliography.xml" xpointer="element(/1)"/> -</part> diff --git a/Sda1/testng.xml b/Sda1/testng.xml new file mode 100644 index 000000000..538484520 --- /dev/null +++ b/Sda1/testng.xml @@ -0,0 +1,326 @@ + <chapter xml:id="chapUnitTesting" version="5.0" xmlns="" + xmlns:xlink="" + xmlns:xi="" + xmlns:svg="" + xmlns:m="" + xmlns:html="" + xmlns:db=""> + + + + <title>Unit testing with <productname + xlink:href="">TestNG</productname></title> + + <para>This chapter presents a very short introduction to the basic usage + of unit testing. 
We start with a simple stack implementation:</para> + + <programlisting language="none">package sda.unittesting; + +public class MyStack { + int [] data = new int[5]; + int numElements = 0; + + public void push(final int n) { + data[numElements] = n; + numElements++; + } + public int pop() { + numElements--; + return data[numElements]; + } + public int top() { + return data[numElements - 1]; + } + public boolean empty() { + return 0 == numElements; + } +}</programlisting> + + <para>Readers familiar with stacks will immediately notice a + deficiency in the above code: This stack is actually bounded. It only + allows us to store a maximum of five integer values.</para> + + <para>The following class allows us to functionally test our + <classname>sda.unittesting.MyStack</classname> implementation with respect + to the usual stack behaviour:</para> + + <programlisting language="none" linenumbering="numbered">package sda.unittesting; + +public class MyStackFuncTest { + + private static void assertTrue(boolean status) { + if (!status) { + throw new RuntimeException("Assert failed"); + } + } + public static void main(String[] args) { + final MyStack stack = new MyStack(); + // Test 1: A new MyStack instance should not contain any elements.
+ assertTrue(stack.empty()); + + // Test 2: Adding and removal + stack.push(4); + assertTrue (!stack.empty()); + assertTrue (4 == stack.top()); + assertTrue (4 == stack.pop()); + assertTrue (stack.empty()); + + // Test 3: Trying to add more than five values + stack.push(1);stack.push(2);stack.push(3);stack.push(4); + stack.push(5); + stack.push(6); + assertTrue(6 == stack.pop()); + } +}</programlisting> + + <para>Execution yields a runtime exception which is due to the attempted + insert operation <code>stack.push(6)</code>:</para> + + <programlisting language="none">Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 5 + at sda.unittesting.MyStack.push( + at sda.unittesting.MyStackFuncTest.main(</programlisting> + + <para>The execution result is easy to understand since our + <classname>sda.unittesting.MyStack</classname> implementation only allows + storing five values.</para> + + <para>Our testing application is fine so far. It does however lack some + features:</para> + + <itemizedlist> + <listitem> + <para>Automatic initialization before starting tests and finalization + at the end.</para> + </listitem> + + <listitem> + <para>Our test is monolithic: We used comments to document different + tests. This knowledge is implicit and thus invisible to testing + frameworks. Test results (failure/success) cannot be assigned to + individual tests such as test 1 or test 2.</para> + </listitem> + + <listitem> + <para>Aggregation and visualization of test results</para> + </listitem> + + <listitem> + <para>Dependencies between individual tests</para> + </listitem> + + <listitem> + <para>Ability to enable and disable tests according to a project's + maturity level.
In our example test 3 might be disabled until an + unbounded implementation gets completed.</para> + </listitem> + </itemizedlist> + + <para>Testing frameworks like <productname + xlink:href="">JUnit</productname> or <productname + xlink:href="">TestNG</productname> provide means for + efficient and flexible test organization. Using <productname + xlink:href="">TestNG</productname>, our current test + application, including only test 1 and test 2, reads:</para> + + <programlisting language="none">package sda.unittesting; + +import org.testng.annotations.Test; + +public class MyStackTestSimple { + + final MyStack stack = new MyStack(); + + @Test + public void empty() { + assert(stack.empty()); + } + @Test + public void pushPopEmpty() { + assert (stack.empty()); + stack.push(4); + assert (!stack.empty()); + assert (4 == stack.top()); + assert (4 == stack.pop()); + assert (stack.empty()); + } +}</programlisting> + + <para>We notice the absence of a <function>main()</function> method. Our + testing framework uses the above code for test definitions. In contrast to + our homebrew solution the individual tests are now defined in a + machine-readable fashion. This allows for sophisticated statistics. Executing + these tests with <productname xlink:href="">TestNG</productname> + produces the following results:</para> + + <programlisting language="none">PASSED: empty +PASSED: pushPopEmpty + +=============================================== + Default test + Tests run: 2, Failures: 0, Skips: 0 +=============================================== + + +=============================================== +Default suite +Total tests run: 2, Failures: 0, Skips: 0 +===============================================</programlisting> + + <para>Both tests run successfully. So why did we omit test 3 which is + bound to fail? We now add it to the test suite:</para> + + <programlisting language="none">package sda.unittesting; +... +public class MyStackTestSimple1 { +...
+ @Test + public void empty() { + assert(stack.empty()); +... + + @Test + public void push6() { + stack.push(1); + stack.push(2); + stack.push(3); + stack.push(4); + stack.push(5); + stack.push(6); + assert (6 == stack.pop()); + } ...</programlisting> + + <para>As expected test 3 fails. But the result shows test 2 failing as + well:</para> + + <programlisting language="none">PASSED: empty +FAILED: push6 +java.lang.ArrayIndexOutOfBoundsException: 5 + at sda.unittesting.MyStack.push( + at sda.unittesting.MyStackTestSimple1.push6( + at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) + ... + +FAILED: pushPopEmpty +java.lang.AssertionError + at sda.unittesting.MyStackTestSimple1.pushPopEmpty( + at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) + ... + +=============================================== + Default test + Tests run: 3, Failures: 2, Skips: 0 +===============================================</programlisting> + + <para>This unexpected result is due to the execution order of the three + individual tests. Within our class + <classname>sda.unittesting.MyStackTestSimple1</classname> the three tests + appear in the sequence test 1, test 2 and test 3. This however is just the + order in the source code. The testing framework will not infer any order and + thus executes our three tests in <emphasis role="bold">arbitrary</emphasis> + order. The execution log shows the actual order:</para> + + <orderedlist> + <listitem> + <para>Test <quote><code>empty</code></quote></para> + </listitem> + + <listitem> + <para>Test <quote><code>push6</code></quote></para> + </listitem> + + <listitem> + <para>Test <quote><code>pushPopEmpty</code></quote></para> + </listitem> + </orderedlist> + + <para>So the second test executed, <code>push6</code>, raises an exception and leaves the stack + filled with the maximum possible five elements.
Thus the stack is not empty and + the <quote><code>pushPopEmpty</code></quote> test fails as well.</para> + + <para>If we want to avoid this type of error we may:</para> + + <itemizedlist> + <listitem> + <para>Declare tests within separate (test class) definitions</para> + </listitem> + + <listitem> + <para>Define dependencies, for example that test X can only be executed after test + Y.</para> + </listitem> + </itemizedlist> + + <para>The <productname xlink:href="">TestNG</productname> + framework offers a feature which allows the definition of test groups and + dependencies between them. We use this feature to refine our test + definition:</para> + + <programlisting language="none">package sda.unittesting; +... +public class MyStackTest { + ... + @Test (<emphasis role="bold">groups = "basic"</emphasis>) + public void empty() { + assert(stack.empty()); + } + @Test (<emphasis role="bold">groups = "basic"</emphasis>) + public void pushPopEmpty() { + ... + } + + @Test (<emphasis role="bold">dependsOnGroups = "basic"</emphasis>) + public void push6() { + ... + }</programlisting> + + <para>The first two tests now belong to the same test group + <quote>basic</quote>. The <emphasis role="bold"><code>dependsOnGroups = + "basic"</code></emphasis> declaration guarantees that our + <code>push6</code> test is launched last. So we get the + expected result:</para> + + <programlisting language="none">PASSED: empty +PASSED: pushPopEmpty +FAILED: push6 +java.lang.ArrayIndexOutOfBoundsException: 5 + at sda.unittesting.MyStack.push( + at sda.unittesting.MyStackTest.push6( + at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) +... + + +=============================================== + Default test + Tests run: 3, Failures: 1, Skips: 0 +===============================================</programlisting> + + <para>In fact the order between the first two tests might be critical as + well.
The <quote><code>pushPopEmpty</code></quote> test leaves our stack + in an empty state. If this were not the case, reversing the execution order + of <quote><code>pushPopEmpty</code></quote> and + <quote><code>empty</code></quote> would cause an error as well.</para> + + <para>Programming <abbrev + xlink:href="">IDE</abbrev>s + like Eclipse provide elements for test result visualization. Our last test + gets summarized as:</para> + + <screenshot> + <info> + <title><productname + xlink:href="">TestNG</productname> result + presentation in Eclipse</title> + </info> + + <mediaobject> + <imageobject> + <imagedata fileref="Ref/Screen/eclipseTestngResult.screen.png" + scale="75"/> + </imageobject> + </mediaobject> + </screenshot> + + <para>We can drill down from a failed result to the point of failure + within the corresponding code.</para> + </chapter> + diff --git a/Sda1/try.xml b/Sda1/try.xml old mode 100755 new mode 100644 diff --git a/Sda1/xmlintro.xml b/Sda1/xmlintro.xml new file mode 100644 index 000000000..b8c0b1f84 --- /dev/null +++ b/Sda1/xmlintro.xml @@ -0,0 +1,529 @@ + <chapter xml:id="xmlIntro" version="5.0" xmlns="" + xmlns:xlink="" + xmlns:xi="" + xmlns:svg="" + xmlns:m="" + xmlns:html="" + xmlns:db=""> + + + <title>Introduction to XML</title> + + <section xml:id="xmlBasic"> + <title>The XML industry standard</title> + + <para>A short question might be: <quote>What is XML?</quote> An answer + might be: The acronym XML stands for + <quote>E<emphasis>x</emphasis>tensible <emphasis>M</emphasis>arkup + <emphasis>L</emphasis><foreignphrase>anguage</foreignphrase></quote> and + is an industry standard published by the W3C standardization + organization.
As with other industry software standards, talking about XML + leads to talking about XML based software: Applications and frameworks + that add value for software implementors and enhance data + exchange between applications.</para> + + <para>Many readers are already familiar with XML without explicitly + referring to the standard itself: The world wide web's + <foreignphrase>lingua franca</foreignphrase> HTML has been ported to an + XML dialect forming the <link + xlink:href="">XHTML</link> standard. The idea + behind this standard is to distinguish between an abstract markup + language and rendered results generated from so-called document + instances by a browser:</para> + + <figure xml:id="renderXhtmlMarkup"> + <title>Rendering XHTML markup</title> + + <mediaobject> + <imageobject> + <imagedata fileref="Ref/Fig/xhtml.fig"/> + </imageobject> + </mediaobject> + </figure> + + <para>XHTML is actually a good example to illustrate the tree-like, + hierarchical structure of XML documents:</para> + + <figure xml:id="xhtmlTree"> + <title>XHTML tree structure</title> + + <mediaobject> + <imageobject> + <imagedata fileref="Ref/Fig/xhtmlexample.fig" scale="65"/> + </imageobject> + </mediaobject> + </figure> + + <para>We may extend this example by representing a mathematical formula + via a standard called <link + xlink:href="">MathML</link>:</para> + + <figure xml:id="mathmlExample"> + <title>A formula in <link + xlink:href="">MathML</link> + representation.</title> + + <mediaobject> + <imageobject> + <imagedata fileref="Ref/Fig/sqrtrender.fig"/> + </imageobject> + </mediaobject> + </figure> + + <para>Again we observe a similar situation: A database-like + <emphasis>representation</emphasis> of a formula on the left and a + <emphasis>rendered</emphasis> version on the right. Regarding XML we + have:</para> + + <itemizedlist> + <listitem> + <para>The <link xlink:href="">MathML</link> + standard intended to describe mathematical formulas.
The standard + defines a set of <emphasis>tags</emphasis> such as <tag + class="starttag">math:msqrt</tag> with well-defined semantics + regarding permitted attribute values and nesting rules.</para> + </listitem> + + <listitem> + <para>Informal descriptions of formatting expectations.</para> + </listitem> + + <listitem> + <para>Software transforming an XML formula representation into + visible or printable output. In other words: A rendering + engine.</para> + </listitem> + </itemizedlist> + + <para>XML documents may also be regarded as a persistence mechanism to + represent and store data. Similarities to Relational Database Systems + exist. An RDBMS + (<emphasis>R</emphasis><foreignphrase>elational</foreignphrase> + <emphasis>D</emphasis><foreignphrase>atabase</foreignphrase> + <emphasis>M</emphasis><foreignphrase>anagement</foreignphrase> + <emphasis>S</emphasis><foreignphrase>ystem</foreignphrase>) is typically + capable of holding terabytes of data organized in tables. The + arrangement of data may be subject to various constraints like + candidate or foreign key rules. With respect to both end users and + software developers an RDBMS itself is a building block in a complete + solution. We need an application on top of it acting as a user interface + to the data it contains.</para> + + <para>In contrast to an RDBMS, XML allows data to be organized + hierarchically.
The <link + xlink:href="">MathML</link> representation given + in <xref linkend="mathmlExample"/> may be graphically visualized:</para> + + <figure xml:id="mathmltree"> + <title>A tree graph representation of the <link + xlink:href="">MathML</link> example given + before.</title> + + <mediaobject> + <imageobject> + <imagedata fileref="Ref/Fig/sqrtree.fig"/> + </imageobject> + </mediaobject> + </figure> + + <para>CAD applications may use XML documents as a representation of + graphical primitives:</para> + + <informalfigure> + <mediaobject> + <imageobject> + <imagedata fileref="Ref/Fig/attributes.fig" scale="65"/> + </imageobject> + </mediaobject> + </informalfigure> + + <para>Of course RDBMS also allow the representation of tree-like + structures or arbitrary graphs. But these have to be modelled by using + foreign key constraints since relational tables themselves have a + <quote>flat</quote> structure. Some RDBMS vendors provide extensions to + the SQL standard which allow <quote>native</quote> representations of + <xref linkend="glo_XML"/> documents.</para> + </section> + + <section xml:id="xmlHtml"> + <title>Well formed XML documents</title> + + <para>The general structure of an XML document is as + follows:</para> + + <figure xml:id="xmlbase"> + <title><xref linkend="glo_XML"/> basic + structure</title> + + <mediaobject> + <imageobject> + <imagedata fileref="Ref/Fig/xmlbase.fig" scale="65"/> + </imageobject> + </mediaobject> + </figure> + + <para>We explore a simple XML document representing messages like + E-mails:</para> + + <figure xml:id="memoWellFormed"> + <title>The representation of a short message.</title> + + <programlisting language="none"><?xml<co + xml:id="first_xml_code_magic"/> version="1.0"<co + xml:id="first_xml_code_version"/> encoding="UTF-8"<co + xml:id="first_xml_code_encoding"/>?> +<memo><co xml:id="first_xml_code_topelement"/> + <from>M. Goik</from><co xml:id="first_xml_code_from"/> + <to>B. King</to> + <to>A.
June</to> + <subject>Best wishes</subject> + <content>Hi all, congratulations to your splendid party</content> +</memo></programlisting> + </figure> + + <calloutlist> + <callout arearefs="first_xml_code_magic"> + <para>The very first characters <code><?xml</code> may be + regarded as a <link + xlink:href="">magic + number string</link> used as a format indicator which allows + distinguishing between different file types, e.g. GIF, JPEG, HTML and + so on.</para> + </callout> + + <callout arearefs="first_xml_code_version"> + <para>The <code>version="1.0"</code> attribute tells us that all + subsequent lines will conform to the <link + xlink:href="">XML</link> standard of version + 1.0. This way a document can express its conformance to the version + 1.0 standard even if in the future this standard evolves to a higher + version e.g. <code>version="2.1"</code>.</para> + </callout> + + <callout arearefs="first_xml_code_encoding"> + <para>The attribute <code>encoding="UTF-8"</code> tells us that all + text in the current document uses <link + xlink:href="">Unicode</link> encoding. <link + xlink:href="">Unicode</link> is a widely accepted + industry standard for character encoding. Thus European, Cyrillic and + most Asian characters are allowed to be used in documents + <emphasis>simultaneously</emphasis>. Other encodings may limit the + set of allowed characters, e.g. <code>encoding="ISO-8859-1"</code> + will only allow characters belonging to western European languages. + However a system also needs to have the corresponding fonts (e.g. + TrueType) installed in order to render the document + appropriately. A document containing Chinese characters is of no use + if the underlying rendering system lacks e.g. a set of Chinese + TrueType fonts.</para> + </callout> + + <callout arearefs="first_xml_code_topelement"> + <para>An XML document has exactly one top level + <emphasis>node</emphasis>.
In contrast to the HTML standard these + nodes are commonly called elements rather than tags. In this example + the top level (root) element is <tag + class="starttag">memo</tag>.</para> + </callout> + + <callout arearefs="first_xml_code_from"> + <para>Each XML element like <tag class="starttag">from</tag> has a + corresponding counterpart <tag class="endtag">from</tag>. In terms + of XML we say each element that is opened has to be closed. In + conjunction with the preceding point this is equivalent to the fact + that each XML document represents a tree structure as shown in + the <link linkend="mathmltree">tree graph</link> + representation.</para> + </callout> + </calloutlist> + + <para>As with the introductory formula example this representation + itself is of limited usefulness: In an office environment we need a + rendered version, either in print or in some online format + like E-Mail or HTML.</para> + + <para>From a software developer's point of view we may use a piece of + software called a <emphasis>parser</emphasis> to test the document's + standard conformance. At the MI department we may simply invoke + <userinput><command>xmlparse</command> message.xml</userinput> to start + a check:</para> + + <programlisting language="none"><errortext>goik>xmlparse wellformed.xml +Parsing was successful</errortext></programlisting> + + <para>Various XML related plugins are supplied for the <productname + xlink:href="">Eclipse platform</productname> like the + <productname xlink:href="">Oxygen + software</productname> supplying <quote>live</quote> conformance + checking while editing XML documents. Now we test our assumptions by + violating some of the rules stated before.
We deliberately omit the + closing element <tag class="endtag">from</tag>:</para> + + <figure xml:id="omitFrom"> + <title>A document that is not well formed due to the omission of <tag + class="endtag">from</tag>.</title> + + <programlisting language="none"><?xml version="1.0" encoding="UTF-8"?> +<memo> + <from>M. Goik <co xml:id="omitFromMissingElement"/> + <to>B. King</to> + <to>A. June</to> + <subject>Best wishes</subject> + <content>Hi all, congratulations to your splendid party</content> +</memo></programlisting> + + <calloutlist> + <callout arearefs="omitFromMissingElement"> + <para>The opening element <tag class="starttag">from</tag> is not + terminated by <tag class="endtag">from</tag>.</para> + </callout> + </calloutlist> + </figure> + + <para>Consequently the parser's output reads:</para> + + <programlisting language="none"><errortext>goik>xmlparse omitfrom.xml +file:///ma/goik/workspace/Vorlesungen/Input/Memo/omitfrom.xml:8:3: +fatal error org.xml.sax.SAXParseException: The element type "from" +must be terminated by the matching end-tag "</from>". parsing error</errortext></programlisting> + + <para>Experienced HTML authors may be confused: In fact HTML is not an + XML standard. Instead HTML belongs to the set of SGML applications. SGML + is a much older standard, namely the <emphasis>Standard Generalized + Markup Language</emphasis>.</para> + + <para>Even if every XML element has a closing counterpart the resulting + XML may still not be well formed:</para> + + <programlisting language="none"><?xml version="1.0" encoding="UTF-8"?> +<memo> + <from>M. Goik<to>B. King</from></to> + <to>A.
June</to> + <subject>Best wishes</subject> + <content>Hi all, congratulations to your splendid party</content> +</memo></programlisting> + + <para>The parser echoes:</para> + + <programlisting language="none"><computeroutput>file:///ma/goik/workspace/Vorlesungen/Input/Memo/nonest.xml:3:29: +fatal error org.xml.sax.SAXParseException: The element type "to" must be +terminated by the matching end-tag "</to>". parsing error</computeroutput></programlisting> + + <para>This type of error is caused by so-called improper nesting of + elements: The element <tag class="starttag">from</tag> is closed before + the <quote>inner</quote> element <tag class="starttag">to</tag> has been + closed. This breaks the representation of an XML document as a + tree-like structure. The situation may be resolved by choosing:</para> + + <programlisting language="none">...<from>M. Goik<to>B. King</to></from>...</programlisting> + + <para>We provide two examples illustrating proper and improper nesting + of XML elements:</para> + + <figure xml:id="fig_nestingProper"> + <title>Proper nesting of XML elements</title> + + <mediaobject> + <imageobject> + <imagedata fileref="Ref/Fig/propernest.fig" scale="65"/> + </imageobject> + </mediaobject> + </figure> + + <para>The following example violates the proper nesting constraint and thus + is not an XML document:</para> + + <figure xml:id="fig_improperNest"> + <title>Improperly nested elements</title> + + <mediaobject> + <imageobject> + <imagedata fileref="Ref/Fig/impropernest.fig"/> + </imageobject> + </mediaobject> + </figure> + + <!-- goik:later + <para>An animation showing the usage of the Oxygen plug in for the + examples given above can be found <uri + xlink:href="src/viewlet/wellformed/wellformed_viewlet_swf.html">here</uri>.</para> +--> + + <para>XML elements may have so-called attributes like <tag + class="attribute">date</tag> in the following example:</para> + + <figure xml:id="memoWellAttrib"> + <title>An XML document with
attributes.</title> + + <programlisting language="none"><?xml version="1.0" encoding="UTF-8"?> +<memo date="10.02.2006" priority="high"> + <from>M. Goik</from> + <to>B. King</to> + <to>A. June</to> + <subject>Best wishes</subject> + <content>Hi all, congratulations to your splendid party</content> +</memo></programlisting> + </figure> + + <para>The conformance of an XML document with the following rules may be + verified by invoking a parser:</para> + + <itemizedlist> + <listitem> + <para>Within the <emphasis>scope</emphasis> of a given element an + attribute name must be unique. In the example above one may not + define a second attribute <varname>date="..."</varname> within the + same element <memo ... >. This reflects the usual programming + language semantics of attributes: In a <xref linkend="glo_Java"/> + class an + attribute is represented by a unique identifier and thus cannot + appear twice.</para> + </listitem> + + <listitem> + <para>An attribute value must be enclosed either in single (') or + double (") quotes. This is different from the HTML standard which + allows attribute values without quotes provided the given attribute + value does not give rise to ambiguities.
For example <tag + class="starttag">td align=left</tag> is allowed since the attribute + value <tag class="attvalue">left</tag> does not contain any spaces + thus allowing a parser to recognize the end of the value's + definition.</para> + </listitem> + </itemizedlist> + + <qandaset defaultlabel="qanda" xml:id="example_memoAttribTree"> + <title>A graphical representation of a memo.</title> + + <qandadiv> + <qandaentry> + <question> + <para>Draw a graphical representation similar to <xref + linkend="mathmltree"/> of the memo document given in <xref + linkend="memoWellAttrib"/>.</para> + </question> + + <answer> + <para>The <link linkend="memoWellAttrib">memo document's</link> + structure may be visualized as:</para> + + <informalfigure xml:id="memotreeFigure"> + <para>A graphical representation of <xref + linkend="memoWellAttrib"/>:</para> + + <informalfigure xml:id="memotreeFigureFalse"> + <mediaobject> + <imageobject> + <imagedata fileref="Ref/Fig/memotree.fig"/> + </imageobject> + </mediaobject> + </informalfigure> + + <para>The sequence of <emphasis>element</emphasis> child nodes + is important in XML and has to be preserved. Only the order of + the two attributes <tag class="attribute">date</tag> and <tag + class="attribute">priority</tag> is undefined: They actually + belong to the <tag class="starttag">memo</tag> node serving as + a dictionary with the attribute names being the keys and the + attribute values being the values of the dictionary.</para> + </informalfigure> + </answer> + </qandaentry> + + <qandaentry xml:id="example_attribInQuotes"> + <question> + <label>Attributes and quotes</label> + + <para>As stated before XML attribute values have to be enclosed in + single or double quotes. Construct an XML document with mixed + quotes like <code><date day="monday'></code>. How does the + parser react?
Find the corresponding syntax definition of legal + attribute values in the <link + xlink:href="">XML standard W3C + Recommendation</link>.</para> + </question> + + <answer> + <para>The parser flags a mixture of single and double quotes for + a given attribute as an error. The XML standard <link + xlink:href="">defines</link> + the syntax of attribute values: An attribute value has to be + enclosed <emphasis>either</emphasis> in two single + <emphasis>or</emphasis> in two double quotes as defined in + <uri + xlink:href=""></uri>.</para> + </answer> + </qandaentry> + + <qandaentry xml:id="quoteInAttributValue"> + <question> + <label>Quotes as part of an attribute's value?</label> + + <para>Single and double quotes are used to delimit an attribute + value. May quotes appear themselves as part of an attribute's + value, e.g. like in a person's name <code>Gary "King" + Mandelson</code>?</para> + </question> + + <answer> + <para>Attribute values may contain double quotes if the + attribute's value is enclosed in single quotes and vice versa. As + a limitation the value of an attribute may not contain literal single + quotes and literal double quotes at the same time:</para> + + <informalfigure xml:id="exampleSingleDoubleQuotes"> + <para>Quotes as part of attribute values.</para> + + <programlisting language="none"><?xml version="1.0" encoding="UTF-8"?> +<test> + <person name='Gary "King" Mandelson'/> <!-- o.k. --> + <person name="Gary 'King' Mandelson"/> <!-- o.k. --> + <person name="Gary 'King 'S.' "Mandelson"'/> <!-- oops! --> +</test></programlisting> + </informalfigure> + </answer> + </qandaentry> + </qandadiv> + </qandaset> + + <para>Some constraints imposed on XML documents by the standard, + as discussed so far, may be summarized as:</para> + + <itemizedlist> + <listitem> + <para>An XML document must have exactly one top level + element.</para> + </listitem> + + <listitem> + <para>Elements have to be properly nested.
An element must not be + closed if an <quote>inner</quote> element is still open.</para> + </listitem> + + <listitem> + <para>Attribute names within a given element must be unique.</para> + </listitem> + + <listitem> + <para>Attribute values <emphasis>must</emphasis> be quoted + correctly.</para> + </listitem> + </itemizedlist> + + <para>The nesting rule shows one of several differences from the HTML + standard: In HTML a lot of elements don't have to be closed. For example + paragraphs (<tag class="starttag">p</tag>) or images (<tag + class="starttag">img src='foo.gif'</tag>) don't have to be closed + explicitly. This is due to the fact that HTML used to be defined in + accordance with the older <emphasis><emphasis + role="bold">S</emphasis>tandard <emphasis + role="bold">G</emphasis>eneralized <emphasis + role="bold">M</emphasis>arkup <emphasis + role="bold">L</emphasis>anguage</emphasis> (SGML) standard.</para> + + <para>These constraints are part of the definition of a <link + xlink:href="">well formed + document</link>. The specification imposes additional constraints for a + document to be well-formed.</para> + </section> + </chapter> + diff --git a/Sda1/xmlschema.xml b/Sda1/xmlschema.xml new file mode 100644 index 000000000..e4f385e40 --- /dev/null +++ b/Sda1/xmlschema.xml @@ -0,0 +1,1832 @@ + <chapter xml:id="xmlSchema" version="5.0" xmlns="" + xmlns:xlink="" + xmlns:xi="" + xmlns:svg="" + xmlns:m="" + xmlns:html="" + xmlns:db=""> + + + <title>Beyond well-formedness</title> + + <section xml:id="motivationSchema"> + <title>Motivation</title> + + <para>So far we are able to create XML documents containing + hierarchically structured data. We may nest elements and thus create + tree structures of arbitrary depth. The only restrictions imposed + by the XML standard are the constraints of well-formedness.
For many + purposes in software development this is not sufficient.</para> + + <para>A company named <productname>Softmail</productname> might + implement an email system which uses <link + linkend="memoWellAttrib">memo</link> document files as a low-level data + representation serving as a persistence layer. Now a second company + named <productname>Hardmail</productname> wants to integrate mails + generated by <productname>Softmail</productname>'s system into its own + business product. The <productname>Hardmail</productname> software + developers might <emphasis>infer</emphasis> the logical structure of + <productname>Softmail</productname>'s email representation but the + following problems arise:</para> + + <itemizedlist> + <listitem> + <para>The logical structure will in practice become more complex: + E-mails may contain attachments leading to multipart messages. + Additional header information is required for standard Internet mail + compliance. This adds complexity to the XML structure + required for data representation. Relying only on + well-formedness the specification of an internal E-mail format can + only be achieved <emphasis>informally</emphasis>. Thus a rule like + <quote>Each E-mail must have a subject</quote> may be written down + in the specification.
A software developer will code these rules but + probably make mistakes as the set of rules grows.</para> + + <para>In contrast a RDBMS based solution offers to solve such + problems in a declarative manner: A developer may use a <code>NOT + NULL</code> constraint on a subject attribute of type + <code>VARCHAR</code> thus inhibiting empty subjects.</para> + </listitem> + + <listitem> + <para>As <productname>Softmail</productname>'s product evolves its + internal E-mail XML format is subject to change due to functional + extensions and possibly bug fixes both giving rise to + interoperability problems.</para> + </listitem> + </itemizedlist> + + <para>Generally speaking well formed XML documents lack grammar + constraints as being available for programming languages. In case of + RDBMS developers can impose primary-, foreign and <code>CHECK</code> + constraints in a <emphasis>declarative</emphasis> manner rather than + hard coding them into their applications (A solution bad programmers are + in favour of though...). Various XML standards exist for declarative + constraint definitions namely:</para> + + <itemizedlist> + <listitem> + <para>DTDs</para> + </listitem> + + <listitem> + <para><link xlink:href="">XML + Schema</link></para> + </listitem> + + <listitem> + <para><link + xlink:href="">RelaxNG</link></para> + </listitem> + </itemizedlist> + </section> + + <section xml:id="dtdBasic"> + <title>XML Schema</title> + + <section xml:id="dtdFirstExample"> + <title>Structural descriptions for documents</title> + + <para>As an example we choose documents of type + <emphasis>memo</emphasis> as a starting point. 
Documents like the + example from <xref linkend="memoWellAttrib"/> may be + <emphasis>informally</emphasis> described to be a sequence of the + following mandatory items:</para> + + <figure xml:id="figure_memo_informalconstraints"> + <title>Informal constraints on <tag class="element">memo</tag> + document instances</title> + + <itemizedlist> + <listitem> + <para><emphasis>Exactly one</emphasis> sender.</para> + </listitem> + + <listitem> + <para><emphasis>One or more</emphasis> recipients.</para> + </listitem> + + <listitem> + <para>Subject</para> + </listitem> + + <listitem> + <para>Content</para> + </listitem> + </itemizedlist> + + <para>In addition we have:</para> + + <itemizedlist> + <listitem> + <para>A date string <emphasis>must</emphasis> be supplied</para> + </listitem> + + <listitem> + <para>A priority <emphasis>may</emphasis> be supplied with + allowed values to be chosen from the set of values <tag + class="attvalue">low</tag>, <tag class="attvalue">medium</tag> + or <tag class="attvalue">high</tag>.</para> + </listitem> + </itemizedlist> + </figure> + + <para>All these fields contain ordinary text to be filled in by a user + and shall appear exactly in the defined order. For simplicity we do + not care about email address syntax rules being described in <link + xlink:href="">RFC based address + schemes</link>. We will see how the <emphasis>constraints</emphasis> + mentioned above can be modelled in XML by an extension to the concept + of well formed documents.</para> + </section> + + <section xml:id="section_memo_machinereadable"> + <title>A machine readable description</title> + + <para>We now introduce an example of an XML schema. It allows for the + specification of additional constraints to both element nodes and + their attributes. 
Our set of <link + linkend="figure_memo_informalconstraints" revision="">informal + constraints</link> on memo documents may be expressed as:</para> + + <figure xml:id="figure_memo_dtd"> + <title>A schema to describe memo documents.</title> + + <programlisting language="none"><xs:schema xmlns:xs="" + xmlns:vc="" elementFormDefault="qualified" + vc:minVersion="1.0" vc:maxVersion="1.1"> + + <xs:element name="memo"> + <xs:complexType> + <xs:sequence> <co xml:id="memodtd_memodef"/> + <xs:element name="from" type="xs:string"/> <co + xml:id="memodtd_elem_from"/> + <xs:element name="to" minOccurs="1" maxOccurs="unbounded" type="xs:string"/> + <xs:element name="subject" type="xs:string"/> + <xs:element name="content" type="xs:string"/> + </xs:sequence> + <xs:attribute name="date" type="xs:date" use="required"/> <co + xml:id="memodtd_memo_attribs"/> + <xs:attribute name="priority" type="Priority" use="optional"/> + </xs:complexType> + + </xs:element> + + <xs:simpleType name="Priority"> + <xs:restriction base="xs:string"> + <xs:enumeration value="low"/> + <xs:enumeration value="medium"/> + <xs:enumeration value="high"/> + </xs:restriction> + </xs:simpleType> + +</xs:schema></programlisting> + + <calloutlist> + <callout arearefs="memodtd_memodef"> + <para>A <tag class="element">memo</tag> consists of a sender, at + least one recipient, a subject and content.</para> + </callout> + + <callout arearefs="memodtd_memo_attribs"> + <para>A <tag class="element">memo</tag> has got one required + attribute <varname>date</varname> and an optional attribute + <varname>priority</varname> being restricted to the three + allowed values <tag class="attvalue">low</tag>, <tag + class="attvalue">medium</tag> and <tag + class="attvalue">high</tag> being defined by a separate <tag + class="starttag">xs:simpleType</tag> directive.</para> + </callout> + + <callout arearefs="memodtd_elem_from"> + <para>A <tag class="starttag">from</tag> element consists of + ordinary text. This disallows XML markup. 
For example + <code><from>Smith & partner</from></code> is + disallowed since XML uses the ampersand (&) to denote the + beginning of an entity like <tag class="genentity">auml</tag> + for the German a-umlaut (ä). The correct form is + <code><from>Smith &amp; partner</from></code> + using the predefined entity <tag class="genentity">amp</tag> as + an escape sequence for the ampersand.</para> + + <para><code>type="xs:string"</code> is a built in XML Schema + type representing a restricted version of ordinary strings. + Without digging into details a <code>xs:string</code> string + must not contain any markup code like e.g. <tag + class="starttag">msqrt</tag>. This ensures that a string does + not interfere with the document's XML markup.</para> + </callout> + </calloutlist> + </figure> + + <para>We notice our schema's syntax itself is an XML document.</para> + + <para>From the viewpoint of software modeling an XML Schema instance + is a <emphasis>schema</emphasis> describing the syntax of a class of + XML document instances adhering to it. 
In the context of XML + technologies <link xlink:href="">XML + Schema</link> is one of several language alternatives which allow for + XML document structure descriptions.</para> + + <para>Readers being familiar with <abbrev + xlink:href="">BNF</abbrev> + or <abbrev + xlink:href="">EBNF</abbrev> + will be able to understand the grammatical rules being expressed + here.</para> + + <productionset> + <title>A message of type <tag class="starttag">memo</tag></title> + + <production xml:id="memo.ebnf.memo"> + <lhs>Memo Message</lhs> + + <rhs>'<memo>' <nonterminal + def="#memo.ebnf.sender">Sender</nonterminal> [<nonterminal + def="#memo.ebnf.recipient">Recipient</nonterminal>]+ <nonterminal + def="#memo.ebnf.subject">Subject</nonterminal> <nonterminal + def="#memo.ebnf.content">Content</nonterminal> + '</memo>'</rhs> + </production> + + <production xml:id="memo.ebnf.sender"> + <lhs>Sender</lhs> + + <rhs>'<from>' <nonterminal def="#memo.ebnf.text"> Text + </nonterminal> '</from>'</rhs> + </production> + + <production xml:id="memo.ebnf.recipient"> + <lhs>Recipient</lhs> + + <rhs>'<to>' <nonterminal def="#memo.ebnf.text"> Text + </nonterminal> '</to>'</rhs> + </production> + + <production xml:id="memo.ebnf.subject"> + <lhs>Subject</lhs> + + <rhs>'<subject>' <nonterminal def="#memo.ebnf.text"> Text + </nonterminal> '</subject>'</rhs> + </production> + + <production xml:id="memo.ebnf.content"> + <lhs>Content</lhs> + + <rhs>'<content>' <nonterminal def="#memo.ebnf.text"> Text + </nonterminal> '</content>'</rhs> + </production> + + <production xml:id="memo.ebnf.text"> + <lhs>Text</lhs> + + <rhs>[a-zA-Z0-9]* <lineannotation>In real documents this is too + restrictive!</lineannotation></rhs> + </production> + </productionset> + + <para>We may as well supply a graphical representation:</para> + + <figure xml:id="extendContModelGraph"> + <title>Graphical representation of the extended <code>content</code> + model.</title> + + <mediaobject> + <imageobject> + <imagedata 
fileref="Ref/Fig/contentmixed.fig"/> + </imageobject> + </mediaobject> + </figure> + + <para>In comparison to our informal description of memo documents a + schema offers an added value: The grammar is machine readable and may + thus become input to a parser which in turn is able to check + whether an XML document obeys the constraints being imposed. So the + parser must be instructed to use a schema in addition to the XML + document in question. For this purpose an XML document may define a + reference to a schema:</para> + + <figure xml:id="memo_external_dtd"> + <title>A memo document instance holding a reference to an + external schema.</title> + + <programlisting language="none"><memo <co + xml:id="memo_external_dtd_top_element"/> xmlns:xsi="" + xsi:noNamespaceSchemaLocation="memo.xsd" <co + xml:id="memo_external_dtd_url"/> + date="2014-09-24" priority="high"> + <from>M. Goik</from> + <to>B. King</to> + <to>A. June</to> + <subject>Best wishes</subject> + <content>Hi all, congratulations on your splendid party</content> +</memo></programlisting> + + <calloutlist> + <callout arearefs="memo_external_dtd_top_element"> + <para>The element <tag class="starttag">memo</tag> is chosen to + be the top (root) element of the document's tree. It must be + defined in our schema <filename>memo.xsd</filename>. This is + really a choice since an XML schema defines a + <emphasis>set</emphasis> of elements in + <emphasis>arbitrary</emphasis> order. There is no such rule as + <quote>define before use</quote>. So an XML schema does not tell + us which element has to appear on top of a document.</para> + + <para>Suppose a given XML schema offers both <tag + class="starttag">book</tag> and <tag + class="starttag">report</tag> elements. An XML author writing a + complex document will choose <tag class="starttag">book</tag> as + top level element rather than <tag class="starttag">report</tag> + being more appropriate for a small piece of documentation. 
+ Consequently it is an XML author's <emphasis>choice</emphasis> + which of the elements being defined in a schema shall appear as + <emphasis>the</emphasis> top level element.</para> + </callout> + + <callout arearefs="memo_external_dtd_url"> + <para>The address of the schema's rule set. In the given example + it is just a filename but it may as well be a <link + xlink:href="">URL</link> of type + <abbrev + xlink:href="">ftp</abbrev>, + <abbrev xlink:href="">http</abbrev> + and so on, see <xref linkend="memoDtdOnFtp"/>.</para> + </callout> + </calloutlist> + </figure> + + <para>In the presence of a schema parsing a document is actually a two + step process: First the parser will check the document for + well-formedness. Then the parser will read the referenced schema + <filename>memo.xsd</filename> and check the document for the + additional constraints being defined within.</para> + + <para>In the current example both the schema and the XML memo document + reside as text files in a common file system folder. For general use a + schema is usually kept at a centralized location. The value of the attribute + <varname>xsi:noNamespaceSchemaLocation</varname> is actually a + <emphasis>U</emphasis><foreignphrase>niform</foreignphrase> + <emphasis>R</emphasis><foreignphrase>esource</foreignphrase> + <emphasis>L</emphasis><foreignphrase>ocator</foreignphrase> <link + xlink:href="">(URL)</link>. Thus our + <filename>memo.xsd</filename> may also be supplied as an <abbrev + xlink:href="">http</abbrev> or <abbrev + xlink:href="">ftp</abbrev> + <link xlink:href="">URL</link>:</para> + + <figure xml:id="memoDtdOnFtp"> + <title>A schema reference to an FTP server.</title> + + <programlisting language="none"><?xml version="1.0" encoding="UTF-8"?> +<memo ... xsi:noNamespaceSchemaLocation=""> + <from>M. Goik</from> + ... 
+</memo></programlisting> + </figure> + + <para>Some terms are helpful in the context of schemas:</para> + + <variablelist> + <varlistentry> + <term>Validating / non-validating:</term> + + <listitem> + <para>A non-validating parser only checks a document for + well-formedness. If it also checks XML documents for conformance to a + schema it is a <emphasis>validating</emphasis> parser.</para> + </listitem> + </varlistentry> + + <varlistentry> + <term>Valid / invalid documents:</term> + + <listitem> + <para>An XML document referencing a schema may either be valid + or invalid depending on its conformance to the schema in + question.</para> + </listitem> + </varlistentry> + + <varlistentry> + <term>Document instance:</term> + + <listitem> + <para>An XML memo document may conform to the <link + linkend="figure_memo_dtd">memo schema</link>. In this case we + call it a <emphasis>document instance</emphasis> of the memo + schema.</para> + + <para>This situation is quite similar to that in typed programming + languages: A <xref linkend="glo_Java"/> + <code>class</code> declaration is a blueprint for the <xref linkend="glo_Java"/> runtime + system to construct <xref linkend="glo_Java"/> objects + in memory. This is done by e.g. a statement <code>String name = + new String();</code>. The identifier <code>name</code> will hold + a reference to an <emphasis>instance of class String</emphasis>. + So in a <xref linkend="glo_Java"/> runtime + environment a class declaration plays the same role as a schema + declaration in XML. See also <xref + linkend="example_memoJavaClass"/>.</para> + </listitem> + </varlistentry> + </variablelist> + + <para>For further discussions it is very useful to clearly distinguish + element definitions in a schema from their + <emphasis>realizations</emphasis> in a corresponding document + instance: Our memo schema defines an element <tag + class="starttag">from</tag> to be of content <type>xs:string</type>. 
+ According to the schema exactly one <tag class="starttag">from</tag> + element must appear in a valid (conforming) document instance. If we + were talking about HTML document instances we would prefer to talk + about a <tag class="starttag">from</tag> <emphasis>tag</emphasis> + rather than a <tag class="starttag">from</tag> + <emphasis>element</emphasis>.</para> + + <para>In this document we will use the term <emphasis>element + type</emphasis> to denote an <code><xs:element ...</code> + definition in a schema. Thus we will talk about an element type <tag + class="element">subject</tag> being defined in + <filename>memo.xsd</filename>.</para> + + <para>An element type being defined in a <abbrev + xlink:href="">schema</abbrev> + may have realizations in document instances. For example the document + instance shown in <xref linkend="memo_external_dtd"/> has two + <emphasis>nodes</emphasis> of element type <tag + class="element">to</tag>. Thus we say that the document instance + contains two <emphasis>element nodes</emphasis> of type <tag + class="element">to</tag>. We will frequently abbreviate this by saying + the instance contains two <tag class="starttag">to</tag> element + nodes. And we may even omit the term <emphasis>nodes</emphasis> and + simply talk about two <tag class="starttag">to</tag> elements. 
But + the careful reader should always distinguish between a single type + <code>foo</code> being defined in a <abbrev + xlink:href="">schema</abbrev> + and the possibly empty set of <tag class="starttag">foo</tag> nodes + appearing in valid document instances.</para> + + <para><abbrev + xlink:href="">Schema</abbrev>'s + appear on top of well-formed XML documents:</para> + + <figure xml:id="wellformedandvalid"> + <title>Well-formed and valid documents</title> + + <mediaobject> + <imageobject> + <imagedata fileref="Ref/Fig/wellformedandvalid.fig" scale="65"/> + </imageobject> + </mediaobject> + </figure> + + <qandaset defaultlabel="qanda" xml:id="example_memoTestValid"> + <title>Validation of memo document instances.</title> + + <qandadiv> + <qandaentry> + <question> + <para>Copy the two files <link + xlink:href="Ref/src/Memo.1/message.xml">message.xml</link> and + <link xlink:href="Ref/src/Memo.1/memo.xsd">memo.xsd</link> + into your eclipse project. Use the Oxygen XML plug in to check + if the document is valid. Then subsequently do and undo the + following changes each time checking the document for + validity:</para> + + <itemizedlist> + <listitem> + <para>Omit the <tag class="starttag">from</tag> + element.</para> + </listitem> + + <listitem> + <para>Change the order of the two sub elements <tag + class="starttag">subject</tag> and <tag + class="starttag">content</tag>.</para> + </listitem> + + <listitem> + <para>Erase the <varname>date</varname> attribute and its + value.</para> + </listitem> + + <listitem> + <para>Erase the <varname>priority</varname> attribute and + its value.</para> + </listitem> + </itemizedlist> + + <para>What do you observe?</para> + </question> + + <answer> + <para>The <tag class="attribute">priority</tag> attribute is + declared as <code>optional</code> and may thus be omitted. + Erasing the <tag class="attribute">priority</tag> attribute + thus leaves the document in a valid state. 
The remaining three + edit actions yield an invalid document instance.</para> + </answer> + </qandaentry> + + <qandaentry xml:id="example_memoJavaClass"> + <question> + <label>A memo implementation sketch in Java</label> + + <para>The aim of this exercise is to clarify the (abstract) + relation between XML <abbrev + xlink:href="">schema</abbrev>s + and sets of <xref linkend="glo_Java"/> + classes rather than building a running application. We want to + model the <link xlink:href="Ref/src/Memo.1/memo.xsd">memo + schema</link> as a set of <xref linkend="glo_Java"/> + classes.</para> + </question> + + <answer> + <para>The XML attributes <tag class="attribute">date</tag> and + <tag class="attribute">priority</tag> can be mapped to <xref linkend="glo_Java"/> + attributes. The same applies to the Memo elements <tag + class="element">from</tag>, <tag class="element">subject</tag> + and <tag class="element">content</tag> which may be + implemented as simple Strings or alternatively as separate + Classes wrapping the String content. The latter method of + implementation should be preferred if the Memo schema is + expected to grow in complexity. A simple sketch reads:</para> + + <programlisting language="none">import java.util.Date; +import java.util.LinkedHashSet; + +public class Memo { + private Date date; + private Priority priority = Priority.medium; + private String from, subject, content; + private LinkedHashSet<String> to; + // Accessors not yet implemented +}</programlisting> + + <para>The only thing to note here is the implementation of the + <tag class="element">to</tag> element: We want to be able to + address a <emphasis>set</emphasis> of recipients. Thus we have + to disallow duplicates. Note that this is an + <emphasis>informal</emphasis> constraint not being handled by + our schema: A Memo document instance <emphasis>may</emphasis> + have duplicate content in <tag class="starttag">to</tag> + nodes. 
This is a limitation of our schema as written: + It declares no uniqueness constraint on the content of these + nodes, although XML Schema offers <tag class="starttag">xs:unique</tag> + for this purpose.</para> + + <para>On the other hand our set of recipients has to be + ordered: In an XML document instance the order of <tag + class="starttag">to</tag> nodes is important and has to be + preserved in a <xref linkend="glo_Java"/> + representation. Thus we choose a + <classname>java.util.LinkedHashSet</classname> parametrized with + String type, which rejects duplicates and keeps insertion order.</para> + + <para>Our schema defines:</para> + + <programlisting language="none"><xs:attribute name="priority" type="Priority" use="optional"/></programlisting> + + <para>Starting from <xref linkend="glo_Java"/> 1.5 we + may implement this constraint by a type safe enumeration in a + file <filename></filename>:</para> + + <programlisting language="none">public enum Priority { low, medium, high }</programlisting> + </answer> + </qandaentry> + </qandadiv> + </qandaset> + + <para>In the following chapters we will extend the memo document type + (<code><!DOCTYPE memo ... ></code>) to demonstrate various + concepts of <abbrev + xlink:href="">schema</abbrev>s + and other XML related standards. In parallel a series of exercises + deals with building a schema usable to edit books. This schema gets + extended as our knowledge about XML advances. 
We start with an initial + exercise:</para> + + <qandaset defaultlabel="qanda" xml:id="example_bookDtd"> + <title>A schema for editing books</title> + + <qandadiv> + <qandaentry> + <question> + <para>Write a schema describing book document instances with + the following features:</para> + + <itemizedlist> + <listitem> + <para>A book shall have a title to describe the book + itself.</para> + </listitem> + + <listitem> + <para>A book shall have at least one but possibly a + sequence of chapters.</para> + </listitem> + + <listitem> + <para>Each chapter shall have a title and at least one + paragraph.</para> + </listitem> + + <listitem> + <para>The titles and paragraphs shall consist of ordinary + text.</para> + </listitem> + </itemizedlist> + </question> + + <answer> + <para>A possible schema looks like:</para> + + <figure xml:id="figure_book.dtd_v1"> + <title>A first schema version for book documents</title> + + <programlisting language="none"><xs:schema xmlns:xs="" + xmlns:vc="" elementFormDefault="qualified" + vc:minVersion="1.0" vc:maxVersion="1.1"> + + <xs:element name="book"> + <xs:complexType> + <xs:sequence> + <xs:element ref="title"/> + <xs:element ref="chapter" minOccurs="1" maxOccurs="unbounded"/> + </xs:sequence> + </xs:complexType> + </xs:element> + + <xs:element name="title" type="xs:string"/> + <xs:element name="chapter"> + <xs:complexType> + <xs:sequence> + <xs:element ref="title"/> + <xs:element ref="para" minOccurs="1" maxOccurs="unbounded"/> + </xs:sequence> + </xs:complexType> + </xs:element> + + <xs:element name="para" type="xs:string"/> + +</xs:schema></programlisting> + </figure> + + <para>We supply a valid document instance:</para> + + <informalfigure xml:id="bookInitialInstance"> + <programlisting language="none"><?xml version="1.0" encoding="UTF-8"?> +<book xmlns:xsi="" + xsi:noNamespaceSchemaLocation="book.xsd"> + <title>Introduction to Java</title> + <chapter> + <title>Introduction</title> + <para>Java is a programming language</para> + 
</chapter> + <chapter> + <title>The virtual machine</title> + <para>We also call it the runtime system.</para> + </chapter> + <chapter> + <title>Annotations</title> + <para>Annotations provide a means to add meta information.</para> + <para>This is especially useful for framework authors.</para> + </chapter> +</book></programlisting> + </informalfigure> + + <para>.</para> + </answer> + </qandaentry> + </qandadiv> + </qandaset> + </section> + + <section xml:id="dtdVsSqlDdl"> + <title>Relating <abbrev + xlink:href="">schema</abbrev>'s + and <acronym + xlink:href="">SQL</acronym> - <abbrev + xlink:href="">DDL</abbrev></title> + + <para>XML <abbrev + xlink:href="">schema</abbrev>'s + and <acronym + xlink:href="">SQL</acronym> - <abbrev + xlink:href="">DDL</abbrev> + are related: They both describe data models and thus integrity + constraints. We consider a simple invoice example:</para> + + <figure xml:id="invoiceIntegrity"> + <title>Invoice integrity constraints</title> + + <mediaobject> + <imageobject> + <imagedata fileref="Ref/Fig/invoicedata.fig" scale="65"/> + </imageobject> + </mediaobject> + </figure> + + <para>A relational implementation may look like:</para> + + <figure xml:id="invoiceSqlDdl"> + <title>Relational implementation</title> + + <mediaobject> + <imageobject> + <imagedata fileref="Ref/Fig/invoicedataimplement.fig" scale="65"/> + </imageobject> + </mediaobject> + </figure> + + <qandaset defaultlabel="qanda" xml:id="qandaInvoiceSchema"> + <title>An XML schema representing invoices</title> + + <qandadiv> + <qandaentry> + <question> + <para>Represent the relational schema being described in <xref + linkend="invoiceSqlDdl"/> by an XML Schema and provide an + appropriate instance example.</para> + </question> + + <answer> + <para>A possible schema implementation:</para> + + <programlisting language="none"><xs:schema xmlns:xs="" + xmlns:vc="" elementFormDefault="qualified" + vc:minVersion="1.0" vc:maxVersion="1.1"> + + <xs:simpleType name="money"> + 
<xs:restriction base="xs:decimal"> + <xs:fractionDigits value="2"/> + </xs:restriction> + </xs:simpleType> + + <xs:element name="data"> + <xs:complexType> + <xs:sequence> + <xs:element ref="customer" maxOccurs="unbounded"/> + <xs:element ref="invoice" maxOccurs="unbounded"/> + </xs:sequence> + </xs:complexType> + <xs:key name="customerId"> + <xs:selector xpath="customer"/> + <xs:field xpath="@id"/> + </xs:key> + + <xs:keyref refer="customerId" name="customerToInvoice"> + <xs:selector xpath="invoice"/> + <xs:field xpath="@customer"></xs:field> + </xs:keyref> + </xs:element> + + <xs:element name="customer"> + <xs:complexType> + <xs:sequence> + <xs:element name="name" type="xs:string"/> + <xs:element name="phoneNumber" type="xs:string" minOccurs="0"/> + </xs:sequence> + <xs:attribute name="id" type="xs:int" use="required"/> + </xs:complexType> + </xs:element> + + <xs:element name="invoice"> + <xs:complexType> + <xs:sequence> + <xs:element name="amount" type="money"/> + <xs:element name="status"> + <xs:simpleType> + <xs:restriction base="xs:token"> + <xs:enumeration value="open"/> + <xs:enumeration value="due"/> + <xs:enumeration value="cleared"/> + </xs:restriction> + </xs:simpleType> + </xs:element> + </xs:sequence> + <xs:attribute name="customer" type="xs:int" use="required"/> + </xs:complexType> + </xs:element> + +</xs:schema></programlisting> + + <para>An example data set:</para> + + <programlisting language="none"><data xmlns:xsi="" + xsi:noNamespaceSchemaLocation="invoice.xsd"> + <customer id="5"> + <name>Clarke Jefferson</name> + </customer> + + <invoice customer="5"> + <amount>33.12</amount> + <status>due</status> + </invoice> +</data></programlisting> + </answer> + </qandaentry> + </qandadiv> + </qandaset> + </section> + + <section xml:id="airlineXsd"> + <title>The airline example revisited</title> + + <qandaset defaultlabel="qanda" xml:id="qandaAirlineXsd"> + <title>Airline meta information by XML schema</title> + + <qandadiv> + <qandaentry> + <question> + 
<para>Transform the relational schema from <xref + linkend="airlineRelationalSchema"/> into an XML schema and + supply some test data. In particular consider the following + constraints:</para> + + <itemizedlist> + <listitem> + <para>Data types</para> + + <itemizedlist> + <listitem> + <para><link + xlink:href="">ICAO + airline designator</link></para> + </listitem> + + <listitem> + <para><link + xlink:href="">ICAO + airport code</link></para> + </listitem> + </itemizedlist> + </listitem> + + <listitem> + <para>Primary / Unique key definitions</para> + </listitem> + + <listitem> + <para>Foreign key definitions</para> + </listitem> + + <listitem> + <para>CHECK constraint: Your XML schema will require <tag + class="starttag">xs:assert test="..." </tag> and thus XML + schema version 1.1. You may want to read about + co-occurrence constraints as being described in <link + xlink:href="">Listing + 6. Assertion on complex type - @height < + @width</link>.</para> + </listitem> + </itemizedlist> + + <para>The following XML example instance may guide you towards + an <filename>airline.xsd</filename> schema:</para> + + <programlisting language="none"><top xmlns:xsi="" + xsi:noNamespaceSchemaLocation="airline.xsd"> + <airlines> + <airline airlineCode="DLH" id="1"> + <name>Lufthansa</name> + </airline> + <airline airlineCode="AFR" id="2"> + <name>Air France</name> + </airline> + </airlines> + <destinations> + <destination id="1" airportCode="EDDF"> + <fullName>Frankfurt International Airport – Frankfurt am Main</fullName> + </destination> + + <destination id="3" airportCode="EBCI"> + <fullName>Brussels South Charleroi Airport – Charleroi</fullName> + </destination> + </destinations> + + <flights> + <flight id="1" airline="2" origin="1" destination="3"> + <flightNumber>LH 4234</flightNumber> + </flight> + </flights> +</top></programlisting> + + <para>Hints:</para> + + <itemizedlist> + <listitem> + <para>Identify all relational schema constraints from + solution of <xref 
linkend="airlineRelationalSchema"/> and + model them accordingly.</para> + </listitem> + + <listitem> + <para>The above example does not contain any constraint + violations. In order to test your schema for completeness + tinkering with primary key, unique and referencing + attribute values may be helpful.</para> + </listitem> + </itemizedlist> + </question> + + <answer> + <programlisting language="none"><xs:schema xmlns:xs="" + xmlns:vc="" elementFormDefault="qualified" + vc:minVersion="1.1"> + + <xs:simpleType name="ICAOAirportCode"> + <xs:restriction base="xs:string"> + <xs:length value="4" /> + <xs:pattern value="[A-Z0-9]+"></xs:pattern> + </xs:restriction> + </xs:simpleType> + + <xs:simpleType name="ICAOAirlineCode"> + <xs:restriction base="xs:string"> + <xs:length value="3"/> + <xs:pattern value="[A-Z]+"></xs:pattern> + </xs:restriction> + </xs:simpleType> + + <xs:element name="top"> + <xs:complexType> + <xs:sequence> + <xs:element ref="airlines"/> + <xs:element ref="destinations"/> + <xs:element ref="flights"/> + </xs:sequence> + </xs:complexType> + + <xs:keyref name="_FK_Flight_airline" refer="_PK_Airline_id"> + <xs:selector xpath="flights/flight"/> + <xs:field xpath="@airline"/> + </xs:keyref> + + <xs:keyref name="_FK_Flight_origin" refer="_PK_Destination_id"> + <xs:selector xpath="flights/flight"/> + <xs:field xpath="@origin"/> + </xs:keyref> + + <xs:keyref name="_FK_Flight_destination" refer="_PK_Destination_id"> + <xs:selector xpath="flights/flight"/> + <xs:field xpath="@destination"/> + </xs:keyref> + + </xs:element> + + <xs:element name="airlines"> + <xs:complexType> + <xs:sequence> + <xs:element ref="airline" minOccurs="0" maxOccurs="unbounded"/> + </xs:sequence> + </xs:complexType> + + <xs:key name="_PK_Airline_id"> + <xs:selector xpath="airline"/> + <xs:field xpath="@id"/> + </xs:key> + + <xs:key name="_UN_Airline_name"> + <xs:selector xpath="airline"/> + <xs:field xpath="name"/> + </xs:key> + + <xs:key name="_UN_Airline_airlineCode"> + <xs:selector 
xpath="airline"/> + <xs:field xpath="@airlineCode"/> + </xs:key> + </xs:element> + + <xs:element name="airline"> + <xs:complexType> + <xs:sequence> + <xs:element name="name" type="xs:string"/> + </xs:sequence> + <xs:attribute name="id" type="xs:int" use="required"/> + <xs:attribute name="airlineCode" type="ICAOAirlineCode" use="required"/> + </xs:complexType> + </xs:element> + + <xs:element name="destinations"> + <xs:complexType> + <xs:sequence> + <xs:element ref="destination" minOccurs="0" maxOccurs="unbounded"/> + </xs:sequence> + </xs:complexType> + + <xs:key name="_PK_Destination_id"> + <xs:selector xpath="destination"/> + <xs:field xpath="@id"/> + </xs:key> + + <xs:key name="_UN_Destination_airportCode"> + <xs:selector xpath="destination"/> + <xs:field xpath="@airportCode"/> + </xs:key> + </xs:element> + + <xs:element name="destination"> + <xs:complexType> + <xs:sequence> + <xs:element name="fullName"/> + </xs:sequence> + <xs:attribute name="id" type="xs:int"/> + <xs:attribute name="airportCode" type="ICAOAirportCode"/> + </xs:complexType> + </xs:element> + + <xs:element name="flights"> + <xs:complexType> + <xs:sequence> + <xs:element ref="flight" minOccurs="0" maxOccurs="unbounded"/> + </xs:sequence> + </xs:complexType> + + <xs:key name="_PK_Flight_id"> + <xs:selector xpath="flight"/> + <xs:field xpath="@id"/> + </xs:key> + + <xs:key name="_UN_Flight_flightNumber"> + <xs:selector xpath="flight"/> + <xs:field xpath="flightNumber"/> + </xs:key> + + </xs:element> + + <xs:element name="flight"> + <xs:complexType> + <xs:sequence> + <xs:element name="flightNumber" type="xs:string"/> + </xs:sequence> + <xs:attribute name="id" type="xs:int" use="required"/> + <xs:attribute name="airline" type="xs:int" use="required"/> + <xs:attribute name="origin" type="xs:int"/> + <xs:attribute name="destination" type="xs:int"/> + <xs:assert test="not(@origin = @destination)"> + <xs:annotation> + <xs:documentation>CHECK constraint _CK_Flight_origin_destination</xs:documentation> + 
</xs:annotation> + </xs:assert> + </xs:complexType> + </xs:element> + +</xs:schema></programlisting> + </answer> + </qandaentry> + </qandadiv> + </qandaset> + </section> + + <section xml:id="xmlAndJava"> + <title>Relating <abbrev + xlink:href="">schema</abbrev>'s + and <xref linkend="glo_Java"/> + class descriptions.</title> + + <para>We may also compare XML data constraints to <xref linkend="glo_Java"/>. A <xref linkend="glo_Java"/> class + declaration is actually a blueprint for a <trademark + xlink:href="">JRE</trademark> + to instantiate compatible objects. Likewise an XML schema restricts + well-formed documents:</para> + + <figure xml:id="fig_XmlAndJava"> + <title>XML <abbrev + xlink:href="">schema</abbrev>'s + and <xref linkend="glo_Java"/> + class declarations.</title> + + <mediaobject> + <imageobject> + <imagedata fileref="Ref/Fig/xmlattribandjava.fig" scale="65"/> + </imageobject> + </mediaobject> + </figure> + </section> + + <section xml:id="xmlSchemaExercise"> + <title>XML schema exercises</title> + + <section xml:id="sectSchemaProductCatalog"> + <title>A product catalog</title> + + <qandaset defaultlabel="qanda" xml:id="quandaProductCatalog"> + <title>Product catalog schema</title> + + <qandadiv> + <qandaentry> + <question> + <para>Consider the following product catalog example:</para> + + <programlisting language="none"><catalog xmlns:xsi="" + xsi:noNamespaceSchemaLocation="catalog.xsd"> + <title>Outdoor products</title> + <introduction> + <para>We offer a great variety of basic stuff for mountaineering + such as ropes, harnesses and tents.</para> + <para>Our shop is proud for its large number of available + sleeping bags.</para> + </introduction> + <product id="x-223"> + <title>Multi freezing bag Nightmare camper</title> + <description> + <para>You will feel comfortable till minus 20 degrees - At + least if you are a penguin or a polar bear.</para> + </description> + </product> + <product id="r-334"> + <title>Rope 40m</title> + <description> + 
<para>Excellent for indoor climbing.</para> + </description> + </product> +</catalog></programlisting> + + <para>As you may have inferred the following rules shall + apply for arbitrary catalog documents:</para> + + <itemizedlist> + <listitem> + <para>Each <tag class="starttag">catalog</tag> shall + have exactly one <tag class="starttag">title</tag> and + <tag class="starttag">introduction</tag> element.</para> + </listitem> + + <listitem> + <para><tag class="starttag">introduction</tag> and <tag + class="starttag">description</tag> shall have at least + one <tag class="starttag">para</tag> child.</para> + </listitem> + + <listitem> + <para>Each <tag class="starttag">catalog</tag> shall + have at least one <tag + class="starttag">product</tag>.</para> + </listitem> + + <listitem> + <para>Each <tag class="starttag">product</tag> shall + have exactly one <tag class="starttag">title</tag> and + at least one <tag class="starttag">para</tag> child + element.</para> + </listitem> + + <listitem> + <para>The required <code>id</code> attribute shall not + contain whitespace and be unique with respect to all + <tag class="starttag">product</tag> elements.</para> + </listitem> + + <listitem> + <para>The attribute price shall represent money amounts + and be optional.</para> + </listitem> + </itemizedlist> + + <para>Provide a suitable <filename>catalog.xsd</filename> + schema.</para> + </question> + + <answer> + <programlisting language="none"><xs:schema xmlns:xs="" + xmlns:vc="" elementFormDefault="qualified" + vc:minVersion="1.0" vc:maxVersion="1.1"> + + <xs:simpleType name="money"> + <xs:restriction base="xs:decimal"> + <xs:fractionDigits value="2"/> + </xs:restriction> + </xs:simpleType> + + <xs:element name="title" type="xs:string"/> + <xs:element name="para" type="xs:string"/> + + <xs:element name="description" type="paraSequence"/> + <xs:element name="introduction" type="paraSequence"/> + + <xs:complexType name="paraSequence"> + <xs:sequence> + <xs:element ref="para" 
minOccurs="1" maxOccurs="unbounded"/> + </xs:sequence> + </xs:complexType> + + <xs:element name="product"> + <xs:complexType> + <xs:sequence> + <xs:element ref="title"/> + <xs:element ref="description"/> + </xs:sequence> + <xs:attribute name="id" type="xs:token" use="required"/> + <xs:attribute name="price" type="money" use="optional"/> + </xs:complexType> + </xs:element> + + <xs:element name="catalog"> + <xs:complexType> + <xs:sequence> + <xs:element ref="title"/> + <xs:element ref="introduction"/> + <xs:element ref="product" minOccurs="1" maxOccurs="unbounded"/> + </xs:sequence> + </xs:complexType> + + <xs:key name="uniqueProductId"> + <xs:selector xpath="product"></xs:selector> + <xs:field xpath="@id"/> + </xs:key> + </xs:element> + +</xs:schema></programlisting> + </answer> + </qandaentry> + </qandadiv> + </qandaset> + </section> + + <section xml:id="sectQandaBookV1"> + <title>Book-like documents</title> + + <qandaset defaultlabel="qanda" xml:id="example_operatorprecedence"> + <title>Book documents with mixed content and itemized + lists</title> + + <qandadiv> + <qandaentry xml:id="example_book_v2"> + <question> + <para>Extend the first version of <link + linkend="example_bookDtd">book.xsd</link> to support the + following features:</para> + + <itemizedlist> + <listitem> + <para>Within a <tag class="starttag">chapter</tag> node + <tag class="starttag">para</tag> and <tag + class="starttag">itemizedlist</tag> elements in + arbitrary order shall be allowed.</para> + </listitem> + + <listitem> + <para><tag class="starttag">itemizedlist</tag> nodes + shall contain at least one <tag + class="starttag">listitem</tag>.</para> + </listitem> + + <listitem> + <para><tag class="starttag">listitem</tag> nodes shall + be composed of one or more <tag class="starttag">para</tag> + or nested <tag class="starttag">itemizedlist</tag> + elements.</para> + </listitem> + + <listitem> + <para>Within a <tag class="starttag">para</tag> we want + to be able to emphasize text passages.</para> + </listitem> + </itemizedlist> + + <para>The
following sample document instance shall be + valid:</para> + + <programlisting language="none"><?xml version="1.0" encoding="UTF-8"?> +<book xmlns:xsi="" + xsi:noNamespaceSchemaLocation="book.xsd"> + <title>Introduction to Java</title> + <chapter> + <title>Introduction</title> + <para>Java supports <emphasis>lots</emphasis> of concepts:</para> + <itemizedlist> + <listitem> + <para>Single <emphasis>implementation</emphasis> inheritance.</para> + </listitem> + <listitem> + <para>Multiple <emphasis>interface</emphasis> inheritance.</para> + <itemizedlist> + <listitem><para>Built in types</para></listitem> + <listitem><para>User defined types</para></listitem> + </itemizedlist> + </listitem> + </itemizedlist> + </chapter> +</book></programlisting> + </question> + + <answer> + <para>An extended schema may look like this:</para> + + <figure xml:id="paraListEmphasize"> + <title>Version 2 of book.xsd</title> + + <programlisting language="none"><?xml version="1.0" encoding="UTF-8"?> +<xs:schema xmlns:xs="" + xmlns:vc="" elementFormDefault="qualified" + vc:minVersion="1.0" vc:maxVersion="1.1"> + + <xs:import namespace="" schemaLocation="" /> + + + <xs:include schemaLocation="table.xsd"/> + + <!-- Type definitions --> + <xs:simpleType name="languageType"> + <xs:restriction base="xs:string"> + <xs:enumeration value="en"/> + <xs:enumeration value="fr"/> + <xs:enumeration value="de"/> + <xs:enumeration value="it"/> + <xs:enumeration value="es"/> + </xs:restriction> + </xs:simpleType> + + + <!-- Elements having no inner structure --> + <xs:element name="emphasis" type="xs:string"/> + <xs:element name="title" type="xs:string"/> + <xs:element name="link"> + <xs:complexType mixed="true"> + <xs:attribute name="linkend" type="xs:IDREF" use="required"/> + </xs:complexType> + </xs:element> + + <!-- Starting the game ...
--> + <xs:element name="book"> + <xs:complexType> + <xs:sequence> + <xs:element ref="title"/> + <xs:element ref="chapter" minOccurs="1" maxOccurs="unbounded"/> + </xs:sequence> + <xs:attribute name="lang" type="languageType" use="optional"/> + </xs:complexType> + </xs:element> + + <xs:element name="chapter"> + <xs:complexType> + <xs:sequence> <co xml:id="figure_book.dtd_v2_chapter"/> + <xs:element ref="title"/> + <xs:choice minOccurs="1" maxOccurs="unbounded"> + <xs:element ref="para"/> + <xs:element ref="itemizedlist"/> + <xs:element ref="table"/> + </xs:choice> + </xs:sequence> + <xs:attribute name="id" type="xs:ID" use="optional"/> + <xs:attribute ref="xml:base"/> <!-- This allows for <xi:include ...> --> + </xs:complexType> + </xs:element> + + <xs:element name="para"> + <xs:complexType mixed="true"> <co + xml:id="figure_book.dtd_v2_para"/> + <xs:choice minOccurs="0" maxOccurs="unbounded"> + <xs:element ref="emphasis"/> + <xs:element ref="link"/> + </xs:choice> + <xs:attribute name="id" type="xs:ID" use="optional"/> + </xs:complexType> + </xs:element> + + <xs:element name="itemizedlist"> + <xs:complexType> + <xs:sequence> + <xs:element ref="listitem" minOccurs="1" <co + xml:id="figure_book.dtd_v2_itemizedlist"/> maxOccurs="unbounded"/> + </xs:sequence> + <xs:attribute name="id" type="xs:ID" use="optional"/> + </xs:complexType> + </xs:element> + + <xs:element name="listitem"> + <xs:complexType> + <xs:choice minOccurs="1" maxOccurs="unbounded"> <co + xml:id="figure_book.dtd_v2_listitem"/> + <xs:element ref="para"/> + <xs:element ref="itemizedlist"/> + </xs:choice> + </xs:complexType> + </xs:element> + +</xs:schema></programlisting> + + <caption> + <para>This allows emphasized text in <tag + class="starttag">para</tag> nodes and supports <tag + class="starttag">itemizedlist</tag> elements.</para> + </caption> + </figure> + + <calloutlist> + <callout arearefs="figure_book.dtd_v2_chapter"> + <para>We hook into <tag class="starttag">chapter</tag> + to allow arbitrary sequences of at
least one <tag + class="starttag">para</tag> or <tag + class="starttag">itemizedlist</tag> element node.</para> + </callout> + + <callout arearefs="figure_book.dtd_v2_para"> + <para><tag class="starttag">para</tag> nodes now allow + mixed content.</para> + </callout> + + <callout arearefs="figure_book.dtd_v2_itemizedlist"> + <para>An <tag class="starttag">itemizedlist</tag> + contains at least one list item.</para> + </callout> + + <callout arearefs="figure_book.dtd_v2_listitem"> + <para>A <tag class="starttag">listitem</tag> contains a + sequence of at least one <tag + class="starttag">para</tag> or <tag + class="starttag">itemizedlist</tag> child node. The + latter gives rise to nested lists. We find a similar + construct in HTML, namely unordered lists defined by + <code><UL><LI>...</code> constructs.</para> + </callout> + </calloutlist> + </answer> + </qandaentry> + </qandadiv> + </qandaset> + </section> + + <section xml:id="sectQandaBookLang"> + <title>Allow different languages</title> + + <qandaset defaultlabel="qanda" xml:id="example_book.dtd_v3"> + <title>book.xsd and languages</title> + + <qandadiv> + <qandaentry> + <question> + <para>We want to extend our schema from <xref + linkend="example_book_v2"/> by allowing an author to define + the language to be used within the whole document or parts + of it. Add an attribute <code>lang</code> to + all relevant elements, for example <tag class="starttag">para + lang="es"</tag>.
An XML editor may use this attribute to + activate corresponding dictionaries for spell + checking.</para> + + <para>The <code>lang</code> attribute shall be restricted to + the following values:</para> + + <itemizedlist> + <listitem> + <para><token>en</token></para> + </listitem> + + <listitem> + <para><token>fr</token></para> + </listitem> + + <listitem> + <para><token>de</token></para> + </listitem> + + <listitem> + <para><token>it</token></para> + </listitem> + + <listitem> + <para><token>es</token></para> + </listitem> + </itemizedlist> + </question> + + <answer> + <para>We define a suitable <tag + class="starttag">xs:attribute</tag> declaration:</para> + + <programlisting language="none"><xs:attribute <emphasis + role="bold">name="lang"</emphasis>> + <xs:simpleType> + <xs:restriction base="xs:string"> + <xs:enumeration value="en"/> + <xs:enumeration value="fr"/> + <xs:enumeration value="de"/> + <xs:enumeration value="it"/> + <xs:enumeration value="es"/> + </xs:restriction> + </xs:simpleType> +</xs:attribute></programlisting> + + <para>Then we add this attribute to elements such as <tag + class="starttag">chapter</tag>:</para> + + <programlisting language="none"> <xs:element name="chapter"> + <xs:complexType> + <xs:sequence> ... </xs:sequence> + <xs:attribute <emphasis role="bold">ref="lang"</emphasis> use="optional"/> + ... + </xs:complexType> + </xs:element></programlisting> + + <para>This allows us to set a language at an arbitrary + hierarchy level. Of course we may define it at top level + as well:</para> + + <programlisting language="none"><?xml version="1.0" encoding="UTF-8"?> +<book ...
lang="en"> + <title>Introduction to Java</title> +...</programlisting> + </answer> + </qandaentry> + </qandadiv> + </qandaset> + </section> + + <section xml:id="sectMixQuotes"> + <title>Mixing attribute quotes</title> + + <qandaset defaultlabel="qanda" xml:id="example_quotes"> + <title>Single and double quotes reconsidered</title> + + <qandadiv> + <qandaentry> + <question> + <para>We recall the problem of nested quotes yielding + XML code that is not well-formed:</para> + + <programlisting language="none"><img src="bold.gif" alt="We may use "quotes" here" /></programlisting> + + <para>The XML specification defines the syntax of literals, + including attribute values, as:</para> + + <productionset> + <title><link + xlink:href="">Literals</link></title> + + <production xml:id="w3RecXml_NT-EntityValue"> + <lhs>EntityValue</lhs> + + <rhs>'"' ([^%&"] | <nonterminal + def="#w3RecXml_NT-PEReference">PEReference</nonterminal> + | <nonterminal + def="#w3RecXml_NT-Reference">Reference</nonterminal>)* + '"' | "'" ([^%&'] | <nonterminal + def="#w3RecXml_NT-PEReference">PEReference</nonterminal> + | <nonterminal + def="#w3RecXml_NT-Reference">Reference</nonterminal>)* + "'"</rhs> + </production> + + <production xml:id="w3RecXml_NT-AttValue"> + <lhs>AttValue</lhs> + + <rhs>'"' ([^<&"] | <nonterminal + def="#w3RecXml_NT-Reference">Reference</nonterminal>)* + '"' | "'" ([^<&'] | <nonterminal + def="#w3RecXml_NT-Reference">Reference</nonterminal>)* + "'"</rhs> + </production> + + <production xml:id="w3RecXml_NT-SystemLiteral"> + <lhs>SystemLiteral</lhs> + + <rhs>('"' [^"]* '"') | ("'" [^']* "'")</rhs> + </production> + + <production xml:id="w3RecXml_NT-PubidLiteral"> + <lhs>PubidLiteral</lhs> + + <rhs>'"' <nonterminal + def="#w3RecXml_NT-PubidChar">PubidChar</nonterminal>* + '"' | "'" (<nonterminal + def="#w3RecXml_NT-PubidChar">PubidChar</nonterminal> - + "'")* "'"</rhs> + </production> + + <production xml:id="w3RecXml_NT-PubidChar"> + <lhs>PubidChar</lhs> + + <rhs>#x20 | #xD | #xA | [a-zA-Z0-9] + |
[-'()+,./:=?;!*#@$_%]</rhs> + </production> + </productionset> + + <para>Find out how to set the value of the attribute <tag + class="attribute">alt</tag> to the string <code>We + may use "quotes" here</code>.</para> + </question> + + <answer> + <para>The production rule for attribute values reads:</para> + + <productionset> + <productionrecap linkend="w3RecXml_NT-AttValue"/> + </productionset> + + <para>This allows us to use either of two alternatives to + delimit attribute values:</para> + + <glosslist> + <glossentry> + <glossterm><tag class="starttag">img ... + alt="..."/</tag></glossterm> + + <glossdef> + <para><emphasis>Well-formedness constraint:</emphasis> do not + use a literal <code>"</code> inside the value string.</para> + </glossdef> + </glossentry> + + <glossentry> + <glossterm><tag class="starttag">img ... + alt='...'/</tag></glossterm> + + <glossdef> + <para><emphasis>Well-formedness constraint:</emphasis> do not + use a literal <code>'</code> inside the value string.</para> + </glossdef> + </glossentry> + </glosslist> + + <para>We may take advantage of the second rule:</para> + + <programlisting language="none"><img src="bold.gif" alt='We may use "quotes" here' /></programlisting> + + <para>Notice that according to <xref + linkend="w3RecXml_NT-AttValue"/> the delimiting quotes must + not be mixed.
The following code is thus not well + formed:</para> + + <programlisting language="none"><img src="bold.gif'/></programlisting> + </answer> + </qandaentry> + </qandadiv> + </qandaset> + </section> + + <section xml:id="qandasetInternalRef"> + <title>Internal references</title> + + <qandaset defaultlabel="qanda" xml:id="example_book.dtd_v5"> + <title>book.xsd and internal references</title> + + <qandadiv> + <qandaentry> + <question> + <para>We want to extend <xref + linkend="example_book.dtd_v3"/> schema to allow for document + internal references by:</para> + + <itemizedlist> + <listitem> + <para>Allowing each <tag class="starttag">chapter</tag>, + <tag class="starttag">para</tag> and <tag + class="starttag">itemizedlist</tag> to become reference + targets.</para> + </listitem> + + <listitem> + <para>Extending the element <tag + class="element">para</tag>'s mixed content model by a + new element <tag class="element">link</tag> with an + attribute <tag class="attribute">linkend</tag> being a + reference to a target.</para> + </listitem> + </itemizedlist> + </question> + + <answer> + <para>We extend our schema:</para> + + <programlisting language="none"><?xml version="1.0" encoding="UTF-8"?> +<xs:schema xmlns:xs="" + xmlns:vc="" elementFormDefault="qualified" + vc:minVersion="1.0" vc:maxVersion="1.1"> + + <xs:import namespace="" schemaLocation="" /> + + + <xs:include schemaLocation="table.xsd"/> + + <!-- Type definitions --> + + <xs:attribute name="lang"> + <xs:simpleType> + <xs:restriction base="xs:string"> + <xs:enumeration value="en"/> + <xs:enumeration value="fr"/> + <xs:enumeration value="de"/> + <xs:enumeration value="it"/> + <xs:enumeration value="es"/> + </xs:restriction> + </xs:simpleType> + </xs:attribute> + + <!-- Elements having no inner structure --> + <xs:element name="emphasis" type="xs:string"/> + <xs:element name="title" type="xs:string"/> + <xs:element name="link"> + <xs:complexType mixed="true"> <co + xml:id="progamlisting_book_v5_link"/> + 
<xs:attribute name="linkend" <co + xml:id="progamlisting_book_v5_link_linkend"/> type="xs:IDREF" use="required"/> + </xs:complexType> + </xs:element> + + <!-- Starting the game ... --> + <xs:element name="book"> + <xs:complexType> + <xs:sequence> + <xs:element ref="title"/> + <xs:element ref="chapter" minOccurs="1" maxOccurs="unbounded"/> + </xs:sequence> + <xs:attribute ref="lang" use="optional"/> + </xs:complexType> + </xs:element> + + <xs:element name="chapter"> + <xs:complexType> + <xs:sequence> + <xs:element ref="title"/> + <xs:choice minOccurs="1" maxOccurs="unbounded"> + <xs:element ref="para"/> + <xs:element ref="itemizedlist"/> + <xs:element ref="table"/> + </xs:choice> + </xs:sequence> + <xs:attribute ref="lang" use="optional"/> + <xs:attribute name="id" <co + xml:id="progamlisting_book_v5_chapter_id"/> type="xs:ID" use="optional"/> + <xs:attribute ref="xml:base"/> <!-- This allows for <xi:include ...> --> + </xs:complexType> + </xs:element> + + <xs:element name="para"> + <xs:complexType mixed="true"> <co + xml:id="progamlisting_book_v5_mixed_link"/> + <xs:choice minOccurs="0" maxOccurs="unbounded"> + <xs:element ref="emphasis"/> + <xs:element ref="link"/> + </xs:choice> + <xs:attribute ref="lang" use="optional"/> + <xs:attribute name="id" <co + xml:id="progamlisting_book_v5_para_id"/> type="xs:ID" use="optional"/> + </xs:complexType> + </xs:element> + + <xs:element name="itemizedlist"> + <xs:complexType> + <xs:sequence> + <xs:element ref="listitem" minOccurs="1" maxOccurs="unbounded"/> + </xs:sequence> + <xs:attribute ref="lang" use="optional"/> + <xs:attribute name="id" type="xs:ID" use="optional"/> + </xs:complexType> + </xs:element> + + <xs:element name="listitem"> + <xs:complexType> + <xs:choice minOccurs="1" maxOccurs="unbounded"> + <xs:element ref="para"/> + <xs:element ref="itemizedlist"/> + </xs:choice> + <xs:attribute ref="lang" use="optional"/> + </xs:complexType> + </xs:element> + +</xs:schema></programlisting> + + <calloutlist> + <callout 
arearefs="progamlisting_book_v5_chapter_id"> + <para>We define an attribute <tag + class="attribute">id</tag> of type <code>ID</code> for + the elements <tag class="element">chapter</tag>, <tag + class="element">para</tag> and <tag + class="element">itemizedlist</tag>. This enables an + author to define internal reference targets.</para> + </callout> + + <callout arearefs="progamlisting_book_v5_mixed_link"> + <para>A link is part of the element <tag + class="element">para</tag>'s mixed content model. Thus + an author may define internal references along with + ordinary text.</para> + </callout> + + <callout arearefs="progamlisting_book_v5_link"> + <para>As in HTML, a link may contain text. When converted + to HTML it is expected to be rendered as a hypertext + link.</para> + </callout> + + <callout arearefs="progamlisting_book_v5_link_linkend"> + <para>The attribute <tag class="attribute">linkend</tag> + holds the reference to an internal target being either a + <tag class="element">chapter</tag>, a <tag + class="element">para</tag> or an <tag + class="element">itemizedlist</tag>.</para> + </callout> + </calloutlist> + </answer> + </qandaentry> + </qandadiv> + </qandaset> + </section> + </section> + </section> + </chapter> + diff --git a/Sda1/xslt.xml b/Sda1/xslt.xml new file mode 100644 index 000000000..e0c7a520d --- /dev/null +++ b/Sda1/xslt.xml @@ -0,0 +1,2253 @@ + <chapter xml:id="xsl" version="5.0" xmlns="" + xmlns:xlink="" + xmlns:xi="" + xmlns:svg="" + xmlns:m="" + xmlns:html="" + xmlns:db=""> + + + <title>The Extensible Stylesheet Language XSL</title> + + <para>XSL is a <link xlink:href="">W3C + standard</link> which defines a language to transform XML documents into + the following output formats:</para> + + <itemizedlist> + <listitem> + <para>Ordinary text, e.g. in <link + xlink:href="">Unicode</link> encoding.</para> + </listitem> + + <listitem> + <para>XML.</para> + </listitem> + + <listitem> + <para>HTML</para> + </listitem> + + <listitem> +
<para>XHTML</para> + </listitem> + </itemizedlist> + + <para>Transforming a source XML document into a target XML document may be + required if:</para> + + <itemizedlist> + <listitem> + <para>The target document expresses similar semantics but uses a + different XML dialect i.e. different tag names.</para> + </listitem> + + <listitem> + <para>The target document is only a view on the source document. We + may for example extract the chapter names from a <tag + class="starttag">book</tag> document to create a table of + contents.</para> + </listitem> + </itemizedlist> + + <section xml:id="xsl_helloworld"> + <title>A <quote>Hello, world</quote> <abbrev + xlink:href="">XSL</abbrev> example</title> + + <para>We start from an extended version of our + <filename>memo.xsd</filename>:</para> + + <programlisting language="none"><xs:schema xmlns:xs="" + xmlns:vc="" elementFormDefault="qualified" + vc:minVersion="1.0" vc:maxVersion="1.1"> + +<xs:element name="memo"> + <xs:complexType> + <xs:sequence> + <xs:element name="from" type="Person"/> + <xs:element name="to" type="Person" minOccurs="1" maxOccurs="unbounded"/> + <xs:element name="subject" type="xs:string"/> + <xs:element ref="content"/> + </xs:sequence> + <xs:attribute name="date" type="xs:date" use="required"/> + <xs:attribute name="priority" type="Priority" use="optional"/> + </xs:complexType> + </xs:element> + + <xs:complexType name="Person"> + <xs:simpleContent> + <xs:extension base="xs:string"> + <xs:attribute name="id" type="xs:ID"/> + </xs:extension> + </xs:simpleContent> + </xs:complexType> + + <xs:element name="content"> + <xs:complexType> + <xs:sequence> + <xs:element ref="para" minOccurs="1" maxOccurs="unbounded"/> + </xs:sequence> + </xs:complexType> + + </xs:element> + + <xs:element name="para"> + <xs:complexType mixed="true"> + <xs:sequence> + <xs:element ref="link" minOccurs="0"/> + </xs:sequence> + </xs:complexType> + </xs:element> + + <xs:element name="link"> + <xs:complexType mixed="true"> + 
<xs:simpleContent> + <xs:extension base="xs:string"> + <xs:attribute name="linkend" type="xs:IDREF"/> + </xs:extension> + </xs:simpleContent> + </xs:complexType> + </xs:element> + + <xs:simpleType name="Priority"> + <xs:restriction base="xs:string"> + <xs:enumeration value="low"/> + <xs:enumeration value="medium"/> + <xs:enumeration value="high"/> + </xs:restriction> + </xs:simpleType> + +</xs:schema></programlisting> + + <para>This schema allows a memo's content to be structured into + paragraphs. A paragraph may contain links either to the sender or to a + recipient.</para> + + <figure xml:id="figure_memoref_instance"> + <title>A memo document instance with an internal reference.</title> + + <programlisting language="none"><memo xmlns:xsi="" + xsi:noNamespaceSchemaLocation="memo.xsd" + date="2014-09-24" priority="high" > + <from <emphasis role="bold">id="goik"</emphasis>>Martin Goik</from> + <to>Adam Hacker</to> + <to id="eve">Eve Intruder</to> + <subject>Firewall problems</subject> + <content> + <para>Thanks for your excellent work.</para> + <para>Our firewall is definitely broken! This bug has been reported by + the <link <emphasis role="bold">linkend="goik"</emphasis>>sender</link>.</para> + </content> +</memo></programlisting> + </figure> + + <para>We want to extract the sender's name from an arbitrary <tag + class="element">memo</tag> document instance. Using <abbrev + xlink:href="">XSL</abbrev> this task can be + accomplished by a script <filename>memo2sender.xsl</filename>:</para> + + <programlisting language="none"><?xml version="1.0" encoding="UTF-8"?> +<xsl:stylesheet xmlns:xsl="" + version="2.0"> + + <xsl:output method="text"/> + + <xsl:template match="/memo"> + <xsl:value-of select="from"/> + </xsl:template> + +</xsl:stylesheet></programlisting> + + <para>Before examining this code more closely we first show its effect. We + need a piece of software called an <abbrev + xlink:href="">XSL</abbrev> processor. It
It + reads both a <tag>memo</tag> document instance and a style sheet and + produces the following output:</para> + + <programlisting language="none"><computeroutput>[goik@mupter Memoref]$ xml2xml message.xml memo2sender.xsl +Martin Goik</computeroutput></programlisting> + + <para>The result is the sender's name <computeroutput>Martin + Goik</computeroutput>. We may sketch the transformation + principle:</para> + + <figure xml:id="figure_xsl_principle"> + <title>An <abbrev + xlink:href="">XSL</abbrev> processor + transforming a XML document into a result using a stylesheet</title> + + <mediaobject> + <imageobject> + <imagedata fileref="Ref/Fig/xslconvert.fig"/> + </imageobject> + </mediaobject> + </figure> + + <para>The executable <filename>xml2xml</filename> defined at the MI + department is actually a script wrapping the <productname + xlink:href="">Saxon XSLT + processor</productname>. We may also use the Eclipse/Oxygen plugin + replacing the shell command by a GUI <link + xlink:href="">as + being described in the corresponding documentation</link>. Next we + closer examine the <abbrev + xlink:href="">XSL</abbrev> example + code:</para> + + <programlisting language="none"><xsl:stylesheet <co + xml:id="programlisting_helloxsl_stylesheet"/> xmlns:xsl <co + xml:id="programlisting_helloxsl_namespace_abbv"/> ="" + version="2.0" <co xml:id="programlisting_helloxsl_xsl_version"/> > + + <xsl:output method="text" <co + xml:id="programlisting_helloxsl_method_text"/>/> + + <xsl:template <co xml:id="programlisting_helloxsl_template"/> match <co + xml:id="programlisting_helloxsl_match"/> ="/memo"> + <xsl:value-of <co xml:id="programlisting_helloxsl_value-of"/> select <co + xml:base="" xml:id="programlisting_helloxsl_valueof_select_att"/> ="from" /> + </xsl:template> + +</xsl:stylesheet></programlisting> + + <calloutlist> + <callout arearefs="programlisting_helloxsl_stylesheet"> + <para>The element stylesheet belongs the the namespace + <code></code>. 
This namespace is + <emphasis>represented</emphasis> by the literal + <literal>xsl</literal>. As an alternative we might also use <tag + class="starttag">stylesheet + xmlns=""</tag> instead of <tag + class="starttag">xsl:stylesheet ...</tag>. The value of the + namespace itself gets defined next.</para> + </callout> + + <callout arearefs="programlisting_helloxsl_namespace_abbv"> + <para>The keyword <code>xmlns</code> is reserved by the <link + xlink:href="">Namespaces in + XML</link> specification. In <quote>pure</quote> XML the whole term + <code>xmlns:xsl</code> would simply define an attribute. In the presence + of a namespace-aware XML parser, however, the literal + <literal>xsl</literal> represents the attribute value <tag + class="attvalue"></tag>. This + value <emphasis>must not</emphasis> be changed! Otherwise an XSL + processor will fail since it cannot distinguish XSL + instructions from other XML elements. After all, an element <tag + class="starttag">stylesheet</tag> belonging to a different namespace + <code>http//</code> may itself have to be + generated as output.</para> + </callout> + + <callout arearefs="programlisting_helloxsl_xsl_version"> + <para>The <link xlink:href="">XSL + standard</link> is still evolving. The version number identifies the + conformance level for the subsequent code.</para> + </callout> + + <callout arearefs="programlisting_helloxsl_method_text"> + <para>The <tag class="attribute">method</tag> attribute in the <link + xlink:href=""><xsl:output></link> + element specifies the type of output to be generated. Depending on + this type we may also define indentation depths and/or encoding.
+ Allowed <tag class="attvalue">method</tag> values are:</para> + + <glosslist> + <glossentry> + <glossterm>text</glossterm> + + <glossdef> + <para>Ordinary text.</para> + </glossdef> + </glossentry> + + <glossentry> + <glossterm>html</glossterm> + + <glossdef> + <para><link + xlink:href="">HTML</link> + markup.</para> + </glossdef> + </glossentry> + + <glossentry> + <glossterm>xhtml</glossterm> + + <glossdef> + <para><link + xlink:href="">XHTML</link> markup + differing from the former by e.g. the closing + <quote>/></quote> in <tag><img + src="..."/></tag>.</para> + </glossdef> + </glossentry> + + <glossentry> + <glossterm>xml</glossterm> + + <glossdef> + <para>XML code. This is most commonly used to create views on + or different dialects of an XML document instance.</para> + </glossdef> + </glossentry> + </glosslist> + </callout> + + <callout arearefs="programlisting_helloxsl_template"> + <para>An <tag class="starttag">xsl:template</tag> defines the output + that will be created for the document nodes defined by a + selector.</para> + </callout> + + <callout arearefs="programlisting_helloxsl_match"> + <para>The attribute <tag class="attribute">match</tag> tells us for + which nodes of a document instance the given <tag + class="starttag">xsl:template</tag> is appropriate. In the given + example the value <code>/memo</code> tells us that the template is + only responsible for <tag class="element">memo</tag> nodes appearing + at top level, i.e. being the root element of the document + instance.</para> + </callout> + + <callout arch="" + arearefs="programlisting_helloxsl_value-of programlisting_helloxsl_valueof_select_att"> + <para>A <tag class="element">value-of</tag> element writes content + to the <abbrev xlink:href="">XSL</abbrev> + processor's output.
In this example the <code>#PCDATA</code> content + from the element <tag class="element">from</tag> will be written to + the output.</para> + </callout> + </calloutlist> + </section> + + <section xml:id="xpath"> + <title><link xlink:href="">XPath</link> and + node sets</title> + + <para>The <acronym + xlink:href="">XPath</acronym> standard allows + us to retrieve node sets from XML documents by predicate based queries. + Thus its role may be compared to <acronym + xlink:href="">SQL</acronym> + <code>SELECT</code> ... <code>FROM</code> ...<code>WHERE</code> queries. + Some simple examples:</para> + + <figure xml:id="fig_Xpath"> + <title>Simple <acronym + xlink:href="">XPath</acronym> + queries</title> + + <mediaobject> + <imageobject> + <imagedata fileref="Ref/Fig/xpath.fig" scale="65"/> + </imageobject> + </mediaobject> + </figure> + + <para>We are now interested in a list of all recipients being defined in + a <tag class="element">memo</tag> element. We introduce the element <tag + class="element">xsl:for-each</tag> which iterates over a result set of + nodes:</para> + + <figure xml:id="programlisting_tolist_xpath"> + <title>Iterating over the list of recipient nodes.</title> + + <programlisting language="none"><?xml version="1.0" encoding="UTF-8"?> + +<xsl:stylesheet xmlns:xsl="" + version="2.0"> + + <xsl:output method="text"/> + + <xsl:template match="/" <co xml:id="programlisting_tolist_match_root"/>> + <xsl:for-each select="memo/to" <co + xml:id="programlisting_tolist_xpath_memo_to"/> > + <xsl:value-of select="." 
<co xml:id="programlisting_tolist_value_of"/> /> + <xsl:text>,</xsl:text> <co + xml:id="programlisting_tolist_xsl_text"/> + </xsl:for-each> + </xsl:template> + +</xsl:stylesheet></programlisting> + </figure> + + <calloutlist> + <callout arearefs="programlisting_tolist_match_root"> + <para>This template matches the XML document instance, + <emphasis>not</emphasis> the visible <tag + class="element"><memo></tag> node.</para> + </callout> + + <callout arearefs="programlisting_tolist_xpath_memo_to"> + <para>The <link xlink:href="">XPath</link> + expression <tag class="attvalue">memo/to</tag> gets evaluated + starting from the invisible top level document node being the + context node. For the given document instance this will define a + result set containing both <tag class="element"><to></tag> + recipient nodes, see <xref + linkend="figure_memo_xpath_memo_to"/>.</para> + </callout> + + <callout arearefs="programlisting_tolist_value_of"> + <para>The dot <quote>.</quote> represents the <code>#PCDATA</code> + content of the current <tag class="element">to</tag> element.</para> + </callout> + + <callout arearefs="programlisting_tolist_xsl_text"> + <para>A comma is appended. This is not quite correct since it should + be absent for the last element.</para> + </callout> + </calloutlist> + + <figure xml:id="figure_recipientlist_trailing_comma"> + <title>A list of recipients.</title> + + <para>The <abbrev + xlink:href="">XSL</abbrev> presented before + yields:</para> + + <programlisting language="none"><computeroutput>Adam Hacker,Eve Intruder</computeroutput><emphasis + role="bold">,</emphasis></programlisting> + </figure> + + <para>Right now we do not bother about the trailing <quote>,</quote> + after the last recipient. The surrounding + <code><xsl:text></code>,<code></xsl:text></code> elements + <emphasis>may</emphasis> be omitted. We encourage the reader to leave + them in place since they increase readability when a template's body + gets more complex. 
The element <tag class="starttag">xsl:text</tag> is + used to append static text to the output. This way we append a separator + after each recipient. We now discuss the role of the two attributes <tag + class="attribute">match="/"</tag> and <tag + class="attribute">select="memo/to"</tag>. Both are examples of so-called + <link xlink:href="">XPath</link> expressions. + They allow us to define <emphasis>node sets</emphasis>, i.e. subsets of + the set of all nodes of a given document instance.</para> + + <para>Conceptually <link + xlink:href="">XPath</link> expressions may be + compared to the <acronym + xlink:href="">SQL</acronym> language, the + latter allowing the retrieval of data<emphasis>sets</emphasis> from a + relational database. We illustrate the current example by a + figure:</para> + + <figure xml:id="figure_memo_xpath_memo_to"> + <title>Selecting node sets from <tag class="element">memo</tag> + document instances</title> + + <mediaobject> + <imageobject> + <imagedata fileref="Ref/Fig/memoxpath.fig"/> + </imageobject> + </mediaobject> + </figure> + + <para>This figure needs some explanation. We observe an additional node + <quote>above</quote> <tag class="starttag">memo</tag> being represented + as <quote>filled</quote>. This node represents the document instance as + a whole and has <tag>memo</tag> as its only child. We will + rediscover this additional root node when we discuss the <abbrev + xlink:href="">DOM</abbrev> + application programming interface.</para> + + <para>As already mentioned the expression <code>memo/to</code> evaluates + to a <emphasis>set</emphasis> of nodes. In our example this set consists + of two nodes of type <tag class="starttag">to</tag> each of them + representing a recipient of the memo. 
We observe a subtle difference + between the two <abbrev + xlink:href="">XPath</abbrev> + expressions:</para> + + <glosslist> + <glossentry> + <glossterm><code>match="/"</code></glossterm> + + <glossdef> + <para>The expression starts with and actually consists of the string + <quote>/</quote>. Thus it can be called an + <emphasis>absolute</emphasis> <abbrev + xlink:href="">XPath</abbrev> expression. + Like a file specification <filename>C:\dos\myprog.exe</filename> + it starts at top level and needs no further context information to + get evaluated.</para> + + <para>An <abbrev + xlink:href="">XSL</abbrev> style sheet + <emphasis>must</emphasis> have an <link + xlink:href="">initial + context node</link> to start the transformation. This is achieved + by providing exactly one <tag class="starttag">xsl:template</tag> + with an absolute <abbrev + xlink:href="">XPath</abbrev> value for + its <tag class="attribute">match</tag> attribute like <tag + class="attvalue">/memo</tag>.<emphasis/></para> + </glossdef> + </glossentry> + + <glossentry> + <glossterm><code>select="memo/to"</code></glossterm> + + <glossdef> + <para>This expression can be compared to a + <emphasis>relative</emphasis> file path specification such as + <filename>../images/hdm.gif</filename>. We need to add the base + (context) directory in order for a relative file specification to + become meaningful. If the base directory is + <filename>/home/goik/xml</filename> then this + <emphasis>relative</emphasis> file specification will address the + file <filename>/home/goik/images/hdm.gif</filename>.</para> + + <para>Likewise we have to define a <emphasis>context</emphasis> + node if we want to evaluate a relative <abbrev + xlink:href="">XPath</abbrev> expression. + In our example this is the root node. 
The XSL specification + introduces the term <link + xlink:href="">evaluation + context</link> for this purpose.</para> + </glossdef> + </glossentry> + </glosslist> + + <para>In order to explain relative <abbrev + xlink:href="">XPath</abbrev> expressions we + consider <code>content/para</code> starting from the (unique!) <tag + class="element">memo</tag> node:</para> + + <figure xml:id="memoXpathPara"> + <title>The node set represented by <code>content/para</code> starting + at the context node <tag class="starttag">memo</tag>.</title> + + <mediaobject> + <imageobject> + <imagedata fileref="Ref/Fig/memorelativexpath.fig"/> + </imageobject> + + <caption> + <para>The dashed lines represent the relative <abbrev + xlink:href="">XPath</abbrev> expressions + starting from the context node to each of the nodes in the result + set.</para> + </caption> + </mediaobject> + </figure> + </section> + + <section xml:id="xsl_important_elements"> + <title>Some important <abbrev + xlink:href="">XSL</abbrev> elements</title> + + <section xml:id="xsl_if"> + <title><tag class="starttag">xsl:if</tag></title> + + <para>Sometimes we need conditional processing rules. We might want + to create a list of the sender and those recipients having a defined value for the + attribute <tag class="attribute">id</tag>. In the <link + linkend="figure_memoref_instance">given example</link> this only + holds for the (unique) sender and the recipient <code><to + id="eve">Eve Intruder</to></code>. We assume this set of + persons shall be inserted into a relational database table + <code>Customer</code> consisting of two <code>NOT NULL</code> columns + <code>id</code> and <code>name</code>. 
Thus both attributes + <emphasis>must</emphasis> be specified and we must exclude <tag + class="starttag">from</tag> or <tag class="starttag">to</tag> nodes + with undefined <tag class="attribute">id</tag> attributes:</para> + + <figure xml:id="programlisting_memo_export_sql"> + <title>Exporting SQL statements.</title> + + <programlisting language="none">... +<xsl:variable name="newline" <co xml:id="programlisting_xsl_if_definevar"/>> <!-- A newline \n --> + <xsl:text> +</xsl:text> +</xsl:variable> + +<xsl:template match="/memo"> + <xsl:for-each select="from|to" <co xml:id="programlisting_xsl_if_foreach"/>> + <xsl:if <emphasis role="bold">test="@id"</emphasis> <co + xml:id="programlisting_xsl_if_test"/>> + <xsl:text>INSERT INTO Customer (id, name) VALUES ('</xsl:text> + <xsl:value-of select="@id" <co + xml:id="programlisting_xsl_if_select_idattrib"/>/> + <xsl:text>', '</xsl:text> + <xsl:value-of select="." <co + xml:id="programlisting_xsl_if_selectcontent"/>/> + <xsl:text>')</xsl:text> + <xsl:value-of select="$newline" <co + xml:id="programlisting_xsl_if_usevar"/>/> + </xsl:if> + </xsl:for-each> +</xsl:template></programlisting> + + <caption> + <para>We want to export data from XML documents to a database + server. For this purpose INSERT statements are being crafted from + a XML document containing relevant data.</para> + </caption> + </figure> + + <calloutlist> + <callout arearefs="programlisting_xsl_if_definevar"> + <para>Define a file local variable <code>newline</code>. Dealing + with text output frequently requires the insertion of newlines. 
+ Due to the syntax of the <tag class="element">xsl:text</tag> + elements this tends to clutter the code.</para> + </callout> + + <callout arearefs="programlisting_xsl_if_foreach"> + <para>Iterate over the set of the sender node and all recipient + nodes.</para> + </callout> + + <callout arearefs="programlisting_xsl_if_test"> + <para>The attribute value of <tag class="attribute">test</tag> + will be <link + xlink:href="">evaluated</link> + as a boolean. In this example it evaluates to <code>true</code> + if and only if the attribute <tag class="attribute">id</tag> is defined for + the context node. Since we are inside the <tag + class="element">xsl:for-each</tag> block all context nodes are + either of type <tag class="starttag">from</tag> or <tag + class="starttag">to</tag> and thus <emphasis>may</emphasis> have + an <tag class="attribute">id</tag> attribute.</para> + </callout> + + <callout arearefs="programlisting_xsl_if_select_idattrib"> + <para>The <tag class="attribute">id</tag> attribute's value is + copied to the output. The <quote>@</quote> character in + <code>select="@id"</code> tells the <abbrev + xlink:href="">XSL</abbrev> processor to + read the value of an <emphasis>attribute</emphasis> with name <tag + class="attribute">id</tag> rather than the content of a nested + sub<emphasis>element</emphasis> as in <code><to + id="foo"><id>I am + nested!</id></to></code>.</para> + </callout> + + <callout arearefs="programlisting_xsl_if_selectcontent"> + <para>As stated earlier the dot <quote>.</quote> denotes the + current context element. 
In this example simply the + <code>#PCDATA</code> content is copied to the output.</para> + </callout> + + <callout arearefs="programlisting_xsl_if_usevar"> + <para>The <quote>$</quote> sign in front of <code>newline</code> + tells the <abbrev + xlink:href="">XSL</abbrev> processor to + access the variable <varname>newline</varname> previously defined + in <coref linkend="programlisting_xsl_if_definevar"/> rather than + interpreting it as the name of a sub element or an + attribute.</para> + </callout> + </calloutlist> + + <para>As expected the recipient entry <quote>Adam Hacker</quote> does + not appear because no <tag class="attribute">id</tag> + attribute is defined in its <tag class="starttag">to</tag> + element:</para> + + <programlisting language="none"><computeroutput>INSERT INTO Customer (id, name) VALUES ('goik', 'Martin Goik') +INSERT INTO Customer (id, name) VALUES ('eve', 'Eve Intruder')</computeroutput></programlisting> + + <qandaset defaultlabel="qanda" xml:id="example_position_last"> + <title>The XPath functions position() and last()</title> + + <qandadiv> + <qandaentry> + <question> + <para>We return to our recipient list in <xref + linkend="figure_recipientlist_trailing_comma"/>. We are + interested in a list of recipients avoiding the trailing + comma:</para> + + <programlisting language="none"><computeroutput>Adam Hacker,Eve Intruder</computeroutput></programlisting> + + <para>We may use an <tag class="element">xsl:if</tag> to insert + a comma for all but the very last recipient node. This can be + achieved by using the <abbrev + xlink:href="">XSL</abbrev> + functions <link + xlink:href="">position()</link> + and <link + xlink:href="">last()</link>. + Hint: The comparison operator <quote><</quote> may be used + in <abbrev + xlink:href="">XSL</abbrev> to + compare two integer numbers. 
However it must be escaped as + <code>&lt;</code> in order to be XML compatible.</para> + </question> + + <answer> + <para>We have to omit the comma after the last node of the + recipient list. If we have e.g. 10 recipients the function + <code>position()</code> will return integer values + starting at 1 and ending with 10. So for the last node the + comparison <code>10 < 10</code> will evaluate to + false:</para> + + <programlisting language="none"><xsl:for-each select="memo/to"> + <xsl:value-of select="."/> + <xsl:if test="position() &lt; last()"> + <xsl:text>,</xsl:text> + </xsl:if> +</xsl:for-each></programlisting> + </answer> + </qandaentry> + + <qandaentry xml:id="example_avoid_xsl_if"> + <question> + <label>Avoiding xsl:if</label> + + <para>In <xref linkend="programlisting_memo_export_sql"/> we + used the <abbrev + xlink:href="">XPath</abbrev> value + <quote>from|to</quote> to select the desired sender and + recipient nodes. Inside the <tag + class="element">xsl:for-each</tag> block we permitted only + those nodes which have an <tag class="attribute">id</tag> + attribute. 
These two steps may be combined into a single + <abbrev xlink:href="">XPath</abbrev> + expression making the <tag + class="element">xsl:if</tag> obsolete.</para> + </question> + + <answer> + <para>We simply need a modified <abbrev + xlink:href="">XPath</abbrev> expression in the + <tag class="element">for-each</tag>:</para> + + <programlisting language="none"><xsl:for-each select="<emphasis + role="bold">from[@id]|to[@id]</emphasis>"> + <xsl:text>INSERT INTO Customer (id, name) VALUES ('</xsl:text> + <xsl:value-of select="@id"/> + <xsl:text>', '</xsl:text> + <xsl:value-of select="."/> + <xsl:text>')</xsl:text> + <xsl:value-of select="$newline"/> +</xsl:for-each></programlisting> + </answer> + </qandaentry> + </qandadiv> + </qandaset> + </section> + + <section xml:id="xsl_apply_templates"> + <title><tag class="starttag">xsl:apply-templates</tag></title> + + <para>We already used <tag class="element">xsl:for-each</tag> to + iterate over a list of element nodes. <abbrev + xlink:href="">XSL</abbrev> offers a + different possibility for this purpose. The idea is to define the + formatting rules at a centralized location. So the solution to <xref + linkend="example_position_last"/> may be rewritten in an equivalent way:</para> + + <programlisting language="none"><xsl:template match="/"> + <xsl:apply-templates select="memo/to" <co + xml:id="programlisting_apply_templates_apply"/>/> +</xsl:template> + +<xsl:template match="to" <co xml:id="programlisting_apply_templates_match"/>> + <xsl:value-of select="."/> + <xsl:if test="<emphasis role="bold">position()</emphasis> &lt; <emphasis + role="bold">last()</emphasis>"> + <xsl:text>,</xsl:text> + </xsl:if> +</xsl:template></programlisting> + + <calloutlist> + <callout arearefs="programlisting_apply_templates_apply"> + <para>Definition of the recipient node list. 
Each element of this + list shall be processed further.</para> + </callout> + + <callout arearefs="programlisting_apply_templates_match"> + <para>This template <emphasis>may</emphasis> be used by an XSL + processor to format nodes of type <tag class="starttag">to</tag>. + Since the processor is asked to do exactly this in <xref + linkend="programlisting_apply_templates_apply"/> the current + template will <emphasis>really</emphasis> be used in this + example.</para> + </callout> + </calloutlist> + + <para>The procedure outlined above may have the following + advantages:</para> + + <itemizedlist> + <listitem> + <para>Some elements may appear at different places of a given + document hierarchy. For example a <tag + class="starttag">title</tag> element is likely to appear as a + child of chapters, sections, tables, figures and so on. It may be + sufficient to define a single template with a + <code>match="title"</code> attribute which contains all + required rules.</para> + </listitem> + + <listitem> + <para>Sometimes the body of a <tag + class="starttag">xsl:for-each</tag> ... <tag + class="endtag">xsl:for-each</tag> spans multiple screens thus + limiting code readability. Factoring out the body into a template + may avoid this obstacle.</para> + </listitem> + </itemizedlist> + + <para>This method is well known from programming languages: If the + code inside a loop is needed multiple times or reaches a painful line + count <emphasis>good</emphasis> programmers tend to define a separate + method. For example:</para> + + <programlisting language="none">for (int i = 0; i < 10; i++){ + if (a[i] < b[i]){ + max[i] = b[i]; + } else { + max[i] = a[i]; + } + ... +}</programlisting> + + <para>Inside the loop's body the maximum of two + values gets computed. This may be needed at several locations and + thus it is convenient to centralize this code into a method:</para> + + <programlisting language="none">// cf. 
<xsl:template match="..."> +static int maximum(int a, int b){ + if (a < b){ + return b; + } else { + return a; + } +} +... +// cf. <xsl:apply-templates select="..."/> +for (int i = 0; i < 10; i++){ + max[i] = maximum(a[i], b[i]); +}</programlisting> + + <para>So far calling a static method in <xref + linkend="glo_Java"/> may be + compared to a <tag class="starttag">xsl:apply-templates</tag>. There + is however one big difference. In <abbrev + xlink:href="">XSL</abbrev> the + <quote>method</quote> being called may not exist at all. A <tag + class="starttag">xsl:apply-templates</tag> instructs a processor to + format a set of nodes. It does not contain information about any rules + being defined to do this job:</para> + + <programlisting language="none"><xsl:stylesheet xmlns:xsl="" + version="2.0"> + + <xsl:output method="text"/> + + <xsl:template match="/memo"> + <xsl:apply-templates <emphasis role="bold">select="content"</emphasis>/> + </xsl:template> + +</xsl:stylesheet></programlisting> + + <para>Since no suitable template supplying rules for <tag + class="starttag">content</tag> nodes exists a <abbrev + xlink:href="">XSL</abbrev> processor uses a + default formatting rule instead:</para> + + <programlisting language="none"><computeroutput>Thanks for your excellent work.Our firewall is definitely +broken! This bug has been reported by the sender.</computeroutput></programlisting> + + <para>We observe that the <code>#PCDATA</code> content strings of the + element itself and all (recursive) sub elements get glued together + into one string. In most cases this is definitely not intended. + Omitting a necessary template is usually a programming error. 
It is + thus good programming practice during style sheet development to + define a special template catching forgotten rules:</para> + + <programlisting language="none"><xsl:template match="/memo"> + <xsl:apply-templates select="content"/> +</xsl:template> + +<xsl:template match="*"> + <xsl:message> + <xsl:text>Error: No template defined matching element '</xsl:text> + <xsl:value-of select="name(.)"/> + <xsl:text>'</xsl:text> + </xsl:message> +</xsl:template></programlisting> + + <para>The <quote>*</quote> matches any element if there is no <link + xlink:href="">better + matching</link> rule defined. Since we did not supply any template for + <tag class="starttag">content</tag> nodes at all this default template + will match nodes of type <tag class="starttag">content</tag>. The + function <code>name()</code> is predefined in <abbrev + xlink:href="">XSL</abbrev> and returns the + element type name of a node. During the formatting process we will now + see the following warning message:</para> + + <programlisting language="none"><computeroutput>Error: No template defined matching element 'content'</computeroutput></programlisting> + + <para>We note that for document nodes <tag + class="starttag">xyz</tag><code>foo</code><tag + class="endtag">xyz</tag> containing only <code>#PCDATA</code> a simple + <tag class="emptytag">xsl:apply-templates select="xyz"</tag> is + sufficient: A <abbrev + xlink:href="">XSL</abbrev> processor uses + its default rule and copies the node's content <code>foo</code> to its + output.</para> + + <qandaset defaultlabel="qanda" xml:id="example_rdbms_person"> + <title>Extending the export to a RDBMS</title> + + <qandadiv> + <qandaentry> + <question> + <para>We assume that our RDBMS table <code>Customer</code> + from <xref linkend="programlisting_memo_export_sql"/> shall be + replaced by a table <code>Person</code>. We expect the senders + of memo documents to be employees of a given company. 
+ Conversely the recipients of memos are expected to be + customers. Our <code>Person</code> table shall have a + <quote>tag</quote> like column named <code>type</code> having + exactly two allowed values <code>customer</code> or + <code>employee</code> being controlled by a <code>CHECK</code> + constraint, see <xref linkend="table_person"/>. Create a style + sheet generating the necessary SQL statements from a memo + document instance. Hint: Define two different templates for + <tag class="starttag">from</tag> and <tag + class="starttag">to</tag> nodes.</para> + </question> + + <answer> + <para>We define two templates differing only in the static + string value for a person's type. The relevant <abbrev + xlink:href="">XSL</abbrev> portion + reads:<programlisting language="none"><xsl:template match="/memo"> + <xsl:apply-templates select="from|to"/> +</xsl:template> + +<xsl:template match="from"> + <xsl:text>INSERT INTO Person (name, type) VALUES('</xsl:text> + <xsl:value-of select="."/> + <xsl:text>', <emphasis role="bold">'employee'</emphasis>)</xsl:text> + <xsl:value-of select="$newline"/> +</xsl:template> + + <xsl:template match="to"> + <xsl:text>INSERT INTO Person (name, type) VALUES('</xsl:text> + <xsl:value-of select="."/> + <xsl:text>', <emphasis role="bold">'customer'</emphasis>)</xsl:text> + <xsl:value-of select="$newline"/> +</xsl:template></programlisting></para> + </answer> + </qandaentry> + </qandadiv> + </qandaset> + + <table xml:id="table_person"> + <title>The Person table</title> + + <?dbhtml table-width="30%" ?> + + <?dbfo table-width="40%" ?> + + <tgroup cols="2"> + <colspec colwidth="3*"/> + + <colspec colwidth="2*"/> + + <thead> + <row> + <entry>name</entry> + + <entry>type</entry> + </row> + </thead> + + <tbody> + <row> + <entry>Martin Goik</entry> + + <entry>employee</entry> + </row> + + <row> + <entry>Adam Hacker</entry> + + <entry>customer</entry> + </row> + + <row> + <entry>Eve intruder</entry> + + <entry>customer</entry> + </row> + 
</tbody> + </tgroup> + </table> + </section> + + <section xml:id="xsl_choose"> + <title><tag class="starttag">xsl:choose</tag></title> + + <para>We already described the <tag class="starttag">xsl:if</tag> + which can be compared to an <code>if(..){...}</code> statement in many + programming languages. The <tag class="starttag">xsl:choose</tag> + element can be compared to multiple <code>else if</code> conditions + including an optional final <code>else</code> block being reached if + all boolean tests fail:</para> + + <programlisting language="none">if (condition a){ +...//block a +} else if (condition b){ +... //block b +} ... +... +else { + ... //code being reached when all conditions evaluate to false +}</programlisting> + + <para>We want to generate a list of memo recipient names numbered with Roman + numerals up to 10. Higher numbers shall be displayed in + ordinary decimal notation:</para> + + <programlisting language="none"><computeroutput>I:Adam Hacker +II:Eve Intruder +III: ... +IV: ... 
+...</computeroutput></programlisting> + + <para>Though <abbrev + xlink:href="">XSL</abbrev> offers <link + xlink:href="">a better way</link> + we may generate these number literals by:</para> + + <programlisting language="none"><xsl:template match="/memo"> + <xsl:apply-templates select="to"/> +</xsl:template> + +<xsl:template match="to"> + <xsl:choose> + <xsl:when test="1 = position()">I</xsl:when> + <xsl:when test="2 = position()">II</xsl:when> + <xsl:when test="3 = position()">III</xsl:when> + <xsl:when test="4 = position()">IV</xsl:when> + <xsl:when test="5 = position()">V</xsl:when> + <xsl:when test="6 = position()">VI</xsl:when> + <xsl:when test="7 = position()">VII</xsl:when> + <xsl:when test="8 = position()">VIII</xsl:when> + <xsl:when test="9 = position()">IX</xsl:when> + <xsl:when test="10 = position()">X</xsl:when> + <xsl:otherwise> + <xsl:value-of select="position()"/> + </xsl:otherwise> + </xsl:choose> + + <xsl:text>:</xsl:text> + <xsl:value-of select="."/> + <xsl:value-of select="$newline"/> +</xsl:template></programlisting> + + <para>Note that this conversion is incomplete: If the number in + question is larger than 10 it will be formatted in ordinary decimal + style according to the <tag class="starttag">xsl:otherwise</tag> + clause.</para> + </section> + + <section xml:id="section_html_book"> + <title>A complete HTML formatting example</title> + + <para>We now present a series of exercises showing how to format <tag + class="starttag">book</tag> document instances to XHTML. 
This is done + in a step by step manner each time showing correspondent code snippets + for our <filename>memo.xsd</filename>.</para> + + <section xml:id="section_memo_to_list"> + <title>Listing the recipients of a memo</title> + + <para>In order to generate a XHTML <link + xlink:href="">list</link> + of all <tag class="starttag">memo</tag> recipients of a memo we have + to use <tag class="starttag">xsl:output method="xhtml"</tag> and + embed the required HTML tags in our <abbrev + xlink:href="">XSL</abbrev> style + sheet:</para> + + <programlisting language="none"><xsl:output method="xhtml" indent="yes"/> + +<xsl:template match="/memo"> + <html> + <head> + <title>Recipient list</title> + </head> + <body> + <ul> + <xsl:apply-templates select="to"/> + </ul> + </body> + </html> +</xsl:template> + +<xsl:template match="to"> + <li> + <xsl:value-of select="."/> + </li> +</xsl:template></programlisting> + + <para>Processing this style sheet for a <tag + class="starttag">memo</tag> document instance yields:</para> + + <programlisting language="none"><?xml version="1.0" encoding="UTF-8"?> +<html> + <head> + <title>Recipient list</title> + </head> + <body> + <ul> + <li>Adam Hacker</li> + <li>Eve intruder</li> + </ul> + </body> +</html></programlisting> + + <para>The generated Xhtml code does not contain a reference to a + DTD. 
We may supply this reference by modifying our <tag + class="emptytag">xsl:output</tag> directive:</para> + + <programlisting language="none"><xsl:output method="xhtml" indent="yes" + <emphasis role="bold">doctype-public</emphasis>="-//W3C//DTD XHTML 1.0 Strict//EN" + <emphasis role="bold">doctype-system</emphasis>=""/></programlisting> + + <para>This adds a corresponding header which allows us to validate the + generated HTML:</para> + + <programlisting language="none"><!DOCTYPE html + PUBLIC "<emphasis role="bold">-//W3C//DTD XHTML 1.0 Strict//EN</emphasis>" + "<emphasis role="bold"></emphasis>"> +<html><head> ...</programlisting> + + <para>This may be improved further by instructing the XSL formatter + to use <uri + xlink:href=""></uri> + as default namespace:</para> + + <programlisting language="none"><?xml version="1.0" encoding="UTF-8"?> +<xsl:stylesheet <emphasis role="bold">xmlns=""</emphasis> + xmlns:xsl="" version="2.0"> + +<xsl:output method="xhtml" indent="yes" + doctype-public="-//W3C//DTD XHTML 1.0 Strict//EN" + doctype-system=""/> + + <xsl:template match="/"> + <html><head> ... + </xsl:template> +... +</xsl:stylesheet></programlisting> + + <para>This yields the following output:</para> + + <programlisting language="none"><?xml version="1.0" encoding="UTF-8"?> +<!DOCTYPE html + PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" + ""> + +<html <emphasis role="bold">xmlns=""</emphasis>> + <head> ... +</html></programlisting> + + <para>The top level element <tag class="element">html</tag> is now + declared to belong to the namespace + <code>xmlns="</code>. 
This will be + inherited by all inner Xhtml elements.</para> + + <qandaset defaultlabel="qanda" xml:id="example_xsl_book_1_dtd"> + <title>Transforming book instances to Xhtml</title> + + <qandadiv> + <qandaentry> + <question> + <para>Create a <abbrev + xlink:href="">XSL</abbrev> style + sheet to transform instances of the first version of <link + endterm="example_bookDtd" + linkend="example_bookDtd">book.xsd</link> (<xref + linkend="example_bookDtd"/>) into <uri + xlink:href="">Xhtml + 1.0 strict</uri>.</para> + + <para>You should first construct a Xhtml document + <emphasis>manually</emphasis> before coding the XSL. After + you have a <quote>working</quote> Xhtml example document + create a <abbrev + xlink:href="">XSL</abbrev> style + sheet which transforms arbitrary + <filename>book.xsd</filename> document instances into a + corresponding Xhtml file.</para> + </question> + + <answer> + <programlisting language="none"><?xml version="1.0" encoding="UTF-8"?> +<xsl:stylesheet xmlns:xsl="" version="2.0"> + + <xsl:output indent="yes" method="xhtml"/> + + <xsl:template match="/book"> + <html> + <head> + <title><xsl:value-of select="title"/></title> + </head> + <body> + <h1><xsl:value-of select="title"/></h1> + <xsl:apply-templates select="chapter"/> + </body> + </html> + </xsl:template> + + <xsl:template match="chapter"> + <h2><xsl:value-of select="title"/></h2> + <xsl:apply-templates select="para"/> + </xsl:template> + + <xsl:template match="para"> + <p><xsl:value-of select="."/></p> + </xsl:template> + +</xsl:stylesheet></programlisting> + </answer> + </qandaentry> + </qandadiv> + </qandaset> + </section> + + <section xml:id="section_xsl_attribute"> + <title><tag class="starttag">xsl:attribute</tag></title> + + <para>Sometimes we want to set attribute values in a generated XML + document. 
For example we might want to set the background color + <quote>red</quote> if a memo has a priority value of <tag + class="attvalue">high</tag>:</para> + + <programlisting language="none"><h1 style="background:red">Firewall problems</h1></programlisting> + + <para>Regarding our memo example this may be achieved by:</para> + + <programlisting language="none"><xsl:template match="/memo"> + <html> + ... + <body> + <xsl:variable name="<emphasis role="bold">messageColor</emphasis>" <co + xml:id="programlisting_priority_lolor_vardef"/>> + <xsl:choose> + <xsl:when test="@priority = 'low'">green</xsl:when> + <xsl:when test="@priority = 'medium'">yellow</xsl:when> + <xsl:when test="@priority = 'high'">red</xsl:when> + </xsl:choose> + </xsl:variable> + <h1 style="background:{<emphasis role="bold">$messageColor</emphasis>};" <co + xml:id="programlisting_priority_lolor_usevar"/>> + <xsl:value-of select="subject"/> + </h1> + </body> + </html> +</xsl:template></programlisting> + + <calloutlist> + <callout arearefs="programlisting_priority_lolor_vardef"> + <para>Definition of a color name depending on the attribute <tag + class="attvalue">priority</tag>'s value. The set of possible + attribute values (low, medium, high) is mapped to the color names + (green, yellow, red).</para> + </callout> + + <callout arearefs="programlisting_priority_lolor_usevar"> + <para>The color variable is used to compose the attribute <tag + class="attribute">style</tag>'s value. The curly + <code>{...}</code> braces are part of the <abbrev + xlink:href="">XSL</abbrev> standard's + syntax. They are required here to instruct the <abbrev + xlink:href="">XSL</abbrev> processor + to substitute the local variable <code>messageColor</code>'s + value instead of simply copying the literal string + <quote><code>$messageColor</code></quote> itself to the output + document e.g. 
generating <tag class="starttag">h1 style = + "background:$messageColor;"</tag>.</para> + </callout> + </calloutlist> + + <para>Instead of constructing an extra variable <abbrev + xlink:href="">XSL</abbrev> offers a + slightly more compact way for the same purpose. The <tag + class="starttag">xsl:attribute</tag> element allows us to define the + name of an attribute to be added together with an attribute value + specification:</para> + + <programlisting language="none"><xsl:template match="/memo"> + <html> + ... + <h1> + <xsl:attribute name="<emphasis role="bold">style</emphasis>"> + <xsl:text>background:</xsl:text> + <xsl:choose> + <xsl:when test="@priority = 'low'">green</xsl:when> + <xsl:when test="@priority = 'medium'">yellow</xsl:when> + <xsl:when test="@priority = 'high'">red</xsl:when> + </xsl:choose> + </xsl:attribute> + <xsl:value-of select="subject"/> + </h1> + </body> + </html> +</xsl:template></programlisting> + + <qandaset defaultlabel="qanda" xml:id="example_book_toc"> + <title>Adding a table of contents (toc)</title> + + <qandadiv> + <qandaentry> + <question> + <para>For larger document instances it is convenient to add + a table of contents to the generated Xhtml document. <!-- We + demonstrate the desired result as an <uri + xlink:href="src/viewlet/bookhtmltoc/bookhtmltoc_viewlet_swf.html">animation</uri>.--></para> + + <para>For this exercise you need a unique string value for + each <tag class="starttag">chapter</tag> node. If a <tag + class="starttag">chapter</tag>'s <tag + class="attribute">id</tag> attribute had been declared as + <code>#REQUIRED</code> its value would do this job + perfectly. Unfortunately you cannot rely on its existence + since it is declared to be <code>#IMPLIED</code> and may + thus be absent.</para> + + <para>XSL offers a standard function for this purpose namely + <link + xlink:href="">generate-id(...)</link>. 
+ In a nutshell this function takes an XML node as an argument + (or, when called without arguments, uses the context node) + and creates a string value that is unique with respect to + <emphasis>all</emphasis> other nodes in the document. For a + given node the function may be called repeatedly and is + guaranteed to always return the same value during the + <emphasis>same</emphasis> transformation run. So it suffices + to add something like <tag class="starttag">a + href="#{generate-id(...)}"</tag> or use it in conjunction + with <tag class="starttag">xsl:attribute</tag>.</para> + </question> + + <answer> + <para>We use the <code>generate-id()</code> function to + create a unique identity string for each chapter node. Since + we also want to define links to the table of contents we + need another unique string value. It is tempting to simply + use a static value like <quote>__toc__</quote> for this + purpose. However, we cannot be sure that this value does + not coincide with one of the <code>generate-id()</code> + function return values.</para> + + <para>A cleaner solution uses the <tag + class="starttag">book</tag> node's generated identity string + for this purpose. As stated before this value is + guaranteed to be unique:</para> + + <programlisting language="none"><xsl:template match="/book"> +... 
+ <body> + <h1><xsl:value-of select="title"/></h1> + <h2 id="{generate-id(.)}" <co xml:base="" + xml:id="programlisting_book_toc_def_toc"/>>Table of contents</h2> + <ul> + <xsl:for-each select="chapter"> + <li> + <a href="#{generate-id(.)}" <co xml:base="" + xml:id="programlisting_book_toc_ref_chap"/>><xsl:value-of select="title"></xsl:value-of></a> + </li> + </xsl:for-each> + </ul> + <xsl:apply-templates select="chapter"/> + </body> + </html> +</xsl:template> + +<xsl:template match="chapter"> + <h2 id="{generate-id(.)}" <co xml:base="" + xml:id="programlisting_book_toc_def_chap"/>> + <a href="#{generate-id(/book)}" <co xml:base="" + xml:id="programlisting_book_toc_ref_toc"/>> + <xsl:value-of select="title"/> + </a> + </h2> + <xsl:apply-templates select="para"/> +</xsl:template> +...</programlisting> + + <calloutlist> + <callout arearefs="programlisting_book_toc_def_toc"> + <para>The current context node is <tag + class="starttag">book</tag>. We use it as argument to + <code>generate-id()</code> to create a unique identity + string.</para> + </callout> + + <callout arearefs="programlisting_book_toc_ref_chap"> + <para>The <tag class="starttag">xsl:for-each</tag> + iterates over all <tag class="starttag">chapter</tag> + nodes. We reference the corresponding target nodes being + created in <xref + linkend="programlisting_book_toc_def_chap"/>.</para> + </callout> + + <callout arearefs="programlisting_book_toc_def_chap"> + <para>Each <tag class="starttag">chapter</tag>'s heading + is supplied with a unique identity string being + referenced from <xref + linkend="programlisting_book_toc_ref_chap"/>.</para> + </callout> + + <callout arearefs="programlisting_book_toc_ref_toc"> + <para>Clicking on a chapter's title shall take us back + to the table of contents (toc). 
So we create a hypertext + link referencing our toc heading's identity string being + defined in <xref + linkend="programlisting_book_toc_def_toc"/>.</para> + </callout> + </calloutlist> + </answer> + </qandaentry> + </qandadiv> + </qandaset> + </section> + + <section xml:id="section_xsl_mixed"> + <title>XSL and mixed content</title> + + <para>The subsequent example shows an element <tag + class="starttag">content</tag> having a mixed content model possibly + containing <tag class="starttag">url</tag> and <tag + class="starttag">emphasis</tag> child nodes:</para> + + <programlisting language="none"><content>The <emphasis + role="bold"><url href="">XML</url></emphasis> language + is <emphasis role="bold"><emphasis>easy</emphasis></emphasis> to learn. However you need + some <emphasis role="bold"><emphasis>time</emphasis></emphasis>.</content></programlisting> + + <para>Embedded element nodes have been set to bold style in order to + distinguish them from text nodes. A possible + <acronym>XHtml</acronym> output might look like:</para> + + <programlisting language="none"><p>The <emphasis role="bold"><a href="">XML</a></emphasis> language is <emphasis role="bold"><em>easy</em></emphasis> to learn. However you +need some <emphasis role="bold"><em>time</em></emphasis>.</p></programlisting> + + <para>We start with a first version of an <abbrev + xlink:href="">XSL</abbrev> + template:</para> + + <programlisting language="none"> <xsl:template match="content"> + <p> + <xsl:value-of select="."/> + </p> + </xsl:template></programlisting> + + <para>As mentioned earlier all <code>#PCDATA</code> text nodes of + the whole subtree are glued together leading to:</para> + + <programlisting language="none"><p>The XML language is easy to learn. However you need some time.</p></programlisting> + + <para>Our next attempt is to define templates to format the elements + <tag class="starttag">url</tag> and <tag + class="starttag">emphasis</tag>:</para> + + <programlisting language="none">... 
+<xsl:template match="content"> + <p> + <xsl:apply-templates select="emphasis|url"/> + </p> +</xsl:template> + +<xsl:template match="url"> + <a href="{@href}"><xsl:value-of select="."/></a> +</xsl:template> + +<xsl:template match="emphasis"> + <em><xsl:value-of select="."/></em> +</xsl:template> +...</programlisting> + + <para>As expected the sub elements are formatted correctly. + Unfortunately the <code>#PCDATA</code> text nodes between the + element nodes are lost:</para> + + <programlisting language="none"><p> + <a href="">XML</a> + <em>easy</em> + <em>time</em> +</p></programlisting> + + <para>To correct this transformation script we have to tell the + formatting processor to include bare text nodes into the output. The + <abbrev xlink:href="">XPath</abbrev> + standard defines the node test <link + xlink:href="">text()</link> + for this purpose. It selects the text node children of the + context node:</para> + + <programlisting language="none">... +<xsl:template match="content"> + <p> + <xsl:apply-templates select="<emphasis role="bold">text()</emphasis>|emphasis|url"/> + </p> +</xsl:template> +...</programlisting> + + <para>This yields the desired output. The text nodes in the result + are shown in bold style:</para> + + <programlisting language="none"><p><emphasis role="bold">The</emphasis> <a href="">XML</a><emphasis + role="bold"> language is </emphasis><em>easy</em><emphasis + role="bold"> to learn. 
However +you need some </emphasis><em>time</em><emphasis role="bold">.</emphasis></p></programlisting> + + <para>Some remarks:</para> + + <orderedlist> + <listitem> + <para>The <abbrev + xlink:href="">XPath</abbrev> + expression <code>select="text()|emphasis|url"</code> corresponds + nicely to the schema's content model definition:</para> + + <programlisting language="none"><xs:element name="content"> + <xs:complexType <emphasis role="bold">mixed="true"</emphasis>> + <xs:choice minOccurs="0" maxOccurs="unbounded"> + <xs:element <emphasis role="bold">ref="emphasis"</emphasis>/> + <xs:element <emphasis role="bold">ref="url"</emphasis>/> + </xs:choice> + ... + </xs:complexType> +</xs:element></programlisting> + </listitem> + + <listitem> + <para>In most mixed content models <emphasis>all</emphasis> sub + elements of e.g. <tag class="starttag" role="">content</tag> + have to be formatted. During development some of the elements + defined in a schema are likely to be omitted by accident. For + this reason the <quote>typical</quote> <abbrev + xlink:href="">XPath</abbrev> + expression acting on mixed content models is defined to match + <emphasis>any</emphasis> sub element node:</para> + + <programlisting language="none">select="text()|<emphasis + role="bold">*</emphasis>"</programlisting> + </listitem> + + <listitem> + <para>Regarding <code>select="text()|emphasis|url"</code> we + have defined two templates for element nodes <tag + class="starttag">emphasis</tag> and <tag + class="starttag">url</tag>. What happens to those text nodes + being matched by <code>text()</code>? These are subject to a + default rule: The content of bare text nodes is written to the + output. 
We may however redefine this default rule by adding a + template:</para> + + <programlisting language="none"><xsl:template match="text()"> + <emphasis role="bold"><span style="color:red"> + <xsl:value-of select="."/> + </span></emphasis> +</xsl:template></programlisting> + + <para>This yields:</para> + + <programlisting language="none"><p> + <emphasis role="bold"><span style="color:red">The </span></emphasis> + <a href="">XML</a> + <emphasis role="bold"><span style="color:red"> language is </span></emphasis> + <em>easy</em> + <emphasis role="bold"><span style="color:red"> to learn. However you need some </span></emphasis> + <em>time</em> + <emphasis role="bold"><span style="color:red">.</span></emphasis> +</p></programlisting> + + <para>In most cases it is not desirable to reformat all text nodes + throughout the whole document. In the current example we might + want to format only text nodes being <emphasis>immediate</emphasis> + children of <tag class="starttag">content</tag>. This may be + achieved by restricting the <abbrev + xlink:href="">XPath</abbrev> + expression to <tag class="starttag">xsl:template + match="content/text()"</tag>.</para> + </listitem> + </orderedlist> + </section> + + <section xml:id="section_xsl_functionid"> + <title>The function <code>id()</code></title> + + <para>In <abbrev + xlink:href="">XSL</abbrev> we sometimes + want to look up nodes by an attribute value of type <link + xlink:href="???">ID</link>. We consider our product catalog from + <xref linkend="sectSchemaProductCatalog"/>. 
The following <abbrev + xlink:href="">XSL</abbrev> may be used to + create <acronym>XHtml</acronym> documents from <tag + class="starttag">catalog</tag> instances:</para> + + <programlisting language="none" xml:lang=""><xsl:template match="/catalog"> + <html> + <head><title>Product catalog</title></head> + <body> + <h1>List of Products</h1> + <xsl:apply-templates select="product"/> + </body> + </html> +</xsl:template> + +<xsl:template match="product"> + <h2 id="{@id}" <co xml:base="" + xml:id="programlisting_catalog2html_v1_defid"/>><xsl:value-of select="title"/></h2> + <xsl:apply-templates select="para"/> +</xsl:template> + +<xsl:template match="para"> + <p><xsl:apply-templates select="text()|*" <co + xml:id="programlisting_catalog2html_v1_mixed"/>/></p> +</xsl:template> + +<xsl:template match="link"> + <a href="#{@ref}" <co xml:id="programlisting_catalog2html_v1_refid"/>><xsl:value-of select="."/></a> +</xsl:template></programlisting> + + <calloutlist> + <callout arearefs="programlisting_catalog2html_v1_defid"> + <para>The <code>ID</code> attribute <tag + class="starttag">product id="foo"</tag> is unique within the + document instance. 
We may thus use it as a unique string value + in the generated Xhtml, too.</para> + </callout> + + <callout arearefs="programlisting_catalog2html_v1_mixed"> + <para>Mixed content consisting of text and <tag + class="starttag">link</tag> nodes.</para> + </callout> + + <callout arearefs="programlisting_catalog2html_v1_refid"> + <para>We define a file local Xhtml reference to a + product.</para> + </callout> + </calloutlist> + + <para>The <tag class="starttag">para</tag> element from the example + document instance containing a <tag class="starttag">link + ref="homeTrainer"</tag> reference will be formatted as:</para> + + <programlisting language="none"><p>If you hate rain look <a href="#homeTrainer">here</a>.</p></programlisting> + + <para>Now suppose we want to add the product's title <emphasis>Home + trainer</emphasis> here to give the reader an idea about the product + without clicking the hypertext link:</para> + + <programlisting language="none"><p>If you hate rain look <a href="#homeTrainer">here</a> <emphasis + role="bold">(Home trainer)</emphasis>.</p></programlisting> + + <para>This title text node is part of the <tag + class="starttag">product</tag> node being referenced from the current + <tag class="starttag">para</tag>:</para> + + <figure xml:id="linkIdrefProduct"> + <title>A graphical representation of our <tag + class="starttag">catalog</tag>.</title> + + <mediaobject> + <imageobject> + <imagedata fileref="Ref/Fig/xsl_id.fig"/> + </imageobject> + + <caption> + <para>The dashed line shows the <code>IDREF</code> based + reference from the <tag class="starttag">link</tag> to the + <tag class="starttag">product</tag> node.</para> + </caption> + </mediaobject> + </figure> + + <para>In <abbrev + xlink:href="">XSL</abbrev> we may follow + <code>ID</code> references by means of the built-in function <link + xlink:href="">id(...)</link>:</para> + + <programlisting language="none"><xsl:template match="link"> + <a href="#{@ref}"><xsl:value-of select="."/></a> + 
<xsl:text> (</xsl:text> + <xsl:value-of select="<emphasis role="bold">id(@ref)</emphasis>/title" <co + xml:id="programlisting_xsl_id_follow"/>/> + <xsl:text>)</xsl:text> +</xsl:template></programlisting> + + <para>Evaluating <code>id(@ref)</code> at <xref + linkend="programlisting_xsl_id_follow"/> returns the referenced <tag + class="starttag">product</tag> <emphasis>node</emphasis>. We simply + take its <tag class="starttag">title</tag> value and embed it into a + pair of parentheses. This way the desired text portion <emphasis + role="bold">(Home trainer)</emphasis> gets added after the hypertext + link.</para> + + <qandaset defaultlabel="qanda" xml:id="example_book_xsl_mixed"> + <title>Extending the memo style sheet by mixed content and + itemized lists</title> + + <qandadiv> + <qandaentry> + <question> + <para>In <xref linkend="example_book.dtd_v5"/> we + constructed a schema allowing itemized lists and mixed + content for <tag class="starttag">book</tag> instances. This + schema also allowed us to define <tag + class="starttag">emphasis</tag>, <tag + class="starttag">table</tag> and <tag + class="starttag">link</tag> elements being part of a mixed + content definition. Extend the current book2html.xsl to + account for these extensions.</para> + + <para + xlink:href="">As + we already saw in our memo example itemized lists in Xhtml + are represented by the element <tag + class="starttag">ul</tag> containing <tag + class="starttag">li</tag> elements. Since <tag + class="starttag">p</tag> elements are also allowed to appear + as children our itemized lists can be easily mapped to Xhtml + tags. A <tag class="starttag">link</tag> node may be + transformed into an <tag class="starttag">a href="..."</tag> + Xhtml node.</para> + + <para>The table model is a simplified version of the Xhtml + table model. 
Read the <abbrev + xlink:href="">XSL</abbrev> + documentation of the element <tag + class="emptytag">xsl:copy-of</tag> at <link + xlink:href="">copy-of</link> + for processing tables.</para> + </question> + + <answer> + <para>The full source code of the solution is available at + <link + xlink:href="Ref/src/Dtd/book/v5/book2html.1.xsl">(Online + HTML version) ... book2html.1.xsl</link>. We discuss some + important aspects. The following table provides mapping + rules from <filename>book.xsd</filename> to Xhtml:</para> + + <table xml:id="table_book2xhtml_element_mappings"> + <title>Mapping elements from <filename>book.xsd</filename> + to Xhtml</title> + + <?dbhtml table-width="50%" ?> + + <?dbfo table-width="50%" ?> + + <tgroup cols="2"> + <colspec colwidth="3*"/> + + <colspec colwidth="2*"/> + + <thead> + <row> + <entry>book.xsd</entry> + + <entry>Xhtml</entry> + </row> + </thead> + + <tbody> + <row> + <entry><tag class="starttag">book</tag>/<tag + class="starttag">title</tag></entry> + + <entry><tag class="starttag">h1</tag></entry> + </row> + + <row> + <entry><tag class="starttag">chapter</tag>/<tag + class="starttag">title</tag></entry> + + <entry><tag class="starttag">h2</tag></entry> + </row> + + <row> + <entry><tag class="starttag">para</tag> (mixed + content)</entry> + + <entry><tag class="starttag">p</tag></entry> + </row> + + <row> + <entry><tag class="starttag">link + href="foo"</tag></entry> + + <entry><tag class="starttag">a + href="foo"</tag></entry> + </row> + + <row> + <entry><tag class="starttag">emphasis</tag></entry> + + <entry><tag class="starttag">em</tag></entry> + </row> + + <row> + <entry><tag + class="starttag">itemizedlist</tag></entry> + + <entry><tag class="starttag">ul</tag></entry> + </row> + + <row> + <entry><tag class="starttag">listitem</tag></entry> + + <entry><tag class="starttag">li</tag></entry> + </row> + + <row> + <entry><tag class="starttag">table</tag>, <tag + class="starttag">caption</tag>,<tag + class="starttag">tr</tag>, 
<tag + class="starttag">td</tag> along with all + attributes</entry> + + <entry>Identity copy</entry> + </row> + </tbody> + </tgroup> + </table> + + <para>Since our table model is a subset of the HTML table + model we may simply copy corresponding nodes to the + output:</para> + + <programlisting language="none"><xsl:template match="table"> + <xsl:copy-of select="."/> +</xsl:template></programlisting> + + <para>Next we need rules for itemized lists and paragraphs. + Our model already implements lists in a way that closely + resembles XHTML lists. Since the structures are compatible we + only have to provide a mapping:</para> + + <programlisting language="none"><xsl:template match="para"> + <p id="{generate-id(.)}"><xsl:apply-templates select="text()|*" /></p> +</xsl:template> + +<xsl:template match="itemizedlist"> + <ul><xsl:apply-templates select="listitem"/></ul> +</xsl:template> + +<xsl:template match="listitem"> + <li><xsl:apply-templates select="*"/></li> +</xsl:template></programlisting> + + <para>Since <emphasis>all</emphasis> chapters are reachable + via hypertext links from the table of contents we + <emphasis>must</emphasis> supply a unique <code>id</code> + value <xref + linkend="programlisting_book2html_single_chapterid"/> for + <emphasis>all</emphasis> of them. Chapters and paragraphs + may be referenced by <tag class="starttag">link</tag> + elements and thus <emphasis>both</emphasis> need a unique + identity value. For simplicity we create both of them via + <code>generate-id()</code>. 
In a more sophisticated solution + the strategy would be slightly different:</para> + + <itemizedlist> + <listitem> + <para>If a <tag class="starttag">chapter</tag> node does + have an <code>id</code> attribute defined then take its + value.</para> + </listitem> + + <listitem> + <para>If a <tag class="starttag">chapter</tag> node does + <emphasis>not</emphasis> have an <code>id</code> + attribute defined then use + <code>generate-id()</code>.</para> + </listitem> + + <listitem> + <para><tag class="starttag">para</tag> nodes only get + <code>id</code> values in XHTML if they do have an <code>id</code> + attribute defined. This is consistent since these nodes + are never referenced from the table of contents. Thus an + identity is only required if the <tag + class="starttag">para</tag> node is referenced by a <tag + class="starttag">link</tag>. If that is the case the <tag + class="starttag">para</tag> surely does have a defined + identity value.</para> + </listitem> + </itemizedlist> + + <para>We also have to provide a hypertext link <xref + linkend="programlisting_book2html_single_toclink"/> to the + table of contents:</para> + + <programlisting language="none"><xsl:template match="chapter"> + <h2 id="{<emphasis role="bold">generate-id(.)</emphasis>}" <co + xml:base="" + xml:id="programlisting_book2html_single_chapterid"/>> + <a href="#{<emphasis role="bold">generate-id(/book)</emphasis>}" <co + xml:base="" + xml:id="programlisting_book2html_single_toclink"/>><xsl:value-of select="title"/></a> + </h2> + <xsl:apply-templates select="para|itemizedlist|table"/> +</xsl:template></programlisting> + + <para>Implementing the <tag class="starttag">link</tag> + element is somewhat more complicated. We cannot use the + <code>@ref</code> attribute value itself as the <tag + class="starttag">a href="..."</tag> attribute value since + the target's identity string is generated via + <code>generate-id()</code>. 
But we may follow the reference + via the <abbrev + xlink:href="">XPath</abbrev> <link + linkend="section_xsl_functionid">id()</link> function and + then use the target's identity value:</para> + + <programlisting language="none"><xsl:template match="link"> + <a href="#{generate-id(id(@linkend))}"> + <xsl:value-of select="."/> + </a> +</xsl:template></programlisting> + + <para>The call to <code>id(@linkend)</code> returns either a + <tag class="starttag">chapter</tag> or a <tag + class="starttag">para</tag> node since attributes of type + <code>ID</code> are only defined for these two elements. + Using this node as input to <code>generate-id()</code> + returns the desired identity value to be used in the + generated Xhtml.</para> + </answer> + </qandaentry> + </qandadiv> + </qandaset> + </section> + + <section xml:id="xslAxis"> + <title>XSL axis definitions</title> + + <para>XSL allows us to traverse a document instance's graph in + different directions. We start with a memo document instance:</para> + + <programlisting language="none"><memo xmlns:xsi="" + xsi:noNamespaceSchemaLocation="memo.xsd" date="9.9.2099"> + <from>Joe</from> + <to>Jack</to> + <to>Eve</to> + <to>Jude</to> + <to>Tolstoi</to> + <subject>Ignore me!</subject> + <content> + <para>Dumb text.</para> + </content> +</memo></programlisting> + + <para>This instance defines four nodes of type <tag + class="starttag">to</tag>. 
For each of these we want to create a + line of text showing also the preceding and the following + recipients:</para> + + <programlisting language="none"> <----Jack----> Eve Jude Tolstoi <co + xml:id="programlisting_axis_jack"/> +Jack <----Eve----> Jude Tolstoi <co xml:id="programlisting_axis_eve"/> +Jack Eve <----Jude----> Tolstoi <co xml:id="programlisting_axis_jude"/> +Jack Eve Jude <----Tolstoi----> <co + xml:id="programlisting_axis_tolstoi"/></programlisting> + + <calloutlist> + <callout arearefs="programlisting_axis_jack"> + <para>Jack has no predecessor and 3 successors</para> + </callout> + + <callout arearefs="programlisting_axis_eve"> + <para>Eve has 1 predecessor and 2 successors</para> + </callout> + + <callout arearefs="programlisting_axis_jude"> + <para>Jude has 2 predecessors and 1 successor</para> + </callout> + + <callout arearefs="programlisting_axis_tolstoi"> + <para><personname>Tolstoi</personname> has 3 predecessors and no + successor</para> + </callout> + </calloutlist> + + <para>XSL supports this type of transformation by supplying <acronym + xlink:href="">XPath</acronym> axis + definitions. We consider a memo document with 9 <tag + class="starttag">to</tag> nodes:</para> + + <figure xml:id="memo9recipients"> + <title>A memo with 9 recipients</title> + + <mediaobject> + <imageobject> + <imagedata fileref="Ref/Fig/memofour.fig"/> + </imageobject> + </mediaobject> + </figure> + + <para>We marked the 4-th recipient to represent the context node. + All three <tag class="starttag">to</tag> nodes to the + <quote>left</quote> belong to the <emphasis>set</emphasis> of + preceding siblings with respect to the context node. Likewise the 5 + neighbours to the right are called following siblings. 
Returning to + our <quote>four recipient</quote> example we may create the desired + output by:</para> + + <programlisting language="none"><xsl:template match="/"> + <xsl:apply-templates select="memo/to"/> +</xsl:template> + +<xsl:template match="to"> + + <xsl:for-each select="preceding-sibling::to" <co + xml:id="programlisting_memo_four_xsl_preceding"/>> + <xsl:value-of select="."/> + <xsl:text> </xsl:text> + </xsl:for-each> + + <xsl:text> &lt;----</xsl:text> + <xsl:value-of select="."/> <co + xml:id="programlisting_memo_four_xsl_context"/> + <xsl:text>----&gt; </xsl:text> + + <xsl:for-each select="following-sibling::to"> <co + xml:id="programlisting_memo_four_xsl_following"/> + <xsl:value-of select="."/> + <xsl:text> </xsl:text> + </xsl:for-each> + <xsl:value-of select="$newline"/> +</xsl:template></programlisting> + + <calloutlist> + <callout arearefs="programlisting_memo_four_xsl_preceding"> + <para>Iterate on the set of recipients <quote>left</quote> of + the context node.</para> + </callout> + + <callout arearefs="programlisting_memo_four_xsl_context"> + <para>Taking the context node's value embedded in <code><---- + ... ----></code>.</para> + </callout> + + <callout arearefs="programlisting_memo_four_xsl_following"> + <para>Iterate on the set of recipients <quote>right</quote> of + the context node.</para> + </callout> + </calloutlist> + + <para>More formally the set of preceding siblings is defined to be + the set of all nodes having the same parent as the context node and + appearing <quote>before</quote> the context node. The notion + <quote>before</quote> is meant in the sense of a <link + xlink:href="">depth-first</link> + traversal of the document tree. <abbrev + xlink:href="">XPath</abbrev> provides + different axis definitions, see <uri + xlink:href=""></uri> + for details. 
We provide an illustration here:</para> + + <figure xml:id="disjointAxeSets"> + <title>Disjoint <acronym + xlink:href="">XPath</acronym> axis + definitions.</title> + + <mediaobject> + <imageobject> + <imagedata fileref="Ref/Fig/preceding.fig"/> + </imageobject> + + <caption> + <para>The sets defined by ancestor, descendant, following, + preceding and self are disjoint. Their union forms the set of + all document nodes.</para> + </caption> + </mediaobject> + </figure> + + <para>Some remarks:<itemizedlist> + <listitem> + <para>If the context node is already the topmost node i.e. the + root node then the sets defined by <code>ancestor</code> and + <code>parent</code> are empty.</para> + </listitem> + + <listitem> + <para>The <code>parent</code> set <emphasis>always</emphasis> + contains zero or one node.</para> + </listitem> + </itemizedlist></para> + </section> + + <section xml:id="xslChunking"> + <title>Splitting documents into chunks</title> + + <para>Sometimes we want to generate multiple output documents from a + single XML source. It may for example be a bad idea to transform a + book of 200 printed pages into a <emphasis>single</emphasis> online + HTML page. Instead we may split each chapter into a separate HTML + file and create navigation links between them.</para> + + <para>We consider a memo document instance. 
We want to generate one + text file for each memo recipient containing just the recipient's + name using the <abbrev + xlink:href="">XSL</abbrev> element <link + xlink:href=""><xsl:result-document></link>:</para> + + <programlisting language="none"><xsl:template match="/memo"> + <xsl:apply-templates select="to"/> +</xsl:template> + +<xsl:template match="to"> + <emphasis role="bold"><xsl:result-document</emphasis> + <co xml:id="programlisting_xsl_result_document_main"/> + <emphasis role="bold">href="file_{position()}.txt"</emphasis> + <co xml:id="programlisting_xsl_result_document_href"/> + <emphasis role="bold">method="text"</emphasis> + <co xml:id="programlisting_xsl_result_document_method"/>> + <xsl:value-of select="."/> <co + xml:id="programlisting_xsl_result_document_content"/> + + <emphasis role="bold"></xsl:result-document></emphasis> +</xsl:template></programlisting> + + <calloutlist> + <callout arearefs="programlisting_xsl_result_document_main"> + <para>The output from all generating <abbrev + xlink:href="">XSL</abbrev> directives + will be redirected from standard output to another output + channel.</para> + </callout> + + <callout arearefs="programlisting_xsl_result_document_href"> + <para>The output will be written to a file named + <filename>file_i.txt</filename> with decimal number + <code>i</code> ranging from value 1 up to the number of + recipients.</para> + </callout> + + <callout arearefs="programlisting_xsl_result_document_method"> + <para>The <code>method</code> attribute possibly overrides a + value being given in the <tag class="starttag">xsl:output</tag> + element. 
We may also redefine <link + xlink:href="">other + attributes</link> from <tag class="starttag">xsl:output</tag> + like <code>doctype-{public.system}</code> and the generated + file's <code>encoding</code>.</para> + </callout> + + <callout arearefs="programlisting_xsl_result_document_content"> + <para>All output being generated in this region gets redirected + to the channel specified in <xref + linkend="programlisting_xsl_result_document_href"/>.</para> + </callout> + </calloutlist> + + <qandaset defaultlabel="qanda" xml:id="example_book_chunk"> + <title>Splitting book into chapter files</title> + + <qandadiv> + <qandaentry> + <question> + <para>Extend your solution of <xref + linkend="example_book_xsl_mixed"/> by writing each <tag + class="starttag">chapter</tag>'s content into a separate + Xhtml file. In addition create a file + <filename>index.html</filename> which contains references to + the corresponding <tag class="starttag">chapter</tag> + documents. Thus for a document instance with two chapters + the overall navigation structure is illustrated by <xref + linkend="figure_book_navigation"/>.</para> + + <para>Implementing the <tag class="starttag">link</tag> tag + may cause a problem: An internal link may reference a <tag + class="starttag">para</tag>. You need to identify the <tag + class="starttag">chapter</tag> node embedding this para. + This may be done by using a suitable <abbrev + xlink:href="">XPath</abbrev> axis + direction.</para> + </question> + + <answer> + <para>The full source code of the solution is available at + <link + xlink:href="Ref/src/Dtd/book/v5/book2chunks.1.xsl">(Online + HTML version) ... book2chunks.1.xsl</link>. 
First we + generate the table of contents file + <filename>index.html</filename>:</para> + + <programlisting language="none"><xsl:template match="/"> + <xsl:result-document href="index.html"> + <xsl:apply-templates select="book"/> + </xsl:result-document> + + <xsl:for-each select="book/chapter"> + <xsl:result-document href="{generate-id(.)}.html"> + <xsl:apply-templates select="."/> + </xsl:result-document> + </xsl:for-each> +</xsl:template> + +<xsl:template match="book"> + <html> + <head><title><xsl:value-of select="title"/></title></head> + <body> + <h1><xsl:value-of select="title"/></h1> + <h2>Table of contents</h2> + <ul> + <xsl:for-each select="<emphasis role="bold">chapter</emphasis>"> + <li><a href="{<emphasis role="bold">generate-id(.)</emphasis>}.html"><xsl:value-of select="title"/></a></li> + </xsl:for-each> + </ul> + </body> + </html> +</xsl:template></programlisting> + + <para>The <tag class="starttag">link ref="..."</tag> may + reference a <tag class="starttag">chapter</tag> or a <tag + class="starttag">para</tag>. 
So we may need to <quote>step + up</quote> from a paragraph to the corresponding chapter + node:</para> + + <programlisting language="none"><xsl:template match="link"> + <xsl:variable name="reftargetNode" select="id(@linkend)"/> + <xsl:variable name="reftargetParentChapter" + select="$reftargetNode/ancestor-or-self::chapter"/> + + <a href="{generate-id($reftargetParentChapter)}.html#{ + generate-id($reftargetNode)}"> + <xsl:value-of select="."/> + </a> +</xsl:template></programlisting> + + <para>This is consistent since <emphasis>all</emphasis> <tag + class="starttag">p</tag> nodes in the generated Xhtml + receive a unique <code>id</code> value regardless whether + the originating <tag class="starttag">para</tag> node does + have one.</para> + </answer> + </qandaentry> + </qandadiv> + </qandaset> + + <figure xml:id="figure_book_navigation"> + <title>A <tag class="starttag">book</tag> document with two + chapters</title> + + <mediaobject> + <imageobject> + <imagedata fileref="Ref/Fig/booknavigate.fig"/> + </imageobject> + </mediaobject> + </figure> + </section> + </section> + </section> + </chapter> + diff --git a/lectures.xml b/lectures.xml new file mode 100644 index 000000000..a1a5d5b2d --- /dev/null +++ b/lectures.xml @@ -0,0 +1,54 @@ +<?xml version="1.0" encoding="UTF-8"?> +<book version="5.0" xmlns="" + xmlns:xlink="" + xmlns:xi="" + xmlns:svg="" + xmlns:m="" + xmlns:html="" + xmlns:db=""> + <info> + <title>Lecture notes</title> + + <author> + <personname><firstname>Martin</firstname> + <surname>Goik</surname></personname> + + <affiliation> + <orgname></orgname> + </affiliation> + </author> + + <legalnotice> + <para>Source code available at <uri + xlink:href=""></uri></para> + </legalnotice> + </info> + + <part xml:id="sda1"> + <info> + <title>Structured Data and Applications 1</title> + </info> + + <xi:include href="Sda1/prerequisites.xml" xpointer="element(/1)"/> + + <xi:include href="Sda1/xmlintro.xml" xpointer="element(/1)"/> + + <xi:include 
href="Sda1/xmlschema.xml" xpointer="element(/1)"/> + + <xi:include href="Sda1/xslt.xml" xpointer="element(/1)"/> + + <xi:include href="Sda1/sax.xml" xpointer="element(/1)"/> + + <xi:include href="Sda1/dom.xml" xpointer="element(/1)"/> + + <xi:include href="Sda1/jdbc.xml" xpointer="element(/1)"/> + + <xi:include href="Sda1/testng.xml" xpointer="element(/1)"/> + + <xi:include href="Sda1/fo.xml" xpointer="element(/1)"/> + + <xi:include href="bibliography.xml" xpointer="element(/1)"/> + + <xi:include href="glossary.xml" xpointer="element(/1)"/> + </part> +</book> -- GitLab