From 306792a4c782ae3d141f379dff07c2475ecf1ee9 Mon Sep 17 00:00:00 2001 From: Martin Goik <goik@hdm-stuttgart.de> Date: Thu, 14 Mar 2013 07:10:20 +0100 Subject: [PATCH] old state for HEAD --- Doc/course.xml | 17903 +++++++++++------------------------------------ 1 file changed, 3998 insertions(+), 13905 deletions(-) diff --git a/Doc/course.xml b/Doc/course.xml index e62c0544e..b4fbd3717 100644 --- a/Doc/course.xml +++ b/Doc/course.xml @@ -659,446 +659,476 @@ drwxr-xr-x 4 goik fb1prof 4096 Nov 8 22:04 .. </section> </chapter> - <chapter xml:id="xmlIntro"> - <title>Introduction to XML</title> + <chapter xml:id="introPersistence"> + <title>Accessing Relational Data</title> - <section xml:id="xmlBasic"> - <title>The XML industry standard</title> + <section xml:id="persistence"> + <title>Persistence in Object Oriented languages</title> - <para>A short question might be: <quote>What is XML?</quote> An answer - might be: The acronym XML stands for - <quote>E<emphasis>x</emphasis>tensible <emphasis>M</emphasis>arkup - <emphasis>L</emphasis><foreignphrase>anguage</foreignphrase></quote> - and is an industry standard being published by the W3C standardization - organization. Like other industry software standards talking about XML - leads to talk about XML based software: Applications and frameworks - supplying added values to software implementors and enhancing data - exchange between applications.</para> + <para>Following <xref linkend="Bauer05"/> we may define persistence + by:</para> - <para>Many readers are already familiar with XML without explicitly - referring to the standard itself: The world wide web's - <foreignphrase>lingua franca</foreignphrase> HTML has been ported to - an XML dialect forming the <link - xlink:href="http://www.w3.org/MarkUp">XHTML</link> Standard. 
The idea - behind this standard is to distinguish between an abstract markup - language and rendered results being generated from so called document - instances by a browser:</para> + <blockquote> + <para>persistence allows an object to outlive the process that + created it. The state of the object may be stored to disk and an + object with the same state re-created at some point in the + future.</para> + </blockquote> - <figure xml:id="renderXhtmlMarkup"> - <title>Rendering XHTML markup</title> + <para>The notion of <quote>process</quote> refers to operating + systems. Let us start with a simple example assuming a <trademark + xlink:href="http://www.oracle.com/us/technologies/java">Java</trademark> + class User:</para> - <mediaobject> - <imageobject> - <imagedata fileref="Ref/Fig/xhtml.fig"/> - </imageobject> - </mediaobject> - </figure> + <programlisting>public class User { + String cname; //The user's common name e.g. 'Joe Bix' + String uid; //The user's unique system ID (login name) e.g. 'bix' - <para>We may extend this example by representing a mathematical - formula via a standard called <link - xlink:href="http://www.w3.org/Math">Mathml</link>:</para> +// getters, setters and other stuff + ... 
+}</programlisting> - <figure xml:id="mathmlExample"> - <title>A formula in <link - xlink:href="http://www.w3.org/Math">MathML</link> - representation.</title> + <para>A relational implementation might look like:</para> + + <programlisting>CREATE TABLE User( + cname CHAR(80) + ,uid CHAR(10) PRIMARY KEY +)</programlisting> + + <para>Now a <trademark + xlink:href="http://www.oracle.com/us/technologies/java">Java</trademark> + application may create instances of class <code>User</code> and save + these to a database:</para> + + <figure xml:id="processObjPersist"> + <title>Persistence across process boundaries</title> <mediaobject> <imageobject> - <imagedata fileref="Ref/Fig/sqrtrender.fig"/> + <imagedata fileref="Ref/Fig/persistence.fig"/> </imageobject> </mediaobject> </figure> - <para>Again we observe a similar situation: A database like - <emphasis>representation</emphasis> of a formula on the left and a - <emphasis>rendered</emphasis> version on the right. Regarding XML we - have:</para> + <para>Both the <trademark + xlink:href="http://www.oracle.com/technetwork/java/javase">JRE</trademark> + instances and the RDBMS database server are processes (or sets of + processes) typically existing in different address spaces. The two + <trademark + xlink:href="http://www.oracle.com/technetwork/java/javase">JRE</trademark> + processes mentioned here may as well be started in disjoint address + spaces. In fact we might even run two entirely different applications + implemented in different programming languages like <abbrev + xlink:href="http://www.php.net">PHP</abbrev>.</para> - <itemizedlist> - <listitem> - <para>The <link xlink:href="http://www.w3.org/Math">MathML</link> - standard intended to describe mathematical formulas. The standard - defines a set of <emphasis>tags</emphasis> like e.g. 
<tag - class="starttag">math:msqrt</tag> with well-defined semantics - regarding permitted attribute values and nesting rules.</para> - </listitem> + <para>It is important to mention that the two arrows +  <quote>save</quote> and <quote>load</quote> thus typically denote a + communication across machine boundaries.</para> + </section> - <listitem> - <para>Informal descriptions of formatting expectations.</para> - </listitem> + <section xml:id="jdbcIntro"> + <title>Introduction to <trademark + xlink:href="http://electronics.zibb.com/trademark/jdbc/29545026">JDBC</trademark></title> - <listitem> - <para>Software transforming an XML formula representation into - visible or printable output. In other words: A rendering - engine.</para> - </listitem> - </itemizedlist> + <section xml:id="jdbcWrite"> + <title>Write access, principles</title> - <para>XML documents may also be regarded as a persistence mechanism to - represent and store data. Similarities to Relational Database Systems - exist. A RDBMS - (<emphasis>R</emphasis><foreignphrase>elational</foreignphrase> - <emphasis>D</emphasis><foreignphrase>atabase</foreignphrase> - <emphasis>M</emphasis><foreignphrase>anagement</foreignphrase> - <emphasis>S</emphasis><foreignphrase>ystem</foreignphrase>) is - typically capable to hold Tera bytes of data being organized in - tables. The arrangement of data may be subject to various constraints - like candidate- or foreign key rules. With respect to both end users - and software developers a RDBMS itself is a building block in a - complete solution. We need an application on top of it acting as a - user interface to the data being contained.</para> + <para>Connecting an application to a database means to establish a + connection from a client to a database server:</para> - <para>In contrast to a RDBMS XML allows data to be organized - hierarchically. 
The <link - xlink:href="http://www.w3.org/Math">MathML</link> representation given - in <xref linkend="mathmlExample"/> may be graphically - visualized:</para> + <figure xml:id="jdbcClientServer"> + <title>Networking between clients and database servers</title> - <figure xml:id="mathmltree"> - <title>A tree graph representation of the <link - xlink:href="http://www.w3.org/Math">MathML</link> example given - before.</title> + <mediaobject> + <imageobject> + <imagedata fileref="Ref/Fig/clientserv.fig"/> + </imageobject> + </mediaobject> + </figure> - <mediaobject> - <imageobject> - <imagedata fileref="Ref/Fig/sqrtree.fig"/> - </imageobject> - </mediaobject> - </figure> + <para>So <trademark + xlink:href="http://www.oracle.com/technetwork/java/javase/jdbc">JDBC</trademark> + is just one among a whole bunch of protocol implementations + connecting database servers and applications. Consequently + <trademark + xlink:href="http://www.oracle.com/technetwork/java/javase/jdbc">JDBC</trademark> + is expected to appear in the lower layer of multi-tier applications. + We take a three-tier application as a starting point:</para> - <para>Of course RDBMS also allow the representation of tree like - structures or arbitrary graphs. But these have to be modelled by using - foreign key constraints since relational tables themselves have a - <quote>flat</quote> structure. 
Some RDBMS vendors provide extensions - to the SQL standard which allow <quote>native</quote> representations - of XML documents.</para> - </section> + <figure xml:id="jdbcThreeTier"> + <title>The role of <trademark + xlink:href="http://www.oracle.com/technetwork/java/javase/jdbc">JDBC</trademark> + in a three-tier application</title> - <section xml:id="xmlHtml"> - <title>Well formed XML documents</title> + <mediaobject> + <imageobject> + <imagedata fileref="Ref/Fig/jdbcThreeTier.fig"/> + </imageobject> + </mediaobject> + </figure> - <para>We start with a simple XML document representing a - message:</para> + <para>We may add an additional layer. Web applications are typically + being built on top of an application server (<productname + xlink:href="http://www.ibm.com/software/de/websphere/">WebSphere</productname>, + <productname + xlink:href="http://glassfish.java.net">Glassfish</productname>, + <productname + xlink:href="http://www.jboss.org/jbossas">Jboss</productname>,...) + providing additional services:</para> - <figure xml:id="memoWellFormed"> - <title>The representation of a short message.</title> + <figure xml:id="jdbcFourTier"> + <title><trademark + xlink:href="http://www.oracle.com/technetwork/java/javase/jdbc">JDBC</trademark> + connecting application server and database.</title> - <programlisting><?xml<co xml:id="first_xml_code_magic"/> version="1.0"<co - xml:id="first_xml_code_version"/> encoding="UTF-8"<co - xml:id="first_xml_code_encoding"/>?> -<memo><co xml:id="first_xml_code_topelement"/> - <from>M. Goik</from><co xml:id="first_xml_code_from"/> - <to>B. King</to> - <to>A. 
June</to> - <subject>Best whishes</subject> - <content>Hi all, congratulations to your splendid party</content> -</memo></programlisting> - </figure> + <mediaobject> + <imageobject> + <imagedata fileref="Ref/Fig/jdbcFourTier.fig"/> + </imageobject> + </mediaobject> + </figure> - <calloutlist> - <callout arearefs="first_xml_code_magic"> - <para>The very first characters <code><?xml</code> may be - regarded as a <link - xlink:href="http://en.wikipedia.org/wiki/Magic_number_(programming)">magic - number string</link> being used as a format indicator which allows - to distinguish between different file types i.e. GIF, JPEG, HTML - and so on.</para> - </callout> + <para>So what is actually required to connect to a database server? + A client requires the following parameter values to open a + connection:</para> - <callout arearefs="first_xml_code_version"> - <para>The <code>version="1.0"</code> attribute tells us that all - subsequent lines will conform to the <link - xlink:href="http://www.w3.org/TR/xml">XML</link> standard of - version 1.0. This way a document can express its conformance to - the version 1.0 standard even if in the future this standard - evolves to a higher version e.g. - <code>version="2.1"</code>.</para> - </callout> + <orderedlist> + <listitem xml:id="ItemJdbcProtocol"> + <para>The type of database server i.e. <productname + xlink:href="http://www.oracle.com/us/products/database">Oracle</productname>, + <productname + xlink:href="www.ibm.com/software/data/db2">DB2</productname>, + <productname + xlink:href="http://www-01.ibm.com/software/data/informix">Informix</productname>, + <productname + xlink:href="http://www.mysql.com">Mysql</productname> etc. 
This + information is needed because of vendor dependent <trademark + xlink:href="http://electronics.zibb.com/trademark/jdbc/29545026">JDBC</trademark> + protocol implementations.</para> + </listitem> - <callout arearefs="first_xml_code_encoding"> - <para>The attribute <code>encoding="UTF-8"</code> tells us that - all text in the current document uses <link - xlink:href="http://unicode.org">Unicode</link> encoding. <link - xlink:href="http://unicode.org">Unicode</link> is a widely - accepted industry standard for font encoding. Thus European, - Cyrillic and most Asian font codes are allowed to be used in - documents <emphasis>simultaneously</emphasis>. Other encodings may - limit the set of allowed characters, e.g. - <code>encoding="ISO-8859-1"</code> will only allow characters - belonging to western European languages. However a system also - needs to have the corresponding fonts (e.g. TrueType) being - installed in order to render the document appropriately. A - document containing Chinese characters is of no use if the - underlying rendering system lacks e.g. a set of Chinese True Type - fonts.</para> - </callout> - - <callout arearefs="first_xml_code_topelement"> - <para>An XML document has exactly one top level - <emphasis>node</emphasis>. In contrast to the HTML standard these - nodes are commonly called elements rather than tags. In this - example the top level (root) element is <tag - class="starttag">memo</tag>.</para> - </callout> + <listitem> + <para>The server's <link + xlink:href="http://en.wikipedia.org/wiki/Domain_Name_System">DNS</link> + name or IP number</para> + </listitem> - <callout arearefs="first_xml_code_from"> - <para>Each XML element like <tag class="starttag">from</tag> has a - corresponding counterpart <tag class="endtag">from</tag>. In terms - of XML we say each element being opened has to be closed. 
In - conjunction with the precedent point this is equivalent to the - fact that each XML document represents a tree structure as being - shown in the <link linkend="mathmltree">tree graph</link> - representation.</para> - </callout> - </calloutlist> + <listitem> + <para>The database service's port number at the previously + defined host. The database server process listens for + connections to this port number.</para> + </listitem> - <para>As with the introductory formula example this representation - itself is of limited usefulness: In an office environment we need a - rendered version being given either as print or as some online format - like E-Mail or HTML.</para> + <listitem xml:id="itemJdbcDatabaseName"> + <para>The database name within the given database server</para> + </listitem> - <para>From a software developer's point of view we may use a piece of - software called a <emphasis>parser</emphasis> to test the document's - standard conformance. At the MI department we may simply invoke - <userinput><command>xmlparse</command> message.xml</userinput> to - start a check:</para> + <listitem> + <para>Optional: A database user's account name and + password.</para> + </listitem> + </orderedlist> - <programlisting><errortext>goik>xmlparse wellformed.xml -Parsing was successful</errortext></programlisting> + <para>Items <xref linkend="ItemJdbcProtocol"/> - <xref + linkend="itemJdbcDatabaseName"/> will be encapsulated into a so + called <trademark + xlink:href="http://electronics.zibb.com/trademark/jdbc/29545026">JDBC</trademark> + <link + xlink:href="http://en.wikipedia.org/wiki/Uniform_Resource_Locator">URL</link>. 
+ We consider a typical example corresponding to the previous + parameter list:</para> - <para>Various XML related plugins are supplied for the <productname - xlink:href="http://eclipse.org">eclipse platform</productname> like - the <productname xlink:href="http://oxygenxml.com">Oxygen - software</productname> supplying <quote>life</quote> conformance - checking while editing XML documents. Now we test our assumptions by - violating some of the rules stated before. We deliberately omit the - closing element <tag class="endtag">from</tag>:</para> + <figure xml:id="jdbcUrlComponents"> + <title>Components of a <trademark + xlink:href="http://electronics.zibb.com/trademark/jdbc/29545026">JDBC</trademark> + URL</title> - <figure xml:id="omitFrom"> - <title>An invalid XML document due to the omission of <tag - class="endtag">from</tag>.</title> + <mediaobject> + <imageobject> + <imagedata fileref="Ref/Fig/jdbcurl.fig"/> + </imageobject> + </mediaobject> + </figure> - <programlisting><?xml version="1.0" encoding="UTF-8"?> -<memo> - <from>M. Goik <co xml:id="omitFromMissingElement"/> - <to>B. King</to> - <to>A. June</to> - <subject>Best whishes</subject> - <content>Hi all, congratulations to your splendid party</content> -</memo></programlisting> + <para>In fact this <trademark + xlink:href="http://electronics.zibb.com/trademark/jdbc/29545026">JDBC</trademark> + URL example closely resembles other types of URL strings as being + defined in <uri + xlink:href="http://www.ietf.org/rfc/rfc2396.txt">http://www.ietf.org/rfc/rfc2396.txt</uri>. + Look for <code>opaque_part</code> to understand the second + <quote>:</quote> in the protocol definition part of a <trademark + xlink:href="http://electronics.zibb.com/trademark/jdbc/29545026">JDBC</trademark> + URL. 
Common examples of <abbrev + xlink:href="http://www.ietf.org/rfc/rfc1738.txt">URL</abbrev>s + are:</para> - <calloutlist> - <callout arearefs="omitFromMissingElement"> - <para>The opening element <tag class="starttag">from</tag> is - not terminated by <tag class="endtag">from</tag>.</para> - </callout> - </calloutlist> - </figure> + <itemizedlist> + <listitem> + <para><code>http://www.hdm-stuttgart.de/aaa</code></para> + </listitem> - <para>Consequently the parser's output reads:</para> + <listitem> + <para><code>http://someserver.com:8080/someResource</code></para> + </listitem> - <programlisting><errortext>goik>xmlparse omitfrom.xml -file:///ma/goik/workspace/Vorlesungen/Input/Memo/omitfrom.xml:8:3: -fatal error org.xml.sax.SAXParseException: The element type "from" -must be terminated by the matching end-tag "</from>". parsing error</errortext></programlisting> + <listitem> + <para><code>ftp://mirror.mi.hdm-stuttgart.de/Firmen</code></para> + </listitem> + </itemizedlist> - <para>Experienced HTML authors may be confused: In fact HTML is not an - XML standard. Instead HTML belongs to the set of SGML applications. - SGML is a much older standard namely the <emphasis>Standard - Generalized Markup Language</emphasis>.</para> + <para>We notice the explicit mentioning of a port number 8080 in the + second example; the default <abbrev + xlink:href="http://www.w3.org/Protocols">http</abbrev> protocol port + number is 80. So if a web server accepts connections at port 80 we + do not have to specify this value. A web browser will automatically + use this default port.</para> - <para>Even if every XML element has a closing counterpart the - resulting XML may be invalid:</para> + <para>Actually the notion <quote><code>jdbc:mysql</code></quote> + denotes a sub protocol implementation namely<orgname> + Mysql</orgname>'s implementation of <trademark + xlink:href="http://electronics.zibb.com/trademark/jdbc/29545026">JDBC</trademark>. 
+ Connecting to an IBM DB2 server would require jdbc:db2 for this + protocol part.</para> - <programlisting><?xml version="1.0" encoding="UTF-8"?> -<memo> - <from>M. Goik<to>B. King</from></to> - <to>A. June</to> - <subject>Best whishes</subject> - <content>Hi all, congratulations to your splendid party</content> -</memo></programlisting> + <para>In contrast to <abbrev + xlink:href="http://www.w3.org/Protocols">http</abbrev> no standard + ports are <quote>officially</quote> assigned for <trademark + xlink:href="http://electronics.zibb.com/trademark/jdbc/29545026">JDBC</trademark> + protocol variants. Due to vendor specific implementations this does + not make any sense. Thus we <emphasis role="bold">always</emphasis> + have to specify the port number when opening <trademark + xlink:href="http://electronics.zibb.com/trademark/jdbc/29545026">JDBC</trademark> + connections.</para> - <para>The parser echoes:</para> + <para>Writing <trademark + xlink:href="http://electronics.zibb.com/trademark/jdbc/29545026">JDBC</trademark> + based applications follows a simple scheme:</para> - <programlisting><computeroutput>file:///ma/goik/workspace/Vorlesungen/Input/Memo/nonest.xml:3:29: -fatal error org.xml.sax.SAXParseException: The element type "to" must be -terminated by the matching end-tag "</to>". parsing error</computeroutput></programlisting> + <figure xml:id="jdbcArchitecture"> + <title>Architecture of JDBC</title> - <para>This type of error is caused by so called improper nesting of - elements: The element <tag class="starttag">from</tag>is closed before - the <quote>inner</quote> element <tag class="starttag">to</tag> has - been closed. Actually this violates the expressibility of XML - documents as a tree like structure. The situation may be resolved by - choosing:</para> + <mediaobject> + <imageobject> + <imagedata fileref="Ref/Fig/jdbcarch.fig"/> + </imageobject> + </mediaobject> + </figure> - <programlisting>...<from>M. Goik<to>B. 
King</to></from>...</programlisting> + <para>From a programmer's point of view the + <classname>java.sql.DriverManager</classname> is a bootstrapping + object: Other objects like Statement instances are created from this + central and unique object.</para> - <!-- goik:later - <para>An animation showing the usage of the Oxygen plug in for the - examples given above can be found <uri - xlink:href="src/viewlet/wellformed/wellformed_viewlet_swf.html">here</uri>.</para> ---> + <para>The first instance being created by the + <classname>java.sql.DriverManager</classname> is an object of type + <classname>java.sql.Connection</classname>. In <xref + linkend="exerciseJdbcWhyInterface"/> we discuss the way vendor + specific implementation details are hidden by Interfaces. We can + distinguish between:</para> - <para>XML elements may have so called attributes like <tag - class="attribute">date</tag> in the following example:</para> + <orderedlist> + <listitem> + <para>Vendor neutral parts of a <trademark + xlink:href="http://www.oracle.com/technetwork/java/javase/jdbc">JDBC</trademark> + environment. These are those components being shipped by Oracle + or other organizations providing <trademark + xlink:href="http://www.oracle.com/us/technologies/java">Java</trademark> + runtimes. The class + <classname>java.sql.DriverManager</classname> belongs to this + domain.</para> + </listitem> - <figure xml:id="memoWellAttrib"> - <title>An XML document with attributes.</title> + <listitem> + <para>Vendor specific parts. In <xref + linkend="jdbcArchitecture"/> this starts with the + <classname>java.sql.Connection</classname> object.</para> + </listitem> + </orderedlist> - <programlisting><?xml version="1.0" encoding="UTF-8"?> -<memo date="10.02.2006" priority="high"> - <from>M. Goik</from> - <to>B. King</to> - <to>A. 
June</to> - <subject>Best whishes</subject> - <content>Hi all, congratulations to your splendid party</content> -</memo></programlisting> - </figure> + <para>The <classname>java.sql.Connection</classname> object thus + marks the boundary between a <trademark + xlink:href="http://www.oracle.com/technetwork/java/javase">JDK</trademark> + / <trademark + xlink:href="http://www.oracle.com/technetwork/java/javase">JRE</trademark> + and a <trademark + xlink:href="http://www.oracle.com/technetwork/java/javase/jdbc">JDBC</trademark> + Driver implementation from e.g. Oracle or other institutions.</para> - <para>The conformance of a XML document with the following rules may - be verified by invoking a parser:</para> + <para><xref linkend="jdbcArchitecture"/> does not show details about + the relations between <classname>java.sql.Connection</classname>, + <classname>java.sql.Statement</classname> and + <classname>java.sql.ResultSet</classname> objects. We start by + giving a rough description of the tasks and responsibilities these + three types have:</para> - <itemizedlist> - <listitem> - <para>Within the <emphasis>scope</emphasis> of a given element an - attribute name must be unique. In the example above one may not - define a second attribute <varname>date="..."</varname> within the - same element <memo ... >. This reflects the usual - programming language semantics of attributes: In a <trademark - xlink:href="http://www.oracle.com/us/legal/third-party-trademarks/index.html">Java</trademark> - class an attribute is represented by an unique identifier and thus - cannot appear twice.</para> - </listitem> + <glosslist> + <glossentry> + <glossterm><classname>java.sql.Connection</classname></glossterm> - <listitem> - <para>An attribute value must be enclosed either in single (') or - double (") quotes. This is different from the HTML standard which - allows attribute values without quotes provided the given - attribute value does not give rise to ambiguities. 
For example - <TD align=left> is allowed since the attribute value <tag - class="attvalue">left</tag> does not contain any spaces thus - allowing a parser to recognize the end of the value's - definition.</para> - </listitem> - </itemizedlist> + <glossdef> + <para>Holding a permanent connection to a database server. + Both client and server can contact each other. The database + server may for example terminate a transaction if problems + like deadlocks occur.</para> + </glossdef> + </glossentry> - <qandaset role="exercise"> - <title>A graphical representation of a memo.</title> + <glossentry> + <glossterm><classname>java.sql.Statement</classname></glossterm> - <qandadiv> - <qandaentry xml:id="example_memoAttribTree"> - <question> - <para>Draw a graphical representation similar as in <xref - linkend="mathmltree"/> of the memo document being given in - <xref linkend="memoWellAttrib"/>.</para> - </question> + <glossdef> + <para>We have two distinct classes of actions:</para> - <answer> - <para>The <link linkend="memoWellAttrib">memo - document's</link> structure may be visualized as:</para> + <orderedlist> + <listitem> + <para>Instructions to modify data on the database server. + These include <code>INSERT</code>, <code>UPDATE</code> and + <code>DELETE</code> operations as far as + <abbrev>SQL-DML</abbrev> is concerned. <trademark + xlink:href="http://www.oracle.com/technetwork/java/javase/jdbc">JDBC</trademark> + acts as a means of transport and merely returns integer + values back to the client like the number of rows being + affected by an UPDATE.</para> + </listitem> - <informalfigure xml:id="memotreeFigure"> - <para>A graphical representation of <xref - linkend="memoWellAttrib"/>:</para> + <listitem> + <para>Instructions reading data from the server. This is + done by sending SELECT statements. 
It is not sufficient to + just return integer values: Instead <trademark + xlink:href="http://www.oracle.com/technetwork/java/javase/jdbc">JDBC</trademark> + needs to copy complete datasets back to the client to fill + containers being accessible by applications. This is being + discussed in <xref linkend="jdbcRead"/>.</para> + </listitem> + </orderedlist> + </glossdef> + </glossentry> + </glosslist> - <informalfigure xml:id="memotreeFigureFalse"> - <mediaobject> - <imageobject> - <imagedata fileref="Ref/Fig/memotree.fig"/> - </imageobject> - </mediaobject> - </informalfigure> + <para>We shed some light on the relationship between these important + <trademark + xlink:href="http://www.oracle.com/technetwork/java/javase/jdbc">JDBC</trademark> + components and their respective creation:<figure + xml:id="jdbcObjectCreation"> + <title>Important <trademark + xlink:href="http://www.oracle.com/technetwork/java/javase/jdbc">JDBC</trademark> + instances and relationships.</title> - <para>The sequence of <emphasis>element</emphasis> child - nodes is important in XML and has to be preserved. Only the - order of the two attributes <tag - class="attribute">date</tag> and <tag - class="attribute">priority</tag> is undefined: They actually - belong to the <tag class="starttag">memo</tag> node serving - as a dictionary with the attribute names being the keys and - the attribute values being the values of the - dictionary.</para> - </informalfigure> - </answer> - </qandaentry> + <mediaobject> + <imageobject> + <imagedata fileref="Ref/Fig/jdbcObjectRelation.fig"/> + </imageobject> + </mediaobject> + </figure></para> + </section> - <qandaentry xml:id="example_attribInQuotes"> - <question> - <label>Attributes and quotes</label> + <section xml:id="writeAccessCoding"> + <title>Write access, coding!</title> - <para>As stated before XML attributes have to be enclosed in - single or double quotes. Construct an XML document with mixed - quotes like <code><date day="monday'></code>. 
How does - the parser react? Find the corresponding syntax definition of - legal attribute values in the <link - xlink:href="http://www.w3.org/TR/xml">XML standard W3C - Recommendation</link>.</para> - </question> + <para>So how does it actually work with respect to coding? You may + want to read <xref linkend="toolingConfigJdbc"/> before starting + your exercises. We first prepare a database table using Eclipse's + database tools:</para> - <answer> - <para>The parser flags a mixture of single and double quotes - for a given attribute as an error. The XML standard <link - xlink:href="http://www.w3.org/TR/xml#NT-AttValue">defines</link> - the syntax of attribute values: An attribute value has to be - enclosed <emphasis>either</emphasis> in two single - <emphasis>or</emphasis> in two double quotes as being defined - in <uri - xlink:href="http://www.w3.org/TR/xml/#NT-AttValue">http://www.w3.org/TR/xml/#NT-AttValue</uri>.</para> - </answer> - </qandaentry> + <figure xml:id="figSchemaPerson"> + <title>A relation <code>Person</code> containing names and email + addresses</title> - <qandaentry xml:id="quoteInAttributValue"> - <question> - <label>Quotes as part of an attributes value?</label> + <programlisting><emphasis role="strong">CREATE</emphasis> <emphasis + role="strong">TABLE</emphasis> Person ( + name CHAR(20) + ,email CHAR(20) <emphasis>UNIQUE</emphasis>)</programlisting> + </figure> - <para>Single and double quote are used to delimit an attribute - value. May quotes appear themselves as part of an at tribute's - value, e.g. like in a person's name <code>Gary "King" - Mandelson</code>?</para> - </question> + <para>Our actual (toy) <trademark + xlink:href="http://electronics.zibb.com/trademark/jdbc/29545026">JDBC</trademark> + application will insert a single object ('Jim', 'jim@foo.org') into + the <code>Person</code> relation. 
This is simpler than reading data + since no client <classname>java.sql.ResultSet</classname> container + is needed:</para> - <answer> - <para>Attribute values may contain double quotes if the - attributes value is enclosed in single quotes and vice versa. - As a limitation the value of an an attribute may not contain - single quotes and double quotes at the same time:</para> + <figure xml:id="figJdbcSimpleWrite"> + <title>A simple <trademark + xlink:href="http://electronics.zibb.com/trademark/jdbc/29545026">JDBC</trademark> + application inserting data into a relational table.</title> - <informalfigure xml:id="exampleSingleDoubleQuotes"> - <para>Quotes as part of attribute values.</para> + <programlisting language="java">01 package sda.jdbc.intro.v1; +02 +03 import java.sql.Connection; +04 import java.sql.DriverManager; +05 import java.sql.SQLException; +06 import java.sql.Statement; +07 +08 public class SimpleInsert { +09 +10 public static void main(String[] args) throws SQLException { +11 // Step 1: Open a connection to the database server +12 final Connection conn = DriverManager.getConnection( +13 "jdbc:mysql://localhost:3306/hdm", "hdmuser", "XYZ"); +14 // Step 2: Create a Statement instance +15 final Statement stmt = conn.createStatement(); +16 // Step 3: Execute the desired INSERT +17 final int updateCount = stmt.executeUpdate( +18 "INSERT INTO Person VALUES('Jim', 'jim@foo.org')"); +19 // Step 4: Give feedback to the enduser +20 System.out.println("Successfully inserted " + updateCount + " dataset(s)"); +21 } +22 }</programlisting> + </figure> - <programlisting><?xml version="1.0" encoding="UTF-8"?> -<test> - <person name='Gary "King" Mandelson'/> <!-- o.k. --> - <person name="Gary 'King' Mandelson"/> <!-- o.k. --> - <person name="Gary 'King 'S.' "Mandelson"'/> <!-- oops! --> -</test></programlisting> - </informalfigure> - </answer> - </qandaentry> - </qandadiv> - </qandaset> + <para>Looks simple? 
Unfortunately it does not (yet) work:</para> - <para>Some constraints being imposed on XML documents by the standard - defined so far may be summarized as:</para> + <programlisting>Exception in thread "main" java.sql.SQLException: <emphasis + role="bold">No suitable driver found for jdbc:mysql://localhost:3306/hdm</emphasis> + at java.sql.DriverManager.getConnection(DriverManager.java:604) + at java.sql.DriverManager.getConnection(DriverManager.java:221) + at sda.jdbc.intro.SimpleInsert.main(SimpleInsert.java:12)</programlisting> - <itemizedlist> - <listitem> - <para>A XML documents requires to have exactly one top level - element.</para> - </listitem> + <para>What's wrong here? In <xref linkend="figureConfigJdbcDriver"/> + we needed a <productname + xlink:href="http://www.mysql.com">Mysql</productname> <trademark + xlink:href="http://electronics.zibb.com/trademark/jdbc/29545026">JDBC</trademark> + Driver implementation <filename>mysql-connector-java.jar</filename> + as a prerequisite for opening connections to a database server. This + implementation is mandatory for our toy application as well. All we + have to do is add <filename>mysql-connector-java.jar</filename> + to our <trademark + xlink:href="http://www.oracle.com/us/technologies/java">Java</trademark> + <varname>CLASSPATH</varname> at <emphasis + role="bold">runtime</emphasis>.</para> - <listitem> - <para>Elements have to be properly nested. An element must not be - closed if an <quote>inner</quote> Element is still open.</para> - </listitem> + <para>Depending on our <trademark + xlink:href="http://www.oracle.com/us/legal/third-party-trademarks/index.html">Java</trademark> + environment this is achieved by different means. 
Eclipse + requires the definition of a run configuration as being described in + <uri + xlink:href="http://help.eclipse.org/juno/index.jsp?topic=/org.eclipse.jdt.doc.user/tasks/tasks-java-local-configuration.htm">http://help.eclipse.org/juno/index.jsp?topic=/org.eclipse.jdt.doc.user/tasks/tasks-java-local-configuration.htm</uri>. + When configuring a run-time configuration for + <classname>sda.jdbc.intro.SimpleInsert</classname> we have to add + <filename>mysql-connector-java.jar</filename> to the + <varname>Classpath</varname> tab. The following screen shot shows a + working configuration:</para> - <listitem> - <para>Attribute names within a given Element must be - unique.</para> - </listitem> + <figure xml:id="figureConfigRunExtJar"> + <title>Creating an Eclipse run time configuration containing a + <productname xlink:href="http://www.mysql.com">Mysql</productname> + <trademark + xlink:href="http://electronics.zibb.com/trademark/jdbc/29545026">JDBC</trademark> + Driver Jar marked red.</title> + + <screenshot> + <mediaobject> + <imageobject> + <imagedata fileref="Ref/Screen/runConfigJarAnnot.screen.png"/> + </imageobject> + </mediaobject> + </screenshot> + </figure> + + <para>This time execution works as expected:</para> + + <programlisting>Successfully inserted 1 dataset(s)</programlisting> -<<<<<<< HEAD <qandaset role="exercise"> <title>Exception on inserting objects</title> @@ -1108,1208 +1138,1446 @@ terminated by the matching end-tag "</to>". parsing error</computeroutput> <para>A second invocation of <classname>sda.jdbc.intro.v1.SimpleInsert</classname> yields the following runtime error:</para> -======= - <listitem> - <para>Attribute values <emphasis>must</emphasis> be quoted - correctly.</para> - </listitem> - </itemizedlist> ->>>>>>> d6323b6ef90b1907e7f809b7765f28762f5ea52d - <para>The very last rule shows one of several differences to the HTML - Standard: In HTML a lot of elements don't have to be closed. 
For - example paragraphs (<tag class="starttag">p</tag>) or images (<tag - class="starttag">img src='foo.gif'</tag>) don't have to be closed - explicitly. This is due to the fact that HTML used to be defined in - accordance with the older <emphasis><emphasis - role="bold">S</emphasis>tandard <emphasis - role="bold">G</emphasis>eneralized <emphasis - role="bold">M</emphasis>arkup <emphasis - role="bold">L</emphasis>anguage</emphasis> (SGML) Standard.</para> - - <para>These constraints are part of the definition of a <link - xlink:href="http://www.w3.org/TR/xml#sec-well-formed">well formed - document</link>. The specification imposes additional constraints for - a document to be well-formed. Some of these constraints require an - understanding of so called entities being described in <xref - linkend="chapter_entities"/>.</para> - </section> - </chapter> - - <chapter xml:id="dtd"> - <title>Beyond well- formedness</title> - - <section xml:id="motivationDdt"> - <title>Motivation</title> - - <para>So far we are able to create XML documents containing - hierarchically structured data. We may nest elements and thus create - tree structures of arbitrary depth. The only restrictions being - imposed by the XML standard are the constraints of well - formedness. - For many purposes in software development this is not - sufficient.</para> - - <para>A company named <productname>Softmail</productname> might - implement an email system which uses <link - linkend="memoWellAttrib">memo</link> document files as low level data - representation serving as a persistence layer. Now a second company - named <productname>Hardmail</productname> wants to integrate mails - generated by <productname>Softmail</productname>'s system into its own - business product. 
The <productname>Hardmail</productname> software - developers might <emphasis>infer</emphasis> the logical structure of - <productname>Softmail</productname>'s email representation but the - following problems arise:</para> - - <itemizedlist> - <listitem> - <para>The logical structure will in practice become more complex: - E-mails may contain attachments leading to multi part messages. - Additional header information is required for standard Internet - mail compliance. This adds additional complexity to the XML - structure being mandatory for data representation. Relying only on - well-formedness the specification of an internal E-mail format can - only be achieved <emphasis>informally</emphasis>. Thus a rule like - <quote>Each E-mail must have a subject</quote> may be written down - in the specification. A software developer will code these rules - but probably make mistakes as the set of rules grows.</para> - - <para>In contrast a RDBMS based solution offers to solve such - problems in a declarative manner: A developer may use a <code>NOT - NULL</code> constraint on a subject attribute of type - <code>VARCHAR</code> thus inhibiting empty subjects.</para> - </listitem> - - <listitem> - <para>As <productname>Softmail</productname>'s product evolves its - internal E-mail XML format is subject to change due to functional - extensions and possibly bug fixes both giving rise to - interoperability problems.</para> - </listitem> - </itemizedlist> - - <para>Generally speaking well formed XML documents lack grammar - constraints as being available for programming languages. In case of - RDBMS developers can impose primary-, foreign and <code>CHECK</code> - constraints in a <emphasis>declarative</emphasis> manner rather than - hard coding them into their applications (A solution bad programmers - are in favour of though...). 
Various XML standards exist for - declarative constraint definitions namely:</para> - - <itemizedlist> - <listitem> - <para>Document Type Definitions being discussed in <xref - linkend="dtdBasic"/>.</para> - </listitem> + <programlisting>Exception in thread "main" com.mysql.jdbc.exceptions.jdbc4.MySQLIntegrityConstraintViolationException: + <emphasis role="bold">Duplicate entry 'jim@foo.org' for key 'email'</emphasis> +... + at com.mysql.jdbc.StatementImpl.executeUpdate(StatementImpl.java:1617) + at sda.jdbc.intro.SimpleInsert.main(SimpleInsert.java:17)</programlisting> + </question> - <listitem> - <para><link xlink:href="http://www.w3.org/XML/Schema">XML - Schema</link></para> - </listitem> + <answer> + <para>This expected error is easy to understand: The + exception's message text <emphasis role="bold">Duplicate + entry 'jim@foo.org' for key 'email'</emphasis> informs us about a + UNIQUE key constraint violation with respect to the + attribute <code>email</code> in our schema definition in + <xref linkend="figSchemaPerson"/>. We cannot add a second + entry with the same value <code>'jim@foo.org'</code>.</para> + </answer> + </qandaentry> + </qandadiv> + </qandaset> - <listitem> - <para><link - xlink:href="http://www.relaxng.org">RelaxNG</link></para> - </listitem> - </itemizedlist> - </section> + <para>It is worth mentioning that the <productname + xlink:href="http://www.mysql.com">Mysql</productname> driver + implementation does not have to be available at compile time. JDBC + relies on interfaces rather than concrete classes. The driver's + concrete classes are needed only at runtime.</para> - <section xml:id="dtdBasic"> - <title>Document type definitions (DTD)</title> + <para>On the other hand when working with Eclipse we need a separate + runtime configuration for each runnable <trademark + xlink:href="http://www.oracle.com/us/legal/third-party-trademarks/index.html">Java</trademark> + application. This becomes tedious after some time. 
So you may want + to follow the author and just add + <filename>mysql-connector-java.jar</filename> to your compile time + <envar>CLASSPATH</envar>.</para> - <section xml:id="dtdFirstExample"> - <title>Structural descriptions for documents</title> + <para>We now discuss some important methods being defined in the + relevant <trademark + xlink:href="http://electronics.zibb.com/trademark/jdbc/29545026">JDBC</trademark> + interfaces:</para> - <para>As an example we choose documents of type - <emphasis>memo</emphasis> as a starting point. Documents like the - example from <xref linkend="memoWellAttrib"/> may be - <emphasis>informally</emphasis> described to be a sequence of the - following mandatory items:</para> + <glosslist> + <glossentry> + <glossterm><classname>java.sql.Connection</classname></glossterm> - <figure xml:id="figure_memo_informalconstraints"> - <title>Informal constraints on <tag class="element">memo</tag> - document instances</title> + <glossdef> + <itemizedlist> + <listitem> + <para><link + xlink:href="http://docs.oracle.com/javase/7/docs/api/java/sql/Connection.html#createStatement()">createStatement()</link></para> + </listitem> - <itemizedlist> - <listitem> - <para><emphasis>Exactly one</emphasis> sender.</para> - </listitem> + <listitem> + <para><link + xlink:href="http://docs.oracle.com/javase/7/docs/api/java/sql/Connection.html#setAutoCommit(boolean)">setAutoCommit()</link>, + <link + xlink:href="http://docs.oracle.com/javase/7/docs/api/java/sql/Connection.html#getAutoCommit()">getAutoCommit()</link></para> + </listitem> - <listitem> - <para><emphasis>One or more</emphasis> recipients.</para> - </listitem> + <listitem> + <para><link + xlink:href="http://docs.oracle.com/javase/7/docs/api/java/sql/Connection.html#getWarnings()">getWarnings()</link></para> + </listitem> - <listitem> - <para>Subject</para> - </listitem> + <listitem> + <para><link + 
xlink:href="http://docs.oracle.com/javase/7/docs/api/java/sql/Connection.html#isClosed()">isClosed()</link>, + <link + xlink:href="http://docs.oracle.com/javase/7/docs/api/java/sql/Connection.html#isValid(int)">isValid(int + timeout)</link></para> + </listitem> - <listitem> - <para>Content</para> - </listitem> - </itemizedlist> + <listitem> + <para><link + xlink:href="http://docs.oracle.com/javase/7/docs/api/java/sql/Connection.html#rollback()">rollback()</link>, + <link + xlink:href="http://docs.oracle.com/javase/7/docs/api/java/sql/Connection.html#commit()">commit()</link></para> + </listitem> - <para>In addition we have:</para> + <listitem> + <para><link + xlink:href="http://docs.oracle.com/javase/7/docs/api/java/sql/Connection.html#close()">close()</link></para> + </listitem> + </itemizedlist> + </glossdef> + </glossentry> - <itemizedlist> - <listitem> - <para>A date string <emphasis>must</emphasis> be - supplied</para> - </listitem> + <glossentry> + <glossterm><classname>java.sql.Statement</classname></glossterm> - <listitem> - <para>A priority <emphasis>may</emphasis> be supplied with - allowed values to be chosen from the set of values <tag - class="attvalue">low</tag>, <tag class="attvalue">medium</tag> - or <tag class="attvalue">high</tag>.</para> - </listitem> - </itemizedlist> - </figure> + <glossdef> + <itemizedlist> + <listitem> + <para><link + xlink:href="http://docs.oracle.com/javase/7/docs/api/java/sql/Statement.html#executeUpdate(java.lang.String)">executeUpdate(String + sql)</link></para> + </listitem> - <para>All these fields contain ordinary text to be filled in by a - user and shall appear exactly in the defined order. For simplicity - we do not care about email address syntax rules being described in - <link xlink:href="http://www.w3.org/Protocols/rfc822">RFC based - address schemes</link>. 
We will see how the - <emphasis>constraints</emphasis> mentioned above can be modelled in - XML by an extension to the concept of well formed documents.</para> - </section> + <listitem> + <para><link + xlink:href="http://docs.oracle.com/javase/7/docs/api/java/sql/Statement.html#getConnection()">getConnection()</link></para> + </listitem> - <section xml:id="section_memo_machinereadable"> - <title>A machine readable description</title> + <listitem> + <para><link + xlink:href="http://docs.oracle.com/javase/7/docs/api/java/sql/Statement.html#getResultSet()">getResultSet()</link></para> + </listitem> - <para>We now introduce an example of a <link - xlink:href="http://www.w3.org/TR/xml#dt-doctype">Document Type - Definition (DTD)</link> being part of the XML 1.0 standard. Such a - DTD allows the specification of additional constraints to both - element nodes and their attributes. Our set of <link - linkend="figure_memo_informalconstraints" revision="">informal - constraints</link> on memo documents may now be expressed as:</para> + <listitem> + <para><link + xlink:href="http://docs.oracle.com/javase/7/docs/api/java/sql/Statement.html#close()">close()</link> + and <link + xlink:href="http://docs.oracle.com/javase/7/docs/api/java/sql/Statement.html#isClosed()">isClosed()</link></para> + </listitem> + </itemizedlist> + </glossdef> + </glossentry> + </glosslist> -<<<<<<< HEAD <qandaset role="exercise"> - <title>Closing <trademark + <title><trademark xlink:href="http://electronics.zibb.com/trademark/jdbc/29545026">JDBC</trademark> - connections</title> + and transactions</title> <qandadiv> <qandaentry> <question> - <para>Why is it very important to call the close() method - for <classname>java.sql.Connection</classname> and / or - <classname>java.sql.Statement</classname> instances?</para> + <para><link + xlink:href="http://docs.oracle.com/javase/7/docs/api/java/sql/Connection.html#setAutoCommit(boolean)">How + does the method setAutoCommit()</link> relate to <link + 
xlink:href="http://docs.oracle.com/javase/7/docs/api/java/sql/Connection.html#commit()">commit()</link> + and <link + xlink:href="http://docs.oracle.com/javase/7/docs/api/java/sql/Connection.html#rollback()">rollback()</link>?</para> </question> -======= - <figure xml:id="figure_memo_dtd"> - <title>A DTD to describe memo documents.</title> ->>>>>>> d6323b6ef90b1907e7f809b7765f28762f5ea52d - <programlisting><!ELEMENT memo (from, to+, subject, content)> <co - xml:id="memodtd_memodef"/> - -<!ATTLIST memo <co xml:id="memodtd_memo_attribs"/> - date CDATA #REQUIRED - priority (low|medium|high) #IMPLIED> - -<!ELEMENT from (#PCDATA)> <co xml:id="memodtd_elem_from"/> -<!ELEMENT to (#PCDATA)> -<!ELEMENT subject (#PCDATA)> -<!ELEMENT content (#PCDATA)></programlisting> + <answer> + <para>A connection's default state is <code>autocommit == + true</code>. This means that each individual SQL statement is + executed as a separate transaction.</para> - <calloutlist> - <callout arearefs="memodtd_memodef"> - <para>A <tag class="element">memo</tag> consists of a sender, - at least one recipient, a subject and content.</para> - </callout> + <para>If we want to group two or more statements into a + transaction we have to:</para> - <callout arearefs="memodtd_memo_attribs"> - <para>A <tag class="element">memo</tag> has got one required - attribute <varname>date</varname> and an optional attribute - <varname>priority</varname> being restricted to the three - allowed values <tag class="attvalue">low</tag>, <tag - class="attvalue">medium</tag> and <tag - class="attvalue">high</tag>.</para> - </callout> + <orderedlist> + <listitem> + <para>Call + <code>connection.setAutoCommit(false)</code></para> + </listitem> - <callout arearefs="memodtd_elem_from"> - <para>A <tag class="starttag">from</tag> element consists of - ordinary text. This disallows XML markup. 
For example - <code><from>Smith & partner</from></code> is - disallowed since XML uses the ampersand (&) to denote the - beginning of an entity like <tag class="genentity">auml</tag> - for the German a-umlaut (ä). The correct form is - <code><from>Smith &amp; partner</from></code> - using the predefined entity <tag class="genentity">amp</tag> - as an escape sequence for the ampersand.</para> + <listitem> + <para>From now on subsequent SQL statements will + implicitly become part of a transaction until one of + the following three events occurs:</para> - <para>The term <code>#PCDATA</code> is an acronym for - <emphasis>P</emphasis><foreignphrase>arsed</foreignphrase> - <emphasis>C</emphasis><foreignphrase>haracter</foreignphrase> - <emphasis>Data</emphasis>, an abbreviation for a restricted - version of ordinary strings. Without digging into details a - <code>#PCDATA</code> string must not contain any markup code - like e.g. <tag class="starttag">msqrt</tag>. This ensures that - a string does not interfere with the document's XML markup. - Parsed Character Data also means that from the viewpoint of - XML the element's content is <emphasis>atomic</emphasis> so it - can't be divided into substructures by an XML parser.</para> - </callout> - </calloutlist> - </figure> + <orderedlist numeration="loweralpha"> + <listitem> + <para><code>connection.commit()</code></para> + </listitem> - <para>We notice the non-XML syntax of a DTD. It looks similar to an - XML document (<!ELEMENT ...>) but in fact it is not even - well-formed due to e.g. the exclamation mark in front of the - <code>ELEMENT</code> keyword. <abbrev - xlink:href="http://en.wikipedia.org/wiki/Document_Type_Declaration">DTD</abbrev>'s - use a different syntax which has been specified in order to describe - an XML document's grammar.</para> + <listitem> + <para><code>connection.rollback()</code></para> + </listitem> - <para>From the viewpoint of software modeling a DTD is a - <emphasis>schema</emphasis>. 
In the context of XML technologies the - term <emphasis>schema</emphasis> refers to <link - xlink:href="http://www.w3.org/XML/Schema">XML Schema</link> being an - alternative language to describe the structure of XML - documents.</para> + <listitem> + <para>The transaction gets aborted by the database + server. This may for example happen in case of a + deadlock conflict with a second transaction.</para> + </listitem> + </orderedlist> - <para>Readers being familiar with <abbrev - xlink:href="http://en.wikipedia.org/wiki/Backus-Naur_form">BNF</abbrev> - or <abbrev - xlink:href="http://en.wikipedia.org/wiki/Extended_Backus_Naur_form">EBNF</abbrev> - will be able to understand the grammatical rules being expressed - here.</para> + <para>Note that the first two events are initiated by + our client software. The third possible action is being + carried out by the database server.</para> + </listitem> + </orderedlist> + </answer> + </qandaentry> + </qandadiv> + </qandaset> - <productionset> - <title>A message of type <tag class="starttag">memo</tag></title> + <qandaset role="exercise"> + <title>Closing <trademark + xlink:href="http://electronics.zibb.com/trademark/jdbc/29545026">JDBC</trademark> + connections</title> - <production xml:id="memo.ebnf.memo"> - <lhs>Memo Message</lhs> + <qandadiv> + <qandaentry> + <question> + <para>Why is it very important to call the close() method + for <classname>java.sql.Connection</classname> and / or + <classname>java.sql.Statement</classname> instances?</para> + </question> - <rhs>'<memo>' <nonterminal - def="#memo.ebnf.sender">Sender</nonterminal> [<nonterminal - def="#memo.ebnf.recipient">Recipient</nonterminal>]+ - <nonterminal def="#memo.ebnf.subject">Subject</nonterminal> - <nonterminal def="#memo.ebnf.content">Content</nonterminal> - '</memo>'</rhs> - </production> + <answer> + <para>A <trademark + xlink:href="http://electronics.zibb.com/trademark/jdbc/29545026">JDBC</trademark> + connection ties network resources (socket 
connections). + These may be exhausted if, for example, new connections are opened + within a loop without being closed.</para> - <production xml:id="memo.ebnf.sender"> - <lhs>Sender</lhs> + <para>The situation is comparable to memory leaks when using + programming languages lacking a garbage collector.</para> + </answer> + </qandaentry> + </qandadiv> + </qandaset> - <rhs>'<from>' <nonterminal def="#memo.ebnf.text"> Text - </nonterminal> '</from>'</rhs> - </production> + <qandaset role="exercise"> + <title>Aborted transactions</title> - <production xml:id="memo.ebnf.recipient"> - <lhs>Recipient</lhs> + <qandadiv> + <qandaentry> + <question> + <para>In an earlier exercise we mentioned the possibility + of a transaction abort issued by the database server. Which + responsibility arises for an application programmer? Hint: + How may an implementation become aware of such a + transaction abort?</para> + </question> - <rhs>'<to>' <nonterminal def="#memo.ebnf.text"> Text - </nonterminal> '</to>'</rhs> - </production> + <answer> + <para>If a database server aborts a transaction a + <classname>java.sql.SQLException</classname> will be thrown. 
+ An application must be aware of this possibility and thus + implement a sensible <code>catch(...)</code> clause + accordingly.</para> + </answer> + </qandaentry> + </qandadiv> + </qandaset> - <production xml:id="memo.ebnf.subject"> - <lhs>Subject</lhs> + <qandaset role="exercise"> + <title>Interfaces and classes in <trademark + xlink:href="http://electronics.zibb.com/trademark/jdbc/29545026">JDBC</trademark></title> - <rhs>'<subject>' <nonterminal def="#memo.ebnf.text"> Text - </nonterminal> '</subject>'</rhs> - </production> + <qandadiv> + <qandaentry xml:id="exerciseJdbcWhyInterface"> + <question> + <para>The <trademark + xlink:href="http://electronics.zibb.com/trademark/jdbc/29545026">JDBC</trademark> + standard mostly defines interfaces as + <classname>java.sql.Connection</classname> and + <classname>java.sql.Statement</classname>. Why are these not + being defined as classes? Moreover why is + <classname>java.sql.DriverManager</classname> being defined + as a class rather than an interface?</para> - <production xml:id="memo.ebnf.content"> - <lhs>Content</lhs> + <para>You may want to supply code examples to explain your + argumentation.</para> + </question> - <rhs>'<content>' <nonterminal def="#memo.ebnf.text"> Text - </nonterminal> '</content>'</rhs> - </production> + <answer> + <para>Figure <xref linkend="jdbcArchitecture"/> tells us + about the vendor independent architecture of <trademark + xlink:href="http://electronics.zibb.com/trademark/jdbc/29545026">JDBC</trademark>. 
+ Oracle for example may implement a class + <code>com.oracle.jdbc.OracleConnection</code>:</para> - <production xml:id="memo.ebnf.text"> - <lhs>Text</lhs> + <programlisting annotations="nojavadoc">package com.oracle.jdbc; - <rhs>[a-zA-Z0-9]* <lineannotation>In real documents this is too - restrictive!</lineannotation></rhs> - </production> - </productionset> +import java.sql.Connection; +import java.sql.Statement; +import java.sql.SQLException; - <para>In comparison to our informal description of memo documents a - DTD offers an added value: The grammar is machine readable and may - thus be used by a parser to check whether an XML document obeys the - constraints being imposed. So the parser must be instructed to use a - DTD in addition to the XML document in question. For this purpose an - XML document may define a reference to a DTD:</para> +public class OracleConnection implements Connection { - <figure xml:id="memo_external_dtd"> - <title>A memo document instance holding a reference to a document - external DTD.</title> +... - <programlisting><?xml version="1.0" encoding="UTF-8"?> -<!DOCTYPE memo<co xml:id="memo_external_dtd_top_element"/> SYSTEM<co - xml:id="memo_external_dtd_system_decl"/> "memo.dtd"<co - xml:id="memo_external_dtd_url"/> > -<memo date="10.02.2006" priority="high"> - <from>M. Goik</from> - <to>B. King</to> - <to>A. June</to> - <subject>Best whishes</subject> - <content>Hi all, congratulations to your splendid party</content> -</memo></programlisting> +public Statement createStatement(int resultSetType, + int resultSetConcurrency) + throws SQLException { + // Implementation omitted here due to + // limited personal hacking capabilities + ... +} +... +}</programlisting> - <calloutlist> - <callout arearefs="memo_external_dtd_top_element"> - <para>The element <tag class="element">memo</tag> is chosen to - be the top (root) element of the document's tree. It must be - defined in the file <filename>memo.dtd</filename>. 
This is - really a choice since a DTD defines a <emphasis>set</emphasis> - of elements in <emphasis>arbitrary</emphasis> order. There is - no such rule as <quote>define before use</quote>. So a DTD - does not tell us which element has to appear on top of a - document.</para> + <para>If a programmer only uses the <trademark + xlink:href="http://electronics.zibb.com/trademark/jdbc/29545026">JDBC</trademark> + interfaces rather than a vendor's classes it is much easier + to make the resulting application work with different + databases from other vendors. This way a company's + implementation is not exposed to our own <trademark + xlink:href="http://www.oracle.com/us/legal/third-party-trademarks/index.html">Java</trademark> + code.</para> - <para>Suppose a given DTD offers both <tag - class="starttag">book</tag> and <tag - class="starttag">report</tag> elements. An XML author writing - a complex document will choose <tag - class="starttag">book</tag> as top level element rather than - <tag class="starttag">report</tag> being more appropriate for - a small piece of documentation. Consequently it is an XML - authors <emphasis>choice</emphasis> which of the elements - being defined in a DTD shall appear as - <emphasis>the</emphasis> top level element</para> - </callout> + <para>Regarding the special role of + <classname>java.sql.DriverManager</classname> we notice the + need of a starting point: We have to create an initial + instance of some class. In theory (<emphasis role="bold">BUT + NOT IN PRACTICE!!!</emphasis>) the following (ugly code) + might be possible:</para> - <callout arearefs="memo_external_dtd_system_decl"> - <para>The <code>SYSTEM</code> keyword states that the DTD - rules reside outside the XML document as a separate entity. - Though this situation is the most common the grammar rules may - also be <link linkend="dtd_and_document">defined inside</link> - the XML document itself. 
For professional use this is not - particularly useful but during DTD development it may be an - option.</para> - </callout> + <programlisting>package my.personal.application; - <callout arearefs="memo_external_dtd_url"> - <para>The address of the DTD rule set. In the given example it - is just a filename but it may as well be an <link - xlink:href="http://www.w3.org/Addressing">URL</link> of type - <abbrev - xlink:href="http://en.wikipedia.org/wiki/File_Transfer_Protocol">ftp</abbrev>, - <abbrev xlink:href="http://www.w3.org/Protocols">http</abbrev> - and so on, see <xref linkend="memoDtdOnFtp"/>.</para> - </callout> - </calloutlist> - </figure> +import java.sql.Connection; +import java.sql.Statement; +import java.sql.SQLException; - <para>In presence of a DTD parsing a document is actually a two step - process: First the parser will check the document for well - -formedness. Then the parser will read the referenced DTD (memo.dtd) - and check the document for the additional constraints being defined - there.</para> +public class someClass { - <para>In the current example both the DTD and the XML memo document - reside as text files in a common file system folder. For general use - a DTD is usually kept at a centralized location. The string - following the <code>SYSTEM</code> keyword is actually a - <emphasis>U</emphasis><foreignphrase>niform</foreignphrase> - <emphasis>R</emphasis><foreignphrase>esource</foreignphrase> - <emphasis>L</emphasis><foreignphrase>ocator</foreignphrase> <link - xlink:href="http://www.w3.org/Addressing">(URL)</link>. 
Thus our - <filename>memo.dtd</filename> may also be supplied as a <abbrev - xlink:href="http://www.w3.org/Protocols">http</abbrev> or <abbrev - xlink:href="http://en.wikipedia.org/wiki/File_Transfer_Protocol">ftp</abbrev> - <link xlink:href="http://www.w3.org/Addressing">URL</link>:</para> + public void someMethod(){ - <figure xml:id="memoDtdOnFtp"> - <title>A DTD reference to a FTP server.</title> + Connection conn = <emphasis role="bold">new OracleConnection()</emphasis>; // bad idea! + ... + } + ... +}</programlisting> - <programlisting><?xml version="1.0" encoding="UTF-8"?> -<!DOCTYPE memo SYSTEM "ftp://www.hdm-stuttgart.de/memo.dtd"> -<memo date="10.02.2006" priority="high"> - <from>M. Goik</from> - ... -</memo></programlisting> - </figure> + <para>The problem with this approach is the explicit + constructor call: Whenever we want to use another database + we have two possibilities:</para> - <para>For development purposes we may combine a DTD and a conforming - document into a single unit. This is achieved by in line replacing - the <code>SYSTEM "memo.dtd"</code> clause by the DTD itself:</para> + <itemizedlist> + <listitem> + <para>Rewrite our code.</para> + </listitem> - <figure xml:id="dtd_and_document"> - <title>DTD and document within the same file</title> + <listitem> + <para>Introduce some sort of switch statement to provide + a fixed number of databases beforehand:</para> - <programlisting><?xml version="1.0" encoding="UTF-8"?> -<!DOCTYPE memo [<co xml:id="memo_inline_dtd_start"/> -<!ELEMENT memo (from, to+, subject, content)> + <programlisting>public void someMethod(final String vendor){ -<!ATTLIST memo date CDATA #REQUIRED - priority (low|medium|high) #IMPLIED> + final Connection conn; -<!ELEMENT from (#PCDATA)> -<!ELEMENT to (#PCDATA)> -<!ELEMENT subject (#PCDATA)> -<!ELEMENT content (#PCDATA)> -]<co xml:id="memo_inline_dtd_end"/>> <co xml:id="memo_inline_doc_start"/> -<memo date="10.02.2006" priority="high"> - <from>M. Goik</from> - <to>B. 
King</to> - <to>A. June</to> - <subject>Best whishes</subject> - <content>Hi all, congratulations to your splendid party</content> -</memo></programlisting> + switch(vendor) { + case "ORACLE": + conn = new OracleConnection(); + break; - <calloutlist> - <callout arearefs="memo_inline_dtd_start"> - <para>The DTD definitions start right after the left bracket - <quote>[</quote> thus replacing the <code>SYSTEM - "memo.dtd"</code> declaration.</para> - </callout> + case "DB2": + conn = new Db2Connection(); + break; - <callout arearefs="memo_inline_dtd_end"> - <para>The right bracket <quote>]</quote> terminates the DTD - declarations. After finishing the <code><!DOCTYPE ... - ></code> declaration the document's content starts.</para> - </callout> + default: + conn = null; + break; + } + ... +}</programlisting> - <callout arearefs="memo_inline_doc_start"> - <para>Start of document content.</para> - </callout> - </calloutlist> - </figure> + <para>Adding a new database still requires code + rewriting.</para> + </listitem> + </itemizedlist> + </answer> + </qandaentry> + </qandadiv> + </qandaset> - <para>Some terms are helpful in the context of <abbrev - xlink:href="http://en.wikipedia.org/wiki/Document_Type_Declaration">DTD</abbrev>'s:</para> + <qandaset role="exercise"> + <title>Driver dispatch mechanism</title> - <variablelist> - <varlistentry> - <term>Validating / non-validating:</term> + <qandadiv> + <qandaentry> + <question> + <para>In exercise <xref linkend="exerciseJdbcWhyInterface"/> + we saw a hypothetic way to resolve the interface/class + resolution problem by using a switch clause. How is this + <code>switch</code> clause's logic actually realized in a + <trademark + xlink:href="http://electronics.zibb.com/trademark/jdbc/29545026">JDBC</trademark> + based application? (<quote>behind the scenes</quote>)</para> - <listitem> - <para>A non-validating parser only checks a document for well- - formedness. 
If it also checks XML documents for conformance to - DTD it is a <emphasis>validating</emphasis> parser. Caution: - Even a non-validating parser needs to read a DTD (if being - supplied) since it might have to expand <link - linkend="section_generalentities">general entity</link> - declarations being defined in it.</para> - </listitem> - </varlistentry> + <para>Hint: Read the documentation of + <classname>java.sql.DriverManager</classname>.</para> + </question> - <varlistentry> - <term>Valid / invalid documents:</term> + <answer> + <para>Prior to opening a Connection a <trademark + xlink:href="http://electronics.zibb.com/trademark/jdbc/29545026">JDBC</trademark> + driver registers itself at the + <classname>java.sql.DriverManager</classname> singleton + instance. For this purpose the standard defined the method + <link + xlink:href="http://docs.oracle.com/javase/7/docs/api/java/sql/DriverManager.html#registerDriver(java.sql.Driver)">registerDriver(Driver)</link>. + On success the <classname>java.sql.DriverManager</classname> + adds the driver to an internal dictionary:</para> - <listitem> - <para>An XML document referencing a DTD may either be valid or - invalid depending on its conformance to the DTD in - question.</para> - </listitem> - </varlistentry> + <informaltable border="1"> + <col width="20%"/> - <varlistentry> - <term>Document instance:</term> + <col width="30%"/> - <listitem> - <para>An XML memo document may conform to the <link - linkend="figure_memo_dtd">memo DTD</link>. 
In this case we - call it a <emphasis>document instance</emphasis> of the memo - DTD.</para> + <tr> + <th>protocol</th> - <para>This situation is quite similar as in typed programming - languages: A <trademark - xlink:href="http://www.oracle.com/us/legal/third-party-trademarks/index.html">Java</trademark> - <code>class</code> declaration is a blueprint for the - <trademark - xlink:href="http://www.oracle.com/us/legal/third-party-trademarks/index.html">Java</trademark> - runtime system to construct <trademark - xlink:href="http://www.oracle.com/us/legal/third-party-trademarks/index.html">Java</trademark> - objects in memory. This is done by e.g. a statement<code> - String name = new String();</code>. The identifier - <code>name</code> will hold a reference to an - <emphasis>instance of class String</emphasis>. So in a - <trademark - xlink:href="http://www.oracle.com/us/legal/third-party-trademarks/index.html">Java</trademark> - runtime environment a class declaration plays the same role as - a DTD declaration in XML. See also <xref - linkend="example_memoJavaClass"/>.</para> - </listitem> - </varlistentry> - </variablelist> + <th>driver instance</th> + </tr> - <para>For further discussions it is very useful to clearly - distinguish element definitions in a DTD from their - <emphasis>realizations</emphasis> in a corresponding document - instance: The memo DTD defines an element <tag - class="element">from</tag> to be of content <code>#PCDATA</code>. - According to the DTD in a document instance at least one <tag - class="starttag">from</tag> clause must appear. If we were talking - about HTML document instances we would prefer to talk about a <tag - class="starttag">from</tag> <emphasis>tag</emphasis> rather than a - <tag class="starttag">from</tag> - <emphasis>element</emphasis>.</para> + <tr> + <td>jdbc:mysql</td> - <para>In this document we will use the term <emphasis>element - type</emphasis> to denote an <code><!ELEMENT ...</code> - definition in a DTD. 
Thus we will talk about an element type <tag - class="element">subject</tag> being defined in - <filename>memo.dtd</filename>.</para> - - <para>An element type being defined in a DTD may have document - instances as realizations. For example the document instance shown - in <xref linkend="memo_external_dtd"/> has two - <emphasis>nodes</emphasis> of element type <tag - class="element">to</tag>. Thus we say that the document instance - contains two <emphasis>element nodes</emphasis> of type <tag - class="element">to</tag>. We will frequently abbreviate this by - saying the instance contains to <tag class="starttag">from</tag> - element nodes. And we may even omit the term - <emphasis>nodes</emphasis> and simply talk about two <tag - class="starttag">from</tag> elements. But the careful reader should - always distinguish between a single type <code>foo</code> being - defined in a DTD and the possibly empty set of <tag - class="starttag">foo</tag> nodes appearing in valid document - instances.</para> - - <qandaset role="exercise"> - <title>Validation of memo document instances.</title> - - <qandadiv> - <qandaentry xml:id="example_memoTestValid"> - <question> - <para>Copy the two files <link - xlink:href="Ref/src/Memo.1/message.xml">message.xml</link> - and <link - xlink:href="Ref/src/Memo.1/memo.dtd">memo.dtd</link> into - your eclipse project. Use the Oxygen XML plug in to check if - the document is valid. 
Then subsequently do and undo the - following changes each time checking the document for - validity:</para> - - <itemizedlist> - <listitem> - <para>Omit the <tag class="starttag">from</tag> - element.</para> - </listitem> + <td>mysqlDriver instance</td> + </tr> - <listitem> - <para>Change the order of the two sub elements <tag - class="starttag">subject</tag> and <tag - class="starttag">content</tag>.</para> - </listitem> + <tr> + <td>jdbc:oracle</td> - <listitem> - <para>Erase the <varname>date</varname> attribute and - its value.</para> - </listitem> + <td>oracleDriver instance</td> + </tr> - <listitem> - <para>Erase the <varname>priority</varname> attribute - and its value.</para> - </listitem> - </itemizedlist> + <tr> + <td>...</td> - <para>What do you observe?</para> - </question> + <td>...</td> + </tr> + </informaltable> - <answer> - <para>The <tag class="attribute">priority</tag> attribute is - declared as <code>#IMPLIED</code> so it may be omitted. - Erasing the <tag class="attribute">priority</tag> attribute - thus leaves the document in a valid state. The remaining - three edit actions yield an invalid document - instance.</para> + <para>So whenever the method <link + xlink:href="http://docs.oracle.com/javase/7/docs/api/java/sql/DriverManager.html#getConnection(java.lang.String,%20java.lang.String,%20java.lang.String)">getConnection()</link> + is being called the + <classname>java.sql.DriverManager</classname> will scan the + <trademark + xlink:href="http://electronics.zibb.com/trademark/jdbc/29545026">JDBC</trademark> + URL and isolate the protocol part. If we start with + <code>jdbc:mysql://someserver.com:3306/someDatabase</code> + this is just <code>jdbc:mysql</code>. The value is then + being looked up in the above table of registered drivers to + choose an appropriate instance or null otherwise. 
This way + our hypothetic switch including the default value null is + actually implemented.</para> </answer> </qandaentry> + </qandadiv> + </qandaset> + </section> - <qandaentry xml:id="example_memoJavaClass"> - <question> - <label>A memo implementation sketch in Java</label> + <section xml:id="propertiesFile"> + <title>Connection properties</title> - <para>The aim of this exercise is to clarify the (abstract) - relation between XML <abbrev - xlink:href="http://en.wikipedia.org/wiki/Document_Type_Declaration">DTD</abbrev>'s - and sets of <trademark - xlink:href="http://www.oracle.com/us/technologies/java">Java</trademark> - classes rather then building a running application. We want - to model the <link xlink:href="Ref/src/Memo.1/memo.dtd">memo - DTD</link> as a set of <trademark - xlink:href="http://www.oracle.com/us/legal/third-party-trademarks/index.html">Java</trademark> - classes.</para> - </question> + <para>So far our application depicted in <xref + linkend="figJdbcSimpleWrite"/> suffers both from missing error + handling and hard-coded parameters.</para> - <answer> - <para>The XML attributes <tag class="attribute">date</tag> - and <tag class="attribute">priority</tag> can be mapped as - <trademark - xlink:href="http://www.oracle.com/us/technologies/java">Java</trademark> - attributes. The same applies for the Memo elements <tag - class="element">from</tag>, <tag - class="element">subject</tag> and <tag - class="element">content</tag> which may be implemented as - simple Strings or alternatively as separate Classes wrapping - the String content. The latter method of implementation - should be preferred if the Memo DTD is expected to grow in - complexity. A simple sketch reads:</para> + <para>Professional applications must be configurable. Changing the + password currently requires source code modification and + recompilation. 
<trademark + xlink:href="http://www.oracle.com/us/legal/third-party-trademarks/index.html">Java</trademark> + offers a standard procedure to externalize parameters like + <varname>username</varname>, <varname>password</varname> and + the <trademark + xlink:href="http://electronics.zibb.com/trademark/jdbc/29545026">JDBC</trademark> + connection URL as present in <xref + linkend="figJdbcSimpleWrite"/>: We may move these parameters + to so-called properties files:</para> - <programlisting language="java">import java.util.Date; -import java.util.SortedSet; + <figure xml:id="propertyExternalization"> + <title>Externalize a single string <code>"User name"</code> to a + separate file <filename>message.properties</filename>.</title> -public class Memo { - private Date date; - Priority priority = Priority.standard; - private String from, subject,content; - private SortedSet<String> to; - // Accessors not yet implemented -}</programlisting> + <mediaobject> + <imageobject> + <imagedata fileref="Ref/Fig/externalize.fig"/> + </imageobject> + </mediaobject> + </figure> - <para>The only thing to note here is the implementation of - the <tag class="element">to</tag> element: We want to be - able to address a <emphasis>set</emphasis> of recipients. - Thus we have to disallow duplicates. Note that this is an - <emphasis>informal</emphasis> constraint not being handled - by our DTD: A Memo document instance - <emphasis>may</emphasis> have duplicate content in <tag - class="starttag">to</tag> nodes. This is a weakness of - <abbrev - xlink:href="http://en.wikipedia.org/wiki/Document_Type_Declaration">DTD</abbrev>s: - We are unable to impose uniqueness constraints on the - content of partial sets of document nodes.</para> + <para>The current figure shows the externalization of just a single + property. The file <filename>message.properties</filename> contains + key-value pairs. The key <code>PropHello.uname</code> contains the + value <code>User name</code>.
Multiple strings may be externalized + to the same properties file.</para> - <para>On the other hand our set of recipients has to be - ordered: In a XML document instance the order of <tag - class="starttag">to</tag> nodes is important and has to be - preserved in a <trademark - xlink:href="http://www.oracle.com/us/legal/third-party-trademarks/index.html">Java</trademark> - representation. Thus we choose an - <classname>java.util.SortedSet</classname> parametrized with - String type to fulfill both requirements.</para> + <para>Eclipse does have tool support for externalization. Simply hit + Source --> Externalize Strings from the context menu. This + activates a wizard to define property keys, renaming the generated + helper class' name and finally create the actual + <filename>message.properties</filename> file.</para> - <para>Our DTD defines:</para> + <qandaset role="exercise"> + <title>Moving <trademark + xlink:href="http://electronics.zibb.com/trademark/jdbc/29545026">JDBC</trademark> + <abbrev + xlink:href="http://www.ietf.org/rfc/rfc1738.txt">URL</abbrev> and + credentials to a property file</title> - <programlisting><!ATTLIST memo ... priority (low|medium|high) #IMPLIED></programlisting> + <qandadiv> + <qandaentry> + <question> + <para>Start executing the code given in <xref + linkend="figJdbcSimpleWrite"/>. 
Then extend this example by + externalizing all <trademark + xlink:href="http://electronics.zibb.com/trademark/jdbc/29545026">JDBC</trademark> + related connection parameters to a + <filename>jdbc.properties</filename> file like:</para> - <para>Starting from <trademark - xlink:href="http://www.oracle.com/us/legal/third-party-trademarks/index.html">Java</trademark> - 1.5 we may implement this constraint by a type safe - enumeration in a file - <filename>Priority.java</filename>:</para> + <programlisting>SimpleInsert.jdbcUrl=jdbc:mysql://localhost:3306/hdm +SimpleInsert.password=XYZ +SimpleInsert.username=hdmuser</programlisting> - <programlisting language="java">public enum Priority{low, standard, high};</programlisting> - </answer> - </qandaentry> - </qandadiv> - </qandaset> + <para>As being stated earlier the eclipse wizard assists you + by generating both the properties file and a helper class + reading that file at runtime.</para> + </question> - <para>In the following chapters we will extend the memo document - type (<code><!DOCTYPE memo ... ></code>) to demonstrate - various concepts of <abbrev - xlink:href="http://en.wikipedia.org/wiki/Document_Type_Declaration">DTD</abbrev>'s - and other XML related standards. In parallel a series of exercises - deals with building a DTD usable to edit books. This DTD gets - extended as our knowledge about XML advances. We start with an - initial exercise:</para> + <answer> + <para>The current exercise is mostly related to tooling. 
+ From our <trademark + xlink:href="http://www.oracle.com/us/legal/third-party-trademarks/index.html">Java</trademark> + code the context menu allows us to choose the desired + wizard:</para> - <qandaset role="exercise"> - <title>>A DTD for editing books</title> + <informalfigure> + <mediaobject> + <imageobject> + <imagedata fileref="Ref/Screen/externalize.screen.png"/> + </imageobject> + </mediaobject> + </informalfigure> - <qandadiv> - <qandaentry xml:id="example_bookDtd"> - <question> - <para>Write a DTD describing book document instances with - the following features:</para> + <para>We may now:</para> <itemizedlist> <listitem> - <para>A book shall have a title to describe the book - itself.</para> + <para>Select the strings to be externalized.</para> </listitem> <listitem> - <para>A book shall have at least one but possibly a - sequence of chapters.</para> + <para>Supply key names. In the subsequent screenshot + this task has already been started by manually replacing + the default <code>SimpleInsert.1</code> by + <code>SimpleInsert.jdbc</code>.</para> </listitem> <listitem> - <para>Each chapter shall have a title and at least one - paragraph.</para> + <para>Redefine other parameters like prefix, properties + file name etc.
In the following screenshot only the + first of three keys has been manually renamed to the + sensible value + <varname>SimpleInsert.jdbc</varname>.</para> </listitem> + </itemizedlist> - <listitem> - <para>The titles and paragraphs shall consist of - ordinary text.</para> - </listitem> - </itemizedlist> - </question> + <informalfigure> + <mediaobject> + <imageobject> + <imagedata fileref="Ref/Screen/externalize2.screen.png"/> + </imageobject> + </mediaobject> + </informalfigure> - <answer> - <para>A possible DTD looks like:</para> + <para>The wizard also generates a class + <classname>sda.jdbc.intro.v1.DbProps</classname> to actually + access our properties:</para> - <figure xml:id="figure_book.dtd_v1"> - <title>A first DTD version for book documents</title> + <programlisting language="java">package sda.jdbc.intro.v1; +... +public class DbProps { + private static final String BUNDLE_NAME = "sda.jdbc.intro.v1.database"; - <programlisting><!ELEMENT book (title, chapter+)> -<!ELEMENT chapter (title, para+)> -<!ELEMENT title (#PCDATA)> -<!ELEMENT para (#PCDATA)></programlisting> - </figure> + private static final ResourceBundle RESOURCE_BUNDLE = ResourceBundle + .getBundle(BUNDLE_NAME); - <para>We supply a valid document instance:</para> + private DbProps() { + } - <informalfigure xml:id="bookInitialInstance"> - <programlisting><?xml version="1.0" encoding="UTF-8"?> -<!DOCTYPE book SYSTEM "book.dtd"> -<book> - <title>Introduction to Java</title> - <chapter> - <title>Introduction</title> - <para>Java is a programming language</para> - </chapter> - <chapter> - <title>The virtual machine</title> - <para>We also call it the runtime system.</para> - </chapter> - <chapter> - <title>Annotations</title> - <para>Annotations provide a means to add meta information.</para> - <para>This is especially useful for framework authors.</para> - </chapter> -</book></programlisting> - </informalfigure> + public static String getString(String key) { + try { + return 
RESOURCE_BUNDLE.getString(key); + } catch (MissingResourceException e) { + return '!' + key + '!'; + } + } +}</programlisting> - <para>.</para> + <para>Our <trademark + xlink:href="http://electronics.zibb.com/trademark/jdbc/29545026">JDBC</trademark> + related code now contains three references to external + properties:</para> + + <programlisting language="java">package sda.jdbc.intro.v1; +... +public class SimpleInsert { + + + public static void main(String[] args) throws SQLException { + // Step 1: Open a connection to the database server + final Connection conn = DriverManager.getConnection ( + <emphasis role="bold">DbProps.getString("PersistenceHandler.jdbcUrl"), </emphasis> + <emphasis role="bold">DbProps.getString("PersistenceHandler.username")</emphasis>, + <emphasis role="bold">DbProps.getString("PersistenceHandler.password")</emphasis>); + // Step 2: Create a Statement instance + final Statement stmt = conn.createStatement(); + // Step 3: Execute the desired INSERT + final int updateCount = stmt.executeUpdate( + "INSERT INTO Person VALUES('Jim', 'jim@foo.org')"); + // Step 4: Give feedback to the enduser + System.out.println("Successfully inserted " + updateCount + " dataset(s)"); + } +}</programlisting> + + <para>The current base name + <classname>sda.jdbc.intro.v1.PersistenceHandler</classname> + is related to a later exercise.</para> </answer> </qandaentry> </qandadiv> </qandaset> </section> - <section xml:id="section_dtdDetail"> - <title><abbrev - xlink:href="http://en.wikipedia.org/wiki/Document_Type_Declaration">DTD</abbrev>'s - in detail</title> + <section xml:id="sectSimpleInsertGui"> + <title>A first GUI sketch</title> - <para>We have already seen that elements are building blocks of XML - documents. Now we regard the formal rules that govern the way - <code><!ELEMENT ...></code> declarations may appear in XML. 
- This will lead to the notion of the term <emphasis>Content - Model</emphasis>.</para> + <para>So far all data records being transferred to the database + server are still hard-coded in our application. In practice a user + wants to enter data of persons to be submitted to the + database.</para> - <para>Then we will shed some light on <code><!ATTRIBUTE - ...></code> declarations. We will learn about possible attribute - types and default values.</para> + <para>We now guide you to develop a first version of a simple GUI + for this task. A more <link linkend="figureDataInsert2">elaborate + version</link> will be presented in a follow-up exercise. The + screenshot illustrates the intended application behaviour:</para> - <para>Next we explore the <emphasis>physical</emphasis> structure of - XML documents. We will see that <abbrev - xlink:href="http://en.wikipedia.org/wiki/Document_Type_Declaration">DTD</abbrev>s - and document instances may be physically subdivided into - <emphasis>entities</emphasis> without touching their logical - structure.</para> + <figure xml:id="simpleInsertGui"> + <title>A simple GUI to insert data into a database server.</title> - <para>Since we want to illustrate DTD grammars by <userinput - xlink:href="http://en.wikipedia.org/wiki/Ebnf">EBNF</userinput> - diagrams we first show some helpful non-terminals starting with the - definition of white space. Apparently this is the same as in most - programming languages:</para> + <mediaobject> + <imageobject> + <imagedata fileref="Ref/Screen/simpleInsertGui.screen.png"/> + </imageobject> + </mediaobject> - <productionset> - <title>White Space</title> + <caption> + <para>After clicking <quote>Insert</quote> a message is + presented to the user. This message may also indicate a + failure.</para> + </caption> + </figure> - <production xml:id="w3RecXml_NT-S"> - <lhs>S</lhs> + <para>Implementing Swing GUI applications requires knowledge as + taught in e.g.
<link + xlink:href="http://www.hdm-stuttgart.de/studenten/stundenplan/vorlesungsverzeichnis/vorlesung_detail?vorlid=5212221">113300 + Entwicklung von Web-Anwendungen</link>. If you do not (yet) feel + comfortable writing <productname + xlink:href="http://docs.oracle.com/javase/tutorial/uiswing/index.html">Swing</productname> + applications you may want to read <uri + xlink:href="http://www.javamex.com/tutorials/swing">http://www.javamex.com/tutorials/swing</uri> + and <emphasis role="bold">really</emphasis> understand the examples + presented therein.</para> - <rhs>(#x20 | #x9 | #xD | #xA)+ <lineannotation>space, tabulator, - carriage return and line feed</lineannotation></rhs> - </production> - </productionset> + <qandaset role="exercise"> + <title>GUI for inserting Person data into a database server</title> - <para>The production rule for <code>Name</code> defines legal - identifier names for element names like <tag - class="element">memo</tag>. We learn that such an identifier must - not begin with a digit. So the rule presented here resembles the - grammar constraint on legal identifiers in the <trademark - xlink:href="http://www.oracle.com/us/legal/third-party-trademarks/index.html">Java</trademark> - programming language. The type <code>NMTOKEN</code> will be needed - later when defining element attributes.</para> + <qandadiv> + <qandaentry> + <question> + <para>Write a GUI application as outlined in <xref + linkend="simpleInsertGui"/>. You may proceed as + follows:</para> - <productionset> - <title>Names and Tokens</title> + <orderedlist> + <listitem> + <para>Write a dummy GUI without any database + functionality. Only present the two labels and input + fields and the Insert button.</para> + </listitem> - <production xml:id="w3RecXml_NT-NameChar"> - <lhs>NameChar</lhs> + <listitem> + <para>Add an + <classname>java.awt.event.ActionListener</classname> + which generates a SQL INSERT Statement when clicking the + Insert button.
Return this string to the user as being + shown in the message window of <xref + linkend="simpleInsertGui"/>.</para> - <rhs><nonterminal def="#w3RecXml_NT-Letter">Letter</nonterminal> - | <nonterminal def="#w3RecXml_NT-Digit">Digit</nonterminal> | - '.' | '-' | '_' | ':' | <nonterminal - def="#w3RecXml_NT-CombiningChar" - xlink:href="#w3RecXml_NT-CombiningChar">CombiningChar</nonterminal> - | <nonterminal - def="#w3RecXml_NT-Extender">Extender</nonterminal></rhs> - </production> + <para>At this point you still do not need a database + connection. The message shown to the user is just a + fake, so the GUI <emphasis + role="bold">appears</emphasis> to be working.</para> + </listitem> - <production xml:id="w3RecXml_NT-Name"> - <lhs>Name</lhs> + <listitem> + <para>Establish a + <classname>java.sql.Connection</classname> and create a + <classname>java.sql.Statement</classname> instance when + launching your application. Use the latter in your + <classname>java.awt.event.ActionListener</classname> to + actually insert datasets into your database.</para> + </listitem> + </orderedlist> + </question> - <rhs>(<nonterminal - def="#w3RecXml_NT-Letter">Letter</nonterminal> | '_' | ':') - (<nonterminal - def="#w3RecXml_NT-NameChar">NameChar</nonterminal>)*</rhs> - </production> + <answer> + <para>The complete implementation resides in + <classname>sda.jdbc.intro.v01.InsertPerson</classname>:</para> - <production xml:id="w3RecXml_NT-Names"> - <lhs>Names</lhs> + <programlisting language="java">package sda.jdbc.intro.v01; - <rhs><nonterminal def="#w3RecXml_NT-Name">Name</nonterminal> - (#x20 <nonterminal - def="#w3RecXml_NT-Name">Name</nonterminal>)*</rhs> - </production> +import ... - <production xml:id="w3RecXml_NT-Nmtoken"> - <lhs>Nmtoken</lhs> +public class InsertPerson extends JFrame { - <rhs>(<nonterminal - def="#w3RecXml_NT-NameChar">NameChar</nonterminal>)+</rhs> - </production> + ... 
- <production xml:id="w3RecXml_NT-Nmtokens"> - <lhs>Nmtokens</lhs> + public InsertPerson () throws SQLException{ + super ("Add a person's data"); - <rhs><nonterminal - def="#w3RecXml_NT-Nmtoken">Nmtoken</nonterminal> (#x20 - <nonterminal - def="#w3RecXml_NT-Nmtoken">Nmtoken</nonterminal>)*</rhs> - </production> - </productionset> + setDefaultCloseOperation(JFrame.EXIT_ON_CLOSE); - <section xml:id="section_contentmodel"> - <title>The content model</title> + final JPanel databaseFieldPanel = new JPanel(); + databaseFieldPanel.setLayout(new GridLayout(0,2)); + add(databaseFieldPanel, BorderLayout.CENTER); - <para>We already saw examples of XML elements being composed of - other elements in our <link - linkend="figure_memo_dtd">memo.dtd</link>:</para> + databaseFieldPanel.add(new JLabel("Name:")); + final JTextField nameField = new JTextField(15); + databaseFieldPanel.add(nameField); - <programlisting><!ELEMENT memo (from, to+, subject, content)></programlisting> - - <para>We call the right side the <emphasis>content - model</emphasis> of the <tag class="element">memo</tag> element. 
- The XML 1.0 specification defines <link - xlink:href="http://www.w3.org/TR/xml#dt-eldecl">four</link> - different <link - xlink:href="http://www.w3.org/TR/2006/REC-xml-20060816/#elemdecls">element - type definitions</link>:</para> + databaseFieldPanel.add(new JLabel("E-mail:")); + final JTextField emailField = new JTextField(15); + databaseFieldPanel.add(emailField); - <productionset xml:id="productionset_element_decl"> - <title>Element Type Declaration</title> + final JButton insertButton = new JButton("Insert"); + add(insertButton, BorderLayout.SOUTH); - <production xml:id="w3RecXml_NT-elementdecl"> - <lhs>elementdecl</lhs> + final Connection conn = DriverManager.getConnection( + "jdbc:mysql://localhost:3306/hdm", "hdmuser", "XYZ"); + final Statement stmt = conn.createStatement(); - <rhs>'<!ELEMENT' <nonterminal - def="#w3RecXml_NT-S">S</nonterminal> <nonterminal - def="#w3RecXml_NT-Name">Name</nonterminal> <nonterminal - def="#w3RecXml_NT-S">S</nonterminal> <nonterminal - def="#w3RecXml_NT-contentspec">contentspec</nonterminal> - <nonterminal def="#w3RecXml_NT-S">S</nonterminal>? - '>'</rhs> - </production> + insertButton.addActionListener(new ActionListener() { + // Linking the GUI to the database server. We assume an open + // connection and a correctly initialized Statement instance + @Override + public void actionPerformed(ActionEvent event) { + final String sql = "INSERT INTO Person VALUES('" + nameField.getText()+ "', '" + + emailField.getText() + "')"; + // We have to catch this Exception because an ActionListener's signature + // prohibits the existence of a "throws" clause. 
+ try { + final int updateCount = stmt.executeUpdate(sql); + JOptionPane.showMessageDialog(null, "Successfully executed \n'" + sql + "'\nand inserted " + + updateCount + " dataset"); + } catch (SQLException e) { + e.printStackTrace(); + } + } + }); + pack(); + } +}</programlisting> + </answer> + </qandaentry> + </qandadiv> + </qandaset> + </section> - <production xml:id="w3RecXml_NT-contentspec"> - <lhs>contentspec</lhs> + <section xml:id="jdbcExceptions"> + <title>Handling possible exceptions</title> - <rhs>'EMPTY' | 'ANY' | <nonterminal - def="#w3RecXml_NT-Mixed">Mixed</nonterminal> | <nonterminal - def="#w3RecXml_NT-children">children</nonterminal></rhs> - </production> - </productionset> + <para>Our current code lacks any kind of error handling: Exceptions + will not be caught at all and invariably lead to program + termination. This is of course inadequate regarding professional + software. In case of problems we have to:</para> - <glosslist> - <glossentry> - <glossterm><link - linkend="section_empty">EMPTY</link></glossterm> + <itemizedlist> + <listitem> + <para>Gracefully recover or shut down our application. We may + for example show a pop up window <quote>Terminating due to an + internal error</quote>.</para> + </listitem> - <glossdef> - <para>The element doesn't have any content at all. This - makes sense for elements with attributes being allowed as in - <tag class="emptytag"> img src="foo.gif"</tag>.</para> - </glossdef> - </glossentry> + <listitem> + <para>Enable the customer to supply the development team with + helpful information. The user may for example be asked to submit + a log file in case of errors.</para> + </listitem> + </itemizedlist> - <glossentry> - <glossterm><link linkend="section_any">ANY</link></glossterm> + <para>In addition the solution + <classname>sda.jdbc.intro.v01.InsertPerson</classname> contains an + ugly mix of GUI components and database related code. 
We take a + first step to decouple these two distinct concerns:</para> - <glossdef> - <para>The element in question may contain a sequence of - arbitrary elements and ordinary text - (<code>#PCDATA</code>).</para> - </glossdef> - </glossentry> + <qandaset role="exercise" xml:id="exercicseGuiStateful"> + <title>Handling the database layer</title> - <glossentry> - <glossterm><nonterminal - def="#w3RecXml_NT-Mixed">Mixed</nonterminal></glossterm> + <qandadiv> + <qandaentry> + <question> + <para>Implement a class <code>PersistenceHandler</code> to + be later used as a component of our next step GUI + application prototype. This class should have the following + methods:</para> - <glossdef> - <para>The element may contain an arbitrary sequence from a - set of child elements possibly interspersed with ordinary - text.</para> - </glossdef> - </glossentry> + <programlisting language="java">... +/** + * Handle database communication. There are two + * distinct internal states <q>disconnected</q> and <q>connected</q>, see + * {@link #isConnected()}. These two states may be toggled by invoking + * {@link #connect()} and {@link #disconnect()} respectively. + * + * The following snippet illustrates the intended usage: + * <pre> public static void main(String[] args) { + final PersistenceHandler ph = new PersistenceHandler(); + if (ph.connect()) { + if (!ph.add("Jim", "jim@foo.com")) { + System.err.println("Insert Error:" + ph.getErrorMessage()); + } + } else { + System.err.println("Connect error:" + ph.getErrorMessage()); + } + }</pre> + * + * @author goik + */ +public class PersistenceHandler { + ... + /** + * Instance in <q>disconnected</q> state. See {@link #isConnected()} + */ + public PersistenceHandler() {/* only present here to supply Javadoc comment */} - <glossentry> - <glossterm><nonterminal - def="#w3RecXml_NT-children">children</nonterminal></glossterm> + /** + * Inserting a (name, email) record into the database server. 
In case of + * errors corresponding messages may subsequently be retrieved by calling + * {@link #getErrorMessage()}. + * + * <dt><b>Precondition:</b></dt> <dd>must be in + * <q>connected</q> state, see {@link #isConnected()}</dd> + * + * @param name + * A person's name + * @param email + * A person's email address + * + * @return true if the current data record has been successfully inserted + * into the database server. false in case of error(s). + */ + public boolean add(final String name, final String email){ + ... + } - <glossdef> - <para>An element contains <emphasis>only</emphasis> other - elements. A node of the element type in question may appear - as child of itself giving rise to recursion:</para> + /** + * Retrieving error messages in case a call to {@link #add(String, String)}, + * {@link #connect()}, or {@link #disconnect()} yields an error. + * + * @return the error explanation corresponding to the latest failed + * operation, null if no error yet occurred. + */ + public String getErrorMessage() { + return ...; + } - <programlisting>... -<chapter> - <chapter> ...</chapter> -</chapter></programlisting> - </glossdef> - </glossentry> - </glosslist> + /** + * Open a connection to a database server. + * + * <dt><b>Precondition:</b></dt> + * <dd>must be in <q>disconnected</q> state, see {@link #isConnected()}</dd> + * + * <dt><b>Precondition:</b></dt> + * <dd>The following properties must be set: + * <pre>PersistenceHandler.jdbcUrl=jdbc:mysql://localhost:3306/hdm +PersistenceHandler.password=XYZ +PersistenceHandler.username=foo</pre> + * </dd> + * + * @return true if connecting was successful + */ + public boolean connect () { + ... + } - <para>All elements being declared are subject to the following - validity constraint:</para> + /** + * Close a connection to a database server and clean up JDBC related resources + * + * Error messages in case of failure may subsequently be retrieved by + * calling {@link #getErrorMessage()}.
+ * + * <dt><b>Precondition:</b></dt> + * <dd>must be in <q>connected</q> state, see {@link #isConnected()}</dd> + * + * @return true if disconnecting was successful, false in case error(s) occur. + */ + public boolean disconnect() { + ... + } - <constraintdef> - <para>An element type MUST NOT be declared more than - once.</para> - </constraintdef> + /** + * An instance can either be in <q>connected</q> or <q>disconnected</q> state. The + * state can be toggled by invoking {@link #connect()} or + * {@link #disconnect()} respectively. + * + * @return true if connected, false otherwise + */ + public boolean isConnected() { + return ...; + } +}</programlisting> - <para>Programmers will not be surprised: The above constraint is - common to most programming languages. In <trademark - xlink:href="http://www.oracle.com/us/legal/third-party-trademarks/index.html">Java</trademark> - for example a given local variable may not be redefined:</para> + <para>Notice the two internal states + <quote>disconnected</quote> and + <quote>connected</quote>:</para> - <programlisting language="java">int count = 3; -double pi=3.1415926; -int count = 2; // Fatal error: A variable must not be - // redefined within the given scope</programlisting> + <figure xml:id="figPersistenceHandlerStates"> + <title>Possible states and transitions for instances of + <code>PersistenceHandler</code>.</title> - <para>However there is no such rule like <quote>Define before - use</quote>: Element <emphasis>and</emphasis> attribute - definitions may refer to elements being defined - <quote>later</quote>:</para> + <mediaobject> + <imageobject> + <imagedata fileref="Ref/Fig/persistHandlerStates.fig"/> + </imageobject> + </mediaobject> + </figure> - <programlisting><!ATTLIST memo<co - xml:id="programlisting_elemattorder_memoatt"/> date CDATA #REQUIRED - priority (low|medium|high) #IMPLIED> + <para>According to the above documentation a newly created + <code>PersistenceHandler</code> instance should be in + 
disconnected state. As being shown in the <trademark + xlink:href="http://docs.oracle.com/javase/1.5.0/docs/guide/javadoc">Javadoc</trademark> + class description you may test your implementation without + any GUI code. If you are already familiar with unit testing + this might be a good start as well.</para> + </question> -<!ELEMENT memo<co xml:id="programlisting_elemattorder_memodecl"/> (from, to+, subject, content)> + <answer> + <para>We show a possible implementation of + <classname>sda.jdbc.intro.v1.PersistenceHandler</classname>:</para> -<!ELEMENT from (#PCDATA)> -<!ELEMENT to (#PCDATA)> -<!ELEMENT subject (#PCDATA)> -<!ELEMENT content (#PCDATA)></programlisting> + <programlisting language="java">package sda.jdbc.intro.v1; +... - <calloutlist> - <callout arearefs="programlisting_elemattorder_memoatt"> - <para>Two attributes <varname>date</varname> and - <varname>priority</varname> are defined for the element <tag - class="starttag">memo</tag> which itself gets defined - immediately <emphasis>after</emphasis> this definition.</para> - </callout> +public class PersistenceHandler { - <callout arearefs="programlisting_elemattorder_memodecl"> - <para>The <tag class="element">memo</tag> type definition - refers to the element types <tag class="element">from</tag>, - <tag class="element">to</tag>, <tag - class="element">subject</tag> and <tag - class="element">content</tag> all being defined - afterwards.</para> - </callout> - </calloutlist> + Connection conn = null; + Statement stmt = null; - <section xml:id="section_empty"> - <title>The <code>EMPTY</code> declaration</title> + String errorMessage = null; - <para>Element nodes of content type <code>EMPTY</code> are - familiar from e.g. HTML:</para> + /** + * New instances are in <q>disconnected</q> state. See {@link #isConnected()} + */ + public PersistenceHandler() {/* only present here to supply Javadoc comment */} - <programlisting>... -<p>We saw the picture <img src="person.gif"> of the officer. 
-...</programlisting> + /** + * Inserting a (name, email) record into the database server. In case of + * errors corresponding messages may subsequently be retrieved by calling + * {@link #getErrorMessage()}. + * + * <dt><b>Precondition:</b></dt> <dd>must be in + * <q>connected</q> state, see {@link #isConnected()}</dd> + * + * @param name + * A person's name + * @param email + * A person's email address + * + * @return true if the current data record has been successfully inserted + * into the database server. false in case of error(s). + */ + public boolean add(final String name, final String email){ + final String sql = "INSERT INTO Person VALUES('" + name + "', '" + + email + "')"; + try { + stmt.executeUpdate(sql); + return true; + } catch (SQLException e) { + errorMessage = "Unable to execute '" + sql + "': '" + e.getMessage() + "'"; + return false; + } + } - <para>This code fragment shows an image embedded <emphasis>in - line</emphasis> with the current text flow. This is possible in - HTML being an SGML standard but it is <emphasis>not</emphasis> - allowed in XML. Also the omission of <tag - class="starttag">/p</tag> to close the paragraph is disallowed. - In XML either of the two forms has to be chosen:</para> + /** + * Retrieving error messages in case a call to {@link #add(String, String)}, + * {@link #connect()}, or {@link #disconnect()} yields an error. + * + * @return the error explanation corresponding to the latest failed + * operation, null if no error yet occurred. + */ + public String getErrorMessage() { + return errorMessage; + } - <itemizedlist> - <listitem> - <para><code><p>We saw the picture <img - src="person.gif"></img> of the - officer.</p></code></para> - </listitem> + /** + * Open a connection to a database server. 
+ * + * <dt><b>Precondition:</b></dt> + * <dd>must be in <q>disconnected</q> state, see {@link #isConnected()}</dd> + * + * <dt><b>Precondition:</b></dt> + * <dd>The following properties must be set: + * <pre>PersistenceHandler.jdbcUrl=jdbc:mysql://localhost:3306/hdm +PersistenceHandler.password=XYZ +PersistenceHandler.username=foo</pre> + * </dd> + * + * @return true if connecting was successful + */ + public boolean connect () { + try { + conn = DriverManager.getConnection( + DbProps.getString("PersistenceHandler.jdbcUrl"), + DbProps.getString("PersistenceHandler.username"), + DbProps.getString("PersistenceHandler.password")); + try { + stmt = conn.createStatement(); + return true; + } catch (SQLException e) { + errorMessage = "Connection opened but Statement creation failed:\"" + e.getMessage() + "\"."; + try { + conn.close(); + } catch (SQLException ee) { + errorMessage += "Closing connection failed:\"" + ee.getMessage() + "\"."; + } + conn = null; + } - <listitem> - <para><code><p>We saw the picture <img - src="person.gif"/> of the - officer.</p></code></para> - </listitem> - </itemizedlist> + } catch (SQLException e) { + errorMessage = "Unable to open connection:\"" + e.getMessage() + "\"."; + } + return false; + } - <para>Using <tag class="starttag">img .../</tag> as a shorthand - for an empty element is legal in XML but disallowed in SGML and - thus HTML. This is one of the possible obstacles when migrating - from SGML based HTML documents to an XML version of HTML like - <link xlink:href="http://www.w3.org/MarkUp">Xhtml</link>. From - <xref linkend="productionset_element_decl"/> we can infer the - corresponding DTD declaration:</para> + /** + * Close a connection to a database server and clean up JDBC related resources + * + * Error messages in case of failure may subsequently be retrieved by + * calling {@link #getErrorMessage()}.
+ * + * <dt><b>Precondition:</b></dt> + * <dd>must be in <q>connected</q> state, see {@link #isConnected()}</dd> + * + * @return true if disconnecting was successful, false in case error(s) occur. + */ + public boolean disconnect() { + boolean resultStatus = true; + final StringBuffer messageCollector = new StringBuffer(); + try { + stmt.close(); + } catch (SQLException e) { + resultStatus = false; + messageCollector.append("Unable to close Statement:\"" + e.getMessage() + "\"."); + } + stmt = null; + try { + conn.close(); + } catch (SQLException e) { + resultStatus = false; + messageCollector.append("Unable to close connection:\"" + e.getMessage() + "\"."); + } + conn = null; + if (!resultStatus) { + errorMessage = messageCollector.toString(); + } + return resultStatus; + } - <programlisting><!ELEMENT img EMPTY></programlisting> - </section> + /** + * An instance can either be in <q>connected</q> or <q>disconnected</q> state. The + * state can be toggled by invoking {@link #connect()} or + * {@link #disconnect()} respectively. + * + * @return true if connected, false otherwise + */ + public boolean isConnected() { + return null != conn; + } +}</programlisting> + </answer> + </qandaentry> + </qandadiv> + </qandaset> - <section xml:id="section_any"> - <title>The <code>ANY</code> declaration</title> + <para>We may now complete the next enhancement step of our GUI + database client.</para> - <para>The <code>ANY</code> declaration allows every element of a - given DTD to appear as a child of the element being defined - including the element itself. 
It is not possible to exclude - certain elements from an <code>ANY</code> rule:</para> + <qandaset role="exercise"> + <title>Connection on user action</title> - <figure xml:id="figure_any_declaration"> - <title>The <code>ANY</code> declaration</title> + <qandadiv> + <qandaentry xml:id="exerciseGuiWriteTakeTwo"> + <question> + <label>An application writing records to a database + server</label> - <programlisting><?xml version="1.0" encoding="UTF-8"?> -<!DOCTYPE theater [ -<!ELEMENT theater ANY <co xml:id="figure_any_declaration_any"/> > + <para>Our aim is to enhance the first GUI prototype being + described in <xref linkend="simpleInsertGui"/>. The + application shall start being disconnected from the database + server. Prior to entering data the user shall be guided to + open a connection. The following video illustrates the + desired user interface:</para> -<!ELEMENT actor (#PCDATA) <co xml:id="figure_any_declaration_actor"/> > -<!ELEMENT show (#PCDATA) <co xml:id="figure_any_declaration_show"/>> -]> -<theater> - <actor>Peter Sun</actor> - some text <co xml:id="figure_any_declaration_doc_text"/> - <show>Must go on</show> - <theater>Self referencing!</theater> <co - xml:id="figure_any_declaration_actor_self_reference"/> - <!-- An error: --> - <b>Ooops, no such element defined in DTD</b> <co - xml:id="figure_any_declaration_actor_undefined"/> -</theater></programlisting> + <figure xml:id="figureDataInsert2"> + <title>A GUI frontend for adding personal data to a + server.</title> - <calloutlist> - <callout arearefs="figure_any_declaration_any"> - <para>A <tag class="element">theater</tag> element may - consist of a sequence of arbitrary content. 
Every child - element must be defined in the DTD.</para> - </callout> + <mediaobject> + <videoobject> + <videodata fileref="Ref/Video/dataInsert.mp4"/> + </videoobject> + </mediaobject> + </figure> - <callout arearefs="figure_any_declaration_actor figure_any_declaration_show"> - <para>Two elements <tag class="element">actor</tag> and - <tag class="element">show</tag> consisting of mere textual - content.</para> - </callout> + <para>If a user closes the main window while still + connected, the application shall disconnect from the database + server. For this purpose we must handle the event raised when + the user clicks on the closing button within the window + decoration. An exit handler method is required to + close a potentially open database connection.</para> + </question> - <callout arearefs="figure_any_declaration_doc_text"> - <para>Ordinary text may also be part of the <tag - class="starttag">theater</tag> element and may appear - everywhere.</para> - </callout> + <answer> + <para>Our implementation uses the class + <classname>sda.jdbc.intro.v1.PersistenceHandler</classname> + for handling all database communication. The GUI needs to + visualize the two different states + <quote>disconnected</quote> and <quote>connected</quote>. In + <quote>disconnected</quote> state the whole input pane for + entering datasets and clicking the <quote>Insert</quote> + button is locked. So the user is forced to actively open a + database connection.</para> - <callout arearefs="figure_any_declaration_actor_self_reference"> - <para>A <tag class="starttag">theater</tag> element may - appear as a child of itself. This gives rise to recursion - of arbitrary depth.</para> - </callout> + <para>Notice also the + <classname>java.awt.event.WindowAdapter</classname> + implementation being executed when closing the application's + main window.
The <methodname>windowClosing(WindowEvent + e)</methodname> method disconnects any existing database + connection thus freeing resources.</para> - <callout arearefs="figure_any_declaration_actor_undefined"> - <para>There is no element <tag class="starttag">b</tag> - defined in the DTD. Thus the current XML document is - invalid.</para> - </callout> - </calloutlist> - </figure> + <programlisting language="java">package sda.jdbc.intro.v1; - <para>Remark: The restriction to elements being defined in a DTD - is common to other content model types as well. Actually every - element being referenced by a definition in the DTD - <emphasis>must</emphasis> itself be defined in order for the - document to be valid.</para> - </section> +import ... - <section xml:id="section_mixed"> - <title>Mixed content</title> +public class InsertPerson extends JFrame { + + private static final long serialVersionUID = 6815975741605247675L; + + final PersistenceHandler persistenceHandler = new PersistenceHandler(); + + final JTextField nameField = new JTextField(15), + emailField = new JTextField(20); - <para>Mixed content is similar to the ANY declaration. But the - set of elements allowed to appear is restricted. We show an - example:</para> + final JButton toggleConnectButton = new JButton(), + insertButton = new JButton("Insert"); - <figure xml:id="figure_memo_content_mixed"> - <title>Extending the memo content type.</title> + final JPanel databaseFieldPanel = new JPanel(); - <programlisting><?xml version="1.0" encoding="UTF-8"?> -<!DOCTYPE memo [ -... -<!ELEMENT content (#PCDATA|emphasis|url)*> -<!ELEMENT emphasis (#PCDATA)> -<!ELEMENT url (#PCDATA)> -<!ATTLIST url href CDATA #REQUIRED> -]> -... -<content>The <url href="http://w3.org/XML">XML</url> language - is <emphasis>easy</emphasis> to learn. 
However you need - some <emphasis>time</emphasis>.</content> ...</programlisting> + private void setGuiConnectionState(final boolean state) { + if (state) { + toggleConnectButton.setText("Disconnect"); + } else { + toggleConnectButton.setText("Connect"); + } + for (final Component c: databaseFieldPanel.getComponents()){ + c.setEnabled(state); + } + } - <caption> - <para>This grammar allows to emphasize text passages and to - define hypertext links.</para> - </caption> - </figure> + public static void main(String[] args) throws SQLException { + InsertPerson app = new InsertPerson(); + app.setVisible(true); + } + + public InsertPerson (){ + super ("Add a person's data"); + + setSize(500, 500); - <para>The formatting expectation is <quote>... The <link - xlink:href="http://w3.org/XML">XML</link> language is - <emphasis>easy</emphasis> to learn. However you need some - <emphasis>time</emphasis>. ...</quote>. We may visualize this - document instance as a tree:</para> + addWindowListener(new WindowAdapter() { + // In case a user closes our application window while still being connected + // we have to close the database connection. 
+ @Override + public void windowClosing(WindowEvent e) { + super.windowClosing(e); + if (persistenceHandler.isConnected() && !persistenceHandler.disconnect()) { + System.exit(1); + } else { + System.exit(0); + } + } + }); + Box top = Box.createHorizontalBox(); + add(top, BorderLayout.NORTH); + top.add(toggleConnectButton); + + toggleConnectButton.addActionListener(new ActionListener() { + + @Override + public void actionPerformed(ActionEvent e) { + if (persistenceHandler.isConnected()) { + if (persistenceHandler.disconnect()){ + setGuiConnectionState(false); + } else { + JOptionPane.showMessageDialog(null, persistenceHandler.getErrorMessage()); + } + } else { + if (persistenceHandler.connect()){ + setGuiConnectionState(true); + } else { + JOptionPane.showMessageDialog(null, persistenceHandler.getErrorMessage()); + } + } + } + }); + + databaseFieldPanel.setLayout(new GridLayout(0,2)); + add(databaseFieldPanel); - <figure xml:id="extendContModelGraph"> - <title>Graphical representation of the extended - <code>content</code> model.</title> + databaseFieldPanel.add(new JLabel("Name:")); + databaseFieldPanel.add(nameField); + + databaseFieldPanel.add(new JLabel("E-mail:")); + databaseFieldPanel.add(emailField); + + insertButton.addActionListener(new ActionListener() { + @Override + public void actionPerformed(ActionEvent e) { + if (persistenceHandler.add(nameField.getText(), emailField.getText())) { + nameField.setText(""); + emailField.setText(""); + JOptionPane.showMessageDialog(null, "Successfully inserted dataset"); + } else { + JOptionPane.showMessageDialog(null, persistenceHandler.getErrorMessage()); + } + } + }); + databaseFieldPanel.add(Box.createGlue()); + databaseFieldPanel.add(insertButton); + setGuiConnectionState(false); + pack(); + } +}</programlisting> + </answer> + </qandaentry> + </qandadiv> + </qandaset> + </section> - <mediaobject> - <imageobject> - <imagedata fileref="Ref/Fig/contentmixed.fig"/> - </imageobject> - </mediaobject> - </figure> + <section
xml:id="jdbcSecurity"> + <title><trademark + xlink:href="http://electronics.zibb.com/trademark/jdbc/29545026">JDBC</trademark> + and security</title> - <para>More formally the W3C specification defines mixed content - models as:</para> + <section xml:id="jdbcSecurityNetwork"> + <title>Network sniffing</title> - <productionset xml:id="productionset_w3RecXml_NT-Mixed"> - <title>Mixed-content Declaration</title> + <para>Sniffing <trademark + xlink:href="http://electronics.zibb.com/trademark/jdbc/29545026">JDBC</trademark> + network traffic is one possibility for intruders to compromise + database applications. This requires physical access to either + of:</para> - <production xml:id="w3RecXml_NT-Mixed"> - <lhs>Mixed</lhs> + <itemizedlist> + <listitem> + <para>Server host</para> + </listitem> - <rhs>'(' <nonterminal def="#w3RecXml_NT-S">S</nonterminal>? - '#PCDATA' (<nonterminal - def="#w3RecXml_NT-S">S</nonterminal>? '|' <nonterminal - def="#w3RecXml_NT-S">S</nonterminal>? <nonterminal - def="#w3RecXml_NT-Name">Name</nonterminal>)* <nonterminal - def="#w3RecXml_NT-S">S</nonterminal>? ')*' | '(' - <nonterminal def="#w3RecXml_NT-S">S</nonterminal>? '#PCDATA' - <nonterminal def="#w3RecXml_NT-S">S</nonterminal>? ')'</rhs> - </production> - </productionset> + <listitem> + <para>Client host</para> + </listitem> - <para>We notice that out simple <code><!ELEMENT from - (#PCDATA)></code> is also described by this definition. 
It is - just a special case of a single text node and no element nodes - being present.</para> + <para>Intermediate hub, switch or router.</para> + </listitem> + </itemizedlist> - <qandaset role="exercise"> - <title>Variations of mixed content models</title> + <figure xml:id="figJdbcSniffing"> + <title>Sniffing a <trademark + xlink:href="http://electronics.zibb.com/trademark/jdbc/29545026">JDBC</trademark> + connection by an intruder.</title> - <qandadiv> - <qandaentry xml:id="example_allowed_mixed"> - <question> - <para>You may assume that the element types <tag - class="element">emphasize</tag> and <tag - class="element">URL</tag> are correctly defined. Are the - following definitions allowed?</para> + <mediaobject> + <imageobject> + <imagedata fileref="Ref/Fig/jdbcSniffing.fig"/> + </imageobject> + </mediaobject> + </figure> + + <para>We demonstrate a possible attack by analyzing the network + traffic between our application shown in <xref + linkend="figJdbcSimpleWrite"/> and the <productname + xlink:href="http://www.mysql.com">Mysql</productname> database + server. Prior to starting the application we set up <productname + xlink:href="http://www.wireshark.org">Wireshark</productname> for + filtered capturing:</para> + + <itemizedlist> + <listitem> + <para>Capturing on the <varname>loopback</varname> (lo) + interface only. This is sufficient since our client connects + to <varname>localhost</varname>.</para> + </listitem> + + <listitem> + <para>Discarding all packets except those of type <acronym + xlink:href="http://en.wikipedia.org/wiki/Transmission_Control_Protocol">TCP</acronym> + on port number 3306.</para> + </listitem> + </itemizedlist> + + <para>This yields the following capture, shortened for the + sake of brevity:</para> + + <programlisting>[... +5.5.24-0ubuntu0.12.04.1.%...X*e?I1ZQ...................e,F[yoA5$T[N.mysql_native_password.
+ A...........!.......................hdmuser <co xml:id="tcpCaptureUsername"/>......U.>S.%..~h...!.xhdm............j..../* + + ... INSERT INTO Person VALUES('Jim', 'jim@foo.org') <co + xml:id="tcpCaptureSqlInsert"/>6... + .&.#23000Duplicate entry 'jim@foo.org' for key 'email' <co + xml:id="tcpCaptureErrmsg"/></programlisting> + + <calloutlist> + <callout arearefs="tcpCaptureUsername"> + <para>The <varname>username</varname> initiating the + connection to the database server.</para> + </callout> + + <callout arearefs="tcpCaptureSqlInsert"> + <para>The <code>INSERT ...</code> statement.</para> + </callout> + + <callout arearefs="tcpCaptureErrmsg"> + <para>The resulting error message being sent back to the + client.</para> + </callout> + </calloutlist> + + <para>Something seems to be missing here: The user's password. Our + code in <xref linkend="figJdbcSimpleWrite"/> contains the password + <quote><varname>XYZ</varname></quote> in clear text. But even + using the search function of <productname + xlink:href="http://www.wireshark.org">Wireshark</productname> does + not show any such string within the above capture. The + <productname xlink:href="http://www.mysql.com">Mysql</productname> + documentation however <link + xlink:href="http://dev.mysql.com/doc/refman/5.0/en/security-against-attack.html">reveals</link> + that everything but the password is transmitted in clear text. So + all we might identify is a hash of <code>XYZ</code>.</para> + + <para>So regarding our (current) <productname + xlink:href="http://www.mysql.com">Mysql</productname> + implementation the impact of this attack type is somewhat limited + but still severe: All data being transmitted between client and + server may be disclosed. This typically comprises sensitive data as + well. Possible solutions:</para> + + <itemizedlist> + <listitem> + <para>Create an encrypted tunnel between client and server + such as
<link + xlink:href="http://www.debianadmin.com/howto-use-ssh-local-and-remote-port-forwarding.html">ssh + port forwarding</link> or <link + xlink:href="http://de.wikipedia.org/wiki/Virtual_Private_Network">VPN</link>.</para> + </listitem> + + <listitem> + <para>Many database vendors <link + xlink:href="http://dev.mysql.com/doc/refman/5.1/de/connector-j-reference-using-ssl.html">supply + SSL</link> or similar <trademark + xlink:href="http://electronics.zibb.com/trademark/jdbc/29545026">JDBC</trademark> + protocol encryption extensions. This requires additional + configuration procedures like setting up server side + certificates. Moreover similar to the http/https protocols + encryption generally slows down data traffic.</para> + </listitem> + </itemizedlist> + + <para>Of course this is only relevant if the transport layer is + considered to be insecure. If both server and client reside within + the same trusted infrastructure no action has to be taken. We also + note that this kind of problem is not limited to <trademark + xlink:href="http://electronics.zibb.com/trademark/jdbc/29545026">JDBC</trademark>. + In fact all protocols lacking encryption are subject to this type + of attack.</para> + </section> + + <section xml:id="sqlInjection"> + <title>SQL injection</title> + + <para>Before diving into technical details we shed some light on + the possible impact of this common attack type being described in + this chapter. Our example is the well known Heartland Payment + Systems data breach:</para> + + <figure xml:id="figHeartlandSecurityBreach"> + <title>Summary about possible SQL injection impact based on the + Heartland security breach</title> + + <mediaobject> + <imageobject> + <imagedata fileref="Ref/Fig/heartland.fig"/> + </imageobject> + </mediaobject> + </figure> + + <para>Why should we be concerned with SQL injection? 
In the + introduction of <xref linkend="bibClarke09"/> a compelling + argument is being given:</para> + + <blockquote> + <para>Many people say they know what SQL injection is, but all + they have heard about or experienced are trivial examples. SQL + injection is one of the most devastating vulnerabilities to + impact a business, as it can lead to exposure of all of the + sensitive information stored in an application's database, + including handy information such as usernames, passwords, names, + addresses, phone numbers, and credit card details.</para> + </blockquote> + + <para>In this lecture due to limited resources we only deal with + trivial examples mentioned above. One possible way SQL injection + attacks work is by inserting SQL code into fields being designed + for end user input:</para> + + <figure xml:id="figSqlInject"> + <title>SQL injection triggered by ordinary user input.</title> + + <mediaobject> + <imageobject> + <imagedata fileref="Ref/Fig/sqlinject.fig"/> + </imageobject> + </mediaobject> + </figure> + + <qandaset role="exercise"> + <title>Attack from the dark side</title> + + <qandadiv> + <qandaentry xml:id="sqlInjectDropTable"> + <question> + <para>Use the application from <xref + linkend="exerciseGuiWriteTakeTwo"/> and <xref + linkend="figSqlInject"/> to launch a SQL injection attack. + We provide some hints:</para> + + <orderedlist> + <listitem> + <para>The <productname + xlink:href="http://www.mysql.com">Mysql</productname> + <trademark + xlink:href="http://electronics.zibb.com/trademark/jdbc/29545026">JDBC</trademark> + driver implementation already provides precautions to + hamper SQL injection attacks. In its default + configuration a sequence of SQL commands separated by + semicolons (<quote>;</quote>) will not be executed but + flagged as a SQL syntax error. 
We take an + example:</para> + + <programlisting>INSERT INTO Person VALUES (...);DROP TABLE Person</programlisting> -<<<<<<< HEAD <para>In order to execute these so-called multi-statement queries we explicitly have to enable a <productname xlink:href="http://www.mysql.com">Mysql</productname> @@ -2317,70 +2585,61 @@ int count = 2; // Fatal error: A variable must not be <trademark xlink:href="http://electronics.zibb.com/trademark/jdbc/29545026">JDBC</trademark> URL:</para> -======= - <itemizedlist> - <listitem> - <para><code><! ELEMENT mix - (#PCDATA)*></code></para> - </listitem> ->>>>>>> d6323b6ef90b1907e7f809b7765f28762f5ea52d - <listitem> - <para><code><! ELEMENT mix - (emphasize|#PCDATA)*></code></para> - </listitem> + <programlisting>jdbc:mysql://localhost:3306/hdm?<emphasis + role="bold">allowMultiQueries=true</emphasis></programlisting> -<<<<<<< HEAD <para>The <productname xlink:href="http://www.mysql.com">Mysql</productname> manual <link xlink:href="http://dev.mysql.com/doc/refman/5.1/en/connector-j-reference-configuration-properties.html">contains</link> a remark regarding this parameter:</para> -======= - <listitem> - <para><code><! ELEMENT mix - (#PCDATA|URL)></code></para> - </listitem> ->>>>>>> d6323b6ef90b1907e7f809b7765f28762f5ea52d - <listitem> - <para><code><! ELEMENT mix - (emphasize|#PCDATA)+></code></para> - </listitem> - </itemizedlist> - </question> + <remark>Notice that this has the potential for SQL + injection if using plain java.sql.Statements and your + code doesn't sanitize input correctly.</remark> - <answer> - <programlisting><! ELEMENT mix (#PCDATA)*></programlisting> + <para>In other words: You have been warned!</para> + </listitem> - <para>Valid due to syntax diagram.</para> + <listitem> + <para>You may now use either of the two input fields + <quote>name</quote> or <quote>email</quote> to inject + arbitrary SQL code.</para> + </listitem> + </orderedlist> + </question> - <programlisting><!
ELEMENT mix (emphasize|#PCDATA)*></programlisting> + <answer> + <para>We construct a suitable string being injected to + drop our <code>Person</code> table:</para> - <para>Not valid. According to the production rule in - <xref linkend="productionset_w3RecXml_NT-Mixed"/> the - term <code>#PCDATA</code> <emphasis>must</emphasis> be - the first token.</para> + <programlisting>Jim', 'jim@c.com');DROP TABLE Person;INSERT INTO Person VALUES('Joe</programlisting> - <programlisting><code><! ELEMENT mix (#PCDATA|URL)></code>, <code><! ELEMENT mix (emphasize|#PCDATA)+></code></programlisting> + <para>This being entered into the name field kills our + <code>Person</code> relation effectively. As the error + message shows two INSERT statements are separated by a + DROP TABLE statement. So after executing the first INSERT + our database server drops the whole table. Finally the + second INSERT statement fails giving rise to an error + message no end user will ever understand:</para> - <para>Both variants are disallowed: The indicator of - multiplicity <quote>*</quote> is mandatory and the only - legal token to appear.</para> - </answer> - </qandaentry> - </qandadiv> - </qandaset> - </section> + <figure xml:id="figSqlInjectDropPerson"> + <title>Dropping the <code>Person</code> table by SQL + injection</title> - <section xml:id="section_element_content"> - <title>Element content</title> + <mediaobject> + <imageobject> + <imagedata fileref="Ref/Screen/sqlInject.screen.png"/> + </imageobject> + </mediaobject> + </figure> - <para>We refer to our first version of our <link - linkend="figure_memo_dtd">memo.dtd</link>. The <tag - class="element">memo</tag> type declaration reads:</para> + <para>According to the message text the table + <code>Person</code> gets dropped as expected. Thus the + subsequent (second) <code>INSERT</code> action is bound to + fail.</para> -<<<<<<< HEAD <para>In practice this result may be avoided. The database user will (hopefully!)
not have sufficient permissions to drop the whole table. Malicious modifications by INSERT, @@ -2390,207 +2649,165 @@ int count = 2; // Fatal error: A variable must not be </qandadiv> </qandaset> </section> -======= - <programlisting><!ELEMENT memo (from, to+, subject, content)></programlisting> ->>>>>>> d6323b6ef90b1907e7f809b7765f28762f5ea52d - <para>Basically this states that for valid document instances a - <tag class="starttag">memo</tag> node consists of a sequence of - other nodes. In this context we denote <tag - class="starttag">memo</tag> as <emphasis>parent</emphasis> node. - <tag class="element">from</tag>, <tag class="element">to</tag>, - <tag class="element">subject</tag> and <tag - class="element">content</tag> are called - <emphasis>child</emphasis> nodes or - <emphasis>children</emphasis> for short.</para> + <section xml:id="sanitizeUserInput"> + <title>Sanitizing user input</title> - <para>A sequence of elements is a special case of a more general - definition of element content in the XML specification. We - already used the <quote>+</quote> operator to allow a node to - appear multiple times. Actually there are three such operators - being defined:</para> + <para>There are at least two general ways to deal with the + disastrous result of <xref linkend="sqlInjectDropTable"/>:</para> - <glosslist> - <glossentry> - <glossterm>?</glossterm> + <itemizedlist> + <listitem> + <para>Keep the database server from interpreting user input + completely. This is probably the best way and will be + discussed in <xref linkend="sectPreparedStatements"/>.</para> + </listitem> - <glossdef> - <para>A node may appear once or never.</para> - </glossdef> - </glossentry> + <listitem> + <para>Let the application check and process user input. 
+ Dangerous user input may either be modified prior to being embedded + in SQL statements or be rejected completely.</para> + </listitem> + </itemizedlist> - <glossentry> - <glossterm>+</glossterm> + <para>The first method is definitely superior in most cases. There + are however cases where the restrictions being implied are too + severe. We may for example choose dynamically which tables shall + be accessed. So an SQL statement's structure rather than just its + predicates is affected by user input. There are at least two + standard procedures dealing with this problem:</para> - <glossdef> - <para>A node must appear <emphasis>at least</emphasis> - once.</para> - </glossdef> - </glossentry> + <glosslist> + <glossentry> + <glossterm>Input Filtering</glossterm> - <glossentry> - <glossterm>*</glossterm> + <glossdef> + <para>In the simplest case we check a user's input by + regular expressions. An example is an input field in a login + window representing a system user name. Legal input may + consist of letters and digits only. Special characters, + whitespace etc. are typically prohibited. The input must + have a minimum length of one character. A maximum length may + be imposed as well. So we may choose the regular expression + <code>[A-Za-z0-9]+</code> to check valid user names.</para> + </glossdef> + </glossentry> - <glossdef> - <para>A node may appear an arbitrary number of times, - possibly not at all.</para> - </glossdef> - </glossentry> - </glosslist> + <glossentry> + <glossterm><foreignphrase>Whitelisting</foreignphrase></glossterm> - <para>So far we only talked about sequences of element nodes. We - may also define mutually exclusive alternatives:</para> + <glossdef> + <para>In many cases input fields only allow a restricted set + of values. Consider an input field for names of planets.
An + application may keep a dictionary table to validate user + input:</para> - <figure xml:id="operatorContentAlt"> - <title>The operator <quote>|</quote> defining exclusive - alternatives.</title> + <informaltable border="1"> + <col width="10%"/> - <programlisting>... -<!ELEMENT address (email|telephone|town)<co - xml:id="programlisting_alternative_address"/> > -<!ELEMENT email (#PCDATA)> -<!ELEMENT telephone (#PCDATA)> -<!ELEMENT town (#PCDATA)> -... + <col width="5%"/> - <address><co xml:id="programlisting_alternative_emailchild"/> - <email>goik@hdm-stuttgart.de</email> - </address> -... - <address><co xml:id="programlisting_alternative_telephonechild"/> - <telephone>+49 (0)711-8923-2164</telephone> - </address> -...</programlisting> + <tr> + <td>Mercury</td> - <calloutlist> - <callout arearefs="programlisting_alternative_address"> - <para>An <tag class="element">address</tag> node has - <emphasis>either</emphasis> an <tag - class="starttag">email</tag> child <emphasis>or</emphasis> - a <tag class="starttag">telephone</tag> or a <tag - class="starttag">town</tag> child.</para> - </callout> + <td>1</td> + </tr> - <callout arearefs="programlisting_alternative_emailchild"> - <para>An <tag class="starttag">address</tag> node having - an <tag class="starttag">email</tag> child.</para> - </callout> + <tr> + <td>Venus</td> - <callout arearefs="programlisting_alternative_telephonechild"> - <para>An <tag class="starttag">address</tag> node having - an <tag class="starttag">telephone</tag> child.</para> - </callout> - </calloutlist> - </figure> + <td>2</td> + </tr> - <para>Now we have collected the basic means allowing to - structure XML documents. We have the three indicators - <quote>?</quote>, <quote>+</quote> and <quote>*</quote> which - govern the multiplicity of nodes. On the other hand the two - operators <quote>,</quote> and <quote>|</quote> allow us to - define sequences or mutually exclusive alternatives of element - nodes. 
The XML standard defines the notion of <emphasis>content - particles</emphasis> (<abbrev>cp</abbrev>) which allows these - two types of structuring elements to be grouped and - nested:</para> + <tr> + <td>Earth</td> - <productionset> - <title>Element-content Models</title> + <td>3</td> + </tr> - <production xml:id="w3RecXml_NT-children"> - <lhs>children</lhs> + <tr> + <td>...</td> - <rhs>(<nonterminal - def="#w3RecXml_NT-choice">choice</nonterminal> | - <nonterminal def="#w3RecXml_NT-seq">seq</nonterminal>) ('?' - | '*' | '+')?</rhs> - </production> + <td>...</td> + </tr> - <production xml:id="w3RecXml_NT-cp"> - <lhs>cp</lhs> + <tr> + <td>Neptune</td> - <rhs>(<nonterminal - def="#w3RecXml_NT-Name">Name</nonterminal> | <nonterminal - def="#w3RecXml_NT-choice">choice</nonterminal> | - <nonterminal def="#w3RecXml_NT-seq">seq</nonterminal>) ('?' - | '*' | '+')?</rhs> - </production> + <td>9</td> + </tr> - <production xml:id="w3RecXml_NT-choice"> - <lhs>choice</lhs> + <tr> + <td><emphasis role="bold">Default:</emphasis></td> - <rhs>'(' <nonterminal def="#w3RecXml_NT-S">S</nonterminal>? - <nonterminal def="#w3RecXml_NT-cp">cp</nonterminal> ( - <nonterminal def="#w3RecXml_NT-S">S</nonterminal>? '|' - <nonterminal def="#w3RecXml_NT-S">S</nonterminal>? - <nonterminal def="#w3RecXml_NT-cp">cp</nonterminal> )+ - <nonterminal def="#w3RecXml_NT-S">S</nonterminal>? ')'</rhs> - </production> + <td><emphasis role="bold">0</emphasis></td> + </tr> + </informaltable> - <production xml:id="w3RecXml_NT-seq"> - <lhs>seq</lhs> + <para>So if a user enters a valid planet name a + corresponding number representing this particular planet + will be sent to the database. If the user enters an invalid + string an error message may be raised.</para> - <rhs>'(' <nonterminal def="#w3RecXml_NT-S">S</nonterminal>? - <nonterminal def="#w3RecXml_NT-cp">cp</nonterminal> ( - <nonterminal def="#w3RecXml_NT-S">S</nonterminal>? ',' - <nonterminal def="#w3RecXml_NT-S">S</nonterminal>? 
- <nonterminal def="#w3RecXml_NT-cp">cp</nonterminal> )* - <nonterminal def="#w3RecXml_NT-S">S</nonterminal>? ')'</rhs> - </production> - </productionset> + <para>In a GUI in many situations this may be better + accomplished by presenting the list of planets to choose + from. In this case a user has no chance to enter invalid or + even malicious code.</para> + </glossdef> + </glossentry> + </glosslist> - <para>We give two examples:</para> + <para>So we have an <quote>interceptor</quote> sitting between + user input fields and SQL generating code:</para> - <figure xml:id="pureElementContent"> - <title>Examples of pure element content models</title> + <figure xml:id="figInputFiltering"> + <title>Validating user input prior to dynamically composing SQL + statements.</title> - <glosslist> - <glossentry> - <glossterm><code><!ELEMENT address - (email|(name,street,town,telephone?))</code></glossterm> + <mediaobject> + <imageobject> + <imagedata fileref="Ref/Fig/filtering.fig"/> + </imageobject> + </mediaobject> + </figure> - <glossdef> - <para>An <tag class="element">address</tag> is given - either by an email or by a postal address plus an - optional telephone number.</para> - </glossdef> - </glossentry> + <qandaset role="exercise"> + <title>Using regular expressions in <trademark + xlink:href="http://www.oracle.com/us/technologies/java">Java</trademark></title> - <glossentry> - <glossterm><code><!ELEMENT figurelist (title, - ((table|image|animation), - caption?)+)></code></glossterm> + <qandadiv> + <qandaentry> + <question> + <para>This exercise is a preparation for <xref + linkend="exercisefilterUserInput"/>. The aim is to deal + with regular expressions and to use them in <trademark + xlink:href="http://www.oracle.com/us/legal/third-party-trademarks/index.html">Java</trademark>. 
+ If you don't know yet about regular expressions / pattern + matching you may want to read either of:</para> - <glossdef> - <para>We will call table, image and animations - <emphasis>block</emphasis> elements. The <tag - class="starttag">figurelist</tag> element defines a list - of figures. The whole list starts with an overall title. - Then we have at least one occurrence of a block element - and an optional caption.</para> - </glossdef> - </glossentry> - </glosslist> - </figure> + <itemizedlist> + <listitem> + <para><link + xlink:href="http://www.aivosto.com/vbtips/regex.html">Regular + expressions - An introduction</link></para> + </listitem> - <qandaset role="exercise"> - <title>Content models and operator priority></title> + <listitem> + <para><link + xlink:href="http://www.codeproject.com/Articles/939/An-Introduction-to-Regular-Expressions">An + Introduction to Regular Expressions</link></para> + </listitem> - <qandadiv> - <qandaentry xml:id="example_operatorprecedence"> - <question> - <para>Find and explain the error being buried in the - following DTD. After correcting the error construct a - valid document instance.</para> + <listitem> + <para><link + xlink:href="http://www.regular-expressions.info/tutorial.html">Regular + Expression Tutorial</link></para> + </listitem> + </itemizedlist> - <programlisting><!ELEMENT addresslist (address*) > -<!ELEMENT address (email | town,street) > -<!ELEMENT email (#PCDATA)> -<!ELEMENT town (#PCDATA)> -<!ELEMENT street (#PCDATA)></programlisting> - </question> + <para>Complete the implementation of the following + skeleton:</para> -<<<<<<< HEAD <programlisting language="java">... 
import java.util.regex.Matcher; import java.util.regex.Pattern; @@ -2605,33 +2822,33 @@ public static void main(String[] args) { } } } -======= - <answer> - <para>The following document uses the DTD:</para> - - <programlisting><?xml version="1.0" encoding="UTF-8"?> -<!DOCTYPE addresslist SYSTEM "address.dtd"> -<addresslist> - <address> - <email>bingo@cheat.com</email> - </address> - <address> - <town>Paris</town> - <street>Avenue Kléber</street> - </address> -</addresslist></programlisting> ->>>>>>> d6323b6ef90b1907e7f809b7765f28762f5ea52d - <para>This yields the following parsing error:</para> +/** + * Matching a given word by a regular expression. A log message is being + * written to stdout. + * + * Hint: The implementation is based on the explanation being given in the + * introduction to {@link Pattern} + * + * @param word This string will be matched by the subsequent argument. + * @param regexp The regular expression tested to match the previous argument. + * @return true if regexp matches word, false otherwise. + */ +public static boolean testMatch(final String word, final String regexp) { +.../* to be implemented by <emphasis role="bold">**YOU**</emphasis> */ +}</programlisting> - <programlisting><errortext>A ')' is required in the declaration of element type "address".</errortext></programlisting> + <para>As being noted in the <trademark + xlink:href="http://docs.oracle.com/javase/1.5.0/docs/guide/javadoc">Javadoc</trademark> + above you may want to read the documentation of class + <classname>java.util.regex.Pattern</classname>. The + intended output of the above application is:</para> - <para>Like many other error messages this one is not - really enlightening the reader. We examine the content - model of the element <tag - class="element">address</tag>:</para> + <programlisting>The expression '[A-K].*' matches 'Eric' +The expression '[^0-9]+.*' ... 
+...</programlisting> + </question> -<<<<<<< HEAD <answer> <para>A possible implementation is given by <classname>sda.regexp.RegexpPrimer</classname>.</para> @@ -2639,107 +2856,106 @@ public static void main(String[] args) { </qandaentry> </qandadiv> </qandaset> -======= - <programlisting>email | town,street</programlisting> ->>>>>>> d6323b6ef90b1907e7f809b7765f28762f5ea52d - <para>We have tree elements joined by two operators - namely alternative and sequence. In contrast to e.g. - Boolean Algebras the XML standard does not define any - operator priority with respect to <quote>|</quote> and - <quote>,</quote>. Instead a DTD author must use braces - to explicitly define the desired priority:</para> + <qandaset role="exercise"> + <title>Input validation by regular expressions</title> - <programlisting><!ELEMENT address (email | (town,street)) ></programlisting> + <qandadiv> + <qandaentry xml:id="exercisefilterUserInput"> + <question> + <para>The application of <xref + linkend="sqlInjectDropTable"/> proved to be vulnerable to + SQL injection. Sanitize the two user input field's values + to prevent such behaviour.</para> - <para>We note that the operators <quote>*</quote>, - <quote>+</quote> and <quote>?</quote> have precedence - over <quote>|</quote> and <quote>,</quote>. Thus we may - write <code>town,street+</code> instead of the clumsy - term <code>town,(street)+</code>.</para> - </answer> - </qandaentry> + <itemizedlist> + <listitem> + <para>Find appropriate regular expressions to check + both username and email. Some hints:</para> - <qandaentry xml:id="example_book_v2"> - <question> - <label>Book documents with mixed content and itemized - lists</label> + <glosslist> + <glossentry> + <glossterm>username</glossterm> - <para>Extend the first version of <link - linkend="example_bookDtd">book.dtd</link> to support the - following features:</para> + <glossdef> + <para>Regarding SQL injection the <quote>;</quote> + character is among the most critical. 
You may want + to exclude certain special characters. This does no + harm since their presence in a user's name is likely + to be a typo rather than any sensible input.</para> + </glossdef> + </glossentry> - <para>Within a <tag class="starttag">chapter</tag> - node <tag class="starttag">para</tag> and <tag - class="starttag">itemizedlist</tag> elements in - arbitrary order shall be allowed.</para> - </listitem> + <glossentry> + <glossterm>email</glossterm> - <listitem> - <para><tag class="starttag">itemizedlist</tag> nodes - shall contain at least one <tag - class="starttag">listitem</tag>.</para> - </listitem> + <glossdef> + <para>There are tons of <quote>ultimate</quote> + regular expressions available to check email + addresses. Remember that rather than avoiding + <quote>wrong</quote> email addresses the present + task is to avoid SQL injection. So find a reasonable + one which may be too permissive regarding RFC email + syntax rules but sufficient to secure your + application.</para> - <listitem> - <para><tag class="starttag">listitem</tag> nodes - shall be composed of one or more para or nested list - item elements.</para> - </listitem> + <para>A concise definition of an email's syntax is + given in <link + xlink:href="http://tools.ietf.org/html/rfc5322#section-3.4.1">RFC5322</link>. + Its implementation is beyond the scope of the current + lecture. Moreover it is questionable whether E-mail + clients and mail transfer agents implement strict + RFC compliance.</para> + </glossdef> + </glossentry> + </glosslist> - <listitem> - <para>Within a <tag class="starttag">para</tag> we - want to be able to emphasize text passages.</para> - </listitem> - </itemizedlist> + <para>Both regular expressions must cover the whole + user input from the beginning to the end. This can be + achieved by using <code>^ ...
$</code>.</para> + </listitem> - <para>The following sample document instance shall be - valid:</para> + <listitem> + <para>The <trademark + xlink:href="http://www.oracle.com/us/legal/third-party-trademarks/index.html">Java</trademark> + standard class + <classname>javax.swing.InputVerifier</classname> may + help you validating user input.</para> + </listitem> - <programlisting><?xml version="1.0" encoding="UTF-8"?> -<!DOCTYPE book SYSTEM "book.dtd"> -<book> - <title>Introduction to Java</title> - <chapter> - <title>Introduction</title> - <para>Java supports <emphasis>lots</emphasis> of concepts:</para> - <itemizedlist> - <listitem> - <para>Single <emphasis>implementation</emphasis> inheritance.</para> - </listitem> - <listitem> - <para>Multiple <emphasis>interface</emphasis> inheritance.</para> - <itemizedlist> - <listitem><para>Built in types</para></listitem> - <listitem><para>User defined types</para></listitem> - </itemizedlist> - </listitem> - </itemizedlist> - </chapter> -</book></programlisting> - </question> + <listitem> + <para>The following screenshot may provide an idea for + GUI realization and user interaction in case of + errors. Of course the submit button's action should be + disabled in case of erroneous input. 
The user should + receive a helpful error message instead.</para> - <answer> - <para>An extended DTD looks like:</para> + <figure xml:id="figInsertValidate"> + <title>Error message being presented to the + user.</title> - <figure xml:id="paraListEmphasize"> - <title>Version 2 of book.dtd</title> + <mediaobject> + <imageobject> + <imagedata fileref="Ref/Screen/insertValidate.screen.png"/> + </imageobject> + </mediaobject> - <programlisting><!ELEMENT book (title, chapter+)> -<!ELEMENT chapter (title, (para|itemizedlist)+ <co - xml:id="figure_book.dtd_v2_chapter"/>)> -<!ELEMENT title (#PCDATA)> -<!ELEMENT para (#PCDATA|emphasis)*<co xml:id="figure_book.dtd_v2_para"/>> -<!ELEMENT emphasis (#PCDATA)> + <caption> + <para>In the current example the trailing + <quote>;</quote> within the E-Mail field is + invalid.</para> + </caption> + </figure> + </listitem> + </itemizedlist> + </question> -<!ELEMENT itemizedlist (listitem+)<co - xml:id="figure_book.dtd_v2_itemizedlist"/>> -<!ELEMENT listitem ((para|itemizedlist)<co - xml:id="figure_book.dtd_v2_listitem"/>+)></programlisting> + <answer> + <para>Extending + <classname>javax.swing.InputVerifier</classname> allows us + to build a generic class to filter user text input by + arbitrary regular expressions:</para> -<<<<<<< HEAD <programlisting language="java">package sda.jdbc.intro.v1.sanitize; ... 
public class RegexpVerifier extends InputVerifier { @@ -2832,632 +3048,703 @@ public class InsertPerson extends JFrame { </qandadiv> </qandaset> </section> -======= - <caption> - <para>This allows emphasized text in <tag - class="starttag">para</tag> nodes and <tag - class="starttag">itemizedlists</tag>.</para> - </caption> - </figure> - <calloutlist> - <callout arearefs="figure_book.dtd_v2_chapter"> - <para>We hook into <tag - class="starttag">chapter</tag> to allow arbitrary - sequences of at least one <tag - class="starttag">para</tag> or <tag - class="starttag">itemizedlist</tag> element - node.</para> - </callout> + <section xml:id="sectPreparedStatements"> + <title><classname>java.sql.PreparedStatement</classname> + objects</title> - <callout arearefs="figure_book.dtd_v2_para"> - <para><tag class="starttag">para</tag> nodes now - allow mixed content.</para> - </callout> - - <callout arearefs="figure_book.dtd_v2_itemizedlist"> - <para>An itemized list contains at least one list - item.</para> - </callout> + <para>Sanitizing user input is an essential means to secure an + application. The <trademark + xlink:href="http://electronics.zibb.com/trademark/jdbc/29545026">JDBC</trademark> + standard however provides a mechanism being superior regarding the + purpose of protecting applications against SQL injection attacks. + We shed some light on our current mechanism sending SQL statements + to a database server:</para> - <callout arearefs="figure_book.dtd_v2_listitem"> - <para>A list item contains a sequence of at least - one <tag class="starttag">para</tag> or <tag - class="starttag">itemizedlist</tag> node. The latter - gives rise to nested lists. We find a similar - construct in HTML namely unnumbered lists defined by - <code><UL><LI>... 
- </code>constructs.</para> - </callout> - </calloutlist> - </answer> - </qandaentry> - </qandadiv> - </qandaset> - </section> + <figure xml:id="sqlTransport"> + <title>SQL statements in <trademark + xlink:href="http://www.oracle.com/us/legal/third-party-trademarks/index.html">Java</trademark> + applications get parsed at the database server</title> - <section xml:id="comments_processing"> - <title>Comments and processing instructions</title> + <mediaobject> + <imageobject> + <imagedata fileref="Ref/Fig/sqlTransport.fig"/> + </imageobject> + </mediaobject> + </figure> - <para>A XML comment uses the syntax <code><!-- This is a - comment! I love comments! --></code>. Without going into - details here comments may appear in many locations both within - <abbrev - xlink:href="http://en.wikipedia.org/wiki/Document_Type_Declaration">DTD</abbrev>'s - and document instances:</para> + <para>This architecture raises two questions:</para> - <programlisting><?xml version="1.0" encoding="UTF-8"?> -<!DOCTYPE addresslist [ -<!-- An addresslist may contain an arbitrary number of address nodes --> -<!ELEMENT addresslist (address)*> -<!ELEMENT address (#PCDATA)> -]> -<addresslist> - <!-- the document author --> - <address>goik@hdm-stuttgart.de</address> - <address>bingo@problemcompany.com</address> -</addresslist></programlisting> + <orderedlist> + <listitem> + <para>What happens in case identical SQL statements are + executed repeatedly? This may happen inside a loop when + thousands of records with identical structure are being sent + to a database.</para> + </listitem> - <para>Newbies to XML are sometimes confused about so called - <emphasis>processing instructions</emphasis> (PI). Similar to - XML comments it is possible to embed processing instructions - into XML documents. 
As an example we show an excerpt from the - <link - xlink:href="http://www.w3.org/TR/2006/REC-xml-20060816/REC-xml-20060816.xml">source - file</link> of the XML specification:</para> + <listitem> + <para>Is this architecture adequate with respect to security + concerns?</para> + </listitem> + </orderedlist> - <programlisting><?xml version='1.0' encoding='UTF-8'?> -<!DOCTYPE spec SYSTEM "xmlspec.dtd" [ - <!ENTITY base.uri "http://www.w3.org/TR/2006/"> -... -]> -<?xml-stylesheet type="text/xsl" href="REC-xml.xsl" <co - xml:id="programmlisting_xmlspecsrc_xsltref"/> ?> <co - xml:id="programmlisting_xmlspecsrc_pi"/> -<spec w3c-doctype="rec" xml:lang="en"> -... - <title>Extensible Markup Language (XML)</title> -... -</spec></programlisting> + <para>The first question is related to performance: Parsing + statements being identical despite the properties being contained + within is a waste of resources. We consider the transfer of + records between different databases:</para> - <calloutlist> - <callout arearefs="programmlisting_xmlspecsrc_xsltref"> - <para>A reference to a document external style sheet file. - The file <filename>REC-xml.xsl</filename> resides in the - same folder as the XML document itself. Thus a relative - <link xlink:href="http://www.w3.org/Addressing">URL</link> - is sufficient.</para> - </callout> + <programlisting>INSERT INTO Person VALUES ('Jim', 'jim@q.org') +INSERT INTO Person VALUES ('Eve', 'eve@y.org') +INSERT INTO Person VALUES ('Pete', 'p@rr.com') +...</programlisting> - <callout arearefs="programmlisting_xmlspecsrc_pi"> - <para>A processing instruction allowing a web browser to - render the XML file appropriately.</para> - </callout> - </calloutlist> ->>>>>>> d6323b6ef90b1907e7f809b7765f28762f5ea52d + <para>In this case it does not make sense to repeatedly parse + identical SQL statements. 
Using single <code>INSERT</code> + statements with multiple data records may not be an option when + the number of records grows.</para> - <para>We first note that from a parser's <quote>point of - view</quote> both XML comments and processing instructions are - ignored. But software applications working with XML documents - may inspect both types and interpret their content.</para> + <para>The second question is related to our current security + topic: The database server's interpreter may be so + <quote>kind</quote> as to interpret an attacker's malicious code as + well.</para> - <para>The purpose of the processing instruction in the above - document is to enable web browsers to render its content in a - meaningful way. In contrast to HTML an arbitrary XML document - does not provide any semantics being necessary to create - meaningful renderings to end users. A <tag - class="element">memo</tag> document may be interesting from a - programmer's point of view but an end user will probably prefer - either a HTML or a PDF document being - <emphasis>generated</emphasis> from it. As we shall see in <xref - linkend="xsl"/> the file <filename>REC-xml.xsl</filename> - contains style sheet information adhering to the XSLT standard. - Thus a browser being capable to process XSLT may visualize the - XML document directly.</para> - </section> + <para>Both topics are addressed by + <classname>java.sql.PreparedStatement</classname> objects. + Basically these objects allow for separation of an SQL statement's + structure from parameter values contained within.
The scenario + given in <xref linkend="sqlTransport"/> may be implemented + as:</para> - <section xml:id="section_cdatasection"> - <title><acronym>CDATA</acronym> sections</title> + <figure xml:id="sqlTransportPrepare"> + <title>Using <classname>java.sql.PreparedStatement</classname> + objects.</title> - <para>Editing XML documents with text editors it is tedious - since we have to avoid XML markup in <code>#PCDATA</code> or - attribute content. A computer scientist writing a documentation - on C++ code might want to express <emphasis>bit shift</emphasis> - and <emphasis>address of</emphasis> operators:</para> + <mediaobject> + <imageobject> + <imagedata fileref="Ref/Fig/sqlTransportPrepare.fig"/> + </imageobject> + </mediaobject> + </figure> - <programlisting><para>If a < b we set c = & <co - xml:id="programlisting_wrongmarkup_amp"/> (a >> <co - xml:id="programlisting_wrongmarkup_gt"/> b); </para></programlisting> + <para>Prepared statements are an example for parameterized SQL + statements which exist in various programming languages. When + using <classname>java.sql.PreparedStatement</classname> instances + we actually have three distinct phases:</para> - <calloutlist> - <callout arearefs="programlisting_wrongmarkup_amp"> - <para>First error: The operator <quote>&</quote> is - reserved for <link linkend="chapter_entities">general entity - references</link> like e.g. <code>&lt;</code>.</para> - </callout> + <orderedlist> + <listitem> + <para xml:id="exerciseGuiWritePrepared">Creating an instance + of <classname>java.sql.PreparedStatement</classname>. The SQL + statement possibly containing place holders gets + parsed.</para> + </listitem> - <callout arearefs="programlisting_wrongmarkup_gt"> - <para>Second error: The character <quote>></quote> is - reserved to denote an element node's termination.</para> - </callout> - </calloutlist> + <listitem> + <para>Setting all placeholder values. 
This does not involve + any further SQL syntax parsing.</para> + </listitem> - <para>XML offers 5 predefined replacement entities for this - purpose:</para> + <listitem> + <para>Execute the statement.</para> + </listitem> + </orderedlist> - <table xml:id="xmlStandardEntities"> - <title>Replacement entities for XML markup characters</title> + <para>Steps 2 and 3 may be repeated as often as desired without + any re-parsing of SQL statements thus saving resources on the + database server side.</para> - <?dbhtml table-width="15%" ?> + <para>Our introductory toy application <xref + linkend="figJdbcSimpleWrite"/> may be rewritten using + <classname>java.sql.PreparedStatement</classname> objects:</para> - <?dbfo table-width="15%" ?> + <programlisting language="java">package sda.jdbc.intro.v1; +... +public class SimpleInsert { + + public static void main(String[] args) throws SQLException { + + final Connection conn = DriverManager.getConnection (... + + // Step 2: Create a PreparedStatement instance + final PreparedStatement pStmt = conn.prepareStatement("INSERT INTO Person VALUES(<emphasis + role="bold">?, ?</emphasis>)");<co xml:id="listPrepCreate"/> + + // Step 3a: Fill in desired attribute values + pStmt.setString(1, "Jim");<co xml:id="listPrepSet1"/> + pStmt.setString(2, "jim@foo.org");<co xml:id="listPrepSet2"/> + + // Step 3b: Execute the desired INSERT + final int updateCount = pStmt.executeUpdate();<co xml:id="listPrepExec"/> + + // Step 4: Give feedback to the end user + System.out.println("Successfully inserted " + updateCount + " dataset(s)"); + } +}</programlisting> - <tgroup cols="2"> - <colspec colwidth="1*"/> + <calloutlist> + <callout arearefs="listPrepCreate"> + <para>An instance of + <classname>java.sql.PreparedStatement</classname> is being + created.
Notice the two question marks representing two + placeholders for string values to be inserted in the next + step.</para> + </callout> - <colspec colwidth="2*"/> + <callout arearefs="listPrepSet1 listPrepSet2"> + <para>Fill in the two placeholder values being defined at + <coref linkend="listPrepCreate"/>.</para> - <tbody> - <row> - <entry><</entry> + <caution> + <para>While half the world of programming folks will index a + list of n elements starting from 0 to n-1, <trademark + xlink:href="http://electronics.zibb.com/trademark/jdbc/29545026">JDBC</trademark> + apparently counts from 1 to n. Working with <trademark + xlink:href="http://electronics.zibb.com/trademark/jdbc/29545026">JDBC</trademark> + would have been too easy otherwise.</para> + </caution> + </callout> - <entry><tag class="genentity">lt</tag></entry> - </row> + <callout arearefs="listPrepExec"> + <para>Execute the beast! Notice the empty parameter list. No + SQL is required since we already prepared it in <coref + linkend="listPrepCreate"/>.</para> + </callout> + </calloutlist> - <row> - <entry>></entry> + <para>The problem of SQL injection disappears completely when + using <classname>java.sql.PreparedStatement</classname> instances.
+ An attacker may safely enter offending strings like:</para> - <entry><tag class="genentity">gt</tag></entry> - </row> + <programlisting>Jim', 'jim@c.com');DROP TABLE Person;INSERT INTO Person VALUES('Joe</programlisting> - <row> - <entry>&</entry> + <para>The above string will be taken <quote>as is</quote> and thus + simply becomes part of the database server's content.</para> - <entry><tag class="genentity">amp</tag></entry> - </row> + <qandaset role="exercise"> + <title>Prepared Statements to keep the barbarians at the + gate</title> - <row> - <entry>"</entry> + <qandadiv> + <qandaentry xml:id="exerciseSqlInjectPrepare"> + <question> + <para>In <xref linkend="sqlInjectDropTable"/> we found our + implementation in <xref + linkend="exerciseGuiWriteTakeTwo"/> to be vulnerable with + respect to SQL injection. Rather than sanitizing user + input you shall use + <classname>java.sql.PreparedStatement</classname> objects + to secure the application.</para> + </question> - <entry><tag class="genentity">quot</tag></entry> - </row> + <answer> + <para>Due to our separation of GUI and persistence + handling we only need to re-implement + <classname>sda.jdbc.intro.sqlinject.PersistenceHandler</classname>. + We have to replace + <classname>java.sql.Statement</classname> by + <classname>java.sql.PreparedStatement</classname> + instances. A possible implementation is + <classname>sda.jdbc.intro.v1.prepare.PersistenceHandler</classname>. 
+ We may now safely enter offending strings like:</para> - <row> - <entry>'</entry> + <programlisting>Jim', 'jim@c.com');DROP TABLE Person;INSERT INTO Person VALUES('Joe</programlisting> - <entry><tag class="genentity">apos</tag></entry> - </row> - </tbody> - </tgroup> - </table> + <para>This time the input value is taken <quote>as + is</quote> and yields the following error message:</para> - <para>So without an appropriate editor our poor computer - scientist will have to write:</para> + <informalfigure> + <mediaobject> + <imageobject> + <imagedata fileref="Ref/Screen/sqlInjectPrepare.screen.png"/> + </imageobject> + </mediaobject> + </informalfigure> - <programlisting><para>If a &lt; b we set c = &amp; (a &gt;&gt; b); </para></programlisting> + <para>The offending string exceeds the length of the + attribute <code>name</code> within the database table + <code>Person</code>. We may enlarge this value to allow + the <code>INSERT</code> operation:</para> - <para>Looks promising, right? Actually the better alternative is - to use an XML capable editor which allows an author to type - <code>If a < b we set c = & (a >> b);</code>. The - editor software will present this text to the author and - <emphasis>internally</emphasis> save the correct XML code as - presented before.</para> + <programlisting>CREATE TABLE Person ( + name char(<emphasis role="bold">80</emphasis>) <emphasis role="bold">-- a little bit longer --</emphasis> + ,email CHAR(20) UNIQUE +);</programlisting> + </answer> + </qandaentry> + </qandadiv> + </qandaset> - <para>If someone is forced to use a pure text editor - <acronym>CDATA</acronym> sections the second best alternative. A - <acronym>CDATA</acronym> Section encloses a text string which - will not be interpreted by an XML parser. It starts with the - reserved sequence <code><![CDATA[</code> and terminates with - <quote>]]></quote>. The example given before reads:</para> + <para>We may have followed the track of test-driven development. 
+ In that case we would have written tests before actually + implementing our application. In the current lecture we will do + this the other way round in the following exercise. The idea is to + assure software quality when fixing bugs or extending an + application.</para> - <programlisting><para>If <![CDATA[a < b we set c = & (a >> b);]]> </para></programlisting> + <para>The subsequent exercise requires the <productname + xlink:href="http://testng.org/doc/eclipse.html#eclipse-installation">TestNG</productname> + plugin for Eclipse to be installed. This should already be the + case both in the MI exercise classrooms and in the Virtualbox + image provided at <uri + xlink:href="ftp://mirror.mi.hdm-stuttgart.de/ubuntu/VirtualBox/lubuntu.vdi">ftp://mirror.mi.hdm-stuttgart.de/ubuntu/VirtualBox/lubuntu.vdi</uri>. + If you use a private Eclipse installation you may want to follow + <xref linkend="testngInstall"/>.</para> - <para>The precise definition is:</para> + <qandaset role="exercise"> + <title>Testing + <classname>sda.jdbc.intro.v1.PersistenceHandler</classname> + using <productname + xlink:href="http://testng.org">TestNG</productname></title> - <productionset> - <title><acronym>CDATA</acronym> Sections</title> + <qandadiv> + <qandaentry> + <question> + <para>Read <xref linkend="chapUnitTesting"/>. Then + test:</para> - <production xml:id="w3RecXml_NT-CDSect"> - <lhs>CDSect</lhs> + <itemizedlist> + <listitem> + <para>Proper behaviour when opening and closing + connections.</para> + </listitem> - <rhs><nonterminal - def="#w3RecXml_NT-CDStart">CDStart</nonterminal> - <nonterminal def="#w3RecXml_NT-CData">CData</nonterminal> - <nonterminal - def="#w3RecXml_NT-CDEnd">CDEnd</nonterminal></rhs> - </production> + <listitem> + <para>Proper behavior when inserting data</para> + </listitem> - <production xml:id="w3RecXml_NT-CDStart"> - <lhs>CDStart</lhs> + <listitem> + <para>Expected behaviour when entering duplicate + values violating integrity constraints. 
Look for error + messages as well.</para> + </listitem> + </itemizedlist> - <rhs>'<![CDATA['</rhs> - </production> + <para>You may write code to initialize the database state + appropriately prior to starting the tests.</para> + </question> - <production xml:id="w3RecXml_NT-CData"> - <lhs>CData</lhs> + <answer> + <para><productname + xlink:href="http://testng.org">TestNG</productname> may be + directed by + <classname>sda.jdbc.intro.v1.prepare.PersistenceHandlerTest</classname>.</para> + </answer> + </qandaentry> + </qandadiv> + </qandaset> + </section> + </section> - <rhs>(<nonterminal - def="#w3RecXml_NT-Char">Char</nonterminal>* - (<nonterminal - def="#w3RecXml_NT-Char">Char</nonterminal>* ']]>' - <nonterminal - def="#w3RecXml_NT-Char">Char</nonterminal>*))</rhs> - </production> + <section xml:id="jdbcRead"> + <title>Read Access</title> - <production xml:id="w3RecXml_NT-CDEnd"> - <lhs>CDEnd</lhs> + <para>So far we've sent records to a database server. Applications + however need both directions: pushing data to a server and receiving + data as well. The overall process looks like:</para> - <rhs>']]>'</rhs> - </production> - </productionset> + <figure xml:id="jdbcReadWrite"> + <title>Server / client object's life cycle</title> - <para>Thus inside a <acronym>CDATA</acronym> section only the - exact sequence <quote>]]></quote> is disallowed.</para> - </section> - </section> + <mediaobject> + <imageobject> + <imagedata fileref="Ref/Fig/jdbcReadWrite.fig"/> + </imageobject> + </mediaobject> + </figure> - <section xml:id="section_attributetypes"> - <title>Attribute types</title> + <para>So far we've only covered the second (<code>UPDATE</code>) + part of this picture. Reading objects from a database server into a + client's (transient) address space requires a container object to + hold the data in question.
Though <trademark + xlink:href="http://www.oracle.com/us/legal/third-party-trademarks/index.html">Java</trademark> + offers standard container interfaces like + <classname>java.util.List</classname> the <trademark + xlink:href="http://electronics.zibb.com/trademark/jdbc/29545026">JDBC</trademark> + standard has created separate specifications like + <classname>java.sql.ResultSet</classname>. Instances of + <classname>java.sql.ResultSet</classname> will hold transient copies + of (database) objects. The next figure outlines the basic + approach:</para> - <para>When discussing the content model type <link - linkend="section_empty">EMPTY</link> we already mentioned the - possibility of element nodes having attributes like <tag - class="emptytag">img src="..."</tag>. We will now discuss two - features of element node attributes: The <emphasis>type</emphasis> - of an attribute and the way default values are specified.</para> + <figure xml:id="figJdbcRead"> + <title>Reading data from a database server.</title> - <para>We already observed that content model definitions allow us - to define <emphasis>composition</emphasis> rules. Thus a <tag - class="starttag">chapter</tag> may consist of a <tag - class="starttag">title</tag> node followed by <tag - class="starttag">para</tag> and other nodes. This defines - hierarchical , tree like structures. But the - <emphasis>actual</emphasis> string content is defined as - <code>#PCDATA</code>. We are unable to specify a node's content to - consist purely of numbers for example. In contrast XML attribute - definitions offer a limited set of predefined types to choose - from.</para> + <mediaobject> + <imageobject> + <imagedata fileref="Ref/Fig/jdbcread.fig"/> + </imageobject> + </mediaobject> + </figure> - <section xml:id="section_cdata"> - <title><code>CDATA</code></title> + <para>We take an example. 
Suppose our database contains a table of + our friends' nicknames and their respective birth dates:</para> - <para>An element type may be defined to have attributes of type - <code>CDATA</code>:</para> + <table border="1" xml:id="figRelationFriends"> + <caption>Names and birth dates of friends.</caption> - <programlisting><!ATTLIST img <co - xml:id="programlisting_img_element"/> - src<co xml:id="programlisting_img_att_src"/> CDATA<co - xml:id="programlisting_img_att_src_type"/> #REQUIRED<co - xml:id="programlisting_img_att_src_default"/> ></programlisting> + <tr> + <td><programlisting>CREATE TABLE Friends ( + id INTEGER NOT NULL PRIMARY KEY + ,nickname char(10) + ,birthdate DATE +);</programlisting></td> - <calloutlist> - <callout arearefs="programlisting_img_element"> - <para>Start of the definition of a <emphasis>set</emphasis> - of attributes for the element type <tag - class="element">img</tag>.</para> - </callout> + <td><programlisting>INSERT INTO Friends VALUES + (1, 'Jim', '1991-10-10') + ,(2, 'Eve', '2003-05-24') + ,(3, 'Mick','2001-12-30') + ;</programlisting></td> + </tr> + </table> - <callout arearefs="programlisting_img_att_src"> - <para>Start of the first at tribute's definition named <tag - class="attribute">src</tag>.</para> - </callout> + <para>Following the outline in <xref linkend="figJdbcRead"/> we may + access our data by:</para> - <callout arearefs="programlisting_img_att_src_type"> - <para>The attribute <tag class="attribute">src</tag>'s type - is <code>CDATA</code>.</para> - </callout> + <figure xml:id="listingJdbcRead"> + <title>Accessing relational data</title> - <callout arearefs="programlisting_img_att_src_default"> - <para>The attribute <tag class="attribute">src</tag> is - mandatory, see <xref linkend="section_attribute_default"/> - .</para> - </callout> - </calloutlist> + <programlisting language="java">package sda.jdbc.intro; +... +public class SimpleRead { - <para>We have to be careful here. 
The term <code>CDATA</code> - resembles <code>#PCDATA</code> already being introduced for - content models. Actually these two terms are completely distinct - since <code>CDATA</code> refers to attribute values. Consider - the following code snippet:</para> + public static void main(String[] args) throws SQLException { + + // Step 1: Open a connection to the database server + final Connection conn = DriverManager.getConnection ( + DbProps.getString("PersistenceHandler.jdbcUrl"), + DbProps.getString("PersistenceHandler.username"), + DbProps.getString("PersistenceHandler.password")); + + // Step 2: Create a Statement instance + final Statement stmt = conn.createStatement(); + + <emphasis role="bold">// Step 3: Creating the client side JDBC container holding our data records</emphasis> + <emphasis role="bold">final ResultSet data = stmt.executeQuery("SELECT * FROM Friends");</emphasis> <co + linkends="listingJdbcRead-1" xml:id="listingJdbcRead-1-co"/> + + <emphasis role="bold">// Step 4: Dataset iteration + while (data.next()) {</emphasis> <co linkends="listingJdbcRead-2" + xml:id="listingJdbcRead-2-co"/> + <emphasis role="bold">System.out.println(data.getInt("id")</emphasis> <co + linkends="listingJdbcRead-3" xml:id="listingJdbcRead-3-co"/> + <emphasis role="bold">+ ", " + data.getString("nickname")</emphasis> <co + linkends="listingJdbcRead-3" xml:id="listingJdbcRead-4-co"/> + <emphasis role="bold">+ ", " + data.getString("birthdate"));</emphasis> <co + linkends="listingJdbcRead-3" xml:id="listingJdbcRead-5-co"/> + } + } +}</programlisting> + </figure> - <programlisting><para>We may use "quotes" here</para></programlisting> + <para>The marked code segment above shows the differences with respect to + our data insertion application + <classname>sda.jdbc.intro.SimpleInsert</classname>. Some remarks are + in order:</para> - <para>This is completely legal since all characters being used - refer to the production rule of <code>#PCDATA</code>.
But using - the same as an attribute value instead causes trouble:</para> + <calloutlist> + <callout arearefs="listingJdbcRead-1-co" + xml:id="listingJdbcRead-1"> + <para>As mentioned in the introduction to this section the + <trademark + xlink:href="http://electronics.zibb.com/trademark/jdbc/29545026">JDBC</trademark> + standard comes with its own container interface rather than + <classname>java.util.List</classname> or similar.</para> + </callout> - <programlisting><img src="bold.gif" alt="We may use "quotes" here" /></programlisting> + <callout arearefs="listingJdbcRead-2-co" + xml:id="listingJdbcRead-2"> + <para>Calling <link + xlink:href="http://docs.oracle.com/javase/7/docs/api/java/sql/ResultSet.html#next()">next()</link> + prior to actually accessing data on the client side is + mandatory! The <link + xlink:href="http://docs.oracle.com/javase/7/docs/api/java/sql/ResultSet.html#next()">next()</link> + method moves the internal iterator to the first element of our + dataset if it is not empty. Follow the link address and <emphasis role="bold">read</emphasis> the + documentation.</para> + </callout> - <para>This is indeed not even well formed XML. The two inner - quotes embedding the substring <code>quotes</code> interfere - with the two outer quotes delimiting the attribute <tag - class="attribute">src</tag>'s value. As we shall see in <xref - linkend="example_quotes"/> there is a solution to this problem - but the current example shows that the production rules of - <code>#PCDATA</code> and <code>CDATA</code> differ.</para> + <callout arearefs="listingJdbcRead-3-co listingJdbcRead-4-co listingJdbcRead-5-co" + xml:id="listingJdbcRead-3"> + <para>The access methods have to be chosen according to matching + types.
An overview of database/<trademark + xlink:href="http://www.oracle.com/us/technologies/java">Java</trademark> + type mappings is being given in <uri + xlink:href="http://docs.oracle.com/javase/1.3/docs/guide/jdbc/getstart/mapping.html">http://docs.oracle.com/javase/1.3/docs/guide/jdbc/getstart/mapping.html</uri>.</para> + </callout> + </calloutlist> - <qandaset role="exercise"> - <title>book.dtd and languages></title> + <qandaset role="exercise"> + <title>Getter methods and type conversion</title> - <qandadiv> - <qandaentry xml:id="example_book.dtd_v3"> - <question> - <para>We want to extend our DTD from <xref - linkend="example_book_v2"/> by allowing an author to - define the language used within the document. Add an - attribute declaration to the top level element <tag - class="element">book</tag>.</para> - </question> + <qandadiv> + <qandaentry> + <question> + <para>Apart from type mappings the <trademark + xlink:href="http://electronics.zibb.com/trademark/jdbc/29545026">JDBC</trademark> + access methods like <link + xlink:href="http://docs.oracle.com/javase/7/docs/api/java/sql/ResultSet.html#getString(int)">getString()</link> + may also be used for type conversion. 
Modify <xref + linkend="listingJdbcRead"/> by:</para> - <answer> - <para>We simply have to add a single line to our - DTD:</para> + <itemizedlist> + <listitem> + <para>Read the database attribute <code>id</code> by + <link + xlink:href="http://docs.oracle.com/javase/7/docs/api/java/sql/ResultSet.html#getString(java.lang.String)">getString(String)</link>.</para> + </listitem> - <programlisting><!ELEMENT book (title, chapter+)> -<emphasis role="bold"><!ATTLIST book lang CDATA #IMPLIED ></emphasis> -...</programlisting> + <listitem> + <para>Read the database attribute nickname by <link + xlink:href="http://docs.oracle.com/javase/7/docs/api/java/sql/ResultSet.html#getInt(java.lang.String)">getInt(String)</link>.</para> + </listitem> + </itemizedlist> - <para>This allows us to globally set a language for a - document:</para> + <para>What do you observe?</para> + </question> - <programlisting><?xml version="1.0" encoding="UTF-8"?> -<!DOCTYPE book SYSTEM "book.dtd"> -<book lang="english"> - <title>Introduction to Java</title> -...</programlisting> - </answer> - </qandaentry> - </qandadiv> - </qandaset> + <answer> + <para>Modifying our iteration loop:</para> - <para>The XML specification defines attribute definitions - belonging to element types as:</para> + <programlisting>// Step 4: Dataset iteration +while (data.next()) { + System.out.println(data.<emphasis role="bold">getString</emphasis>("id") <co + linkends="jdbcReadWrongType-1" + xml:id="jdbcReadWrongType-1-co"/> + + ", " + data.<emphasis role="bold">getInt</emphasis>("nickname") <co + linkends="jdbcReadWrongType-2" + xml:id="jdbcReadWrongType-2-co"/> + + ", " + data.getString("birthdate")); +}</programlisting> - <productionset> - <title>Attribute-list Declaration</title> + <para>We observe:</para> - <production xml:id="w3RecXml_NT-AttlistDecl"> - <lhs>AttlistDecl</lhs> + <calloutlist> + <callout arearefs="jdbcReadWrongType-1-co" + xml:id="jdbcReadWrongType-1"> + <para>Calling <link + 
xlink:href="http://docs.oracle.com/javase/7/docs/api/java/sql/ResultSet.html#getString(int)">getString()</link> + for a database attribute of type INTEGER does not cause + any trouble: The value gets silently converted to a + string value.</para> + </callout> - <rhs>'<!ATTLIST' <nonterminal - def="#w3RecXml_NT-S">S</nonterminal> <nonterminal - def="#w3RecXml_NT-Name">Name</nonterminal> <nonterminal - def="#w3RecXml_NT-AttDef">AttDef</nonterminal>* <nonterminal - def="#w3RecXml_NT-S">S</nonterminal>? '>'</rhs> - </production> + <callout arearefs="jdbcReadWrongType-2-co" + xml:id="jdbcReadWrongType-2"> + <para>Calling <link + xlink:href="http://docs.oracle.com/javase/7/docs/api/java/sql/ResultSet.html#getInt(java.lang.String)">getInt(String)</link> + for the database field of type CHAR yields an (expected) + Exception:</para> + </callout> + </calloutlist> - <production xml:id="w3RecXml_NT-AttDef"> - <lhs>AttDef</lhs> + <programlisting>Exception in thread "main" java.sql.SQLException: Invalid value for getInt() - 'Jim' + at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:1073) +...</programlisting> - <rhs><nonterminal def="#w3RecXml_NT-S">S</nonterminal> - <nonterminal def="#w3RecXml_NT-Name">Name</nonterminal> - <nonterminal def="#w3RecXml_NT-S">S</nonterminal> - <nonterminal - def="#w3RecXml_NT-AttType">AttType</nonterminal> - <nonterminal def="#w3RecXml_NT-S">S</nonterminal> - <nonterminal - def="#w3RecXml_NT-DefaultDecl">DefaultDecl</nonterminal></rhs> - </production> - </productionset> + <para>We may however provide <quote>compatible</quote> data + records:</para> - <para>The first rule tells us that multiple attributes may be - defined for a given element. This is quite <quote>normal</quote> - since the same applies for example when attributes are defined - within <trademark - xlink:href="http://www.oracle.com/us/legal/third-party-trademarks/index.html">Java</trademark> - or C++ classes. 
Actually in <link - xlink:href="http://www.w3.org/MarkUp">XHTML</link> the <tag - class="emptytag">img</tag> element's attribute list is defined - as:</para> + <programlisting>DELETE FROM Friends; +INSERT INTO Friends VALUES (1, <emphasis role="bold">'31'</emphasis>, '1991-10-10');</programlisting> - <programlisting><!ATTLIST img - src CDATA #REQUIRED - alt CDATA #REQUIRED - longdesc CDATA #IMPLIED - height CDATA #IMPLIED - width CDATA #IMPLIED - ... ></programlisting> + <para>This time our application executes perfectly + well:</para> - <para>The second production rule tells us that attribute names - like <tag class="attribute">src</tag> must be of <link - linkend="w3RecXml_NT-Name">Name</link> production. For example - <code>4element</code> would be an illegal name since attribute - name strings may contain numbers but not at the beginning. This - is quite common in most programming languages and refers to the - term of a legal identifier.</para> + <programlisting>1, 31, 1991-10-10</programlisting> - <para>The second rule also tells us that <code>CDATA</code> is - only one among other possible attribute types:</para> + <para>Conclusion: The <trademark + xlink:href="http://electronics.zibb.com/trademark/jdbc/29545026">JDBC</trademark> + driver performs a conversion from a string type to an + integer similar to the <link + xlink:href="http://docs.oracle.com/javase/7/docs/api/java/lang/Integer.html#parseInt(java.lang.String)">parseInt(String)</link> + method.</para> - <productionset> - <title>Attribute Types</title> + <para>The next series of exercises aims at a more powerful + implementation of our person data insertion application in + <xref linkend="exerciseInsertLoginCredentials"/>.</para> + </answer> + </qandaentry> + </qandadiv> + </qandaset> - <production xml:id="w3RecXml_NT-AttType"> - <lhs>AttType</lhs> + <qandaset role="exercise"> + <title>Handling NULL values.</title> - <rhs><nonterminal - def="#w3RecXml_NT-StringType">StringType</nonterminal> | -
<nonterminal - def="#w3RecXml_NT-TokenizedType">TokenizedType</nonterminal> - | <nonterminal - def="#w3RecXml_NT-EnumeratedType">EnumeratedType</nonterminal></rhs> - </production> + <qandadiv> + <qandaentry> + <question> + <para>The attribute <code>nickname</code> in our database + table Friends allows <code>NULL</code> values:</para> - <production xml:id="w3RecXml_NT-StringType"> - <lhs>StringType</lhs> + <programlisting>INSERT INTO Friends VALUES + (1, 'Jim', '1991-10-10') + ,(2, <emphasis role="bold"> NULL</emphasis>, '2003-5-24') + ,(3, 'Mick', '2001-12-30');</programlisting> - <rhs>'CDATA'</rhs> - </production> + <para>Starting our current application yields:</para> - <production xml:id="w3RecXml_NT-TokenizedType"> - <lhs>TokenizedType</lhs> + <programlisting>1, Jim, 1991-10-10 +2, null, 2003-05-24 +3, Mick, 2001-12-30</programlisting> - <rhs>'ID'| 'IDREF'| 'IDREFS'| 'ENTITY'| 'ENTITIES'| - 'NMTOKEN'| 'NMTOKENS'</rhs> - </production> - </productionset> + <para>This might be confused with a person having the + nickname <quote>null</quote>. Instead we would like to + have:</para> - <para>The discussion of <code>ENTITY</code> types will be - deferred till <xref linkend="chapter_entities"/>.
Before - discussing the remaining types we mention a topic common to all - attribute types:</para> + <programlisting>1, Jim, 1991-10-10 +2, -Name unknown- , 2003-05-24 +3, Mick, 2001-12-30</programlisting> - <qandaset role="exercise"> - <title>Enclosing quotes</title> + <para>Extend the current code of + <classname>sda.jdbc.intro.SimpleRead</classname> to produce + the above result in case of nickname <code>NULL</code> + values.</para> - <qandadiv> - <qandaentry xml:id="example_quotes"> - <question> - <para>We recall the problem of nested quotes yielding - non-well formed XML code:</para> + <para>Hint: Read the documentation of <link + xlink:href="http://docs.oracle.com/javase/7/docs/api/java/sql/ResultSet.html#wasNull()">wasNull()</link>.</para> + </question> - <programlisting><img src="bold.gif" alt="We may use "quotes" here" /></programlisting> + <answer> + <para>A possible implementation is being given in + <classname>sda.jdbc.intro.v1.SimpleRead</classname>.</para> + </answer> + </qandaentry> + </qandadiv> + </qandaset> - <para>The XML specification defines legal attribute - value definitions as:</para> + <qandaset role="exercise"> + <title>A user authentication <quote>strategy</quote></title> - <productionset> - <title>Literals</title> + <qandadiv> + <qandaentry xml:id="exerciseInsecureAuth"> + <question> + <para>Our current application for entering + <code>Person</code> records lacks authentication: A user + simply connects to the database using credentials being hard + coded in a properties file. 
A programmer suggests to + implement authentication based on the following extension of + the <code>Person</code> table:</para> - <production xml:id="w3RecXml_NT-EntityValue"> - <lhs>EntityValue</lhs> + <programlisting>CREATE TABLE Person ( + name char(80) NOT NULL + ,email CHAR(20) NOT NULL UNIQUE + ,login CHAR(10) UNIQUE -- login names must be unique -- + ,password CHAR(20) +);</programlisting> - <rhs>'"' ([^%&"] | <nonterminal - def="#w3RecXml_NT-PEReference">PEReference</nonterminal> - | <nonterminal - def="#w3RecXml_NT-Reference">Reference</nonterminal>)* - '"' | "'" ([^%&'] | <nonterminal - def="#w3RecXml_NT-PEReference">PEReference</nonterminal> - | <nonterminal - def="#w3RecXml_NT-Reference">Reference</nonterminal>)* - "'"</rhs> - </production> + <para>On clicking <quote>Connect</quote> a user may enter + his login name and password, <quote>fred</quote> and + <quote>12345678</quote> in the following example:</para> - <production xml:id="w3RecXml_NT-AttValue"> - <lhs>AttValue</lhs> + <figure xml:id="figLogin"> + <title>Login credentials for database connection</title> - <rhs>'"' ([^<&"] | <nonterminal - def="#w3RecXml_NT-Reference">Reference</nonterminal>)* - '"' | "'" ([^<&'] | <nonterminal - def="#w3RecXml_NT-Reference">Reference</nonterminal>)* - "'"</rhs> - </production> + <mediaobject> + <imageobject> + <imagedata fileref="Ref/Screen/login.screen.png"/> + </imageobject> + </mediaobject> + </figure> - <production xml:id="w3RecXml_NT-SystemLiteral"> - <lhs>SystemLiteral</lhs> + <para>Based on these input values the following SQL query is + being executed by a + <classname>java.sql.Statement</classname> object:</para> - <rhs>('"' [^"]* '"') | ("'" [^']* "'")</rhs> - </production> + <programlisting>SELECT * FROM Person WHERE login='<emphasis + role="bold">fred</emphasis>' and password = '<emphasis + role="bold">12345678</emphasis>'</programlisting> - <production xml:id="w3RecXml_NT-PubidLiteral"> - <lhs>PubidLiteral</lhs> + <para>Since the login attribute 
is UNIQUE we are sure to + receive either 0 or 1 dataset. Our programmer proposes to + grant login if the query returns at least one + dataset.</para> - <rhs>'"' <nonterminal - def="#w3RecXml_NT-PubidChar">PubidChar</nonterminal>* - '"' | "'" (<nonterminal - def="#w3RecXml_NT-PubidChar">PubidChar</nonterminal> - - "'")* "'"</rhs> - </production> + <para>Discuss this implementation sketch with a colleague. + Do you think this is a sensible approach? <emphasis + role="bold">Write down</emphasis> your results.</para> + </question> - <production xml:id="w3RecXml_NT-PubidChar"> - <lhs>PubidChar</lhs> + <answer> + <para>The approach is essentially unusable due to severe + security implications. Since it is based on + <classname>java.sql.Statement</classname> rather than on + <classname>java.sql.PreparedStatement</classname> objects it + is vulnerable to SQL injection attacks. A user may enter the + following password value in the GUI:</para> - <rhs>#x20 | #xD | #xA | [a-zA-Z0-9] - | [-'()+,./:=?;!*#@$_%]</rhs> - </production> - </productionset> + <programlisting>sd' OR '1' = '1</programlisting> - <para>Find out how it is possible to set the attribute - <tag class="attribute">alt</tag>'s value to the string - <code>We may use "quotes" here</code>.</para> - </question> + <para>Based on the login name <quote>fred</quote> the + following SQL string is being crafted:</para> - <answer> - <para>The production rule for attribute values - reads:</para> + <programlisting>SELECT * FROM Person WHERE login='fred' and password = 'sd' OR <emphasis + role="bold">'1' = '1'</emphasis>;</programlisting> - <productionset> - <productionrecap linkend="w3RecXml_NT-AttValue"/> - </productionset> + <para>Since the WHERE clause's last component always + evaluates to true, all objects from the <code>Person</code> + relation are returned thus permitting login.</para> - <para>This allows us to use either of two alternatives - to delimit attribute values:</para> + <para>The implementation approach
suffers from a second + deficiency: The passwords are stored in clear text. If an + attacker gains access to the <code>Person</code> table he'll + immediately retrieve the passwords of all users. This + problem can be solved by storing hash values of passwords + rather than the clear text values themselves.</para> + </answer> + </qandaentry> + </qandadiv> + </qandaset> - <glosslist> - <glossentry> - <glossterm><tag class="starttag">img ... - alt="..."/</tag></glossterm> + <qandaset role="exercise" xml:id="passwordHashes"> + <title>Passwords and hash values</title> - <glossdef> - <para><emphasis>Validity constraint:</emphasis> do - not use <code>"</code> inside the value - string.</para> - </glossdef> - </glossentry> + <qandadiv> + <qandaentry xml:id="exerciseHashTraining"> + <question> + <para>In exercise <xref linkend="exerciseInsecureAuth"/> we + discarded the idea of clear text passwords in favour of + password hashes. In order to avoid Rainbow cracking so + called salted hashes are superior. You should read <uri + xlink:href="https://www.heckrothindustries.co.uk/articles/an-introduction-to-password-hashes">https://www.heckrothindustries.co.uk/articles/an-introduction-to-password-hashes</uri> + for overview purposes. The article contains further + references on the bottom of the page.</para> - <glossentry> - <glossterm><tag class="starttag">img ... 
- alt='...'/</tag></glossterm> + <para>With respect to an implementation <uri + xlink:href="http://stackoverflow.com/questions/2860943/suggestions-for-library-to-hash-passwords-in-java#11038230">http://stackoverflow.com/questions/2860943/suggestions-for-library-to-hash-passwords-in-java</uri> + provides a simple example for:</para> - <glossdef> - <para><emphasis>Validity constraint:</emphasis> do - not use <code>'</code> inside the value - string.</para> - </glossdef> - </glossentry> - </glosslist> + <itemizedlist> + <listitem> + <para>Creating a salted hash from a given password + string.</para> + </listitem> - <para>We may take advantage of the second rule:</para> + <listitem> + <para>Verify if a hash string matches a given clear text + password.</para> + </listitem> + </itemizedlist> - <programlisting><img src="bold.gif" alt='We may use "quotes" here' /></programlisting> + <para>The example uses an external library. On <productname + xlink:href="http://www.ubuntu.com">Ubuntu</productname> + Linux this may be installed by issuing + <command>aptitude</command> <option>install</option> + <option>libcommons-codec-java</option>. On successful + install the file + <filename>/usr/share/java/commons-codec-1.5.jar</filename> + may be appended to your <envar>CLASSPATH</envar>.</para> - <para>Notice that according to <xref - linkend="w3RecXml_NT-AttValue"/> the delimiting quotes - must not be mixed. The following code is thus not well - formed:</para> + <para>You may as well use <uri + xlink:href="http://crackstation.net/hashing-security.htm#javasourcecode">http://crackstation.net/hashing-security.htm#javasourcecode</uri> + as a starting point. This example works standalone without + needing an external library. 
Note: This example produces + different (incompatible) hash values.</para> - <programlisting><img src="bold.gif'/></programlisting> - </answer> - </qandaentry> - </qandadiv> - </qandaset> - </section> - - <section xml:id="section_nmtoken"> - <title><code>NMTOKEN</code> /<code>NMTOKENS</code></title> - - <para>Name tokens are essentially strings composed of a - restricted character set. A name token must for example not - contain any white space. We already mentioned its production - rule:</para> - - <productionset> - <productionrecap linkend="w3RecXml_NT-Nmtoken"/> - </productionset> - - <para>This may be used to restrict attribute values. We consider - a configuration file containing a list of user accounts:</para> - - <programlisting><?xml version="1.0" encoding="UTF-8"?> -<!DOCTYPE userlist [ -<!ELEMENT userlist (account*)> -<!ELEMENT account EMPTY> -<!ATTLIST account - username NMTOKEN #REQUIRED - password CDATA #IMPLIED - > -]> -<userlist> - <account username="Joe"/> - <account username="Mr. Bean"/> - <!-- Whoops, an illegal space!--> -</userlist></programlisting> - - <para>We extend the above example by allowing each user to - belong to a <emphasis>set</emphasis> of groups. We achieve this - by adding an attribute <tag class="attribute">groups</tag> of - type <code>NMTOKENS</code>:</para> + <para>Create a simple main() method to experiment with the + two class methods.</para> + </question> -<<<<<<< HEAD <answer> <para>Starting from <uri xlink:href="http://stackoverflow.com/questions/2860943/suggestions-for-library-to-hash-passwords-in-java#11038230">http://stackoverflow.com/questions/2860943/suggestions-for-library-to-hash-passwords-in-java</uri> @@ -3505,55 +3792,9 @@ public class HashProvider { class. Notice the <quote>$</quote> sign <coref linkend="saltPwhashSeparator"/> separating salt and password hash:</para> -======= - <programlisting>...
-<!ATTLIST account - username NMTOKEN #REQUIRED - groups NMTOKENS #IMPLIED - password CDATA #IMPLIED - > -]> -<userlist> - <account username="Joe" groups="admin staff team"/> -</userlist></programlisting> - - <para>This defines a user <code>Joe</code> belonging to the - three groups <code>admin</code>, <code>staff</code> and - <code>team</code>. Informally we see a list of tokens separated - by spaces. This is indeed the formal W3C specification:</para> - - <productionset> - <productionrecap linkend="w3RecXml_NT-Nmtokens"/> - </productionset> - - <para>According to this rule only single spaces (#20) are legal. - Actual parser implementations seem to accept more general - whitespace here. Thus a sequence of spaces, tabs, carriage - returns and newlines is also accepted as a separator - value.</para> - </section> - <section xml:id="section_name_token_group"> - <title>Enumeration values</title> - - <para>The XML standard allows us to define enumerations by - restricting an attribute value to a predefined set of name - tokens:</para> - - <productionset> - <title>Enumerated Attribute Types</title> - - <production xml:id="w3RecXml_NT-EnumeratedType"> - <lhs>EnumeratedType</lhs> ->>>>>>> d6323b6ef90b1907e7f809b7765f28762f5ea52d - - <rhs><nonterminal - def="#w3RecXml_NT-NotationType">NotationType</nonterminal> | - <nonterminal - def="#w3RecXml_NT-Enumeration">Enumeration</nonterminal></rhs> - </production> + <programlisting language="java">package sda.jdbc.intro.auth; -<<<<<<< HEAD public class TestHashProvider { public static void main(String [] args) throws Exception { @@ -3574,125 +3815,89 @@ public class TestHashProvider { <qandaset role="exercise" xml:id="exercise_GuiEnterPersonAuth"> <title>Gui authentication: The real McCoy</title> -======= - <production xml:id="w3RecXml_NT-NotationType"> - <lhs>NotationType</lhs> - - <rhs>'NOTATION' <nonterminal - def="#w3RecXml_NT-S">S</nonterminal> '(' <nonterminal - def="#w3RecXml_NT-S">S</nonterminal>? 
<nonterminal - def="#w3RecXml_NT-Name">Name</nonterminal> (<nonterminal - def="#w3RecXml_NT-S">S</nonterminal>? '|' <nonterminal - def="#w3RecXml_NT-S">S</nonterminal>? <nonterminal - def="#w3RecXml_NT-Name">Name</nonterminal>)* <nonterminal - def="#w3RecXml_NT-S">S</nonterminal>? ')'</rhs> - </production> - - <production xml:id="w3RecXml_NT-Enumeration"> - <lhs>Enumeration</lhs> - - <rhs>'(' <nonterminal def="#w3RecXml_NT-S">S</nonterminal>? - <nonterminal - def="#w3RecXml_NT-Nmtoken">Nmtoken</nonterminal> - (<nonterminal def="#w3RecXml_NT-S">S</nonterminal>? '|' - <nonterminal def="#w3RecXml_NT-S">S</nonterminal>? - <nonterminal - def="#w3RecXml_NT-Nmtoken">Nmtoken</nonterminal>)* - <nonterminal def="#w3RecXml_NT-S">S</nonterminal>? ')'</rhs> - </production> - </productionset> ->>>>>>> d6323b6ef90b1907e7f809b7765f28762f5ea52d - <para>We start with an example of a <emphasis>Name Token - Group</emphasis> aka enumeration:</para> + <qandadiv> + <qandaentry xml:id="exerciseInsertLoginCredentials"> + <question> + <para>We now implement a refined version to enter + <code>Person</code> records based on the solutions of two + related exercises:</para> - <figure xml:id="figure_nametokengroup"> - <title>A name token group</title> + <glosslist> + <glossentry> + <glossterm><xref + linkend="exercisefilterUserInput"/></glossterm> - <programlisting><?xml version="1.0" encoding="UTF-8"?> -<!DOCTYPE top [ -<!ELEMENT top (chemical*)> -<!ELEMENT chemical (#PCDATA)> -<!ATTLIST chemical state (solid|liquid|gas) <co - xml:id="figure_nametokengroup_att_state"/> #REQUIRED <co - xml:id="figure_nametokengroup_att_state_required"/>> -]> -<top> - <chemical state="gas" <co xml:id="figure_nametokengroup_oxygen_state"/>>Oxygen</chemical> - <chemical state="liquid" <co xml:id="figure_nametokengroup_water_state"/>>Water</chemical> - - <chemical state="superfluous" <co - xml:id="figure_nametokengroup_helium_state"/>>Helium</chemical> - <!-- Ooops! 
--> -</top></programlisting> + <glossdef> + <para>Avoiding SQL injection by sanitizing user + input</para> + </glossdef> + </glossentry> - <calloutlist> - <callout arearefs="figure_nametokengroup_att_state"> - <para>The attribute <tag class="attribute">state</tag>'s - value may have values from the set {solid, liquid, - gas}.</para> - </callout> + <glossentry> + <glossterm><xref + linkend="exerciseSqlInjectPrepare"/></glossterm> - <callout arearefs="figure_nametokengroup_att_state_required"> - <para><tag class="attribute">state</tag> is - mandatory.</para> - </callout> + <glossdef> + <para>Avoiding SQL injection by using + <classname>java.sql.PreparedStatement</classname> + objects.</para> + </glossdef> + </glossentry> + </glosslist> - <callout arearefs="figure_nametokengroup_oxygen_state"> - <para>A legal value.</para> - </callout> + <para>A better solution should combine both techniques. + Non-vulnerability is a basic requirement. Checking an e-mail address + for minimal conformance is an added value.</para> - <callout arearefs="figure_nametokengroup_water_state"> - <para>Another legal value.</para> - </callout> + <para>In order to address authentication the relation Person + has to be extended appropriately. The GUI needs two + additional fields for login name and password as well. The + following video demonstrates the intended behaviour:</para> - <callout arearefs="figure_nametokengroup_helium_state"> - <para>The token value <tag - class="attvalue">superfluous</tag> does not belong to the - set of allowed values. 
The parser flags this error - as:</para> + <figure xml:id="videoConnectAuth"> + <title>Intended usage behaviour for insertion of data + records.</title> - <para><code>Attribute "state" with value "superfluous" - must have a value from the list "solid liquid gas - ".</code></para> - </callout> - </calloutlist> - </figure> + <mediaobject> + <videoobject> + <videodata fileref="Ref/Video/connectauth.mp4"/> + </videoobject> + </mediaobject> + </figure> - <para>The rule defining an <link - linkend="w3RecXml_NT-Enumeration">Enumeration</link> has to be - supplemented by a validity constraint: The set of legal token - values must not contain duplicates. This would violate the - attributes property allowing values to be chosen from a - <emphasis>set</emphasis>.</para> + <para>Don't forget to use password hashes like those from + <xref linkend="exerciseHashTraining"/>. Due to their length + you may want to consider the data type + <code>TEXT</code>.</para> + </question> - <qandaset role="exercise"> - <title>Restriction of allowed languages</title> + <answer> + <para>In comparison to earlier versions it does make sense + to add some internal container structures. First we note, + that each GUI input field requires:</para> - <qandadiv> - <qandaentry xml:id="example_book.dtd_v4"> - <question> - <para xml:lang="">We extend our book.dtd version from - <xref linkend="example_book.dtd_v3"/>. The attribute - <tag class="attribute">lang</tag> is simple free text. 
- We want to restrict this to allow only values from the - set {en,fr,de,it,es}.</para> - </question> + <itemizedlist> + <listitem> + <para>A label like <quote>Enter password</quote>.</para> + </listitem> - <answer> - <para>We restrict our attribute definition from type - <code>CDATA</code> to a name token group:</para> + <listitem> + <para>A corresponding field object to hold user entered + input.</para> + </listitem> - <programlisting><!ATTLIST book lang (en|fr|de|it|es) #IMPLIED ></programlisting> - </answer> - </qandaentry> - </qandadiv> - </qandaset> + <listitem> + <para>A validator checking for correctness of entered + data.</para> + </listitem> - <para>The notation type branch production rule's usage is quite - similar:</para> + <listitem> + <para>A label or text field for warning messages in case + of invalid user input.</para> + </listitem> + </itemizedlist> -<<<<<<< HEAD <para>First we start by grouping label <coref linkend="uiuLabel"/>, input field's verifier <coref linkend="uiuVerifier"/> and the error message label <coref @@ -3716,30 +3921,11 @@ public class UserInputUnit { <para>The actual GUI text field is being defined <coref linkend="verfierGuiField"/> in class <classname>sda.jdbc.intro.auth.InputVerifierNotify</classname>:</para> -======= - <figure xml:id="attributeNotation"> - <title>A notation attribute</title> - <programlisting><!DOCTYPE doc [ + <programlisting language="java">package sda.jdbc.intro.auth; +... 
+public abstract class InputVerifierNotify extends InputVerifier { -<!NOTATION <emphasis role="bold">cpp</emphasis> SYSTEM "The ANSI C++ programming language"> -<!NOTATION <emphasis role="bold">perl</emphasis> SYSTEM "The PERL script programming language"> -<!NOTATION <emphasis role="bold">sql</emphasis> SYSTEM "SQL 92 database query language"> ->>>>>>> d6323b6ef90b1907e7f809b7765f28762f5ea52d - -<!ELEMENT doc (code)*> -<!ELEMENT code (#PCDATA)> -<!ATTLIST code - language NOTATION (<emphasis role="bold">cpp</emphasis>|<emphasis - role="bold">perl</emphasis>|<emphasis role="bold">sql</emphasis>) #REQUIRED > -]> -<doc> - <code language="<emphasis role="bold">cpp</emphasis>">delete[] namelist;</code> - <code language="<emphasis role="bold">sql</emphasis>">SELECT * FROM User;</code> -</doc></programlisting> - </figure> - -<<<<<<< HEAD protected final String errorMessage; public final JLabel validationLabel; public final JTextField field; <co xml:id="verfierGuiField"/> @@ -3748,60 +3934,33 @@ public class UserInputUnit { <para>We need two field verifier classes being derived from <classname>sda.jdbc.intro.auth.InputVerifierNotify</classname>:</para> -======= - <para>The only difference in comparison to a Name Token Group is - the keyword <code>NOTATION</code>. There are however additional - validity constraints imposed by the XML specification.</para> - - <para>In the given example the content of <tag - class="starttag">para</tag> nodes was declared as - <code>#PCDATA</code>. Actually all types of element content - except <code>EMPTY</code> may appear.</para> ->>>>>>> d6323b6ef90b1907e7f809b7765f28762f5ea52d - <itemizedlist> - <listitem> - <para>Values of type <code>NOTATION</code> - <emphasis>must</emphasis> match one of the notation names - included in the declaration. In the given example this would - be either <tag class="attvalue">cpp</tag>, <tag - class="attvalue">perl</tag> or <tag - class="attvalue">sql</tag>. 
All notation names in the - declaration <emphasis>must</emphasis> be declared.</para> - </listitem> + <glosslist> + <glossentry> + <glossterm><classname>sda.jdbc.intro.auth.RegexpVerifier</classname></glossterm> - <listitem> - <para>An element type <emphasis>must not</emphasis> have - more than one <code>NOTATION</code> attribute specified. - Actually a <code>NOTATION</code> attribute value gives us a - <quote>promise</quote> about the expected content of the - element node in which it appears. So if the content of a - <tag class="starttag">para</tag> node is SQL code it cannot - in addition be declared to be of language category type - <emphasis>declarative</emphasis>.</para> - </listitem> + <glossdef> + <para>This one is well known from earlier versions and + is used to validate text input fields by regular + expressions.</para> + </glossdef> + </glossentry> - <listitem> - <para>For compatibility to SGML an attribute of type - <code>NOTATION</code> <emphasis>must not</emphasis> be - declared on an element declared <link - linkend="section_empty">EMPTY</link>.</para> - </listitem> - </itemizedlist> - </section> + <glossentry> + <glossterm><classname>sda.jdbc.intro.auth.EqualValueVerifier</classname></glossterm> - <section xml:id="section_id_idref"> - <title><code>ID</code> and <code>IDREF / IDREFS</code></title> + <glossdef> + <para>This verifier class is responsible for checking + that our two password fields hold identical + values.</para> + </glossdef> + </glossentry> + </glosslist> - <para>The pair of attribute types <code>ID</code> and - <code>IDREF</code> defines internal references within a given - XML document instance. Before considering XML we recall the way - document internal references are implemented in HTML. 
A - reference originates from a <emphasis>source</emphasis> and - leads to a <emphasis>target</emphasis>, in HTML the latter is - frequently called an <emphasis>anchor</emphasis>:</para> + <para>All these components get assembled in + <classname>sda.jdbc.intro.auth.InsertPerson</classname>. We + remark some important points:</para> -<<<<<<< HEAD <programlisting>package sda.jdbc.intro.auth; ... public class InsertPerson extends JFrame { @@ -3877,80 +4036,60 @@ public class InsertPerson extends JFrame { return true; } }</programlisting> -======= - <figure xml:id="figure_reference_html"> - <title>An internal reference within a HTML document</title> - - <programlisting><!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"> -<html> - <head><title>Reference example</title></head> - <body> - <h1>Reference example</h1> - <p><a name="foo" <co xml:id="figure_reference_html_anchor"/>></a>This is the target.</p> - - <p>There may be lots of text in between ...</p> - <p>There may be lots of text in between ...</p> - - <h1>This is a different section</h1> - <p>Click <a href="#foo" <co xml:id="figure_reference_html_link1"/>>here</a> to see the target.</p> - - <h1>This is a third section</h1> - <p>Again <a href="#foo" <co xml:id="figure_reference_html_link2"/>>clicking</a> yields the same target.</p> - </body> -</html></programlisting> - </figure> - - <calloutlist> - <callout arearefs="figure_reference_html_anchor"> - <para>Each <tag class="starttag">a name="foo"</tag> tag with - the given value must appear only once. Thus it is an error - if a second tag <tag class="starttag">a name="foo"</tag> - appears within the same HTML file since the value <tag - class="attvalue">foo</tag> would not be unique.</para> - </callout> - <callout arearefs="figure_reference_html_link1"> - <para>The <quote>#</quote> is a shorthand for a document - local reference. 
A full HTML reference looks like - <code>http://someserver.org/docs/intro.html#foo</code> - defining a reference to the position indicated by <tag - class="starttag"><a name="foo"></tag> within the - document with path <code>/docs/intro.html</code> on the - server <code>someserver.org</code> accessed by the <link - xlink:href="http://www.w3.org/Protocols">HTTP</link> - protocol . Thus <quote><code>#foo</code></quote> points to - the local target defined by <tag class="starttag">a - name="foo"</tag> in the document itself.</para> - </callout> ->>>>>>> d6323b6ef90b1907e7f809b7765f28762f5ea52d - - <callout arearefs="figure_reference_html_link2"> - <para>A second link to the same destination.</para> - </callout> - </calloutlist> + <calloutlist> + <callout arearefs="listingInsertUserAuth-1-co" + xml:id="listingInsertUserAuth-1"> + <para>All GUI related stuff for entering a user's + name</para> + </callout> - <para>In a database context we would call <tag - class="starttag"><a name="foo"></tag> a <emphasis>primary - key value</emphasis>. The element node <tag class="starttag">a - href="#foo"</tag> would be considered a <emphasis>foreign - key</emphasis> reference which may appear multiple times - pointing to the same target.</para> + <callout arearefs="listingInsertUserAuth-2-co" + xml:id="listingInsertUserAuth-2"> + <para>Password fields need special treatment: + <code>getText()</code> is superseded by + <code>getPassword()</code>. 
In order to avoid casts from + <classname>javax.swing.JTextField</classname> to + <classname>javax.swing.JPasswordField</classname> we + simply keep an extra reference.</para> + </callout> - <para>In HTML a node may at the same time be itself a reference - target and define a reference to another target:</para> + <callout arearefs="listingInsertUserAuth-3-co" + xml:id="listingInsertUserAuth-3"> + <para>In order to check both password fields for + identical values we need a different validator + <classname>sda.jdbc.intro.auth.EqualValueVerifier</classname> + expecting both password fields in its + constructor.</para> + </callout> - <programlisting><a name="thisTarget" href="linkToOtherTarget">click on me!</a></programlisting> + <callout arearefs="listingInsertUserAuth-4-co" + xml:id="listingInsertUserAuth-4"> + <para>All 5 user input elements get grouped by an array. + This allows for iterations like in <coref + linkend="listingInsertUserAuth-7-co"/> or <coref + linkend="listingInsertUserAuth-8-co"/>.</para> + </callout> - <para>The XML standard adopts a different way to implement - document internal references. 
We give an example:</para> + <callout arearefs="listingInsertUserAuth-5-co" + xml:id="listingInsertUserAuth-5"> + <para>Adding all GUI elements to the base pane in a + loop.</para> + </callout> - <figure xml:id="figure_intern_reference_xml"> - <title>Internal references in XML</title> + <callout arearefs="listingInsertUserAuth-6-co" + xml:id="listingInsertUserAuth-6"> + <para>Providing user entered values to the persistence + provider.</para> + </callout> - <programlisting><?xml version="1.0" encoding="UTF-8"?> -<!DOCTYPE catalog [ + <callout arearefs="listingInsertUserAuth-7-co" + xml:id="listingInsertUserAuth-7"> + <para>Whenever a dataset has been successfully sent to + the database we have to clean our GUI to possibly enter + another record.</para> + </callout> -<<<<<<< HEAD <callout arearefs="listingInsertUserAuth-8-co" xml:id="listingInsertUserAuth-8"> <para>Thanks to our grouping aggregation of individual @@ -4015,262 +4154,325 @@ PersistenceHandler.password=<emphasis role="bold">XYZ</emphasis> <co </section> </section> </chapter> -======= -<!ELEMENT catalog (product*) <co - xml:id="figure_intern_reference_xml_catalog"/> > -<!ELEMENT product (title, para*) <co - xml:id="figure_intern_reference_xml_product"/>> -<!ELEMENT title (#PCDATA)> -<!ELEMENT para (#PCDATA|link)* <co - xml:id="figure_intern_reference_xml_para"/> > -<!ELEMENT link (#PCDATA)> ->>>>>>> d6323b6ef90b1907e7f809b7765f28762f5ea52d -<!ATTLIST product id ID <co - xml:id="figure_intern_reference_xml_att_product_id"/> #IMPLIED> -<!ATTLIST link ref IDREF <co - xml:id="figure_intern_reference_xml_att_link_ref"/> #REQUIRED> -]> -<catalog> - <product id="homeTrainer" <co - xml:id="figure_intern_reference_xml_define_id_hometrainer"/> > - <title>Home trainer</title> - <para>Like to torture yourself in front of your TV?</para> - </product> - <product <co xml:id="figure_intern_reference_xml_product_no_id"/>> - <title>Mountain bike</title> - <para>If you hate rain look <link ref="homeTrainer" <co - 
xml:id="figure_intern_reference_xml_define_ref1_hometrainer"/> >here</link>.</para> - </product> -</catalog></programlisting> - </figure> + <chapter xml:id="chapUnitTesting"> + <title>Unit testing with <productname + xlink:href="http://testng.org">TestNG</productname></title> - <calloutlist> - <callout arearefs="figure_intern_reference_xml_catalog"> - <para>Start of the DTD. A catalog consists of - products.</para> - </callout> + <para>This chapter presents a very short introduction to the basic usage + of unit testing. We start with a simple stack implementation:</para> - <callout arearefs="figure_intern_reference_xml_product"> - <para>A product has a title and optional paragraphs to - describe it in detail.</para> - </callout> + <programlisting language="java">package sda.unittesting; - <callout arearefs="figure_intern_reference_xml_para"> - <para>A paragraph allows mixed content of text and - references to other parts of the document.</para> - </callout> +public class MyStack { + int [] data = new int[5]; + int numElements = 0; + + public void push(final int n) { + data[numElements] = n; + numElements++; + } + public int pop() { + numElements--; + return data[numElements]; + } + public int top() { + return data[numElements - 1]; + } + public boolean empty() { + return 0 == numElements; + } +}</programlisting> - <callout arearefs="figure_intern_reference_xml_att_product_id"> - <para>A <tag class="starttag">product</tag> node may have an - attribute <tag class="attribute">id</tag> with an unique - value within the document instance.</para> - </callout> - - <callout arearefs="figure_intern_reference_xml_att_link_ref"> - <para>A <tag class="starttag">link</tag> - <emphasis>must</emphasis> have an attribute <tag - class="attribute">ref</tag> with a value referring to an - element with a corresponding attribute value of type - <code>ID</code>.</para> - </callout> + <para>Readers being familiar with stacks will immediately notice a + deficiency in the above code: This 
stack is actually bounded. It only + allows us to store a maximum number of five integer values.</para> - <callout arearefs="figure_intern_reference_xml_define_id_hometrainer"> - <para>A product with unique <code>id</code> value - <code>homeTrainer</code>.</para> - </callout> + <para>The following implementation allows us to functionally test our + <classname>sda.unittesting.MyStack</classname> implementation with + respect to the usual stack behaviour:</para> - <callout arearefs="figure_intern_reference_xml_product_no_id"> - <para>A product without <code>id</code> value. Thus it may - not be referenced.</para> - </callout> + <programlisting language="java" linenumbering="numbered">package sda.unittesting; - <callout arearefs="figure_intern_reference_xml_define_ref1_hometrainer"> - <para>A reference to <emphasis>the</emphasis> element node - with a defined attribute of type <code>ID</code> and value - <code>homeTrainer</code>.</para> - </callout> - </calloutlist> +public class MyStackFuncTest { - <para>From this example we will now present the syntax and - validity constraints supplied by the XML specification:</para> + private static void assertTrue(boolean status) { + if (!status) { + throw new RuntimeException("Assert failed"); + } + } + public static void main(String[] args) { + final MyStack stack = new MyStack(); + // Test 1: A new MyStack instance should not contain any elements. + assertTrue(stack.empty()); - <glosslist> - <glossentry> - <glossterm><code>ID</code></glossterm> + // Test 2: Adding and removal + stack.push(4); + assertTrue (!stack.empty()); + assertTrue (4 == stack.top()); + assertTrue (4 == stack.pop()); + assertTrue (stack.empty()); - <glossdef> - <para><itemizedlist> - <listitem> - <para>Values of type <code>ID</code> - <emphasis>must</emphasis> match the <link - linkend="w3RecXml_NT-Name">Name</link> production. 
A - name <emphasis>must not</emphasis> appear more than - once in an XML document as a value of this type; - i.e., <code>ID</code> values - <emphasis>must</emphasis> uniquely identify the - elements which bear them. In a database context this - would be considered a <emphasis>primary key - constraint</emphasis>.</para> - </listitem> + // Test 3: Trying to add more than five values + stack.push(1);stack.push(2);stack.push(3);stack.push(4); + stack.push(5); + stack.push(6); + assertTrue(6 == stack.pop()); + } +}</programlisting> - <listitem> - <para>An element type <emphasis>must not</emphasis> - have more than one <code>ID</code> attribute - specified.</para> - </listitem> + <para>Execution yields a runtime exception which is due to the attempted + insert operation <code>stack.push(6)</code>:</para> - <listitem> - <para>An <code>ID</code> attribute - <emphasis>must</emphasis> have a declared default of - <code>#IMPLIED</code> or - <code>#REQUIRED</code>.</para> - </listitem> - </itemizedlist></para> - </glossdef> - </glossentry> + <programlisting>Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 5 + at sda.unittesting.MyStack.push(MyStack.java:8) + at sda.unittesting.MyStackFuncTest.main(MyStackFuncTest.java:20)</programlisting> - <glossentry> - <glossterm><code>IDREF</code></glossterm> + <para>The execution result is easy to understand since our + <classname>sda.unittesting.MyStack</classname> implementation can + store at most five values.</para> - <glossdef> - <para>Values of type <code>IDREF</code> MUST match the - <link linkend="w3RecXml_NT-Name">Name</link> production. - Each Name <emphasis>must</emphasis> match the value of an - <code>ID</code> attribute on some element in the XML - document; i.e. <code>IDREF</code> values - <emphasis>must</emphasis> match the value of some - <code>ID</code> attribute. 
In a database context this - would be considered a <emphasis>foreign key - constraint</emphasis>.</para> - </glossdef> - </glossentry> + <para>Our testing application is fine so far. It does however lack some + features:</para> - <glossentry> - <glossterm><code>IDREFS</code></glossterm> + <itemizedlist> + <listitem> + <para>automatic initialization before starting tests and + finalization at the end.</para> + </listitem> - <glossdef> - <para>Values of type <code>IDREFS</code> are sets of - <code>IDREF</code> values separated by spaces:</para> + <listitem> + <para>Our test is monolithic: We used comments to document different + tests. This knowledge is implicit and thus invisible to testing + frameworks. Test results (failure/success) cannot be assigned to + test 1, test 2 for example.</para> + </listitem> - <programlisting><!DOCTYPE gamelist [ -<!ELEMENT gamelist (game+, gameCategory+)> -<!ELEMENT game (#PCDATA)> -<!ATTLIST game id ID #REQUIRED> + <listitem> + <para>Aggregation and visualization of test results</para> + </listitem> -<!ELEMENT gameCategory (#PCDATA)> -<!ATTLIST gameCategory games IDREFS #REQUIRED> -]> -<gamelist> - <game id='chess'>Chess</game> - <game id='poker'>Poker</game> - <game id='bj'>Black Jack</game> - - <gameCategory games="poker bj">Card games</gameCategory> -</gamelist></programlisting> + <listitem> + <para>Dependencies between individual tests</para> + </listitem> - <para>The restriction to the term <emphasis - role="bold">set</emphasis> disallowing duplicates is - important. The following snippet containing two identical - references would be flagged as an error:</para> + <listitem> + <para>Ability to enable and disable tests according to a project's + maturity level. In our example test 3 might be disabled till an + unbounded implementation gets completed.</para> + </listitem> + </itemizedlist> - <programlisting>... 
-<gameCategory games="poker bj poker">Card games</gameCategory> -...</programlisting> - </glossdef> - </glossentry> - </glosslist> + <para>Testing frameworks like <productname + xlink:href="http://junit.org">Junit</productname> or <productname + xlink:href="http://testng.org">TestNG</productname> provide means for + efficient and flexible test organization. Using <productname + xlink:href="http://testng.org">TestNG</productname> our current test + application including only test 1 and test 2 reads:</para> - <qandaset role="exercise"> - <title>Legal attribute values</title> + <programlisting language="java">package sda.unittesting; - <qandadiv> - <qandaentry xml:id="example_legal_attribute_values"> - <question> - <para>Complete the following matrix. Enter a - <quote>+</quote> if the attribute value satisfies the - constraint being imposed by the attribute type and a - <quote>-</quote> otherwise.</para> +import org.testng.annotations.Test; - <informaltable xml:id="table_legal_attribute_matrix"> - <?dbhtml table-width="40%" ?> +public class MyStackTestSimple { - <?dbfo table-width="40%" ?> + final MyStack stack = new MyStack(); + + @Test + public void empty() { + assert(stack.empty()); + } + @Test + public void pushPopEmpty() { + assert (stack.empty()); + stack.push(4); + assert (!stack.empty()); + assert (4 == stack.top()); + assert (4 == stack.pop()); + assert (stack.empty()); + } +}</programlisting> - <tgroup cols="4"> - <colspec colwidth="2*"/> + <para>We notice the absence of a <function>main()</function> method. Our + testing framework uses the above code for test definitions. In contrast + to our homebrew solution the individual tests are now defined in a + machine readable fashion. This allows for sophisticated statistics. 
+ Executing inside <productname + xlink:href="http://testng.org">TestNG</productname> produces the + following results:</para> - <colspec colwidth="2*"/> + <programlisting>PASSED: empty +PASSED: pushPopEmpty - <colspec colwidth="2*"/> +=============================================== + Default test + Tests run: 2, Failures: 0, Skips: 0 +=============================================== - <colspec colwidth="2*"/> - <tbody> - <row> - <entry/> +=============================================== +Default suite +Total tests run: 2, Failures: 0, Skips: 0 +===============================================</programlisting> - <entry><code>CDATA</code></entry> + <para>Both tests run successfully. So why did we omit test 3 which is + bound to fail? We now add it to the test suite:</para> - <entry><code>NMTOKEN</code></entry> + <programlisting language="java">package sda.unittesting; +... +public class MyStackTestSimple1 { +... + @Test + public void empty() { + assert(stack.empty()); +... + + @Test + public void push6() { + stack.push(1); + stack.push(2); + stack.push(3); + stack.push(4); + stack.push(5); + stack.push(6); + assert (6 == stack.pop()); + } ...</programlisting> - <entry><code>ID</code></entry> - </row> + <para>As expected test 3 fails. But the result shows test 2 failing as + well:</para> - <row> - <entry><code>_foo</code></entry> + <programlisting>PASSED: empty +FAILED: push6 +java.lang.ArrayIndexOutOfBoundsException: 5 + at sda.unittesting.MyStack.push(MyStack.java:8) + at sda.unittesting.MyStackTestSimple1.push6(MyStackTestSimple1.java:30) + at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) + ... - <entry/> +FAILED: pushPopEmpty +java.lang.AssertionError + at sda.unittesting.MyStackTestSimple1.pushPopEmpty(MyStackTestSimple1.java:15) + at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) + ... 
- <entry/> + +=============================================== + Default test + Tests run: 3, Failures: 2, Skips: 0 +===============================================</programlisting> - <entry/> - </row> + <para>This unexpected result is due to the execution order of the three + individual tests. Within our class + <classname>sda.unittesting.MyStackTestSimple1</classname> the three + tests appear in the sequence test 1, test 2 and test 3. This however is + just the order of source code. The testing framework will not infer any + order and thus execute our three tests in <emphasis + role="bold">arbitrary</emphasis> order. The execution log shows the + actual order:</para> - <row> - <entry><code>too small</code></entry> + <orderedlist> + <listitem> + <para>Test <quote><code>empty</code></quote></para> + </listitem> - <entry/> + <listitem> + <para>Test <quote><code>push6</code></quote></para> + </listitem> - <entry/> + <listitem> + <para>Test <quote><code>pushPopEmpty</code></quote></para> + </listitem> + </orderedlist> - <entry/> - </row> + <para>So the second test executed, <quote><code>push6</code></quote>, raises an exception and leaves the stack + filled with the maximum possible five elements. Thus it is not empty and + the <quote><code>pushPopEmpty</code></quote> test fails as well.</para> - <row> - <entry><code>2three4</code></entry> + <para>If we want to avoid this type of error we may:</para> - <entry/> + <itemizedlist> + <listitem> + <para>Declare tests within separate (test class) definitions</para> + </listitem> - <entry/> + <listitem> + <para>Define dependencies like test X can only be executed after + test Y.</para> + </listitem> + </itemizedlist> - <entry/> - </row> + <para>The <productname + xlink:href="http://testng.org">TestNG</productname> framework offers a + feature which allows the definition of test groups and dependencies + between them. 
We use this feature to refine our test definition:</para> - <row> - <entry><code>-man</code></entry> + <programlisting language="java">package sda.unittesting; +... +public class MyStackTest { + ... + @Test (<emphasis role="bold">groups = "basic"</emphasis>) + public void empty() { + assert(stack.empty()); + } + @Test (<emphasis role="bold">groups = "basic"</emphasis>) + public void pushPopEmpty() { + ... + } + + @Test (<emphasis role="bold">dependsOnGroups = "basic"</emphasis>) + public void push6() { + ... + }</programlisting> - <entry/> + <para>The first two tests will now belong to the same test group + <quote>basic</quote>. The <emphasis role="bold"><code>dependsOnGroups = + "basic"</code></emphasis> declaration will guarantee that our + <code>push6</code> test will be launched as the last one. So we get the + expected result:</para> - <entry/> + <programlisting>PASSED: empty +PASSED: pushPopEmpty +FAILED: push6 +java.lang.ArrayIndexOutOfBoundsException: 5 + at sda.unittesting.MyStack.push(MyStack.java:8) + at sda.unittesting.MyStackTest.push6(MyStackTest.java:30) + at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) +... - <entry/> - </row> - <row> - <entry><code>two3four</code></entry> +=============================================== + Default test + Tests run: 3, Failures: 1, Skips: 0 +===============================================</programlisting> - <entry/> + <para>In fact the order between the first two tests might be critical as + well. The <quote><code>pushPopEmpty</code></quote> test leaves our stack + in an empty state. If this is not the case reversing the execution order + of <quote><code>pushPopEmpty</code></quote> and + <quote><code>empty</code></quote> would cause an error as well.</para> - <entry/> + <para>Programming <abbrev + xlink:href="http://en.wikipedia.org/wiki/Integrated_development_environment">IDE</abbrev>s + like eclipse provide elements for test result visualization. 
Our last + test gets summarized as:</para> - <entry/> - </row> + <screenshot> + <info> + <title><productname + xlink:href="http://testng.org">TestNG</productname> result + presentation in eclipse</title> + </info> - <row> - <entry><code>Uhh-oops</code></entry> + <mediaobject> + <imageobject> + <imagedata fileref="Ref/Screen/eclipseTestngResult.screen.png"/> + </imageobject> + </mediaobject> + </screenshot> + + <para>We can drill down from a result of type failure to its occurrence + within the corresponding code.</para> + </chapter> -<<<<<<< HEAD <chapter xml:id="xmlIntro"> <title>Introduction to XML</title> @@ -13937,11420 +14139,1317 @@ public class DbAccess { </section> </section> </chapter> -======= - <entry/> - <entry/> + <chapter xml:id="fo"> + <title>Generating printed output</title> - <entry/> - </row> + <titleabbrev>Print</titleabbrev> - <row> - <entry><code>a+b</code></entry> + <section xml:id="foIntro"> + <title>Online and print versions</title> - <entry/> + <titleabbrev>online / print</titleabbrev> - <entry/> + <para>We already learned how to transform XML documents into HTML by + means of a <abbrev + xlink:href="http://www.w3.org/Style/XSL">XSL</abbrev> style sheet + processor. In principle we may create printed output by using a HTML + Browser's print function. However the result will not meet reasonable + typographical standards. A list of commonly required features for + printed output includes:</para> - <entry/> - </row> + <variablelist> + <varlistentry> + <term>Line breaks</term> - <row> - <entry><code>&</code></entry> + <listitem> + <para>Text paragraphs have to be divided into lines. To achieve + best results the processor must implement the hyphenation rules + of the language in question in order to automatically hyphenate + long words. 
This is especially important for text columns of + limited width as appearing in newspapers.</para> + </listitem> + </varlistentry> - <entry/> + <varlistentry> + <term>Page breaks</term> - <entry/> + <listitem> + <para>Since printed pages are limited in height the content has + to be broken into pages. This may be difficult to + achieve:</para> - <entry/> - </row> - </tbody> - </tgroup> - </informaltable> - </question> + <itemizedlist> + <listitem> + <para>Large images being indivisible may have to be deferred + to the following page leaving large amounts of empty + space.</para> + </listitem> - <answer> - <para>We may use the following code to ask a - parser:</para> + <listitem> + <para>Long tables may have to be subdivided into smaller + blocks. Thus it may be required to define sets of additional + footers like <quote>to be continued on the next page</quote> + and additional table headers containing column descriptions + on subsequent pages.</para> + </listitem> + </itemizedlist> + </listitem> + </varlistentry> - <programlisting><?xml version="1.0" encoding="UTF-8"?> -<!DOCTYPE doc [ -<!ELEMENT doc (testentry)*> -<!ELEMENT testentry EMPTY> -<!ATTLIST testentry - cd CDATA #REQUIRED - nm NMTOKEN #REQUIRED - id ID #REQUIRED - > -]> -<doc> - <testentry cd="_foo" nm="_foo" id="_foo"/> - <testentry cd="too small" nm="too small" id="too small"/> - <testentry cd="2three4" nm="2three4" id="2three4"/> - <testentry cd="-man" nm="-man" id="-man"/> - <testentry cd="two3four" nm="two3four" id="two3four"/> - <testentry cd="Uhh-oops" nm="Uhh-oops" id="Uhh-oops"/> - <testentry cd="a+b" nm="a+b" id="a+b"/> -</doc></programlisting> + <varlistentry> + <term>Page references</term> - <para>This yields:</para> + <listitem> + <para>Document internal references via <link + xlink:href="http://www.w3.org/TR/xml#id">ID</link> / <link + xlink:href="http://www.w3.org/TR/xml#idref">IDREF</link> pairs + may be represented as page references like <quote>see page + 32</quote>.</para> + 
</listitem> + </varlistentry> - <table xml:id="exerciseAtttypeLegalValue"> - <title>Legal attribute values</title> + <varlistentry> + <term>Left and right pages</term> - <?dbhtml table-width="40%" ?> + <listitem> + <para>Books usually have a different layout for + <quote>left</quote> and <quote>right</quote> pages. Page numbers + usually appear on the left side of a <quote>left</quote> page + and vice versa.</para> - <?dbfo table-width="40%" ?> + <para>Very often the head of each page contains additional + information e.g. a chapter's name on each <quote>left</quote> + page head and the actual section's name on each + <quote>right</quote> page's head.</para> - <tgroup cols="4"> - <colspec colwidth="2*"/> - - <colspec colwidth="2*"/> + <para>In addition chapters usually start on a + <quote>right</quote> page. Sometimes a chapter's starting page + has special layout features e.g. a missing description in the + page's head which will only be given on subsequent pages.</para> + </listitem> + </varlistentry> - <colspec colwidth="2*"/> + <varlistentry> + <term>Footnotes</term> - <colspec colwidth="2*"/> + <listitem> + <para>Footnotes have to be numbered on a per page basis and have + to appear on the current page.</para> + </listitem> + </varlistentry> + </variablelist> + </section> - <tbody> - <row> - <entry/> + <section xml:id="foStart"> + <title>A simple <abbrev + xlink:href="http://www.w3.org/TR/2001/REC-xsl-20011015/slice6.html#fo-section">FO</abbrev> + document</title> - <entry><code>CDATA</code></entry> + <titleabbrev>Simple <abbrev + xlink:href="http://www.w3.org/TR/2001/REC-xsl-20011015/slice6.html#fo-section">FO</abbrev></titleabbrev> - <entry><code>NMTOKEN</code></entry> + <para>A renderer for printed output from XML content also needs + instructions how to format the different elements. 
A common way to + define these formatting properties is by using the <emphasis>Formatting + Objects</emphasis> (<abbrev + xlink:href="http://www.w3.org/TR/2001/REC-xsl-20011015/slice6.html#fo-section">FO</abbrev>) + standard. <abbrev + xlink:href="http://www.w3.org/TR/2001/REC-xsl-20011015/slice6.html#fo-section">FO</abbrev> + documents may be compared to HTML. An HTML document has to be rendered + by a piece of software called a browser in order to be viewed as an + image. Likewise <abbrev + xlink:href="http://www.w3.org/TR/2001/REC-xsl-20011015/slice6.html#fo-section">FO</abbrev> + documents have to be rendered by a piece of software called a + formatting objects processor which typically yields PostScript or PDF + output. As a starting point we take a simple example:</para> - <entry><code>ID</code></entry> - </row> + <figure xml:id="foHelloWorld"> + <title>The simplest <abbrev + xlink:href="http://www.w3.org/TR/2001/REC-xsl-20011015/slice6.html#fo-section">FO</abbrev> + document</title> - <row> - <entry><code>_foo</code></entry> + <programlisting><?xml version="1.0" encoding="utf-8"?> +<fo:root xmlns:fo="http://www.w3.org/1999/XSL/Format"> - <entry>+</entry> + <fo:layout-master-set> + <!-- Define a simple page layout --> + <fo:simple-page-master master-name="simplePageLayout" + page-width="60mm" page-height="100mm"> + <fo:region-body/> + </fo:simple-page-master> + </fo:layout-master-set> + <!-- Print a set of pages using the previously defined layout --> + <fo:page-sequence master-reference="simplePageLayout"> + <fo:flow flow-name="xsl-region-body"> + <emphasis role="bold"><fo:block>Hello, World ...</fo:block></emphasis> + </fo:flow> + </fo:page-sequence> +</fo:root></programlisting> + </figure> - <entry>+</entry> + <para>PDF generation is initiated by executing a <abbrev + xlink:href="http://www.w3.org/TR/2001/REC-xsl-20011015/slice6.html#fo-section">FO</abbrev> + processor.
At the MI department the script <code>fo2pdf</code> invokes + <orgname>RenderX</orgname>'s <productname + xlink:href="http://www.renderx.com">xep</productname> + processor:</para> - <entry>+</entry> - </row> + <programlisting>fo2pdf -fo hello.fo -pdf hello.pdf</programlisting> - <row> - <entry><code>too small</code></entry> + <para>This creates a PDF file which may be printed or previewed by + e.g. <productname + xlink:href="http://www.adobe.com">Adobe</productname>'s Acrobat Reader + or evince under Linux. For a list of command line options see + <productname xlink:href="http://www.renderx.com/reference.html">xep's + documentation</productname>.</para> + </section> - <entry>+</entry> + <section xml:id="layoutParam"> + <title>Page layout</title> - <entry>-</entry> + <para>The result of our <quote>Hello, World ...</quote> code is + not very impressive. In order to develop more elaborate examples we + have to understand the underlying layout model being defined in a + <link + xlink:href="http://www.w3.org/TR/xsl/#fo_simple-page-master">fo:simple-page-master</link> + element. First of all <abbrev + xlink:href="http://www.w3.org/TR/2001/REC-xsl-20011015/slice6.html#fo-section">FO</abbrev> + allows us to subdivide a physical page into different regions:</para> - <entry>-</entry> - </row> + <figure xml:id="foRegionList"> + <title>Regions being defined in a page.</title> - <row> - <entry><code>2three4</code></entry> + <mediaobject> + <imageobject> + <imagedata fileref="Ref/Fig/regions.fig"/> + </imageobject> + </mediaobject> + </figure> - <entry>+</entry> + <para>The most important area in this model is denoted by <link + xlink:href="http://www.w3.org/TR/xsl/#fo_region-body">fo:region-body</link>. + Other regions like <link + xlink:href="http://www.w3.org/TR/xsl/#fo_region-before">fo:region-before</link> + are typically used as containers for meta information such as chapter + headings and page numbering.
We take a closer look at the <link + xlink:href="http://www.w3.org/TR/xsl/#fo_region-body">fo:region-body</link> + area and supply an example of parameterization:</para> - <entry>+</entry> + <figure xml:id="foParamRegBody"> + <title>A complete <abbrev + xlink:href="http://www.w3.org/TR/2001/REC-xsl-20011015/slice6.html#fo-section">FO</abbrev> + parameterization of a physical page and the <link + xlink:href="http://www.w3.org/TR/xsl/#fo_region-body">fo:region-body</link>.</title> - <entry>-</entry> - </row> + <programlisting><?xml version="1.0" encoding="utf-8"?> +<fo:root xmlns:fo="http://www.w3.org/1999/XSL/Format" + font-size="6pt"> - <row> - <entry><code>-man</code></entry> + <fo:layout-master-set> <co xml:id="programlisting_fobodyreg_masterset"/> + <fo:simple-page-master master-name="<emphasis role="bold">simplePageLayout</emphasis>" <co + xml:id="programlisting_fobodyreg_simplepagelayout"/> + page-width = "50mm" page-height = "80mm" + margin-top = "5mm" margin-bottom = "20mm" + margin-left = "5mm" margin-right = "10mm"> - <entry>+</entry> + <fo:region-body <co xml:id="programlisting_fobodyreg_regionbody"/> + margin-top = "10mm" margin-bottom = "5mm" + margin-left = "10mm" margin-right = "5mm"/> + </fo:simple-page-master> + </fo:layout-master-set> - <entry>+</entry> + <fo:page-sequence master-reference="<emphasis role="bold">simplePageLayout</emphasis>"> <co + xml:id="programlisting_fobodyreg_pagesequence"/> + <fo:flow flow-name="xsl-region-body"> <co + xml:id="programlisting_fobodyreg_flow"/> + <fo:block space-after="2mm">Dumb text .. dumb text.</fo:block> <co + xml:id="programlisting_fobodyreg_block"/> + <fo:block space-after="2mm">Dumb text .. dumb text.</fo:block> <coref + linkend="programlisting_fobodyreg_block"/> + <fo:block space-after="2mm">Dumb text .. dumb text.</fo:block> <coref + linkend="programlisting_fobodyreg_block"/> + <fo:block space-after="2mm">Dumb text ..
dumb text.</fo:block> <coref + linkend="programlisting_fobodyreg_block"/> + </fo:flow> + </fo:page-sequence> +</fo:root></programlisting> + </figure> - <entry>-</entry> - </row> + <calloutlist> + <callout arearefs="programlisting_fobodyreg_masterset"> + <para>As the name suggests multiple layout definitions can appear + here. In this example only one layout is defined.</para> + </callout> - <row> - <entry><code>two3four</code></entry> + <callout arearefs="programlisting_fobodyreg_simplepagelayout"> + <para>Each layout definition carries a key attribute master-name + being unique with respect to all defined layouts appearing in + <emphasis>the</emphasis> <tag + class="starttag">fo:layout-master-set</tag>. We may thus call it a + <emphasis>primary key</emphasis> attribute. The current layout + definition's key has the value <code>simplePageLayout</code>. The + length specifications appearing here are visualized in <xref + linkend="paramRegBodyVisul"/> and correspond to the white + rectangle.</para> + </callout> - <entry>+</entry> + <callout arearefs="programlisting_fobodyreg_regionbody"> + <para>Each layout definition <emphasis>must</emphasis> have a + region body being the region in which the document's main text flow + will appear. A layout definition <emphasis>may</emphasis> also + define top, bottom and side regions as we will see <link + linkend="paramHeadFoot">later</link>. The body region is shown + with a pink background in <xref + linkend="paramRegBodyVisul"/>.</para> + </callout> - <entry>+</entry> + <callout arearefs="programlisting_fobodyreg_pagesequence"> + <para>A <abbrev + xlink:href="http://www.w3.org/TR/2001/REC-xsl-20011015/slice6.html#fo-section">FO</abbrev> + document may have multiple page sequences, for example one per + chapter of a book. It <emphasis>must</emphasis> reference an + <emphasis>existing</emphasis> layout definition via its + <code>master-reference</code> attribute.
So we may regard this + attribute as a foreign key targeting the set of all defined layout + definitions.</para> + </callout> - <entry>+</entry> - </row> + <callout arearefs="programlisting_fobodyreg_flow"> + <para>A flow allows us to define in which region output shall + appear. In the current example only one layout containing one + region of type body definition being able to receive text output + exists.</para> + </callout> - <row> - <entry><code>Uhh-oops</code></entry> + <callout arearefs="programlisting_fobodyreg_block"> + <para>A <tag class="starttag">fo:block</tag> element may be + compared to a paragraph element <tag class="starttag">p</tag> in + HTML. The attribute <link + xlink:href="http://www.w3.org/TR/xsl/#space-after">space-after</link>="2mm" + adds a space of two mm after each <link + xlink:href="http://www.w3.org/TR/xsl/#fo_block">fo:block</link> + container.</para> + </callout> + </calloutlist> - <entry>+</entry> + <para>The result looks like:</para> - <entry>+</entry> + <figure xml:id="paramRegBodyVisul"> + <title>Parameterizing page- and region view port. All length + dimensions are in mm.</title> - <entry>+</entry> - </row> + <mediaobject> + <imageobject> + <imagedata fileref="Ref/Fig/overlay.fig"/> + </imageobject> + </mediaobject> + </figure> + </section> - <row> - <entry><code>a+b</code></entry> + <section xml:id="headFoot"> + <title>Headers and footers</title> - <entry>+</entry> + <titleabbrev>Header/footer</titleabbrev> - <entry>-</entry> + <para>Referring to <xref linkend="foRegionList"/> we now want to add + fixed headers and footers frequently being used for page numbers. In a + textbook each page might have the actual chapter's name in its header. + This name should not change as long as the text below <link + xlink:href="http://www.w3.org/TR/xsl/#fo_region-body">fo:region-body</link> + still belongs to the same chapter. 
In <abbrev + xlink:href="http://www.w3.org/TR/2001/REC-xsl-20011015/slice6.html#fo-section">FO</abbrev> + this is achieved by:</para> - <entry>-</entry> - </row> + <itemizedlist> + <listitem> + <para>Encapsulating each chapter's content in a <link + xlink:href="http://www.w3.org/TR/xsl/#fo_page-sequence">fo:page-sequence</link> + of its own.</para> + </listitem> - <row> - <entry><code>&</code></entry> + <listitem> + <para>Defining the desired header text below <link + xlink:href="http://www.w3.org/TR/xsl/#fo_static-content">fo:static-content</link> + in the area defined by <link + xlink:href="http://www.w3.org/TR/xsl/#fo_region-before">fo:region-before</link>.</para> + </listitem> + </itemizedlist> - <entry>-</entry> + <para>The notion <link + xlink:href="http://www.w3.org/TR/xsl/#fo_static-content">fo:static-content</link> + refers to the fact that the content is constant (static) within the + given page sequence. The new version reads:</para> - <entry>-</entry> + <figure xml:id="paramHeadFoot"> + <title>Parameterizing header and footer.</title> - <entry>-</entry> - </row> - </tbody> - </tgroup> - </table> - </answer> - </qandaentry> - </qandadiv> - </qandaset> + <programlisting><?xml version="1.0" encoding="utf-8"?> +<fo:root xmlns:fo="http://www.w3.org/1999/XSL/Format" + font-size="6pt"> + + <fo:layout-master-set> + <fo:simple-page-master master-name="simplePageLayout" + page-width = "50mm" page-height = "80mm" + margin-top = "5mm" margin-bottom = "20mm" + margin-left = "5mm" margin-right = "10mm"> + + <fo:region-body margin-top = "10mm" margin-bottom = "5mm" <co + xml:id="programlisting_head_foot_bodydef"/> + margin-left = "10mm" margin-right = "5mm"/> + + <fo:region-before extent="5mm"/> <co + xml:id="programlisting_head_foot_beforedef"/> + <fo:region-after extent="5mm"/> <co + xml:id="programlisting_head_foot_afterdef"/> + + </fo:simple-page-master> + </fo:layout-master-set> + + <fo:page-sequence master-reference="simplePageLayout"> - <qandaset 
role="exercise"> - <title>book.dtd and internal references</title> + <fo:static-content flow-name="xsl-region-before"> <co + xml:id="programlisting_head_foot_beforeflow"/> + <fo:block + font-weight="bold" + font-size="8pt">Headertext</fo:block> + </fo:static-content> + + <fo:static-content flow-name="xsl-region-after"> <co + xml:id="programlisting_head_foot_afterflow"/> + <fo:block> + <fo:page-number/> + </fo:block> + </fo:static-content> + + <fo:flow flow-name="xsl-region-body"> + <fo:block space-after="8mm">Dumb text .. dumb text.</fo:block> + <fo:block space-after="8mm">Dumb text .. dumb text.</fo:block> + <fo:block space-after="8mm">More text .. more text.</fo:block> + <fo:block space-after="8mm">More text .. more text.</fo:block> + <fo:block space-after="8mm">More text .. more text.</fo:block> + </fo:flow> + </fo:page-sequence> +</fo:root></programlisting> + </figure> - <qandadiv> - <qandaentry xml:id="example_book.dtd_v5"> - <question> - <para>We want to extent our DTD from <xref - linkend="example_book.dtd_v4"/> to allow document - internal references by:</para> + <calloutlist> + <callout arearefs="programlisting_head_foot_bodydef"> + <para>Defining the body region.</para> + </callout> - <itemizedlist> - <listitem> - <para>Allowing each <tag - class="starttag">chapter</tag>, <tag - class="starttag">para</tag> and <tag - class="starttag">itemizedlist</tag> to become - reference targets.</para> - </listitem> + <callout arearefs="programlisting_head_foot_beforedef programlisting_head_foot_afterdef"> + <para>Defining two regions at the top and bottom of each page. The + <code>extent</code> attribute denotes the height of these regions. + <emphasis>Caveat</emphasis>: The attribute <code>extent</code>'s + value gets subtracted from the <code>margin-top</code> or + <code>margin-bottom</code> value being defined in the + corresponding <tag class="starttag">fo:region-body</tag> element. 
+ So if we consider for example the <tag>fo:region-before</tag> we + have to obey:</para> - <listitem> - <para>Extending the element <tag - class="element">para</tag>'s mixed content model by - a new element <tag class="element">link</tag> with - an attribute <tag class="attribute">linkend</tag> - being a reference to a target.</para> - </listitem> - </itemizedlist> - </question> + <para>extent <= margin-top</para> - <answer> - <para>We extend our DTD:</para> + <para>Otherwise we may not even see any output.</para> + </callout> - <programlisting><!ELEMENT book (title, chapter+)> -<!ATTLIST book lang (en|fr|de|it|es) #IMPLIED > -<!ELEMENT chapter (title, (para|itemizedlist)+)> -<!ATTLIST chapter - id <co xml:id="progamlisting_book_v5_chapter_id"/> ID #IMPLIED > -<!ELEMENT title (#PCDATA)> -<!ELEMENT para (#PCDATA|emphasis|link <co - xml:id="progamlisting_book_v5_mixed_link"/>)*> -<!ATTLIST para - id <co xml:id="progamlisting_book_v5_para_id"/> ID #IMPLIED > -<!ELEMENT emphasis (#PCDATA)> -<!ELEMENT link (#PCDATA) <co xml:id="progamlisting_book_v5_link"/>> -<!ATTLIST link - linkend <co xml:id="progamlisting_book_v5_link_linkend"/> IDREF #REQUIRED > + <callout arearefs="programlisting_head_foot_beforeflow"> + <para>A <code>fo:static-content</code> denotes text portions which + are decoupled from the <quote>usual</quote> text flow. For example + as a book's chapter advances over multiple pages we expect the + constant chapter's title to appear on top of each page. In the + current example the static string <code>Headertext</code> will + appear on each page's top for the whole <tag + class="starttag">fo:page-sequence</tag> in which it is defined. 
Notice the <code>flow-name="xsl-region-before"</code> reference to + the region being defined in <coref + linkend="programlisting_head_foot_beforedef"/>.</para> + </callout> -<!ELEMENT itemizedlist (listitem+)> -<!ATTLIST itemizedlist - id <co xml:id="progamlisting_book_v5_itemizedList_id"/> ID #IMPLIED > -<!ELEMENT listitem ((para|itemizedlist)+)></programlisting> + <callout arearefs="programlisting_head_foot_afterflow"> + <para>We do the same here for the page's footer. Instead of static + text we output <tag>fo:page-number</tag> yielding the current + page's number.</para> - <calloutlist> - <callout arch="" - arearefs="progamlisting_book_v5_chapter_id progamlisting_book_v5_para_id progamlisting_book_v5_itemizedList_id"> - <para>Defining an attribute <tag - class="attribute">id</tag> of type <code>ID</code> - for the elements <tag class="element">chapter</tag>, - <tag class="element">para</tag> and <tag - class="element">itemizedList</tag>. This enables an - author to define internal reference targets.</para> - </callout> + <para>This time <code>flow-name="xsl-region-after"</code> + references the region definition in <coref + linkend="programlisting_head_foot_afterdef"/>. Actually the + attribute <code>flow-name</code> is restricted to the following + five values corresponding to all possible region definitions + within a layout:</para> - <callout arearefs="progamlisting_book_v5_mixed_link"> - <para>A link is part of the element <tag - class="element">para</tag>'s mixed content model. - Thus an author may define internal references along - with ordinary text.</para> - </callout> + <informaltable> + <?dbhtml table-width="50%" ?> - <callout arearefs="progamlisting_book_v5_link"> - <para>Like in HTML a link may contain text.
If - converted to HTML the formatting expectation is a - hypertext link.</para> - </callout> + <?dbfo table-width="50%" ?> - <callout arearefs="progamlisting_book_v5_link_linkend"> - <para>The attribute <tag - class="attribute">linkend</tag> holds the reference - to an internal target being either a <tag - class="element">chapter</tag>, a <tag - class="element">para</tag> or an <tag - class="element">itemizedList</tag>.</para> - </callout> - </calloutlist> - </answer> - </qandaentry> - </qandadiv> - </qandaset> - </section> - </section> + <tgroup cols="2"> + <colspec align="left" colwidth="1*"/> - <section xml:id="section_attribute_default"> - <title>Attribute default values</title> + <colspec align="left" colwidth="1*"/> - <para>We have implicitly introduced attribute default values - already. The formal production rule reads:</para> + <tbody> + <row> + <entry><tag class="starttag">fo:region-body</tag></entry> - <productionset> - <title>Attribute Defaults</title> + <entry>xsl-region-body</entry> + </row> - <production xml:id="w3RecXml_NT-DefaultDecl"> - <lhs>DefaultDecl</lhs> + <row> + <entry><tag + class="starttag">fo:region-before</tag></entry> - <rhs>'#REQUIRED' | '#IMPLIED' | (('#FIXED' <nonterminal - def="#w3RecXml_NT-S">S</nonterminal>)? <nonterminal - def="#w3RecXml_NT-AttValue">AttValue</nonterminal>)</rhs> - </production> - </productionset> + <entry>xsl-region-before</entry> + </row> - <para>We have already introduced <code>#REQUIRED</code> and - <code>#IMPLIED</code> describing attribute values that - <emphasis>must</emphasis> be specified and attribute values that - <emphasis>may</emphasis> be specified. The Attribute type - declaration <code>#FIXED</code> is typically used during DTD - development and rarely for production systems. 
In a nutshell it - enables a DTD author to define an attribute with a fixed value - that cannot be overwritten by an author in a document - instance.</para> + <row> + <entry><tag class="starttag">fo:region-after</tag></entry> - <figure xml:id="attTypeFixed"> - <title>The attribute type <code>#FIXED</code></title> + <entry>xsl-region-after</entry> + </row> - <programlisting xml:id="figure_fixed"><?xml version="1.0" encoding="UTF-8"?> -<!DOCTYPE configuration [ -<!ELEMENT configuration (property*)> -<!ELEMENT property EMPTY> -<!ATTLIST property - version CDATA #FIXED "3.4" <co xml:id="programmlisting_fixed_attfixed"/> - key NMTOKEN #REQUIRED - value CDATA #IMPLIED > -]> -<configuration> - <property key="user" value="admin"/> <co - xml:id="programmlisting_fixed_unset"/> - <property key="password" value="verySecret" version="3.4" <co - xml:id="programmlisting_fixed_correctlyset"/> /> + <row> + <entry><tag class="starttag">fo:region-start</tag></entry> - <!-- Ooops! --> - <property key="ldapHost" value="141.62.1.5" version="3.7" <co - xml:id="programmlisting_fixed_illdefined"/>/> -</configuration></programlisting> - </figure> + <entry>xsl-region-start</entry> + </row> - <calloutlist> - <callout arearefs="programmlisting_fixed_attfixed"> - <para>For each <tag class="element">property</tag> node the - attribute <tag class="attribute">version</tag> with value <tag - class="attvalue">3.4</tag> is automatically defined.</para> - </callout> + <row> + <entry><tag class="starttag">fo:region-end</tag></entry> - <callout arearefs="programmlisting_fixed_unset"> - <para>The attribute <tag class="attribute">version</tag> is - not explicitly set. 
Any software acting on the document will - see the value <tag class="attvalue">3.4</tag> though.</para> - </callout> + <entry>xsl-region-end</entry> + </row> + </tbody> + </tgroup> + </informaltable> + </callout> + </calloutlist> - <callout arearefs="programmlisting_fixed_correctlyset"> - <para>The attribute <tag class="attribute">version</tag> is - explicitly set to the value <tag class="attvalue">3.4</tag> - being defined in the DTD.</para> - </callout> + <para>This results in two pages with page numbers 1 and 2:</para> - <callout arearefs="programmlisting_fixed_illdefined"> - <para>The attribute <tag class="attribute">version</tag> is - explicitly set to the value <tag class="attvalue">3.7</tag> - differing from the value <tag class="attvalue">3.4</tag> being - defined in the DTD. A validating parser will complain:</para> + <mediaobject> + <imageobject> + <imagedata fileref="Ref/Fig/headfoot.fig"/> + </imageobject> + </mediaobject> - <programlisting><errortext>[Xerces] Attribute "version" with value "3.7" must have a value of "3.4".</errortext></programlisting> - </callout> - </calloutlist> + <para>The free chapter from <xref linkend="bibHarold04"/> book + contains additional information on extended <link + xlink:href="http://www.cafeconleche.org/books/bible2/chapters/ch18.html#d1e2250">layout + definitions</link>. 
The <orgname + xlink:href="http://w3.org">W3C</orgname> as the holder of the FO + standard defines the elements <link + xlink:href="http://www.w3.org/TR/xsl/#fo_layout-master-set">fo:layout-master-set</link>, + <link + xlink:href="http://www.w3.org/TR/xsl/#fo_simple-page-master">fo:simple-page-master</link> + and <link + xlink:href="http://www.w3.org/TR/xsl/#fo_page-sequence">fo:page-sequence</link>.</para> + </section> - <para>Next we discuss attributes with default value - definitions:</para> + <section xml:id="foContainer"> + <title>Important Objects</title> - <figure xml:id="attDefDefault"> - <title xml:id="figure_attribute_default">Attribute definitions - with default values</title> + <section xml:id="fo_block"> + <title><code>fo:block</code></title> - <programlisting><?xml version="1.0" encoding="UTF-8"?> -<!DOCTYPE doc [ -<!ELEMENT doc (para*)> -<!ELEMENT para (#PCDATA)> -<!ATTLIST para - language CDATA "english" <co - xml:id="programlisting_attribute_default_language"/>> -]> -<doc> - <para language="french" <co - xml:id="programlisting_attribute_default_french"/>>Une maison</para> - <para <co xml:id="programlisting_attribute_default_implicit"/>>A house</para> - <para language="english" <co - xml:id="programlisting_attribute_default_defaultoverride"/>>Another house</para> -</doc></programlisting> - </figure> + <para>The FO standard borrows a lot from the CSS standard. Most + formatting objects may have <link + xlink:href="http://www.w3.org/TR/xsl/#section-N19349-Description-of-Property-Groups">CSS + like properties</link> with similar semantics; some properties have + been added.
We take a <link + xlink:href="http://www.w3.org/TR/xsl/#fo_block">fo:block</link> + container as an example:</para> - <calloutlist> - <callout arearefs="programlisting_attribute_default_language"> - <para>Declaration of an attribute <tag - class="attribute">language</tag> with default value <tag - class="attvalue">english</tag>.</para> - </callout> + <figure xml:id="blockInline"> + <title>A <link + xlink:href="http://www.w3.org/TR/xsl/#fo_block">fo:block</link> + with a <link + xlink:href="http://www.w3.org/TR/xsl/#fo_inline">fo:inline</link> + descendant.</title> - <callout arearefs="programlisting_attribute_default_french"> - <para>The attribute value may be overridden as long as the - content conforms to the <code>CDATA</code> attribute - type.</para> - </callout> + <mediaobject> + <imageobject> + <imagedata fileref="Ref/Fig/blockprop.fo.pdf"/> + </imageobject> + </mediaobject> - <callout arearefs="programlisting_attribute_default_implicit"> - <para>A <tag class="starttag">para</tag> node with implicit - value <tag class="attribute">language="english"</tag>.</para> - </callout> + <programlisting>... +<fo:block font-weight='bold' + border-bottom-style='dashed' + border-style='solid' + border='1mm'>A lot of attributes and <fo:inline background-color='black' + color='white'>inverted</fo:inline> text.</fo:block> ...</programlisting> + </figure> - <callout arearefs="programlisting_attribute_default_defaultoverride"> - <para>Explicitly setting the DTD default value.</para> - </callout> - </calloutlist> + <para>The <link + xlink:href="http://www.w3.org/TR/xsl/#fo_inline">fo:inline</link> + descendant serves as a means to change the <quote>current</quote> + property set. 
In HTML/CSS this may be achieved by using the + <code>SPAN</code> tag:</para> - <para>So the difference in declaring an attribute value either - <code>#FIXED</code> or with an ordinary default is the fact, that - the latter may be overridden with a value differing from the - default being supplied in the DTD.</para> - </section> - </section> + <programlisting><!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"> +<html> + <head> + <title>Blocks/spans and CSS</title> + </head> + <body> + <h1>Blocks/spans and CSS</h1> + <p style="font-weight: bold; border: 1mm; + border-style: solid; border-bottom-style: dashed;" + >A lot of attributes and + <span style="color: white;background-color: black;" + >inverted</span> text.</p> + </body> +</html></programlisting> - <section xml:id="catalogs"> - <title>Catalogs for <abbrev - xlink:href="http://en.wikipedia.org/wiki/Document_Type_Declaration">DTD</abbrev>'s</title> + <para>Though being encapsulated in an attribute <code>style</code> + we find a one-to-one correspondence between FO and CSS in this case. + The HTML rendering works as expected:<mediaobject> + <imageobject> + <imagedata fileref="Ref/Screen/mozparaspancss.screen.png"/> + </imageobject> + </mediaobject></para> + </section> - <para>Till now our method to reference a DTD from a document - instance is via a SYSTEM reference:</para> + <section xml:id="fo_list"> + <title>Lists</title> - <programlisting><!DOCTYPE book SYSTEM "ftp://someserver.com/book.dtd"> ...</programlisting> + <para>The simplest lists are unlabeled (itemized) lists as + expressed by the <code>UL</code>/<code>LI</code> tags in HTML. + FO allows a much more detailed parametrization regarding indents and + distances between labels and item content.
Relevant elements are + <link + xlink:href="http://www.w3.org/TR/xsl/#fo_list-block">fo:list-block</link>, + <link + xlink:href="http://www.w3.org/TR/xsl/#fo_list-item">fo:list-item</link> + and <link + xlink:href="http://www.w3.org/TR/xsl/#fo_list-item-body">fo:list-item-body</link>. + The drawback is a more complex setup for <quote>default</quote> + lists:</para> - <para>As mentioned before the DTD may be accessed from the file - system or referenced by different protocols like http. As an example - we consider the XML version of the hypertext markup language - HTML:</para> + <figure xml:id="listItemize"> + <title>An itemized list and result.</title> - <figure xml:id="figure_xhtmlbase"> - <title>A simple XHTML document</title> + <programlisting>... +<fo:list-block + provisional-distance-between-starts="2mm"> + <fo:list-item> + <fo:list-item-label end-indent="label-end()"> + <fo:block>&#8226;</fo:block> + </fo:list-item-label> + <fo:list-item-body start-indent="body-start()"> + <fo:block>Flowers</fo:block> + </fo:list-item-body> + </fo:list-item> + + <fo:list-item> + <fo:list-item-label end-indent="label-end()"> + <fo:block>&#8226;</fo:block> + </fo:list-item-label> + <fo:list-item-body start-indent="body-start()"> + <fo:block>Animals</fo:block> + </fo:list-item-body> + </fo:list-item> +</fo:list-block> ...</programlisting> - <programlisting><?xml version="1.0" encoding="UTF-8"?> -<!DOCTYPE html SYSTEM "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"> -<html xmlns="http://www.w3.org/1999/xhtml"> - <head><title>A first start</title></head> - <body> - <h1>A first start</h1> - <p>This is a very simple document</p> - </body> -</html></programlisting> + <mediaobject> + <imageobject> + <imagedata align="left" fileref="Ref/Fig/itemize.fo.pdf"/> + </imageobject> + </mediaobject> </figure> - <para>In this example the DTD can be accessed via http. This seems - to be perfect: A parser reads the document and retrieves referenced - resources. 
But what happens if the HTTP server - <code>www.w3.org</code> is inaccessible? Or if someone wants to work - offline or in a company's intra net with restricted access policies? - In all these cases it is desirable to have a local copy of the DTD - to become independent from a remote server. The most simple solution - is a copy the complete DTD to the host's local file system:</para> + <para>The result looks somewhat primitive in relation to the amount + of source code it necessitates. The power of these constructs shows + up when trying to format nested lists of possibly different types + like enumerations or definition lists under the requirement of + typographical excellence. More complex examples are presented in + <link + xlink:href="http://www.cafeconleche.org/books/bible2/chapters/ch18.html#d1e4979">Xmlbible + book</link> of <xref linkend="bibHarold04"/>.</para> + </section> - <programlisting><?xml version="1.0" encoding="UTF-8"?> -<!DOCTYPE html SYSTEM "C:\mystuff\xhtml1-strict.dtd"> -<html xmlns="http://www.w3.org/1999/xhtml"> -...</programlisting> + <section xml:id="leaderRule"> + <title>Leaders and rules</title> - <para>This seems to solve the problem of resources being - unavailable. But what about interoperability? If we want to exchange - documents with other people we cannot expect our partners to supply - the DTD at the same location in the file system. For this reason XML - supports the concept of <emphasis>public identifiers</emphasis>. 
We - extend the current example:</para> - - <figure xml:id="figure_xhtml_public"> - <title>A XHTML document insversion 2 oftance with public and - system identifier</title> - - <programlisting><?xml version="1.0" encoding="UTF-8"?> -<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" - "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"> -<html xmlns="http://www.w3.org/1999/xhtml"> - <head><title>A first start</title></head> - <body> - <h1>A first start</h1> - <p>This is a very simple document</p> - </body> -</html></programlisting> - </figure> + <titleabbrev>Leaders/rules</titleabbrev> - <para>The String <quote>-//W3C//DTD XHTML 1.0 Strict//EN</quote> - should uniquely identify the given DTD. Thus a different XHTML DTD - version or even a different XML DTD <emphasis>must have</emphasis> a - different public identifier. Note that in the above example a - <code>SYSTEM</code> identifier - <code>http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd</code> must - still be present although the keyword <code>SYSTEM</code> is - absent.</para> + <para>Sometimes adjustable horizontal space between two neighbouring + objects has to be filled e.g. in a book's table of contents. The + <link + xlink:href="http://www.w3.org/TR/xsl/#fo_leader">fo:leader</link> + serves this purpose:</para> - <para>Now a parser may use a <code>PUBLIC</code> identifier to find - the DTD even if the resource being referenced by the - <code>SYSTEM</code> identifier's value is unavailable. This is - achieved by so called DTD catalogs. A catalog maps - <code>PUBLIC</code> identifier values to physical resources. It may - be conceived as a map:</para> + <figure xml:id="leaderToc"> + <title>Two simulated entries in a table of contents.</title> - <figure xml:id="publicSystemDict"> - <title>A catalog joining public identifiers with physical - resources.</title> + <programlisting>... 
+<fo:block text-align-last='justify'>Valid + XML<fo:leader leader-pattern="dots"/> +page 7</fo:block> - <programlisting>OVERRIDE YES <co - xml:id="figure_emacs_catalog_preferpublic"/> -- prefer public identifiers to system identifiers -- -... - -- XHTML 1.0 -- -PUBLIC "-//W3C//DTD XHTML 1.0 Frameset//EN" <co - xml:id="figure_emacs_catalog_pubid"/> xhtml1-frameset.dtd <co - xml:id="figure_emacs_catalog_resource"/> -PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" xhtml1-strict.dtd -PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" xhtml1-transitional.dtd -... - -- Docbook 3.1 -- -PUBLIC "-//OASIS//DTD DocBook V3.1//EN" docbook.dtd -...</programlisting> +<fo:block text-align-last='justify'>XSL +<fo:leader leader-pattern='dots'/> +page 42</fo:block> ...</programlisting> - <caption> - <para>In the given code snippet all resources are relative to - the base directory /usr/ share/ xemacs/ xemacs-packages/ etc/ - psgml-dtds.</para> - </caption> + <mediaobject> + <imageobject> + <imagedata align="left" fileref="Ref/Fig/leader.fo.pdf"/> + </imageobject> + </mediaobject> </figure> - <calloutlist> - <callout arearefs="figure_emacs_catalog_preferpublic"> - <para>As being stated in the subsequent comment public - identifiers will have precedence over system identifiers.</para> - </callout> + <para>The attribute value <link + xlink:href="http://www.w3.org/TR/xsl/#text-align-last">text-align-last</link> + = <code>'justify'</code> forces the <link + xlink:href="http://www.w3.org/TR/xsl/#fo_block">fo:block</link> to + extend to the available width of the current <link + xlink:href="http://www.w3.org/TR/xsl/#fo_region-body">fo:region-body</link> + area. The <link + xlink:href="http://www.w3.org/TR/xsl/#fo_leader">fo:leader</link> + inserts the necessary amount of content of the specified type + defined in <link + xlink:href="http://www.w3.org/TR/xsl/#leader-pattern">leader-pattern</link> + to fill up the gap between its neighbouring components.
This + principle can be extended to multiple objects:</para> - <callout arearefs="figure_emacs_catalog_pubid"> - <para>A public identifier with value <code>-//W3C//DTD XHTML 1.0 - Frameset//EN</code> ...</para> - </callout> + <figure xml:id="leaderMulti"> + <title>Four entries separated by equal amounts of dotted + space.</title> - <callout arearefs="figure_emacs_catalog_resource"> - <para>... and the corresponding value - <filename>${BASEDIR}/xhtml1-frameset.dtd</filename>.</para> - </callout> - </calloutlist> + <programlisting><fo:block text-align-last='justify'>A<fo:leader +leader-pattern="dots"/>B<fo:leader +leader-pattern="dots"/>C<fo:leader leader-pattern="dots"/>D</fo:block></programlisting> - <para>The format of a catalog file is by no means specified. Some - applications prefer XML formats to store these mappings. We note - that in presence of a <code>PUBLIC</code> identifier an XML - application is free to choose either of the two offered DTD files if - both are accessible.</para> + <mediaobject> + <imageobject> + <imagedata fileref="Ref/Fig/leadermulti.fo.pdf"/> + </imageobject> + </mediaobject> + </figure> - <qandaset role="exercise"> - <title>Relation between public and system identifiers</title> + <para>A <link + xlink:href="http://www.w3.org/TR/xsl/#fo_leader">fo:leader</link> + may also be used to draw horizontal lines to separate objects. In + this case there are no neighbouring components within the + <quote>current</quote> line in which the <link + xlink:href="http://www.w3.org/TR/xsl/#fo_leader">fo:leader</link> + appears. This is frequently used to draw a border between + <code>xsl-region-body</code> and <code>xsl-region-before</code> + and/or <code>xsl-region-after</code>:</para> - <qandadiv> - <qandaentry xml:id="example_public_system"> - <question> - <para>We recall <xref linkend="figure_xhtml_public"/>. The - public identifier uniquely identifies the DTD. Thus the - system identifier still being present seems to be - superfluous. 
How does a parser react if we omit it? Read the - XML specification and find the corresponding - definition.</para> - </question> + <figure xml:id="leaderSeparate"> + <title>A horizontal line separator between header and body of a + page.</title> - <answer> - <para>Omitting the <code>SYSTEM</code> identifier yields a - parsing error:</para> + <programlisting>... +<fo:page-sequence master-reference="simplePageLayout"> + <fo:static-content flow-name="xsl-region-before"> + <fo:block text-align-last='justify'>FO<fo:leader/>page 5</fo:block> + <fo:block text-align-last='justify'> + <fo:leader leader-pattern="rule" leader-length="100%"/> + </fo:block> + </fo:static-content> + <fo:flow flow-name="xsl-region-body"> + <fo:block>Some body text ...</fo:block> + </fo:flow> +</fo:page-sequence>...</programlisting> - <programlisting><errortext>The system identifier must begin with either a single or -double quote character.</errortext></programlisting> + <mediaobject> + <imageobject> + <imagedata align="left" fileref="Ref/Fig/separate.fo.pdf"/> + </imageobject> + </mediaobject> + </figure> - <para>This message is a bit confusing. Actually the - <code>SYSTEM</code> identifier <emphasis>must</emphasis> - still be present and a better parser should actually - complain about its absence instead of only remarking the - missing begin quotes. The production rule indeed states that - even for <code>PUBLIC</code> identifiers a system literal is - mandatory:</para> + <para>Note the empty leader <code><</code> <link + xlink:href="http://www.w3.org/TR/xsl/#fo_leader">fo:leader</link> + <code>/></code> between the <quote> <code>FO</code> </quote> and + the <quote>page 5</quote> text node inserting horizontal whitespace + to get the page number centered to the header's right edge. 
This is + in accordance with the <link + xlink:href="http://www.w3.org/TR/xsl/#leader-pattern">leader-pattern</link> + attribute's default value <code>space</code>.</para> + </section> - <productionset> - <title>External Entity Declaration</title> + <section xml:id="pageNumbering"> + <title>Page numbers</title> - <production xml:id="w3RecXml_NT-ExternalID"> - <lhs>ExternalID</lhs> + <para>We already saw an example of page numbering via <link + xlink:href="http://www.w3.org/TR/xsl/#fo_page-number">fo:page-number</link> + in <xref linkend="paramHeadFoot"/>. Sometimes a different style for + page numbering is desired. The default page numbering style may be + changed by means of the <link + xlink:href="http://www.w3.org/TR/xsl/#fo_page-sequence">fo:page-sequence</link> + element's attribute <link + xlink:href="http://www.w3.org/TR/xsl/#format">format</link>. For a + more detailed explanation the <link + xlink:href="http://www.w3.org/TR/2007/REC-xslt20-20070123/#convert">W3C + XSLT standards documentation</link> may be consulted:</para> - <rhs>'SYSTEM' <nonterminal - def="#w3RecXml_NT-S">S</nonterminal> <nonterminal - def="#w3RecXml_NT-SystemLiteral">SystemLiteral</nonterminal> - <sbr/> | 'PUBLIC' <nonterminal - def="#w3RecXml_NT-S">S</nonterminal> <nonterminal - def="#w3RecXml_NT-PubidLiteral">PubidLiteral</nonterminal> - <nonterminal def="#w3RecXml_NT-S">S</nonterminal> - <nonterminal - def="#w3RecXml_NT-SystemLiteral">SystemLiteral</nonterminal></rhs> - </production> + <figure xml:id="pageNumberingRoman"> + <title>Roman style page numbers.</title> - <production xml:id="w3RecXml_NT-NDataDecl"> - <lhs>NDataDecl</lhs> + <programlisting>... 
+<fo:page-sequence format="i" + master-reference="simplePageLayout"> + <fo:static-content + flow-name="xsl-region-after"> + <fo:block text-align-last='justify'> + <fo:leader leader-pattern="rule" + leader-length="100%"/> + </fo:block> + <fo:block font-weight="bold"> + <fo:page-number/> + </fo:block> + </fo:static-content> - <rhs><nonterminal def="#w3RecXml_NT-S">S</nonterminal> - 'NDATA' <nonterminal - def="#w3RecXml_NT-S">S</nonterminal> <nonterminal - def="#w3RecXml_NT-Name">Name</nonterminal></rhs> - </production> - </productionset> - </answer> - </qandaentry> + <fo:flow flow-name="xsl-region-body"> + <fo:block>Some text...</fo:block> + <fo:block>More text, more text, + more text.</fo:block> + <fo:block>More text, more text, + more text.</fo:block> + <fo:block>Enough text.</fo:block> + </fo:flow> +</fo:page-sequence> ...</programlisting> - <qandaentry xml:id="example_public_dtdlookup"> - <question> - <label>DTD lookup by PUBLIC identifier</label> + <mediaobject> + <imageobject> + <imagedata align="left" fileref="Ref/Fig/pageStack.fig"/> + </imageobject> + </mediaobject> + </figure> + </section> - <para>Modify the document of the preceding exercise - by:</para> + <section xml:id="foMarker"> + <title>Marker</title> - <itemizedlist> - <listitem> - <para>Change the <code>PUBLIC</code> identifier from - <code>-//W3C//DTD XHTML 1.0 Strict//EN</code> to - <code>-//W3C//DTD XHTML 1.0 - Transitional//EN</code>.</para> - </listitem> + <figure xml:id="dictionary"> + <title>A dictionary with running page headers.</title> - <listitem> - <para>Change the <code>SYSTEM</code> identifier to a - resource name which cannot be retrieved.</para> - </listitem> - </itemizedlist> + <programlisting>... 
+<fo:page-sequence + master-reference="simplePageLayout"> + <fo:static-content flow-name="xsl-region-before"> + <fo:block font-weight="bold"> + <fo:retrieve-marker retrieve-class-name="alpha" + retrieve-position="first-starting-within-page" + />-<fo:retrieve-marker + retrieve-position="last-starting-within-page" + retrieve-class-name="alpha"/> + </fo:block> + <fo:block text-align-last='justify'> + <fo:leader leader-pattern="rule" leader-length="100%"/></fo:block> + </fo:static-content> - <para>Use the Oxygen plug in to check whether this document - instance is still valid. Which DTD is used for validation? - Hint: Check the - <option>Window->Preferences->oxyGen->XML->XML - Catalog</option> menu.</para> - </question> + <fo:flow flow-name="xsl-region-body"> + <fo:block> + <fo:marker marker-class-name="alpha">A + </fo:marker>Ant</fo:block> + <fo:block> + <fo:marker marker-class-name="alpha">B + </fo:marker>Bug</fo:block> + <fo:block> + <fo:marker marker-class-name="alpha">L + </fo:marker>Lion</fo:block> + <fo:block> + <fo:marker marker-class-name="alpha">N + </fo:marker>Nose</fo:block> + <fo:block> + <fo:marker marker-class-name="alpha">P + </fo:marker>Peg</fo:block> + </fo:flow> +</fo:page-sequence> ...</programlisting> - <answer> - <para>We modify the <code>SYSTEM</code> identifier by - omitting the <filename>.dtd</filename> suffix. Thus the DTD - cannot be retrieved by this <link - xlink:href="http://www.w3.org/Addressing">URL</link> any - longer. But we observe that the document remains valid. 
We - conclude that the parser found a DTD via the - <code>PUBLIC</code> identifier.</para> + <mediaobject> + <imageobject> + <imagedata align="left" fileref="Ref/Fig/dictionaryStack.fig"/> + </imageobject> + </mediaobject> + </figure> + </section> - <para>This assumption is indeed true: In the indicated - options menu we find that a master catalog file - <filename>/usr/share/.../frameworks/catalog.xml</filename> - is used for looking up <code>PUBLIC</code> - identifiers:</para> + <section xml:id="foIntRef"> + <title>Internal references</title> - <programlisting><?xml version="1.0"?> -<!DOCTYPE catalog PUBLIC "-//OASIS//DTD Entity Resolution XML Catalog V1.0//EN" - "http://www.oasis-open.org/committees/entity/release/1.0/catalog.dtd"> -<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog"> -... - <nextCatalog catalog="xhtml/dtd/xhtmlcatalog.xml" /> - <nextCatalog catalog="xhtml11/dtd/xhtmlcatalog.xml" /> - <nextCatalog catalog="xhtml11/schema/xhtmlcatalog.xml" /> -... -</catalog></programlisting> + <titleabbrev>References</titleabbrev> - <para>And in <filename>xhtml/dtd/xhtmlcatalog.xml</filename> - we find:</para> + <para>Regarding printed documents we may define two categories of + document internal references:</para> - <programlisting><?xml version="1.0" encoding="UTF-8"?> -<!DOCTYPE catalog PUBLIC "-//OASIS//DTD Entity Resolution XML Catalog V1.0//EN" - "http://www.oasis-open.org/committees/entity/release/1.0/catalog.dtd"> -<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog"> -... - <public publicId="<emphasis role="bold">-//W3C//DTD XHTML 1.0 Transitional//EN</emphasis>" uri="<emphasis - role="bold">xhtml1-transitional.dtd</emphasis>"/> - <public publicId="<emphasis role="bold">-//W3C//DTD XHTML 1.0 Transitional//EN</emphasis>" uri="<emphasis - role="bold">xhtml1-strict.dtd</emphasis>"/> - <public publicId="-//W3C//DTD XHTML 1.0 Frameset//EN" uri="xhtml1-frameset.dtd"/> -... 
-</catalog></programlisting> + <variablelist> + <varlistentry> + <term><emphasis>Page number references</emphasis></term> - <para>We learn from this example that a W3C standard - describing a catalog file's structure exists.</para> - </answer> - </qandaentry> - </qandadiv> - </qandaset> - </section> - </section> + <listitem> + <para>This is the <quote>classical</quote> type of a reference + e.g. in books. An author refers the reader to a distant + location by writing <quote>... see further explanation in + section 4.5 on page 234</quote>. A book's table of contents + assigning page numbers to topics is another example. This way + the implementation of a reference relies solely on the + features a printed document offers.</para> + </listitem> + </varlistentry> - <section xml:id="xhtml"> - <title>The XHTML DTD</title> + <varlistentry> + <term><emphasis>Hypertext references</emphasis></term> - <para>The XHTML standard is completely defined in terms of a family of - <abbrev - xlink:href="http://en.wikipedia.org/wiki/Document_Type_Declaration">DTD</abbrev>s. - One member of this family is denoted as <emphasis>strict</emphasis> - referring to the largest distinction with regards to - <quote>traditional</quote> HTML. We start with a <quote>Hello, - World</quote> example:</para> + <listitem> + <para>This way of implementing references utilizes features of + (online) viewers for printable documents. For example PDF + viewers like <productname + xlink:href="http://www.adobe.com">Adobe's Acrobat + reader</productname> or the evince application are able to + follow hypertext links in a fashion known from HTML browsers. 
+ This browser feature is based on hypertext capabilities + defined in Adobe's PDF de facto standard.</para> + </listitem> + </varlistentry> + </variablelist> - <table border="1" xml:id="htmlHelloRender"> - <caption>A XHTML Hello, World example and its rendering</caption> + <para>Of course the second type of reference is limited to people + who use an online viewer application instead of reading a document + from physical paper.</para> - <tr> - <td><programlisting><?xml version="1.0" encoding="UTF-8"?> -<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" - "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"> -<html xmlns="http://www.w3.org/1999/xhtml"> - <head> - <title>Hello Example</title> - </head> - <body> - <h1>Hello, World ...</h1> - </body> -</html></programlisting></td> + <para>We now show the implementation of <abbrev + xlink:href="http://www.w3.org/TR/2001/REC-xsl-20011015/slice6.html#fo-section">FO</abbrev> + based page references. As already discussed for <link + xlink:href="http://www.w3.org/TR/xml#id">ID</link> / <link + xlink:href="http://www.w3.org/TR/xml#idref">IDREF</link> pairs we + need a link destination (anchor) and a link source. The <abbrev + xlink:href="http://www.w3.org/TR/2001/REC-xsl-20011015/slice6.html#fo-section">FO</abbrev> + standard uses the same anchor implementation as in XML for <link + xlink:href="http://www.w3.org/TR/xml#id">ID</link> typed attributes: + <abbrev + xlink:href="http://www.w3.org/TR/2001/REC-xsl-20011015/slice6.html#fo-section">FO</abbrev> + objects <emphasis>may</emphasis> have an attribute <link + xlink:href="http://www.w3.org/TR/xsl/#id">id</link> with a document-wide + unique value.
The <abbrev + xlink:href="http://www.w3.org/TR/2001/REC-xsl-20011015/slice6.html#fo-section">FO</abbrev> + element <link + xlink:href="http://www.w3.org/TR/xsl/#fo_page-number-citation">fo:page-number-citation</link> + is used to actually create a page reference via its attribute <link + xlink:href="http://www.w3.org/TR/xsl/#ref-id">ref-id</link>:</para> - <td><mediaobject> - <imageobject> - <imagedata fileref="Ref/Screen/hello.screen.png"/> - </imageobject> - </mediaobject></td> - </tr> - </table> - </section> - </chapter> + <figure xml:id="refJavaXml"> + <title>Two blocks mutually referencing each other by page number.</title> - <chapter xml:id="chapter_entities"> - <title>Entities</title> + <programlisting>... + <fo:flow flow-name='xsl-region-body'> + <fo:block id='xml'>Java section see page + <fo:page-number-citation ref-id='java'/>. + </fo:block> - <para>Entities target the <emphasis>physical</emphasis> structure of - <abbrev - xlink:href="http://en.wikipedia.org/wiki/Document_Type_Declaration">DTD</abbrev>s - and document instances. Both <abbrev - xlink:href="http://en.wikipedia.org/wiki/Document_Type_Declaration">DTD</abbrev>s - and XML document instances may be <emphasis>physically</emphasis> - composed of smaller pieces:</para> + <fo:block id='java'>XML section see page + <fo:page-number-citation ref-id='xml'/>. + </fo:block> + </fo:flow> ...</programlisting> - <itemizedlist> - <listitem> - <para><abbrev - xlink:href="http://en.wikipedia.org/wiki/Document_Type_Declaration">DTD</abbrev>s - often reuse standard components. For example many <abbrev - xlink:href="http://en.wikipedia.org/wiki/Document_Type_Declaration">DTD</abbrev>s - adopted the HTML table model. 
Entities offer an elegant way to - include such building blocks into other <abbrev - xlink:href="http://en.wikipedia.org/wiki/Document_Type_Declaration">DTD</abbrev>s.</para> - </listitem> + <mediaobject> + <imageobject> + <imagedata align="left" fileref="Ref/Fig/pagerefStack.fig"/> + </imageobject> + </mediaobject> + </figure> - <listitem> - <para>A book may <emphasis>logically</emphasis> consist of 10 - chapters. We may use entities to represent a book by a single master - document plus 10 separate XML documents representing each - chapter.</para> - </listitem> - </itemizedlist> + <para>NB: Be careful when defining <link + xlink:href="http://www.w3.org/TR/xsl/#id">id</link> attributes for + objects that are descendants of <link + xlink:href="http://www.w3.org/TR/xsl/#fo_static-content">fo:static-content</link> + nodes. Such objects typically appear on multiple pages and are + therefore not unique anchors. A reference carrying such an id value + thus actually refers to 1 <= n objects on n different pages. + Typically a user agent will choose the first object of this set when + the link is clicked. So in effect the parent <link + xlink:href="http://www.w3.org/TR/xsl/#fo_page-sequence">fo:page-sequence</link> + is chosen as the effective link target.</para> - <para>In correspondence with these two examples we first note that two - different types of entities exist:</para> + <para>The element <link + xlink:href="http://www.w3.org/TR/xsl/#fo_basic-link">fo:basic-link</link> + creates PDF hypertext links. 
We extend the previous example:</para> - <glosslist> - <glossentry> - <glossterm>Parameter entities</glossterm> + <figure xml:id="refJavaXmlHyper"> + <title>Two blocks with mutual page- and hypertext + references.</title> - <glossdef> - <para>May only be used within <abbrev - xlink:href="http://en.wikipedia.org/wiki/Document_Type_Declaration">DTD</abbrev>s - but not in document instances.</para> - </glossdef> - </glossentry> - - <glossentry> - <glossterm>General entities</glossterm> - - <glossdef> - <para>May be used both in <abbrev - xlink:href="http://en.wikipedia.org/wiki/Document_Type_Declaration">DTD</abbrev>s - and in document instances.</para> - </glossdef> - </glossentry> - </glosslist> - - <para>Both types of entities exist in two flavors - <quote>Internal</quote> and <quote>external</quote> depending on whether - they are defined within a document itself or in an external document - being referenced.</para> - - <section xml:id="section_parameterentity"> - <title xml:id="section_parameterentities">Parameter entities</title> - - <para>We consider the following DTD:</para> - - <figure xml:id="figure_nonmodular_doc"> - <title>A DTD <filename>doc.dtd</filename> describing document - instances consisting of paragraphs and figures</title> - - <programlisting><!ELEMENT doc (para|figure)* <co - xml:id="programlisting_figure1_doc"/>> -<!ELEMENT para (#PCDATA) > - -<!ELEMENT figure (caption, image) <co - xml:id="programlisting_figure1_figure"/>> -<!ELEMENT caption (#PCDATA) <co xml:id="programlisting_figure1_caption"/>> -<!ELEMENT image EMPTY > -<!ATTLIST image - src CDATA #REQUIRED <co - xml:id="programlisting_figure1_image_src"/>></programlisting> - </figure> - - <calloutlist> - <callout arearefs="programlisting_figure1_doc"> - <para>A document consists of an arbitrary sequence of paragraphs - and figures.</para> - </callout> - - <callout arearefs="programlisting_figure1_figure"> - <para>A figure has a caption describing the image's content and an - <tag 
class="starttag">image</tag> node. The formatting expectation - may be defined as an image with a caption being placed - below.</para> - </callout> - - <callout arearefs="programlisting_figure1_caption"> - <para>A textual description of the corresponding image.</para> - </callout> - - <callout arearefs="programlisting_figure1_image_src"> - <para>The attribute <tag class="attribute">src</tag> contains an - URI to image data.</para> - </callout> - </calloutlist> - - <para>An <filename>example.xml</filename> document instance looks - like:</para> - - <programlisting><!DOCTYPE doc SYSTEM "doc.dtd"> -<doc> - <para>A paragraph</para> - <figure> - <caption>A nice image</caption> - <image src="image.png"/> - </figure> -</doc></programlisting> - - <para>In a <quote>real</quote> DTD a <tag class="element">figure</tag> - element will have more complexity. An author of a different DTD - describing a fashion catalog may want to reuse the <tag - class="element">figure</tag> element as a component. This may be - achieved by moving all <tag class="element">figure</tag> related - definitions into a separate file - <filename>figure.mod</filename>:</para> - - <figure xml:id="figureEntityDef"> - <title>The <tag class="element">figure</tag> element implemented in - an independent DTD module <filename>figure.mod</filename></title> - - <programlisting><!ELEMENT figure (caption, image) > -<!ELEMENT caption (#PCDATA) > -<!ELEMENT image EMPTY > -<!ATTLIST image src CDATA #REQUIRED ></programlisting> - </figure> - - <para>Now we may include this module in a master DTD:</para> - - <figure xml:id="figure_doc_master"> - <title>The master DTD which includes the <code>figure.mod</code> - module</title> - - <programlisting><!ENTITY % <co xml:id="figure_doc_master_pentity"/>figure.mod <co - xml:id="figure_doc_master_identifier"/>SYSTEM <co - xml:id="figure_doc_master_keyword_system"/>"figure.mod" <co - xml:id="figure_doc_master_entity_filename"/>> - -%figure.mod; <co 
xml:id="figure_doc_master_include"/> - -<!ELEMENT doc (para|figure)* > -<!ELEMENT para (#PCDATA) ></programlisting> - - <calloutlist> - <callout arearefs="figure_doc_master_pentity"> - <para>The percent sign <quote>%</quote> defines the following - identifier to be a <emphasis>parameter</emphasis> entity. - Without this character it would define a <link - linkend="section_generalentities">general</link> entity.</para> - </callout> - - <callout arearefs="figure_doc_master_identifier"> - <para>The entity to be defined will be represented by the local - identifier <code>figure.mod</code>.<filename/></para> - </callout> - - <callout arearefs="figure_doc_master_keyword_system"> - <para>The <code>SYSTEM</code> keyword states that the following - content is a reference to an <emphasis>external</emphasis> - object.</para> - </callout> - - <callout arearefs="figure_doc_master_entity_filename"> - <para><filename>figure.mod</filename> is just the filename of a - DTD module containing all definitions of the <tag - class="element">figure</tag> element.</para> - </callout> - - <callout arearefs="figure_doc_master_include"> - <para>The variable <code>figure.mod</code> represents parameter - entity definitions. We have to <emphasis>include</emphasis> them - to the current DTD in order to make them part of it. In C/C++ - the term <code>%figure.mod;</code> would read <code>#include - "figure.mod"</code>.</para> - </callout> - </calloutlist> - </figure> - - <para>This file functions as a complete replacement for the non - modular DTD presented at the <link - linkend="figure_nonmodular_doc">beginning</link>. This way - <filename>figure.mod</filename> acts as a <quote>building - block</quote> that may be reused in other <abbrev - xlink:href="http://en.wikipedia.org/wiki/Document_Type_Declaration">DTD</abbrev>'s - as well. 
We note that using an entity in a XML DTD is a two step - process:</para> - - <itemizedlist> - <listitem> - <para>Declaration of an entity.</para> - </listitem> - - <listitem> - <para><quote>Use</quote> of a declared entity.</para> - </listitem> - </itemizedlist> - - <para>Many programming languages combine these two steps into one. - Examples are:</para> - - <glosslist> - <glossentry> - <glossterm>C/C++:</glossterm> - - <glossdef> - <para><code>#include "stdio.h"</code></para> - </glossdef> - </glossentry> - - <glossentry> - <glossterm><trademark - xlink:href="http://www.oracle.com/us/technologies/java">Java</trademark>:</glossterm> - - <glossdef> - <para><code>import de.hdm-stuttgart.xml;</code></para> - </glossdef> - </glossentry> - </glosslist> - - <para>On the other hand there are similarities concerning the way - entities are handled. If we take C/C++ as an example we observe the - following situation: A compiler reads a <quote>master</quote> file and - includes (possibly recursively) sets of other files. This part of the - compilation process is carried out by a separate software called a - preprocessor which may be invoked independently. As an example we take - a <quote>master</quote> file <filename>main.c</filename> written in - the programming language C:</para> - - <programlisting language="c">/* no #include <stdio.h> for simplicity */ -#include "maximum.h" - -void main(char **args){ - printf("The maximum of %d and %d is %d", 3, 5, <emphasis role="bold">max(3,5)</emphasis>); -}</programlisting> - - <para>The referenced file <filename>maximum.h</filename> being - included contains a single line defining the macro - <code>max(...)</code> appearing in the <code>printf</code> - statement:</para> - - <programlisting language="c">#define <emphasis role="bold">max(a, b)</emphasis> ( (a)>(b) ? 
(a) : (b) )</programlisting> - - <para>Despite some warning messages we may compile and execute - <code>main.c</code>:</para> + <programlisting><fo:flow flow-name='xsl-region-body'> + <fo:block id='xml'>Java section see <fo:basic-link color="blue" + internal-destination="java">page<fo:page-number-citation + ref-id='java'/>.</fo:basic-link></fo:block> - <programlisting><computeroutput>[goik@mupter ~]$ cc -o main main.c -... warnings omitted ... -[goik@mupter ~]$ ./main -The maximum of 3 and 5 is 5</computeroutput></programlisting> +<fo:block id='java'>XML section see + <fo:basic-link color="blue" + internal-destination="xml">page <fo:page-number-citation + ref-id='xml'/>.</fo:basic-link></fo:block > +</fo:flow></programlisting> - <para>Now we may also execute the C preprocessor separately:</para> + <mediaobject> + <imageobject> + <imagedata fileref="Ref/Fig/pagerefhyperStack.fig"/> + </imageobject> + </mediaobject> + </figure> + </section> - <programlisting>[goik@mupter ~]$ cpp -P main.c -void main(char **args){ - printf("The maximum of %d and %d is %d", 3, 5, <emphasis role="bold">( (3)>(5) ? (3) : (5) )</emphasis>); -}</programlisting> + <section xml:id="pdfBookmarks"> + <title>PDF bookmarks</title> - <para>We observe that the preprocessor has resolved the dependency - from <filename>main.c</filename> to <filename>maximum.h</filename> by - in line replacing the macro call <code>max(3,5)</code> into <code>( - (3)>(5) ? (3) : (5) )</code>. 
This output is then read by the - <quote>real</quote> compiler to create an executable binary file - <code>main</code>.</para> + <titleabbrev>Bookmarks</titleabbrev> - <figure xml:id="cppCompilerTwoStep"> - <title>Two processing steps building an executable from a C - file</title> + <para>The PDF specification allows the definition of so-called bookmarks + offering explorer-like navigation:</para> <mediaobject> <imageobject> - <imagedata fileref="Ref/Fig/cpp.fig"/> + <imagedata fileref="Ref/Screen/pdfbookmarks.screen.png"/> </imageobject> </mediaobject> - </figure> - <para>A XML parser validating a document will do the same both - regarding the document instance itself and any entities which have to - be resolved. The first step before any real parsing is executed by the - <emphasis>entity resolver</emphasis> which can be compared to a C - Preprocessor. We reconsider our figure DTD example:</para> - - <figure xml:id="entityResolv"> - <title>The entity resolving process. The dashed arrows show - <code>SYSTEM</code> references to external entities.</title> - - <mediaobject> - <imageobject> - <imagedata fileref="Ref/Fig/entityresolve.fig"/> - </imageobject> - </mediaobject> - </figure> - - <para>The actual XML validating parser will examine the output - <filename>resolve.xml from the entity resolver</filename>.</para> - - <para>As we noted in the introduction to this chapter entities may - also be of type internal. This means they are defined within a - document itself rather than residing in an external object. We
We - consider the following example:</para> - - <programlisting><!ENTITY % <emphasis role="bold">url</emphasis> "CDATA" <co - xml:id="programlisting_internparam_urlent"/>> - -<!ELEMENT doc (para|figure)* > -<!ELEMENT para (#PCDATA) > - -<!ELEMENT figure (caption, image) > -<!ELEMENT caption (#PCDATA) > -<!ELEMENT image EMPTY > -<!ATTLIST image src %<emphasis role="bold">url</emphasis>;<co - xml:id="programlisting_internparam_urluse"/> #REQUIRED ></programlisting> - - <calloutlist> - <callout arearefs="programlisting_internparam_urlent"> - <para>An internal parameter entity <tag - class="paramentity">url</tag> is defined. Since the - <code>SYSTEM</code> keyword is absent the definition is taken - <quote>as is</quote>.</para> - </callout> - - <callout arearefs="programlisting_internparam_urluse"> - <para>The internal entity <tag class="paramentity">url</tag> is - used. The entity resolver will replace this term by the string - <code>CDATA</code>.</para> - </callout> - </calloutlist> - - <para>From a practical point of view we might argue that the given - code does not make sense. Actually the entity <tag - class="paramentity">url</tag> does a kind of <quote>copy/paste</quote> - action. There seems to be no benefit since the parser still sees the - attribute type <code>CDATA</code> and will thus still accept invalid - <link xlink:href="http://www.w3.org/Addressing">URLs</link> like - <code>http://c:\mydir\</code>.</para> - - <para>The actual gain is readability: In a DTD attributes of - <emphasis>desired</emphasis> type <link - xlink:href="http://www.w3.org/Addressing">URL</link> appear - frequently. In the scope of DTDs there is no appropriate data type - describing the <link - xlink:href="http://www.ietf.org/rfc/rfc1738.txt">formal rules</link> a - <link xlink:href="http://www.w3.org/Addressing">URL</link> has to - obey. 
But at least the reader will notice the - <emphasis>intention</emphasis> that the attribute <tag - class="attribute">src</tag> of the element <tag - class="element">image</tag> shall contain a <link - xlink:href="http://www.w3.org/Addressing">URL</link>.</para> - - <para>In the next example we want to extend our book.dtd by allowing - simplified HTML tables:</para> - - <table border="1" xml:id="example_table_col_rowspan"> - <caption>A table caption</caption> - - <?target dbhtml table-width="50%"?> - - <?target dbfo table-width="50%"?> - - <tr> - <td rowspan="2">A cell spanning two rows</td> - - <td>a single cell</td> - </tr> - - <tr> - <td>another single cell</td> - </tr> - - <tr> - <td colspan="2">A cell spanning two columns</td> - </tr> - </table> - - <qandaset role="exercise"> - <title>book.dtd and tables</title> - - <qandadiv> - <qandaentry xml:id="example_docbook_v5"> - <question> - <para>The <link linkend="example_table_col_rowspan">example - table</link> presented before may be defined by the following - code snippet:</para> - - <programlisting>... -<table border="1" <co xml:id="programlisting_table_col_rowspan_attborder"/> > - <caption>A table caption</caption> - <tr> - <td rowspan="2" <co - xml:id="programlisting_table_col_rowspan_attrowspan"/>>A cell spanning two rows</td> - <td>a single cell</td> - </tr> - <tr> - <td>another single cell</td> - </tr> - <tr> - <td colspan="2" <co - xml:id="programlisting_table_col_rowspan_attcolspan"/>>A cell spanning two columns</td> - </tr> -</table> -...</programlisting> - - <calloutlist> - <callout arearefs="programlisting_table_col_rowspan_attborder"> - <para>We want a table with borders. In an HTML rendered - version the number indicates the line width in pixels. 
In - this example we expect a line width of one pixel.</para> - </callout> - - <callout arearefs="programlisting_table_col_rowspan_attrowspan"> - <para>The cell will span two rows.</para> - </callout> - - <callout arearefs="programlisting_table_col_rowspan_attcolspan"> - <para>The cell will span two columns.</para> - </callout> - </calloutlist> - - <para>Define a DTD table module <filename>table.mod</filename> - and include it into the <filename>book.dtd</filename> via an - external parameter entity.</para> - </question> - - <answer> - <para>The table model definitions in - <filename>table.mod</filename> read:</para> - - <programlisting><?xml version="1.0" encoding="UTF-8"?> -<!ELEMENT table (caption, tr+)> -<!ATTLIST table - border NMTOKEN #IMPLIED > -<!ELEMENT caption (#PCDATA) > -<!ELEMENT tr (td+) > -<!ELEMENT td (#PCDATA) > -<!ATTLIST td - colspan NMTOKEN #IMPLIED - rowspan NMTOKEN #IMPLIED ></programlisting> - - <para>This may be included into our - <filename>book.dtd</filename> via:</para> - - <programlisting><!ENTITY % table.mod SYSTEM "table.mod" > -%table.mod; - -<!ELEMENT book (title, chapter+)> -...</programlisting> - - <para>The complete source code is available <link - xlink:href="Ref/src/Dtd/book/v5/book.dtd">here</link> . 
A - document instance reads:</para> - - <programlisting><!DOCTYPE book SYSTEM "book.dtd"> -<book lang="en"> - <title>Introduction to Java</title> - <chapter id="introJava"> - <title>Introduction</title> - <para id="notUsed">Documentation on <link linkend="introJava">types</link></para> - <table border="1"> - <caption>A table caption</caption> - <tr> - <td rowspan="2">A cell spanning two rows</td> - <td>a single cell</td> - </tr> - <tr> - <td>another single cell</td> - </tr> - <tr> - <td colspan="2">A cell spanning two columns</td> - </tr> - </table> - </chapter> -</book></programlisting> - </answer> - </qandaentry> - </qandadiv> - </qandaset> - </section> - - <section xml:id="section_generalentities"> - <title>General entities</title> - - <para>Parameter entities are limited to appear only within the scope - of <abbrev - xlink:href="http://en.wikipedia.org/wiki/Document_Type_Declaration">DTD</abbrev>s. - They must not appear in document instances. This motivates the - introduction of general entities. We start with an example of a - copyright notice:</para> - - <programlisting><?xml version="1.0" encoding="UTF-8"?> -<para>All rights, including copyright are owned or -controlled for these purposes by the company.</para> - -<para>For further information, see Section Two of the Member Agreement.</para></programlisting> - - <para>We notice that this code is not even well-formed XML: It has got - two <tag class="element">para</tag> nodes at top level.</para> - - <para>We assume that the company in question produces a great number - of documents. These two paragraphs shall be kept at a centralized - location to be included into all publications. For this purpose the - document shall be accessible from - <filename>ftp://internal.com/copyright.xml</filename> in the company's - intranet. 
Starting with our previously introduced - <code>doc.dtd</code> we may embed and use this copyright - document:</para> - - <programlisting><?xml version="1.0" encoding="UTF-8"?> -<!DOCTYPE doc SYSTEM "doc.dtd" [ <co - xml:id="programlisting_copyright_internal"/> - <!ENTITY copyrightnotice <co xml:id="programlisting_copyright_entitydef"/> SYSTEM "ftp://internal.com/copyright.xml"> -]<co xml:id="programlisting_copyright_endsubset"/>> -<doc> - <para>A paragraph</para> - <figure> - <caption>A nice image</caption> - <image src="image.png"/> - </figure> - &copyrightnotice; <co xml:id="programlisting_copyright_entityuse"/> -</doc></programlisting> - - <calloutlist> - <callout arearefs="programlisting_copyright_internal"> - <para>The left bracket <quote>[</quote> marks the begin of the - document's <emphasis>internal DTD subset</emphasis>.</para> - </callout> - - <callout arearefs="programlisting_copyright_entitydef"> - <para>An external general entity <tag - class="genentity">copyrightnotice</tag> is declared. The <link - xlink:href="http://www.w3.org/Addressing">URL</link> following the - <code>SYSTEM</code> keyword defines a reference to the external - definitions.</para> - </callout> - - <callout arearefs="programlisting_copyright_endsubset"> - <para>Internal subset definitions end here.</para> - </callout> - - <callout arearefs="programlisting_copyright_entityuse"> - <para>The entity <tag class="genentity">copyrightnotice</tag> is - used. 
The entity resolver will expand it to the actual content of - <filename>ftp://internal.com/copyright.xml</filename>.</para> - </callout> - </calloutlist> - - <para>The careful reader will have already guessed that from a XML - processing application's viewpoint this is equivalent to:</para> - - <programlisting><?xml version="1.0" encoding="UTF-8"?> -<!DOCTYPE doc SYSTEM "doc.dtd"> -<doc> - <para>A paragraph</para> - <figure> - <caption>A nice image</caption> - <image src="image.png"/> - </figure> - <para>All rights, including copyright are owned or -controlled for these purposes by the company.</para> - - <para>For further information, see Section Two of the Member Agreement.</para> -</doc></programlisting> - - <para>We now have to clarify the term <quote>internal subset</quote> - in the context of DTDs and start with:</para> - - <programlisting><?xml version="1.0" encoding="UTF-8"?> -<!DOCTYPE doc SYSTEM "doc.dtd" [ - <!ENTITY copyrightnotice SYSTEM "ftp://internal.com/copyright.xml"> -]>...</programlisting> - - <para>The XML standard allows markup declarations to appear both in - <filename>doc.dtd</filename> itself and within the range being - delimited by the braces <code>[...]</code>. Markup declarations - appearing in <filename>doc.dtd</filename> belong to the so called - <emphasis>external subset</emphasis> reflecting the fact that they - reside outside the <quote>current</quote> document instance. Any - markup declarations appearing within <code>[ ... ]</code> are - considered to belong to the document instance's <emphasis>internal - subset</emphasis>. We are now able to review some of our introductory - XML examples: Our <tag class="element">memo</tag> document instance - from <xref linkend="dtd_and_document"/> has no external subset at all. - The markup declarations are completely defined in the internal subset - of the document instance. 
As being stated earlier this only makes - sense for development or demonstration purposes.</para> - - <para>The internal subset may under some circumstances even be used to - extend content model or attribute definitions of the underlying DTD - and thus leading to non portable document instances. This is possible - if the DTD provides <quote>hooks</quote> intended to be used as entry - points for extensions.</para> - - <para>In the above example we might have defined the entity <tag - class="genentity">copyrightnotice</tag> in the external subset i.e. - within <filename>doc.dtd</filename>. We conclude this section by - showing a meaningful use case for an internal general entity:</para> - - <qandaset role="exercise"> - <title>Avoiding title duplication</title> - - <qandadiv> - <qandaentry xml:id="example_xhtml_duplicate_title"> - <question> - <para>We recall the sample Xhtml document given in <xref - linkend="figure_xhtmlbase"/>. The <tag - class="starttag">title</tag> and the <tag - class="starttag">h1</tag> node both contain the same content - <quote>A first start</quote>. 
Use an entity to define this - content to be used at the two different positions.</para> - </question> - - <answer> - <para>We define an entity being used at the two locations in - question:</para> - - <programlisting><?xml version="1.0" encoding="UTF-8"?> -<!DOCTYPE html SYSTEM "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"[ -<!ENTITY mytitle "A first start" <co - xml:id="programlisting_xhtml_duplicate_title_entity"/>> -]> -<html xmlns="http://www.w3.org/1999/xhtml"> - <head><title>&mytitle;<co - xml:id="programlisting_xhtml_duplicate_title_entity_first"/></title></head> - <body> - <h1>&mytitle;<co - xml:id="programlisting_xhtml_duplicate_title_entity_second"/></h1> - <p>This is a very simple document</p> - </body> -</html></programlisting> - - <calloutlist> - <callout arearefs="programlisting_xhtml_duplicate_title_entity"> - <para>Definition of an internal general entity <tag - class="genentity">mytitle</tag>.</para> - </callout> - - <callout arearefs="programlisting_xhtml_duplicate_title_entity_first"> - <para>First usage.</para> - </callout> - - <callout arearefs="programlisting_xhtml_duplicate_title_entity_second"> - <para>Second usage</para> - </callout> - </calloutlist> - </answer> - </qandaentry> - - <qandaentry xml:id="example_chapter_entities"> - <question> - <label>Dividing a book.dtd document instance into - chapters.</label> - - <para>General entities may be used to physically split - documents into smaller parts. Create a <tag - class="starttag">book</tag> document instance - <filename>master.xml</filename> with two chapters. Define an - <code>IDREF</code> reference from the second to the first - chapter. Now create two XML files - <filename>chap1.xml</filename> and - <filename>chap2.xml</filename> and move the content of the two - chapters from <filename>master.xml</filename> into these - files. Then include them into the master document as external - general entities. 
What happens with the reference from the - second to the first chapter?</para> - </question> - - <answer> - <para>Our master document reads:</para> - - <programlisting><?xml version="1.0" encoding="UTF-8"?> -<!DOCTYPE book SYSTEM "book.dtd"[ - <!ENTITY chap1 SYSTEM "chap1.xml"> - <!ENTITY chap2 SYSTEM "chap2.xml"> -]> -<book> - <title>Master document example</title> - &chap1; - &chap2; -</book></programlisting> - - <para>The first general entity <filename>chap1.xml</filename> - contains:</para> - - <programlisting><?xml version="1.0" encoding="UTF-8"?> -<chapter id="firstChapter"> - <title>This is the first chapter</title> - <para>We add some text here.</para> -</chapter></programlisting> - - <para>Notice that the <tag class="starttag">chapter</tag> node - contains an attribute <tag class="attribute">id</tag> with - value <tag class="attvalue">firstChapter</tag>. The second - file <filename>chap2.xml</filename> reads:</para> - - <programlisting><?xml version="1.0" encoding="UTF-8"?> -<chapter> - <title>This is the second chapter</title> - <para>This is a <link linkend="firstChapter">reference</link>.</para> -</chapter></programlisting> - - <para>The paragraph contains an <code>IDREF</code> based - reference to the first chapter being defined as a general - entity. The master document is a valid XML file with respect - to our <filename>book.dtd</filename> grammar. We expect this - result since entities are only a means to - <emphasis>physically</emphasis> divide a XML file into smaller - <quote>chunks</quote> without changing the logical structure - at all.</para> - </answer> - </qandaentry> - </qandadiv> - </qandaset> - </section> - - <section xml:id="section_notation"> - <title>Notations and unparsed entities</title> - - <para>An unparsed entity is conceptually part of an XML document but - will be ignored by the parser. A common example for unparsed entities - are images. 
The most simple way is to reference XML document external - images by attributes:</para> - - <programlisting><graphic image="printer.gif"/></programlisting> - - <para>Many editors simply use this method which apparently suffers - from some deficiencies:</para> - </section> - </chapter> - - <chapter xml:id="xsl"> - <title>The Extensible Stylesheet Language XSL</title> - - <para>XSL is a <link xlink:href="http://www.w3.org/Style/XSL">W3C - standard</link> which defines a language to transform XML documents into - the following output formats:</para> - - <itemizedlist> - <listitem> - <para>Ordinary text e.g in <link - xlink:href="http://unicode.org">Unicode</link> encoding.</para> - </listitem> - - <listitem> - <para>XML.</para> - </listitem> - - <listitem> - <para>HTML</para> - </listitem> - - <listitem> - <para>XHTML</para> - </listitem> - </itemizedlist> - - <para>Transforming a source XML document into a target XML document may - be required if:</para> - - <itemizedlist> - <listitem> - <para>The target document expresses similar semantics but uses a - different XML dialect i.e. different tag names.</para> - </listitem> - - <listitem> - <para>The target document is only a view on the source document. 
We - may for example extract the chapter names from a <tag - class="starttag">book</tag> document to create a table of - contents.</para> - </listitem> - </itemizedlist> - - <section xml:id="xsl_helloworld"> - <title>A <quote>Hello, world</quote> <abbrev - xlink:href="http://www.w3.org/Style/XSL">XSL</abbrev> example</title> - - <para>We start from an extended version of our - <filename>memo.dtd</filename>:</para> - - <programlisting><!ELEMENT memo (from, to+, subject, content)> -<!ATTLIST memo date CDATA #REQUIRED - priority (low|medium|high) #IMPLIED> -<!ELEMENT from (#PCDATA)> -<!ATTLIST from id ID #IMPLIED > -<!ELEMENT to (#PCDATA)> -<!ATTLIST to id ID #IMPLIED > - -<!ELEMENT subject (#PCDATA)> -<!ELEMENT content (para)+> -<!ELEMENT para (#PCDATA|link)*> -<!ELEMENT link (#PCDATA) > -<!ATTLIST link linkend IDREF #REQUIRED ></programlisting> - - <para>This DTD allows a memo's document content to be structured into - paragraphs. A paragraph may contain links either to the sender or to - one of the memo's recipients.</para> - - <figure xml:id="figure_memoref_instance"> - <title>A memo document instance with an internal reference.</title> - - <programlisting><?xml version="1.0" ?> -<!DOCTYPE memo SYSTEM "memo.dtd"> -<memo date="9.9.2099" priority="high"> - <from id="goik">Martin Goik</from> - <to>Adam Hacker</to> - <to id="eve">Eve Intruder</to> - <subject>Firewall problems</subject> - <content> - <para>Thanks for your excellent work.</para> - <para>Our firewall is definitely broken! This bug has been reported by - the <link linkend="goik">sender</link>.</para> - </content> -</memo></programlisting> - </figure> - - <para>We want to extract the sender's name from an arbitrary <tag - class="element">memo</tag> document instance. 
Using <abbrev - xlink:href="http://www.w3.org/Style/XSL">XSL</abbrev> this task can be - accomplished by a script <filename>memo2sender.xsl</filename>:</para> - - <programlisting><?xml version="1.0" encoding="UTF-8"?> -<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" - version="2.0"> - - <xsl:output method="text"/> - - <xsl:template match="/memo"> - <xsl:value-of select="from"/> - </xsl:template> - -</xsl:stylesheet></programlisting> - - <para>Before closer examining this code we first show its effect. We - need a piece of software called a <abbrev - xlink:href="http://www.w3.org/Style/XSL">XSL</abbrev> processor. It - reads both a <tag>memo</tag> document instance and a style sheet and - produces the following output:</para> - - <programlisting><computeroutput>[goik@mupter Memoref]$ xml2xml message.xml memo2sender.xsl -Martin Goik</computeroutput></programlisting> - - <para>The result is the sender's name <computeroutput>Martin - Goik</computeroutput>. We may sketch the transformation - principle:</para> - - <figure xml:id="figure_xsl_principle"> - <title>An <abbrev - xlink:href="http://www.w3.org/Style/XSL">XSL</abbrev> processor - transforming a XML document into a result using a stylesheet</title> - - <mediaobject> - <imageobject> - <imagedata fileref="Ref/Fig/xslconvert.fig"/> - </imageobject> - </mediaobject> - </figure> - - <para>The executable <filename>xml2xml</filename> defined at the MI - department is actually a script wrapping the <productname - xlink:href="http://saxon.sourceforge.net">Saxon XSLT - processor</productname>. We may also use the Eclipse/Oxygen plug in - <!-- goik - and <uri - xlink:href="src/viewlet/xslt_config/xslt_config_viewlet_swf.html"> - and define - a transformation scenario</uri> thus --> replacing the shell command by - a GUI. 
Next we examine the <abbrev - xlink:href="http://www.w3.org/Style/XSL">XSL</abbrev> example - code more closely:</para> - - <programlisting><xsl:stylesheet <co - xml:id="programlisting_helloxsl_stylesheet"/> xmlns:xsl <co - xml:id="programlisting_helloxsl_namespace_abbv"/> ="http://www.w3.org/1999/XSL/Transform" - version="2.0" <co xml:id="programlisting_helloxsl_xsl_version"/> > - - <xsl:output method="text" <co - xml:id="programlisting_helloxsl_method_text"/>/> - - <xsl:template <co xml:id="programlisting_helloxsl_template"/> match <co - xml:id="programlisting_helloxsl_match"/> ="/memo"> - <xsl:value-of <co xml:id="programlisting_helloxsl_value-of"/> select <co - xml:base="" xml:id="programlisting_helloxsl_valueof_select_att"/> ="from" /> - </xsl:template> - -</xsl:stylesheet></programlisting> - - <calloutlist> - <callout arearefs="programlisting_helloxsl_stylesheet"> - <para>The element stylesheet belongs to the namespace - <code>http://www.w3.org/1999/XSL/Transform</code>. This namespace - is <emphasis>represented</emphasis> by the literal - <literal>xsl</literal>. As an alternative we might also use <tag - class="starttag">stylesheet - xmlns="http://www.w3.org/1999/XSL/Transform"</tag> instead of <tag - class="starttag">xsl:stylesheet ...</tag>. The value of the - namespace itself gets defined next.</para> - </callout> - - <callout arearefs="programlisting_helloxsl_namespace_abbv"> - <para>The keyword <code>xmlns</code> is reserved by the <link - xlink:href="http://www.w3.org/TR/REC-xml-names/">Namespaces in - XML</link> specification. In <quote>pure</quote> XML the whole - term <code>xmlns:xsl</code> would simply define an attribute. In - presence of a namespace aware XML parser however the literal - <literal>xsl</literal> represents the attribute value <tag - class="attvalue">http://www.w3.org/1999/XSL/Transform</tag>. This - value <emphasis>must not</emphasis> be changed! 
Otherwise an XSL - converter will fail since it cannot distinguish processing - instructions from other XML elements. An element <tag - class="starttag">stylesheet</tag> belonging to a different - namespace <code>http://someserver.org/SomeNamespace</code> may have - to be generated.</para> - </callout> - - <callout arearefs="programlisting_helloxsl_xsl_version"> - <para>The <link xlink:href="http://www.w3.org/TR/xslt20">XSL - standard</link> is still evolving. The version number identifies - the conformance level for the subsequent code.</para> - </callout> - - <callout arearefs="programlisting_helloxsl_method_text"> - <para>The <tag class="attribute">method</tag> attribute in the - <link - xlink:href="http://www.w3.org/TR/xslt20/#element-output"><xsl:output></link> - element specifies the type of output to be generated. Depending on - this type we may also define indentation depths and/or encoding. - Allowed <tag class="attvalue">method</tag> values are:</para> - - <glosslist> - <glossentry> - <glossterm>text</glossterm> - - <glossdef> - <para>Ordinary text.</para> - </glossdef> - </glossentry> - - <glossentry> - <glossterm>html</glossterm> - - <glossdef> - <para><link - xlink:href="http://www.w3.org/TR/html4">HTML</link> - markup.</para> - </glossdef> - </glossentry> - - <glossentry> - <glossterm>xhtml</glossterm> - - <glossdef> - <para><link - xlink:href="http://www.w3.org/TR/xhtml1">Xhtml</link> markup - differing from the former by e.g. the closing - <quote>/></quote> in <tag><img - src="..."/></tag>.</para> - </glossdef> - </glossentry> - - <glossentry> - <glossterm>xml</glossterm> - - <glossdef> - <para>XML code. 
This is most commonly used to create views - on or different dialects of a XML document instance.</para> - </glossdef> - </glossentry> - </glosslist> - </callout> - - <callout arearefs="programlisting_helloxsl_template"> - <para>A <tag class="starttag">xsl:template</tag> defines the - output that will be created for document nodes being defined by a - selector.</para> - </callout> - - <callout arearefs="programlisting_helloxsl_match"> - <para>The attribute <tag class="attribute">match</tag> tells us - for which nodes of a document instance the given <tag - class="starttag">xsl:template</tag> is appropriate. In the given - example the value <code>/memo</code> tells us that the template is - only responsible for <tag class="element">memo</tag> nodes - appearing at top level i.e. being the root element of the document - instance.</para> - </callout> - - <callout arch="" - arearefs="programlisting_helloxsl_value-of programlisting_helloxsl_valueof_select_att"> - <para>A <tag class="element">value-of</tag> element writes content - to the <abbrev - xlink:href="http://www.w3.org/Style/XSL">XSL</abbrev> process' - output. In this example the <code>#PCDATA</code> content from the - element <tag class="element">from</tag> will be written to the - output.</para> - </callout> - </calloutlist> - </section> - - <section xml:id="xpath"> - <title><link xlink:href="http://www.w3.org/TR/xpath">XPath</link> and - node sets</title> - - <para>We are now interested in a list of all recipients being defined - in a <tag class="element">memo</tag> element. 
We introduce the element - <tag class="element">xsl:for-each</tag> which iterates over a result - set of nodes:</para> - - <figure xml:id="programlisting_tolist_xpath"> - <title>Iterating over the list of recipient nodes.</title> - - <programlisting><?xml version="1.0" encoding="UTF-8"?> - -<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" - version="2.0"> - - <xsl:output method="text"/> - - <xsl:template match="/" <co xml:id="programlisting_tolist_match_root"/>> - <xsl:for-each select="memo/to" <co - xml:id="programlisting_tolist_xpath_memo_to"/> > - <xsl:value-of select="." <co xml:id="programlisting_tolist_value_of"/> /> - <xsl:text>,</xsl:text> <co - xml:id="programlisting_tolist_xsl_text"/> - </xsl:for-each> - </xsl:template> - -</xsl:stylesheet></programlisting> - </figure> - - <calloutlist> - <callout arearefs="programlisting_tolist_match_root"> - <para>This template matches the XML document instance, - <emphasis>not</emphasis> the visible <tag - class="element"><memo></tag> node.</para> - </callout> - - <callout arearefs="programlisting_tolist_xpath_memo_to"> - <para>The <link - xlink:href="http://www.w3.org/TR/xpath">XPath</link> expression - <tag class="attvalue">memo/to</tag> gets evaluated starting from - the invisible top level document node being the context node. For - the given document instance this will define a result set - containing both <tag class="element"><to></tag> recipient - nodes, see <xref linkend="figure_memo_xpath_memo_to"/>.</para> - </callout> - - <callout arearefs="programlisting_tolist_value_of"> - <para>The dot <quote>.</quote> represents the <code>#PCDATA</code> - content of the current <tag class="element">to</tag> - element.</para> - </callout> - - <callout arearefs="programlisting_tolist_xsl_text"> - <para>A comma is appended. 
This is not quite correct since it - should be absent for the last element.</para> - </callout> - </calloutlist> - - <figure xml:id="figure_recipientlist_trailing_comma"> - <title>A list of recipients.</title> - - <para>The <abbrev - xlink:href="http://www.w3.org/Style/XSL">XSL</abbrev> presented - before yields:</para> - - <programlisting><computeroutput>Adam Hacker,Eve Intruder</computeroutput><emphasis - role="bold">,</emphasis></programlisting> - </figure> - - <para>Right now we do not bother about the trailing <quote>,</quote> - after the last recipient. The surrounding - <code><xsl:text></code>,<code></xsl:text></code> elements - <emphasis>may</emphasis> be omitted. We encourage the reader to leave - them in place since they increase readability when a template's body - gets more complex. The element <tag class="starttag">xsl:text</tag> is - used to append static text to the output. This way we append a - separator after each recipient. We now discuss the role of the two - attributes <tag class="attribute">match="/"</tag> and <tag - class="attribute">select="memo/to"</tag>. Both are examples of so called - <link xlink:href="http://www.w3.org/TR/xpath">XPath</link> - expressions. They define <emphasis>node sets</emphasis>, i.e. - subsets of the set of all nodes of a given document - instance.</para> - - <para>Conceptually <link - xlink:href="http://www.w3.org/TR/xpath">XPath</link> expressions may - be compared to the <acronym - xlink:href="http://en.wikipedia.org/wiki/Sql">SQL</acronym> language - the latter allowing the retrieval of data<emphasis>sets</emphasis> - from a relational database. 
We illustrate the current example by a - figure:</para> - - <figure xml:id="figure_memo_xpath_memo_to"> - <title>Selecting node sets from <tag class="element">memo</tag> - document instances</title> - - <mediaobject> - <imageobject> - <imagedata fileref="Ref/Fig/memoxpath.fig"/> - </imageobject> - </mediaobject> - </figure> - - <para>This figure needs some explanation. We observe an additional - node <quote>above</quote> <tag class="starttag">memo</tag> being - represented as <quote>filled</quote>. This node represents the - document instance as a whole and has got <tag>memo</tag> as its only - child. We will rediscover this additional root node when we discuss - the <abbrev - xlink:href="http://www.w3.org/TR/2004/REC-DOM-Level-3-Core-20040407">DOM</abbrev> - application programming interface.</para> - - <para>As already mentioned the expression <code>memo/to</code> - evaluates to a <emphasis>set</emphasis> of nodes. In our example this - set consists of two nodes of type <tag class="starttag">to</tag> each - of them representing a recipient of the memo. We observe a subtle - difference between the two <abbrev - xlink:href="http://www.w3.org/TR/xpath">XPath</abbrev> - expressions:</para> - - <glosslist> - <glossentry> - <glossterm><code>match="/"</code></glossterm> - - <glossdef> - <para>The expression starts and actually consists of the string - <quote>/</quote>. Thus it can be called an - <emphasis>absolute</emphasis> <abbrev - xlink:href="http://www.w3.org/TR/xpath">XPath</abbrev> - expression. Like a file specification - <filename>C:\dos\myprog.exe</filename> it starts on top level - and needs no further context information to get - evaluated.</para> - - <para>A <abbrev - xlink:href="http://www.w3.org/Style/XSL">XSL</abbrev> style - sheet <emphasis>must</emphasis> have an <link - xlink:href="http://www.w3.org/TR/xslt20/#initiating">initial - context node</link> to start the transformation. 
This is - achieved by providing exactly one <tag - class="starttag">xsl:template</tag> with an absolute <abbrev - xlink:href="http://www.w3.org/TR/xpath">XPath</abbrev> value for - its <tag class="attribute">match</tag> attribute like <tag - class="attvalue">/memo</tag>.<emphasis/></para> - </glossdef> - </glossentry> - - <glossentry> - <glossterm><code>select="memo/to"</code></glossterm> - - <glossdef> - <para>This expression can be compared to a - <emphasis>relative</emphasis> file path specification such as - <filename>../images/hdm.gif</filename>. We need to add the base - (context) directory in order for a relative file specification - to become meaningful. If the base directory is - <filename>/home/goik/xml</filename> then this - <emphasis>relative</emphasis> file specification will address - the file <filename>/home/goik/images/hdm.gif</filename>.</para> - - <para>Likewise we have to define a <emphasis>context</emphasis> - node if we want to evaluate a relative <abbrev - xlink:href="http://www.w3.org/TR/xpath">XPath</abbrev> - expression. In our example this is the root node. The XSL - specification introduces the term <link - xlink:href="http://www.w3.org/TR/xslt20/#context">evaluation - context</link> for this purpose.</para> - </glossdef> - </glossentry> - </glosslist> - - <para>In order to explain relative <abbrev - xlink:href="http://www.w3.org/TR/xpath">XPath</abbrev> expressions we - consider <code>content/para</code> starting from the (unique!) 
<tag - class="element">memo</tag> node:</para> - - <figure xml:id="memoXpathPara"> - <title>The node set represented by <code>content/para</code> - starting at the context node <tag - class="starttag">memo</tag>.</title> - - <mediaobject> - <imageobject> - <imagedata fileref="Ref/Fig/memorelativexpath.fig"/> - </imageobject> - </mediaobject> - - <caption> - <para>The dashed lines represent the relative <abbrev - xlink:href="http://www.w3.org/TR/xpath">XPath</abbrev> expressions - starting from the context node to each of the nodes in the result - set.</para> - </caption> - </figure> - </section> - - <section xml:id="xsl_important_elements"> - <title>Some important <abbrev - xlink:href="http://www.w3.org/Style/XSL">XSL</abbrev> elements</title> - - <section xml:id="xsl_if"> - <title><tag class="starttag">xsl:if</tag></title> - - <para>Sometimes we need conditional processing rules. We might want to - create a list of sender and recipients with a defined value for the - attribute <tag class="attribute">id</tag>. In the <link - linkend="figure_memoref_instance">given example</link> this is only - valid for the (unique) sender and the recipient <code><to - id="eve">Eve Intruder</to></code>. We assume this set of - persons shall be inserted into a relational database table - <code>Customer</code> consisting of two <code>NOT NULL</code> - columns <code>id</code> and <code>name</code>. Thus both attributes - <emphasis>must</emphasis> be specified and we must exclude <tag - class="starttag">from</tag> or <tag class="starttag">to</tag> nodes - with undefined <tag class="attribute">id</tag> attributes:</para> - - <figure xml:id="programlisting_memo_export_sql"> - <title>Exporting SQL statements.</title> - - <programlisting>... 
-<xsl:variable name="newline" <co xml:id="programlisting_xsl_if_definevar"/>> <!-- A newline \n --> - <xsl:text> -</xsl:text> -</xsl:variable> - -<xsl:template match="/memo"> - <xsl:for-each select="from|to" <co xml:id="programlisting_xsl_if_foreach"/>> - <xsl:if <emphasis role="bold">test="@id"</emphasis> <co - xml:id="programlisting_xsl_if_test"/>> - <xsl:text>INSERT INTO Customer (id, name) VALUES ('</xsl:text> - <xsl:value-of select="@id" <co - xml:id="programlisting_xsl_if_select_idattrib"/>/> - <xsl:text>', '</xsl:text> - <xsl:value-of select="." <co - xml:id="programlisting_xsl_if_selectcontent"/>/> - <xsl:text>')</xsl:text> - <xsl:value-of select="$newline" <co - xml:id="programlisting_xsl_if_usevar"/>/> - </xsl:if> - </xsl:for-each> -</xsl:template></programlisting> - - <caption> - <para>We want to export data from XML documents to a database - server. For this purpose INSERT statements are being crafted - from a XML document containing relevant data.</para> - </caption> - </figure> - - <calloutlist> - <callout arearefs="programlisting_xsl_if_definevar"> - <para>Define a file local variable <code>newline</code>. Dealing - with text output frequently requires the insertion of newlines. - Due to the syntax of the <tag class="element">xsl:text</tag> - elements this tends to clutter the code.</para> - </callout> - - <callout arearefs="programlisting_xsl_if_foreach"> - <para>Iterate over the set of the sender node and all recipient - nodes.</para> - </callout> - - <callout arearefs="programlisting_xsl_if_test"> - <para>The attribute value of <tag class="attribute">test</tag> - will be <link - xlink:href="http://www.w3.org/TR/xslt20/#xsl-if">evaluated</link> - as a boolean. In this example it evaluates to <code>true</code> - iff the attribute <tag class="attribute">id</tag> is defined for - the context node. 
Since we are inside the <tag - class="element">xsl:for-each</tag> block all context nodes are - either of type <tag class="starttag">from</tag> or <tag - class="starttag">to</tag> and thus <emphasis>may</emphasis> have - an <tag class="attribute">id</tag> attribute.</para> - </callout> - - <callout arearefs="programlisting_xsl_if_select_idattrib"> - <para>The <tag class="attribute">id</tag> attribute's value is - copied to the output. The <quote>@</quote> character in - <code>select="@id"</code> tells the <abbrev - xlink:href="http://www.w3.org/Style/XSL">XSL</abbrev> processor - to read the value of an <emphasis>attribute</emphasis> with name - <tag class="attribute">id</tag> rather than the content of a - nested sub<emphasis>element</emphasis> like in <code><to - id="foo"><id>I am - nested!</id></to></code>.</para> - </callout> - - <callout arearefs="programlisting_xsl_if_selectcontent"> - <para>As stated earlier the dot <quote>.</quote> denotes the - current context element. In this example simply the - <code>#PCDATA</code> content is copied to the output.</para> - </callout> - - <callout arearefs="programlisting_xsl_if_usevar"> - <para>The <quote>$</quote> sign in front of <code>newline</code> - tells the <abbrev - xlink:href="http://www.w3.org/Style/XSL">XSL</abbrev> processor - to access the variable <varname>newline</varname> previously - defined in <coref linkend="programlisting_xsl_if_definevar"/> - rather than interpreting it as the name of a sub element or an - attribute.</para> - </callout> - </calloutlist> - - <para>As expected the recipient entry <quote>Adam Hacker</quote> - does not appear due to the fact that no <tag - class="attribute">id</tag> attribute is defined in its <tag - class="starttag">to</tag> element:</para> - - <programlisting><computeroutput>INSERT INTO Customer (id, name) VALUES ('goik', 'Martin Goik') -INSERT INTO Customer (id, name) VALUES ('eve', 'Eve intruder')</computeroutput></programlisting> - - <qandaset role="exercise"> -
<title>The XPath functions position() and last()</title> - - <qandadiv> - <qandaentry xml:id="example_position_last"> - <question> - <para>We return to our recipient list in <xref - linkend="figure_recipientlist_trailing_comma"/>. We are - interested in a list of recipients avoiding the trailing - comma:</para> - - <programlisting><computeroutput>Adam Hacker,Eve Intruder</computeroutput></programlisting> - - <para>We may use a <tag class="element">xsl:if</tag> to - insert a comma for all but the very last recipient node. - This can be achieved by using the <abbrev - xlink:href="http://www.w3.org/Style/XSL">XSL</abbrev> - functions <link - xlink:href="http://www.w3.org/TR/xpath#function-position">position()</link> - and <link - xlink:href="http://www.w3.org/TR/xpath#function-last">last()</link>. - Hint: The comparison operator <quote><</quote> may be - used in <abbrev - xlink:href="http://www.w3.org/Style/XSL">XSL</abbrev> to - compare two integer numbers. However it must be escaped as - <code>&lt;</code> in order to be XML compatible.</para> - </question> - - <answer> - <para>We have to exclude the comma for the last node of the - recipient list. If we have e.g. 10 recipients the function - <code>position()</code> will return integer values - starting at 1 and ending with 10. So for the last node the - comparison <code>10 < 10</code> will evaluate to - false:</para> - - <programlisting><xsl:for-each select="memo/to"> - <xsl:value-of select="."/> - <xsl:if test="position() &lt; last()"> - <xsl:text>,</xsl:text> - </xsl:if> -</xsl:for-each></programlisting> - </answer> - </qandaentry> - - <qandaentry xml:id="example_avoid_xsl_if"> - <question> - <label>Avoiding xsl:if</label> - - <para>In <xref linkend="programlisting_memo_export_sql"/> we - used the <abbrev - xlink:href="http://www.w3.org/TR/xpath">XPath</abbrev> value - <quote>from|to</quote> to select the desired sender and - recipient nodes.
Inside the <tag - class="element">xsl:for-each</tag> block we permitted only - those nodes which have an <tag class="attribute">id</tag> - attribute. These two steps may be combined into a single - <abbrev - xlink:href="http://www.w3.org/TR/xpath">XPath</abbrev> - expression obsoleting the <tag - class="element">xsl:if</tag>.</para> - </question> - - <answer> - <para>We simply need a modified <abbrev - xlink:href="http://www.w3.org/TR/xpath">XPath</abbrev> in - the <tag class="element">for-each</tag>:</para> - - <programlisting><xsl:for-each select="<emphasis - role="bold">from[@id]|to[@id]</emphasis>"> - <xsl:text>INSERT INTO Customer (id, name) VALUES ('</xsl:text> - <xsl:value-of select="@id"/> - <xsl:text>', '</xsl:text> - <xsl:value-of select="."/> - <xsl:text>')</xsl:text> - <xsl:value-of select="$newline"/> -</xsl:for-each></programlisting> - </answer> - </qandaentry> - </qandadiv> - </qandaset> - </section> - - <section xml:id="xsl_apply_templates"> - <title><tag class="starttag">xsl:apply-templates</tag></title> - - <para>We already used <tag class="element">xsl:for-each</tag> to - iterate over a list of element nodes. <abbrev - xlink:href="http://www.w3.org/Style/XSL">XSL</abbrev> offers a - different possibility for this purpose. The idea is to define the - formatting rules at a centralized location. So the solution to <xref - linkend="example_position_last"/> may be rewritten in an equivalent way:</para> - - <programlisting><xsl:template match="/"> - <xsl:apply-templates select="memo/to" <co - xml:id="programlisting_apply_templates_apply"/>/> -</xsl:template> - -<xsl:template match="to" <co xml:id="programlisting_apply_templates_match"/>> - <xsl:value-of select="."/> - <xsl:if test="<emphasis role="bold">position()</emphasis> &lt; <emphasis - role="bold">last()</emphasis>"> - <xsl:text>,</xsl:text> - </xsl:if> -</xsl:template></programlisting> - - <calloutlist> - <callout arearefs="programlisting_apply_templates_apply"> - <para>Definition of the recipient node list.
Each element of - this list shall be processed further.</para> - </callout> - - <callout arearefs="programlisting_apply_templates_match"> - <para>This template <emphasis>may</emphasis> be used by an XSL - processor to format nodes of type <tag - class="starttag">to</tag>. Since the processor is asked to do - exactly this in <xref - linkend="programlisting_apply_templates_apply"/> the current - template will <emphasis>really</emphasis> be used in this - example.</para> - </callout> - </calloutlist> - - <para>The procedure outlined above may have the following - advantages:</para> - - <itemizedlist> - <listitem> - <para>Some elements being central for a DTD may appear at - different places. For example a <tag - class="starttag">title</tag> element is likely to appear as a - child of chapters, sections, tables, figures and so on. It may be - sufficient to define a single template with a - <code>match="title"</code> attribute which contains all rules - being required.</para> - </listitem> - - <listitem> - <para>Sometimes the body of a <tag - class="starttag">xsl:for-each</tag> ... <tag - class="endtag">xsl:for-each</tag> spans multiple screens thus - limiting code readability. Factoring out the body into a - template may avoid this obstacle.</para> - </listitem> - </itemizedlist> - - <para>This method is well known from programming languages: If the - code inside a loop is needed multiple times or reaches a painful - line count <emphasis>good</emphasis> programmers tend to define a - separate method. For example:</para> - - <programlisting language="java">for (int i = 0; i < 10; i++){ - if (a[i] < b[i]){ - max[i] = b[i]; - } else { - max[i] = a[i]; - } - ... -}</programlisting> - - <para>Inside the loop's body the larger of two - values gets computed. This may be needed at several locations and - thus it is convenient to centralize this code into a method:</para> - - <programlisting language="java">// cf.
<xsl:template match="..."> -static int maximum(int a, int b){ - if (a < b){ - return b; - } else { - return a; - } -} -... -// cf. <xsl:apply-templates select="..."/> -for (int i = 0; i < 10; i++){ - max[i] = maximum(a[i], b[i]); -}</programlisting> - - <para>So far calling a static method in <trademark - xlink:href="http://www.oracle.com/us/legal/third-party-trademarks/index.html">Java</trademark> - may be compared to a <tag - class="starttag">xsl:apply-templates</tag>. There is however one big - difference. In <abbrev - xlink:href="http://www.w3.org/Style/XSL">XSL</abbrev> the - <quote>method</quote> being called may not exist at all. A <tag - class="starttag">xsl:apply-templates</tag> instructs a processor to - format a set of nodes. It does not contain information about any - rules being defined to do this job:</para> - - <programlisting><xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" - version="2.0"> - - <xsl:output method="text"/> - - <xsl:template match="/memo"> - <xsl:apply-templates <emphasis role="bold">select="content"</emphasis>/> - </xsl:template> - -</xsl:stylesheet></programlisting> - - <para>Since no suitable template supplying rules for <tag - class="starttag">content</tag> nodes exists a <abbrev - xlink:href="http://www.w3.org/Style/XSL">XSL</abbrev> processor uses - a default formatting rule instead:</para> - - <programlisting><computeroutput>Thanks for your excellent work.Our firewall is definitely -broken! This bug has been reported by the sender.</computeroutput></programlisting> - - <para>We observe that the <code>#PCDATA</code> content strings of - the element itself and all (recursive) sub elements get glued - together into one string. In most cases this is definitely not - intended. Omitting a necessary template is usually a programming - error. 
It is thus good programming practice during style sheet - development to define a special template catching forgotten - rules:</para> - - <programlisting><xsl:template match="/memo"> - <xsl:apply-templates select="content"/> -</xsl:template> - -<xsl:template match="*"> - <xsl:message> - <xsl:text>Error: No template defined matching element '</xsl:text> - <xsl:value-of select="name(.)"/> - <xsl:text>'</xsl:text> - </xsl:message> -</xsl:template></programlisting> - - <para>The <quote>*</quote> matches any element if there is no <link - xlink:href="http://www.w3.org/TR/xslt20/#conflict">better - matching</link> rule defined. Since we did not supply any template - for <tag class="starttag">content</tag> nodes at all this default - template will match nodes of type <tag - class="starttag">content</tag>. The function <code>name()</code> is - predefined in <abbrev - xlink:href="http://www.w3.org/Style/XSL">XSL</abbrev> and returns - the element type name of a node. During the formatting process we - will now see the following warning message:</para> - - <programlisting><computeroutput>Error: No template defined matching element 'content'</computeroutput></programlisting> - - <para>We note that for document nodes <tag - class="starttag">xyz</tag><code>foo</code><tag - class="endtag">xyz</tag> containing only <code>#PCDATA</code> a - simple <tag class="emptytag">xsl:apply-templates select="xyz"</tag> - is sufficient: A <abbrev - xlink:href="http://www.w3.org/Style/XSL">XSL</abbrev> processor uses - its default rule and copies the node's content <code>foo</code> to - its output.</para> - - <qandaset role="exercise"> - <title>Extending the export to a RDBMS</title> - - <qandadiv> - <qandaentry xml:id="example_rdbms_person"> - <question> - <para>We assume that our RDBMS table <code>Customer</code> - from <xref linkend="programlisting_memo_export_sql"/> shall - be replaced by a table <code>Person</code>. 
We expect the - senders of memo documents to be employees of a given - company. Conversely the recipients of memos are expected to - be customers. Our <code>Person</code> table shall have a - <quote>tag</quote> like column named <code>type</code> - having exactly two allowed values <code>customer</code> or - <code>employee</code> being controlled by a - <code>CHECK</code> constraint, see <xref - linkend="table_person"/>. Create a style sheet generating - the necessary SQL statements from a memo document instance. - Hint: Define two different templates for <tag - class="starttag">from</tag> and <tag - class="starttag">to</tag> nodes.</para> - </question> - - <answer> - <para>We define two templates differing only in the static - string value for a person's type. The relevant <abbrev - xlink:href="http://www.w3.org/Style/XSL">XSL</abbrev> - portion reads:<programlisting><xsl:template match="/memo"> - <xsl:apply-templates select="from|to"/> -</xsl:template> - -<xsl:template match="from"> - <xsl:text>INSERT INTO Person (name, type) VALUES('</xsl:text> - <xsl:value-of select="."/> - <xsl:text>', <emphasis role="bold">'employee'</emphasis>)</xsl:text> - <xsl:value-of select="$newline"/> -</xsl:template> - - <xsl:template match="to"> - <xsl:text>INSERT INTO Person (name, type) VALUES('</xsl:text> - <xsl:value-of select="."/> - <xsl:text>', <emphasis role="bold">'customer'</emphasis>)</xsl:text> - <xsl:value-of select="$newline"/> -</xsl:template></programlisting></para> - </answer> - </qandaentry> - </qandadiv> - </qandaset> - - <table xml:id="table_person"> - <title>The Person table</title> - - <?dbhtml table-width="30%" ?> - - <?dbfo table-width="40%" ?> - - <tgroup cols="2"> - <colspec colwidth="3*"/> - - <colspec colwidth="2*"/> - - <thead> - <row> - <entry>name</entry> - - <entry>type</entry> - </row> - </thead> - - <tbody> - <row> - <entry>Martin Goik</entry> - - <entry>employee</entry> - </row> - - <row> - <entry>Adam Hacker</entry> - - <entry>customer</entry> - 
</row> - - <row> - <entry>Eve intruder</entry> - - <entry>customer</entry> - </row> - </tbody> - </tgroup> - </table> - </section> - - <section xml:id="xsl_choose"> - <title><tag class="starttag">xsl:choose</tag></title> - - <para>We already described the <tag class="starttag">xsl:if</tag> - which can be compared to an <code>if(..){...}</code> statement in - many programming languages. The <tag - class="starttag">xsl:choose</tag> element can be compared to - multiple <code>else if</code> conditions including an optional final - <code>else</code> block being reached if all boolean tests - fail:</para> - - <programlisting language="java">if (condition a){ -...//block a -} else if (condition b){ -... //block b -} ... -... -else { - ... //code being reached when all conditions evaluate to false -}</programlisting> - - <para>We want to generate a list of memo recipient names numbered with Roman - numerals up to 10. Higher numbers shall be displayed in - ordinary decimal notation:</para> - - <programlisting><computeroutput>I:Adam Hacker -II:Eve intruder -III: ... -IV: ...
-...</computeroutput></programlisting> - - <para>Though <abbrev - xlink:href="http://www.w3.org/Style/XSL">XSL</abbrev> offers <link - xlink:href="http://www.w3.org/TR/xslt20/#convert">a better - way</link> we may generate these number literals by:</para> - - <programlisting><xsl:template match="/memo"> - <xsl:apply-templates select="to"/> -</xsl:template> - -<xsl:template match="to"> - <xsl:choose> - <xsl:when test="1 = position()">I</xsl:when> - <xsl:when test="2 = position()">II</xsl:when> - <xsl:when test="3 = position()">III</xsl:when> - <xsl:when test="4 = position()">IV</xsl:when> - <xsl:when test="5 = position()">V</xsl:when> - <xsl:when test="6 = position()">VI</xsl:when> - <xsl:when test="7 = position()">VII</xsl:when> - <xsl:when test="8 = position()">VIII</xsl:when> - <xsl:when test="9 = position()">IX</xsl:when> - <xsl:when test="10 = position()">X</xsl:when> - <xsl:otherwise> - <xsl:value-of select="position()"/> - </xsl:otherwise> - </xsl:choose> - - <xsl:text>:</xsl:text> - <xsl:value-of select="."/> - <xsl:value-of select="$newline"/> -</xsl:template></programlisting> - - <para>Note that this conversion is incomplete: If the number in - question is larger than 10 it will be formatted in ordinary decimal - style according to the <tag class="starttag">xsl:otherwise</tag> - clause.</para> - </section> - - <section xml:id="section_html_book"> - <title>A complete HTML formatting example</title> - - <para>We now present a series of exercises showing how to format - <tag class="starttag">book</tag> document instances to XHTML. 
This - is done in a step by step manner each time showing correspondent - code snippets for our <filename>memo.dtd</filename>.</para> - - <section xml:id="section_memo_to_list"> - <title>Listing the recipients of a memo</title> - - <para>In order to generate a XHTML <link - xlink:href="http://www.w3.org/TR/html401/struct/lists.html#h-10.2">list</link> - of all <tag class="starttag">memo</tag> recipients of a memo we - have to use <tag class="starttag">xsl:output method="xhtml"</tag> - and embed the required HTML tags in our <abbrev - xlink:href="http://www.w3.org/Style/XSL">XSL</abbrev> style - sheet:</para> - - <programlisting><xsl:output method="xhtml" indent="yes"/> - -<xsl:template match="/memo"> - <html> - <head> - <title>Recipient list</title> - </head> - <body> - <ul> - <xsl:apply-templates select="to"/> - </ul> - </body> - </html> -</xsl:template> - -<xsl:template match="to"> - <li> - <xsl:value-of select="."/> - </li> -</xsl:template></programlisting> - - <para>Processing this style sheet for a <tag - class="starttag">memo</tag> document instance yields:</para> - - <programlisting><?xml version="1.0" encoding="UTF-8"?> -<html> - <head> - <title>Recipient list</title> - </head> - <body> - <ul> - <li>Adam Hacker</li> - <li>Eve intruder</li> - </ul> - </body> -</html></programlisting> - - <para>The generated Xhtml code does not contain a reference to a - DTD. 
We may supply this reference by modifying our <tag - class="emptytag">xsl:output</tag> directive:</para> - - <programlisting><xsl:output method="xhtml" indent="yes" - <emphasis role="bold">doctype-public</emphasis>="-//W3C//DTD XHTML 1.0 Strict//EN" - <emphasis role="bold">doctype-system</emphasis>="http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"/></programlisting> - - <para>This adds a corresponding header which allows us to validate - the generated HTML:</para> - - <programlisting><!DOCTYPE html - PUBLIC "<emphasis role="bold">-//W3C//DTD XHTML 1.0 Strict//EN</emphasis>" - "<emphasis role="bold">http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd</emphasis>"> -<html><head> ...</programlisting> - - <para>This may be improved further by instructing the XSL - formatter to use <uri - xlink:href="http://www.w3.org/1999/xhtml">http://www.w3.org/1999/xhtml</uri> - as default namespace:</para> - - <programlisting><?xml version="1.0" encoding="UTF-8"?> -<xsl:stylesheet <emphasis role="bold">xmlns="http://www.w3.org/1999/xhtml"</emphasis> - xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0"> - -<xsl:output method="xhtml" indent="yes" - doctype-public="-//W3C//DTD XHTML 1.0 Strict//EN" - doctype-system="http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"/> - - <xsl:template match="/"> - <html><head> ... - </xsl:template> -... -</xsl:stylesheet></programlisting> - - <para>This yields the following output:</para> - - <programlisting><?xml version="1.0" encoding="UTF-8"?> -<!DOCTYPE html - PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" - "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"> - -<html <emphasis role="bold">xmlns="http://www.w3.org/1999/xhtml"</emphasis>> - <head> ... -</html></programlisting> - - <para>The top level element <tag class="element">html</tag> is now - declared to belong to the namespace - <code>xmlns="http://www.w3.org/1999/xhtml"</code>.
This will be - inherited by all inner Xhtml elements.</para> - - <qandaset role="exercise"> - <title>Transforming book instances to Xhtml</title> - - <qandadiv> - <qandaentry xml:id="example_xsl_book_1_dtd"> - <question> - <para>Create a <abbrev - xlink:href="http://www.w3.org/Style/XSL">XSL</abbrev> - style sheet to transform instances of the first version of - <link endterm="example_bookDtd" - linkend="example_bookDtd">book.dtd</link> (<xref - linkend="example_bookDtd"/>) into <uri - xlink:href="http://www.w3.org/TR/xhtml1/#a_dtd_XHTML-1.0-Strict">Xhtml - 1.0 strict</uri>.</para> - - <para>You should first construct a Xhtml document - <emphasis>manually</emphasis> before coding the XSL. After - you have a <quote>working</quote> Xhtml example document - create a <abbrev - xlink:href="http://www.w3.org/Style/XSL">XSL</abbrev> - style sheet which transforms arbitrary - <filename>book.dtd</filename> document instances into a - corresponding Xhtml file.</para> - </question> - - <answer> - <programlisting><?xml version="1.0" encoding="UTF-8"?> -<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0"> - - <xsl:output indent="yes" method="xhtml"/> - - <xsl:template match="/book"> - <html> - <head> - <title><xsl:value-of select="title"/></title> - </head> - <body> - <h1><xsl:value-of select="title"/></h1> - <xsl:apply-templates select="chapter"/> - </body> - </html> - </xsl:template> - - <xsl:template match="chapter"> - <h2><xsl:value-of select="title"/></h2> - <xsl:apply-templates select="para"/> - </xsl:template> - - <xsl:template match="para"> - <p><xsl:value-of select="."/></p> - </xsl:template> - -</xsl:stylesheet></programlisting> - </answer> - </qandaentry> - </qandadiv> - </qandaset> - </section> - - <section xml:id="section_xsl_attribute"> - <title><tag class="starttag">xsl:attribute</tag></title> - - <para>Sometimes we want to set attribute values in a generated XML - document. 
For example we might want to set the background color - <quote>red</quote> if a memo has a priority value of <tag - class="attvalue">high</tag>:</para> - - <programlisting><h1 style="background:red">Firewall problems</h1></programlisting> - - <para>Regarding our memo example this may be achieved by:</para> - - <programlisting><xsl:template match="/memo"> - <html> - ... - <body> - <xsl:variable name="<emphasis role="bold">messageColor</emphasis>" <co - xml:id="programlisting_priority_lolor_vardef"/>> - <xsl:choose> - <xsl:when test="@priority = 'low'">green</xsl:when> - <xsl:when test="@priority = 'medium'">yellow</xsl:when> - <xsl:when test="@priority = 'high'">red</xsl:when> - </xsl:choose> - </xsl:variable> - <h1 style="background:{<emphasis role="bold">$messageColor</emphasis>};" <co - xml:id="programlisting_priority_lolor_usevar"/>> - <xsl:value-of select="subject"/> - </h1> - </body> - </html> -</xsl:template></programlisting> - - <calloutlist> - <callout arearefs="programlisting_priority_lolor_vardef"> - <para>Definition of a color name depending on the attribute - <tag class="attvalue">priority</tag>'s value. The set of - possible attribute values (low, medium, high) is mapped to the - color names (green, yellow, red).</para> - </callout> - - <callout arearefs="programlisting_priority_lolor_usevar"> - <para>The color variable is used to compose the attribute <tag - class="attribute">style</tag>'s value. The curly - <code>{...}</code> braces are part of the <abbrev - xlink:href="http://www.w3.org/Style/XSL">XSL</abbrev> - standard's syntax. They are required here to instruct the - <abbrev xlink:href="http://www.w3.org/Style/XSL">XSL</abbrev> - processor to substitute the local variable - <code>messageColor</code>'s value instead of simply copying - the literal string <quote><code>$messageColor</code></quote> - itself to the output document e.g.
generating <tag - class="starttag">h1 style = - "background:$messageColor;"</tag>.</para> - </callout> - </calloutlist> - - <para>Instead of constructing an extra variable <abbrev - xlink:href="http://www.w3.org/Style/XSL">XSL</abbrev> offers a - slightly more compact way for the same purpose. The <tag - class="starttag">xsl:attribute</tag> element allows us to define - the name of an attribute to be added together with an attribute - value specification:</para> - - <programlisting><xsl:template match="/memo"> - <html> - ... - <h1> - <xsl:attribute name="<emphasis role="bold">style</emphasis>"> - <xsl:text>background:</xsl:text> - <xsl:choose> - <xsl:when test="@priority = 'low'">green</xsl:when> - <xsl:when test="@priority = 'medium'">yellow</xsl:when> - <xsl:when test="@priority = 'high'">red</xsl:when> - </xsl:choose> - </xsl:attribute> - <xsl:value-of select="subject"/> - </h1> - </body> - </html> -</xsl:template></programlisting> - - <qandaset role="exercise"> - <title>Adding a table of contents (toc)</title> - - <qandadiv> - <qandaentry xml:id="example_book_toc"> - <question> - <para>For larger document instances it is convenient to - add a table of contents to the generated Xhtml document. - <!-- We - demonstrate the desired result as an <uri - xlink:href="src/viewlet/bookhtmltoc/bookhtmltoc_viewlet_swf.html">animation</uri>.--></para> - - <para>For this exercise you need a unique string value for - each <tag class="starttag">chapter</tag> node. If a <tag - class="starttag">chapter</tag>'s <tag - class="attribute">id</tag> attribute had been declared as - <code>#REQUIRED</code> its value would do this job - perfectly. Unfortunately you cannot rely on its existence - since it is declared to be <code>#IMPLIED</code> and may - thus be absent.</para> - - <para>XSL offers a standard function for this purpose - namely <link - xlink:href="http://www.w3.org/TR/xslt20/#generate-id">generate-id(...)</link>. 
- In a nutshell this function takes an XML node as an - argument (or, when called without arguments, uses the - context node) and creates a string value being unique with - respect to <emphasis>all</emphasis> other nodes in the - document. For a given node the function may be called - repeatedly and is guaranteed to always return the same - value during the <emphasis>same</emphasis> transformation - run. So it suffices to add something like <tag - class="starttag">a href="#{generate-id(...)}"</tag> or use - it in conjunction with <tag - class="starttag">xsl:attribute</tag>.</para> - </question> - - <answer> - <para>We use the <code>generate-id()</code> function to - create a unique identity string for each chapter node. - Since we also want to define links to the table of - contents we need another unique string value. It is - tempting to simply use a static value like - <quote>__toc__</quote> for this purpose. However we cannot - be sure that this value does not coincide with one of the - <code>generate-id()</code> function return values.</para> - - <para>A cleaner solution uses the <tag - class="starttag">book</tag> node's generated identity - string for this purpose. As stated before this value is - definitely unique:</para> - - <programlisting><xsl:template match="/book"> -...
- <body> - <h1><xsl:value-of select="title"/></h1> - <h2 id="{generate-id(.)}" <co xml:base="" - xml:id="programlisting_book_toc_def_toc"/>>Table of contents</h2> - <ul> - <xsl:for-each select="chapter"> - <li> - <a href="#{generate-id(.)}" <co xml:base="" - xml:id="programlisting_book_toc_ref_chap"/>><xsl:value-of select="title"></xsl:value-of></a> - </li> - </xsl:for-each> - </ul> - <xsl:apply-templates select="chapter"/> - </body> - </html> -</xsl:template> - -<xsl:template match="chapter"> - <h2 id="{generate-id(.)}" <co xml:base="" - xml:id="programlisting_book_toc_def_chap"/>> - <a href="#{generate-id(/book)}" <co xml:base="" - xml:id="programlisting_book_toc_ref_toc"/>> - <xsl:value-of select="title"/> - </a> - </h2> - <xsl:apply-templates select="para"/> -</xsl:template> -...</programlisting> - - <calloutlist> - <callout arearefs="programlisting_book_toc_def_toc"> - <para>The current context node is <tag - class="starttag">book</tag>. We use it as argument to - <code>generate-id()</code> to create a unique identity - string.</para> - </callout> - - <callout arearefs="programlisting_book_toc_ref_chap"> - <para>The <tag class="starttag">xsl:for-each</tag> - iterates over all <tag class="starttag">chapter</tag> - nodes. We reference the corresponding target nodes - being created in <xref - linkend="programlisting_book_toc_def_chap"/>.</para> - </callout> - - <callout arearefs="programlisting_book_toc_def_chap"> - <para>Each <tag class="starttag">chapter</tag>'s - heading is supplied with a unique identity string - being referenced from <xref - linkend="programlisting_book_toc_ref_chap"/>.</para> - </callout> - - <callout arearefs="programlisting_book_toc_ref_toc"> - <para>Clicking on a chapter's title shall take us back - to the table of contents (toc). 
So we create a - hypertext link referencing our toc heading's identity - string being defined in <xref - linkend="programlisting_book_toc_def_toc"/>.</para> - </callout> - </calloutlist> - </answer> - </qandaentry> - </qandadiv> - </qandaset> - </section> - - <section xml:id="section_xsl_mixed"> - <title>XSL and mixed content</title> - - <para>We come back to our memo example from <xref - linkend="figure_memo_content_mixed"/> and ask ourselves how to - format mixed content. In the example the following part of a - document instance was given:</para> - - <programlisting><content>The <emphasis role="bold"><url href="http://w3.org/XML">XML</url></emphasis> language - is <emphasis role="bold"><emphasis>easy</emphasis></emphasis> to learn. However you need - some <emphasis role="bold"><emphasis>time</emphasis></emphasis>.</content></programlisting> - - <para>Embedded element nodes have been set to bold style in order - to distinguish them from <code>#PCDATA</code> text nodes. We may - also use <xref linkend="figure_memo_content_mixed"/> to help - understanding the formatting process of mixed content. First we - mention a possible way our Xhtml output might look like:</para> - - <programlisting><p>The <emphasis role="bold"><a href="http://w3.org/XML">XML</a>language is<em>easy</em></emphasis> to learn. However you -need some <emphasis role="bold"><em>time</em></emphasis>.</p></programlisting> - - <para>We start with a first version of an <abbrev - xlink:href="http://www.w3.org/Style/XSL">XSL</abbrev> - template:</para> - - <programlisting> <xsl:template match="content"> - <p> - <xsl:value-of select="."/> - </p> - </xsl:template></programlisting> - - <para>As mentioned earlier all <code>#PCDATA</code> text nodes of - the whole subtree are glued together leading to:</para> - - <programlisting><p>The XML language is easy to learn. 
However you need some time.</p></programlisting> - - <para>Our next attempt is to define templates to format the - elements <tag class="starttag">url</tag> and <tag - class="starttag">emphasis</tag>:</para> - - <programlisting>... -<xsl:template match="content"> - <p> - <xsl:apply-templates select="emphasis|url"/> - </p> -</xsl:template> - -<xsl:template match="url"> - <a href="{@href}"><xsl:value-of select="."/></a> -</xsl:template> - -<xsl:template match="emphasis"> - <em><xsl:value-of select="."/></em> -</xsl:template> -...</programlisting> - - <para>As expected the sub elements are formatted correctly. - Unfortunately the <code>#PCDATA</code> text nodes between the - element nodes are lost:</para> - - <programlisting><p> - <a href="http://w3.org/XML">XML</a> - <em>easy</em> - <em>time</em> -</p></programlisting> - - <para>To correct this transformation script we have to tell the - formatting processor to include bare text nodes in the output. - The <abbrev xlink:href="http://www.w3.org/TR/xpath">XPath</abbrev> - standard defines the node test <link - xlink:href="http://www.w3.org/TR/xpath#path-abbrev">text()</link> - for this purpose. It matches any child node - of type text:</para> - - <programlisting>... -<xsl:template match="content"> - <p> - <xsl:apply-templates select="<emphasis role="bold">text()</emphasis>|emphasis|url"/> - </p> -</xsl:template> -...</programlisting> - - <para>This yields the desired output. The text node result elements - are shown in bold style:</para> - - <programlisting><p><emphasis role="bold">The</emphasis> <a href="http://w3.org/XML">XML</a><emphasis - role="bold"> language is </emphasis><em>easy</em><emphasis - role="bold"> to learn. 
However -you need some </emphasis><em>time</em><emphasis role="bold">.</emphasis></p></programlisting> - - <para>Some remarks:</para> - - <orderedlist> - <listitem> - <para>The <abbrev - xlink:href="http://www.w3.org/TR/xpath">XPath</abbrev> - expression <code>select="text()|emphasis|url"</code> - corresponds nicely to the content model definition in the - DTD:</para> - - <programlisting><!ELEMENT content (#PCDATA|emphasis|url)*></programlisting> - </listitem> - - <listitem> - <para>In most mixed content models <emphasis>all</emphasis> - sub elements of e.g. <tag class="starttag" - role="">content</tag> have to be formatted. During development - some of the elements defined in a DTD are likely to be omitted - by accident. For this reason the <quote>typical</quote> - <abbrev xlink:href="http://www.w3.org/TR/xpath">XPath</abbrev> - expression acting on mixed content models is defined to match - <emphasis>any</emphasis> sub element node:</para> - - <programlisting>select="text()|<emphasis role="bold">*</emphasis>"</programlisting> - </listitem> - - <listitem> - <para>Regarding <code>select="text()|emphasis|url"</code> we - have defined two templates for element nodes <tag - class="starttag">emphasis</tag> and <tag - class="starttag">url</tag>. What happens to those text nodes - being matched by <code>text()</code>? These are subject to a - default rule: The content of bare text nodes is written to the - output. 
We may however redefine this default rule by adding a - template:</para> - - <programlisting><xsl:template match="text()"> - <emphasis role="bold"><span style="color:red"> - <xsl:value-of select="."/> - </span></emphasis> -</xsl:template></programlisting> - - <para>This yields:</para> - - <programlisting><p> - <emphasis role="bold"><span style="color:red">The </span></emphasis> - <a href="http://w3.org/XML">XML</a> - <emphasis role="bold"><span style="color:red"> language is </span></emphasis> - <em>easy</em> - <emphasis role="bold"><span style="color:red"> to learn. However you need some </span></emphasis> - <em>time</em> - <emphasis role="bold"><span style="color:red">.</span></emphasis> -</p></programlisting> - - <para>In most cases it is not desired to replace all text - nodes throughout the whole document. In the current example we - might only format text nodes being - <emphasis>immediate</emphasis> children of <tag - class="starttag">content</tag>. This may be achieved by - restricting the <abbrev - xlink:href="http://www.w3.org/TR/xpath">XPath</abbrev> - expression to <tag class="starttag">xsl:template - match="content/text()"</tag>.</para> - </listitem> - </orderedlist> - </section> - - <section xml:id="section_xsl_functionid"> - <title>The function <code>id()</code></title> - - <para>In <abbrev - xlink:href="http://www.w3.org/Style/XSL">XSL</abbrev> we sometimes - want to lookup nodes by an attribute value of type <link - linkend="section_id_idref">ID</link>. We consider our product - catalog from <xref linkend="figure_intern_reference_xml"/>. 
The - following <abbrev - xlink:href="http://www.w3.org/Style/XSL">XSL</abbrev> may be used - to create Xhtml documents from <tag class="starttag">catalog</tag> - instances:</para> - - <programlisting xml:lang=""><xsl:template match="/catalog"> - <html> - <head><title>Product catalog</title></head> - <body> - <h1>List of Products</h1> - <xsl:apply-templates select="product"/> - </body> - </html> -</xsl:template> - -<xsl:template match="product"> - <h2 id="{@id}" <co xml:base="" - xml:id="programlisting_catalog2html_v1_defid"/>><xsl:value-of select="title"/></h2> - <xsl:apply-templates select="para"/> -</xsl:template> - -<xsl:template match="para"> - <p><xsl:apply-templates select="text()|*" <co - xml:id="programlisting_catalog2html_v1_mixed"/>/></p> -</xsl:template> - -<xsl:template match="link"> - <a href="#{@ref}" <co xml:id="programlisting_catalog2html_v1_refid"/>><xsl:value-of select="."/></a> -</xsl:template></programlisting> - - <calloutlist> - <callout arearefs="programlisting_catalog2html_v1_defid"> - <para>The <code>ID</code> attribute <tag - class="starttag">product id="foo"</tag> is unique within the - document instance. 
We may thus use it as a unique string - value in the generated Xhtml, too.</para> - </callout> - - <callout arearefs="programlisting_catalog2html_v1_mixed"> - <para>Mixed content consisting of text and <tag - class="starttag">link</tag> nodes.</para> - </callout> - - <callout arearefs="programlisting_catalog2html_v1_refid"> - <para>We define a file local Xhtml reference to a - product.</para> - </callout> - </calloutlist> - - <para>The <tag class="starttag">para</tag> element from the - example document instance containing a <tag class="starttag">link - ref="homeTrainer"</tag> reference will be formatted as:</para> - - <programlisting><p>If you hate rain look <a href="#homeTrainer">here</a>.</p></programlisting> - - <para>Now suppose we want to add the product's title - <emphasis>Home trainer</emphasis> here to give the reader an idea - about the product without clicking the hypertext link:</para> - - <programlisting><p>If you hate rain look <a href="#homeTrainer">here</a> <emphasis - role="bold">(Home trainer)</emphasis>.</p></programlisting> - - <para>This title text node is part of the <tag - class="starttag">product</tag> node being referenced from the - current <tag class="starttag">para</tag>:</para> - - <figure xml:id="linkIdrefProduct"> - <title>A graphical representation of our <tag - class="starttag">catalog</tag>.</title> - - <mediaobject> - <imageobject> - <imagedata fileref="Ref/Fig/xsl_id.fig"/> - </imageobject> - </mediaobject> - - <caption> - <para>The dashed line shows the <code>IDREF</code> based - reference from the <tag class="starttag">link</tag> to the - <tag class="starttag">product</tag> node.</para> - </caption> - </figure> - - <para>In <abbrev - xlink:href="http://www.w3.org/Style/XSL">XSL</abbrev> we may - follow an <code>ID</code> reference by means of the built-in function - <link - xlink:href="http://www.w3.org/TR/xpath#function-id">id(...)</link>:</para> - - <programlisting><xsl:template match="link"> - <a href="#{@ref}"><xsl:value-of 
select="."/></a> - <xsl:text> (</xsl:text> - <xsl:value-of select="<emphasis role="bold">id(@ref)</emphasis>/title" <co - xml:id="programlisting_xsl_id_follow"/>/> - <xsl:text>)</xsl:text> -</xsl:template></programlisting> - - <para>Evaluating <code>id(@ref)</code> at <xref - linkend="programlisting_xsl_id_follow"/> returns the referenced <tag - class="starttag">product</tag> <emphasis>node</emphasis>. We - simply take its <tag class="starttag">title</tag> value and embed - it into a pair of parentheses. This way the desired text portion - <emphasis role="bold">(Home trainer)</emphasis> gets added after - the hypertext link.</para> - - <qandaset role="exercise"> - <title>Extending the memo style sheet by mixed content and - itemized lists</title> - - <qandadiv> - <qandaentry xml:id="example_book_xsl_mixed"> - <question> - <para>In <xref linkend="example_book.dtd_v5"/> we - constructed a DTD allowing itemized lists and mixed content - for <tag class="starttag">book</tag> instances. This DTD - also allowed us to define <tag - class="starttag">emphasis</tag>, <tag - class="starttag">table</tag> and <tag - class="starttag">link</tag> elements as part of a mixed - content definition. Extend the current book2html.xsl to - account for these extensions.</para> - - <para - xlink:href="http://www.w3.org/TR/xslt20/#element-copy-of">As - we already saw in our memo example itemized lists in Xhtml - are represented by the element <tag - class="starttag">ul</tag> containing <tag - class="starttag">li</tag> elements. Since <tag - class="starttag">p</tag> elements are also allowed to - appear as children our itemized lists can be easily mapped - to Xhtml tags. A <tag class="starttag">link</tag> node may - be transformed into an <tag class="starttag">a - href="..."</tag> Xhtml node.</para> - - <para>The table model is a simplified version of the Xhtml - table model. 
Read the <abbrev - xlink:href="http://www.w3.org/Style/XSL">XSL</abbrev> - documentation of the element <tag - class="emptytag">xsl:copy-of</tag> at <link - xlink:href="http://www.w3.org/TR/xslt20/#element-copy-of">copy-of</link> - for processing tables.</para> - </question> - - <answer> - <para>The full source code of the solution is available at - <link - xlink:href="Ref/src/Dtd/book/v5/book2html.1.xsl">(Online - HTML version) ... book2html.1.xsl</link>. We discuss some - important aspects. The following table provides mapping - rules from <filename>book.dtd</filename> to Xhtml:</para> - - <table xml:id="table_book2xhtml_element_mappings"> - <title>Mapping elements from - <filename>book.dtd</filename> to Xhtml</title> - - <?dbhtml table-width="50%" ?> - - <?dbfo table-width="50%" ?> - - <tgroup cols="2"> - <colspec colwidth="3*"/> - - <colspec colwidth="2*"/> - - <thead> - <row> - <entry>book.dtd</entry> - - <entry>Xhtml</entry> - </row> - </thead> - - <tbody> - <row> - <entry><tag class="starttag">book</tag>/<tag - class="starttag">title</tag></entry> - - <entry><tag class="starttag">h1</tag></entry> - </row> - - <row> - <entry><tag class="starttag">chapter</tag>/<tag - class="starttag">title</tag></entry> - - <entry><tag class="starttag">h2</tag></entry> - </row> - - <row> - <entry><tag class="starttag">para</tag> (mixed - content)</entry> - - <entry><tag class="starttag">p</tag></entry> - </row> - - <row> - <entry><tag class="starttag">link - href="foo"</tag></entry> - - <entry><tag class="starttag">a - href="foo"</tag></entry> - </row> - - <row> - <entry><tag class="starttag">emphasis</tag></entry> - - <entry><tag class="starttag">em</tag></entry> - </row> - - <row> - <entry><tag - class="starttag">itemizedlist</tag></entry> - - <entry><tag class="starttag">ul</tag></entry> - </row> - - <row> - <entry><tag class="starttag">listitem</tag></entry> - - <entry><tag class="starttag">li</tag></entry> - </row> - - <row> - <entry><tag class="starttag">table</tag>, 
<tag - class="starttag">caption</tag>, <tag - class="starttag">tr</tag>, <tag - class="starttag">td</tag> along with all - attributes</entry> - - <entry>Identity copy</entry> - </row> - </tbody> - </tgroup> - </table> - - <para>Since our table model is a subset of the HTML table - model we may simply copy corresponding nodes to the - output:</para> - - <programlisting><xsl:template match="table"> - <xsl:copy-of select="."/> -</xsl:template></programlisting> - - <para>Next we need rules for itemized lists and - paragraphs. Our model already implements lists in a way - that closely resembles XHTML lists. Since the structures - are compatible we only have to provide a mapping:</para> - - <programlisting><xsl:template match="para"> - <p id="{generate-id(.)}"><xsl:apply-templates select="text()|*" /></p> -</xsl:template> - -<xsl:template match="itemizedlist"> - <ul><xsl:apply-templates select="listitem"/></ul> -</xsl:template> - -<xsl:template match="listitem"> - <li><xsl:apply-templates select="*"/></li> -</xsl:template></programlisting> - - <para>Since <emphasis>all</emphasis> chapters are - reachable via hypertext links from the table of contents - we <emphasis>must</emphasis> supply a unique - <code>id</code> value <xref - linkend="programlisting_book2html_single_chapterid"/> for - <emphasis>all</emphasis> of them. Chapters and paragraphs - may be referenced by <tag class="starttag">link</tag> - elements and thus <emphasis>both</emphasis> need a unique - identity value. For simplicity we create both of them via - <code>generate-id()</code>. 
In a more sophisticated - solution the strategy would be slightly different:</para> - - <itemizedlist> - <listitem> - <para>If a <tag class="starttag">chapter</tag> node - does have an <code>id</code> attribute defined then - take its value.</para> - </listitem> - - <listitem> - <para>If a <tag class="starttag">chapter</tag> node - does <emphasis>not</emphasis> have an <code>id</code> - attribute defined then use - <code>generate-id()</code>.</para> - </listitem> - - <listitem> - <para><tag class="starttag">para</tag> nodes only get - <code>id</code> values in XHTML if they do have an <code>id</code> - attribute defined. This is consistent since these - nodes are never referenced from the table of contents. - Thus an identity is only required if the <tag - class="starttag">para</tag> node is referenced by a - <tag class="starttag">link</tag>. If that is the case - the <tag class="starttag">para</tag> surely does have - a defined identity value.</para> - </listitem> - </itemizedlist> - - <para>We also have to provide a hypertext link <xref - linkend="programlisting_book2html_single_toclink"/> to the - table of contents:</para> - - <programlisting><xsl:template match="chapter"> - <h2 id="{<emphasis role="bold">generate-id(.)</emphasis>}" <co - xml:base="" - xml:id="programlisting_book2html_single_chapterid"/>> - <a href="#{<emphasis role="bold">generate-id(/book)</emphasis>}" <co - xml:base="" - xml:id="programlisting_book2html_single_toclink"/>><xsl:value-of select="title"/></a> - </h2> - <xsl:apply-templates select="para|itemizedlist|table"/> -</xsl:template></programlisting> - - <para>Implementing the <tag class="starttag">link</tag> - element is somewhat more complicated. We cannot use the - <code>@ref</code> attribute value itself as the <tag - class="starttag">a href="..."</tag> attribute value since - the target's identity string is generated via - <code>generate-id()</code>. 
But we may follow the - reference via the <abbrev - xlink:href="http://www.w3.org/TR/xpath">XPath</abbrev> - <link linkend="section_xsl_functionid">id()</link> - function and then use the target's identity value:</para> - - <programlisting><xsl:template match="link"> - <a href="#{generate-id(id(@linkend))}"> - <xsl:value-of select="."/> - </a> -</xsl:template></programlisting> - - <para>The call to <code>id(@linkend)</code> returns either - a <tag class="starttag">chapter</tag> or a <tag - class="starttag">para</tag> node since according to the - DTD attributes of type <code>ID</code> are only defined - for these two elements. Using this node as input to - <code>generate-id()</code> returns the desired identity - value for the generated Xhtml.</para> - </answer> - </qandaentry> - </qandadiv> - </qandaset> - </section> - - <section xml:id="xslAxis"> - <title>XSL axis definitions</title> - - <para>XSL allows us to traverse a document instance's graph in - different directions. We start with a memo document - instance:</para> - - <programlisting><!DOCTYPE memo SYSTEM "memo.dtd"> -<memo date="9.9.2099"> - <from>Joe</from> - <to>Jack</to> - <to>Eve</to> - <to>Jude</to> - <to>Tolstoi</to> - <subject>Ignore me!</subject> - <content> - <para>Dumb text.</para> - </content> -</memo></programlisting> - - <para>This instance defines four nodes of type <tag - class="starttag">to</tag>. 
For each of these we want to create a - line of text showing also the preceding and the following - recipients:</para> - - <programlisting> <----Jack----> Eve Jude Tolstoi <co - xml:id="programlisting_axis_jack"/> -Jack <----Eve----> Jude Tolstoi <co xml:id="programlisting_axis_eve"/> -Jack Eve <----Jude----> Tolstoi <co xml:id="programlisting_axis_jude"/> -Jack Eve Jude <----Tolstoi----> <co - xml:id="programlisting_axis_tolstoi"/></programlisting> - - <calloutlist> - <callout arearefs="programlisting_axis_jack"> - <para>Jack has no predecessor and 3 successors</para> - </callout> - - <callout arearefs="programlisting_axis_eve"> - <para>Eve has 1 predecessor and 2 successors</para> - </callout> - - <callout arearefs="programlisting_axis_jude"> - <para>Jude has 2 predecessors and 1 successor</para> - </callout> - - <callout arearefs="programlisting_axis_tolstoi"> - <para><personname>Tolstoi</personname> has 3 predecessors and - no successor</para> - </callout> - </calloutlist> - - <para>XSL supports this type of transformation by supplying - <acronym xlink:href="http://www.w3.org/TR/xpath">XPath</acronym> - axis definitions. We consider a memo document with 9 <tag - class="starttag">to</tag> nodes:</para> - - <figure xml:id="memo9recipients"> - <title>A memo with 9 recipients</title> - - <mediaobject> - <imageobject> - <imagedata fileref="Ref/Fig/memofour.fig"/> - </imageobject> - </mediaobject> - </figure> - - <para>We marked the 4-th recipient to represent the context node. - All three <tag class="starttag">to</tag> nodes to the - <quote>left</quote> belong to the <emphasis>set</emphasis> of - preceding siblings with respect to the context node. Likewise the - 5 neighbours to the right are called following siblings. 
Returning - to our <quote>four recipient</quote> example we may create the - desired output by:</para> - - <programlisting><xsl:template match="/"> - <xsl:apply-templates select="memo/to"/> -</xsl:template> - -<xsl:template match="to"> - - <xsl:for-each select="preceding-sibling::to" <co - xml:id="programlisting_memo_four_xsl_preceding"/>> - <xsl:value-of select="."/> - <xsl:text> </xsl:text> - </xsl:for-each> - - <xsl:text> &lt;----</xsl:text> - <xsl:value-of select="."/> <co - xml:id="programlisting_memo_four_xsl_context"/> - <xsl:text>----&gt; </xsl:text> - - <xsl:for-each select="following-sibling::to"> <co - xml:id="programlisting_memo_four_xsl_following"/> - <xsl:value-of select="."/> - <xsl:text> </xsl:text> - </xsl:for-each> - <xsl:value-of select="$newline"/> -</xsl:template></programlisting> - - <calloutlist> - <callout arearefs="programlisting_memo_four_xsl_preceding"> - <para>Iterate on the set of recipients <quote>left</quote> of - the context node.</para> - </callout> - - <callout arearefs="programlisting_memo_four_xsl_context"> - <para>Taking the context node's value embedded in - <code><---- ... ----></code>.</para> - </callout> - - <callout arearefs="programlisting_memo_four_xsl_following"> - <para>Iterate on the set of recipients <quote>right</quote> of - the context node.</para> - </callout> - </calloutlist> - - <para>More formally the set of preceding siblings is defined to be - the set of all nodes having the same parent as the context node - and appearing <quote>before</quote> the context node. The notion - <quote>before</quote> is meant in the sense of a <link - xlink:href="http://en.wikipedia.org/wiki/Depth-first_search">depth-first</link> - traversal of the document tree. <abbrev - xlink:href="http://www.w3.org/TR/xpath">XPath</abbrev> provides - different axis definitions, see <uri - xlink:href="http://www.w3.org/TR/xpath#axes">http://www.w3.org/TR/xpath#axes</uri> - for details. 
We provide an illustration here:</para> - - <figure xml:id="disjointAxeSets"> - <title>Disjoint <acronym - xlink:href="http://www.w3.org/TR/xpath">XPath</acronym> axis - definitions.</title> - - <mediaobject> - <imageobject> - <imagedata fileref="Ref/Fig/preceding.fig"/> - </imageobject> - </mediaobject> - - <caption> - <para>The sets defined by ancestor, descendant, following, - preceding and self are disjoint. Their union forms the set of - all document nodes.</para> - </caption> - </figure> - - <para>Some remarks:<itemizedlist> - <listitem> - <para>If the context node is already the topmost node i.e. - the root node then the sets defined by <code>ancestor</code> - and <code>parent</code> are empty.</para> - </listitem> - - <listitem> - <para>The <code>parent</code> set - <emphasis>always</emphasis> contains zero or one - node.</para> - </listitem> - </itemizedlist></para> - </section> - - <section xml:id="xslChunking"> - <title>Splitting documents into chunks</title> - - <para>Sometimes we want to generate multiple output documents from - a single XML source. It may for example be a bad idea to transform - a book of 200 printed pages into a <emphasis>single</emphasis> - online HTML page. Instead we may split each chapter into a - separate HTML file and create navigation links between - them.</para> - - <para>We consider a memo document instance. 
We want to generate - one text file for each memo recipient containing just the - recipient's name using the <abbrev - xlink:href="http://www.w3.org/Style/XSL">XSL</abbrev> element - <link - xlink:href="http://www.w3.org/TR/xslt20/#element-result-document"><xsl:result-document></link>:</para> - - <programlisting><xsl:template match="/memo"> - <xsl:apply-templates select="to"/> -</xsl:template> - -<xsl:template match="to"> - <emphasis role="bold"><xsl:result-document</emphasis> - <co xml:id="programlisting_xsl_result_document_main"/> - <emphasis role="bold">href="file_{position()}.txt"</emphasis> - <co xml:id="programlisting_xsl_result_document_href"/> - <emphasis role="bold">method="text"</emphasis> - <co xml:id="programlisting_xsl_result_document_method"/>> - <xsl:value-of select="."/> <co - xml:id="programlisting_xsl_result_document_content"/> - - <emphasis role="bold"></xsl:result-document></emphasis> -</xsl:template></programlisting> - - <calloutlist> - <callout arearefs="programlisting_xsl_result_document_main"> - <para>The output from all generating <abbrev - xlink:href="http://www.w3.org/Style/XSL">XSL</abbrev> - directives will be redirected from standard output to another - output channel.</para> - </callout> - - <callout arearefs="programlisting_xsl_result_document_href"> - <para>The output will be written to a file named - <filename>file_i.txt</filename> with the decimal number - <code>i</code> ranging from the value 1 to the number of - recipients.</para> - </callout> - - <callout arearefs="programlisting_xsl_result_document_method"> - <para>The <code>method</code> attribute may possibly override - a value being given in the <tag - class="starttag">xsl:output</tag> element. 
We may also - redefine <link - xlink:href="http://www.w3.org/TR/xslt20/#element-result-document">other - attributes</link> from <tag class="starttag">xsl:output</tag> - like <code>doctype-{public.system}</code>, and - <code>encoding</code>.</para> - </callout> - - <callout arearefs="programlisting_xsl_result_document_content"> - <para>All output being generated in this region gets - redirected to the channel specified in <xref - linkend="programlisting_xsl_result_document_href"/>.</para> - </callout> - </calloutlist> - - <qandaset role="exercise"> - <title>Splitting book into chapter files</title> - - <qandadiv> - <qandaentry xml:id="example_book_chunk"> - <question> - <para>Extend your solution of <xref - linkend="example_book_xsl_mixed"/> by writing each <tag - class="starttag">chapter</tag>'s content into a separate - Xhtml file. In addition create a file - <filename>index.html</filename> which contains references - to the corresponding <tag class="starttag">chapter</tag> - documents. Thus for a document instance with two chapters - the overall navigation structure is illustrated by <xref - linkend="figure_book_navigation"/>.</para> - - <para>Implementing the <tag class="starttag">link</tag> - tag may cause a problem: An internal link may reference a - <tag class="starttag">para</tag>. You need to identify the - <tag class="starttag">chapter</tag> node embedding this - para. This may be done by using a suitable <abbrev - xlink:href="http://www.w3.org/TR/xpath">XPath</abbrev> - axis direction.</para> - </question> - - <answer> - <para>The full source code of the solution is available at - <link - xlink:href="Ref/src/Dtd/book/v5/book2chunks.1.xsl">(Online - HTML version) ... book2chunks.1.xsl</link>. 
First we - generate the table of contents as the file - <filename>index.html</filename>:</para> - - <programlisting><xsl:template match="/"> - <xsl:result-document href="index.html"> - <xsl:apply-templates select="book"/> - </xsl:result-document> - - <xsl:for-each select="book/chapter"> - <xsl:result-document href="{generate-id(.)}.html"> - <xsl:apply-templates select="."/> - </xsl:result-document> - </xsl:for-each> -</xsl:template> - -<xsl:template match="book"> - <html> - <head><title><xsl:value-of select="title"/></title></head> - <body> - <h1><xsl:value-of select="title"/></h1> - <h2>Table of contents</h2> - <ul> - <xsl:for-each select="<emphasis role="bold">chapter</emphasis>"> - <li><a href="{<emphasis role="bold">generate-id(.)</emphasis>}.html"><xsl:value-of select="title"/></a></li> - </xsl:for-each> - </ul> - </body> - </html> -</xsl:template></programlisting> - - <para>The <tag class="starttag">link ref="..."</tag> may - reference a <tag class="starttag">chapter</tag> or a <tag - class="starttag">para</tag>. 
So we may need to <quote>step - up</quote> from a paragraph to the corresponding chapter - node:</para> - - <programlisting><xsl:template match="link"> - <xsl:variable name="reftargetNode" select="id(@linkend)"/> - <xsl:variable name="reftargetParentChapter" - select="$reftargetNode/ancestor-or-self::chapter"/> - - <a href="{generate-id($reftargetParentChapter)}.html#{ - generate-id($reftargetNode)}"> - <xsl:value-of select="."/> - </a> -</xsl:template></programlisting> - - <para>This is consistent since <emphasis>all</emphasis> - <tag class="starttag">p</tag> nodes in the generated Xhtml - receive a unique <code>id</code> value regardless of whether - the originating <tag class="starttag">para</tag> node does - have one.</para> - </answer> - </qandaentry> - </qandadiv> - </qandaset> - - <figure xml:id="figure_book_navigation"> - <title>A <tag class="starttag">book</tag> document with two - chapters</title> - - <mediaobject> - <imageobject> - <imagedata fileref="Ref/Fig/booknavigate.fig"/> - </imageobject> - </mediaobject> - </figure> - </section> - </section> - </section> - </chapter> - - <chapter xml:id="xmlApis"> - <title><abbrev - xlink:href="http://en.wikipedia.org/wiki/Api">API</abbrev>s for XML - document processing</title> - - <section xml:id="sax"> - <title>The Simple API for XML</title> - - <section xml:id="saxPrinciple"> - <title>The principle of a <acronym - xlink:href="http://www.saxproject.org">SAX</acronym> - application</title> - - <para>We are already familiar with transformations of XML document - instances to other formats. Sometimes the capabilities offered - by a given transformation approach do not suffice for a given - problem. 
Obviously a general purpose programming language like - <trademark - xlink:href="http://www.oracle.com/us/legal/third-party-trademarks/index.html">Java</trademark> - offers superior means to perform advanced manipulations of XML - document trees.</para> - - <para>Before diving into technical details we present an example - exceeding the limits of our present transformation capabilities. We - want to format an XML catalog document with article descriptions to - HTML. The price information however resides in a database - external to the XML document, namely an RDBMS:</para> - - <figure xml:id="saxRdbmsAccessPrinciple"> - <title>Generating HTML from an XML document and an RDBMS.</title> - - <mediaobject> - <imageobject> - <imagedata fileref="Ref/Fig/saxxmlrdbms.fig"/> - </imageobject> - </mediaobject> - </figure> - - <para>Our catalog might look like this:</para> - - <figure xml:id="simpleCatalog"> - <title>An <abbrev xlink:href="http://www.w3.org/XML">Xml</abbrev> - based catalog.</title> - - <programlisting><catalog> - <item orderNo="<emphasis role="bold">3218</emphasis>">Swinging headset</item> - <item orderNo="<emphasis role="bold">9921</emphasis>">200W Stereo Amplifier</item> -</catalog></programlisting> - </figure> - - <para>The RDBMS may hold some relation with a field - <code>orderNo</code> as primary key and a corresponding attribute - like <code>price</code>. 
In a real world application - <code>orderNo</code> should probably be an integer typed - <code>IDENTITY</code> attribute.</para> - - <figure xml:id="saxRdbmsSchema"> - <title>A Relation containing price information.</title> - - <programlisting>CREATE TABLE Product ( - orderNo CHAR(10) PRIMARY KEY - ,price Money -) - -INSERT INTO Product VALUES('<emphasis role="bold">3218</emphasis>', 42.57) -INSERT INTO Product VALUES('<emphasis role="bold">9921</emphasis>', 121.50)</programlisting> - - <caption> - <para>Prices depend on article numbers.</para> - </caption> - </figure> - - <para>The intended HTML output with order numbers being highlighted - looks like:</para> - - <figure xml:id="saxPriceOut"> - <title>HTML generated output.</title> - - <programlisting><!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"> - <html> - <head><title>Available products</title></head> - <body> - <table border="1"> - <tbody> - <tr> - <th><emphasis role="bold">Order number</emphasis></th> - <th>Price</th> - <th>Product</th> - </tr> - <tr> - <td><emphasis role="bold">3218</emphasis></td> - <td>42,57</td> - <td>Swinging headset</td> - </tr> - <tr> - <td><emphasis role="bold">9921</emphasis></td> - <td>121,50</td> - <td>200W Stereo Amplifier</td> - </tr> - </tbody> - </table> - </body> - </html></programlisting> - - <caption> - <para>This result HTML document contains content both from our - XML document and from the database table - <code>Product</code>.</para> - </caption> - </figure> - - <para>The intended transformation is beyond the XSLT standard's - processing capabilities: XSLT does not enable us to access RDBMS content. - However some XSLT processors provide extensions for this - task.</para> - - <para>It is tempting to write a <trademark - xlink:href="http://www.oracle.com/us/legal/third-party-trademarks/index.html">Java</trademark> - application which might use e.g. 
<trademark - xlink:href="http://electronics.zibb.com/trademark/jdbc/29545026">JDBC</trademark> - for database access. But how do we actually read and parse an XML - file? Sticking to the <trademark - xlink:href="http://www.oracle.com/us/legal/third-party-trademarks/index.html">Java</trademark> - standard we might use a <link - xlink:href="http://java.sun.com/j2se/1.5.0/docs/api/java/io/FileInputStream.html">FileInputStream</link> - instance to read from <code>catalog.xml</code> and write an XML - parser ourselves. Fortunately <orgname>SUN</orgname>'s <trademark - xlink:href="http://www.oracle.com/technetwork/java/javase">JDK</trademark> - already includes an API called <acronym - xlink:href="http://www.saxproject.org">SAX</acronym>, the - <emphasis>S</emphasis>imple <emphasis>A</emphasis>pi for - <emphasis>X</emphasis>ml. The <productname - xlink:href="http://www.oracle.com/technetwork/java/javase/jdk-7-readme-429198.html">JDK</productname> - also includes a corresponding parser implementation. In addition - there are third party <acronym - xlink:href="http://www.saxproject.org">SAX</acronym> parser - implementations available like <productname - xlink:href="http://xerces.apache.org">Xerces</productname> from the - <orgname xlink:href="http://www.apache.org">Apache - Foundation</orgname>.</para> - - <para>The <acronym - xlink:href="http://www.saxproject.org">SAX</acronym> API is event - based and will be illustrated by the relationship between customers - and a software vendor company:</para> - - <mediaobject> - <imageobject> - <imagedata fileref="Ref/Fig/updateinfo.fig"/> - </imageobject> - </mediaobject> - - <para>After purchasing software customers are asked to register - their software. This way the vendor receives the customer's address. - Each time a new release is completed all registered customers - will receive a notification typically including a <quote>special - offer</quote> to upgrade their software. 
From an abstract point of - view the following two actions take place:</para> - - <variablelist> - <varlistentry> - <term>Registration</term> - - <listitem> - <para>The customer registers at the company's site - indicating its interest in updated versions.</para> - </listitem> - </varlistentry> - - <varlistentry> - <term>Notification</term> - - <listitem> - <para>Upon completion of each new software release (considered - to be an <emphasis>event</emphasis>) a message is sent to all - registered customers.</para> - </listitem> - </varlistentry> - </variablelist> - - <para>The same principle applies to GUI applications in software - development. A key press <emphasis>event</emphasis> for example will - be forwarded by an application's <emphasis>event handler</emphasis> - to a callback function (sometimes called a - <emphasis>handler</emphasis> method) being implemented by an - application developer. The <acronym - xlink:href="http://www.saxproject.org">SAX</acronym> API works the - same way: A parser reads an XML document generating events which - <emphasis>may</emphasis> be handled by an application.
During - document parsing the XML tree structure gets - <quote>flattened</quote> to a sequence of events:</para> - - <figure xml:id="saxFlattenEvent"> - <title>Parsing a XML document creates a corresponding sequence of - events.</title> - - <mediaobject> - <imageobject> - <imagedata fileref="Ref/Fig/saxmodel.pdf"/> - </imageobject> - </mediaobject> - </figure> - - <para>An application may register components to the parser:</para> - - <figure xml:id="figureSax"> - <title><acronym - xlink:href="http://www.saxproject.org">SAX</acronym> - Principle</title> - - <mediaobject> - <imageobject> - <imagedata fileref="Ref/Fig/saxapparch.pdf"/> - </imageobject> - </mediaobject> - - <caption> - <para>A <acronym - xlink:href="http://www.saxproject.org">SAX</acronym> application - consists of a <acronym - xlink:href="http://www.saxproject.org">SAX</acronym> parser and - an implementation of event handlers being specific to the - application. The application is developed by implementing the - two handlers.</para> - </caption> - </figure> - - <para>An Error Handler is required since the XML stream may contain - errors. 
In order to implement a <acronym - xlink:href="http://www.saxproject.org">SAX</acronym> application we - have to:</para> - - <orderedlist> - <listitem> - <para>Instantiate required objects:</para> - - <itemizedlist> - <listitem> - <para>Parser</para> - </listitem> - - <listitem> - <para>Event Handler</para> - </listitem> - - <listitem> - <para>Error Handler</para> - </listitem> - </itemizedlist> - </listitem> - - <listitem> - <para>Register handler instances</para> - - <itemizedlist> - <listitem> - <para>register Event Handler to Parser</para> - </listitem> - - <listitem> - <para>register Error Handler to Parser</para> - </listitem> - </itemizedlist> - </listitem> - - <listitem> - <para>Start the parsing process by calling the parser's - appropriate method.</para> - </listitem> - </orderedlist> - </section> - - <section xml:id="saxIntroExample"> - <title>First steps</title> - - <para>Our first <acronym - xlink:href="http://www.saxproject.org">SAX</acronym> toy application - <classname>sax.stat.v1.ElementCount</classname> shall simply count - the number of elements it finds in an arbitrary XML document. In - addition the <acronym - xlink:href="http://www.saxproject.org">SAX</acronym> events shall be - written to standard output generating output sketched in <xref - linkend="saxFlattenEvent"/>. The application's central - implementation reads:</para> - - <figure xml:id="saxElementCount"> - <title>Counting XML elements.</title> - - <programlisting language="java">package sax.stat.v1; -... 
- -public class ElementCount { - - public void parse(final String uri) { - try { - final SAXParserFactory saxPf = SAXParserFactory.newInstance(); - final SAXParser saxParser = saxPf.newSAXParser(); - saxParser.parse(uri, eventHandler); - } catch (ParserConfigurationException e){ - e.printStackTrace(System.err); - } catch (org.xml.sax.SAXException e) { - e.printStackTrace(System.err); - } catch (IOException e){ - e.printStackTrace(System.err); - } - } - - public int getElementCount() { - return eventHandler.getElementCount(); - } - private final MyEventHandler eventHandler = new MyEventHandler(); -}</programlisting> - - <caption> - <para>This application works for arbitrary well-formed XML - documents.</para> - </caption> - </figure> - - <para>We now explain this application in detail. The first part - deals with the instantiation of a parser:</para> - - <programlisting language="java">try { - final SAXParserFactory saxPf = <emphasis role="bold">SAXParserFactory</emphasis>.newInstance(); - final SAXParser saxParser = saxPf.newSAXParser(); - saxParser.parse(uri, eventHandler); -} catch (ParserConfigurationException e){ - e.printStackTrace(System.err); -} ...</programlisting> - - <para>In order to keep an application independent from a specific - parser implementation the <acronym - xlink:href="http://www.saxproject.org">SAX</acronym> uses the so - called <link - xlink:href="http://www.dofactory.com/Patterns/PatternAbstract.aspx">Abstract - Factory Pattern</link> instead of simply calling a constructor from - a vendor specific parser class.</para> - - <para>In order to be useful the parser has to be instructed to do - something meaningful when a XML document gets parsed. 
For this - purpose our application supplies an event handler instance:</para> - - <programlisting language="java">public void parse(final String uri) { - try { - final SAXParserFactory saxPf = SAXParserFactory.newInstance(); - final SAXParser saxParser = saxPf.newSAXParser(); - saxParser.parse(uri, <emphasis role="bold">eventHandler</emphasis>); - } catch (org.xml.sax.SAXException e) { - ... - private final MyEventHandler <emphasis role="bold">eventHandler = new MyEventHandler()</emphasis>; -}</programlisting> - - <para>What does the event handler actually do? It offers methods to - the parser being callable during the parsing process:</para> - - <programlisting language="java">package sax.stat.v1; -... -public class MyEventHandler extends <classname>org.xml.sax.helpers.DefaultHandler</classname> { - - public void <emphasis role="bold"><emphasis role="bold">startDocument()</emphasis></emphasis><co - xml:id="programlisting_eventhandler_startDocument"/> { - System.out.println("Opening Document"); - } - public void <emphasis role="bold">endDocument()</emphasis><co - xml:id="programlisting_eventhandler_endDocument"/> { - System.out.println("Closing Document"); - } - public void <emphasis role="bold">startElement(String namespaceUri, String localName, String rawName, - Attributes attrs)</emphasis> <co - xml:id="programlisting_eventhandler_startElement"/>{ - System.out.println("Opening \"" + rawName + "\""); - elementCount++; - } - public void <emphasis role="bold">endElement(String namespaceUri, String localName, - String rawName)</emphasis><co - xml:id="programlisting_eventhandler_endElement"/>{ - System.out.println("Closing \"" + rawName + "\""); - } - public void <emphasis role="bold">characters(char[] ch, int start, int length)</emphasis><co - xml:id="programlisting_eventhandler_characters"/>{ - System.out.println("Content \"" + new String(ch, start, length) + '"'); - } - public int getElementCount() <co - xml:id="programlisting_eventhandler_getElementCount"/>{ - 
return elementCount; - } - private int elementCount = 0; -}</programlisting> - - <calloutlist> - <callout arearefs="programlisting_eventhandler_startDocument"> - <para>This method gets called exactly once namely when opening - the XML document as a whole.</para> - </callout> - - <callout arearefs="programlisting_eventhandler_endDocument"> - <para>After successfully parsing the whole document instance - this method will finally be called.</para> - </callout> - - <callout arearefs="programlisting_eventhandler_startElement"> - <para>This method gets called each time a new element is parsed. - In the given catalog.xml example it will be called three times: - First when the <tag class="starttag">catalog</tag> appears and - then once for each of the two <tag class="starttag">item ...</tag> - elements. The supplied - parameters depend on whether or not namespace processing is - enabled.</para> - </callout> - - <callout arearefs="programlisting_eventhandler_endElement"> - <para>Called each time an element like <tag - class="starttag">item ...</tag> gets closed by its counterpart - <tag class="endtag">item</tag>.</para> - </callout> - - <callout arearefs="programlisting_eventhandler_characters"> - <para>This method is responsible for the treatment of textual - content i.e. handling <code>#PCDATA</code> element content.
We - will explain its uncommon signature a little bit later.</para> - </callout> - - <callout arearefs="programlisting_eventhandler_getElementCount"> - <para><function>getElementCount()</function> is a getter method - to read only access the private field - <varname>elementCount</varname> which gets incremented in <coref - linkend="programlisting_eventhandler_startElement"/> each time - an XML element opens.</para> - </callout> - </calloutlist> - - <para>The call <code>saxParser.parse(uri, eventHandler)</code> - actually initiates the parsing process and tells the parser - to:</para> - - <itemizedlist> - <listitem> - <para>Open the XML document being referenced by the URI - argument.</para> - </listitem> - - <listitem> - <para>Forward XML events to the event handler instance supplied - by the second argument.</para> - </listitem> - </itemizedlist> - - <para>A driver class containing a <code>main(...)</code> method may - start the whole process and print out the desired number of elements - upon completion of a parsing run:</para> - - <programlisting language="java">package sax.stat.v1; - -public class ElementCountDriver { - public static void main(String argv[]) { - ElementCount xmlStats = new ElementCount(); - xmlStats.parse("<emphasis role="bold">Input/Sax/catalog.xml</emphasis>"); - System.out.println("Document contains " + xmlStats.<emphasis role="bold">getElementCount()</emphasis> + " elements"); - } -}</programlisting> - - <para>Processing the catalog example instance yields:</para> - - <programlisting>Opening Document -<emphasis role="bold">Opening "catalog"</emphasis> <co - xml:id="programlisting_catalog_output"/> -Content " - " -<emphasis role="bold">Opening "item"</emphasis> <co - xml:id="programlisting_catalog_item1"/> -Content "Swinging headset" -Closing "item" -Content " - " -<emphasis role="bold">Opening "item"</emphasis> <co - xml:id="programlisting_catalog_item2"/> -Content "200W Stereo Amplifier" -Closing "item" -Content " -" -Closing "catalog" 
-Closing Document -<emphasis role="bold">Document contains 3 elements</emphasis> <co - xml:id="programlisting_catalog_elementcount"/></programlisting> - - <calloutlist> - <callout arearefs="programlisting_catalog_output"> - <para>Start parsing element <tag - class="starttag">catalog</tag>.</para> - </callout> - - <callout arch="" arearefs="programlisting_catalog_item1"> - <para>Start parsing element <tag class="starttag">item - orderNo="3218"</tag>Swinging headset<tag class="endtag" - role="">item</tag>.</para> - </callout> - - <callout arch="" arearefs="programlisting_catalog_item2"> - <para>Start parsing element <tag class="starttag">item - orderNo="9921"</tag>200W Stereo Amplifier<tag class="endtag" - role="">item</tag>.</para> - </callout> - - <callout arearefs="programlisting_catalog_elementcount"> - <para>After the parsing process has completed the application - outputs the total number of elements counted.</para> - </callout> - </calloutlist> - - <para>The output contains some lines of <quote>empty</quote> - content. This content is due to whitespace being located between - elements. For example a newline appears between the <tag - class="starttag">catalog</tag> start tag and the first <tag - class="starttag">item</tag> element. The parser encapsulates this - whitespace in a call to the <link - xlink:href="http://java.sun.com/j2se/1.5.0/docs/api/org/xml/sax/ContentHandler.html#characters(char[],%20int,%20int)">characters</link> - method. In an application this call will typically be ignored. XML - document instances in a professional context will often not - contain any newline characters at all. Instead the whole document is - represented as a single line. This inhibits human readability which - is not required if the processing applications work well.
In this - case empty content as above will not appear.</para> - - <para>The <code>characters(char[] ch, int start, int length)</code> - method's signature looks somewhat unusual with respect to <trademark - xlink:href="http://www.oracle.com/us/legal/third-party-trademarks/index.html">Java</trademark> - conventions. One might expect <code>characters(String s)</code>. But - this way the <acronym - xlink:href="http://www.saxproject.org">SAX</acronym> API allows - efficient parser implementations: A parser may initially allocate a - reasonably large <code>char</code> array of say 128 bytes sufficient - to hold 64 (<link xlink:href="http://unicode.org">Unicode</link>) - characters. If this buffer gets exhausted the parser might allocate - a second buffer of double size thus implementing an <quote>amortized - doubling</quote> algorithm:</para> - - <mediaobject> - <imageobject> - <imagedata fileref="Ref/Fig/saxcharacter.pdf"/> - </imageobject> - </mediaobject> - - <para>In this example the first element content fits in the first - buffer. The second content <code>200W Stereo Amplifier</code> and - the third content <code>Earphone</code> both fit in the second - buffer. Subsequent content may require further buffer allocations. - Such a strategy avoids the time consuming <code>new - </code> <link - xlink:href="http://java.sun.com/j2se/1.5.0/docs/api/java/lang/String.html">String</link> - <code>(...)</code> constructor calls which the more - convenient API variant <code>characters(String s)</code> would require.</para> - </section> - - <section xml:id="saxRegistry"> - <title>Event- and error handler registration</title> - - <para>Our first <acronym - xlink:href="http://www.saxproject.org">SAX</acronym> application - suffers from the following deficiencies:</para> - - <itemizedlist> - <listitem> - <para>The error handling is very sparse.
It completely relies on - exceptions like <link - xlink:href="http://java.sun.com/j2se/1.5.0/docs/api/org/xml/sax/SAXException.html">SAXException</link> - which frequently do not supply meaningful error - information.</para> - </listitem> - - <listitem> - <para>The application is not aware of namespaces. Thus when reading - e.g. <abbrev - xlink:href="http://www.w3.org/Style/XSL">XSL</abbrev> document - instances we cannot distinguish between elements from - different namespaces like HTML.</para> - </listitem> - - <listitem> - <para>The parser will not validate a document instance against a - DTD even if one is present.</para> - </listitem> - </itemizedlist> - - <para>We now incrementally add these features to the <acronym - xlink:href="http://www.saxproject.org">SAX</acronym> parsing - process. <acronym - xlink:href="http://www.saxproject.org">SAX</acronym> offers an - interface <link - xlink:href="http://java.sun.com/j2se/1.5.0/docs/api/org/xml/sax/XMLReader.html">XMLReader</link> - to conveniently <emphasis>register</emphasis> event- and error - handler instances instead of passing them as a separate argument to - the <link - xlink:href="http://java.sun.com/j2se/1.5.0/docs/api/javax/xml/parsers/SAXParser.html#parse(java.lang.String,%20org.xml.sax.helpers.DefaultHandler)">parse</link> - method. We first code an error handler class by implementing the - interface <classname>org.xml.sax.ErrorHandler</classname> being part - of the <acronym xlink:href="http://www.saxproject.org">SAX</acronym> - API:</para> - - <programlisting language="java">package sax.stat.v2; -...
-public class MyErrorHandler implements ErrorHandler { - - <emphasis role="bold">public void warning(SAXParseException e)</emphasis> { - System.err.println("[Warning]" + getLocationString(e)); - } - <emphasis role="bold">public void error(SAXParseException e)</emphasis> { - System.err.println("[Error]" + getLocationString(e)); - } - <emphasis role="bold">public void fatalError(SAXParseException e)</emphasis> throws SAXException{ - System.err.println("[Fatal Error]" + getLocationString(e)); - } - private String getLocationString(SAXParseException e) { - return " line " + e.getLineNumber() + - ", column " + e.getColumnNumber()+ ":" + e.getMessage(); - } -}</programlisting> - - <para>The three public methods implement the - <classname>org.xml.sax.ErrorHandler</classname> interface. The - method <function>getLocationString</function> is used to supply - precise parsing error locations by means of line- and column numbers - within a document instance. If errors or warnings are encountered - the parser will call the appropriate public method:</para> - - <figure xml:id="saxMissItem"> - <title>A non well formed document.</title> - - <programlisting><?xml version="1.0" encoding="UTF-8"?> -<catalog> - <item orderNo="3218">Swinging headset</item> - <item orderNo="9921">200W Stereo Amplifier -</catalog></programlisting> - - <caption> - <para>This document is not well formed since a - closing <tag class="endtag">item</tag> tag is missing.</para> - </caption> - </figure> - - <para>Our error handler method gets called yielding an informative - message:</para> - - <programlisting>[Fatal Error] line 5, column -1:Expected "</item>" to terminate -element starting on line 4.</programlisting> - - <para>This error output is achieved by - <emphasis>registering</emphasis> an instance of - <classname>sax.stat.v2.MyErrorHandler</classname> to the parser - prior to starting the parsing process.
In the following code snippet - we also register a content handler instance to the parser and thus - separate the parser's configuration from its invocation:</para> - - <programlisting language="java">package sax.stat.v2; -... -public class ElementCount { - public ElementCount() - throws SAXException, ParserConfigurationException{ - final SAXParserFactory saxPf = SAXParserFactory.newInstance(); - final SAXParser saxParser = saxPf.newSAXParser(); - xmlReader = saxParser.getXMLReader(); - xmlReader.setContentHandler(eventHandler); <co - xml:id="programlisting_assemble_parser_setcontenthandler"/> - xmlReader.setErrorHandler(errorHandler); <co - xml:id="programlisting_assemble_parser_seterrorhandler"/> - } - public void parse(final String uri) - throws IOException, SAXException{ - xmlReader.parse(uri); <co - xml:id="programlisting_assemble_parser_invokeparse"/> - } - public int getElementCount() { - return eventHandler.getElementCount(); <co - xml:id="programlisting_assemble_parser_getelementcount"/> - } - private final XMLReader xmlReader; - private final MyEventHandler eventHandler = new MyEventHandler(); <co - xml:id="programlisting_assemble_parser_createeventhandler"/> - private final MyErrorHandler errorHandler = new MyErrorHandler(); <co - xml:id="programlisting_assemble_parser_createerrorhandler"/> -}</programlisting> - - <calloutlist> - <callout arearefs="programlisting_assemble_parser_setcontenthandler programlisting_assemble_parser_seterrorhandler"> - <para>Referring to <xref linkend="figureSax" os=""/> these two - calls attach the event- and error handler objects to the parser - thus implementing the two arrows from the parser to the - application's implementation.</para> - </callout> - - <callout arearefs="programlisting_assemble_parser_invokeparse"> - <para>The parser is invoked. 
Note that in this example we only - pass a document's URI but no reference to a handler - object.</para> - </callout> - - <callout arearefs="programlisting_assemble_parser_getelementcount"> - <para>The method <function>getElementCount()</function> is - needed to allow a calling object to access the private - <varname>eventHandler</varname> object's - <function>getElementCount()</function> method.</para> - </callout> - - <callout arearefs="programlisting_assemble_parser_createeventhandler programlisting_assemble_parser_createerrorhandler"> - <para>An event handling and an error handling object are created - to handle events during the parsing process.</para> - </callout> - </calloutlist> - - <para>The careful reader might notice a subtle difference between - the content- and the error handler implementation: The class - <classname>sax.stat.v2.MyErrorHandler</classname> implements the - interface <classname>org.xml.sax.ErrorHandler</classname>. But - <classname>sax.stat.v2.MyEventHandler</classname> is derived from - <classname>org.xml.sax.helpers.DefaultHandler</classname> which - itself implements the - <classname>org.xml.sax.ContentHandler</classname> interface. - Actually one might as well start from the latter interface which requires - implementing all of its 11 methods. In most circumstances this only - complicates the application's code since it is unnecessary to react - to events belonging for example to processing instructions.
For this - reason it is good coding practice to use the empty default - implementations in - <classname>org.xml.sax.helpers.DefaultHandler</classname> and to - redefine only those methods corresponding to events actually being - handled by the application in question.</para> - - <qandaset role="exercise"> - <title>Reading XML attributes</title> - - <qandadiv> - <qandaentry xml:id="exercise_saxAttrib"> - <question> - <label>Reading an element's set of attributes.</label> - - <para>The example document instance does include <tag - class="attribute">orderNo</tag> attribute values for each - <tag class="starttag">item</tag> element. Our application does - not yet show these attribute keys and their corresponding - values. Read the documentation for <classname - xlink:href="http://java.sun.com/j2se/1.5.0/docs/api/org/xml/sax/Attributes.html">org.xml.sax.Attributes</classname> - and extend the given code to use it.</para> - </question> - - <answer> - <para>For the given example it would suffice to read the - known <tag class="attribute">orderNo</tag> attribute's value. - A generic solution may ask for the set of all defined - attributes and show their values:</para> - - <programlisting language="java">package sax; - -import org.xml.sax.Attributes; -import org.xml.sax.helpers.DefaultHandler; - -public class AttribEventHandler extends DefaultHandler { - - public void startElement(String namespaceUri, String localName, - String rawName, Attributes attrs) { - System.out.println("Opening Element " + rawName); - for (int i = 0; i < attrs.getLength(); i++){ - System.out.println(attrs.getQName(i) + "=" + attrs.getValue(i) + "\n"); - } - } -}</programlisting> - </answer> - </qandaentry> - - <qandaentry xml:id="saxRdbms"> - <question> - <label>SAX processing with RDBMS access.</label> - - <para>Implement the example given in <xref - linkend="saxRdbmsAccessPrinciple"/> to produce the output - sketched in <xref linkend="saxPriceOut"/>.
You may start by - implementing <emphasis>and testing</emphasis> the following - methods of an RDBMS interfacing class using <trademark - xlink:href="http://electronics.zibb.com/trademark/jdbc/29545026">JDBC</trademark>:</para> - - <programlisting language="java">package sax.rdbms; - -public class RdbmsAccess { - - public void connect(final String host, final int port, - final String userName, final String password) { - // <emphasis role="bold">open connection to a database</emphasis> - } - public String readPrice(final String articleNumber) { - return "0"; // <emphasis role="bold">To be implemented as access to a ResultSet object</emphasis> - } - public void close() { - // <emphasis role="bold">close database connection</emphasis> - } -}</programlisting> - - <para>You may find it helpful to write a small testbed for - the RDBMS access functionality prior to integrating it into - your <acronym - xlink:href="http://www.saxproject.org">SAX</acronym> - application producing HTML output.</para> - </question> - - <answer> - <para>We start by creating a suitable RDBMS table:</para> - - <programlisting>CREATE SCHEMA AUTHORIZATION midb2 -CREATE TABLE Product( - orderNo CHAR(10) NOT NULL PRIMARY KEY - ,price DECIMAL (9,2) NOT NULL -)</programlisting> - - <para>Next we insert some toy data:</para> - - <programlisting>INSERT INTO Product VALUES('x-223', 330.20); -INSERT INTO Product VALUES('w-124', 110.40);</programlisting> - - <para>Now we implement our RDBMS access class:</para> - - <programlisting language="java">package dom.xsl; -...
-public class DbAccess { - - public void connect(final String jdbcUrl, - final String userName, final String password) { - try { - conn = DriverManager.getConnection(jdbcUrl, userName, password); - priceQuery = conn.prepareStatement(sqlPriceQuery); - } catch (SQLException e) { - System.err.println("Unable to open connection to database:" + e);} - } - public String readPrice(final String articleNumber) { - String result; - try { - priceQuery.setString(1, articleNumber); - final ResultSet rs = priceQuery.executeQuery(); - if (rs.next()) { - result = rs.getString("price"); - } else { - result = "No price available for article '" + articleNumber + "'"; - } - } catch (SQLException e) { - result = "Error reading price for article '" + articleNumber + "':" + e; - } - return result; - } - public void close() { - // conn remains null if connect() failed - try {if (conn != null) conn.close();} catch (SQLException e) { - System.err.println("Error closing database connection:" + e); - } - } - static { - try { Class.forName("com.ibm.db2.jcc.DB2Driver"); - } catch (ClassNotFoundException e) { - System.err.println("Unable to register Driver:" + e);} - } - private static final String sqlPriceQuery = - "SELECT price FROM Product WHERE orderNo = ?"; - private PreparedStatement priceQuery = null; - private Connection conn = null; -}</programlisting> - - <para>This access layer may be tested independently from - handling catalog instances:</para> - - <programlisting language="java">package dom.xsl; - -public class DbAccessDriver { - - public static void main(String[] args) { - final DbAccess dbaccess = new DbAccess(); - dbaccess.connect("jdbc:db2://db2.mi.hdm-stuttgart.de:10000/hdm", - "midb2", "password"); - System.out.println(dbaccess.readPrice("x-223")); - System.out.println(dbaccess.readPrice("..aaargh!")); - dbaccess.close(); - } -}</programlisting> - - <para>If the above test succeeds we may embed the RDBMS - access layer into our <acronym - xlink:href="http://www.saxproject.org">SAX</acronym> - handler:</para> - - <programlisting
language="java">package sax.rdbms; -... -public class HtmlEventHandler extends DefaultHandler{ - public void startDocument() { - dbaccess.connect("jdbc:db2://db2.mi.hdm-stuttgart.de:10000/hdm", - "midb2", "password"); - System.out.println("<html><head><title>Catalog</title></head>"); - } - public void endDocument() { - System.out.println("</html>"); - dbaccess.close(); - } - public void startElement(String namespaceUri, String localName, - String rawName, Attributes attrs){ - if (rawName.equals("catalog")){ - System.out.println("<body><H1>A catalog</H1>" - +"<table border='1'><tbody>"); - System.out.println("<tr><th>Order number</th>\n" - + "<th>Price</th>\n" - +" <th>Product</th></tr>"); - } else if (rawName.equals("item")){ - final String orderNo = attrs.getValue("orderNo"); - System.out.print("<tr><td>" + orderNo - + "</td>\n<td>" + dbaccess.readPrice(orderNo) - + "</td>\n<td>"); - } else { - System.err.println("Element '" + rawName + "' unknown"); - } - } - public void endElement(String namespaceUri, String localName, - String rawName) { - if (rawName.equals("catalog")){ - System.out.println("</tbody></table>"); - } else if (rawName.equals("item")){ - System.out.println("</td></tr>\n"); - } - } - public void characters(char[] ch, int start, int length) { - System.out.print(new String(ch, start, length)); - } - private DbAccess dbaccess = new DbAccess(); -}</programlisting> - </answer> - </qandaentry> - </qandadiv> - </qandaset> - </section> - - <section xml:id="saxValidate"> - <title><acronym xlink:href="http://www.saxproject.org">SAX</acronym> - validation</title> - - <para>So far we only parsed well formed document instances. 
Our - current parser does not yet check whether an XML instance is also valid:</para> - - <figure xml:id="saxNotValid"> - <title>An invalid XML document.</title> - - <programlisting><?xml version="1.0" encoding="UTF-8"?> -<!DOCTYPE catalog [ - <!ELEMENT catalog (item) > - <!ELEMENT item (#PCDATA) > -<!ATTLIST item orderNo NMTOKEN #REQUIRED > -]> -<catalog> - <item orderNo="3218">Swinging headset</item> - <item orderNo="9921">200W Stereo Amplifier</item> -</catalog></programlisting> - - <caption> - <para>In contrast to <xref linkend="saxMissItem"/> this document - is well formed. But it is not <emphasis - role="bold">valid</emphasis> with respect to the DTD grammar - since more than one <tag class="starttag">item</tag> element - is present.</para> - </caption> - </figure> - - <para>This document instance is well-formed but not valid. The - parser as configured so far will not report any error or warning. In order to enable - validation we need to configure our parser:</para> - - <programlisting language="java">xmlReader.setFeature("http://xml.org/sax/features/validation", true);</programlisting> - - <para>The string <code>http://xml.org/sax/features/validation</code> - serves as a key. Since this is an ordinary string value a parser may - or may not implement it.
The <acronym - xlink:href="http://www.saxproject.org">SAX</acronym> standard - defines two exception classes for dealing with feature related - errors:</para> - - <variablelist> - <varlistentry> - <term><link - xlink:href="http://java.sun.com/j2se/1.5.0/docs/api/org/xml/sax/SAXNotRecognizedException.html">SAXNotRecognizedException</link></term> - - <listitem> - <para>The feature is not known to the parser.</para> - </listitem> - </varlistentry> - - <varlistentry> - <term><link - xlink:href="http://java.sun.com/j2se/1.5.0/docs/api/org/xml/sax/SAXNotSupportedException.html">SAXNotSupportedException</link></term> - - <listitem> - <para>The feature is known to the parser but the parser either does - not support it at all or does not support the specific value being - requested.</para> - </listitem> - </varlistentry> - </variablelist> - </section> - - <section xml:id="saxNamespace"> - <title>Namespaces</title> - - <para>In order to make a <acronym - xlink:href="http://www.saxproject.org">SAX</acronym> parser - application namespace aware we have to activate two <acronym - xlink:href="http://www.saxproject.org">SAX</acronym> parsing - features:</para> - - <programlisting language="java">xmlReader = saxParser.getXMLReader(); -xmlReader.setFeature("http://xml.org/sax/features/namespaces", true); -xmlReader.setFeature("http://xml.org/sax/features/namespace-prefixes", true);</programlisting> - - <para>This instructs the parser to pass the namespace's name for - each element. Namespace prefixes like <code>xsl</code> in <tag - class="starttag">xsl:for-each</tag> are also passed and may be used - by an application:</para> - - <programlisting language="java">package sax; -... -public class NamespaceEventHandler extends DefaultHandler { -...
- public void startElement(String <emphasis role="bold">namespaceUri</emphasis>, String localName, - String rawName, Attributes attrs) { - System.out.println("Opening Element rawName='" + rawName + "'\n" - + "namespaceUri='" + <emphasis role="bold">namespaceUri</emphasis> + "'\n" - + "localName='" + localName - + "'\n--------------------------------------------"); - } -}</programlisting> - - <para>As an example we take an XSLT script:</para> - - <programlisting><?xml version="1.0" encoding="utf-8"?> -<xsl:stylesheet version="1.0" - xmlns:xsl='http://www.w3.org/1999/XSL/Transform' - xmlns:fo='http://www.w3.org/1999/XSL/Format'> - - <xsl:template match="/"> - <fo:block>A block</fo:block> - <HTML/> - </xsl:template> - -</xsl:stylesheet></programlisting> - - <para>This XSLT script being conceived as an XML document instance - contains elements belonging to two different namespaces namely - <code>http://www.w3.org/1999/XSL/Transform</code> and - <code>http://www.w3.org/1999/XSL/Format</code>. The script also - contains a <quote>raw</quote> <tag audience="" - class="emptytag">HTML</tag> element being introduced only for - demonstration purposes which does not belong to any namespace. The - result reads:</para> - - <programlisting>Opening Element rawName='xsl:stylesheet' -namespaceUri='http://www.w3.org/1999/XSL/Transform' -localName='stylesheet' --------------------------------------------- -Opening Element rawName='xsl:template' -namespaceUri='http://www.w3.org/1999/XSL/Transform' -localName='template' --------------------------------------------- -Opening Element rawName='fo:block' -namespaceUri='http://www.w3.org/1999/XSL/Format' -localName='block' --------------------------------------------- -Opening Element rawName='HTML' -namespaceUri='' -localName='HTML'</programlisting> - - <para>Now the parser tells us which namespace a given element - node belongs to.
A XSLT engine for example uses this information to - build two classes of elements:</para> - - <itemizedlist> - <listitem> - <para>Elements belonging to the namespace - <code>http://www.w3.org/1999/XSL/Transform</code> like <tag - class="emptytag">xsl:value-of select="..."</tag> have to be - interpreted as instructions by the processor.</para> - </listitem> - - <listitem> - <para>Elements <emphasis role="bold">not</emphasis> belonging to - the namespace <code>http://www.w3.org/1999/XSL/Transform</code> - like <tag class="emptytag">html</tag> or <tag - class="starttag">fo:block</tag> are copied <quote>as is</quote> - to the output.</para> - </listitem> - </itemizedlist> - </section> - </section> - - <section xml:id="dom"> - <title>The Document Object Model (<acronym - xlink:href="http://www.w3.org/DOM">DOM</acronym>)</title> - - <titleabbrev><acronym - xlink:href="http://www.w3.org/DOM">DOM</acronym></titleabbrev> - - <section xml:id="domBase"> - <title>Language independent specification</title> - - <titleabbrev>Language independence</titleabbrev> - - <para>XML documents allow for automated content processing. We - already discussed the <acronym - xlink:href="http://www.saxproject.org">SAX</acronym> API to access - XML documents by <trademark - xlink:href="http://www.oracle.com/us/legal/third-party-trademarks/index.html">Java</trademark> - applications. There are however situations where <acronym - xlink:href="http://www.saxproject.org">SAX</acronym> is not - appropriate:</para> - - <itemizedlist> - <listitem> - <para>The <acronym - xlink:href="http://www.saxproject.org">SAX</acronym> is event - based. XML node elements are passed to handler methods. - Sometimes we want to access neighbouring nodes from a context - node in our handler methods for example a <tag - class="starttag">title</tag> following a <tag - class="starttag">chapter</tag> node. <acronym - xlink:href="http://www.saxproject.org">SAX</acronym> does not - offer any support for this. 
If we need references to - neighbouring nodes we have to create them ourselves during a - <acronym xlink:href="http://www.saxproject.org">SAX</acronym> - parsing run. This is tedious and leads to code being hard to - understand.</para> - </listitem> - - <listitem> - <para>Some applications may want to select node sets by <acronym - xlink:href="http://www.w3.org/TR/xpath">XPath</acronym> - expressions which is completely impossible in a <acronym - xlink:href="http://www.saxproject.org">SAX</acronym> - application.</para> - </listitem> - - <listitem> - <para>We may want to move subtrees within a document itself (for - example exchanging two <tag class="starttag">chapter</tag> - nodes) or even transferring them to a different document.</para> - </listitem> - </itemizedlist> - - <para>The greatest deficiency of the <acronym - xlink:href="http://www.saxproject.org">SAX</acronym> is the fact - that an XML instance is not represented as a tree like structure but - as a succession of events. The <acronym - xlink:href="http://www.w3.org/DOM">DOM</acronym> allows us to - represent XML document instances as tree like structures and thus - enables navigational operations between nodes.</para> - - <para>In order to achieve language <emphasis>and</emphasis> software - vendor independence the <acronym - xlink:href="http://www.w3.org/DOM">DOM</acronym> approach uses two - stages:</para> - - <itemizedlist> - <listitem> - <para>The <acronym - xlink:href="http://www.w3.org/DOM">DOM</acronym> is formulated - in an Interface Definition Language (<abbrev - xlink:href="http://en.wikipedia.org/wiki/Interface_description_language">IDL</abbrev>)</para> - </listitem> - - <listitem> - <para>In order to use the <acronym - xlink:href="http://www.w3.org/DOM">DOM</acronym> API by a - concrete programming language a so called <emphasis>language - binding</emphasis> is required. 
In languages like <trademark - xlink:href="http://www.oracle.com/us/legal/third-party-trademarks/index.html">Java</trademark> - the language binding will still be a set of (<trademark - xlink:href="http://www.oracle.com/us/technologies/java">Java</trademark>) - interfaces. Thus for actually coding an application an - implementation of these interfaces is needed</para> - </listitem> - </itemizedlist> - - <para>So what exactly may an <abbrev - xlink:href="http://en.wikipedia.org/wiki/Interface_description_language">IDL</abbrev> - be? The programming language <trademark - xlink:href="http://www.oracle.com/us/legal/third-party-trademarks/index.html">Java</trademark> - already allows pure interface definitions without any - implementation. In C++ the same result can be achieved by so called - <emphasis>pure virtual classes</emphasis>. An <abbrev - xlink:href="http://en.wikipedia.org/wiki/Interface_description_language">IDL</abbrev> - offers extended features to describe such interfaces. For <acronym - xlink:href="http://www.w3.org/DOM">DOM</acronym> the <productname - xlink:href="http://www.omg.org/gettingstarted/corbafaq.htm">CORBA - 2.2</productname> <abbrev - xlink:href="http://en.wikipedia.org/wiki/Interface_description_language">IDL</abbrev> - had been chosen to describe an XML document programming interface. - As a first example we take an excerpt from the <acronym - xlink:href="http://www.w3.org/DOM">DOM</acronym>'s <link - xlink:href="http://www.w3.org/TR/DOM-Level-3-Core/core.html#ID-1950641247">Node</link> - interface definition:</para> - - <programlisting>interface Node { - // NodeType - const unsigned short ELEMENT_NODE = 1; - const unsigned short ATTRIBUTE_NODE = 2; - const unsigned short TEXT_NODE = 3; - ... - - readonly attribute DOMString nodeName; - attribute DOMString nodeValue; - // raises(DOMException) on setting - // raises(DOMException) on retrieval - readonly attribute unsigned short nodeType; - readonly attribute Node parentNode; - ... 
- readonly attribute NodeList childNodes; - readonly attribute Node firstChild; - ... - Node insertBefore(in Node newChild, - in Node refChild) - raises(DOMException); - ...</programlisting> - - <para>If we want to implement the <abbrev - xlink:href="http://en.wikipedia.org/wiki/Interface_description_language">IDL</abbrev> - <classname>org.w3c.dom.Node</classname> specification in e.g. - <trademark - xlink:href="http://www.oracle.com/us/legal/third-party-trademarks/index.html">Java</trademark> - a language binding has to be defined. This means writing <trademark - xlink:href="http://www.oracle.com/us/legal/third-party-trademarks/index.html">Java</trademark> - code which closely resembles the <abbrev - xlink:href="http://en.wikipedia.org/wiki/Interface_description_language">IDL</abbrev> - specification. Obviously this task depends on and is restricted by - the constructs being offered by the target programming language. The - W3C <link - xlink:href="http://www.w3.org/TR/DOM-Level-3-Core/java-binding.html">defines</link> - the <trademark - xlink:href="http://www.oracle.com/us/legal/third-party-trademarks/index.html">Java</trademark> - <classname>org.w3c.dom.Node</classname> interface by:</para> - - <programlisting language="java">package org.w3c.dom; - -public interface Node { - public static final short ELEMENT_NODE = 1; // Node Types - public static final short ATTRIBUTE_NODE = 2; - public static final short TEXT_NODE = 3; - ... - public String getNodeName(); - public String getNodeValue() throws DOMException; - public void setNodeValue(String nodeValue) throws DOMException; - public short getNodeType(); - public Node getParentNode(); - public NodeList getChildNodes(); - public Node getFirstChild(); - ... - public Node insertBefore(Node newChild, - Node refChild) - throws DOMException; - ... - }</programlisting> - - <para>The <classname>org.w3c.dom.Node</classname> interface offers a - set of common operations for objects being part of a XML document. 
- But a XML document tree contains different types of nodes such - as:</para> - - <itemizedlist> - <listitem> - <para>Elements</para> - </listitem> - - <listitem> - <para>Attributes</para> - </listitem> - - <listitem> - <para>Entities</para> - </listitem> - </itemizedlist> - - <para>An XML API may address this issue by offering data types to - represent these different kinds of nodes. The <acronym - xlink:href="http://www.w3.org/DOM">DOM</acronym> <trademark - xlink:href="http://www.oracle.com/us/legal/third-party-trademarks/index.html">Java</trademark> - Binding defines an inheritance hierarchy of interfaces for this - purpose:</para> - - <figure xml:id="domJavaNodeInterfaces"> - <title>Inheritance interface hierarchy in the <acronym - xlink:href="http://www.w3.org/DOM">DOM</acronym> <trademark - xlink:href="http://www.oracle.com/us/technologies/java">Java</trademark> - binding</title> - - <mediaobject> - <imageobject> - <imagedata fileref="Ref/Fig/nodeHierarchy.svg"/> - </imageobject> - </mediaobject> - </figure> - - <para>Two commonly used <trademark - xlink:href="http://www.oracle.com/us/legal/third-party-trademarks/index.html">Java</trademark> - implementations of these interfaces are:</para> - - <variablelist> - <varlistentry> - <term>Xerces</term> - - <listitem> - <para><orgname - xlink:href="http://xml.apache.org/xerces2-j">Apache Software - foundation</orgname></para> - </listitem> - </varlistentry> - - <varlistentry> - <term>Jaxp</term> - - <listitem> - <para><orgname xlink:href="http://java.sun.com/xml/jaxp">Sun - microsystems</orgname></para> - </listitem> - </varlistentry> - </variablelist> - - <para>Both implementations offer additional interfaces beyond the - <acronym xlink:href="http://www.w3.org/DOM">DOM</acronym>'s - scope.</para> - - <para>Going back to the <acronym - xlink:href="http://www.w3.org/DOM">DOM</acronym> itself the - specification is divided into <link - 
xlink:href="http://www.w3.org/TR/2004/REC-DOM-Level-3-Core-20040407/introduction.html#DOMArchitecture-h2">modules</link>:</para> - - <figure xml:id="figureDomModules"> - <title><acronym xlink:href="http://www.w3.org/DOM">DOM</acronym> - modules.</title> - - <mediaobject> - <imageobject> - <imagedata fileref="Ref/Screen/dom-architecture.screen.png"/> - </imageobject> - </mediaobject> - </figure> - </section> - - <section xml:id="domCreate"> - <title>Creating a new document from scratch</title> - - <titleabbrev>New document</titleabbrev> - - <para>If we want to export non-XML content (e.g. from a RDBMS) into - XML we may achieve this by the following recipe:</para> - - <orderedlist> - <listitem> - <para>Create a document builder instance.</para> - </listitem> - - <listitem> - <para>Create an empty <link - xlink:href="http://java.sun.com/j2se/1.5.0/docs/api/org/w3c/dom/Document.html">Document</link> - instance.</para> - </listitem> - - <listitem> - <para>Fill in the desired Elements and Attributes.</para> - </listitem> - - <listitem> - <para>Create a serializer.</para> - </listitem> - - <listitem> - <para>Serialize the resulting tree to a stream.</para> - </listitem> - </orderedlist> - - <para>An introductory piece of code illustrates these steps:</para> - - <figure xml:id="simpleDomCreate"> - <title>Creation of a XML document instance from scratch.</title> - - <programlisting language="java">package dom; -... 
-public class CreateDoc { - public static void main(String[] args) throws Exception { - - // Create the root element - <emphasis role="bold">final Element titel = new Element("titel"); -</emphasis> - //Set a date - <emphasis role="bold">titel.setAttribute("date", "23.02.2000");</emphasis> - - // Append a text node as child - <emphasis role="bold">titel.addContent(new Text("Versuch 1"));</emphasis> - - - // Set formatting for the XML output - <emphasis role="bold">final Format outFormat = Format.getPrettyFormat();</emphasis> - - // Serialize to console - <emphasis role="bold">final XMLOutputter printer = new XMLOutputter(outFormat); - printer.output(titel, System.out);</emphasis> - } -}</programlisting> - </figure> - - <para>We get the following result:</para> - - <programlisting><?xml version="1.0" encoding="UTF-8"?> -<titel date="23.02.2000">Versuch 1</titel></programlisting> - </section> - - <section xml:id="domCreateExercises"> - <title>Exercises</title> - - <qandaset role="exercise"> - <title>A sub structured <tag class="starttag">title</tag></title> - - <qandadiv> - <qandaentry xml:id="createDocModify"> - <question> - <label>Creation of an extended XML document instance</label> - - <para>In order to run the examples given during the lecture - the <filename - xlink:href="http://www.jdom.org/downloads">jdom2.jar</filename> - library must be added to the - <envar>CLASSPATH</envar>.</para> - - <para>The <acronym - xlink:href="http://www.w3.org/DOM">DOM</acronym> creating - example given before may be used as a starting point. Extend - the <acronym - xlink:href="http://www.w3.org/DOM">DOM</acronym> tree - created in <xref linkend="simpleDomCreate"/> to produce an - extended XML document:</para> - - <programlisting><title> - <long>The long version of this title</long> - <short>Short version</short> -</title></programlisting> - </question> - - <answer> - <programlisting language="java">package dom; -... 
-public class CreateExtended { - /** - * @param args - * @throws IOException - */ - public static void main(String[] args) throws IOException { - - final Element titel = new Element("title"), - tLong = new Element("long"), - tShort = new Element("short"); - - <emphasis role="bold">// Append <long> and <short> to parent <title></emphasis> - titel.addContent(tLong).addContent(tShort); - - <emphasis role="bold">// Append text to <long> and <short></emphasis> - tLong.addContent(new Text("The long version of this title")); - tShort.addContent(new Text("Short version")); - - <emphasis role="bold">// Set formatting for the XML output</emphasis> - Format outFormat = Format.getPrettyFormat(); - - <emphasis role="bold">// Serialize to console</emphasis> - final XMLOutputter printer = new XMLOutputter(outFormat); - printer.output(titel, System.out); - } -}</programlisting> - </answer> - </qandaentry> - </qandadiv> - </qandaset> - </section> - - <section xml:id="domParse"> - <title>Parsing existing XML documents</title> - - <titleabbrev>Parsing</titleabbrev> - - <para>We already used a <acronym - xlink:href="http://www.saxproject.org">SAX</acronym> parser to parse an XML - document. Rather than handling <acronym - xlink:href="http://www.saxproject.org">SAX</acronym> events - ourselves these events may be used to construct a <acronym - xlink:href="http://www.w3.org/DOM">DOM</acronym> representation of - our document. This work is done by an instance of - <classname>org.jdom2.input.SAXBuilder</classname>. We use our - catalog example from <xref linkend="simpleCatalog"/> as an - introductory example.</para> - - <para>We already noticed the need for an - <classname>org.xml.sax.ErrorHandler</classname> object during - <acronym xlink:href="http://www.saxproject.org">SAX</acronym> - processing. A <acronym - xlink:href="http://www.w3.org/DOM">DOM</acronym> Parser requires a - similar type of Object in order to react to parsing errors in a - meaningful way. 
In principle a <acronym - xlink:href="http://www.w3.org/DOM">DOM</acronym> parser implementor - is free to choose his implementation but most implementations are - built on top of a <acronym - xlink:href="http://www.saxproject.org">SAX</acronym> parser. For - this reason it was natural to choose a <acronym - xlink:href="http://www.w3.org/DOM">DOM</acronym> error handling - interface which is similar to a <acronym - xlink:href="http://www.saxproject.org">SAX</acronym> - <classname>org.xml.sax.ErrorHandler</classname>. The following code - serves the needs described before:</para> - - <figure xml:id="domTreeTraversal"> - <title>Accessing a XML Tree purely by <acronym - xlink:href="http://www.w3.org/DOM">DOM</acronym> methods.</title> - - <programlisting language="java">package dom; -... -public class ArticleOrder { - - -<emphasis role="bold"> // Though we are playing DOM here, a <acronym - xlink:href="http://www.saxproject.org">SAX</acronym> parser still - // assembles our DOM tree.</emphasis> - private SAXBuilder builder = new SAXBuilder(); - - public ArticleOrder() { - <emphasis role="bold">// Though an ErrorHandler is not strictly required it allows - // for easier localization of XML document errors</emphasis> - builder.setErrorHandler(new MySaxErrorHandler(System.out));<co - linkends="domSetSaxErrorHandler-co" - xml:id="domSetSaxErrorHandler"/> - } - - /** Descending a catalog till its <item> elements. For each product - * its name and order number are being written to the output. 
- * @param filename - * @throws JDOMException - * @throws IOException - */ - public void process(final String filename) throws JDOMException, IOException { - - <emphasis role="bold">// Parsing our XML file</emphasis> - final Document docInput = builder.build(filename); - - <emphasis role="bold">// Accessing the document's root element</emphasis> - final Element docRoot = docInput.getRootElement(); - - <emphasis role="bold">// Accessing the <item> children of parent element <catalog></emphasis> - final List<Element> items = docRoot.getChildren(); // This method only selects Element nodes - for (final Element item : items) { - System.out.println("Article: " + item.getText() - + ", order number: " + item.getAttributeValue("orderNo")); - } - } -}</programlisting> - - <para>Note <coref linkend="domSetSaxErrorHandler" - xml:id="domSetSaxErrorHandler-co"/>: This is our standard <acronym - xlink:href="http://www.saxproject.org">SAX</acronym> error handler - implementing the <classname>org.xml.sax.ErrorHandler</classname> - interface.</para> - </figure> - - <para>Executing this method needs a driver instance providing an - input XML filename:</para> - - <programlisting language="java">package dom; -... 
-public class ArticleOrderDriver { - public static void main(String[] argv) throws Exception { - final ArticleOrder ao = new ArticleOrder(); - ao.process("<emphasis role="bold">Input/article.xml</emphasis>"); - } -}</programlisting> - - <para>This yields:</para> - - <programlisting>Article: Swinging headset, order number: 3218 -Article: 200W Stereo Amplifier, order number: 9921</programlisting> - - <para>To illustrate the internal processes we take a look at the - sequence diagram:</para> - - <mediaobject> - <imageobject> - <imagedata fileref="Ref/Fig/sequenceDomParser.svg"/> - </imageobject> - </mediaobject> - - <qandaset role="exercise"> - <title>Creating HTML output</title> - - <qandadiv> - <qandaentry xml:id="exercise_domHtmlSimple"> - <question> - <label>Simple HTML output</label> - - <para>Instead of exporting simple text output as in <xref - linkend="domTreeTraversal"/> we may also create HTML pages - like:</para> - - <programlisting><!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"> -<html> - <head> - <title>Available articles</title> - </head> - <body> - <h1>Available articles</h1> - <table> - <tbody> - <tr> - <th align="left">Article Description</th><th>Order Number</th> - </tr> - <tr> - <td align="left"><emphasis role="bold">Swinging headset</emphasis></td><td><emphasis - role="bold">3218</emphasis></td> - </tr> - <tr> - <td align="left"><emphasis role="bold">200W Stereo Amplifier</emphasis></td><td><emphasis - role="bold">9921</emphasis></td> - </tr> - </tbody> - </table> - </body> -</html></programlisting> - - <para>Instead of simply writing - <code>...println(<html>\n\t<head>...)</code> - statements you are expected to code a more sophisticated - solution. We may combine <xref linkend="createDocModify"/> - and <xref linkend="domTreeTraversal"/>. The idea is reading - the XML catalog instance as a <acronym - xlink:href="http://www.w3.org/DOM">DOM</acronym> as before. 
- Then construct a <emphasis>second</emphasis> <acronym - xlink:href="http://www.w3.org/DOM">DOM</acronym> tree for - the desired HTML output and fill in the article information - from the first <acronym - xlink:href="http://www.w3.org/DOM">DOM</acronym> tree - accordingly.</para> - </question> - - <answer> - <para>We introduce a class - <classname>solve.dom.HtmlTree</classname>:</para> - - <programlisting language="java">package solve.dom; - -import java.io.IOException; -import java.io.PrintStream; - -import org.jdom2.DocType; -import org.jdom2.Document; -import org.jdom2.Element; -import org.jdom2.Text; -import org.jdom2.output.Format; -import org.jdom2.output.XMLOutputter; - -/** - * Holding a HTML DOM to produce output. - * @author goik - */ -public class HtmlTree { - - private Document htmlOutput; - private Element tableBody; - - public HtmlTree(final String titleText, - final String[] tableHeaderFields) { <co - linkends="programlisting_catalog2html_htmlskel_co" - xml:id="programlisting_catalog2html_htmlskel"/> - - DocType doctype = new DocType("html", - "-//W3C//DTD XHTML 1.0 Strict//EN", - "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"); - - final Element htmlRoot = new Element("html"); <co - linkends="programlisting_catalog2html_tablehead_co" - xml:id="programlisting_catalog2html_tablehead"/> - htmlOutput = new Document(htmlRoot); - htmlOutput.setDocType(doctype); - - // We create a HTML skeleton including an "empty" table - final Element head = new Element("head"), - body = new Element("body"), - table = new Element("table"); - - htmlRoot.addContent(head).addContent(body); - - head.addContent(new Element("title").addContent(new Text(titleText))); - - body.addContent(new Element("h1").addContent(new Text(titleText))); - - body.addContent(table); - - - tableBody = new Element("tbody"); - table.addContent(tableBody); - - // addContent() returns the parent element, so the header row is created separately - final Element tr = new Element("tr"); - tableBody.addContent(tr); - for (final String headerField: 
tableHeaderFields) { - tr.addContent(new Element("th").addContent(new Text(headerField))); - } - } - - public void appendItem(final String itemName, final String orderNo) {<co - linkends="programlisting_catalog2html_insertproduct_co" - xml:id="programlisting_catalog2html_insertproduct"/> - final Element tr = new Element("tr"); - tableBody.addContent(tr); - tr.addContent(new Element("td").addContent(new Text(itemName))); - tr.addContent(new Element("td").addContent(new Text(orderNo))); - } - public void serialize(PrintStream out){ - - // Set formatting for the XML output - final Format outFormat = Format.getPrettyFormat(); - - // Serialize to the given stream - final XMLOutputter printer = new XMLOutputter(outFormat); - try { - printer.output(htmlOutput, out); - } catch (IOException e) { - e.printStackTrace(); - System.exit(1); - } - } - /** - * @return the table's <tbody> element - */ - public Element getTable() { - return tableBody; - } -} - - </programlisting> - - <calloutlist> - <callout arearefs="programlisting_catalog2html_htmlskel" - xml:id="programlisting_catalog2html_htmlskel_co"> - <para>A basic HTML skeleton is being created:</para> - - <programlisting><?xml version="1.0" encoding="UTF-8"?> -<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" - "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"> -<html xmlns="http://www.w3.org/1999/xhtml"> - <head> - <title>Available articles</title> - </head> - <body> - <h1>Available articles</h1> - <table> - <emphasis role="bold"><tbody></emphasis> <!-- Data to be inserted here in next step --> - <emphasis role="bold"></tbody></emphasis> - </table> - </body> -</html></programlisting> - - <para>The table containing the product's data is empty - at this point and thus invalid.</para> - </callout> - - <callout arearefs="programlisting_catalog2html_tablehead" - xml:id="programlisting_catalog2html_tablehead_co"> - <para>The table's header is appended but the actual data - from our two products is still missing:</para> 
- - <programlisting>... <h1>Available articles</h1> - <table> - <tbody> - <tr> - <th>Article Description</th> - <th>Order Number</th> - <emphasis role="bold"></tr></emphasis><!-- Data to be appended after this row in next step --> - <emphasis role="bold"></tbody></emphasis> - </table> ...</programlisting> - </callout> - - <callout arearefs="programlisting_catalog2html_insertproduct" - xml:id="programlisting_catalog2html_insertproduct_co"> - <para>Calling - <methodname>appendItem(...)</methodname> - once per product completes the creation of our HTML DOM - tree:</para> - - <programlisting>... </tr> - <tr> - <td>Swinging headset</td> - <td>3218</td> - </tr> - <tr> - <td>200W Stereo Amplifier</td> - <td>9921</td> - </tr> - </tbody> ...</programlisting> - </callout> - </calloutlist> - - <para>The class - <classname>solve.dom.Article2Html</classname> reads the - catalog data:</para> - - <programlisting language="java">package solve.dom; -... -public class Article2Html { - - private final SAXBuilder builder = new SAXBuilder(); - private final HtmlTree htmlResult; - - public Article2Html() { - - builder.setErrorHandler(new MySaxErrorHandler(System.out)); - - htmlResult = new HtmlTree("Available articles", new String[] { <co - linkends="programlisting_catalog2html_glue_createhtmldom_co" - xml:id="programlisting_catalog2html_glue_createhtmldom"/> - "Article Description", "Order Number" }); - } - - /** Read an XML catalog instance and insert product names along with their - * order numbers into the HTML DOM. Then serialize HTML tree to a stream. - * - * @param filename - * Filename of the XML source. - * @param out - * The output stream for HTML serialization. 
- * @throws IOException - * @throws JDOMException - */ - public void process(final String filename, final PrintStream out) throws JDOMException, IOException{ - final List<Element> items = - builder.build(filename).getRootElement().getChildren(); - - for (final Element item : items) { <co - linkends="programlisting_catalog2html_glue_prodloop_co" - xml:id="programlisting_catalog2html_glue_prodloop"/> - htmlResult.appendItem(item.getText(), item.getAttributeValue("orderNo")); <co - linkends="programlisting_catalog2html_glue_insertprod_co" - xml:id="programlisting_catalog2html_glue_insertprod"/> - } - htmlResult.serialize(out); <co - linkends="programlisting_catalog2html_glue_serialize_co" - xml:id="programlisting_catalog2html_glue_serialize"/> - } -}</programlisting> - - <calloutlist> - <callout arearefs="programlisting_catalog2html_glue_createhtmldom" - xml:id="programlisting_catalog2html_glue_createhtmldom_co"> - <para>Create an instance holding a HTML <acronym - xlink:href="http://www.w3.org/DOM">DOM</acronym> with a - table header containing the strings <emphasis>Article - Description</emphasis> and <emphasis>Order - Number</emphasis>.</para> - </callout> - - <callout arearefs="programlisting_catalog2html_glue_prodloop" - xml:id="programlisting_catalog2html_glue_prodloop_co"> - <para>Iterate over all product nodes.</para> - </callout> - - <callout arearefs="programlisting_catalog2html_glue_insertprod" - xml:id="programlisting_catalog2html_glue_insertprod_co"> - <para>Insert the product's name and order number into the - HTML <acronym - xlink:href="http://www.w3.org/DOM">DOM</acronym>.</para> - </callout> - - <callout arearefs="programlisting_catalog2html_glue_serialize" - xml:id="programlisting_catalog2html_glue_serialize_co"> - <para>Serialize the completed HTML <acronym - xlink:href="http://www.w3.org/DOM">DOM</acronym> tree to - the output stream.</para> - </callout> - </calloutlist> - </answer> - </qandaentry> - </qandadiv> - </qandaset> - </section> - - 
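<para>The catalog traversal from the parsing section above can also be sketched with the W3C DOM Java binding that ships with the JDK, without any third-party library. This is only a minimal illustration under stated assumptions, not the lecture's own code: the class name <classname>CatalogDomDemo</classname>, the method <methodname>listArticles</methodname> and the inline sample catalog are hypothetical, and the catalog structure is inferred from the traversal code and its printed result.</para>

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;

import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class CatalogDomDemo {

    // Hypothetical sample data; its structure is an assumption inferred
    // from the traversal code of the parsing section.
    static final String SAMPLE_CATALOG =
          "<catalog>"
        + "<item orderNo=\"3218\">Swinging headset</item>"
        + "<item orderNo=\"9921\">200W Stereo Amplifier</item>"
        + "</catalog>";

    /** Parse the given XML string and describe each child element of the root. */
    static List<String> listArticles(final String xml) {
        final List<String> result = new ArrayList<>();
        try {
            // Create a parser producing an org.w3c.dom.Document tree
            final DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            final Document doc = builder.parse(new InputSource(new StringReader(xml)));

            // getChildNodes() returns nodes of every type, not only elements
            final NodeList children = doc.getDocumentElement().getChildNodes();
            for (int i = 0; i < children.getLength(); i++) {
                final Node child = children.item(i);
                if (child.getNodeType() == Node.ELEMENT_NODE) {
                    final Element item = (Element) child;
                    result.add("Article: " + item.getTextContent()
                        + ", order number: " + item.getAttribute("orderNo"));
                }
            }
        } catch (Exception e) {
            // Wrap checked parser exceptions to keep the sketch short
            throw new IllegalArgumentException("Unable to parse catalog", e);
        }
        return result;
    }

    public static void main(String[] args) {
        for (final String line : listArticles(SAMPLE_CATALOG)) {
            System.out.println(line);
        }
    }
}
```

<para>In contrast to the <methodname>getChildren()</methodname> call used in the parsing section, <methodname>getChildNodes()</methodname> also delivers text and comment nodes, which is why the sketch filters by <code>Node.ELEMENT_NODE</code> before casting.</para>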
<section xml:id="domJavaScript"> - <title>Using <acronym - xlink:href="http://www.w3.org/DOM">DOM</acronym> with - HTML/Javascript</title> - - <para>Due to script language support in a variety of browsers we may - also use the <acronym - xlink:href="http://www.w3.org/DOM">DOM</acronym> to implement client - side event handling. As an example we <link - xlink:href="Ref/src/tablesort.html">demonstrate</link> how a HTML - table can be made sortable by clicking on a header's column. The - example code along with the code description can be found at <uri - xlink:href="http://www.kryogenix.org/code/browser/sorttable">http://www.kryogenix.org/code/browser/sorttable</uri>.</para> - - <para>Quite remarkably there are only few ingredients required to - enrich an ordinary static HTML table with this functionality:</para> - - <itemizedlist> - <listitem> - <para>An external Javascript library has to be included via - <code><script type="text/javascript" - src="sorttable.js"></code></para> - </listitem> - - <listitem> - <para>Each sortable HTML table needs:</para> - - <itemizedlist> - <listitem> - <para>A unique <code>id</code> attribute</para> - </listitem> - - <listitem> - <para>A <code>class="sortable"</code> attribute</para> - </listitem> - </itemizedlist> - </listitem> - </itemizedlist> - </section> - - <section xml:id="domXpath"> - <title>Using <acronym - xlink:href="http://www.w3.org/TR/xpath">XPath</acronym></title> - - <para><xref linkend="domTreeTraversal"/> demonstrated the - possibility to traverse trees solely by using <acronym - xlink:href="http://www.w3.org/DOM">DOM</acronym> Method calls. - Though this approach is possible it will in general not lead to - stable applications. Real world examples are often based on large - XML documents with complex hierarchical structures. Thus using this - rather primitive approach deeply nested method calls are necessary - to access desired sets of nodes. 
In addition changing a DTD will - require rewriting large code portions.</para> - - <para>As we already know from <abbrev - xlink:href="http://www.w3.org/Style/XSL">XSL</abbrev> - transformations <code>XPath</code> allows us to address node sets - inside a XML tree. The role of <acronym - xlink:href="http://www.w3.org/TR/xpath">XPath</acronym> can be - compared to SQL queries when working with relational databases. - <acronym xlink:href="http://www.w3.org/TR/xpath">XPath</acronym> may - also be used within <trademark - xlink:href="http://www.oracle.com/us/legal/third-party-trademarks/index.html">Java</trademark> - code. As a first example we show an image filename extracting - application operating on XHTML documents. The following example - contains three <tag class="starttag">img</tag> elements:</para> - - <figure xml:id="htmlGallery"> - <title>A HTML document containing <code>IMG</code> tags.</title> - - <programlisting><?xml version="1.0"?> -<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" - "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"> -<html> - <head> - <title>Picture gallery</title> - </head> - <body> - <h1>Picture gallery</h1> - <p>Images may appear inline:<emphasis role="bold"><img src="inline.gif" alt="none"/></emphasis></p> - <table> - <tbody> - <tr> - <td>Number one:</td> - <td><emphasis role="bold"><img src="one.gif" alt="none"/></emphasis></td> - </tr> - <tr> - <td>Number two:</td> - <td><emphasis role="bold"><img src="http://www.hdm-stuttgart.de/favicon.ico" alt="none"/></emphasis></td> - </tr> - </tbody> - </table> - </body> -</html> -</programlisting> - </figure> - - <para>A given HTML document may contain <tag - class="starttag">img</tag> elements at - <emphasis>arbitrary</emphasis> positions. It is sometimes desirable - to check for existence and accessibility of such external objects - being necessary for the page's correct rendering. 
A simple XSL - script will do the first part of the job, namely extracting the <tag - class="starttag">img</tag> elements:</para> - - <figure xml:id="gallery2imagelist"> - <title>A <abbrev - xlink:href="http://www.w3.org/Style/XSL">XSL</abbrev> script for - image name extraction.</title> - - <programlisting><xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" - xmlns:html="http://www.w3.org/1999/xhtml"> - <xsl:output method="text"/> - - <xsl:template match="/"> - <xsl:for-each select="//html:img"> - <xsl:value-of select="@src"/> - <xsl:text> </xsl:text> - </xsl:for-each> - </xsl:template> - -</xsl:stylesheet></programlisting> - </figure> - - <para>Note the necessity for <code>html</code> namespace inclusion - into the <acronym - xlink:href="http://www.w3.org/TR/xpath">XPath</acronym> expression - in <code><xsl:for-each select="//html:img"></code>. A simple - <code>select="//img"</code> results in an empty node set. - Executing the <abbrev - xlink:href="http://www.w3.org/Style/XSL">XSL</abbrev> script yields - a list of image filenames being contained in the HTML page i.e. - <code>inline.gif one.gif http://www.hdm-stuttgart.de/favicon.ico</code>.</para> - - <para>Now we want to write a <trademark - xlink:href="http://www.oracle.com/us/legal/third-party-trademarks/index.html">Java</trademark> - application which checks whether these referenced image - files do exist and have sufficient permissions to be accessed. A - simple approach may pipe the <abbrev - xlink:href="http://www.w3.org/Style/XSL">XSL</abbrev> output to our - application which then executes the readability checks. Instead we - want to incorporate the <acronym - xlink:href="http://www.w3.org/TR/xpath">XPath</acronym> based search - into the application. 
Ignoring namespaces and trying to mirror the - <abbrev xlink:href="http://www.w3.org/Style/XSL">XSL</abbrev> - actions as closely as possible, our application has to search - for <link - xlink:href="http://java.sun.com/j2se/1.5.0/docs/api/org/w3c/dom/Element.html">Element</link> - nodes using the <acronym - xlink:href="http://www.w3.org/TR/xpath">XPath</acronym> expression - <code>//img</code>:</para> - - <figure xml:id="domFindImages"> - <title>Extracting <tag class="emptytag">img</tag> element image - references from an HTML document.</title> - - <programlisting language="java">package dom.xpath; -... -public class DomXpath { - private final SAXBuilder builder = new SAXBuilder(); - - public DomXpath() { - builder.setErrorHandler(new MySaxErrorHandler(System.err)); - } - public void process(final String xhtmlFilename) throws JDOMException, IOException { - - final Document htmlInput = builder.build(xhtmlFilename);<co - linkends="programlisting_java_searchimg_parse_co" - xml:id="programlisting_java_searchimg_parse"/> - final XPathExpression<Object> xpath = XPathFactory.instance().compile( "//img" ); <co - linkends="programlisting_java_searchimg_pf_co" - xml:id="programlisting_java_searchimg_pf"/> <co - linkends="programlisting_java_searchimg_newxpath_co" - xml:id="programlisting_java_searchimg_newxpath"/> - final List<Object> images = xpath.evaluate(htmlInput);<co - linkends="programlisting_java_searchimg_execquery_co" - xml:id="programlisting_java_searchimg_execquery"/> - - for (Object o: images) { <co - linkends="programlisting_java_searchimg_loop_co" - xml:id="programlisting_java_searchimg_loop"/> - final Element image = (Element ) o;<co - linkends="programlisting_java_searchimg_cast_co" - xml:id="programlisting_java_searchimg_cast"/> - // getAttributeValue() yields the plain string; getAttribute() would print the Attribute object - System.out.print(image.getAttributeValue("src") + " "); - } - } -}</programlisting> - - <caption> - <para>This application searches for <tag - class="emptytag">img</tag> elements and shows their - <code>src</code> attribute 
value.</para> - </caption> - </figure> - - <calloutlist> - <callout arearefs="programlisting_java_searchimg_parse" - xml:id="programlisting_java_searchimg_parse_co"> - <para>Parse a XHTML document instance into a DOM tree.</para> - </callout> - - <callout arearefs="programlisting_java_searchimg_pf" - xml:id="programlisting_java_searchimg_pf_co"> - <para>Create a <acronym - xlink:href="http://www.w3.org/TR/xpath">XPath</acronym> - factory.</para> - </callout> - - <callout arearefs="programlisting_java_searchimg_newxpath" - xml:id="programlisting_java_searchimg_newxpath_co"> - <para>Create a <acronym - xlink:href="http://www.w3.org/TR/xpath">XPath</acronym> query - instance. This may be used to search for a set of nodes starting - from a context node.</para> - </callout> - - <callout arearefs="programlisting_java_searchimg_execquery" - xml:id="programlisting_java_searchimg_execquery_co"> - <para>Using the document's root node as the context node we - search for <tag class="starttag">img</tag> elements appearing at - arbitrary positions in our document.</para> - </callout> - - <callout arearefs="programlisting_java_searchimg_loop" - xml:id="programlisting_java_searchimg_loop_co"> - <para>We iterate over the retrieved list of images.</para> - </callout> - - <callout arearefs="programlisting_java_searchimg_cast" - xml:id="programlisting_java_searchimg_cast_co"> - <para>Casting to the correct type.</para> - </callout> - </calloutlist> - - <para>The result is a list of image filename references:</para> - - <programlisting>inline.gif one.gif http://www.hdm-stuttgart.de/favicon.ico </programlisting> - - <qandaset role="exercise"> - <title>Legal casting?</title> - - <qandadiv> - <qandaentry> - <question> - <para>Why is the cast in <coref - linkend="programlisting_java_searchimg_cast"/> guaranteed to - never cause a - <classname>java.lang.ClassCastException</classname>?</para> - </question> - - <answer> - <para>The <acronym - 
xlink:href="http://www.w3.org/TR/xpath">XPath</acronym> - <code>//img</code> expression is guaranteed to return only - <tag class="starttag">img</tag> elements. Thus within our - <trademark - xlink:href="http://www.oracle.com/us/technologies/java">Java</trademark> - context we are sure to find only - <classname>org.jdom2.Element</classname> instances.</para> - </answer> - </qandaentry> - </qandadiv> - </qandaset> - - <qandaset role="exercise"> - <title>Verification of referenced images readability</title> - - <qandadiv> - <qandaentry xml:id="exercise_htmlImageVerify"> - <question> - <para>We want to extend the example given in <xref - linkend="domFindImages"/> by testing the existence and - checking for readability of referenced images. The following - HTML document contains <quote>dead</quote> image - references:</para> - - <programlisting xml:id="domCheckImageAccessibility"><!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" - "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"> -<html xmlns="http://www.w3.org/1999/xhtml"> ... - <body> - <h1>External Pictures</h1> - <p>A local image reference:<img src="inline.gif" alt="none"/></p> - <table> - <tbody> - <tr> - <td>An existing picture:</td> - <td><img - src="http://www.hdm-stuttgart.de/bilder_navigation/laptop.gif" - alt="none"/></td> - </tr> - <tr> - <td>A non-existing picture:</td> - <td><img src="<emphasis role="bold">http://www.hdm-stuttgart.de/rotfl.gif</emphasis>" alt="none"/></td> - </tr> - </tbody> - </table> - </body> -</html></programlisting> - - <para>Write an application which checks for readability of - <abbrev - xlink:href="http://www.ietf.org/rfc/rfc1738.txt">URL</abbrev> - image references to <emphasis>external</emphasis> Servers - starting with <code>http://</code> or <code>ftp://</code> - ignoring other protocol types. Internal image references - referring to the <quote>current</quote> server typically - look like <code><img src="/images/test.gif"</code>. 
So in - order to distinguish these two types of references we may - use the XPath built-in function <link - xlink:href="http://www.cafeconleche.org/books/bible2/chapters/ch17.html">starts-with()</link> - testing for the <code>http</code> or <code>ftp</code> - protocol definition part of an <abbrev - xlink:href="http://www.ietf.org/rfc/rfc1738.txt">URL</abbrev>. - A possible output for the given example is:</para> - - <programlisting>Received 'sun.awt.image.URLImageSource' from - http://www.hdm-stuttgart.de/bilder_navigation/laptop.gif -Unable to open 'http://www.hdm-stuttgart.de/rotfl.gif'</programlisting> - - <para>The following code snippet shows a helpful class - method to check for both correctness of <abbrev - xlink:href="http://www.ietf.org/rfc/rfc1738.txt">URL</abbrev>s - and accessibility of referenced objects:</para> - - <programlisting language="java">package dom.xpath; -... -public class CheckUrl { - public static void checkReadability(final String urlRef) { - try { - final URL url = new URL(urlRef); - try { - final Object imgCandidate = url.getContent(); - if (null == imgCandidate) { - System.err.println("Unable to open '" + urlRef + "'"); - } else { - System.out.println("Received '" - + imgCandidate.getClass().getName() + "' from " - + urlRef); - } - } catch (IOException e) { - System.err.println("Unable to open '" + urlRef + "'"); - } - } catch (MalformedURLException e) { - System.err.println("Address '" + urlRef + "' is malformed"); - } - } -}</programlisting> - </question> - - <answer> - <para>We are interested in the set of images within a given - HTML document containing a <link - xlink:href="http://www.w3.org/Addressing">URL</link> - reference starting either with <code>http://</code> or - <code>ftp://</code>. 
This is achieved by the following - <acronym - xlink:href="http://www.w3.org/TR/xpath">XPath</acronym> - expression:</para> - - <programlisting>//html:img[starts-with(@src, 'http://') or starts-with(@src, 'ftp://')]</programlisting> - - <para>The application only needs to pass the corresponding - <abbrev - xlink:href="http://www.ietf.org/rfc/rfc1738.txt">URL</abbrev>s - to the method <link - xlink:href="domCheckUrlObjectExistence">CheckUrl.checkReadability()</link>. - Since the document declares the XHTML namespace, the <code>html</code> - prefix has to be bound to it. Apart from that the code follows the <link - linkend="domFindImages">introductory example</link>:</para> - - <informalfigure xml:id="solutionFintExtImgRef"> - <programlisting language="java">package dom.xpath; -... -public class CheckExtImage { - private final SAXBuilder builder = new SAXBuilder(); - - public CheckExtImage() { - builder.setErrorHandler(new MySaxErrorHandler(System.err)); - } - public void process(final String xhtmlFilename) throws JDOMException, IOException { - - final Document htmlInput = builder.build(xhtmlFilename); - // Bind the prefix 'html' to the XHTML namespace and select elements only - final XPathExpression<Element> xpath = XPathFactory.instance().compile( - "<emphasis role="bold">//html:img[starts-with(@src, 'http://') or starts-with(@src, 'ftp://')]</emphasis>", - Filters.element(), null, - Namespace.getNamespace("html", "http://www.w3.org/1999/xhtml")); - final List<Element> images = xpath.evaluate(htmlInput); - - for (final Element image: images) { - <emphasis role="bold">CheckUrl.checkReadability(image.getAttributeValue("src"));</emphasis> - } - } -}</programlisting> - </informalfigure> - </answer> - </qandaentry> - </qandadiv> - </qandaset> - </section> - - <section xml:id="domXsl"> - <title><acronym xlink:href="http://www.w3.org/DOM">DOM</acronym> and - <abbrev - xlink:href="http://www.w3.org/Style/XSL">XSL</abbrev></title> - - <para><trademark - xlink:href="http://www.oracle.com/us/technologies/java">Java</trademark> - based <abbrev xlink:href="http://www.w3.org/XML">XML</abbrev> - applications may use XSL style sheets for processing. 
A <acronym - xlink:href="http://www.w3.org/DOM">DOM</acronym> tree may for - example be transformed into another tree. The package <link - xlink:href="http://java.sun.com/j2se/1.5.0/docs/api/javax/xml/transform/package-frame.html">javax.xml.transform</link> - provides interfaces and classes for this purpose. We consider the - following product catalog example:</para> - - <figure xml:id="climbingCatalog"> - <title>A simplified <abbrev - xlink:href="http://www.w3.org/XML">Xml</abbrev> product - catalog</title> - - <programlisting><?xml version="1.0" encoding="UTF-8"?> -<!DOCTYPE <emphasis role="bold">catalog</emphasis> SYSTEM "<emphasis - role="bold">catalog.dtd</emphasis>"> -<catalog> - <title>Climbing gear</title> - <introduction> - <para>We offer a great variety of basic stuff for mountaineering - such as ropes, harnesses and runners.</para> - <para>Our shop is proud on its large number of sleeping bags - available.</para> - </introduction> - <product id="x-223"> - <title>Multi freezing bag Nightmare camper</title> - <description> - <para>You will feel comfortable till minus 20 degrees - At - least if you are a penguin or a polar bear.</para> - </description> - </product> - <product id="r-334"> - <title>Rope 40m</title> - <description> - <para>Excellent for indoor climbing.</para> - </description> - </product> -</catalog></programlisting> - - <para>A corresponding DTD is straightforward:</para> - - <programlisting><!ELEMENT catalog (title, introduction, product+) > -<!ELEMENT introduction (para+) > -<!ELEMENT title (#PCDATA) > -<!ELEMENT product (title, description) > -<!ATTLIST product - id ID #REQUIRED - price NMTOKEN #IMPLIED> -<!ELEMENT description (para+) > -<!ELEMENT para (#PCDATA) ></programlisting> - </figure> - - <para>A <abbrev - xlink:href="http://www.w3.org/Style/XSL">XSL</abbrev> style sheet - may be used to transform this document into the HTML Format:</para> - - <figure xml:id="catalog2html"> - <title>A <abbrev - 
xlink:href="http://www.w3.org/Style/XSL">XSL</abbrev> style sheet - for catalog transformation to HTML.</title> - - <programlisting><?xml version="1.0" encoding="utf-8"?> -<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" - version="2.0" xmlns="http://www.w3.org/1999/xhtml"> - - <xsl:template match="/catalog"> - <html> - <head><title><xsl:value-of select="title"/></title></head> - <body style="background-color:#FFFFFF"> - <h1><xsl:value-of select="title"/></h1> - <xsl:apply-templates select="product"/> - </body> - </html> - </xsl:template> - - <xsl:template match="product"> - <h3><xsl:value-of select="title"/></h3> - <xsl:for-each select="description/para"> - <p><xsl:value-of select="."/></p> - </xsl:for-each> - <!-- price is an optional attribute of product, see the DTD --> - <xsl:if test="@price"> - <p> - <xsl:text>Price:</xsl:text> - <xsl:value-of select="@price"/> - </p> - </xsl:if> - </xsl:template> -</xsl:stylesheet></programlisting> - </figure> - - <para>As a preparation for <xref linkend="exercise_catalogRdbms"/> - we now demonstrate the usage of <abbrev - xlink:href="http://www.w3.org/Style/XSL">XSL</abbrev> within a - <trademark - xlink:href="http://www.oracle.com/us/technologies/java">Java</trademark> - application. This is done by a <link - xlink:href="http://java.sun.com/j2se/1.5.0/docs/api/javax/xml/transform/Transformer.html">Transformer</link> - instance:</para> - - <figure xml:id="xml2xml"> - <title>Transforming an XML document instance to HTML by an XSL - style sheet.</title> - - <programlisting language="java">package dom.xsl; -... 
-public class Xml2Html { - private final SAXBuilder builder = new SAXBuilder(); - - final XSLTransformer transformer; - - public Xml2Html(final String xslFilename) throws XSLTransformException { - builder.setErrorHandler(new MySaxErrorHandler(System.err)); - transformer = new XSLTransformer(xslFilename); - } - public void transform(final String xmlInFilename, - final String resultFilename) throws JDOMException, IOException { - - final Document inDoc = builder.build(xmlInFilename); - Document result = transformer.transform(inDoc); - - // Set formatting for the XML output - final Format outFormat = Format.getPrettyFormat(); - - // Serialize to console - final XMLOutputter printer = new XMLOutputter(outFormat); - printer.output(result.getDocument(), System.out); - - } -}</programlisting> - </figure> - - <para>A corresponding driver file is needed to invoke a - transformation:</para> - - <figure xml:id="xml2xmlDriver"> - <title>A driver class for the xml2xml transformer.</title> - - <programlisting language="java">package dom.xsl; -... -public class Xml2HtmlDriver { -... 
- public static void main(String[] args) { - final String - inFilename = "Input/Dom/climbing.xml", - xslFilename = "Input/Dom/catalog2html.xsl", - htmlOutputFilename = "Input/Dom/climbing.html"; - try { - final Xml2Html converter = new Xml2Html(xslFilename); - converter.transform(inFilename, htmlOutputFilename); - } catch (Exception e) { - System.err.println("The conversion of '" + inFilename - + "' by stylesheet '" + xslFilename - + "' to output HTML file '" + htmlOutputFilename - + "' failed with the following error:" + e); - e.printStackTrace(); - } - } -}</programlisting> - </figure> - - <qandaset role="exercise"> - <title>HTML from XML and relational data</title> - - <qandadiv> - <qandaentry xml:id="exercise_catalogRdbms"> - <question> - <label>Catalogs and RDBMS</label> - - <para>We want to extend the transformation being described - before in <xref linkend="xml2xml"/> by reading price - information from a RDBMS. Consider the following schema and - <code>INSERT</code>s:</para> - - <programlisting>CREATE TABLE Product( - orderNo CHAR(10) - ,price NUMERIC(10,2) -); - -INSERT INTO Product VALUES('x-223', 330.20); -INSERT INTO Product VALUES('w-124', 110.40);</programlisting> - - <para>Adding prices may be implemented the following - way:</para> - - <mediaobject> - <imageobject> - <imagedata fileref="Ref/Fig/xml2html.fig"/> - </imageobject> - </mediaobject> - - <para>You may implement this by following these - steps:</para> - - <orderedlist> - <listitem> - <para>You may reuse class - <classname>sax.rdbms.RdbmsAccess</classname> from <xref - linkend="saxRdbms"/>.</para> - </listitem> - - <listitem> - <para>Use the previous class to modify <xref - linkend="xml2xml"/> by introducing a new method - <code>addPrices(final Document catalog)</code> which - adds prices to the <acronym - xlink:href="http://www.w3.org/DOM">DOM</acronym> tree - accordingly. 
The insertion points may be reached by an - <acronym - xlink:href="http://www.w3.org/TR/xpath">XPath</acronym> - expression.</para> - </listitem> - </orderedlist> - </question> - - <answer> - <para>The additional functionality on top of <xref - linkend="xml2xml"/> is represented by a method - <methodname>addPrices</methodname>. This method modifies the - <acronym xlink:href="http://www.w3.org/DOM">DOM</acronym> - input tree prior to applying the XSL style sheet. Prices are - inserted based on data received from an RDBMS via - <trademark - xlink:href="http://electronics.zibb.com/trademark/jdbc/29545026">JDBC</trademark>:</para> - - <programlisting language="java">package dom.xsl; -... -public class XmlRdbms2Html { - private final SAXBuilder builder = new SAXBuilder(); - - DbAccess db = new DbAccess(); - - final XSLTransformer transformer; - Document catalog; - - final org.jdom2.xpath.XPathExpression<Object> selectProducts = - XPathFactory.instance().compile("/catalog/product"); - - /** - * @param xslFilename the stylesheet being used for subsequent - * transformations by {@link #transform(String, String)}. - * - * @throws XSLTransformException - */ - public XmlRdbms2Html(final String xslFilename) throws XSLTransformException { - builder.setErrorHandler(new MySaxErrorHandler(System.err)); - transformer = new XSLTransformer(xslFilename); - } - - /** - * The actual workhorse carrying out the transformation - * and adding prices from the database table. - * - * @param xmlInFilename input file to be transformed - * @param resultFilename the result file holding the generated HTML document - * @throws JDOMException The transformation may fail for various reasons. 
- * @throws IOException - */ - public void transform(final String xmlInFilename, - final String resultFilename) throws JDOMException, IOException { - - catalog = builder.build(xmlInFilename); - - addPrices(); - - final Document htmlResult = transformer.transform(catalog); - - // Set formatting for the XML output - final Format outFormat = Format.getPrettyFormat(); - - // Serialize to console - final XMLOutputter printer = new XMLOutputter(outFormat); - printer.output(htmlResult, System.out); - - } - private void addPrices() { - final List<Object> products = selectProducts.evaluate(catalog.getRootElement()); - - db.connect("jdbc:mysql://localhost:3306/hdm", "hdmuser", "XYZ"); - for (Object p: products) { - final Element product = (Element ) p; - final String productId = product.getAttributeValue("id"); - product.setAttribute("price", db.readPrice(productId)); - } - db.close(); - } -}</programlisting> - - <para>The method <code>addPrices(...)</code> utilizes our - RDBMS access class:</para> - - <programlisting language="java">package dom.xsl; -... -public class DbAccess { - public void connect(final String jdbcUrl, - final String userName, final String password) { - try { - conn = DriverManager.getConnection(jdbcUrl, userName, password); - priceQuery = conn.prepareStatement(sqlPriceQuery); - } catch (SQLException e) { - System.err.println("Unable to open connection to database:" + e);} - } - public String readPrice(final String articleNumber) { - String result; - try { - priceQuery.setString(1, articleNumber); - final ResultSet rs = priceQuery.executeQuery(); - if (rs.next()) { - result = rs.getString("price"); - } else { - result = "No price available for article '" + articleNumber + "'"; - } - } catch (SQLException e) { - result = "Error reading price for article '" + articleNumber + "':" + e; - } - return result; - } - ... 
-}</programlisting> - - <para>Of course the connection details should be moved to a - configuration file.</para> - </answer> - </qandaentry> - </qandadiv> - </qandaset> - </section> - </section> - </chapter> - - <chapter xml:id="fo"> - <title>Generating printed output</title> - - <titleabbrev>Print</titleabbrev> - - <section xml:id="foIntro"> - <title>Online and print versions</title> - - <titleabbrev>online / print</titleabbrev> - - <para>We already learned how to transform XML documents into HTML by - means of a <abbrev - xlink:href="http://www.w3.org/Style/XSL">XSL</abbrev> style sheet - processor. In principle we may create printed output by using a HTML - Browser's print function. However the result will not meet reasonable - typographical standards. A list of commonly required features for - printed output includes:</para> - - <variablelist> - <varlistentry> - <term>Line breaks</term> - - <listitem> - <para>Text paragraphs have to be divided into lines. To achieve - best results the processor must implement the hyphenation rules - of the language in question in order to automatically hyphenate - long words. This is especially important for text columns of - limited width as appearing in newspapers.</para> - </listitem> - </varlistentry> - - <varlistentry> - <term>Page breaks</term> - - <listitem> - <para>Since printed pages are limited in height the content has - to be broken into pages. This may be difficult to - achieve:</para> - - <itemizedlist> - <listitem> - <para>Large images being indivisible may have to be deferred - to the following page leaving large amounts of empty - space.</para> - </listitem> - - <listitem> - <para>Long tables may have to be subdivided into smaller - blocks. 
Thus it may be required to define sets of additional - footers like <quote>to be continued on the next page</quote> - and additional table headers containing column descriptions - on subsequent pages.</para> - </listitem> - </itemizedlist> - </listitem> - </varlistentry> - - <varlistentry> - <term>Page references</term> - - <listitem> - <para>Document internal references via <link - xlink:href="http://www.w3.org/TR/xml#id">ID</link> / <link - xlink:href="http://www.w3.org/TR/xml#idref">IDREF</link> pairs - may be represented as page references like <quote>see page - 32</quote>.</para> - </listitem> - </varlistentry> - - <varlistentry> - <term>Left and right pages</term> - - <listitem> - <para>Books usually have a different layout for - <quote>left</quote> and <quote>right</quote> pages. Page numbers - usually appear on the left side of a <quote>left</quote> page - and vice versa.</para> - - <para>Very often the head of each page contains additional - information e.g. a chapter's name on each <quote>left</quote> - page head and the actual section's name on each - <quote>right</quote> page's head.</para> - - <para>In addition chapters usually start on a - <quote>right</quote> page. Sometimes a chapter's starting page - has special layout features e.g. 
a missing description in the - page's head which will only be given on subsequent pages.</para> - </listitem> - </varlistentry> - - <varlistentry> - <term>Footnotes</term> - - <listitem> - <para>Footnotes have to be numbered on a per page basis and have - to appear on the current page.</para> - </listitem> - </varlistentry> - </variablelist> - </section> - - <section xml:id="foStart"> - <title>A simple <abbrev - xlink:href="http://www.w3.org/TR/2001/REC-xsl-20011015/slice6.html#fo-section">FO</abbrev> - document</title> - - <titleabbrev>Simple <abbrev - xlink:href="http://www.w3.org/TR/2001/REC-xsl-20011015/slice6.html#fo-section">FO</abbrev></titleabbrev> - - <para>A renderer for printed output from XML content also needs - instructions on how to format the different elements. A common way to - define these formatting properties is the <emphasis>Formatting - Objects</emphasis> (<abbrev - xlink:href="http://www.w3.org/TR/2001/REC-xsl-20011015/slice6.html#fo-section">FO</abbrev>) - standard. <abbrev - xlink:href="http://www.w3.org/TR/2001/REC-xsl-20011015/slice6.html#fo-section">FO</abbrev> - documents may be compared to HTML. An HTML document has to be rendered - by a piece of software called a browser in order to be viewed as an - image. Likewise <abbrev - xlink:href="http://www.w3.org/TR/2001/REC-xsl-20011015/slice6.html#fo-section">FO</abbrev> - documents have to be rendered by a piece of software called a - formatting objects processor which typically yields PostScript or PDF - output. 
As a starting point we take a simple example:</para> - - <figure xml:id="foHelloWorld"> - <title>The simplest <abbrev - xlink:href="http://www.w3.org/TR/2001/REC-xsl-20011015/slice6.html#fo-section">FO</abbrev> - document</title> - - <programlisting><?xml version="1.0" encoding="utf-8"?> -<fo:root xmlns:fo="http://www.w3.org/1999/XSL/Format"> - - <fo:layout-master-set> - <!-- Define a simple page layout --> - <fo:simple-page-master master-name="simplePageLayout" - page-width="60mm" page-height="100mm"> - <fo:region-body/> - </fo:simple-page-master> - </fo:layout-master-set> - <!-- Print a set of pages using the previously defined layout --> - <fo:page-sequence master-reference="simplePageLayout"> - <fo:flow flow-name="xsl-region-body"> - <emphasis role="bold"><fo:block>Hello, World ...</fo:block></emphasis> - </fo:flow> - </fo:page-sequence> -</fo:root></programlisting> - </figure> - - <para>PDF generation is initiated by executing an <abbrev - xlink:href="http://www.w3.org/TR/2001/REC-xsl-20011015/slice6.html#fo-section">FO</abbrev> - processor. At the MI department the script <code>fo2pdf</code> invokes - <orgname>RenderX</orgname>'s <productname - xlink:href="http://www.renderx.com">xep</productname> - processor:</para> - - <programlisting>fo2pdf -fo hello.fo -pdf hello.pdf</programlisting> - - <para>This creates a PDF file which may be printed or previewed by - e.g. <productname - xlink:href="http://www.adobe.com">Adobe</productname>'s Acrobat Reader - or evince under Linux. For a list of command line options see - <productname xlink:href="http://www.renderx.com/reference.html">xep's - documentation</productname>.</para> - </section> - - <section xml:id="layoutParam"> - <title>Page layout</title> - - <para>The result of our <quote>Hello, World ...</quote> code is - not very impressive. 
In order to develop more elaborate examples we - have to understand the underlying layout model defined in a - <link - xlink:href="http://www.w3.org/TR/xsl/#fo_simple-page-master">fo:simple-page-master</link> - element. First of all <abbrev - xlink:href="http://www.w3.org/TR/2001/REC-xsl-20011015/slice6.html#fo-section">FO</abbrev> - allows us to subdivide a physical page into different regions:</para> - - <figure xml:id="foRegionList"> - <title>Regions being defined in a page.</title> - - <mediaobject> - <imageobject> - <imagedata fileref="Ref/Fig/regions.fig"/> - </imageobject> - </mediaobject> - </figure> - - <para>The most important area in this model is denoted by <link - xlink:href="http://www.w3.org/TR/xsl/#fo_region-body">fo:region-body</link>. - Other regions like <link - xlink:href="http://www.w3.org/TR/xsl/#fo_region-before">fo:region-before</link> - are typically used as containers for meta information such as chapter - headings and page numbering. We take a closer look at the <link - xlink:href="http://www.w3.org/TR/xsl/#fo_region-body">fo:region-body</link> - area and supply an example of parameterization:</para> - - <figure xml:id="foParamRegBody"> - <title>A complete <abbrev - xlink:href="http://www.w3.org/TR/2001/REC-xsl-20011015/slice6.html#fo-section">FO</abbrev> - parameterization of a physical page and the <link - xlink:href="http://www.w3.org/TR/xsl/#fo_region-body">fo:region-body</link>.</title> - - <programlisting><?xml version="1.0" encoding="utf-8"?> -<fo:root xmlns:fo="http://www.w3.org/1999/XSL/Format" - font-size="6pt"> - - <fo:layout-master-set> <co xml:id="programlisting_fobodyreg_masterset"/> - <fo:simple-page-master master-name="<emphasis role="bold">simplePageLayout</emphasis>" <co - xml:id="programlisting_fobodyreg_simplepagelayout"/> - page-width = "50mm" page-height = "80mm" - margin-top = "5mm" margin-bottom = "20mm" - margin-left = "5mm" margin-right = "10mm"> - - <fo:region-body <co 
xml:id="programlisting_fobodyreg_regionbody"/> - margin-top = "10mm" margin-bottom = "5mm" - margin-left = "10mm" margin-right = "5mm"/> - </fo:simple-page-master> - </fo:layout-master-set> - - <fo:page-sequence master-reference="<emphasis role="bold">simplePageLayout</emphasis>"> <co - xml:id="programlisting_fobodyreg_pagesequence"/> - <fo:flow flow-name="xsl-region-body"> <co - xml:id="programlisting_fobodyreg_flow"/> - <fo:block space-after="2mm">Dumb text .. dumb text.</fo:block> <co - xml:id="programlisting_fobodyreg_block"/> - <fo:block space-after="2mm">Dumb text .. dumb text.</fo:block> <coref - linkend="programlisting_fobodyreg_block"/> - <fo:block space-after="2mm">Dumb text .. dumb text.</fo:block> <coref - linkend="programlisting_fobodyreg_block"/> - <fo:block space-after="2mm">Dumb text .. dumb text.</fo:block> <coref - linkend="programlisting_fobodyreg_block"/> - </fo:flow> - </fo:page-sequence> -</fo:root></programlisting> - </figure> - - <calloutlist> - <callout arearefs="programlisting_fobodyreg_masterset"> - <para>As the name suggests multiple layout definitions can appear - here. In this example only one layout is defined.</para> - </callout> - - <callout arearefs="programlisting_fobodyreg_simplepagelayout"> - <para>Each layout definition carries a key attribute <code>master-name</code> - which is unique with respect to all defined layouts appearing in - <emphasis>the</emphasis> <tag - class="starttag">fo:layout-master-set</tag>. We may thus call it a - <emphasis>primary key</emphasis> attribute. The current layout - definition's key has the value <code>simplePageLayout</code>. The - length specifications appearing here are visualized in <xref - linkend="paramRegBodyVisul"/> and correspond to the white - rectangle.</para> - </callout> - - <callout arearefs="programlisting_fobodyreg_regionbody"> - <para>Each layout definition <emphasis>must</emphasis> have a - region body, the region in which the document's main text flow - will appear. 
A layout definition <emphasis>may</emphasis> also - define top, bottom and side regions as we will see <link - linkend="paramHeadFoot">later</link>. The body region is shown - with pink background in <xref - linkend="paramRegBodyVisul"/>.</para> - </callout> - - <callout arearefs="programlisting_fobodyreg_pagesequence"> - <para>An <abbrev - xlink:href="http://www.w3.org/TR/2001/REC-xsl-20011015/slice6.html#fo-section">FO</abbrev> - document may have multiple page sequences, for example one for each - chapter of a book. Each page sequence <emphasis>must</emphasis> reference an - <emphasis>existing</emphasis> layout definition via its - <code>master-reference</code> attribute. So we may regard this - attribute as a foreign key targeting the set of all defined layout - definitions.</para> - </callout> - - <callout arearefs="programlisting_fobodyreg_flow"> - <para>A flow allows us to define in which region output shall - appear. In the current example there is only one layout, and it - defines a single body region able to receive text - output.</para> - </callout> - - <callout arearefs="programlisting_fobodyreg_block"> - <para>A <tag class="starttag">fo:block</tag> element may be - compared to a paragraph element <tag class="starttag">p</tag> in - HTML. The attribute <link - xlink:href="http://www.w3.org/TR/xsl/#space-after">space-after</link>="2mm" - adds a space of two mm after each <link - xlink:href="http://www.w3.org/TR/xsl/#fo_block">fo:block</link> - container.</para> - </callout> - </calloutlist> - - <para>The result looks like:</para> - - <figure xml:id="paramRegBodyVisul"> - <title>Parameterizing page and region viewports. All length 
All length - dimensions are in mm.</title> - - <mediaobject> - <imageobject> - <imagedata fileref="Ref/Fig/overlay.fig"/> - </imageobject> - </mediaobject> - </figure> - </section> - - <section xml:id="headFoot"> - <title>Headers and footers</title> - - <titleabbrev>Header/footer</titleabbrev> - - <para>Referring to <xref linkend="foRegionList"/> we now want to add - fixed headers and footers frequently being used for page numbers. In a - textbook each page might have the actual chapter's name in its header. - This name should not change as long as the text below <link - xlink:href="http://www.w3.org/TR/xsl/#fo_region-body">fo:region-body</link> - still belongs to the same chapter. In <abbrev - xlink:href="http://www.w3.org/TR/2001/REC-xsl-20011015/slice6.html#fo-section">FO</abbrev> - this is achieved by:</para> - - <itemizedlist> - <listitem> - <para>Encapsulating each chapter's content in a <link - xlink:href="http://www.w3.org/TR/xsl/#fo_page-sequence">fo:page-sequence</link> - of its own.</para> - </listitem> - - <listitem> - <para>Defining the desired header text below <link - xlink:href="http://www.w3.org/TR/xsl/#fo_static-content">fo:static-content</link> - in the area defined by <link - xlink:href="http://www.w3.org/TR/xsl/#fo_region-before">fo:region-before</link>.</para> - </listitem> - </itemizedlist> - - <para>The notion <link - xlink:href="http://www.w3.org/TR/xsl/#fo_static-content">fo:static-content</link> - refers to the fact that the content is constant (static) within the - given page sequence. 
The new version reads:</para> - - <figure xml:id="paramHeadFoot"> - <title>Parameterizing header and footer.</title> - - <programlisting><?xml version="1.0" encoding="utf-8"?> -<fo:root xmlns:fo="http://www.w3.org/1999/XSL/Format" - font-size="6pt"> - - <fo:layout-master-set> - <fo:simple-page-master master-name="simplePageLayout" - page-width = "50mm" page-height = "80mm" - margin-top = "5mm" margin-bottom = "20mm" - margin-left = "5mm" margin-right = "10mm"> - - <fo:region-body margin-top = "10mm" margin-bottom = "5mm" <co - xml:id="programlisting_head_foot_bodydef"/> - margin-left = "10mm" margin-right = "5mm"/> - - <fo:region-before extent="5mm"/> <co - xml:id="programlisting_head_foot_beforedef"/> - <fo:region-after extent="5mm"/> <co - xml:id="programlisting_head_foot_afterdef"/> - - </fo:simple-page-master> - </fo:layout-master-set> - - <fo:page-sequence master-reference="simplePageLayout"> - - <fo:static-content flow-name="xsl-region-before"> <co - xml:id="programlisting_head_foot_beforeflow"/> - <fo:block - font-weight="bold" - font-size="8pt">Headertext</fo:block> - </fo:static-content> - - <fo:static-content flow-name="xsl-region-after"> <co - xml:id="programlisting_head_foot_afterflow"/> - <fo:block> - <fo:page-number/> - </fo:block> - </fo:static-content> - - <fo:flow flow-name="xsl-region-body"> - <fo:block space-after="8mm">Dumb text .. dumb text.</fo:block> - <fo:block space-after="8mm">Dumb text .. dumb text.</fo:block> - <fo:block space-after="8mm">More text .. more text.</fo:block> - <fo:block space-after="8mm">More text .. more text.</fo:block> - <fo:block space-after="8mm">More text .. 
more text.</fo:block> - </fo:flow> - </fo:page-sequence> -</fo:root></programlisting> - </figure> - - <calloutlist> - <callout arearefs="programlisting_head_foot_bodydef"> - <para>Defining the body region.</para> - </callout> - - <callout arearefs="programlisting_head_foot_beforedef programlisting_head_foot_afterdef"> - <para>Defining two regions at the top and bottom of each page. The - <code>extent</code> attribute denotes the height of these regions. - <emphasis>Caveat</emphasis>: The attribute <code>extent</code>'s - value gets subtracted from the <code>margin-top</code> or - <code>margin-bottom</code> value being defined in the - corresponding <tag class="starttag">fo:region-body</tag> element. - So if we consider for example the <tag>fo:region-before</tag> the - following condition must hold:</para> - - <para>extent <= margin-top</para> - - <para>Otherwise we may not even see any output.</para> - </callout> - - <callout arearefs="programlisting_head_foot_beforeflow"> - <para>A <code>fo:static-content</code> denotes text portions which - are decoupled from the <quote>usual</quote> text flow. For example - as a book's chapter advances over multiple pages we expect the - constant chapter's title to appear on top of each page. In the - current example the static string <code>Headertext</code> will - appear on each page's top for the whole <tag - class="starttag">fo:page-sequence</tag> in which it is defined. - Notice the <code>flow-name="xsl-region-before"</code> reference to - the region being defined in <coref - linkend="programlisting_head_foot_beforedef"/>.</para> - </callout> - - <callout arearefs="programlisting_head_foot_afterflow"> - <para>We do the same here for the page's footer. Instead of static - text we output <tag>fo:page-number</tag> yielding the current - page's number.</para> - - <para>This time <code>flow-name="xsl-region-after"</code> - references the region definition in <coref - linkend="programlisting_head_foot_afterdef"/>. 
Actually the - attribute <code>flow-name</code> is restricted to the following - five values corresponding to all possible region definitions - within a layout:</para> - - <informaltable> - <?dbhtml table-width="50%" ?> - - <?dbfo table-width="50%" ?> - - <tgroup cols="2"> - <colspec align="left" colwidth="1*"/> - - <colspec align="left" colwidth="1*"/> - - <tbody> - <row> - <entry><tag class="starttag">fo:region-body</tag></entry> - - <entry>xsl-region-body</entry> - </row> - - <row> - <entry><tag - class="starttag">fo:region-before</tag></entry> - - <entry>xsl-region-before</entry> - </row> - - <row> - <entry><tag class="starttag">fo:region-after</tag></entry> - - <entry>xsl-region-after</entry> - </row> - - <row> - <entry><tag class="starttag">fo:region-start</tag></entry> - - <entry>xsl-region-start</entry> - </row> - - <row> - <entry><tag class="starttag">fo:region-end</tag></entry> - - <entry>xsl-region-end</entry> - </row> - </tbody> - </tgroup> - </informaltable> - </callout> - </calloutlist> - - <para>This results in two pages with page numbers 1 and 2:</para> - - <mediaobject> - <imageobject> - <imagedata fileref="Ref/Fig/headfoot.fig"/> - </imageobject> - </mediaobject> - - <para>The free chapter from <xref linkend="bibHarold04"/> book - contains additional information on extended <link - xlink:href="http://www.cafeconleche.org/books/bible2/chapters/ch18.html#d1e2250">layout - definitions</link>. 
The <orgname - xlink:href="http://w3.org">W3C</orgname> as the holder of the FO - standard defines the elements <link - xlink:href="http://www.w3.org/TR/xsl/#fo_layout-master-set">fo:layout-master-set</link>, - <link - xlink:href="http://www.w3.org/TR/xsl/#fo_simple-page-master">fo:simple-page-master</link> - and <link - xlink:href="http://www.w3.org/TR/xsl/#fo_page-sequence">fo:page-sequence</link></para> - </section> - - <section xml:id="foContainer"> - <title>Important Objects</title> - - <section xml:id="fo_block"> - <title><code>fo:block</code></title> - - <para>The FO standard borrows a lot from the CSS standard. Most - formatting objects may have <link - xlink:href="http://www.w3.org/TR/xsl/#section-N19349-Description-of-Property-Groups">CSS - like properties</link> with similar semantics, some properties have - been added. We take a <link - xlink:href="http://www.w3.org/TR/xsl/#fo_block">fo:block</link> - container as an example:</para> - - <figure xml:id="blockInline"> - <title>A <link - xlink:href="http://www.w3.org/TR/xsl/#fo_block">fo:block</link> - with a <link - xlink:href="http://www.w3.org/TR/xsl/#fo_inline">fo:inline</link> - descendant.</title> - - <mediaobject> - <imageobject> - <imagedata fileref="Ref/Fig/blockprop.fo.pdf"/> - </imageobject> - </mediaobject> - - <programlisting>... -<fo:block font-weight='bold' - border-bottom-style='dashed' - border-style='solid' - border='1mm'>A lot of attributes and <fo:inline background-color='black' - color='white'>inverted</fo:inline> text.</fo:block> ...</programlisting> - </figure> - - <para>The <link - xlink:href="http://www.w3.org/TR/xsl/#fo_inline">fo:inline</link> - descendant serves as a means to change the <quote>current</quote> - property set. 
In HTML/CSS this may be achieved by using the - <code>SPAN</code> tag:</para> - - <programlisting><!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"> -<html> - <head> - <title>Blocks/spans and CSS</title> - </head> - <body> - <h1>Blocks/spans and CSS</h1> - <p style="font-weight: bold; border: 1mm; - border-style: solid; border-bottom-style: dashed;" - >A lot of attributes and - <span style="color: white;background-color: black;" - >inverted</span> text.</p> - </body> -</html></programlisting> - - <para>Though being encapsulated in an attribute <code>style</code> - we find a one-to-one correspondence between FO and CSS in this case. - The HTML rendering works as expected:<mediaobject> - <imageobject> - <imagedata fileref="Ref/Screen/mozparaspancss.screen.png"/> - </imageobject> - </mediaobject></para> - </section> - - <section xml:id="fo_list"> - <title>Lists</title> - - <para>The simplest type of list is the unlabeled (itemized) list as - expressed by the <code>UL</code>/<code>LI</code> tags in HTML. - FO allows a much more detailed parametrization regarding indents and - distances between labels and item content. Relevant elements are - <link - xlink:href="http://www.w3.org/TR/xsl/#fo_list-block">fo:list-block</link>, - <link - xlink:href="http://www.w3.org/TR/xsl/#fo_list-item">fo:list-item</link> - and <link - xlink:href="http://www.w3.org/TR/xsl/#fo_list-item-body">fo:list-item-body</link>. - The drawback is a more complex setup for <quote>default</quote> - lists:</para> - - <figure xml:id="listItemize"> - <title>An itemized list and result.</title> - - <programlisting>... 
-<fo:list-block - provisional-distance-between-starts="2mm"> - <fo:list-item> - <fo:list-item-label end-indent="label-end()"> - <fo:block>&#8226;</fo:block> - </fo:list-item-label> - <fo:list-item-body start-indent="body-start()"> - <fo:block>Flowers</fo:block> - </fo:list-item-body> - </fo:list-item> - - <fo:list-item> - <fo:list-item-label end-indent="label-end()"> - <fo:block>&#8226;</fo:block> - </fo:list-item-label> - <fo:list-item-body start-indent="body-start()"> - <fo:block>Animals</fo:block> - </fo:list-item-body> - </fo:list-item> -</fo:list-block> ...</programlisting> - - <mediaobject> - <imageobject> - <imagedata align="left" fileref="Ref/Fig/itemize.fo.pdf"/> - </imageobject> - </mediaobject> - </figure> - - <para>The result looks somewhat primitive in relation to the amount - of source code it necessitates. The power of these constructs shows - up when trying to format nested lists of possibly different types - like enumerations or definition lists under the requirement of - typographical excellence. More complex examples are presented in - <link - xlink:href="http://www.cafeconleche.org/books/bible2/chapters/ch18.html#d1e4979">Xmlbible - book</link> of <xref linkend="bibHarold04"/>.</para> - </section> - - <section xml:id="leaderRule"> - <title>Leaders and rules</title> - - <titleabbrev>Leaders/rules</titleabbrev> - - <para>Sometimes adjustable horizontal space between two neighbouring - objects has to be filled e.g. in a book's table of contents. The - <link - xlink:href="http://www.w3.org/TR/xsl/#fo_leader">fo:leader</link> - serves this purpose:</para> - - <figure xml:id="leaderToc"> - <title>Two simulated entries in a table of contents.</title> - - <programlisting>... 
-<fo:block text-align-last='justify'>Valid - XML<fo:leader leader-pattern="dots"/> -page 7</fo:block> - -<fo:block text-align-last='justify'>XSL -<fo:leader leader-pattern='dots'/> -page 42</fo:block> ...</programlisting> - - <mediaobject> - <imageobject> - <imagedata align="left" fileref="Ref/Fig/leader.fo.pdf"/> - </imageobject> - </mediaobject> - </figure> - - <para>The attribute value <link - xlink:href="http://www.w3.org/TR/xsl/#text-align-last">text-align-last</link> - = <code>'justify'</code> forces the <link - xlink:href="http://www.w3.org/TR/xsl/#fo_block">fo:block</link> to - extend to the available width of the current <link - xlink:href="http://www.w3.org/TR/xsl/#fo_region-body">fo:region-body</link> - area. The <link - xlink:href="http://www.w3.org/TR/xsl/#fo_leader">fo:leader</link> - inserts the necessary amount of content of the type - specified by <link - xlink:href="http://www.w3.org/TR/xsl/#leader-pattern">leader-pattern</link> - to fill up the gap between its neighbouring components. This - principle can be extended to multiple objects:</para> - - <figure xml:id="leaderMulti"> - <title>Four entries separated by equal amounts of dotted - space.</title> - - <programlisting><fo:block text-align-last='justify'>A<fo:leader -leader-pattern="dots"/>B<fo:leader -leader-pattern="dots"/>C<fo:leader leader-pattern="dots"/>D</fo:block></programlisting> - - <mediaobject> - <imageobject> - <imagedata fileref="Ref/Fig/leadermulti.fo.pdf"/> - </imageobject> - </mediaobject> - </figure> - - <para>A <link - xlink:href="http://www.w3.org/TR/xsl/#fo_leader">fo:leader</link> - may also be used to draw horizontal lines to separate objects. In - this case there are no neighbouring components within the - <quote>current</quote> line in which the <link - xlink:href="http://www.w3.org/TR/xsl/#fo_leader">fo:leader</link> - appears. 
This is frequently used to draw a border between - <code>xsl-region-body</code> and <code>xsl-region-before</code> - and/or <code>xsl-region-after</code>:</para> - - <figure xml:id="leaderSeparate"> - <title>A horizontal line separator between header and body of a - page.</title> - - <programlisting>... -<fo:page-sequence master-reference="simplePageLayout"> - <fo:static-content flow-name="xsl-region-before"> - <fo:block text-align-last='justify'>FO<fo:leader/>page 5</fo:block> - <fo:block text-align-last='justify'> - <fo:leader leader-pattern="rule" leader-length="100%"/> - </fo:block> - </fo:static-content> - <fo:flow flow-name="xsl-region-body"> - <fo:block>Some body text ...</fo:block> - </fo:flow> -</fo:page-sequence>...</programlisting> - - <mediaobject> - <imageobject> - <imagedata align="left" fileref="Ref/Fig/separate.fo.pdf"/> - </imageobject> - </mediaobject> - </figure> - - <para>Note the empty leader <code><</code> <link - xlink:href="http://www.w3.org/TR/xsl/#fo_leader">fo:leader</link> - <code>/></code> between the <quote> <code>FO</code> </quote> and - the <quote>page 5</quote> text nodes. It inserts horizontal whitespace - pushing the page number to the header's right edge. This is - in accordance with the <link - xlink:href="http://www.w3.org/TR/xsl/#leader-pattern">leader-pattern</link> - attribute's default value <code>space</code>.</para> - </section> - - <section xml:id="pageNumbering"> - <title>Page numbers</title> - - <para>We already saw an example of page numbering via <link - xlink:href="http://www.w3.org/TR/xsl/#fo_page-number">fo:page-number</link> - in <xref linkend="paramHeadFoot"/>. Sometimes a different style for - page numbering is desired. The default page numbering style may be - changed by means of the <link - xlink:href="http://www.w3.org/TR/xsl/#fo_page-sequence">fo:page-sequence</link> - element's attribute <link - xlink:href="http://www.w3.org/TR/xsl/#format">format</link>. 
For a - closer explanation the <link - xlink:href="http://www.w3.org/TR/2007/REC-xslt20-20070123/#convert">W3C - XSLT standards documentation</link> may be consulted:</para> - - <figure xml:id="pageNumberingRoman"> - <title>Roman style page numbers.</title> - - <programlisting>... -<fo:page-sequence format="i" - master-reference="simplePageLayout"> - <fo:static-content - flow-name="xsl-region-after"> - <fo:block text-align-last='justify'> - <fo:leader leader-pattern="rule" - leader-length="100%"/> - </fo:block> - <fo:block font-weight="bold"> - <fo:page-number/> - </fo:block> - </fo:static-content> - - <fo:flow flow-name="xsl-region-body"> - <fo:block>Some text...</fo:block> - <fo:block>More text, more text, - more text.</fo:block> - <fo:block>More text, more text, - more text.</fo:block> - <fo:block>Enough text.</fo:block> - </fo:flow> -</fo:page-sequence> ...</programlisting> - - <mediaobject> - <imageobject> - <imagedata align="left" fileref="Ref/Fig/pageStack.fig"/> - </imageobject> - </mediaobject> - </figure> - </section> - - <section xml:id="foMarker"> - <title>Marker</title> - - <figure xml:id="dictionary"> - <title>A dictionary with running page headers.</title> - - <programlisting>... 
-<fo:page-sequence - master-reference="simplePageLayout"> - <fo:static-content flow-name="xsl-region-before"> - <fo:block font-weight="bold"> - <fo:retrieve-marker retrieve-class-name="alpha" - retrieve-position="first-starting-within-page" - />-<fo:retrieve-marker - retrieve-position="last-starting-within-page" - retrieve-class-name="alpha"/> - </fo:block> - <fo:block text-align-last='justify'> - <fo:leader leader-pattern="rule" leader-length="100%"/></fo:block> - </fo:static-content> - - <fo:flow flow-name="xsl-region-body"> - <fo:block> - <fo:marker marker-class-name="alpha">A - </fo:marker>Ant</fo:block> - <fo:block> - <fo:marker marker-class-name="alpha">B - </fo:marker>Bug</fo:block> - <fo:block> - <fo:marker marker-class-name="alpha">L - </fo:marker>Lion</fo:block> - <fo:block> - <fo:marker marker-class-name="alpha">N - </fo:marker>Nose</fo:block> - <fo:block> - <fo:marker marker-class-name="alpha">P - </fo:marker>Peg</fo:block> - </fo:flow> -</fo:page-sequence> ...</programlisting> - - <mediaobject> - <imageobject> - <imagedata align="left" fileref="Ref/Fig/dictionaryStack.fig"/> - </imageobject> - </mediaobject> - </figure> - </section> - - <section xml:id="foIntRef"> - <title>Internal references</title> - - <titleabbrev>References</titleabbrev> - - <para>Regarding printed documents we may define two categories of - document internal references:</para> - - <variablelist> - <varlistentry> - <term><emphasis>Page number references</emphasis></term> - - <listitem> - <para>This is the <quote>classical</quote> type of a reference - e.g. in books. An author refers the reader to a distant - location by writing <quote>... see further explanation in - section 4.5 on page 234</quote>. A book's table of contents - assigning page numbers to topics is another example. 
This way - the implementation of a reference relies solely on the - features a printed document offers.</para> - </listitem> - </varlistentry> - - <varlistentry> - <term><emphasis>Hypertext references</emphasis></term> - - <listitem> - <para>This way of implementing references utilizes features of - (online) viewers for printable documents. For example PDF - viewers like <productname - xlink:href="http://www.adobe.com">Adobe's Acrobat - reader</productname> or the evince application are able to - follow hypertext links in a fashion known from HTML browsers. - This browser feature is based on hypertext capabilities - defined in the Adobe's PDF de-facto standard.</para> - </listitem> - </varlistentry> - </variablelist> - - <para>Of course the second type of references is limited to people - who use an online viewer application instead of reading a document - from physical paper.</para> - - <para>We now show the implementation of <abbrev - xlink:href="http://www.w3.org/TR/2001/REC-xsl-20011015/slice6.html#fo-section">FO</abbrev> - based page references. As already being discussed for <link - xlink:href="http://www.w3.org/TR/xml#id">ID</link> / <link - xlink:href="http://www.w3.org/TR/xml#idref">IDREF</link> pairs we - need a link destination (anchor) and a link source. The <abbrev - xlink:href="http://www.w3.org/TR/2001/REC-xsl-20011015/slice6.html#fo-section">FO</abbrev> - standard uses the same anchor implementation as in XML for <link - xlink:href="http://www.w3.org/TR/xml#id">ID</link> typed attributes: - <abbrev - xlink:href="http://www.w3.org/TR/2001/REC-xsl-20011015/slice6.html#fo-section">FO</abbrev> - objects <emphasis>may</emphasis> have an attribute <link - xlink:href="http://www.w3.org/TR/xsl/#id">id</link> with a document - wide unique value. 
The <abbrev - xlink:href="http://www.w3.org/TR/2001/REC-xsl-20011015/slice6.html#fo-section">FO</abbrev> - element <link - xlink:href="http://www.w3.org/TR/xsl/#fo_page-number-citation">fo:page-number-citation</link> - is used to actually create a page reference via its attribute <link - xlink:href="http://www.w3.org/TR/xsl/#ref-id">ref-id</link>:</para> - - <figure xml:id="refJavaXml"> - <title>Two blocks referencing each other's page numbers.</title> - - <programlisting>... - <fo:flow flow-name='xsl-region-body'> - <fo:block id='xml'>Java section see page - <fo:page-number-citation ref-id='java'/>. - </fo:block> - - <fo:block id='java'>XML section see page - <fo:page-number-citation ref-id='xml'/>. - </fo:block> - </fo:flow> ...</programlisting> - - <mediaobject> - <imageobject> - <imagedata align="left" fileref="Ref/Fig/pagerefStack.fig"/> - </imageobject> - </mediaobject> - </figure> - - <para>NB: Be careful when defining <link - xlink:href="http://www.w3.org/TR/xsl/#id">id</link> attributes for - objects being descendants of <link - xlink:href="http://www.w3.org/TR/xsl/#fo_static-content">fo:static-content</link> - nodes. Such objects typically appear on multiple pages and are - therefore no unique anchors. A reference carrying such an id value - thus actually refers to n >= 1 objects on n different pages. - Typically a user agent will choose the first object of this set when - the link is clicked. So in effect the parent <link - xlink:href="http://www.w3.org/TR/xsl/#fo_page-sequence">fo:page-sequence</link> - is chosen as the effective link target.</para> - - <para>The element <link - xlink:href="http://www.w3.org/TR/xsl/#fo_basic-link">fo:basic-link</link> - creates PDF hypertext links. 
We extend the previous example:</para> - - <figure xml:id="refJavaXmlHyper"> - <title>Two blocks with mutual page- and hypertext - references.</title> - - <programlisting><fo:flow flow-name='xsl-region-body'> - <fo:block id='xml'>Java section see <fo:basic-link color="blue" - internal-destination="java">page<fo:page-number-citation - ref-id='java'/>.</fo:basic-link></fo:block> - -<fo:block id='java'>XML section see - <fo:basic-link color="blue" - internal-destination="xml">page <fo:page-number-citation - ref-id='xml'/>.</fo:basic-link></fo:block > -</fo:flow></programlisting> - - <mediaobject> - <imageobject> - <imagedata fileref="Ref/Fig/pagerefhyperStack.fig"/> - </imageobject> - </mediaobject> - </figure> - </section> - - <section xml:id="pdfBookmarks"> - <title>PDF bookmarks</title> - - <titleabbrev>Bookmarks</titleabbrev> - - <para>The PDF specification allows to define so called bookmarks - offering an explorer like navigation:</para> - - <mediaobject> - <imageobject> - <imagedata fileref="Ref/Screen/pdfbookmarks.screen.png"/> - </imageobject> - </mediaobject> - - <para>PDF bookmarks are <link - xlink:href="http://www.w3.org/TR/2006/REC-xsl11-20061205/#d0e14206">part - of the XSL-FO 1.1</link> Standard. Some <abbrev - xlink:href="http://www.w3.org/TR/2001/REC-xsl-20011015/slice6.html#fo-section">FO</abbrev> - processors still continue to use proprietary solutions for bookmark - creation with respect to the older <abbrev - xlink:href="http://www.w3.org/TR/2001/REC-xsl-20011015/slice6.html#fo-section">FO</abbrev> - 1.0 standard. 
For details of bookmark extensions by - <orgname>RenderX</orgname>'s processor see <link - xlink:href="http://www.renderx.com/tutorial.html#PDF_Bookmarks">xep's - documentation</link>.</para> - </section> - </section> - - <section xml:id="xml2fo"> - <title>Constructing <abbrev - xlink:href="http://www.w3.org/TR/2001/REC-xsl-20011015/slice6.html#fo-section">FO</abbrev> - from XML documents</title> - - <titleabbrev><abbrev - xlink:href="http://www.w3.org/TR/2001/REC-xsl-20011015/slice6.html#fo-section">FO</abbrev> - from XML</titleabbrev> - - <para>So far we have learnt some basic <abbrev - xlink:href="http://www.w3.org/TR/2001/REC-xsl-20011015/slice6.html#fo-section">FO</abbrev> - elements. As with HTML we typically generate FO code from other - sources rather than crafting it by hand. The general picture - is:</para> - - <figure xml:id="htmlFoProduction"> - <title>Different target formats from common source.</title> - - <mediaobject> - <imageobject> - <imagedata fileref="Ref/Fig/crossmedia.fig"/> - </imageobject> - </mediaobject> - - <caption> - <para>We may generate both online and printed documentation from a - common source. This requires style sheets for the desired - destination formats in question.</para> - </caption> - </figure> - - <para>We discussed the <abbrev - xlink:href="http://www.w3.org/TR/2001/REC-xsl-20011015/slice6.html#fo-section">FO</abbrev> - standard as an input format for printable output production by a - renderer. In this way a <abbrev - xlink:href="http://www.w3.org/TR/2001/REC-xsl-20011015/slice6.html#fo-section">FO</abbrev> - document is similar to HTML being a format to be rendered by a web - browser for visual (screen oriented) output production. The - transformation from a XML source (e.g. a memo document) to <abbrev - xlink:href="http://www.w3.org/TR/2001/REC-xsl-20011015/slice6.html#fo-section">FO</abbrev> - is still missing. 
As for HTML we may use <abbrev - xlink:href="http://www.w3.org/Style/XSL">XSL</abbrev> as a - transformation means. We generate the sender's surname from a memo - document instance:</para> - - <figure xml:id="memo2fosurname"> - <title>Generating a sender's surname for printing.</title> - - <programlisting><?xml version="1.0" encoding="utf-8"?> -<xsl:stylesheet version="1.0" - xmlns:fo="http://www.w3.org/1999/XSL/Format" - xmlns:xsl="http://www.w3.org/1999/XSL/Transform"> - - <xsl:output method="xml" indent="yes"/> - - <xsl:template match="/"> - <fo:root> - <fo:layout-master-set> - <fo:simple-page-master master-name="simplePageLayout" - page-width="294mm" page-height="210mm" margin="5mm"> - <fo:region-body margin="15mm"/> - </fo:simple-page-master> - </fo:layout-master-set> - <fo:page-sequence master-reference="simplePageLayout"> - <fo:flow flow-name="xsl-region-body"> - <fo:block font-size="20pt"> - <xsl:text>Sender:</xsl:text> - <fo:inline font-weight='bold'> - <xsl:value-of select="memo/from/surname"/> - </fo:inline> - </fo:block> - </fo:flow> - </fo:page-sequence> - </fo:root> - </xsl:template> -</xsl:stylesheet></programlisting> - </figure> - - <para>A suitable XML document instance reads:</para> - - <figure xml:id="memoMessage"> - <title>A <code>memo</code> document instance.</title> - - <programlisting><?xml version="1.0" ?> -<!DOCTYPE memo SYSTEM "memo.dtd"> -<memo> - <from> - <name>Martin</name> - <surname>Goik</surname> - </from> - <to> - <name>Adam</name> - <surname>Hacker</surname> - </to> - <to> - <name>Eve</name> - <surname>Intruder</surname> - </to> - <date year="2005" month="1" day="6"/> - <subject>Firewall problems</subject> - <content> - <para>Thanks for your excellent work.</para> - <para>Our firewall is definitely broken!</para> - </content> -</memo></programlisting> - </figure> - - <para>Some remarks:</para> - - <orderedlist> - <listitem> - <para>The <link - 
xlink:href="http://www.w3.org/TR/2007/REC-xslt20-20070123/#element-stylesheet">xsl:stylesheet</link> - element contains a namespace definition for the target FO - document's namespace, namely:</para> - - <programlisting>xmlns:fo="http://www.w3.org/1999/XSL/Format"</programlisting> - - <para>This is required to use elements like <link - xlink:href="http://www.w3.org/TR/xsl/#fo_block">fo:block</link> - belonging to the FO namespace.</para> - </listitem> - - <listitem> - <para>The option value <code>indent="yes"</code> in <link - xlink:href="http://www.w3.org/TR/2007/REC-xslt20-20070123/#element-output">xsl:output</link> - is usually set to "no" in a production environment to avoid - whitespace related problems.</para> - </listitem> - - <listitem> - <para>The generation of a print format like PDF is actually a two - step process. To generate message.pdf from message.xml by a - stylesheet memo2fo.xsl we need the following calls:</para> - - <variablelist> - <varlistentry> - <term><emphasis>XML document instance to FO</emphasis></term> - - <listitem> - <programlisting>xml2xml message.xml memo2fo.xsl -o message.fo</programlisting> - </listitem> - </varlistentry> - - <varlistentry> - <term><emphasis>FO to PDF</emphasis></term> - - <listitem> - <programlisting>fo2pdf -fo message.fo -pdf message.pdf</programlisting> - </listitem> - </varlistentry> - </variablelist> - - <mediaobject> - <imageobject> - <imagedata fileref="Ref/Fig/xml2fo2pdf.fig"/> - </imageobject> - </mediaobject> - - <para>When debugging of the intermediate <abbrev - xlink:href="http://www.w3.org/TR/2001/REC-xsl-20011015/slice6.html#fo-section">FO</abbrev> - file is not required both steps may be combined into a single - call:</para> - - <programlisting>fo2pdf -xml message.xml -xsl memo2fo.xsl -pdf message.pdf</programlisting> - </listitem> - </orderedlist> - </section> - - <section xml:id="foCatalog"> - <title>Formatting a catalog.</title> - - <titleabbrev>A catalog</titleabbrev> - - <para>We now take 
the <link linkend="climbingCatalog">climbing catalog - example</link> with prices being added and incrementally create a - series of PDF versions improving from one version to another.</para> - - <qandaset role="exercise"> - <title>A first PDF version of the catalog</title> - - <qandadiv> - <qandaentry xml:id="idCatalogStart"> - <question> - <para>Write a <abbrev - xlink:href="http://www.w3.org/Style/XSL">XSL</abbrev> script - to generate a starting version <filename - xlink:href="Ref/src/Dom/climbenriched.start.pdf">climbenriched.start.pdf</filename>.</para> - </question> - - <answer> - <programlisting><?xml version="1.0" encoding="utf-8"?> -<xsl:stylesheet version="1.0" - xmlns:fo="http://www.w3.org/1999/XSL/Format" - xmlns:xsl="http://www.w3.org/1999/XSL/Transform"> - - <xsl:output method="xml" indent="yes"/> - - <xsl:template match="/"> - <fo:root font-size="10pt"> - <fo:layout-master-set> - <fo:simple-page-master master-name="productPage" - page-width="80mm" page-height="110mm" margin="5mm"> - <fo:region-body margin="15mm"/> - <fo:region-before extent="10mm"/> - </fo:simple-page-master> - </fo:layout-master-set> - <xsl:apply-templates select="catalog/product" /> - </fo:root> - </xsl:template> - - <xsl:template match="product"> - <fo:page-sequence master-reference="productPage"> - <fo:static-content flow-name="xsl-region-before"> - <fo:block font-weight="bold"> - <xsl:value-of select="title"/> - </fo:block> - </fo:static-content> - <fo:flow flow-name="xsl-region-body"> - <xsl:apply-templates select="description/para"/> - - <fo:block>Price:<xsl:value-of select="@price"/></fo:block> - <fo:block>Order no:<xsl:value-of select="@id"/></fo:block> - </fo:flow> - </fo:page-sequence> - </xsl:template> - - <xsl:template match="para"> - <fo:block space-after="10px"> - <xsl:value-of select="."/> - </fo:block> - </xsl:template> - -</xsl:stylesheet></programlisting> - </answer> - </qandaentry> - - <qandaentry xml:id="idCatalogProduct"> - <question> - <label>Header, page 
numbers and table formatting</label> - - <para>Extend <xref linkend="idCatalogStart"/> by adding page - numbers. The order number and prices shall be formatted as - tables. Add a ruler to each page's head. The result should - look like <filename - xlink:href="Ref/src/Dom/climbenriched.product.pdf">climbenriched.product.pdf</filename></para> - </question> - - <answer> - <para>Solution see <filename - xlink:href="Ref/src/Dom/catalog2fo.product.xsl">catalog2fo.product.xsl</filename>.</para> - </answer> - </qandaentry> - - <qandaentry xml:id="idCatalogToc"> - <question> - <label>A table of contents.</label> - - <para>Each product description's page number shall appear in a - table of contents together with the product's - <code>title</code> as in <filename - xlink:href="Ref/src/Dom/climbenriched.toc.pdf">climbenriched.toc.pdf</filename>.</para> - </question> - - <answer> - <para>Solution see <filename - xlink:href="Ref/src/Dom/catalog2fo.toc.xsl">catalog2fo.toc.xsl</filename>.</para> - </answer> - </qandaentry> - - <qandaentry xml:id="idCatalogToclink"> - <question> - <label>A table of contents with hypertext links.</label> - - <para>The table of contents' entries may offer hypertext - features to supporting browsers as in <filename - xlink:href="Ref/src/Dom/climbenriched.toclink.pdf">climbenriched.toclink.pdf</filename>. - In addition include the document's <tag - class="starttag">introduction</tag>.</para> - </question> - - <answer> - <para>Solution see <filename - xlink:href="Ref/src/Dom/catalog2fo.toclink.xsl">catalog2fo.toclink.xsl</filename>.</para> - </answer> - </qandaentry> - - <qandaentry xml:id="idCatalogFinal"> - <question> - <label>A final version.</label> - - <para>Add the following features:</para> - - <orderedlist> - <listitem> - <para>Number the table of contents starting with page i, - ii, iii, iv and so on. Start the product descriptions with - page 1. On each page's footer a text <quote>page xx of - yy</quote> shall be displayed. 
This requires the - definition of an anchor <code>id</code> on the <abbrev - xlink:href="http://www.w3.org/TR/2001/REC-xsl-20011015/slice6.html#fo-section">FO</abbrev> - document's last page.</para> - </listitem> - - <listitem> - <para>Add PDF bookmarks by using <orgname>XEP</orgname>'s - <abbrev - xlink:href="http://www.w3.org/TR/2001/REC-xsl-20011015/slice6.html#fo-section">FO</abbrev> - extensions. This requires the namespace declaration - <code>xmlns:rx="http://www.renderx.com/XSL/Extensions"</code> - in the XSLT script's header.</para> - </listitem> - </orderedlist> - - <para>The result may look like <filename - xlink:href="Ref/src/Dom/climbenriched.final.pdf">climbenriched.final.pdf</filename>. - N.B.: It may take some effort to achieve this result. This - effort is left to the <emphasis>interested</emphasis> - participants.</para> - </question> - - <answer> - <para>Solution see <filename - xlink:href="Ref/src/Dom/catalog2fo.toclink.xsl">catalog2fo.toclink.xsl</filename>.</para> - </answer> - </qandaentry> - </qandadiv> - </qandaset> - </section> - </chapter> - - <chapter xml:id="introPersistence"> - <title>Accessing Relational Data</title> - - <section xml:id="persistence"> - <title>Persistence in Object Oriented languages</title> - - <para>Following <xref linkend="Bauer05"/> we may define persistence - by:</para> - - <blockquote> - <para>persistence allows an object to outlive the process that - created it. The state of the object may be stored to disk and an - object with the same state re-created at some point in the - future.</para> - </blockquote> - - <para>The notion of <quote>process</quote> refers to operating - systems. Let us start with a simple example assuming a <trademark - xlink:href="http://www.oracle.com/us/technologies/java">Java</trademark> - class User:</para> - - <programlisting>public class User { - String cname; //The user's common name e.g. 'Joe Bix' - String uid; //The user's unique system ID (login name) e.g. 
'bix' - -// getters, setters and other stuff - ... -}</programlisting> - - <para>A relational implementation might look like:</para> - - <programlisting>CREATE TABLE User( - cname CHAR(80) - ,uid CHAR(10) PRIMARY KEY -)</programlisting> - - <para>Now a <trademark - xlink:href="http://www.oracle.com/us/technologies/java">Java</trademark> - application may create instances of class <code>User</code> and save - these to a database:</para> - - <figure xml:id="processObjPersist"> - <title>Persistence across process boundaries</title> - - <mediaobject> - <imageobject> - <imagedata fileref="Ref/Fig/persistence.fig"/> - </imageobject> - </mediaobject> - </figure> - - <para>Both the <trademark - xlink:href="http://www.oracle.com/technetwork/java/javase">JRE</trademark> - instances and the RDBMS database server are processes (or sets of - processes) typically existing in different address spaces. The two - <trademark - xlink:href="http://www.oracle.com/technetwork/java/javase">JRE</trademark> - processes mentioned here may as well be started in disjoint address - spaces.
In fact we might even run two entirely different applications - implemented in different programming languages like <abbrev - xlink:href="http://www.php.net">PHP</abbrev>.</para> - - <para>It is important to mention that the two arrows -  <quote>save</quote> and <quote>load</quote> thus typically denote a - communication across machine boundaries.</para> - </section> - - <section xml:id="jdbcIntro"> - <title>Introduction to <trademark - xlink:href="http://electronics.zibb.com/trademark/jdbc/29545026">JDBC</trademark></title> - - <section xml:id="jdbcWrite"> - <title>Write access, principles</title> - - <para>Connecting an application to a database means to establish a - connection from a client to a database server:</para> - - <figure xml:id="jdbcClientServer"> - <title>Networking between clients and database servers</title> - - <mediaobject> - <imageobject> - <imagedata fileref="Ref/Fig/clientserv.fig"/> - </imageobject> - </mediaobject> - </figure> - - <para>So <trademark - xlink:href="http://www.oracle.com/technetwork/java/javase/jdbc">JDBC</trademark> - is just one among a whole bunch of protocol implementations - connecting database servers and applications. Consequently - <trademark - xlink:href="http://www.oracle.com/technetwork/java/javase/jdbc">JDBC</trademark> - is expected to appear in the lower layer of multi-tier applications. - We take a three-tier application as a starting point:</para> - - <figure xml:id="jdbcThreeTier"> - <title>The role of <trademark - xlink:href="http://www.oracle.com/technetwork/java/javase/jdbc">JDBC</trademark> - in a three-tier application</title> - - <mediaobject> - <imageobject> - <imagedata fileref="Ref/Fig/jdbcThreeTier.fig"/> - </imageobject> - </mediaobject> - </figure> - - <para>We may add an additional layer. 
Web applications are typically - built on top of an application server (<productname - xlink:href="http://www.ibm.com/software/de/websphere/">WebSphere</productname>, - <productname - xlink:href="http://glassfish.java.net">Glassfish</productname>, - <productname - xlink:href="http://www.jboss.org/jbossas">Jboss</productname>,...) - providing additional services:</para> - - <figure xml:id="jdbcFourTier"> - <title><trademark - xlink:href="http://www.oracle.com/technetwork/java/javase/jdbc">JDBC</trademark> - connecting application server and database.</title> - - <mediaobject> - <imageobject> - <imagedata fileref="Ref/Fig/jdbcFourTier.fig"/> - </imageobject> - </mediaobject> - </figure> - - <para>So what is actually required to connect to a database server? - A client requires the following parameter values to open a - connection:</para> - - <orderedlist> - <listitem xml:id="ItemJdbcProtocol"> - <para>The type of database server e.g. <productname - xlink:href="http://www.oracle.com/us/products/database">Oracle</productname>, - <productname - xlink:href="www.ibm.com/software/data/db2">DB2</productname>, - <productname - xlink:href="http://www-01.ibm.com/software/data/informix">Informix</productname>, - <productname - xlink:href="http://www.mysql.com">Mysql</productname> etc. This - information is needed because of vendor dependent <trademark - xlink:href="http://electronics.zibb.com/trademark/jdbc/29545026">JDBC</trademark> - protocol implementations.</para> - </listitem> - - <listitem> - <para>The server's <link - xlink:href="http://en.wikipedia.org/wiki/Domain_Name_System">DNS</link> - name or IP address</para> - </listitem> - - <listitem> - <para>The database service's port number at the previously - defined host.
The database server process listens for - connections to this port number.</para> - </listitem> - - <listitem xml:id="itemJdbcDatabaseName"> - <para>The database name within the given database server</para> - </listitem> - - <listitem> - <para>Optional: A database user's account name and - password.</para> - </listitem> - </orderedlist> - - <para>Items <xref linkend="ItemJdbcProtocol"/> - <xref - linkend="itemJdbcDatabaseName"/> will be encapsulated into a so - called <trademark - xlink:href="http://electronics.zibb.com/trademark/jdbc/29545026">JDBC</trademark> - <link - xlink:href="http://en.wikipedia.org/wiki/Uniform_Resource_Locator">URL</link>. - We consider a typical example corresponding to the previous - parameter list:</para> - - <figure xml:id="jdbcUrlComponents"> - <title>Components of a <trademark - xlink:href="http://electronics.zibb.com/trademark/jdbc/29545026">JDBC</trademark> - URL</title> - - <mediaobject> - <imageobject> - <imagedata fileref="Ref/Fig/jdbcurl.fig"/> - </imageobject> - </mediaobject> - </figure> - - <para>In fact this <trademark - xlink:href="http://electronics.zibb.com/trademark/jdbc/29545026">JDBC</trademark> - URL example closely resembles other types of URL strings as being - defined in <uri - xlink:href="http://www.ietf.org/rfc/rfc2396.txt">http://www.ietf.org/rfc/rfc2396.txt</uri>. - Look for <code>opaque_part</code> to understand the second - <quote>:</quote> in the protocol definition part of a <trademark - xlink:href="http://electronics.zibb.com/trademark/jdbc/29545026">JDBC</trademark> - URL. 
Common examples of <abbrev - xlink:href="http://www.ietf.org/rfc/rfc1738.txt">URL</abbrev>s - are:</para> - - <itemizedlist> - <listitem> - <para><code>http://www.hdm-stuttgart.de/aaa</code></para> - </listitem> - - <listitem> - <para><code>http://someserver.com:8080/someResource</code></para> - </listitem> - - <listitem> - <para><code>ftp://mirror.mi.hdm-stuttgart.de/Firmen</code></para> - </listitem> - </itemizedlist> - - <para>We notice the explicit mention of port number 8080 in the - second example. The default <abbrev - xlink:href="http://www.w3.org/Protocols">http</abbrev> protocol port - number is 80. So if a web server accepts connections at port 80 we - do not have to specify this value. A web browser will automatically - use this default port.</para> - - <para>Actually the notion <quote><code>jdbc:mysql</code></quote> - denotes a sub protocol implementation, namely - <orgname>Mysql</orgname>'s implementation of <trademark - xlink:href="http://electronics.zibb.com/trademark/jdbc/29545026">JDBC</trademark>. - Connecting to an IBM DB2 server would require <code>jdbc:db2</code> for this - protocol part.</para> - - <para>In contrast to <abbrev - xlink:href="http://www.w3.org/Protocols">http</abbrev> no standard - ports are <quote>officially</quote> assigned for <trademark - xlink:href="http://electronics.zibb.com/trademark/jdbc/29545026">JDBC</trademark> - protocol variants. Due to vendor specific implementations this does - not make any sense.
Thus we <emphasis role="bold">always</emphasis> - have to specify the port number when opening <trademark - xlink:href="http://electronics.zibb.com/trademark/jdbc/29545026">JDBC</trademark> - connections.</para> - - <para>Writing <trademark - xlink:href="http://electronics.zibb.com/trademark/jdbc/29545026">JDBC</trademark> - based applications follows a simple scheme:</para> - - <figure xml:id="jdbcArchitecture"> - <title>Architecture of JDBC</title> - - <mediaobject> - <imageobject> - <imagedata fileref="Ref/Fig/jdbcarch.fig"/> - </imageobject> - </mediaobject> - </figure> - - <para>From a programmer's point of view the - <classname>java.sql.DriverManager</classname> is a bootstrapping - object: Other objects like <classname>java.sql.Statement</classname> - instances are created from this central and unique object.</para> - - <para>The first instance being created by the - <classname>java.sql.DriverManager</classname> is an object of type - <classname>java.sql.Connection</classname>. In <xref - linkend="exerciseJdbcWhyInterface"/> we discuss the way vendor - specific implementation details are hidden by interfaces. We can - distinguish between:</para> - - <orderedlist> - <listitem> - <para>Vendor neutral parts of a <trademark - xlink:href="http://www.oracle.com/technetwork/java/javase/jdbc">JDBC</trademark> - environment. These are the components shipped by Oracle - or other organizations providing <trademark - xlink:href="http://www.oracle.com/us/technologies/java">Java</trademark> - runtimes. The class - <classname>java.sql.DriverManager</classname> belongs to this - domain.</para> - </listitem> - - <listitem> - <para>Vendor specific parts.
In <xref - linkend="jdbcArchitecture"/> this starts with the - <classname>java.sql.Connection</classname> object.</para> - </listitem> - </orderedlist> - - <para>The <classname>java.sql.Connection</classname> object thus - marks the boundary between a <trademark - xlink:href="http://www.oracle.com/technetwork/java/javase">JDK</trademark> - / <trademark - xlink:href="http://www.oracle.com/technetwork/java/javase">JRE</trademark> - and a <trademark - xlink:href="http://www.oracle.com/technetwork/java/javase/jdbc">JDBC</trademark> - Driver implementation from e.g. Oracle or other institutions.</para> - - <para><xref linkend="jdbcArchitecture"/> does not show details about - the relations between <classname>java.sql.Connection</classname>, - <classname>java.sql.Statement</classname> and - <classname>java.sql.ResultSet</classname> objects. We start by - giving a rough description of the tasks and responsibilities these - three types have:</para> - - <glosslist> - <glossentry> - <glossterm><classname>java.sql.Connection</classname></glossterm> - - <glossdef> - <para>Holding a permanent connection to a database server. - Both client and server can contact each other. The database - server may for example terminate a transaction if problems - like deadlocks occur.</para> - </glossdef> - </glossentry> - - <glossentry> - <glossterm><classname>java.sql.Statement</classname></glossterm> - - <glossdef> - <para>We have two distinct classes of actions:</para> - - <orderedlist> - <listitem> - <para>Instructions to modify data on the database server. - These include <code>INSERT</code>, <code>UPDATE</code> and - <code>DELETE</code> operations as far as - <abbrev>SQL-DML</abbrev> is concerned. 
<trademark - xlink:href="http://www.oracle.com/technetwork/java/javase/jdbc">JDBC</trademark> - acts as a means of transport and merely returns integer - values back to the client like the number of rows being - affected by an UPDATE.</para> - </listitem> - - <listitem> - <para>Instructions reading data from the server. This is - done by sending SELECT statements. It is not sufficient to - just return integer values: Instead <trademark - xlink:href="http://www.oracle.com/technetwork/java/javase/jdbc">JDBC</trademark> - needs to copy complete datasets back to the client to fill - containers being accessible by applications. This is being - discussed in <xref linkend="jdbcRead"/>.</para> - </listitem> - </orderedlist> - </glossdef> - </glossentry> - </glosslist> - - <para>We shed some light on the relationship between these important - <trademark - xlink:href="http://www.oracle.com/technetwork/java/javase/jdbc">JDBC</trademark> - components and their respective creation:<figure - xml:id="jdbcObjectCreation"> - <title>Important <trademark - xlink:href="http://www.oracle.com/technetwork/java/javase/jdbc">JDBC</trademark> - instances and relationships.</title> - - <mediaobject> - <imageobject> - <imagedata fileref="Ref/Fig/jdbcObjectRelation.fig"/> - </imageobject> - </mediaobject> - </figure></para> - </section> - - <section xml:id="writeAccessCoding"> - <title>Write access, coding!</title> - - <para>So how does it actually work with respect to coding? You may - want to read <xref linkend="toolingConfigJdbc"/> before starting - your exercises. 
We first prepare a database table using Eclipse's - database tools:</para> - - <figure xml:id="figSchemaPerson"> - <title>A relation <code>Person</code> containing names and email - addresses</title> - - <programlisting><emphasis role="strong">CREATE</emphasis> <emphasis - role="strong">TABLE</emphasis> Person ( - name CHAR(20) - ,email CHAR(20) <emphasis>UNIQUE</emphasis>)</programlisting> - </figure> - - <para>Our actual (toy) <trademark - xlink:href="http://electronics.zibb.com/trademark/jdbc/29545026">JDBC</trademark> - application will insert a single object ('Jim', 'jim@foo.org') into - the <code>Person</code> relation. This is simpler than reading data - since no client <classname>java.sql.ResultSet</classname> container - is needed:</para> - - <figure xml:id="figJdbcSimpleWrite"> - <title>A simple <trademark - xlink:href="http://electronics.zibb.com/trademark/jdbc/29545026">JDBC</trademark> - application inserting data into a relational table.</title> - - <programlisting language="java">01 package sda.jdbc.intro.v1; -02 -03 import java.sql.Connection; -04 import java.sql.DriverManager; -05 import java.sql.SQLException; -06 import java.sql.Statement; -07 -08 public class SimpleInsert { -09 -10 public static void main(String[] args) throws SQLException { -11 // Step 1: Open a connection to the database server -12 final Connection conn = DriverManager.getConnection( -13 "jdbc:mysql://localhost:3306/hdm", "hdmuser", "XYZ"); -14 // Step 2: Create a Statement instance -15 final Statement stmt = conn.createStatement(); -16 // Step 3: Execute the desired INSERT -17 final int updateCount = stmt.executeUpdate( -18 "INSERT INTO Person VALUES('Jim', 'jim@foo.org')"); -19 // Step 4: Give feedback to the enduser -20 System.out.println("Successfully inserted " + updateCount + " dataset(s)"); -21 } -22 }</programlisting> - </figure> - - <para>Looks simple? 
Unfortunately it does not (yet) work:</para> - - <programlisting>Exception in thread "main" java.sql.SQLException: <emphasis - role="bold">No suitable driver found for jdbc:mysql://localhost:3306/hdm</emphasis> - at java.sql.DriverManager.getConnection(DriverManager.java:604) - at java.sql.DriverManager.getConnection(DriverManager.java:221) - at sda.jdbc.intro.SimpleInsert.main(SimpleInsert.java:12)</programlisting> - - <para>What's wrong here? In <xref linkend="figureConfigJdbcDriver"/> - we needed a <productname - xlink:href="http://www.mysql.com">Mysql</productname> <trademark - xlink:href="http://electronics.zibb.com/trademark/jdbc/29545026">JDBC</trademark> - Driver implementation <filename>mysql-connector-java.jar</filename> - as a prerequisite to open connections to a database server. This - implementation is mandatory for our toy application as well. All we - have to do is adding <filename>mysql-connector-java.jar</filename> - to our <trademark - xlink:href="http://www.oracle.com/us/technologies/java">Java</trademark> - <varname>CLASSPATH</varname> at <emphasis - role="bold">runtime</emphasis>.</para> - - <para>Depending on our <trademark - xlink:href="http://www.oracle.com/us/legal/third-party-trademarks/index.html">Java</trademark> - environment this will be achieved by different means. Eclipse - requires the definition of a run configuration as being described in - <uri - xlink:href="http://help.eclipse.org/juno/index.jsp?topic=/org.eclipse.jdt.doc.user/tasks/tasks-java-local-configuration.htm">http://help.eclipse.org/juno/index.jsp?topic=/org.eclipse.jdt.doc.user/tasks/tasks-java-local-configuration.htm</uri>. - When configuring a run-time configuration for - <classname>sda.jdbc.intro.SimpleInsert</classname> we have to add - <filename>mysql-connector-java.jar</filename> to the - <varname>Classpath</varname> tab. 
The following screen shot shows a - working configuration:</para> - - <figure xml:id="figureConfigRunExtJar"> - <title>Creating an Eclipse run time configuration containing a - <productname xlink:href="http://www.mysql.com">Mysql</productname> - <trademark - xlink:href="http://electronics.zibb.com/trademark/jdbc/29545026">JDBC</trademark> - Driver Jar marked red.</title> - - <screenshot> - <mediaobject> - <imageobject> - <imagedata fileref="Ref/Screen/runConfigJarAnnot.screen.png"/> - </imageobject> - </mediaobject> - </screenshot> - </figure> - - <para>This time execution works as expected:</para> - - <programlisting>Successfully inserted 1 dataset(s)</programlisting> - - <qandaset role="exercise"> - <qandadiv> - <qandaentry> - <question> - <para>A second invocation of - <classname>sda.jdbc.intro.v1.SimpleInsert</classname> yields - the following runtime error:</para> - - <programlisting>Exception in thread "main" com.mysql.jdbc.exceptions.jdbc4.MySQLIntegrityConstraintViolationException: - <emphasis role="bold">Duplicate entry 'jim@foo.org' for key 'email'</emphasis> -... - at com.mysql.jdbc.StatementImpl.executeUpdate(StatementImpl.java:1617) - at sda.jdbc.intro.SimpleInsert.main(SimpleInsert.java:17)</programlisting> - </question> - - <answer> - <para>This expected error is easy to understand: The - exception's message text <emphasis role="bold">Duplicate - entry 'jim@foo.org' for key 'email'</emphasis> informs us about a - UNIQUE key constraint violation with respect to the - attribute <code>email</code> in our schema definition in - <xref linkend="figSchemaPerson"/>. We cannot add a second - entry with the same value <code>'jim@foo.org'</code>.</para> - </answer> - </qandaentry> - </qandadiv> - </qandaset> - - <para>It is worth mentioning that the <productname - xlink:href="http://www.mysql.com">Mysql</productname> driver - implementation does not have to be available at compile time. JDBC - uses interfaces rather than concrete classes.
Only at runtime we do - need concrete classes.</para> - - <para>On the other hand when working with eclipse we need a separate - runtime configuration for each runnable <trademark - xlink:href="http://www.oracle.com/us/legal/third-party-trademarks/index.html">Java</trademark> - application. This becomes tedious after some time. So you may want - to follow the author and just add - <filename>mysql-connector-java.jar</filename> to your compile time - <envar>CLASSPATH</envar>.</para> - - <para>We now discuss some important methods being defined in the - relevant <trademark - xlink:href="http://electronics.zibb.com/trademark/jdbc/29545026">JDBC</trademark> - interfaces:</para> - - <glosslist> - <glossentry> - <glossterm><classname>java.sql.Connection</classname></glossterm> - - <glossdef> - <itemizedlist> - <listitem> - <para><link - xlink:href="http://docs.oracle.com/javase/7/docs/api/java/sql/Connection.html#createStatement()">createStatement()</link></para> - </listitem> - - <listitem> - <para><link - xlink:href="http://docs.oracle.com/javase/7/docs/api/java/sql/Connection.html#setAutoCommit(boolean)">setAutoCommit()</link>, - <link - xlink:href="http://docs.oracle.com/javase/7/docs/api/java/sql/Connection.html#getAutoCommit()">getAutoCommit()</link></para> - </listitem> - - <listitem> - <para><link - xlink:href="http://docs.oracle.com/javase/7/docs/api/java/sql/Connection.html#getWarnings()">getWarnings()</link></para> - </listitem> - - <listitem> - <para><link - xlink:href="http://docs.oracle.com/javase/7/docs/api/java/sql/Connection.html#isClosed()">isClosed()</link>, - <link - xlink:href="http://docs.oracle.com/javase/7/docs/api/java/sql/Connection.html#isValid(int)">isValid(int - timeout)</link></para> - </listitem> - - <listitem> - <para><link - xlink:href="http://docs.oracle.com/javase/7/docs/api/java/sql/Connection.html#rollback()">rollback()</link>, - <link - 
xlink:href="http://docs.oracle.com/javase/7/docs/api/java/sql/Connection.html#commit()">commit()</link></para> - </listitem> - - <listitem> - <para><link - xlink:href="http://docs.oracle.com/javase/7/docs/api/java/sql/Connection.html#close()">close()</link></para> - </listitem> - </itemizedlist> - </glossdef> - </glossentry> - - <glossentry> - <glossterm><classname>java.sql.Statement</classname></glossterm> - - <glossdef> - <itemizedlist> - <listitem> - <para><link - xlink:href="http://docs.oracle.com/javase/7/docs/api/java/sql/Statement.html#executeUpdate(java.lang.String)">executeUpdate(String - sql)</link></para> - </listitem> - - <listitem> - <para><link - xlink:href="http://docs.oracle.com/javase/7/docs/api/java/sql/Statement.html#getConnection()">getConnection()</link></para> - </listitem> - - <listitem> - <para><link - xlink:href="http://docs.oracle.com/javase/7/docs/api/java/sql/Statement.html#getResultSet()">getResultSet()</link></para> - </listitem> - - <listitem> - <para><link - xlink:href="http://docs.oracle.com/javase/7/docs/api/java/sql/Statement.html#close()">close()</link> - and <link - xlink:href="http://docs.oracle.com/javase/7/docs/api/java/sql/Statement.html#isClosed()">isClosed()</link></para> - </listitem> - </itemizedlist> - </glossdef> - </glossentry> - </glosslist> - - <qandaset role="exercise"> - <title><trademark - xlink:href="http://electronics.zibb.com/trademark/jdbc/29545026">JDBC</trademark> - and transactions</title> - - <qandadiv> - <qandaentry> - <question> - <para>How does the method <link - xlink:href="http://docs.oracle.com/javase/7/docs/api/java/sql/Connection.html#setAutoCommit(boolean)">setAutoCommit()</link> - relate to <link - xlink:href="http://docs.oracle.com/javase/7/docs/api/java/sql/Connection.html#commit()">commit()</link> - and <link - xlink:href="http://docs.oracle.com/javase/7/docs/api/java/sql/Connection.html#rollback()">rollback()</link>?</para> - </question> - - <answer> - <para>A connection's default
state is <code>autocommit == - true</code>. This means that individual SQL statements are - executed as separate transactions.</para> - - <para>If we want to group two or more statements into a - transaction we have to:</para> - - <orderedlist> - <listitem> - <para>Call - <code>connection.setAutoCommit(false)</code></para> - </listitem> - - <listitem> - <para>From now on subsequent SQL statements will - implicitly become part of a transaction until one of - the following three events happens:</para> - - <orderedlist numeration="loweralpha"> - <listitem> - <para><code>connection.commit()</code></para> - </listitem> - - <listitem> - <para><code>connection.rollback()</code></para> - </listitem> - - <listitem> - <para>The transaction gets aborted by the database - server. This may for example happen in case of a - deadlock conflict with a second transaction.</para> - </listitem> - </orderedlist> - - <para>Note that the first two events are initiated by - our client software. The third is - carried out by the database server.</para> - </listitem> - </orderedlist> - </answer> - </qandaentry> - </qandadiv> - </qandaset> - - <qandaset role="exercise"> - <qandadiv> - <qandaentry> - <question> - <para>Why is it very important to call the <code>close()</code> method - for <classname>java.sql.Connection</classname> and / or - <classname>java.sql.Statement</classname> instances?</para> - </question> - - <answer> - <para>A <trademark - xlink:href="http://electronics.zibb.com/trademark/jdbc/29545026">JDBC</trademark> - connection ties up network resources (socket connections). - These may be used up if e.g.
new connections get established - within a loop without being closed.</para> - - <para>The situation is comparable to memory leaks when using - programming languages lacking a garbage collector.</para> - </answer> - </qandaentry> - </qandadiv> - </qandaset> - - <qandaset role="exercise"> - <title>Aborted transactions</title> - - <qandadiv> - <qandaentry> - <question> - <para>In a previous exercise we mentioned the possibility - of a transaction abort issued by the database server. Which - responsibility arises for an application programmer? Hint: - How may an implementation become aware of such a - transaction abort event?</para> - </question> - - <answer> - <para>If a database server aborts a transaction a - <classname>java.sql.SQLException</classname> will be thrown. - An application must be aware of this possibility and thus - implement a sensible <code>catch(...)</code> clause - accordingly.</para> - </answer> - </qandaentry> - </qandadiv> - </qandaset> - - <qandaset role="exercise"> - <title>Interfaces and classes in <trademark - xlink:href="http://electronics.zibb.com/trademark/jdbc/29545026">JDBC</trademark></title> - - <qandadiv> - <qandaentry xml:id="exerciseJdbcWhyInterface"> - <question> - <para>The <trademark - xlink:href="http://electronics.zibb.com/trademark/jdbc/29545026">JDBC</trademark> - standard mostly defines interfaces such as - <classname>java.sql.Connection</classname> and - <classname>java.sql.Statement</classname>. Why are these not - defined as classes? Moreover why is - <classname>java.sql.DriverManager</classname> defined - as a class rather than an interface?</para> - - <para>You may want to supply code examples to support your - argument.</para> - </question> - - <answer> - <para>Figure <xref linkend="jdbcArchitecture"/> tells us - about the vendor independent architecture of <trademark - xlink:href="http://electronics.zibb.com/trademark/jdbc/29545026">JDBC</trademark>.
- Oracle for example may implement a class - <code>com.oracle.jdbc.OracleConnection</code>:</para> - - <programlisting annotations="nojavadoc">package com.oracle.jdbc; - -import java.sql.Connection; -import java.sql.Statement; -import java.sql.SQLException; - -public class OracleConnection implements Connection { - -... - -public Statement createStatement(int resultSetType, - int resultSetConcurrency) - throws SQLException { - // Implementation omitted here due to - // limited personal hacking capabilities - ... -} -... -}</programlisting> - - <para>If a programmer only uses the <trademark - xlink:href="http://electronics.zibb.com/trademark/jdbc/29545026">JDBC</trademark> - interfaces rather than a vendor's classes it is much easier - to make the resulting application work with different - databases from other vendors. This way a company's - implementation is not exposed to our own <trademark - xlink:href="http://www.oracle.com/us/legal/third-party-trademarks/index.html">Java</trademark> - code.</para> - - <para>Regarding the special role of - <classname>java.sql.DriverManager</classname> we notice the - need for a starting point: We have to create an initial - instance of some class. In theory (<emphasis role="bold">BUT - NOT IN PRACTICE!!!</emphasis>) the following (ugly) code - might be possible:</para> - - <programlisting>package my.personal.application; - -import java.sql.Connection; -import java.sql.Statement; -import java.sql.SQLException; - -public class SomeClass { - - public void someMethod(){ - - Connection conn = <emphasis role="bold">new OracleConnection()</emphasis>; // bad idea! - ... - } - ...
-}</programlisting> - - <para>The problem with this approach is the explicit - constructor call: Whenever we want to use another database - we have two possibilities:</para> - - <itemizedlist> - <listitem> - <para>Rewrite our code.</para> - </listitem> - - <listitem> - <para>Introduce some sort of switch statement to provide - a fixed number of databases beforehand:</para> - - <programlisting>public void someMethod(final String vendor){ - - final Connection conn; - - switch(vendor) { - case "ORACLE": - conn = new OracleConnection(); - break; - - case "DB2": - conn = new Db2Connection(); - break; - - default: - conn = null; - break; - } - ... -}</programlisting> - - <para>Adding a new database still requires code - rewriting.</para> - </listitem> - </itemizedlist> - </answer> - </qandaentry> - </qandadiv> - </qandaset> - - <qandaset role="exercise"> - <title>Driver dispatch mechanism</title> - - <qandadiv> - <qandaentry> - <question> - <para>In exercise <xref linkend="exerciseJdbcWhyInterface"/> - we saw a hypothetic way to resolve the interface/class - resolution problem by using a switch clause. How is this - <code>switch</code> clause's logic actually realized in a - <trademark - xlink:href="http://electronics.zibb.com/trademark/jdbc/29545026">JDBC</trademark> - based application? (<quote>behind the scenes</quote>)</para> - - <para>Hint: Read the documentation of - <classname>java.sql.DriverManager</classname>.</para> - </question> - - <answer> - <para>Prior to opening a Connection a <trademark - xlink:href="http://electronics.zibb.com/trademark/jdbc/29545026">JDBC</trademark> - driver registers itself at the - <classname>java.sql.DriverManager</classname> singleton - instance. For this purpose the standard defined the method - <link - xlink:href="http://docs.oracle.com/javase/7/docs/api/java/sql/DriverManager.html#registerDriver(java.sql.Driver)">registerDriver(Driver)</link>. 
- On success the <classname>java.sql.DriverManager</classname> - adds the driver to an internal dictionary:</para> - - <informaltable border="1"> - <col width="20%"/> - - <col width="30%"/> - - <tr> - <th>protocol</th> - - <th>driver instance</th> - </tr> - - <tr> - <td>jdbc:mysql</td> - - <td>mysqlDriver instance</td> - </tr> - - <tr> - <td>jdbc:oracle</td> - - <td>oracleDriver instance</td> - </tr> - - <tr> - <td>...</td> - - <td>...</td> - </tr> - </informaltable> - - <para>So whenever the method <link - xlink:href="http://docs.oracle.com/javase/7/docs/api/java/sql/DriverManager.html#getConnection(java.lang.String,%20java.lang.String,%20java.lang.String)">getConnection()</link> - is being called the - <classname>java.sql.DriverManager</classname> will scan the - <trademark - xlink:href="http://electronics.zibb.com/trademark/jdbc/29545026">JDBC</trademark> - URL and isolate the protocol part. If we start with - <code>jdbc:mysql://someserver.com:3306/someDatabase</code> - this is just <code>jdbc:mysql</code>. The value is then - being looked up in the above table of registered drivers to - choose an appropriate instance or null otherwise. This way - our hypothetic switch including the default value null is - actually implemented.</para> - </answer> - </qandaentry> - </qandadiv> - </qandaset> - </section> - - <section xml:id="propertiesFile"> - <title>Connection properties</title> - - <para>So far our application depicted in <xref - linkend="figJdbcSimpleWrite"/> suffers both from missing error - handling and hard-coded parameters.</para> - - <para>Professional applications must be configurable. Changing the - password currently requires source code modification and - recompilation. 
<trademark - xlink:href="http://www.oracle.com/us/legal/third-party-trademarks/index.html">Java</trademark> - offers a standard procedure to externalize parameters like - <varname>username</varname>, <varname>password</varname> and the - <trademark - xlink:href="http://electronics.zibb.com/trademark/jdbc/29545026">JDBC</trademark> - connection URL present in <xref - linkend="figJdbcSimpleWrite"/>: We may move these parameters - to so-called properties files:</para> - - <figure xml:id="propertyExternalization"> - <title>Externalize a single string <code>"User name"</code> to a - separate file <filename>message.properties</filename>.</title> - - <mediaobject> - <imageobject> - <imagedata fileref="Ref/Fig/externalize.fig"/> - </imageobject> - </mediaobject> - </figure> - - <para>The current figure shows the externalization of just a single - property. The file <filename>message.properties</filename> contains - key-value pairs. The key <code>PropHello.uname</code> maps to the - value <code>User name</code>. Multiple strings may be externalized - to the same properties file.</para> - - <para>Eclipse offers tool support for externalization. Simply hit - Source --> Externalize Strings from the context menu. This - activates a wizard to define property keys, rename the generated - helper class and finally create the actual - <filename>message.properties</filename> file.</para> - - <qandaset role="exercise"> - <title>Moving <trademark - xlink:href="http://electronics.zibb.com/trademark/jdbc/29545026">JDBC</trademark> - <abbrev - xlink:href="http://www.ietf.org/rfc/rfc1738.txt">URL</abbrev> and - credentials to a property file</title> - - <qandadiv> - <qandaentry> - <question> - <para>Start by executing the code given in <xref - linkend="figJdbcSimpleWrite"/>.
Then extend this example by - externalizing all <trademark - xlink:href="http://electronics.zibb.com/trademark/jdbc/29545026">JDBC</trademark> - related connection parameters to a - <filename>jdbc.properties</filename> file like:</para> - - <programlisting>SimpleInsert.jdbcUrl=jdbc:mysql://localhost:3306/hdm -SimpleInsert.password=XYZ -SimpleInsert.username=hdmuser</programlisting> - - <para>As stated earlier the Eclipse wizard assists you - by generating both the properties file and a helper class - reading that file at runtime.</para> - </question> - - <answer> - <para>The current exercise is mostly related to tooling. - From our <trademark - xlink:href="http://www.oracle.com/us/legal/third-party-trademarks/index.html">Java</trademark> - code the context menu allows us to choose the desired - wizard:</para> - - <informalfigure> - <mediaobject> - <imageobject> - <imagedata fileref="Ref/Screen/externalize.screen.png"/> - </imageobject> - </mediaobject> - </informalfigure> - - <para>We may now:</para> - - <itemizedlist> - <listitem> - <para>Select the strings to be externalized.</para> - </listitem> - - <listitem> - <para>Supply key names. In the subsequent screenshot - this task has already been started by manually replacing - the default <code>SimpleInsert.1</code> by - <code>SimpleInsert.jdbc</code>.</para> - </listitem> - - <listitem> - <para>Redefine other parameters like prefix, properties - file name etc. 
In the following screenshot only the - first of three keys has been manually renamed to the - sensible value - <varname>SimpleInsert.jdbc</varname>.</para> - </listitem> - </itemizedlist> - - <informalfigure> - <mediaobject> - <imageobject> - <imagedata fileref="Ref/Screen/externalize2.screen.png"/> - </imageobject> - </mediaobject> - </informalfigure> - - <para>The wizard also generates a class - <classname>sda.jdbc.intro.v1.DbProps</classname> to actually - access our properties:</para> - - <programlisting language="java">package sda.jdbc.intro.v1; -... -public class DbProps { - private static final String BUNDLE_NAME = "sda.jdbc.intro.v1.database"; - - private static final ResourceBundle RESOURCE_BUNDLE = ResourceBundle - .getBundle(BUNDLE_NAME); - - private DbProps() { - } - - public static String getString(String key) { - try { - return RESOURCE_BUNDLE.getString(key); - } catch (MissingResourceException e) { - return '!' + key + '!'; - } - } -}</programlisting> - - <para>Our <trademark - xlink:href="http://electronics.zibb.com/trademark/jdbc/29545026">JDBC</trademark> - related code now contains three references to external - properties:</para> - - <programlisting language="java">package sda.jdbc.intro.v1; -... 
-public class SimpleInsert { - - - public static void main(String[] args) throws SQLException { - // Step 1: Open a connection to the database server - final Connection conn = DriverManager.getConnection ( - <emphasis role="bold">DbProps.getString("PersistenceHandler.jdbcUrl"), </emphasis> - <emphasis role="bold">DbProps.getString("PersistenceHandler.username")</emphasis>, - <emphasis role="bold">DbProps.getString("PersistenceHandler.password")</emphasis>); - // Step 2: Create a Statement instance - final Statement stmt = conn.createStatement(); - // Step 3: Execute the desired INSERT - final int updateCount = stmt.executeUpdate( - "INSERT INTO Person VALUES('Jim', 'jim@foo.org')"); - // Step 4: Give feedback to the end user - System.out.println("Successfully inserted " + updateCount + " dataset(s)"); - } -}</programlisting> - - <para>The current base name - <classname>sda.jdbc.intro.v1.PersistenceHandler</classname> - is related to a later exercise.</para> - </answer> - </qandaentry> - </qandadiv> - </qandaset> - </section> - - <section xml:id="sectSimpleInsertGui"> - <title>A first GUI sketch</title> - - <para>So far all data records transferred to the database - server are still hard-coded in our application. In practice a user - wants to enter personal data to be submitted to the - database.</para> - - <para>We now guide you to develop a first version of a simple GUI - for this task. A more <link linkend="figureDataInsert2">elaborate - version</link> will be presented in a follow-up exercise. The - screenshot illustrates the intended application behaviour:</para> - - <figure xml:id="simpleInsertGui"> - <title>A simple GUI to insert data into a database server.</title> - - <mediaobject> - <imageobject> - <imagedata fileref="Ref/Screen/simpleInsertGui.screen.png"/> - </imageobject> - </mediaobject> - - <caption> - <para>After clicking <quote>Insert</quote> a message is - presented to the user. 
This message may also indicate a - failure.</para> - </caption> - </figure> - - <para>Implementing Swing GUI applications requires knowledge as - taught in e.g. <link - xlink:href="http://www.hdm-stuttgart.de/studenten/stundenplan/vorlesungsverzeichnis/vorlesung_detail?vorlid=5212221">113300 - Entwicklung von Web-Anwendungen</link>. If you do not (yet) feel - comfortable writing <productname - xlink:href="http://docs.oracle.com/javase/tutorial/uiswing/index.html">Swing</productname> - applications you may want to read <uri - xlink:href="http://www.javamex.com/tutorials/swing">http://www.javamex.com/tutorials/swing</uri> - and <emphasis role="bold">really</emphasis> understand the examples - presented therein.</para> - - <qandaset role="exercise"> - <title>GUI for inserting Person data to a database server</title> - - <qandadiv> - <qandaentry> - <question> - <para>Write a GUI application as outlined in <xref - linkend="simpleInsertGui"/>. You may proceed as - follows:</para> - - <orderedlist> - <listitem> - <para>Write a dummy GUI without any database - functionality. Only present the two labels and input - fields and the Insert button.</para> - </listitem> - - <listitem> - <para>Add a - <classname>java.awt.event.ActionListener</classname> - which generates a SQL INSERT statement when clicking the - Insert button. Return this string to the user as - shown in the message window of <xref - linkend="simpleInsertGui"/>.</para> - - <para>At this point you still do not need a database - connection. The message shown to the user is just a - fake, so the GUI <emphasis - role="bold">appears</emphasis> to be working.</para> - </listitem> - - <listitem> - <para>Establish a - <classname>java.sql.Connection</classname> and create a - <classname>java.sql.Statement</classname> instance when - launching your application. 
Use the latter in your - <classname>java.awt.event.ActionListener</classname> to - actually insert datasets into your database.</para> - </listitem> - </orderedlist> - </question> - - <answer> - <para>The complete implementation resides in - <classname>sda.jdbc.intro.v01.InsertPerson</classname>:</para> - - <programlisting language="java">package sda.jdbc.intro.v01; - -import ... - -public class InsertPerson extends JFrame { - - ... - - public InsertPerson () throws SQLException{ - super ("Add a person's data"); - - setDefaultCloseOperation(JFrame.EXIT_ON_CLOSE); - - final JPanel databaseFieldPanel = new JPanel(); - databaseFieldPanel.setLayout(new GridLayout(0,2)); - add(databaseFieldPanel, BorderLayout.CENTER); - - databaseFieldPanel.add(new JLabel("Name:")); - final JTextField nameField = new JTextField(15); - databaseFieldPanel.add(nameField); - - databaseFieldPanel.add(new JLabel("E-mail:")); - final JTextField emailField = new JTextField(15); - databaseFieldPanel.add(emailField); - - final JButton insertButton = new JButton("Insert"); - add(insertButton, BorderLayout.SOUTH); - - final Connection conn = DriverManager.getConnection( - "jdbc:mysql://localhost:3306/hdm", "hdmuser", "XYZ"); - final Statement stmt = conn.createStatement(); - - insertButton.addActionListener(new ActionListener() { - // Linking the GUI to the database server. We assume an open - // connection and a correctly initialized Statement instance - @Override - public void actionPerformed(ActionEvent event) { - final String sql = "INSERT INTO Person VALUES('" + nameField.getText()+ "', '" - + emailField.getText() + "')"; - // We have to catch this Exception because an ActionListener's signature - // prohibits the existence of a "throws" clause. 
- try { - final int updateCount = stmt.executeUpdate(sql); - JOptionPane.showMessageDialog(null, "Successfully executed \n'" + sql + "'\nand inserted " - + updateCount + " dataset"); - } catch (SQLException e) { - e.printStackTrace(); - } - } - }); - pack(); - } -}</programlisting> - </answer> - </qandaentry> - </qandadiv> - </qandaset> - </section> - - <section xml:id="jdbcExceptions"> - <title>Handling possible exceptions</title> - - <para>Our current code lacks any kind of error handling: Exceptions - will not be caught at all and invariably lead to program - termination. This is of course inadequate regarding professional - software. In case of problems we have to:</para> - - <itemizedlist> - <listitem> - <para>Gracefully recover or shut down our application. We may - for example show a pop up window <quote>Terminating due to an - internal error</quote>.</para> - </listitem> - - <listitem> - <para>Enable the customer to supply the development team with - helpful information. The user may for example be asked to submit - a log file in case of errors.</para> - </listitem> - </itemizedlist> - - <para>In addition the solution - <classname>sda.jdbc.intro.v01.InsertPerson</classname> contains an - ugly mix of GUI components and database related code. We take a - first step to decouple these two distinct concerns:</para> - - <qandaset role="exercise" xml:id="exercicseGuiStateful"> - <title>Handling the database layer</title> - - <qandadiv> - <qandaentry> - <question> - <para>Implement a class <code>PersistenceHandler</code> to - be later used as a component of our next step GUI - application prototype. This class should have the following - methods:</para> - - <programlisting language="java">... -/** - * Handle database communication. There are two - * distinct internal states <q>disconnected</q> and <q>connected</q>, see - * {@link #isConnected()}. These two states may be toggled by invoking - * {@link #connect()} and {@link #disconnect()} respectively. 
- * - * The following snippet illustrates the intended usage: - * <pre> public static void main(String[] args) { - final PersistenceHandler ph = new PersistenceHandler(); - if (ph.connect()) { - if (!ph.add("Jim", "jim@foo.com")) { - System.err.println("Insert Error:" + ph.getErrorMessage()); - } - } else { - System.err.println("Connect error:" + ph.getErrorMessage()); - } - }</pre> - * - * @author goik - */ -public class PersistenceHandler { - ... - /** - * Instance in <q>disconnected</q> state. See {@link #isConnected()} - */ - public PersistenceHandler() {/* only present here to supply Javadoc comment */} - - /** - * Inserting a (name, email) record into the database server. In case of - * errors corresponding messages may subsequently be retrieved by calling - * {@link #getErrorMessage()}. - * - * <dt><b>Precondition:</b></dt> <dd>must be in - * <q>connected</q> state, see {@link #isConnected()}</dd> - * - * @param name - * A person's name - * @param email - * A person's email address - * - * @return true if the current data record has been successfully inserted - * into the database server. false in case of error(s). - */ - public boolean add(final String name, final String email){ - ... - } - - /** - * Retrieving error messages in case a call to {@link #add(String, String)}, - * {@link #connect()}, or {@link #disconnect()} yields an error. - * - * @return the error explanation corresponding to the latest failed - * operation, null if no error yet occurred. - */ - public String getErrorMessage() { - return ...; - } - - /** - * Open a connection to a database server. 
- * - * <dt><b>Precondition:</b></dt> - * <dd>must be in <q>disconnected</q> state, see {@link #isConnected()}</dd> - * - * <dt><b>Precondition:</b></dt> - * <dd>The following properties must be set: - * <pre>PersistenceHandler.jdbcUrl=jdbc:mysql://localhost:3306/hdm -PersistenceHandler.password=XYZ -PersistenceHandler.username=foo</pre> - * </dd> - * - * @return true if connecting was successful - */ - public boolean connect () { - ... - } - - /** - * Close a connection to a database server and clean up JDBC related resources - * - * Error messages in case of failure may subsequently be retrieved by - * calling {@link #getErrorMessage()}. - * - * <dt><b>Precondition:</b></dt> - * <dd>must be in <q>connected</q> state, see {@link #isConnected()}</dd> - * - * @return true if disconnecting was successful, false in case error(s) occur. - */ - public boolean disconnect() { - ... - } - - /** - * An instance can either be in <q>connected</q> or <q>disconnected</q> state. The - * state can be toggled by invoking {@link #connect()} or - * {@link #disconnect()} respectively. - * - * @return true if connected, false otherwise - */ - public boolean isConnected() { - return ...; - } -}</programlisting> - - <para>Notice the two internal states - <quote>disconnected</quote> and - <quote>connected</quote>:</para> - - <figure xml:id="figPersistenceHandlerStates"> - <title>Possible states and transitions for instances of - <code>PersistenceHandler</code>.</title> - - <mediaobject> - <imageobject> - <imagedata fileref="Ref/Fig/persistHandlerStates.fig"/> - </imageobject> - </mediaobject> - </figure> - - <para>According to the above documentation a newly created - <code>PersistenceHandler</code> instance should be in - disconnected state. As shown in the <trademark - xlink:href="http://docs.oracle.com/javase/1.5.0/docs/guide/javadoc">Javadoc</trademark> - class description you may test your implementation without - any GUI code. 
If you are already familiar with unit testing - this might be a good start as well.</para> - </question> - - <answer> - <para>We show a possible implementation of - <classname>sda.jdbc.intro.v1.PersistenceHandler</classname>:</para> - - <programlisting language="java">package sda.jdbc.intro.v1; -... - -public class PersistenceHandler { - - Connection conn = null; - Statement stmt = null; - - String errorMessage = null; - - /** - * New instances are in <q>disconnected</q> state. See {@link #isConnected()} - */ - public PersistenceHandler() {/* only present here to supply Javadoc comment */} - - /** - * Inserting a (name, email) record into the database server. In case of - * errors corresponding messages may subsequently be retrieved by calling - * {@link #getErrorMessage()}. - * - * <dt><b>Precondition:</b></dt> <dd>must be in - * <q>connected</q> state, see {@link #isConnected()}</dd> - * - * @param name - * A person's name - * @param email - * A person's email address - * - * @return true if the current data record has been successfully inserted - * into the database server. false in case of error(s). - */ - public boolean add(final String name, final String email){ - final String sql = "INSERT INTO Person VALUES('" + name + "', '" + - email + "')"; - try { - stmt.executeUpdate(sql); - return true; - } catch (SQLException e) { - errorMessage = "Unable to execute '" + sql + "': '" + e.getMessage() + "'"; - return false; - } - } - - /** - * Retrieving error messages in case a call to {@link #add(String, String)}, - * {@link #connect()}, or {@link #disconnect()} yields an error. - * - * @return the error explanation corresponding to the latest failed - * operation, null if no error yet occurred. - */ - public String getErrorMessage() { - return errorMessage; - } - - /** - * Open a connection to a database server. 
- * - * <dt><b>Precondition:</b></dt> - * <dd>must be in <q>disconnected</q> state, see {@link #isConnected()}</dd> - * - * <dt><b>Precondition:</b></dt> - * <dd>The following properties must be set: - * <pre>PersistenceHandler.jdbcUrl=jdbc:mysql://localhost:3306/hdm -PersistenceHandler.password=XYZ -PersistenceHandler.username=foo</pre> - * </dd> - * - * @return true if connecting was successful - */ - public boolean connect () { - try { - conn = DriverManager.getConnection( - DbProps.getString("PersistenceHandler.jdbcUrl"), - DbProps.getString("PersistenceHandler.username"), - DbProps.getString("PersistenceHandler.password")); - try { - stmt = conn.createStatement(); - return true; - } catch (SQLException e) { - errorMessage = "Connection opened but Statement creation failed:\"" + e.getMessage() + "\"."; - try { - conn.close(); - } catch (SQLException ee) { - // Report the exception raised by close(), not the outer one - errorMessage += "Closing connection failed:\"" + ee.getMessage() + "\"."; - } - conn = null; - } - - } catch (SQLException e) { - errorMessage = "Unable to open connection:\"" + e.getMessage() + "\"."; - } - return false; - } - - /** - * Close a connection to a database server and clean up JDBC related resources - * - * Error messages in case of failure may subsequently be retrieved by - * calling {@link #getErrorMessage()}. - * - * <dt><b>Precondition:</b></dt> - * <dd>must be in <q>connected</q> state, see {@link #isConnected()}</dd> - * - * @return true if disconnecting was successful, false in case error(s) occur. 
- */ - public boolean disconnect() { - boolean resultStatus = true; - final StringBuffer messageCollector = new StringBuffer(); - try { - stmt.close(); - } catch (SQLException e) { - resultStatus = false; - messageCollector.append("Unable to close Statement:\"" + e.getMessage() + "\"."); - } - stmt = null; - try { - conn.close(); - } catch (SQLException e) { - resultStatus = false; - messageCollector.append("Unable to close connection:\"" + e.getMessage() + "\"."); - } - conn = null; - if (!resultStatus) { - errorMessage = messageCollector.toString(); - } - return resultStatus; - } - - /** - * An instance can either be in <q>connected</q> or <q>disconnected</q> state. The - * state can be toggled by invoking {@link #connect()} or - * {@link #disconnect()} respectively. - * - * @return true if connected, false otherwise - */ - public boolean isConnected() { - return null != conn; - } -}</programlisting> - </answer> - </qandaentry> - </qandadiv> - </qandaset> - - <para>We may now complete the next enhancement step of our GUI - database client.</para> - - <qandaset role="exercise"> - <title>Connection on user action</title> - - <qandadiv> - <qandaentry xml:id="exerciseGuiWriteTakeTwo"> - <question> - <label>An application writing records to a database - server</label> - - <para>Our aim is to enhance the first GUI prototype being - described in <xref linkend="simpleInsertGui"/>. The - application shall start being disconnected from the database - server. Prior to entering data the user shall be guided to - open a connection. The following video illustrates the - desired user interface:</para> - - <figure xml:id="figureDataInsert2"> - <title>A GUI frontend for adding personal data to a - server.</title> - - <mediaobject> - <videoobject> - <videodata fileref="Ref/Video/dataInsert.mp4"/> - </videoobject> - </mediaobject> - </figure> - - <para>In case a user closes the main window while still - being connected a disconnect from the database server shall - be enforced. 
For this purpose we must handle the event when - the user clicks on the closing button within the window - decoration. An exit handler method is being required to - terminate a potentially open database connection.</para> - </question> - - <answer> - <para>Our implementation uses the class - <classname>sda.jdbc.intro.v1.PersistenceHandler</classname> - for handling all database communication. The GUI needs to - visualize the two different states - <quote>disconnected</quote> and <quote>connected</quote>. In - <quote>disconnected</quote> state the whole input pane for - entering datasets and clicking the <quote>Insert</quote> - button is locked. So the user is forced to actively open a - database connection.</para> - - <para>Notice also the - <classname>java.awt.event.WindowAdapter</classname> - implementation being executed when closing the application's - main window. The <methodname>windowClosing(WindowEvent - e)</methodname> method disconnects any existing database - connection thus freeing resources.</para> - - <programlisting language="java">package sda.jdbc.intro.v1; - -import ... 
- -public class InsertPerson extends JFrame { - - private static final long serialVersionUID = 6815975741605247675L; - - final PersistenceHandler persistenceHandler = new PersistenceHandler(); - - final JTextField nameField = new JTextField(15), - emailField = new JTextField(20); - - final JButton toggleConnectButton = new JButton(), - insertButton = new JButton("Insert"); - - final JPanel databaseFieldPanel = new JPanel(); - - private void setGuiConnectionState(final boolean state) { - if (state) { - toggleConnectButton.setText("Disconnect"); - } else { - toggleConnectButton.setText("Connect"); - } - for (final Component c: databaseFieldPanel.getComponents()){ - c.setEnabled(state); - } - } - - public static void main(String[] args) throws SQLException { - InsertPerson app = new InsertPerson(); - app.setVisible(true); - } - - public InsertPerson (){ - super ("Add a person's data"); - - setSize(500, 500); - - addWindowListener(new WindowAdapter() { - // In case a user closes our application window while still being connected - // we have to close the database connection. 
- @Override - public void windowClosing(WindowEvent e) { - super.windowClosing(e); - if (persistenceHandler.isConnected() && !persistenceHandler.disconnect()) { - System.exit(1); - } else { - System.exit(0); - } - } - }); - Box top = Box.createHorizontalBox(); - add(top, BorderLayout.NORTH); - top.add(toggleConnectButton); - - toggleConnectButton.addActionListener(new ActionListener() { - - @Override - public void actionPerformed(ActionEvent e) { - if (persistenceHandler.isConnected()) { - if (persistenceHandler.disconnect()){ - setGuiConnectionState(false); - } else { - JOptionPane.showMessageDialog(null, persistenceHandler.getErrorMessage()); - } - } else { - if (persistenceHandler.connect()){ - setGuiConnectionState(true); - } else { - JOptionPane.showMessageDialog(null, persistenceHandler.getErrorMessage()); - } - } - } - }); - - databaseFieldPanel.setLayout(new GridLayout(0,2)); - add(databaseFieldPanel); - - databaseFieldPanel.add(new JLabel("Name:")); - databaseFieldPanel.add(nameField); - - databaseFieldPanel.add(new JLabel("E-mail:")); - databaseFieldPanel.add(emailField); - - insertButton.addActionListener(new ActionListener() { - @Override - public void actionPerformed(ActionEvent e) { - if (persistenceHandler.add(nameField.getText(), emailField.getText())) { - nameField.setText(""); - emailField.setText(""); - JOptionPane.showMessageDialog(null, "Successfully inserted dataset"); - } else { - JOptionPane.showMessageDialog(null, persistenceHandler.getErrorMessage()); - } - } - }); - databaseFieldPanel.add(Box.createGlue()); - databaseFieldPanel.add(insertButton); - setGuiConnectionState(false); - pack(); - } -}</programlisting> - </answer> - </qandaentry> - </qandadiv> - </qandaset> - </section> - - <section xml:id="jdbcSecurity"> - <title><trademark - xlink:href="http://electronics.zibb.com/trademark/jdbc/29545026">JDBC</trademark> - and security</title> - - <section xml:id="jdbcSecurityNetwork"> - <title>Network sniffing</title> - - <para>Sniffing <trademark 
- xlink:href="http://electronics.zibb.com/trademark/jdbc/29545026">JDBC</trademark> - network traffic is one possibility for intruders to compromise - database applications. This requires physical access to either - of:</para> - - <itemizedlist> - <listitem> - <para>Server host</para> - </listitem> - - <listitem> - <para>Client host</para> - </listitem> - - <listitem> - <para>Intermediate hub, switch or router</para> - </listitem> - </itemizedlist> - - <figure xml:id="figJdbcSniffing"> - <title>Sniffing a <trademark - xlink:href="http://electronics.zibb.com/trademark/jdbc/29545026">JDBC</trademark> - connection by an intruder.</title> - - <mediaobject> - <imageobject> - <imagedata fileref="Ref/Fig/jdbcSniffing.fig"/> - </imageobject> - </mediaobject> - </figure> - - <para>We demonstrate a possible attack by analyzing the network - traffic between our application shown in <xref - linkend="figJdbcSimpleWrite"/> and the <productname - xlink:href="http://www.mysql.com">Mysql</productname> database - server. Prior to starting the application we set up <productname - xlink:href="http://www.wireshark.org">Wireshark</productname> for - filtered capturing:</para> - - <itemizedlist> - <listitem> - <para>Connecting to the <varname>loopback</varname> (lo) - interface only. This is sufficient since our client connects - to <varname>localhost</varname>.</para> - </listitem> - - <listitem> - <para>Discarding all packets other than <acronym - xlink:href="http://en.wikipedia.org/wiki/Transmission_Control_Protocol">TCP</acronym> - packets with port number 3306.</para> - </listitem> - </itemizedlist> - - <para>This yields the following capture, shortened for the - sake of brevity:</para> - - <programlisting>[... -5.5.24-0ubuntu0.12.04.1.%...X*e?I1ZQ...................e,F[yoA5$T[N.mysql_native_password. - A...........!.......................hdmuser <co xml:id="tcpCaptureUsername"/>......U.>S.%..~h...!.xhdm............j..../* - - ... 
INSERT INTO Person VALUES('Jim', 'jim@foo.org') <co - xml:id="tcpCaptureSqlInsert"/>6... - .&.#23000Duplicate entry 'jim@foo.org' for key 'email' <co - xml:id="tcpCaptureErrmsg"/></programlisting> - - <calloutlist> - <callout arearefs="tcpCaptureUsername"> - <para>The <varname>username</varname> initiating the - connection to the database server.</para> - </callout> - - <callout arearefs="tcpCaptureSqlInsert"> - <para>The <code>INSERT ...</code> statement.</para> - </callout> - - <callout arearefs="tcpCaptureErrmsg"> - <para>The resulting error message sent back to the - client.</para> - </callout> - </calloutlist> - - <para>Something seems to be missing here: The user's password. Our - code in <xref linkend="figJdbcSimpleWrite"/> contains the password - <quote><varname>XYZ</varname></quote> in clear text. But even - using the search function of <productname - xlink:href="http://www.wireshark.org">Wireshark</productname> does - not show any such string within the above capture. The - <productname xlink:href="http://www.mysql.com">Mysql</productname> - documentation however <link - xlink:href="http://dev.mysql.com/doc/refman/5.0/en/security-against-attack.html">reveals</link> - that everything but the password is transmitted in clear text. So - all we might identify is a hash of <code>XYZ</code>.</para> - - <para>So regarding our (current) <productname - xlink:href="http://www.mysql.com">Mysql</productname> - implementation the impact of this attack type is somewhat limited - but still severe: All data transmitted between client and - server may be disclosed. This typically comprises sensitive data as - well. Possible solutions:</para> - - <itemizedlist> - <listitem> - <para>Create an encrypted tunnel between client and server - like e.g. 
<link - xlink:href="http://www.debianadmin.com/howto-use-ssh-local-and-remote-port-forwarding.html">ssh - port forwarding</link> or <link - xlink:href="http://de.wikipedia.org/wiki/Virtual_Private_Network">VPN</link>.</para> - </listitem> - - <listitem> - <para>Many database vendors <link - xlink:href="http://dev.mysql.com/doc/refman/5.1/de/connector-j-reference-using-ssl.html">supply - SSL</link> or similar <trademark - xlink:href="http://electronics.zibb.com/trademark/jdbc/29545026">JDBC</trademark> - protocol encryption extensions. This requires additional - configuration procedures like setting up server side - certificates. Moreover similar to the http/https protocols - encryption generally slows down data traffic.</para> - </listitem> - </itemizedlist> - - <para>Of course this is only relevant if the transport layer is - considered to be insecure. If both server and client reside within - the same trusted infrastructure no action has to be taken. We also - note that this kind of problem is not limited to <trademark - xlink:href="http://electronics.zibb.com/trademark/jdbc/29545026">JDBC</trademark>. - In fact all protocols lacking encryption are subject to this type - of attack.</para> - </section> - - <section xml:id="sqlInjection"> - <title>SQL injection</title> - - <para>Before diving into technical details we shed some light on - the possible impact of this common attack type being described in - this chapter. Our example is the well known Heartland Payment - Systems data breach:</para> - - <figure xml:id="figHeartlandSecurityBreach"> - <title>Summary about possible SQL injection impact based on the - Heartland security breach</title> - - <mediaobject> - <imageobject> - <imagedata fileref="Ref/Fig/heartland.fig"/> - </imageobject> - </mediaobject> - </figure> - - <para>Why should we be concerned with SQL injection? 
In the - introduction of <xref linkend="bibClarke09"/> a compelling - argument is being given:</para> - - <blockquote> - <para>Many people say they know what SQL injection is, but all - they have heard about or experienced are trivial examples. SQL - injection is one of the most devastating vulnerabilities to - impact a business, as it can lead to exposure of all of the - sensitive information stored in an application's database, - including handy information such as usernames, passwords, names, - addresses, phone numbers, and credit card details.</para> - </blockquote> - - <para>In this lecture due to limited resources we only deal with - trivial examples mentioned above. One possible way SQL injection - attacks work is by inserting SQL code into fields being designed - for end user input:</para> - - <figure xml:id="figSqlInject"> - <title>SQL injection triggered by ordinary user input.</title> - - <mediaobject> - <imageobject> - <imagedata fileref="Ref/Fig/sqlinject.fig"/> - </imageobject> - </mediaobject> - </figure> - - <qandaset role="exercise"> - <title>Attack from the dark side</title> - - <qandadiv> - <qandaentry xml:id="sqlInjectDropTable"> - <question> - <para>Use the application from <xref - linkend="exerciseGuiWriteTakeTwo"/> and <xref - linkend="figSqlInject"/> to launch a SQL injection attack. - We provide some hints:</para> - - <orderedlist> - <listitem> - <para>The <productname - xlink:href="http://www.mysql.com">Mysql</productname> - <trademark - xlink:href="http://electronics.zibb.com/trademark/jdbc/29545026">JDBC</trademark> - driver implementation already provides precautions to - hamper SQL injection attacks. In its default - configuration a sequence of SQL commands separated by - semicolons (<quote>;</quote>) will not be executed but - flagged as a SQL syntax error. 
We take an - example:</para> - - <programlisting>INSERT INTO Person VALUES (...);DROP TABLE Person</programlisting> - - <para>In order to execute these so-called multi-statement - queries we have to enable a <productname - xlink:href="http://www.mysql.com">Mysql</productname> - property. This may be achieved by extending our - <trademark - xlink:href="http://electronics.zibb.com/trademark/jdbc/29545026">JDBC</trademark> - URL:</para> - - <programlisting>jdbc:mysql://localhost:3306/hdm?<emphasis - role="bold">allowMultiQueries=true</emphasis></programlisting> - - <para>In the <productname - xlink:href="http://www.mysql.com">Mysql</productname> - manual you may <link - xlink:href="http://dev.mysql.com/doc/refman/5.1/en/connector-j-reference-configuration-properties.html">find</link> - a remark regarding this parameter:</para> - - <remark>Notice that this has the potential for SQL - injection if using plain java.sql.Statements and your - code doesn't sanitize input correctly.</remark> - - <para>In other words: You have been warned!</para> - </listitem> - - <listitem> - <para>You may now use either of the two input fields - <quote>name</quote> or <quote>email</quote> to inject - arbitrary SQL code.</para> - </listitem> - </orderedlist> - </question> - - <answer> - <para>We construct a suitable string to be injected in order to - drop our <code>Person</code> table:</para> - - <programlisting>Jim', 'jim@c.com');DROP TABLE Person;INSERT INTO Person VALUES('Joe</programlisting> - - <para>Entering this string into the name field effectively kills our - <code>Person</code> relation. As the error - message shows two INSERT statements are separated by a - DROP TABLE statement. So after executing the first INSERT - our database server drops the whole table. 
Finally the - second INSERT statement fails giving rise to an error - message no end user will ever understand:</para> - - <figure xml:id="figSqlInjectDropPerson"> - <title>Dropping the <code>Person</code> table by SQL - injection</title> - - <mediaobject> - <imageobject> - <imagedata fileref="Ref/Screen/sqlInject.screen.png"/> - </imageobject> - </mediaobject> - </figure> - - <para>According to the message text the table - <code>Person</code> gets dropped as expected. Thus the - subsequent (second) <code>INSERT</code> action is bound to - fail.</para> - - <para>In practice this result may be avoided. The database - user will (hopefully!) not have sufficient permissions to - drop the whole table. But malicious modifications by - INSERT, UPDATE or DELETE statements are still within - reach.</para> - </answer> - </qandaentry> - </qandadiv> - </qandaset> - </section> - - <section xml:id="sanitizeUserInput"> - <title>Sanitizing user input</title> - - <para>There are at least two general ways to deal with the - disastrous result of <xref linkend="sqlInjectDropTable"/>:</para> - - <itemizedlist> - <listitem> - <para>Keep the database server from interpreting user input - at all. This is probably the best way and will be - discussed in <xref linkend="sectPreparedStatements"/>.</para> - </listitem> - - <listitem> - <para>Let the application check and process user input. - Dangerous user input may be modified prior to being embedded - in SQL statements or may be rejected completely.</para> - </listitem> - </itemizedlist> - - <para>The first method is definitely superior in most cases. There - are however cases where the restrictions it implies are too - severe. We may for example choose dynamically which tables shall - be accessed. So an SQL statement's structure rather than just its - predicates is affected by user input. 
There are at least two - standard procedures dealing with this problem:</para> - - <glosslist> - <glossentry> - <glossterm>Input Filtering</glossterm> - - <glossdef> - <para>In the simplest case we check a user's input by - regular expressions. An example is an input field in a login - window representing a system user name. Legal input may - consist of letters and digits only. Special characters, - whitespace etc. are typically prohibited. The input - has a minimum length of one character. A maximum length may - be imposed as well. So we may choose the regular expression - <code>[A-Za-z0-9]+</code> to check valid user names.</para> - </glossdef> - </glossentry> - - <glossentry> - <glossterm><foreignphrase>Whitelisting</foreignphrase></glossterm> - - <glossdef> - <para>In many cases input fields only allow a restricted set - of values. Consider an input field for names of planets. An - application may keep a dictionary table to validate user - input:</para> - - <informaltable border="1"> - <col width="10%"/> - - <col width="5%"/> - - <tr> - <td>Mercury</td> - - <td>1</td> - </tr> - - <tr> - <td>Venus</td> - - <td>2</td> - </tr> - - <tr> - <td>Earth</td> - - <td>3</td> - </tr> - - <tr> - <td>...</td> - - <td>...</td> - </tr> - - <tr> - <td>Neptune</td> - - <td>8</td> - </tr> - - <tr> - <td><emphasis role="bold">Default:</emphasis></td> - - <td><emphasis role="bold">0</emphasis></td> - </tr> - </informaltable> - - <para>So if a user enters a valid planet name a - corresponding number representing this particular planet - will be sent to the database. If the user enters an invalid - string an error message may be raised.</para> - - <para>In a GUI in many situations this may be better - accomplished by presenting the list of planets to choose - from. 
In this case a user has no chance to enter invalid or - even malicious code.</para> - </glossdef> - </glossentry> - </glosslist> - - <para>So we have an <quote>interceptor</quote> sitting between - user input fields and SQL generating code:</para> - - <figure xml:id="figInputFiltering"> - <title>Validating user input prior to dynamically composing SQL - statements.</title> - - <mediaobject> - <imageobject> - <imagedata fileref="Ref/Fig/filtering.fig"/> - </imageobject> - </mediaobject> - </figure> - - <qandaset role="exercise"> - <title>Using regular expressions in <trademark - xlink:href="http://www.oracle.com/us/technologies/java">Java</trademark></title> - - <qandadiv> - <qandaentry> - <question> - <para>This exercise is a preparation for <xref - linkend="exercisefilterUserInput"/>. The aim is to deal - with regular expressions and to use them in <trademark - xlink:href="http://www.oracle.com/us/legal/third-party-trademarks/index.html">Java</trademark>. - If you don't know yet about regular expressions / pattern - matching you may want to read either of:</para> - - <itemizedlist> - <listitem> - <para><link - xlink:href="http://www.aivosto.com/vbtips/regex.html">Regular - expressions - An introduction</link></para> - </listitem> - - <listitem> - <para><link - xlink:href="http://www.codeproject.com/Articles/939/An-Introduction-to-Regular-Expressions">An - Introduction to Regular Expressions</link></para> - </listitem> - - <listitem> - <para><link - xlink:href="http://www.regular-expressions.info/tutorial.html">Regular - Expression Tutorial</link></para> - </listitem> - </itemizedlist> - - <para>Complete the implementation of the following - skeleton:</para> - - <programlisting language="java">package ... - -import java.util.regex.Matcher; -import java.util.regex.Pattern; -... 
-public static void main(String[] args) { - final String [] wordList = new String [] {"Eric", "126653BBb", "_login","some text"}; - final String [] regexpList = new String[] {"[A-K].*", "[^0-9]+.*", "_[a-z]+", ""}; - - for (final String word: wordList) { - for (final String regexp: regexpList) { - testMatch(word, regexp); - } - } -} - -/** - * Matching a given word by a regular expression. A log message is being - * written to stdout. - * - * Hint: The implementation is based on the explanation being given in the - * introduction to {@link Pattern} - * - * @param word This string will be matched by the subsequent argument. - * @param regexp The regular expression tested to match the previous argument. - * @return true if regexp matches word, false otherwise. - */ -public static boolean testMatch(final String word, final String regexp) { -.../* to be implemented by <emphasis role="bold">**YOU**</emphasis> */ -}</programlisting> - - <para>As being noted in the <trademark - xlink:href="http://docs.oracle.com/javase/1.5.0/docs/guide/javadoc">Javadoc</trademark> - above you may want to read the documentation of class - <classname>java.util.regex.Pattern</classname>. The - intended output of the above application is:</para> - - <programlisting>The expression '[A-K].*' matches 'Eric' -The expression '[^0-9]+.*' ... -...</programlisting> - </question> - - <answer> - <para>A possible implementation is given at - <classname>sda.regexp.RegexpPrimer</classname>.</para> - </answer> - </qandaentry> - </qandadiv> - </qandaset> - - <qandaset role="exercise"> - <title>Input validation by regular expressions</title> - - <qandadiv> - <qandaentry xml:id="exercisefilterUserInput"> - <question> - <para>The application of <xref - linkend="sqlInjectDropTable"/> proved to be vulnerable to - SQL injection. 
Sanitize the two user input fields' values - to prevent such behaviour.</para> - - <itemizedlist> - <listitem> - <para>Find appropriate regular expressions to check - both username and email. Some hints:</para> - - <glosslist> - <glossentry> - <glossterm>username</glossterm> - - <glossdef> - <para>Regarding SQL injection the <quote>;</quote> - character is among the most critical. You may want - to exclude certain special characters. This does no - harm since their presence in a user's name is likely - to be a typo rather than any sensible input.</para> - </glossdef> - </glossentry> - - <glossentry> - <glossterm>email</glossterm> - - <glossdef> - <para>There are tons of <quote>ultimate</quote> - regular expressions available to check email - addresses. Remember that rather than avoiding - <quote>wrong</quote> email addresses the present - task is to avoid SQL injection. So find a reasonable - one which may be too permissive regarding RFC email - syntax rules but sufficient to secure your - application.</para> - - <para>A concise definition of an email's syntax is - given in <link - xlink:href="http://tools.ietf.org/html/rfc5322#section-3.4.1">RFC5322</link>. - Its implementation is beyond the scope of the current - lecture. Moreover it is questionable whether E-mail - clients and mail transfer agents implement strict - RFC compliance.</para> - </glossdef> - </glossentry> - </glosslist> - - <para>Both regular expressions must cover the whole - user input from the beginning to the end. This can be - achieved by using <code>^ ... $</code>.</para> - </listitem> - - <listitem> - <para>The <trademark - xlink:href="http://www.oracle.com/us/legal/third-party-trademarks/index.html">Java</trademark> - standard class - <classname>javax.swing.InputVerifier</classname> may - help you validate user input.</para> - </listitem> - - <listitem> - <para>The following screenshot may provide an idea for - GUI realization and user interaction in case of - errors. 
Of course the submit button's action should be - disabled in case of erroneous input. The user should - receive a helpful error message instead.</para> - - <figure xml:id="figInsertValidate"> - <title>Error message being presented to the - user.</title> - - <mediaobject> - <imageobject> - <imagedata fileref="Ref/Screen/insertValidate.screen.png"/> - </imageobject> - </mediaobject> - - <caption> - <para>In the current example the trailing - <quote>;</quote> within the E-Mail field is - invalid.</para> - </caption> - </figure> - </listitem> - </itemizedlist> - </question> - - <answer> - <para>Extending - <classname>javax.swing.InputVerifier</classname> allows us - to build a generic class to filter user text input by - arbitrary regular expressions:</para> - - <programlisting language="java">package sda.jdbc.intro.v1.sanitize; - -import java.util.regex.Pattern; - -import javax.swing.InputVerifier; -import javax.swing.JComponent; -import javax.swing.JLabel; -import javax.swing.JTextField; - -/** - * We may check user input from text fields in a generic way - * defined by regular expressions. In addition error messages are forwarded - * automatically to respective GUI elements. - * - * @author goik - */ -public class RegexpVerifier extends InputVerifier { - - final Pattern syntaxPattern; - final JLabel validationLabel; - private boolean inputValid = false; - private final String errMsg; - - /** - * For each input field to be validated we need the - * following parameters: - - * @param regex The regular expression defining allowed textual content. The string - * "^[^ ]*$" for example allows arbitrary text not containing any spaces. - * @param validationLabel This is where error messages in case of regex violation go. - * @param errMsg This text will be presented to the user in case of a validation error. 
- */ - public RegexpVerifier (final String regex, final JLabel validationLabel, final String errMsg) { - this.validationLabel = validationLabel; - this.errMsg = errMsg; - syntaxPattern = Pattern.compile(regex); - } - - @Override - public boolean verify(JComponent input) { - if (input instanceof JTextField) { - final String userInput = ((JTextField) input).getText(); - if (syntaxPattern.matcher(userInput).find()) { - validationLabel.setText(""); - inputValid = true; - } else { - validationLabel.setText(errMsg); - inputValid = false; - } - } - return inputValid; - } - - /** - * @return the current validation state. - */ - public boolean inputIsValid () { - return inputValid; - } -}</programlisting> - - <para>Instances of - <classname>sda.jdbc.intro.v1.sanitize.RegexpVerifier</classname> - may now be used to validate our two input data fields. We - put emphasis on the changes with respect to - <classname>sda.jdbc.intro.v1.InsertPerson</classname>:</para> - - <programlisting language="java">package sda.jdbc.intro.v1.sanitize; - -... - -public class InsertPerson extends JFrame { - - final JTextField nameField = new JTextField(15); - <emphasis role="bold">final JLabel nameFieldValidationLabel = new JLabel(); - final RegexpVerifier nameFieldVerifier = new RegexpVerifier( - "^[^;'\"]+$", - nameFieldValidationLabel, - "No special characters");</emphasis> - - final JTextField emailField = new JTextField(20); - <emphasis role="bold">final JLabel emailFieldValidationLabel = new JLabel(); - final RegexpVerifier emailFieldVerifier = - new RegexpVerifier("^[\\w\\-\\.\\_]+@[\\w\\-\\.]*[a-zA-Z]{2,4}$", - emailFieldValidationLabel, - "email not valid");</emphasis> - -... - public static void main(String[] args) throws SQLException { - InsertPerson app = new InsertPerson(); - app.setVisible(true); - } - - public InsertPerson (){ - -... 
- databaseFieldPanel.setLayout(new GridLayout<emphasis role="bold">(0, 3)</emphasis>);//Third column for validation label - add(databaseFieldPanel); - - databaseFieldPanel.add(new JLabel("Name:")); - databaseFieldPanel.add(nameField); - <emphasis role="bold">nameFieldValidationLabel.setForeground(Color.RED); - databaseFieldPanel.add(nameFieldValidationLabel); - nameField.setInputVerifier(nameFieldVerifier);</emphasis> - - databaseFieldPanel.add(new JLabel("E-mail:")); - databaseFieldPanel.add(emailField); - <emphasis role="bold">databaseFieldPanel.add(emailFieldValidationLabel); - emailFieldValidationLabel.setForeground(Color.RED); - emailField.setInputVerifier(emailFieldVerifier);</emphasis> - - insertButton.addActionListener(new ActionListener() { - @Override - public void actionPerformed(ActionEvent e) { - <emphasis role="bold">if (!nameFieldVerifier.inputIsValid() || !emailFieldVerifier.inputIsValid()) { - JOptionPane.showMessageDialog(null, "Invalid input value(s)"); - }</emphasis> else { -...</programlisting> - </answer> - </qandaentry> - </qandadiv> - </qandaset> - </section> - - <section xml:id="sectPreparedStatements"> - <title><classname>java.sql.PreparedStatement</classname> - objects</title> - - <para>Sanitizing user input is an essential means to secure an - application. The <trademark - xlink:href="http://electronics.zibb.com/trademark/jdbc/29545026">JDBC</trademark> - standard however provides a mechanism being superior regarding the - purpose of protecting applications against SQL injection attacks. 
- We shed some light on our current mechanism sending SQL statements - to a database server:</para> - - <figure xml:id="sqlTransport"> - <title>SQL statements in <trademark - xlink:href="http://www.oracle.com/us/legal/third-party-trademarks/index.html">Java</trademark> - applications get parsed at the database server</title> - - <mediaobject> - <imageobject> - <imagedata fileref="Ref/Fig/sqlTransport.fig"/> - </imageobject> - </mediaobject> - </figure> - - <para>This architecture raises two questions:</para> - - <orderedlist> - <listitem> - <para>What happens in case identical SQL statements are - executed repeatedly? This may happen inside a loop when - thousands of records with identical structure are being sent - to a database.</para> - </listitem> - - <listitem> - <para>Is this architecture adequate with respect to security - concerns?</para> - </listitem> - </orderedlist> - - <para>The first question is related to performance: Parsing - statements which are identical apart from the values contained - within is a waste of resources. We consider the transfer of - records between different databases:</para> - - <programlisting>INSERT INTO Person VALUES ('Jim', 'jim@q.org') -INSERT INTO Person VALUES ('Eve', 'eve@y.org') -INSERT INTO Person VALUES ('Pete', 'p@rr.com') -...</programlisting> - - <para>In this case it does not make sense to repeatedly parse - identical SQL statements. Using single <code>INSERT</code> - statements with multiple data records may not be an option when - the number of records grows.</para> - - <para>The second question is related to our current security - topic: The database server's interpreter may be so - <quote>kind</quote> as to interpret an attacker's malicious code as - well.</para> - - <para>Both topics are addressed by - <classname>java.sql.PreparedStatement</classname> objects. - Basically these objects allow for separation of an SQL statement's - structure from parameter values contained within. 
The scenario - given in <xref linkend="sqlTransport"/> may be implemented - as:</para> - - <figure xml:id="sqlTransportPrepare"> - <title>Using <classname>java.sql.PreparedStatement</classname> - objects.</title> - - <mediaobject> - <imageobject> - <imagedata fileref="Ref/Fig/sqlTransportPrepare.fig"/> - </imageobject> - </mediaobject> - </figure> - - <para>Prepared statements are an example of parameterized SQL - statements which exist in various programming languages. When - using <classname>java.sql.PreparedStatement</classname> instances - we actually have three distinct phases:</para> - - <orderedlist> - <listitem> - <para xml:id="exerciseGuiWritePrepared">Creating an instance - of <classname>java.sql.PreparedStatement</classname>. The SQL - statement possibly containing placeholders gets - parsed.</para> - </listitem> - - <listitem> - <para>Setting all placeholder values. This does not involve - any further SQL syntax parsing.</para> - </listitem> - - <listitem> - <para>Executing the statement.</para> - </listitem> - </orderedlist> - - <para>Steps 2 and 3 may be repeated as often as desired without - any re-parsing of SQL statements thus saving resources on the - database server side.</para> - - <para>Our introductory toy application <xref - linkend="figJdbcSimpleWrite"/> may be rewritten using - <classname>java.sql.PreparedStatement</classname> objects:</para> - - <programlisting language="java">package sda.jdbc.intro.v1; -... -public class SimpleInsert { - - public static void main(String[] args) throws SQLException { - - final Connection conn = DriverManager.getConnection (... 
- - // Step 2: Create a PreparedStatement instance - final PreparedStatement pStmt = conn.prepareStatement("INSERT INTO Person VALUES(<emphasis - role="bold">?, ?</emphasis>)");<co xml:id="listPrepCreate"/> - - // Step 3a: Fill in desired attribute values - pStmt.setString(1, "Jim");<co xml:id="listPrepSet1"/> - pStmt.setString(2, "jim@foo.org");<co xml:id="listPrepSet2"/> - - // Step 3b: Execute the desired INSERT - final int updateCount = pStmt.executeUpdate();<co xml:id="listPrepExec"/> - - // Step 4: Give feedback to the enduser - System.out.println("Successfully inserted " + updateCount + " dataset(s)"); - } -}</programlisting> - - <calloutlist> - <callout arearefs="listPrepCreate"> - <para>An instance of - <classname>java.sql.PreparedStatement</classname> is being - created. Notice the two question marks representing two place - holders for string values to be inserted in the next - step.</para> - </callout> - - <callout arearefs="listPrepSet1 listPrepSet2"> - <para>Fill in the two placeholder values being defined at - <coref linkend="listPrepCreate"/>.</para> - - <caution> - <para>Since half the world of programming folks will index a - list of n elements starting from 0 to n-1, <trademark - xlink:href="http://electronics.zibb.com/trademark/jdbc/29545026">JDBC</trademark> - apparently counts from 1 to n. Working with <trademark - xlink:href="http://electronics.zibb.com/trademark/jdbc/29545026">JDBC</trademark> - would have been too easy otherwise.</para> - </caution> - </callout> - - <callout arearefs="listPrepExec"> - <para>Execute the beast! Notice the empty parameter list. No - SQL is required since we already prepared it in <coref - linkend="listPrepCreate"/>.</para> - </callout> - </calloutlist> - - <para>The problem of SQL injection disappears completely when - using <classname>java.sql.PreparedStatement</classname> instances. 
- An attacker may safely enter offending strings like:</para> - - <programlisting>Jim', 'jim@c.com');DROP TABLE Person;INSERT INTO Person VALUES('Joe</programlisting> - - <para>The above string will be taken <quote>as is</quote> and thus - simply becomes part of the database server's content.</para> - - <qandaset role="exercise"> - <title>Prepared Statements to keep the barbarians at the - gate</title> - - <qandadiv> - <qandaentry xml:id="exerciseSqlInjectPrepare"> - <question> - <para>In <xref linkend="sqlInjectDropTable"/> we found our - implementation in <xref - linkend="exerciseGuiWriteTakeTwo"/> to be vulnerable with - respect to SQL injection. Rather than sanitizing user - input you shall use - <classname>java.sql.PreparedStatement</classname> objects - to secure the application.</para> - </question> - - <answer> - <para>Due to our separation of GUI and persistence - handling we only need to re-implement - <classname>sda.jdbc.intro.sqlinject.PersistenceHandler</classname>. - We have to replace - <classname>java.sql.Statement</classname> by - <classname>java.sql.PreparedStatement</classname> - instances. A possible implementation is - <classname>sda.jdbc.intro.v1.prepare.PersistenceHandler</classname>. - We may now safely enter offending strings like:</para> - - <programlisting>Jim', 'jim@c.com');DROP TABLE Person;INSERT INTO Person VALUES('Joe</programlisting> - - <para>This time the input value is taken <quote>as - is</quote> and yields the following error message:</para> - - <informalfigure> - <mediaobject> - <imageobject> - <imagedata fileref="Ref/Screen/sqlInjectPrepare.screen.png"/> - </imageobject> - </mediaobject> - </informalfigure> - - <para>The offending string exceeds the length of the - attribute <code>name</code> within the database table - <code>Person</code>. 
We may enlarge this value to allow - the <code>INSERT</code> operation:</para> - - <programlisting>CREATE TABLE Person ( - name char(<emphasis role="bold">80</emphasis>) <emphasis role="bold">-- a little bit longer --</emphasis> - ,email CHAR(20) UNIQUE -);</programlisting> - </answer> - </qandaentry> - </qandadiv> - </qandaset> - - <para>We may have followed the track of test-driven development. - In that case we would have written tests before actually - implementing our application. In the current lecture we will do - this the other way round in the following exercise. The idea is to - assure software quality when fixing bugs or extending an - application.</para> - - <para>The subsequent exercise requires the <productname - xlink:href="http://testng.org/doc/eclipse.html#eclipse-installation">TestNG</productname> - plugin for Eclipse to be installed. This should already be the - case both in the MI exercise classrooms and in the Virtualbox - image provided at <uri - xlink:href="ftp://mirror.mi.hdm-stuttgart.de/ubuntu/VirtualBox/lubuntu.vdi">ftp://mirror.mi.hdm-stuttgart.de/ubuntu/VirtualBox/lubuntu.vdi</uri>. - If you use a private Eclipse installation you may want to follow - <xref linkend="testngInstall"/>.</para> - - <qandaset role="exercise"> - <title>Testing - <classname>sda.jdbc.intro.v1.PersistenceHandler</classname> - using <productname - xlink:href="http://testng.org">TestNG</productname></title> - - <qandadiv> - <qandaentry> - <question> - <para>Read <xref linkend="chapUnitTesting"/>. Then - test:</para> - - <itemizedlist> - <listitem> - <para>Proper behaviour when opening and closing - connections.</para> - </listitem> - - <listitem> - <para>Proper behavior when inserting data</para> - </listitem> - - <listitem> - <para>Expected behaviour when entering duplicate - values violating integrity constraints. 
Look for error - messages as well.</para> - </listitem> - </itemizedlist> - - <para>You may write code to initialize the database state - appropriately prior to start tests.</para> - </question> - - <answer> - <para><productname - xlink:href="http://testng.org">TestNG</productname> may be - directed by - <classname>sda.jdbc.intro.v1.prepare.PersistenceHandlerTest</classname>.</para> - </answer> - </qandaentry> - </qandadiv> - </qandaset> - </section> - </section> - - <section xml:id="jdbcRead"> - <title>Read Access</title> - - <para>So far we've sent records to a database server. Applications - however need both directions: Pushing data to a Server and receiving - data as well. The overall process looks like:</para> - - <figure xml:id="jdbcReadWrite"> - <title>Server / client object's life cycle</title> - - <mediaobject> - <imageobject> - <imagedata fileref="Ref/Fig/jdbcReadWrite.fig"/> - </imageobject> - </mediaobject> - </figure> - - <para>So far we've only covered the second (<code>UPDATE</code>) - part of this picture. Reading objects from a database server into a - client's (transient) address space requires a container object to - hold the data in question. Though <trademark - xlink:href="http://www.oracle.com/us/legal/third-party-trademarks/index.html">Java</trademark> - offers standard container interfaces like - <classname>java.util.List</classname> the <trademark - xlink:href="http://electronics.zibb.com/trademark/jdbc/29545026">JDBC</trademark> - standard has created separate specifications like - <classname>java.sql.ResultSet</classname>. Instances of - <classname>java.sql.ResultSet</classname> will hold transient copies - of (database) objects. The next figure outlines the basic - approach:</para> - - <figure xml:id="figJdbcRead"> - <title>Reading data from a database server.</title> - - <mediaobject> - <imageobject> - <imagedata fileref="Ref/Fig/jdbcread.fig"/> - </imageobject> - </mediaobject> - </figure> - - <para>We take an example. 
Suppose our database contains a table of - our friends' nicknames and their respective birth dates:</para> - - <table border="1" xml:id="figRelationFriends"> - <caption>Names and birth dates of friends.</caption> - - <tr> - <td><programlisting>CREATE TABLE Friends ( - id INTEGER NOT NULL PRIMARY KEY - ,nickname char(10) - ,birthdate DATE -);</programlisting></td> - - <td><programlisting>INSERT INTO Friends VALUES - (1, 'Jim', '1991-10-10') - ,(2, 'Eve', '2003-05-24') - ,(3, 'Mick','2001-12-30') - ;</programlisting></td> - </tr> - </table> - - <para>Following the outline in <xref linkend="figJdbcRead"/> we may - access our data by:</para> - - <figure xml:id="listingJdbcRead"> - <title>Accessing relational data</title> - - <programlisting language="java">package sda.jdbc.intro; -... -public class SimpleRead { - - public static void main(String[] args) throws SQLException { - - // Step 1: Open a connection to the database server - final Connection conn = DriverManager.getConnection ( - DbProps.getString("PersistenceHandler.jdbcUrl"), - DbProps.getString("PersistenceHandler.username"), - DbProps.getString("PersistenceHandler.password")); - - // Step 2: Create a Statement instance - final Statement stmt = conn.createStatement(); - - <emphasis role="bold">// Step 3: Creating the client side JDBC container holding our data records</emphasis> - <emphasis role="bold">final ResultSet data = stmt.executeQuery("SELECT * FROM Friends");</emphasis> <co - linkends="listingJdbcRead-1" xml:id="listingJdbcRead-1-co"/> - - <emphasis role="bold">// Step 4: Dataset iteration - while (data.next()) {</emphasis> <co linkends="listingJdbcRead-2" - xml:id="listingJdbcRead-2-co"/> - <emphasis role="bold">System.out.println(data.getInt("id")</emphasis> <co - linkends="listingJdbcRead-3" xml:id="listingJdbcRead-3-co"/> - <emphasis role="bold">+ ", " + data.getString("nickname")</emphasis> <co - linkends="listingJdbcRead-3" xml:id="listingJdbcRead-4-co"/> - <emphasis role="bold">+ ", " + 
data.getString("birthdate"));</emphasis> <co - linkends="listingJdbcRead-3" xml:id="listingJdbcRead-5-co"/> - } - } -}</programlisting> - </figure> - - <para>The marked code segments above show the differences with respect to - our data insertion application - <classname>sda.jdbc.intro.SimpleInsert</classname>. Some remarks are - in order:</para> - - <calloutlist> - <callout arearefs="listingJdbcRead-1-co" - xml:id="listingJdbcRead-1"> - <para>As mentioned in the introduction to this section the - <trademark - xlink:href="http://electronics.zibb.com/trademark/jdbc/29545026">JDBC</trademark> - standard comes with its own container interface rather than - <classname>java.util.List</classname> or similar.</para> - </callout> - - <callout arearefs="listingJdbcRead-2-co" - xml:id="listingJdbcRead-2"> - <para>Calling <link - xlink:href="http://docs.oracle.com/javase/7/docs/api/java/sql/ResultSet.html#next()">next()</link> - prior to actually accessing data on the client side is - mandatory! The <link - xlink:href="http://docs.oracle.com/javase/7/docs/api/java/sql/ResultSet.html#next()">next()</link> - method positions the internal iterator on the first element of our - dataset if it is not empty. Follow the link address and <emphasis role="bold">read</emphasis> the - documentation.</para> - </callout> - - <callout arearefs="listingJdbcRead-3-co listingJdbcRead-4-co listingJdbcRead-5-co" - xml:id="listingJdbcRead-3"> - <para>The access methods have to be chosen according to matching - types. 
An overview of database/<trademark - xlink:href="http://www.oracle.com/us/technologies/java">Java</trademark> - type mappings is being given in <uri - xlink:href="http://docs.oracle.com/javase/1.3/docs/guide/jdbc/getstart/mapping.html">http://docs.oracle.com/javase/1.3/docs/guide/jdbc/getstart/mapping.html</uri>.</para> - </callout> - </calloutlist> - - <qandaset role="exercise"> - <title>Getter methods and type conversion</title> - - <qandadiv> - <qandaentry> - <question> - <para>Apart from type mappings the <trademark - xlink:href="http://electronics.zibb.com/trademark/jdbc/29545026">JDBC</trademark> - access methods like <link - xlink:href="http://docs.oracle.com/javase/7/docs/api/java/sql/ResultSet.html#getString(int)">getString()</link> - may also be used for type conversion. Modify <xref - linkend="listingJdbcRead"/> by:</para> - - <itemizedlist> - <listitem> - <para>Read the database attribute <code>id</code> by - <link - xlink:href="http://docs.oracle.com/javase/7/docs/api/java/sql/ResultSet.html#getString(java.lang.String)">getString(String)</link>.</para> - </listitem> - - <listitem> - <para>Read the database attribute nickname by <link - xlink:href="http://docs.oracle.com/javase/7/docs/api/java/sql/ResultSet.html#getInt(java.lang.String)">getInt(String)</link>.</para> - </listitem> - </itemizedlist> - - <para>What do you observe?</para> - </question> - - <answer> - <para>Modifying our iteration loop:</para> - - <programlisting>// Step 4: Dataset iteration -while (data.next()) { - System.out.println(data.<emphasis role="bold">getString</emphasis>("id") <co - linkends="jdbcReadWrongType-1" - xml:id="jdbcReadWrongType-1-co"/> - + ", " + data.<emphasis role="bold">getInt</emphasis>("nickname") <co - linkends="jdbcReadWrongType-2" - xml:id="jdbcReadWrongType-2-co"/> - + ", " + data.getString("birthdate")); -}</programlisting> - - <para>We observe:</para> - - <calloutlist> - <callout arearefs="jdbcReadWrongType-1-co" - xml:id="jdbcReadWrongType-1"> - 
<para>Calling <link - xlink:href="http://docs.oracle.com/javase/7/docs/api/java/sql/ResultSet.html#getString(int)">getString()</link> - for a database attribute of type INTEGER does not cause - any trouble: The value gets silently converted to a - string value.</para> - </callout> - - <callout arearefs="jdbcReadWrongType-2-co" - xml:id="jdbcReadWrongType-2"> - <para>Calling <link - xlink:href="http://docs.oracle.com/javase/7/docs/api/java/sql/ResultSet.html#getInt(java.lang.String)">getInt(String)</link> - for the database field of type CHAR yields an (expected) - Exception:</para> - </callout> - </calloutlist> - - <programlisting>Exception in thread "main" java.sql.SQLException: Invalid value for getInt() - 'Jim' - at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:1073) -...</programlisting> - - <para>We may however provide <quote>compatible</quote> data - records:</para> - - <programlisting>DELETE FROM Friends; -INSERT INTO Friends VALUES (1, <emphasis role="bold">'31'</emphasis>, '1991-10-10');</programlisting> - - <para>This time our application executes perfectly - well:</para> - - <programlisting>1, 31, 1991-10-10</programlisting> - - <para>Conclusion: The <trademark - xlink:href="http://electronics.zibb.com/trademark/jdbc/29545026">JDBC</trademark> - driver performs a conversion from a string type to an - integer similar to the <link - xlink:href="http://docs.oracle.com/javase/7/docs/api/java/lang/Integer.html#parseInt(java.lang.String)">parseInt(String)</link> - method.</para> - - <para>The next series of exercises aims at a more powerful - implementation of our person data insertion application in - <xref linkend="exerciseInsertLoginCredentials"/>.</para> - </answer> - </qandaentry> - </qandadiv> - </qandaset> - - <qandaset role="exercise"> - <title>Handling NULL values.</title> - - <qandadiv> - <qandaentry> - <question> - <para>The attribute <code>nickname</code> in our database - table Friends allows <code>NULL</code> values:</para> - - 
<programlisting>INSERT INTO Friends VALUES - (1, 'Jim', '1991-10-10') - ,(2, <emphasis role="bold"> NULL</emphasis>, '2003-5-24') - ,(3, 'Mick', '2001-12-30');</programlisting> - - <para>Starting our current application yields:</para> - - <programlisting>1, Jim, 1991-10-10 -2, null, 2003-05-24 -3, Mick, 2001-12-30</programlisting> - - <para>This might be confused with a person having the - nickname <quote>null</quote>. Instead we would like to - have:</para> - - <programlisting>1, Jim, 1991-10-10 -2, -Name unknown- , 2003-05-24 -3, Mick, 2001-12-30</programlisting> - - <para>Extend the current code of - <classname>sda.jdbc.intro.SimpleRead</classname> to produce - the above result in case of nickname <code>NULL</code> - values.</para> - - <para>Hint: Read the documentation of <link - xlink:href="http://docs.oracle.com/javase/7/docs/api/java/sql/ResultSet.html#wasNull()">wasNull()</link>.</para> - </question> - - <answer> - <para>A possible implementation is given in - <classname>sda.jdbc.intro.v1.SimpleRead</classname>.</para> - </answer> - </qandaentry> - </qandadiv> - </qandaset> - - <qandaset role="exercise"> - <title>A user authentication <quote>strategy</quote></title> - - <qandadiv> - <qandaentry xml:id="exerciseInsecureAuth"> - <question> - <para>Our current application for entering - <code>Person</code> records lacks authentication: A user - simply connects to the database using credentials being hard - coded in a properties file. 
A programmer suggests - implementing authentication based on the following extension of - the <code>Person</code> table:</para> - - <programlisting>CREATE TABLE Person ( - name char(80) NOT NULL - ,email CHAR(20) NOT NULL UNIQUE - ,login CHAR(10) UNIQUE -- login names must be unique -- - ,password CHAR(20) -);</programlisting> - - <para>On clicking <quote>Connect</quote> a user may enter - his login name and password, <quote>fred</quote> and - <quote>12345678</quote> in the following example:</para> - - <figure xml:id="figLogin"> - <title>Login credentials for database connection</title> - - <mediaobject> - <imageobject> - <imagedata fileref="Ref/Screen/login.screen.png"/> - </imageobject> - </mediaobject> - </figure> - - <para>Based on these input values the following SQL query is - being executed by a - <classname>java.sql.Statement</classname> object:</para> - - <programlisting>SELECT * FROM Person WHERE login='<emphasis - role="bold">fred</emphasis>' and password = '<emphasis - role="bold">12345678</emphasis>'</programlisting> - - <para>Since the login attribute is UNIQUE we are sure to - receive either 0 or 1 dataset. Our programmer proposes to - grant login if the query returns at least one - dataset.</para> - - <para>Discuss this implementation sketch with a colleague. - Do you think this is a sensible approach? <emphasis - role="bold">Write down</emphasis> your results.</para> - </question> - - <answer> - <para>The approach is essentially unusable due to severe - security implications. Since it is based on - <classname>java.sql.Statement</classname> rather than on - <classname>java.sql.PreparedStatement</classname> objects it - is vulnerable to SQL injection attacks. 
A user may enter the - following password value in the GUI:</para> - - <programlisting>sd' OR '1' = '1</programlisting> - - <para>Based on the login name <quote>fred</quote> the - following SQL string is being crafted:</para> - - <programlisting>SELECT * FROM Person WHERE login='fred' and password = 'sd' OR <emphasis - role="bold">'1' = '1'</emphasis>;</programlisting> - - <para>Since the WHERE clause's last component always - evaluates to true, all objects from the <code>Person</code> - relation are returned thus permitting login.</para> - - <para>The implementation approach suffers from a second - deficiency: The passwords are stored in clear text. If an - attacker gains access to the <code>Person</code> table he'll - immediately retrieve the passwords of all users. This - problem can be solved by storing hash values of passwords - rather than the clear text values themselves.</para> - </answer> - </qandaentry> - </qandadiv> - </qandaset> - - <qandaset role="exercise" xml:id="passwordHashes"> - <title>Passwords and hash values</title> - - <qandadiv> - <qandaentry xml:id="exerciseHashTraining"> - <question> - <para>In exercise <xref linkend="exerciseInsecureAuth"/> we - discarded the idea of clear text passwords in favour of - password hashes. In order to resist rainbow table attacks so - called salted hashes are superior. You should read <uri - xlink:href="https://www.heckrothindustries.co.uk/articles/an-introduction-to-password-hashes">https://www.heckrothindustries.co.uk/articles/an-introduction-to-password-hashes</uri> - for overview purposes. 
The article contains further - references at the bottom of the page.</para> - - <para>With respect to an implementation <uri - xlink:href="http://stackoverflow.com/questions/2860943/suggestions-for-library-to-hash-passwords-in-java#11038230">http://stackoverflow.com/questions/2860943/suggestions-for-library-to-hash-passwords-in-java</uri> - provides a simple example for:</para> - - <itemizedlist> - <listitem> - <para>Creating a salted hash from a given password - string.</para> - </listitem> - - <listitem> - <para>Verifying whether a hash string matches a given clear text - password.</para> - </listitem> - </itemizedlist> - - <para>The example uses an external library. On <productname - xlink:href="http://www.ubuntu.com">Ubuntu</productname> - Linux this may be installed by issuing - <command>aptitude</command> <option>install</option> - <option>libcommons-codec-java</option>. On successful - install the file - <filename>/usr/share/java/commons-codec-1.5.jar</filename> - may be appended to your <envar>CLASSPATH</envar>.</para> - - <para>You may as well use <uri - xlink:href="http://crackstation.net/hashing-security.htm#javasourcecode">http://crackstation.net/hashing-security.htm#javasourcecode</uri> - as a starting point. This example works standalone without - needing an external library. Note: This example produces - different (incompatible) hash values.</para> - - <para>Create a simple main() method to experiment with the - two class methods.</para> - </question> - - <answer> - <para>Starting from <uri - xlink:href="http://stackoverflow.com/questions/2860943/suggestions-for-library-to-hash-passwords-in-java#11038230">http://stackoverflow.com/questions/2860943/suggestions-for-library-to-hash-passwords-in-java</uri> - we create a slightly modified class - <classname>sda.jdbc.intro.auth.HashProvider</classname>:</para> - - <programlisting language="java">package sda.jdbc.intro.auth; - -... 
- -/** - * Credits to http://stackoverflow.com/users/90998/martin-konicek, answer #12 in - * http://stackoverflow.com/questions/2860943/suggestions-for-library-to-hash-passwords-in-java - * - * Subsequent code slightly modified from original. - * - */ -public class HashProvider { - // The higher the number of iterations the more - // expensive computing the hash is for us - // and also for a brute force attack. - private static final int iterations = 10*1024; - private static final int saltLen = 32; - private static final int desiredKeyLen = 256; - - /** Computes a salted PBKDF2 hash of given plaintext password - suitable for storing in a database. */ - public static String getSaltedHash(char [] password) { - byte[] salt; - try { - salt = SecureRandom.getInstance("SHA1PRNG").generateSeed(saltLen); - // store the salt with the password - return Base64.encodeBase64String(salt) + "$" + hash(password, salt); - } catch (NoSuchAlgorithmException e) { - e.printStackTrace(); - } - System.exit(1); - return null; - } - - /** Checks whether given plaintext password corresponds - to a stored salted hash of the password. */ - public static boolean check(char[] password, String stored){ - String[] saltAndPass = stored.split("\\$"); - if (saltAndPass.length != 2) - return false; - String hashOfInput = hash(password, Base64.decodeBase64(saltAndPass[0])); - return hashOfInput.equals(saltAndPass[1]); - } - - // using PBKDF2 from Sun, an alternative is https://github.com/wg/scrypt - // cf. 
http://www.unlimitednovelty.com/2012/03/dont-use-bcrypt.html - private static String hash(char [] password, byte[] salt) { - SecretKeyFactory f; - try { - f = SecretKeyFactory.getInstance("PBKDF2WithHmacSHA1"); - SecretKey key = f.generateSecret(new PBEKeySpec( - password, salt, iterations, desiredKeyLen) - ); - return Base64.encodeBase64String(key.getEncoded()); - } catch (NoSuchAlgorithmException e) { - e.printStackTrace(); - } catch (InvalidKeySpecException e) { - e.printStackTrace(); - } - System.exit(1); - return null; - } -}</programlisting> - - <para>Now we may test the two class methods - <methodname>getSaltedHash</methodname>(...) and - <methodname>check(...)</methodname> by a separate driver - class:</para> - - <programlisting language="java">package sda.jdbc.intro.auth; - - -public class TestHashProvider { - - public static void main(String [] args) throws Exception { - final char [] clearText = {'s', 'e', 'c'}; - final String hash = <emphasis role="bold">HashProvider.getSaltedHash(clearText)</emphasis>; - System.out.println("Hash:" + hash); - if (<emphasis role="bold">HashProvider.check(clearText, - "HwX2DkuYiwp7xogm3AGndza8DKRVvCMntxRvCrCGFPw=$6Ix11yHNB4uPZuF2IQYxVV/MYragJwTDE33OIFR9a24=")</emphasis>) { - System.out.println("hash matches"); - } else { - System.out.println("hash does not match"); - } - } -}</programlisting> - </answer> - </qandaentry> - </qandadiv> - </qandaset> - - <qandaset role="exercise"> - <title>Gui authentication: The real McCoy</title> - - <qandadiv> - <qandaentry xml:id="exerciseInsertLoginCredentials"> - <question> - <para>We now implement a refined version to enter - <code>Person</code> records based on the solutions of two - related exercises:</para> - - <glosslist> - <glossentry> - <glossterm><xref - linkend="exercisefilterUserInput"/></glossterm> - - <glossdef> - <para>Avoiding SQL injection by sanitizing user - input</para> - </glossdef> - </glossentry> - - <glossentry> - <glossterm><xref - 
linkend="exerciseSqlInjectPrepare"/></glossterm> - - <glossdef> - <para>Avoiding SQL injection by using - <classname>java.sql.PreparedStatement</classname> - objects.</para> - </glossdef> - </glossentry> - </glosslist> - - <para>A better solution should combine both techniques. - Non-vulnerability is a basic requirement. Checking an e-mail address - for minimal conformance is an added value.</para> - - <para>In order to address authentication the relation Person - has to be extended appropriately. The GUI needs two - additional fields for login name and password as well. The - following video demonstrates the intended behaviour:</para> - - <figure xml:id="videoConnectAuth"> - <title>Intended usage behaviour for insertion of data - records.</title> - - <mediaobject> - <videoobject> - <videodata fileref="Ref/Video/connectauth.mp4"/> - </videoobject> - </mediaobject> - </figure> - - <para>Don't forget to use password hashes like those from - <xref linkend="exerciseHashTraining"/>. Due to their length - you may want to consider the data type - <code>TEXT</code>.</para> - </question> - - <answer> - <para>In comparison to earlier versions it does make sense - to add some internal container structures. First we note - that each GUI input field requires:</para> - - <itemizedlist> - <listitem> - <para>A label like <quote>Enter password</quote>.</para> - </listitem> - - <listitem> - <para>A corresponding field object to hold user entered - input.</para> - </listitem> - - <listitem> - <para>A validator checking for correctness of entered - data.</para> - </listitem> - - <listitem> - <para>A label or text field for warning messages in case - of invalid user input.</para> - </listitem> - </itemizedlist> - - <para>We start by grouping the label, the input field's - verifier and the error message label in - <classname>sda.jdbc.intro.auth.UserInputUnit</classname>:</para> - - <programlisting>package sda.jdbc.intro.auth; -... 
- /** - * For each field in a GUI we typically need the following items: - * <ul> - * <li>A label e.g. "Enter Password:"</li> - * <li>An input component like {@link javax.swing.JTextField}</li> - * <li>A validation method or object like {@link InputVerifier}</li> - * <li>A container holding error messages to be presented at GUI level - * when an input field value is invalid e.g. "Password too short"</li> - * </ul> - * This wrapper allows grouping of the above items. - * - * @author goik - * - */ -public class UserInputUnit { - - final JLabel label; - final InputVerifierNotify verifier; - final JLabel errorMessage; - - public UserInputUnit(final String guiText, final InputVerifierNotify verifier) { - this.label = new JLabel(guiText); - this.verifier = verifier; - errorMessage = new JLabel(); - } - - /** - * Convenience method. The actual text field receiving - * user input is contained within the verifier. - * @return the GUI elements text value. - */ - public String getText() { - return verifier.field.getText(); - } -}</programlisting> - - <para>The actual GUI text field is part of an abstract class - <classname>sda.jdbc.intro.auth.InputVerifierNotify</classname>:</para> - - <programlisting language="java">package sda.jdbc.intro.auth; -... 
-public abstract class InputVerifierNotify extends InputVerifier { - - protected final String errorMessage; - public final JLabel validationLabel; - public final JTextField field; - - public InputVerifierNotify(final JTextField field, final String errorMessage) { - this.errorMessage = errorMessage; - this.field = field; - field.setInputVerifier(this); - validationLabel = new JLabel(); - validationLabel.setForeground(Color.RED); - } - public void clear() { - validationLabel.setText(""); - } -}</programlisting> - - <para>We have two field verifier classes being derived from - <classname>sda.jdbc.intro.auth.InputVerifierNotify</classname>:</para> - - <glosslist> - <glossentry> - <glossterm><classname>sda.jdbc.intro.auth.RegexpVerifier</classname></glossterm> - - <glossdef> - <para>This one is well known from earlier versions and - is used to validate text input fields by regular - expressions.</para> - </glossdef> - </glossentry> - - <glossentry> - <glossterm><classname>sda.jdbc.intro.auth.EqualValueVerifier</classname></glossterm> - - <glossdef> - <para>This verifier class is responsible for checking - that our two password fields have identical - values.</para> - </glossdef> - </glossentry> - </glosslist> - - <para>All these components get assembled in - <classname>sda.jdbc.intro.auth.InsertPerson</classname>. We - remark some important points:</para> - - <programlisting>package sda.jdbc.intro.auth; -... -public class InsertPerson extends JFrame { - ... - - // GUI attributes for user input - final UserInputUnit name = <co linkends="listingInsertUserAuth-1" - xml:id="listingInsertUserAuth-1-co"/> - new UserInputUnit( - "Name", - new RegexpVerifier(new JTextField(15), "^[^;'\"]+$", "No special characters allowed")); - - - // We need a reference to the password field to avoid - // casting from JTextField later. 
- private final JPasswordField passwordField = new JPasswordField(10); <co - linkends="listingInsertUserAuth-2" - xml:id="listingInsertUserAuth-2-co"/> - private final UserInputUnit password = - new UserInputUnit( - "Password", - new RegexpVerifier(passwordField, "^.{6,20}$", "length from 6 to 20 characters")); -... - private final UserInputUnit passwordRepeat = - new UserInputUnit( - "repeat pass.", - new EqualValueVerifier <co linkends="listingInsertUserAuth-3" - xml:id="listingInsertUserAuth-3-co"/> (new JPasswordField(10), passwordField, "Passwords do not match")); - - private final UserInputUnit [] userInputUnits = <co - linkends="listingInsertUserAuth-4" - xml:id="listingInsertUserAuth-4-co"/> - {name, email, login, password, passwordRepeat}; - - -... - private void userLoginDialog() { -... - } - -... - public InsertPerson (){ -... - databaseFieldPanel.setLayout(new GridLayout(0, 3)); //Third column for validation label - add(databaseFieldPanel); - - for (UserInputUnit unit: userInputUnits) { <co - linkends="listingInsertUserAuth-5" - xml:id="listingInsertUserAuth-5-co"/> - databaseFieldPanel.add(unit.label); - databaseFieldPanel.add(unit.verifier.field); - databaseFieldPanel.add(unit.verifier.validationLabel); - } - - insertButton.addActionListener(new ActionListener() { - @Override - public void actionPerformed(ActionEvent e) { - if (inputValuesAllValid()) { - if (persistenceHandler.add( <co - linkends="listingInsertUserAuth-6" - xml:id="listingInsertUserAuth-6-co"/> - name.getText(), - email.getText(), - login.getText(), - passwordField.getPassword())) { - clearMask(); -... 
- } - private void clearMask() { <co linkends="listingInsertUserAuth-7" - xml:id="listingInsertUserAuth-7-co"/> - for (UserInputUnit unit: userInputUnits) { - unit.verifier.field.setText(""); - unit.verifier.clear(); - } - } - private boolean inputValuesAllValid() {<co - linkends="listingInsertUserAuth-8" - xml:id="listingInsertUserAuth-8-co"/> - for (UserInputUnit unit: userInputUnits) { - if (!unit.verifier.verify(unit.verifier.field)){ - return false; - } - } - return true; - } -}</programlisting> - - <calloutlist> - <callout arearefs="listingInsertUserAuth-1-co" - xml:id="listingInsertUserAuth-1"> - <para>All GUI related stuff for entering a user's - name</para> - </callout> - - <callout arearefs="listingInsertUserAuth-2-co" - xml:id="listingInsertUserAuth-2"> - <para>Password fields need special treatment: - <code>getText()</code> is superseded by - <code>getPassword()</code>. In order to avoid casts from - <classname>javax.swing.JTextField</classname> to - <classname>javax.swing.JPasswordField</classname> we - simply keep an extra reference.</para> - </callout> - - <callout arearefs="listingInsertUserAuth-3-co" - xml:id="listingInsertUserAuth-3"> - <para>In order to check both password fields for - identical values we need a different validator - <classname>sda.jdbc.intro.auth.EqualValueVerifier</classname> - expecting both password fields in its - constructor.</para> - </callout> - - <callout arearefs="listingInsertUserAuth-4-co" - xml:id="listingInsertUserAuth-4"> - <para>All 5 user input elements get grouped by an array. 
- This allows for iterations like in <coref - linkend="listingInsertUserAuth-7-co"/> or <coref - linkend="listingInsertUserAuth-8-co"/>.</para> - </callout> - - <callout arearefs="listingInsertUserAuth-5-co" - xml:id="listingInsertUserAuth-5"> - <para>Adding all GUI elements to the base pane in a - loop.</para> - </callout> - - <callout arearefs="listingInsertUserAuth-6-co" - xml:id="listingInsertUserAuth-6"> - <para>Providing user entered values to the persistence - provider.</para> - </callout> - - <callout arearefs="listingInsertUserAuth-7-co" - xml:id="listingInsertUserAuth-7"> - <para>Whenever a dataset has been successfully sent to - the database we have to clean our GUI to possibly enter - another record.</para> - </callout> - - <callout arearefs="listingInsertUserAuth-8-co" - xml:id="listingInsertUserAuth-8"> - <para>Thanks to our grouping aggregation of individual - validation states becomes easy.</para> - </callout> - </calloutlist> - </answer> - </qandaentry> - </qandadiv> - </qandaset> + <para>PDF bookmarks are <link + xlink:href="http://www.w3.org/TR/2006/REC-xsl11-20061205/#d0e14206">part + of the XSL-FO 1.1</link> Standard. Some <abbrev + xlink:href="http://www.w3.org/TR/2001/REC-xsl-20011015/slice6.html#fo-section">FO</abbrev> + processors still continue to use proprietary solutions for bookmark + creation with respect to the older <abbrev + xlink:href="http://www.w3.org/TR/2001/REC-xsl-20011015/slice6.html#fo-section">FO</abbrev> + 1.0 standard. For details of bookmark extensions by + <orgname>RenderX</orgname>'s processor see <link + xlink:href="http://www.renderx.com/tutorial.html#PDF_Bookmarks">xep's + documentation</link>.</para> </section> </section> - </chapter> - - <chapter xml:id="chapUnitTesting"> - <title>Unit testing with <productname - xlink:href="http://testng.org">TestNG</productname></title> - - <para>This chapter presents a very short introduction to the basic usage - of unit testing. 
We start with a simple stack implementation:</para> - - <programlisting language="java">package sda.unittesting; - -public class MyStack { - int [] data = new int[5]; - int numElements = 0; - - public void push(final int n) { - data[numElements] = n; - numElements++; - } - public int pop() { - numElements--; - return data[numElements]; - } - public int top() { - return data[numElements - 1]; - } - public boolean empty() { - return 0 == numElements; - } -}</programlisting> - - <para>Readers being familiar with stacks will immediately notice a - deficiency in the above code: This stack is actually bounded. It only - allows us to store a maximum number of five integer values.</para> - - <para>The following implementation allows us to functionally test our - <classname>sda.unittesting.MyStack</classname> implementation with - respect to the usual stack behaviour:</para> - <programlisting language="java" linenumbering="numbered">package sda.unittesting; + <section xml:id="xml2fo"> + <title>Constructing <abbrev + xlink:href="http://www.w3.org/TR/2001/REC-xsl-20011015/slice6.html#fo-section">FO</abbrev> + from XML documents</title> -public class MyStackFuncTest { + <titleabbrev><abbrev + xlink:href="http://www.w3.org/TR/2001/REC-xsl-20011015/slice6.html#fo-section">FO</abbrev> + from XML</titleabbrev> - private static void assertTrue(boolean status) { - if (!status) { - throw new RuntimeException("Assert failed"); - } - } - public static void main(String[] args) { - final MyStack stack = new MyStack(); - // Test 1: A new MyStack instance should not contain any elements. - assertTrue(stack.empty()); + <para>So far we have learnt some basic <abbrev + xlink:href="http://www.w3.org/TR/2001/REC-xsl-20011015/slice6.html#fo-section">FO</abbrev> + elements. As with HTML we typically generate FO code from other + sources rather than crafting it by hand. 
The general picture + is:</para> - // Test 2: Adding and removal - stack.push(4); - assertTrue (!stack.empty()); - assertTrue (4 == stack.top()); - assertTrue (4 == stack.pop()); - assertTrue (stack.empty()); + <figure xml:id="htmlFoProduction"> + <title>Different target formats from common source.</title> - // Test 3: Trying to add more than five values - stack.push(1);stack.push(2);stack.push(3);stack.push(4); - stack.push(5); - stack.push(6); - assertTrue(6 == stack.pop()); - } -}</programlisting> + <mediaobject> + <imageobject> + <imagedata fileref="Ref/Fig/crossmedia.fig"/> + </imageobject> + </mediaobject> - <para>Execution yields a runtime exception which is due to the attempted - insert operation <code>stack.push(6)</code>:</para> + <caption> + <para>We may generate both online and printed documentation from a + common source. This requires style sheets for the desired + destination formats in question.</para> + </caption> + </figure> - <programlisting>Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 5 - at sda.unittesting.MyStack.push(MyStack.java:8) - at sda.unittesting.MyStackFuncTest.main(MyStackFuncTest.java:20)</programlisting> + <para>We discussed the <abbrev + xlink:href="http://www.w3.org/TR/2001/REC-xsl-20011015/slice6.html#fo-section">FO</abbrev> + standard as an input format for printable output production by a + renderer. In this way a <abbrev + xlink:href="http://www.w3.org/TR/2001/REC-xsl-20011015/slice6.html#fo-section">FO</abbrev> + document is similar to HTML being a format to be rendered by a web + browser for visual (screen oriented) output production. The + transformation from a XML source (e.g. a memo document) to <abbrev + xlink:href="http://www.w3.org/TR/2001/REC-xsl-20011015/slice6.html#fo-section">FO</abbrev> + is still missing. As for HTML we may use <abbrev + xlink:href="http://www.w3.org/Style/XSL">XSL</abbrev> as a + transformation means. 
We generate the sender's surname from a memo + document instance:</para> - <para>The execution result is easy to understand since our - <classname>sda.unittesting.MyStack </classname> implementation only - allows to store 5 values.</para> + <figure xml:id="memo2fosurname"> + <title>Generating a sender's surname for printing.</title> - <para>Our testing application is fine so far. It does however lack some - features:</para> + <programlisting><?xml version="1.0" encoding="utf-8"?> +<xsl:stylesheet version="1.0" + xmlns:fo="http://www.w3.org/1999/XSL/Format" + xmlns:xsl="http://www.w3.org/1999/XSL/Transform"> - <itemizedlist> - <listitem> - <para>automatic initialization before starting tests and - finalization at the end.</para> - </listitem> + <xsl:output method="xml" indent="yes"/> - <listitem> - <para>Our test is monolithic: We used comments to document different - tests. This knowledge is implicit and thus invisible to testing - frameworks. Test results (failure/success) cannot be assigned to - test 1, test 2 for example.</para> - </listitem> + <xsl:template match="/"> + <fo:root> + <fo:layout-master-set> + <fo:simple-page-master master-name="simplePageLayout" + page-width="294mm" page-height="210mm" margin="5mm"> + <fo:region-body margin="15mm"/> + </fo:simple-page-master> + </fo:layout-master-set> + <fo:page-sequence master-reference="simplePageLayout"> + <fo:flow flow-name="xsl-region-body"> + <fo:block font-size="20pt"> + <xsl:text>Sender:</xsl:text> + <fo:inline font-weight='bold'> + <xsl:value-of select="memo/from/surname"/> + </fo:inline> + </fo:block> + </fo:flow> + </fo:page-sequence> + </fo:root> + </xsl:template> +</xsl:stylesheet></programlisting> + </figure> - <listitem> - <para>Aggregation and visualization of test results</para> - </listitem> + <para>A suitable XML document instance reads:</para> - <listitem> - <para>Dependencies between individual tests</para> - </listitem> + <figure xml:id="memoMessage"> + <title>A <code>memo</code> document 
instance.</title> - - <listitem> - <para>Ability to enable and disable tests according to a project's - maturity level. In our example test 3 might be disabled till an - unbounded implementation gets completed.</para> - </listitem> - </itemizedlist> + <programlisting><?xml version="1.0" ?> +<!DOCTYPE memo SYSTEM "memo.dtd"> +<memo> + <from> + <name>Martin</name> + <surname>Goik</surname> + </from> + <to> + <name>Adam</name> + <surname>Hacker</surname> + </to> + <to> + <name>Eve</name> + <surname>Intruder</surname> + </to> + <date year="2005" month="1" day="6"/> + <subject>Firewall problems</subject> + <content> + <para>Thanks for your excellent work.</para> + <para>Our firewall is definitely broken!</para> + </content> +</memo></programlisting> + </figure> - <para>Testing frameworks like <productname - xlink:href="http://junit.org">JUnit</productname> or <productname - xlink:href="http://testng.org">TestNG</productname> provide means for - efficient and flexible test organization. Using <productname - xlink:href="http://testng.org">TestNG</productname> our current test - application including only test 1 and test 2 reads:</para> + <para>Some remarks:</para> - <programlisting language="java">package sda.unittesting; + <orderedlist> + <listitem> + <para>The <link + xlink:href="http://www.w3.org/TR/2007/REC-xslt20-20070123/#element-stylesheet">xsl_stylesheet</link> + element contains a namespace definition for the target FO + document's namespace, namely:</para> -import org.testng.annotations.Test; + <programlisting>xmlns:fo="http://www.w3.org/1999/XSL/Format"</programlisting> -public class MyStackTestSimple { + <para>This is required to use elements like <link + xlink:href="http://www.w3.org/TR/xsl/#fo_block">fo:block</link> + belonging to the FO namespace.</para> + </listitem> - final MyStack stack = new MyStack(); - - @Test - public void empty() { - assert(stack.empty()); - } - @Test - public void pushPopEmpty() { - assert (stack.empty()); - stack.push(4); - 
assert (!stack.empty()); - assert (4 == stack.top()); - assert (4 == stack.pop()); - assert (stack.empty()); - } -}</programlisting> + <listitem> + <para>The option value <code>indent="yes"</code> in <link + xlink:href="http://www.w3.org/TR/2007/REC-xslt20-20070123/#element-output">xsl_output</link> + is usually set to "no" in a production environment to avoid + whitespace related problems.</para> + </listitem> - <para>We notice the absence of a <function>main()</function> method. Our - testing framework uses the above code for test definitions. In contrast - to our homebrew solution the individual tests are now defined in a - machine readable fashion. This allows for sophisticated statistics. - Executing inside <productname - xlink:href="http://testng.org">TestNG</productname> produces the - following results:</para> + <listitem> + <para>The generation of a print format like PDF is actually a two + step process. To generate message.pdf from message.xml by a + stylesheet memo2fo.xsl we need the following calls:</para> - <programlisting>PASSED: empty -PASSED: pushPopEmpty + <variablelist> + <varlistentry> + <term><emphasis>XML document instance to FO</emphasis></term> -=============================================== - Default test - Tests run: 2, Failures: 0, Skips: 0 -=============================================== + <listitem> + <programlisting>xml2xml message.xml memo2fo.xsl -o message.fo</programlisting> + </listitem> + </varlistentry> + <varlistentry> + <term><emphasis>FO to PDF</emphasis></term> -=============================================== -Default suite -Total tests run: 2, Failures: 0, Skips: 0 -===============================================</programlisting> + <listitem> + <programlisting>fo2pdf -fo message.fo -pdf message.pdf</programlisting> + </listitem> + </varlistentry> + </variablelist> - <para>Both tests run successfully. So why did we omit test 3 which is - bound to fail? 
We now add it to the test suite:</para> + <mediaobject> + <imageobject> + <imagedata fileref="Ref/Fig/xml2fo2pdf.fig"/> + </imageobject> + </mediaobject> - <programlisting language="java">package sda.unittesting; -... -public class MyStackTestSimple1 { -... - @Test - public void empty() { - assert(stack.empty()); -... - - @Test - public void push6() { - stack.push(1); - stack.push(2); - stack.push(3); - stack.push(4); - stack.push(5); - stack.push(6); - assert (6 == stack.pop()); - } ...</programlisting> + <para>When debugging of the intermediate <abbrev + xlink:href="http://www.w3.org/TR/2001/REC-xsl-20011015/slice6.html#fo-section">FO</abbrev> + file is not required both steps may be combined into a single + call:</para> - <para>As expected test 3 fails. But the result shows test 2 failing as - well:</para> + <programlisting>fo2pdf -xml message.xml -xsl memo2fo.xsl -pdf message.pdf</programlisting> + </listitem> + </orderedlist> + </section> - <programlisting>PASSED: empty -FAILED: push6 -java.lang.ArrayIndexOutOfBoundsException: 5 - at sda.unittesting.MyStack.push(MyStack.java:8) - at sda.unittesting.MyStackTestSimple1.push6(MyStackTestSimple1.java:30) - at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) - ... + <section xml:id="foCatalog"> + <title>Formatting a catalog.</title> -FAILED: pushPopEmpty -java.lang.AssertionError - at sda.unittesting.MyStackTestSimple1.pushPopEmpty(MyStackTestSimple1.java:15) - at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) - ... 
+ <titleabbrev>A catalog</titleabbrev> -=============================================== - Default test - Tests run: 3, Failures: 2, Skips: 0 -===============================================</programlisting> + <para>We now take the <link linkend="climbingCatalog">climbing catalog + example</link> with prices being added and incrementally create a + series of PDF versions improving from one version to another.</para> - <para>This unexpected result is due to the execution order of the three - individual tests. Within our class - <classname>sda.unittesting.MyStackTestSimple1</classname> the three - tests appear in the sequence test 1, test 2 and test 3. This however is - just the order of source code. The testing framework will not infer any - order and thus execute our three tests in <emphasis - role="bold">arbitrary</emphasis> order. The execution log shows the - actual order:</para> + <qandaset role="exercise"> + <title>A first PDF version of the catalog</title> - <orderedlist> - <listitem> - <para>Test <quote><code>empty</code></quote></para> - </listitem> + <qandadiv> + <qandaentry xml:id="idCatalogStart"> + <question> + <para>Write a <abbrev + xlink:href="http://www.w3.org/Style/XSL">XSL</abbrev> script + to generate a starting version <filename + xlink:href="Ref/src/Dom/climbenriched.start.pdf">climbenriched.start.pdf</filename>.</para> + </question> - <listitem> - <para>Test <quote><code>push6</code></quote></para> - </listitem> + <answer> + <programlisting><?xml version="1.0" encoding="utf-8"?> +<xsl:stylesheet version="1.0" + xmlns:fo="http://www.w3.org/1999/XSL/Format" + xmlns:xsl="http://www.w3.org/1999/XSL/Transform"> - <listitem> - <para>Test <quote><code>pushPopEmpty</code></quote></para> - </listitem> - </orderedlist> + <xsl:output method="xml" indent="yes"/> - <para>So the second test will raise an exception and leave the stack - filled with the maximum possible five elements. 
Thus it is not empty and - the <quote><code>pushPopEmpty</code></quote> test fails as well.</para> + <xsl:template match="/"> + <fo:root font-size="10pt"> + <fo:layout-master-set> + <fo:simple-page-master master-name="productPage" + page-width="80mm" page-height="110mm" margin="5mm"> + <fo:region-body margin="15mm"/> + <fo:region-before extent="10mm"/> + </fo:simple-page-master> + </fo:layout-master-set> + <xsl:apply-templates select="catalog/product" /> + </fo:root> + </xsl:template> - <para>If we want to avoid this type of errors we may:</para> + <xsl:template match="product"> + <fo:page-sequence master-reference="productPage"> + <fo:static-content flow-name="xsl-region-before"> + <fo:block font-weight="bold"> + <xsl:value-of select="title"/> + </fo:block> + </fo:static-content> + <fo:flow flow-name="xsl-region-body"> + <xsl:apply-templates select="description/para"/> + + <fo:block>Price:<xsl:value-of select="@price"/></fo:block> + <fo:block>Order no:<xsl:value-of select="@id"/></fo:block> + </fo:flow> + </fo:page-sequence> + </xsl:template> - <itemizedlist> - <listitem> - <para>Declare tests within separate (test class) definitions</para> - </listitem> + <xsl:template match="para"> + <fo:block space-after="10px"> + <xsl:value-of select="."/> + </fo:block> + </xsl:template> - <listitem> - <para>Define dependencies like test X can only be executed after - test Y.</para> - </listitem> - </itemizedlist> +</xsl:stylesheet></programlisting> + </answer> + </qandaentry> - <para>The <productname - xlink:href="http://testng.org">TestNG</productname> framework offers a - feature which allows the definition of test groups and dependencies - between them. We use this feature to refine our test definition:</para> + <qandaentry xml:id="idCatalogProduct"> + <question> + <label>Header, page numbers and table formatting</label> - <programlisting language="java">package sda.unittesting; -... -public class MyStackTest { - ... 
- @Test (<emphasis role="bold">groups = "basic"</emphasis>) - public void empty() { - assert(stack.empty()); - } - @Test (<emphasis role="bold">groups = "basic"</emphasis>) - public void pushPopEmpty() { - ... - } - - @Test (<emphasis role="bold">dependsOnGroups = "basic"</emphasis>) - public void push6() { - ... - }</programlisting> + <para>Extend <xref linkend="idCatalogStart"/> by adding page + numbers. The order number and price shall be formatted as a + table. Add a horizontal rule to each page's header. The result should + look like <filename + xlink:href="Ref/src/Dom/climbenriched.product.pdf">climbenriched.product.pdf</filename>.</para> + </question> - <para>The first two tests will now belong to the same test group - <quote>basic</quote>. The <emphasis role="bold"><code>dependsOnGroups = - "basic"</code></emphasis> declaration will guarantee that our - <code>push6</code> test will be launched as the last one. So we get the - expected result:</para> + <answer> + <para>Solution see <filename + xlink:href="Ref/src/Dom/catalog2fo.product.xsl">catalog2fo.product.xsl</filename>.</para> + </answer> + </qandaentry> - <programlisting>PASSED: empty -PASSED: pushPopEmpty -FAILED: push6 -java.lang.ArrayIndexOutOfBoundsException: 5 - at sda.unittesting.MyStack.push(MyStack.java:8) - at sda.unittesting.MyStackTest.push6(MyStackTest.java:30) - at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) -... 
+ <qandaentry xml:id="idCatalogToc"> + <question> + <label>A table of contents.</label> + <para>Each product description's page number shall appear in a + table of contents together with the product's + <code>title</code> as in <filename + xlink:href="Ref/src/Dom/climbenriched.toc.pdf">climbenriched.toc.pdf</filename>.</para> + </question> -=============================================== - Default test - Tests run: 3, Failures: 1, Skips: 0 -===============================================</programlisting> + <answer> + <para>Solution see <filename + xlink:href="Ref/src/Dom/catalog2fo.toc.xsl">catalog2fo.toc.xsl</filename>.</para> + </answer> + </qandaentry> - <para>In fact the order between the first two tests might be critical as - well. The <quote><code>pushPopEmpty</code></quote> test leaves our stack - in an empty state. If this is not the case reversing the execution order - of <quote><code>pushPopEmpty</code></quote> and - <quote><code>empty</code></quote> would cause an error as well.</para> + <qandaentry xml:id="idCatalogToclink"> + <question> + <label>A table of contents with hypertext links.</label> - <para>Programming <abbrev - xlink:href="http://en.wikipedia.org/wiki/Integrated_development_environment">IDE</abbrev>s - like eclipse provide elements for test result visualization. Our last - test gets summarized as:</para> + <para>The table of contents' entries may offer hypertext + features to supporting browsers as in <filename + xlink:href="Ref/src/Dom/climbenriched.toclink.pdf">climbenriched.toclink.pdf</filename>. 
+ In addition include the document's <tag + class="starttag">introduction</tag>.</para> + </question> - <screenshot> - <info> - <title><productname - xlink:href="http://testng.org">TestNG</productname> result - presentation in eclipse</title> - </info> + <answer> + <para>Solution see <filename + xlink:href="Ref/src/Dom/catalog2fo.toclink.xsl">catalog2fo.toclink.xsl</filename>.</para> + </answer> + </qandaentry> - <mediaobject> - <imageobject> - <imagedata fileref="Ref/Screen/eclipseTestngResult.screen.png"/> - </imageobject> - </mediaobject> - </screenshot> + <qandaentry xml:id="idCatalogFinal"> + <question> + <label>A final version.</label> - <para>We can drill down from a result of type failure to its occurrence - within the corresponding code.</para> + <para>Add the following features:</para> + + <orderedlist> + <listitem> + <para>Number the table of contents starting with page i, + ii, iii, iv and so on. Start the product descriptions with + page 1. On each page's footer a text <quote>page xx of + yy</quote> shall be displayed. This requires the + definition of an anchor <code>id</code> on the <abbrev + xlink:href="http://www.w3.org/TR/2001/REC-xsl-20011015/slice6.html#fo-section">FO</abbrev> + document's last page.</para> + </listitem> + + <listitem> + <para>Add PDF bookmarks by using <orgname>XEP</orgname>'s + <abbrev + xlink:href="http://www.w3.org/TR/2001/REC-xsl-20011015/slice6.html#fo-section">FO</abbrev> + extensions. This requires the namespace declaration + <code>xmlns:rx="http://www.renderx.com/XSL/Extensions"</code> + in the XSLT script's header.</para> + </listitem> + </orderedlist> + + <para>The result may look like <filename + xlink:href="Ref/src/Dom/climbenriched.final.pdf">climbenriched.final.pdf</filename>. + N.B.: It may take some effort to achieve this result. 
This + effort is left to the <emphasis>interested</emphasis> + participants.</para> + </question> + + <answer> + <para>Solution see <filename + xlink:href="Ref/src/Dom/catalog2fo.toclink.xsl">catalog2fo.toclink.xsl</filename>.</para> + </answer> + </qandaentry> + </qandadiv> + </qandaset> + </section> </chapter> <appendix> @@ -25408,15 +15507,9 @@ java.lang.ArrayIndexOutOfBoundsException: 5 </appendix> </part> -<<<<<<< HEAD <part xml:id="sda2"> <title annotations="ws/eclipse/HibIntro/target/classes">Structured Data and Applications 2</title> -======= - <part> - <title annotations="ws/eclipse/HibIntro/target/classes">Persistence - strategies and application development</title> ->>>>>>> d6323b6ef90b1907e7f809b7765f28762f5ea52d <chapter xml:id="orm"> <title>Object Relational Mapping</title> -- GitLab