<?xml version="1.0" encoding="UTF-8"?> <book version="5.0" xmlns="http://docbook.org/ns/docbook" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:xi="http://www.w3.org/2001/XInclude" xmlns:svg="http://www.w3.org/2000/svg" xmlns:m="http://www.w3.org/1998/Math/MathML" xmlns:html="http://www.w3.org/1999/xhtml" xmlns:db="http://docbook.org/ns/docbook"> <info> <title>Lecture notes of Martin Goik</title> <author> <personname><firstname>Martin</firstname> <surname>Goik</surname></personname> <affiliation> <orgname>http://medieninformatik.hdm-stuttgart.de</orgname> </affiliation> </author> <legalnotice> <para>Source code available at <uri xlink:href="https://version.mi.hdm-stuttgart.de/git/GoikLectures">https://version.mi.hdm-stuttgart.de/git/GoikLectures</uri></para> </legalnotice> <annotation> <para>ToDo: Figures from old lecture slides.</para> <para>Images and streams, Stored procedures, Transactions</para> </annotation> </info> <glossary xml:id="glossary"> <glossentry xml:id="gloss_Java"> <glossterm><trademark xlink:href="http://www.oracle.com/us/legal/third-party-trademarks/index.html">Java</trademark></glossterm> <glossdef> <para>General purpose programming language with support for object oriented concepts.</para> </glossdef> </glossentry> <glossentry xml:id="gloss_Javadoc"> <glossterm><trademark xlink:href="http://docs.oracle.com/javase/1.5.0/docs/guide/javadoc">Javadoc</trademark></glossterm> <glossdef> <para>Tool for extracting documentation embedded in <link linkend="gloss_Java"><trademark>Java</trademark></link> source code.</para> </glossdef> </glossentry> <glossentry xml:id="gloss_JPA"> <glossterm><abbrev>JPA</abbrev></glossterm> <glossdef> <para><link xlink:href="http://www.javaworld.com/javaworld/jw-01-2008/jw-01-jpa1.html">Java Persistence API</link></para> </glossdef> </glossentry> <glossentry xml:id="gloss_ORM"> <glossterm><abbrev>ORM</abbrev></glossterm> <glossdef> <para>Object relational mapping.</para> </glossdef> </glossentry> <glossentry 
xml:id="gloss_XML"> <glossterm><abbrev>XML</abbrev></glossterm> <glossdef> <para>The <link xlink:href="http://www.w3.org/XML">Extensible Markup Language</link>.</para> </glossdef> </glossentry> </glossary> <part xml:id="sda1"> <title>Structured Data and Applications 1</title> <chapter xml:id="prerequisites"> <title>Prerequisites</title> <section xml:id="resources"> <title>Lecture resources</title> <glosslist> <glossentry> <glossterm>Lecture notes as PDF</glossterm> <glossdef> <para><uri xlink:href="http://www.mi.hdm-stuttgart.de/freedocs/topic/de.hdm_stuttgart.mi.sda1/printversion.pdf">http://www.mi.hdm-stuttgart.de/freedocs/topic/de.hdm_stuttgart.mi.sda1/printversion.pdf</uri></para> <caution> <para>Some figures including videos are left blank.</para> </caution> </glossdef> </glossentry> <glossentry> <glossterm>List of exercises</glossterm> <glossdef> <para>The lecture notes contain exercises to be solved by you! A complete list is available at <uri xlink:href="http://www.mi.hdm-stuttgart.de/freedocs/topic/de.hdm_stuttgart.mi.sda1/apb.html">http://www.mi.hdm-stuttgart.de/freedocs/topic/de.hdm_stuttgart.mi.sda1/apb.html</uri>. You may also use the corresponding PDF version within <filename xlink:href="http://www.mi.hdm-stuttgart.de/freedocs/topic/de.hdm_stuttgart.mi.sda1/printversion.pdf">printversion.pdf</filename> to keep track of your personal advance by filling in your completion status.</para> </glossdef> </glossentry> <glossentry> <glossterm><link linkend="gloss_Javadoc"><trademark>Javadoc</trademark></link> references and source code</glossterm> <glossdef> <para>The lecture notes contain a lot of <link linkend="gloss_Javadoc"><trademark>Javadoc</trademark></link> references. Most classes appearing within these lecture notes have <link linkend="gloss_Javadoc"><trademark>Javadoc</trademark></link> generated links to the source code as well. 
For example when clicking on the class name in <classname>sda.jdbc.intro.v1.SimpleInsert</classname> you will see the complete implementation.</para> </glossdef> </glossentry> <glossentry> <glossterm>Links to animated figures</glossterm> <glossdef> <para>The lecture notes' online version contains links to <uri xlink:href="http://www.mi.hdm-stuttgart.de/freedocs/topic/de.hdm_stuttgart.mi.sda1/jdbcWrite.html">PDF images</uri>. Clicking on <quote>Animated PDF Version</quote> takes you to a referenced PDF which in full screen mode of Acrobat Reader or <trademark>google-chrome</trademark> provides a slide like animation.</para> </glossdef> </glossentry> <glossentry> <glossterm><trademark>Virtualbox</trademark> image</glossterm> <glossdef> <para>A <productname xlink:href="https://www.virtualbox.org">Virtualbox</productname> image is available in the following formats: <glosslist> <glossentry> <glossterm>Split <command>rar</command> archive (100 MB chunks):</glossterm> <glossdef> <para>You may want to use <productname xlink:href="http://jdownloader.org">Jdownloader</productname> or similar tools to download the chunked archive at <uri xlink:href="ftp://mirror.mi.hdm-stuttgart.de/ubuntu/VirtualBox/Rarformat">ftp://mirror.mi.hdm-stuttgart.de/ubuntu/VirtualBox/Rarformat</uri>.</para> <para>If you are using <productname xlink:href="http://jdownloader.org">Jdownloader</productname> a container file is <link xlink:href="ftp://mirror.mi.hdm-stuttgart.de/ubuntu/VirtualBox/Meta/lubuntu.dlc">available here</link> for your convenience. 
If you have configured the flashgot extension in a running <productname xlink:href="http://jdownloader.org">Jdownloader</productname> process you may trigger a download by clicking in <uri xlink:href="ftp://mirror.mi.hdm-stuttgart.de/ubuntu/VirtualBox/Meta/lubuntu.html">ftp://mirror.mi.hdm-stuttgart.de/ubuntu/VirtualBox/Meta/lubuntu.html</uri>.</para> </glossdef> </glossentry> <glossentry> <glossterm>Uncompressed raw image:</glossterm> <glossdef> <para><uri xlink:href="ftp://mirror.mi.hdm-stuttgart.de/ubuntu/VirtualBox/lubuntu.vdi">ftp://mirror.mi.hdm-stuttgart.de/ubuntu/VirtualBox/lubuntu.vdi</uri></para> </glossdef> </glossentry> </glosslist> It contains (hopefully) all related tools from the <link xlink:href="http://www.mi.hdm-stuttgart.de">CSM</link> department's lecture room Linux installation:</para> <itemizedlist> <listitem> <para>Eclipse J2EE version with <productname xlink:href="http://www.eclipse.org/datatools">Database developer tools</productname>, <productname xlink:href="http://git-scm.com">git</productname>, <trademark xlink:href="http://oxygenxml.com">Oxygenxml</trademark>, <productname xlink:href="http://testng.org/doc/eclipse.html">TestNG</productname> and <productname xlink:href="http://subversion.apache.org/">svn</productname> plugins installed.</para> </listitem> <listitem> <para>A running <productname xlink:href="http://www.mysql.com/">Mysql</productname> server preconfigured with user <quote><code>hdmuser</code></quote>, password <quote><code>XYZ</code></quote> and database <quote><code>hdm</code></quote>.</para> </listitem> <listitem> <para><productname xlink:href="http://www.xmlmind.com/xmleditor">Xmlmind XML editor</productname> for visually editing technical documents based on <productname xlink:href="http://docbook.org/tdg5/index.html">docbook</productname> or <productname xlink:href="http://en.wikipedia.org/wiki/Darwin_Information_Typing_Architecture">DITA</productname>.</para> </listitem> </itemizedlist> <caution> <para>This VM is only 
accessible from within the <orgname xlink:href="http://www.hdm-stuttgart.de">HdM</orgname> network. External downloads require <productname xlink:href="https://wiki.mi.hdm-stuttgart.de/wiki/VPN">OpenVPN</productname>.</para> </caution> <para>The virtual machine is based on the <productname xlink:href="http://lubuntu.net">Lubuntu</productname> fork of the <productname xlink:href="http://www.ubuntu.com">Ubuntu</productname> Linux distribution for resource saving reasons.</para> </glossdef> </glossentry> <glossentry xml:id="oxygenLicenseKey"> <glossterm><uri>Oxygen Xml Editor</uri> license key</glossterm> <glossdef> <para>This is the only software component in this lecture requiring a license. Your <orgname>HdM</orgname> affiliation entitles you to use the <productname xlink:href="http://oxygenxml.com/">Oxygenxml</productname> software for educational (non-commercial) purposes. The corresponding key is available from <uri xlink:href="ftp://mirror.mi.hdm-stuttgart.de/Firmen/Eclipse/Plugins/Oxygen/Keys/Version14/Student/licensekey.txt">ftp://mirror.mi.hdm-stuttgart.de/Firmen/Eclipse/Plugins/Oxygen/Keys/Version14/Student/licensekey.txt</uri>.</para> <para>This license key is compatible both with the standalone and the eclipse plugin version of the product.</para> <caution> <para>The license key's <abbrev xlink:href="http://en.wikipedia.org/wiki/File_Transfer_Protocol">ftp</abbrev> URL is only accessible from within the <orgname xlink:href="http://www.hdm-stuttgart.de">HdM</orgname> network. 
External access requires <link xlink:href="https://wiki.mi.hdm-stuttgart.de/wiki/VPN">Vpn activation</link>.</para> </caution> </glossdef> </glossentry> <glossentry> <glossterm>Source code of lecture resources</glossterm> <glossdef> <para>The complete lecture sources are available from <link xlink:href="https://version.mi.hdm-stuttgart.de/git/GoikLectures">https://version.mi.hdm-stuttgart.de/git/GoikLectures</link>.</para> <para>You may simply execute <quote><command xlink:href="http://git-scm.com/">git</command> <option>clone</option> <option>https://version.mi.hdm-stuttgart.de/git/GoikLectures</option> <option>.</option></quote> to check out the master tree.</para> </glossdef> </glossentry> <glossentry> <glossterm>Source code of exercises and examples</glossterm> <glossdef> <para>These sources contain a subdirectory ws/eclipse/Jdbc which can be imported as an eclipse project. This allows for browsing solutions to the exercises and executing sample applications. Import into eclipse works the following way:</para> <itemizedlist> <listitem> <para>When starting eclipse choose <filename>.../ws/eclipse</filename> as workspace</para> </listitem> <listitem> <para>In eclipse click <quote>File --> Import --> General --> Existing Projects into Workspace</quote>. After re-selecting the current workspace <filename>.../ws/eclipse</filename> the folder <filename>Jdbc</filename> should be on the list of importable projects.</para> <para>Depending on your eclipse installation you may have to adjust the <link linkend="gloss_Java"><trademark>Java</trademark></link> system libraries. Right click on your project root in the package explorer and choose <quote>Build Path --> Configure Buildpath</quote>. The <quote>JRE System Library</quote> entry in the <quote>Libraries</quote> tab may have to be changed to suit your eclipse's installation needs. 
You may want to create a dummy <link linkend="gloss_Java"><trademark>Java</trademark></link> project to find the correct setting.</para> </listitem> </itemizedlist> </glossdef> </glossentry> </glosslist> </section> <section xml:id="tools"> <title>Tools</title> <para>The subsequent sections describe tools that help you carry out the exercises. These descriptions are suitable for current Linux/Ubuntu systems. However, these tools are available for <trademark>Windows</trademark> or <trademark>Apple</trademark> systems as well. For the latter some command line hints may have to be replaced by using GUI based tools.</para> <para>You may want to use the <link xlink:href="ftp://mirror.mi.hdm-stuttgart.de/ubuntu/VirtualBox/lubuntu.vdi">corresponding</link> <link xlink:href="https://www.virtualbox.org">Virtualbox image</link> containing a complete system, thus avoiding installation hassles. This should work well on reasonably current hardware.</para> <section xml:id="eclipse"> <title><productname xlink:href="http://www.oracle.com/technetwork/java/javase/jdk-7-readme-429198.html">JDK</productname> and Eclipse</title> <para>So you like to take the hard way rather than using <link xlink:href="ftp://mirror.mi.hdm-stuttgart.de/ubuntu/VirtualBox/lubuntu.vdi">the virtualbox image</link>? Good! Real programmers tend to complicate things!</para> <para>The Eclipse IDE will be used as the primary coding tool especially for <link linkend="gloss_Java"><trademark>Java</trademark></link> and XML. You may use different tools such as <productname xlink:href="http://netbeans.org">Netbeans</productname> or <productname xlink:href="http://www.altova.com/de/xmlspy.html">XML-Spy</productname>. There are however some caveats:</para> <itemizedlist> <listitem> <para>Certain functionalities may not be provided.</para> </listitem> <listitem> <para><orgname>HdM</orgname> staff support in case of trouble will be limited to coding and exclude tool support. 
In other words: You are on your own regarding tool related issues.</para> </listitem> </itemizedlist> <para>Installation of eclipse requires a suitable <link linkend="gloss_Java"><trademark>Java</trademark></link> Development Kit.</para> <caution> <para>Your <productname xlink:href="http://www.oracle.com/technetwork/java/javase/jdk-7-readme-429198.html">JDK</productname> selection may be affected by your system's hardware. On a 64 bit system you may install either a 32 bit or a 64 bit <productname xlink:href="http://www.oracle.com/technetwork/java/javase/jdk-7-readme-429198.html">JDK</productname>. If you subsequently install eclipse you must select the 32 or 64 bit version matching your <productname xlink:href="http://www.oracle.com/technetwork/java/javase/jdk-7-readme-429198.html">JDK</productname> choice.</para> </caution> <para>Due to Oracle's (end-user unfriendly) licensing policy you may have to install this component manually. For <productname xlink:href="http://www.ubuntu.com">Ubuntu</productname> and <productname xlink:href="http://www.debian.org">Debian</productname> systems a standard (package manager compatible) procedure is described at <uri xlink:href="http://www.webupd8.org/2012/01/install-oracle-java-jdk-7-in-ubuntu-via.html">http://www.webupd8.org/2012/01/install-oracle-java-jdk-7-in-ubuntu-via.html</uri>. This boils down to the following commands (executed as user root or preceded by <command>sudo</command> <option>...</option>):</para> <programlisting>add-apt-repository ppa:webupd8team/java
apt-get update
apt-get install oracle-jdk7-installer</programlisting> <para>During the installation process you will have to accept Oracle's license terms. If you do so this information will be cached and you will not be asked again when updating via <command>aptitude</command> <option>update</option>; <command>aptitude</command> <option>safe-upgrade</option>. 
After successful installation, executing <command xlink:href="http://www.oracle.com/us/technologies/java">java</command> <option>-version</option> in a shell should show something similar to:</para> <programlisting>goik@goiki:~$ <emphasis role="bold">java -version</emphasis>
java version "1.7.0_07"
Java(TM) SE Runtime Environment (build 1.7.0_07-b10)
Java HotSpot(TM) Server VM (build 23.3-b01, mixed mode)</programlisting> <para>The Eclipse IDE comes <link xlink:href="http://www.eclipse.org/downloads">in various flavours</link> depending on which plugins are bundled. For our purposes the <quote><productname>Eclipse Classic</productname></quote> <link linkend="gloss_Java"><trademark>Java</trademark></link> edition is sufficient. You may however want to install other flavours like <quote><productname>Eclipse IDE for Java EE Developers</productname></quote> if you require features beyond this course's needs. Remember to download the correct 32 or 64 bit version corresponding to your <productname xlink:href="http://www.oracle.com/technetwork/java/javase/jdk-7-readme-429198.html">JDK</productname>.</para> <para>Follow <uri xlink:href="http://askubuntu.com/questions/26632/how-to-install-eclipse#answer-145018">http://askubuntu.com/questions/26632/how-to-install-eclipse#answer-145018</uri> to install eclipse on your system.</para> </section> <section xml:id="oxygenxmlInstall"> <title><productname xlink:href="http://oxygenxml.com">Oxygenxml</productname> plugin</title> <para>Go to <uri xlink:href="http://www.oxygenxml.com/download_oxygenxml_developer.html?os=Eclipse">http://www.oxygenxml.com/download_oxygenxml_developer.html?os=Eclipse</uri>. You may choose between the <quote>Plugin Update site</quote> and <quote>Plugin zip distribution</quote> installation method. 
The latter allows for better long-term Eclipse plugin management.</para> <para>There are two different ways to install Eclipse plugins:</para> <itemizedlist> <listitem> <para>Use Eclipse's built in Update manager by <link xlink:href="http://www.oxygenxml.com/download_oxygenxml_developer.html?os=Eclipse#eclipse_install_instructions">defining a corresponding update site</link>.</para> </listitem> <listitem> <para>Unzip <filename>com.oxygenxml.developer_14.0.0.v2012082911.zip</filename> in a subfolder of <filename>.../eclipse/dropins</filename> and restart eclipse (as root).</para> </listitem> </itemizedlist> <para>See <xref linkend="oxygenLicenseKey"/> for obtaining a license key. You may as well install the standalone version of the Oxygen XML Editor.</para> </section> <section xml:id="testngInstall"> <title><foreignphrase>TestNG</foreignphrase> plugin</title> <para>Some exercises require the TestNG plugin to be installed in the Eclipse IDE. You may proceed in a similar way as described in <uri linkend="oxygenxmlInstall">Oxygenxml</uri>. According to <uri xlink:href="http://testng.org/doc/eclipse.html#eclipse-installation">http://testng.org/doc/eclipse.html#eclipse-installation</uri> the URL needed by Eclipse is <quote>http://beust.com/eclipse</quote>.</para> </section> <section xml:id="mysql"> <title><productname xlink:href="http://www.mysql.com">Mysql</productname> Database components</title> <para>We start by installing the <productname xlink:href="http://www.mysql.com">Mysql</productname> server:</para> <programlisting>root@goiki:~# aptitude install mysql-server
The following NEW packages will be installed:
  libdbd-mysql-perl{a} libdbi-perl{a} libnet-daemon-perl{a} libplrpc-perl{a} mysql-client-5.5{a} mysql-server-5.5
0 packages upgraded, 6 newly installed, 0 to remove and 0 not upgraded.
Need to get 0 B/17.8 MB of archives. After unpacking 63.2 MB will be used.
Do you want to continue? [Y/n/?]</programlisting> <para>Hit <keycap>Y - return</keycap> to start. During the installation you will be asked for the <productname xlink:href="http://www.mysql.com">Mysql</productname> server's <quote>root</quote> (Administrator) password:</para> <programlisting>Package configuration
┌───────────────────────────┤ Configuring mysql-server-5.5 ├────────────────────────────┐
│ While not mandatory, it is highly recommended that you set a password for the MySQL   │
│ administrative "root" user.                                                           │
│                                                                                       │
│ If this field is left blank, the password will not be changed.                        │
│                                                                                       │
│ New password for the MySQL "root" user:                                               │
│                                                                                       │
│ ********_____________________________________________________________________________ │
│                                                                                       │
│                                         <Ok>                                          │
│                                                                                       │
└───────────────────────────────────────────────────────────────────────────────────────┘</programlisting> <para>This has to be entered twice. Keep a <emphasis role="bold">permanent</emphasis> record of this entry. Alternatively set a bookmark to <uri xlink:href="https://help.ubuntu.com/community/MysqlPasswordReset">https://help.ubuntu.com/community/MysqlPasswordReset</uri> for later reference *** and don't blame me! ***.</para> <para>At this point we should be able to connect to our newly installed server. We create a database <quote>hdm</quote> to be used for our exercises:</para> <programlisting>goik@goiki:~$ mysql -u root -p
Enter password:
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 42
Server version: 5.5.24-0ubuntu0.12.04.1 (Ubuntu)

Copyright (c) 2000, 2011, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement. 
mysql> <emphasis role="bold">create database hdm;</emphasis>
Query OK, 1 row affected (0.00 sec)</programlisting> <para>Following <uri xlink:href="https://dev.mysql.com/doc/refman/5.5/en/adding-users.html">https://dev.mysql.com/doc/refman/5.5/en/adding-users.html</uri> we add a new user and grant full access to the newly created database:</para> <programlisting>goik@goiki:~$ mysql -u root -p
Enter password:
...
mysql> CREATE USER 'hdmuser'@'localhost' IDENTIFIED BY 'XYZ';
mysql> use hdm;
mysql> GRANT ALL PRIVILEGES ON hdm.* TO 'hdmuser'@'localhost';
mysql> FLUSH PRIVILEGES;</programlisting> <para>The next step is optional. The <productname xlink:href="http://www.ubuntu.com">Ubuntu</productname> <productname xlink:href="http://www.mysql.com">Mysql</productname> server default configuration allows connections only via the <varname>loopback</varname> interface i.e. <varname>localhost</varname>. If you want your <productname xlink:href="http://www.mysql.com">Mysql</productname> server to listen on the external network interface as well, comment out the bind-address parameter in <filename>/etc/mysql/my.cnf</filename>:</para> <programlisting># Instead of skip-networking the default is now to listen only on
# localhost which is more compatible and is not less secure.
# <emphasis role="bold">bind-address = 127.0.0.1</emphasis></programlisting> <para>Since we are dealing with <link linkend="gloss_Java"><trademark>Java</trademark></link> a <trademark xlink:href="http://electronics.zibb.com/trademark/jdbc/29545026">JDBC</trademark> driver is needed to connect applications to our <productname xlink:href="http://www.mysql.com">Mysql</productname> server:</para> <programlisting>root@goiki:~# aptitude install libmysql-java</programlisting> <para>This provides the file <filename>/usr/share/java/mysql-connector-java-5.1.16.jar</filename> and two symbolic links:</para> <programlisting>goik@goiki:~$ cd /usr/share/java
goik@goiki:/usr/share/java$ ls -al mysql*
-rw-r--r-- 1 ... 2011 <emphasis role="bold">mysql-connector-java-5.1.16.jar</emphasis>
lrwxrwxrwx 1 ... 2011 <emphasis role="bold">mysql-connector-java.jar -> mysql-connector-java-5.1.16.jar</emphasis>
lrwxrwxrwx 1 ... 2011 <emphasis role="bold">mysql.jar -> mysql-connector-java.jar</emphasis></programlisting> </section> </section> <section xml:id="lectureNotes"> <title>Lecture related resources</title> <para>The sources for lecture notes and exercises are available from the <orgname xlink:href="http://www.mi.hdm-stuttgart.de">MIB</orgname> <productname xlink:href="http://git-scm.com">git</productname> repository:</para> <para><uri xlink:href="https://version.mi.hdm-stuttgart.de/git/GoikLectures">https://version.mi.hdm-stuttgart.de/git/GoikLectures</uri></para> <para>Check-out is straightforward:</para> <programlisting>goik@goiki:~$ mkdir StructuredData;cd StructuredData
goik@goiki:~/StructuredData$ git clone https://version.mi.hdm-stuttgart.de/git/GoikLectures .
Cloning into '.'...
remote: Counting objects: 694, done
...
Resolving deltas: 100% (296/296), done.</programlisting> <para>After checkout an eclipse workspace holding the complete example source code becomes visible:</para> <programlisting>goik@goiki:~/StructuredData$ cd ws/eclipse
goik@goiki:~/StructuredData/ws/eclipse$ ls -al
total 16
drwxr-xr-x 3 goik fb1prof 4096 Nov 8 22:04 .
drwxr-xr-x 4 goik fb1prof 4096 Nov 8 22:04 ..
-rw-r--r-- 1 goik fb1prof   11 Nov 8 22:04 .gitignore
<emphasis role="bold">drwxr-xr-x 6 goik fb1prof 4096 Nov 8 22:04 Jdbc</emphasis></programlisting> <para>The subdirectory <filename>Jdbc</filename> can be imported as an eclipse project via <quote>File --> Import --> General --> Existing Projects into Workspace</quote>. This should enable each participant to browse and execute the examples provided in the lecture notes. 
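</para> <para>As a rough orientation, the following sketch shows the kind of code these examples revolve around: opening a <trademark>JDBC</trademark> connection from <trademark>Java</trademark>. It is not taken from the repository. The class name <classname>ConnectSketch</classname> and the helper <methodname>jdbcUrl</methodname> are made up for illustration, and host, port, database and credentials are assumptions based on the values used in <xref linkend="mysql"/> (port 3306 being the Mysql default).</para> <programlisting language="java">import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

// Illustrative sketch, not part of the lecture repository.
public class ConnectSketch {

  // Hypothetical helper assembling a Mysql JDBC URL from its parts.
  static String jdbcUrl(String host, int port, String database) {
    return "jdbc:mysql://" + host + ":" + port + "/" + database;
  }

  public static void main(String[] args) {
    // Assumed values: local server, default port, database and account from the Mysql section.
    String url = jdbcUrl("localhost", 3306, "hdm");
    // try-with-resources (Java 7) closes the connection automatically.
    try (Connection conn = DriverManager.getConnection(url, "hdmuser", "XYZ")) {
      System.out.println("Connected to " + conn.getMetaData().getDatabaseProductName());
    } catch (SQLException e) {
      // Typical causes: server not running, wrong credentials, driver jar missing.
      System.err.println("Connection failed: " + e.getMessage());
    }
  }
}</programlisting> <para>Running the sketch requires a reachable Mysql server and a <trademark>JDBC</trademark> driver on the classpath, which is where the <filename>Jdbc</filename> project comes in.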
The project also contains the <productname xlink:href="http://www.mysql.com">Mysql</productname> driver <filename>Jdbc/lib/mysql-connector-java-5.1.16.jar</filename> required to set up a <trademark xlink:href="http://electronics.zibb.com/trademark/jdbc/29545026">JDBC</trademark> connection.</para> </section> <section xml:id="toolingConfigJdbc"> <titleabbrev>Tooling</titleabbrev> <title>Tooling: Configuring and using the <link xlink:href="http://www.eclipse.org/datatools">Eclipse database development</link> plugin</title> <para>For some basic SQL communications the Eclipse environment offers a standard plugin (Database development). Establishing connections to a specific database server generally requires prior installation of a <trademark xlink:href="http://electronics.zibb.com/trademark/jdbc/29545026">JDBC</trademark> driver on the client side as shown in the following video:</para> <figure xml:id="figureConfigJdbcDriver"> <title>Adding a <trademark xlink:href="http://electronics.zibb.com/trademark/jdbc/29545026">JDBC</trademark> Driver for <productname xlink:href="http://www.mysql.com">Mysql</productname> to the database plugin.</title> <mediaobject> <videoobject> <videodata fileref="Ref/Video/jdbcDriverConfig.mp4"/> </videoobject> </mediaobject> </figure> <para>During the exercises the eclipse database development perspective may be used to browse and structure SQL tables and data. The following video demonstrates the configuration of a <trademark xlink:href="http://electronics.zibb.com/trademark/jdbc/29545026">JDBC</trademark> connection to a local (<varname>localhost</varname> network interface) database server. 
With respect to the introduction given in <xref linkend="mysql"/> we assume the existence of a database <code>hdm</code> and a corresponding account (user <code>hdmuser</code>, password <code>XYZ</code>) on our database server.</para> <figure xml:id="figureConfigJdbcConnection"> <title>Configuring a <trademark xlink:href="http://electronics.zibb.com/trademark/jdbc/29545026">JDBC</trademark> connection to a (local) <productname xlink:href="http://www.mysql.com">Mysql</productname> database server.</title> <mediaobject> <videoobject> <videodata fileref="Ref/Video/jdbcConnection.mp4"/> </videoobject> </mediaobject> </figure> <para>We are now ready to communicate with our database server. The last video in this section shows some basic SQL tasks:</para> <figure xml:id="figureEclipseBasicSql"> <title>Executing SQL statements, browsing schema and retrieving data</title> <mediaobject> <videoobject> <videodata fileref="Ref/Video/eclipseBasicSql.mp4"/> </videoobject> </mediaobject> </figure> </section> </chapter> <chapter xml:id="xmlIntro"> <title>Introduction to XML</title> <section xml:id="xmlBasic"> <title>The XML industry standard</title> <para>A short question might be: <quote>What is XML?</quote> An answer might be: The acronym XML stands for <quote>E<emphasis>x</emphasis>tensible <emphasis>M</emphasis>arkup <emphasis>L</emphasis><foreignphrase>anguage</foreignphrase></quote> and is an industry standard published by the W3C standardization organization. As with other industry software standards, talking about XML means talking about XML based software: applications and frameworks that add value for software implementors and enhance data exchange between applications.</para> <para>Many readers are already familiar with XML without explicitly referring to the standard itself: The world wide web's <foreignphrase>lingua franca</foreignphrase> HTML has been ported to an XML dialect forming the <link xlink:href="http://www.w3.org/MarkUp">XHTML</link> Standard. 
The idea behind this standard is to distinguish between an abstract markup language and rendered results being generated from so called document instances by a browser:</para> <figure xml:id="renderXhtmlMarkup"> <title>Rendering XHTML markup</title> <mediaobject> <imageobject> <imagedata fileref="Ref/Fig/xhtml.fig"/> </imageobject> </mediaobject> </figure> <para>Xhtml is actually a good example to illustrate the tree like, hierarchical structure of XML documents:</para> <figure xml:id="xhtmlTree"> <title>Xhtml tree structure</title> <mediaobject> <imageobject> <imagedata fileref="Ref/Fig/xhtmlexample.fig" scale="65"/> </imageobject> </mediaobject> </figure> <para>We may extend this example by representing a mathematical formula via a standard called <link xlink:href="http://www.w3.org/Math">Mathml</link>:</para> <figure xml:id="mathmlExample"> <title>A formula in <link xlink:href="http://www.w3.org/Math">MathML</link> representation.</title> <mediaobject> <imageobject> <imagedata fileref="Ref/Fig/sqrtrender.fig"/> </imageobject> </mediaobject> </figure> <para>Again we observe a similar situation: A database like <emphasis>representation</emphasis> of a formula on the left and a <emphasis>rendered</emphasis> version on the right. Regarding XML we have:</para> <itemizedlist> <listitem> <para>The <link xlink:href="http://www.w3.org/Math">MathML</link> standard intended to describe mathematical formulas. The standard defines a set of <emphasis>tags</emphasis> like e.g. <tag class="starttag">math:msqrt</tag> with well-defined semantics regarding permitted attribute values and nesting rules.</para> </listitem> <listitem> <para>Informal descriptions of formatting expectations.</para> </listitem> <listitem> <para>Software transforming an XML formula representation into visible or printable output. In other words: A rendering engine.</para> </listitem> </itemizedlist> <para>XML documents may also be regarded as a persistence mechanism to represent and store data. 
Similarities to Relational Database Systems exist. An RDBMS (<emphasis>R</emphasis><foreignphrase>elational</foreignphrase> <emphasis>D</emphasis><foreignphrase>atabase</foreignphrase> <emphasis>M</emphasis><foreignphrase>anagement</foreignphrase> <emphasis>S</emphasis><foreignphrase>ystem</foreignphrase>) is typically capable of holding terabytes of data organized in tables. The arrangement of data may be subject to various constraints like candidate key or foreign key rules. With respect to both end users and software developers an RDBMS itself is a building block in a complete solution. We need an application on top of it acting as a user interface to the data it contains.</para> <para>In contrast to an RDBMS, XML allows data to be organized hierarchically. The <link xlink:href="http://www.w3.org/Math">MathML</link> representation given in <xref linkend="mathmlExample"/> may be graphically visualized:</para> <figure xml:id="mathmltree"> <title>A tree graph representation of the <link xlink:href="http://www.w3.org/Math">MathML</link> example given before.</title> <mediaobject> <imageobject> <imagedata fileref="Ref/Fig/sqrtree.fig"/> </imageobject> </mediaobject> </figure> <para>CAD applications may use XML documents as a representation of graphical primitives:</para> <informalfigure> <mediaobject> <imageobject> <imagedata fileref="Ref/Fig/attributes.fig" scale="65"/> </imageobject> </mediaobject> </informalfigure> <para>Of course RDBMS also allow the representation of tree like structures or arbitrary graphs. But these have to be modelled by using foreign key constraints since relational tables themselves have a <quote>flat</quote> structure. 
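</para> <para>The hierarchical organization of XML data can be made concrete with a small <trademark>Java</trademark> sketch, assuming the standard DOM API shipped with the JDK. The class <classname>TreeSketch</classname>, its methods and the element names are chosen for illustration only and do not reproduce the MathML example exactly.</para> <programlisting language="java">import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

// Illustrative sketch: builds a small element tree and prints its nesting.
public class TreeSketch {

  // Returns one line per element, indented by two blanks per nesting level.
  static String outline(Node node, String indent) {
    StringBuilder sb = new StringBuilder(indent + node.getNodeName() + "\n");
    for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
      if (child.getNodeType() == Node.ELEMENT_NODE) {
        sb.append(outline(child, indent + "  "));
      }
    }
    return sb.toString();
  }

  // Builds an assumed formula-like tree: math contains msqrt, which contains mi and mn.
  static Element sample() {
    try {
      Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
      Element math = doc.createElement("math");
      Element msqrt = doc.createElement("msqrt");
      doc.appendChild(math);
      math.appendChild(msqrt);
      msqrt.appendChild(doc.createElement("mi"));
      msqrt.appendChild(doc.createElement("mn"));
      return math;
    } catch (ParserConfigurationException e) {
      throw new IllegalStateException("No DOM implementation available", e);
    }
  }

  public static void main(String[] args) {
    // Prints the element names, each nested level indented further.
    System.out.print(outline(sample(), ""));
  }
}</programlisting> <para>Storing the same nesting in relational tables would require an explicit reference from each row to its parent.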
Some RDBMS vendors provide extensions to the SQL standard which allow <quote>native</quote> representations of <link linkend="gloss_XML"><abbrev>XML</abbrev></link> documents.</para> </section> <section xml:id="xmlHtml"> <title>Well formed XML documents</title> <para>The general structure of an <link linkend="gloss_XML"><abbrev>XML</abbrev></link> document is as follows:</para> <figure xml:id="xmlbase"> <title><link linkend="gloss_XML"><abbrev>XML</abbrev></link> basic structure</title> <mediaobject> <imageobject> <imagedata fileref="Ref/Fig/xmlbase.fig" scale="65"/> </imageobject> </mediaobject> </figure> <para>We explore a simple XML document representing messages like E-mails:</para> <figure xml:id="memoWellFormed"> <title>The representation of a short message.</title> <programlisting><?xml<co xml:id="first_xml_code_magic"/> version="1.0"<co xml:id="first_xml_code_version"/> encoding="UTF-8"<co xml:id="first_xml_code_encoding"/>?>
<memo><co xml:id="first_xml_code_topelement"/>
  <from>M. Goik</from><co xml:id="first_xml_code_from"/>
  <to>B. King</to>
  <to>A. June</to>
  <subject>Best wishes</subject>
  <content>Hi all, congratulations to your splendid party</content>
</memo></programlisting> </figure> <calloutlist> <callout arearefs="first_xml_code_magic"> <para>The very first characters <code><?xml</code> may be regarded as a <link xlink:href="http://en.wikipedia.org/wiki/Magic_number_(programming)">magic number string</link> used as a format indicator which allows distinguishing between different file types such as GIF, JPEG, HTML and so on.</para> </callout> <callout arearefs="first_xml_code_version"> <para>The <code>version="1.0"</code> attribute tells us that all subsequent lines will conform to the <link xlink:href="http://www.w3.org/TR/xml">XML</link> standard of version 1.0. This way a document can express its conformance to the version 1.0 standard even if in the future this standard evolves to a higher version e.g. 
<code>version="2.1"</code>.</para> </callout> <callout arearefs="first_xml_code_encoding"> <para>The attribute <code>encoding="UTF-8"</code> tells us that all text in the current document is encoded as UTF-8, an encoding of the <link xlink:href="http://unicode.org">Unicode</link> character set. <link xlink:href="http://unicode.org">Unicode</link> is a widely accepted industry standard for character encoding. Thus European, Cyrillic and most Asian scripts may be used in a document <emphasis>simultaneously</emphasis>. Other encodings may limit the set of allowed characters, e.g. <code>encoding="ISO-8859-1"</code> will only allow characters belonging to western European languages. However a system also needs to have the corresponding fonts (e.g. TrueType) installed in order to render the document appropriately. A document containing Chinese characters is of no use if the underlying rendering system lacks e.g. a set of Chinese TrueType fonts.</para> </callout> <callout arearefs="first_xml_code_topelement"> <para>An XML document has exactly one top level <emphasis>node</emphasis>. In contrast to the HTML standard these nodes are commonly called elements rather than tags. In this example the top level (root) element is <tag class="starttag">memo</tag>.</para> </callout> <callout arearefs="first_xml_code_from"> <para>Each XML start tag like <tag class="starttag">from</tag> has a corresponding end tag <tag class="endtag">from</tag>. In terms of XML we say each element being opened has to be closed. 
In conjunction with the preceding point this reflects the fact that each XML document represents a tree structure as shown in the <link linkend="mathmltree">tree graph</link> representation.</para> </callout> </calloutlist> <para>As with the introductory formula example this representation itself is of limited usefulness: In an office environment we need a rendered version, either in print or in some online format like E-Mail or HTML.</para> <para>From a software developer's point of view we may use a piece of software called a <emphasis>parser</emphasis> to test the document's standard conformance. At the MI department we may simply invoke <userinput><command>xmlparse</command> message.xml</userinput> to start a check:</para> <programlisting><errortext>goik>xmlparse wellformed.xml Parsing was successful</errortext></programlisting> <para>Various XML related plugins are supplied for the <productname xlink:href="http://eclipse.org">eclipse platform</productname> like the <productname xlink:href="http://oxygenxml.com">Oxygen software</productname> supplying <quote>live</quote> conformance checking while editing XML documents. Now we test our assumptions by violating some of the rules stated before. We deliberately omit the closing tag <tag class="endtag">from</tag>:</para> <figure xml:id="omitFrom"> <title>An XML document that is not well-formed due to the omission of <tag class="endtag">from</tag>.</title> <programlisting><?xml version="1.0" encoding="UTF-8"?> <memo> <from>M. Goik <co xml:id="omitFromMissingElement"/> <to>B. King</to> <to>A. 
June</to> <subject>Best wishes</subject> <content>Hi all, congratulations to your splendid party</content> </memo></programlisting> <calloutlist> <callout arearefs="omitFromMissingElement"> <para>The opening element <tag class="starttag">from</tag> is not terminated by <tag class="endtag">from</tag>.</para> </callout> </calloutlist> </figure> <para>Consequently the parser's output reads:</para> <programlisting><errortext>goik>xmlparse omitfrom.xml file:///ma/goik/workspace/Vorlesungen/Input/Memo/omitfrom.xml:8:3: fatal error org.xml.sax.SAXParseException: The element type "from" must be terminated by the matching end-tag "</from>". parsing error</errortext></programlisting> <para>Experienced HTML authors may be confused since HTML tolerates many omitted end tags: In fact HTML is not an XML standard. Instead HTML belongs to the set of SGML applications. SGML is a much older standard namely the <emphasis>Standard Generalized Markup Language</emphasis>.</para> <para>Even if every XML element has a closing counterpart the resulting XML may still not be well-formed:</para> <programlisting><?xml version="1.0" encoding="UTF-8"?> <memo> <from>M. Goik<to>B. King</from></to> <to>A. June</to> <subject>Best wishes</subject> <content>Hi all, congratulations to your splendid party</content> </memo></programlisting> <para>The parser reports:</para> <programlisting><computeroutput>file:///ma/goik/workspace/Vorlesungen/Input/Memo/nonest.xml:3:29: fatal error org.xml.sax.SAXParseException: The element type "to" must be terminated by the matching end-tag "</to>". parsing error</computeroutput></programlisting> <para>This type of error is caused by so called improper nesting of elements: The element <tag class="starttag">from</tag> is closed before the <quote>inner</quote> element <tag class="starttag">to</tag> has been closed. This makes it impossible to express the document as a tree structure. The situation may be resolved by choosing:</para> <programlisting>...<from>M. Goik<to>B. 
King</to></from>...</programlisting> <para>We provide two examples illustrating proper and improper nesting of XML elements:</para> <figure xml:id="fig_nestingProper"> <title>Proper nesting of XML elements</title> <mediaobject> <imageobject> <imagedata fileref="Ref/Fig/propernest.fig" scale="65"/> </imageobject> </mediaobject> </figure> <para>The following example violates the proper nesting constraint and thus does not constitute an XML document:</para> <figure xml:id="fig_improperNest"> <title>Improperly nested elements</title> <mediaobject> <imageobject> <imagedata fileref="Ref/Fig/impropernest.fig"/> </imageobject> </mediaobject> </figure> <!-- goik:later <para>An animation showing the usage of the Oxygen plug in for the examples given above can be found <uri xlink:href="src/viewlet/wellformed/wellformed_viewlet_swf.html">here</uri>.</para> --> <para>XML elements may have so called attributes like <tag class="attribute">date</tag> in the following example:</para> <figure xml:id="memoWellAttrib"> <title>An XML document with attributes.</title> <programlisting><?xml version="1.0" encoding="UTF-8"?> <memo date="10.02.2006" priority="high"> <from>M. Goik</from> <to>B. King</to> <to>A. June</to> <subject>Best wishes</subject> <content>Hi all, congratulations to your splendid party</content> </memo></programlisting> </figure> <para>The conformance of an XML document with the following rules may be verified by invoking a parser:</para> <itemizedlist> <listitem> <para>Within the <emphasis>scope</emphasis> of a given element an attribute name must be unique. In the example above one may not define a second attribute <varname>date="..."</varname> within the same element <memo ... >. 
This reflects the usual programming language semantics of attributes: In a <link linkend="gloss_Java"><trademark>Java</trademark></link> class an attribute is represented by a unique identifier and thus cannot appear twice.</para> </listitem> <listitem> <para>An attribute value must be enclosed either in single (') or double (") quotes. This is different from the HTML standard which allows attribute values without quotes provided the given attribute value does not give rise to ambiguities. For example <TD align=left> is allowed since the attribute value <tag class="attvalue">left</tag> does not contain any spaces thus allowing a parser to recognize the end of the value's definition.</para> </listitem> </itemizedlist> <qandaset role="exercise"> <title>A graphical representation of a memo.</title> <qandadiv> <qandaentry xml:id="example_memoAttribTree"> <question> <para>Draw a graphical representation similar to <xref linkend="mathmltree"/> of the memo document given in <xref linkend="memoWellAttrib"/>.</para> </question> <answer> <para>The <link linkend="memoWellAttrib">memo document's</link> structure may be visualized as:</para> <informalfigure xml:id="memotreeFigure"> <para>A graphical representation of <xref linkend="memoWellAttrib"/>:</para> <informalfigure xml:id="memotreeFigureFalse"> <mediaobject> <imageobject> <imagedata fileref="Ref/Fig/memotree.fig"/> </imageobject> </mediaobject> </informalfigure> <para>The sequence of <emphasis>element</emphasis> child nodes is important in XML and has to be preserved. 
Only the order of the two attributes <tag class="attribute">date</tag> and <tag class="attribute">priority</tag> is insignificant: They belong to the <tag class="starttag">memo</tag> node, which acts as a dictionary with the attribute names being the keys and the attribute values being the values of the dictionary.</para> </informalfigure> </answer> </qandaentry> <qandaentry xml:id="example_attribInQuotes"> <question> <label>Attributes and quotes</label> <para>As stated before XML attribute values have to be enclosed in single or double quotes. Construct an XML document with mixed quotes like <code><date day="monday'></code>. How does the parser react? Find the corresponding syntax definition of legal attribute values in the <link xlink:href="http://www.w3.org/TR/xml">XML standard W3C Recommendation</link>.</para> </question> <answer> <para>The parser flags a mixture of single and double quotes for a given attribute as an error. The XML standard <link xlink:href="http://www.w3.org/TR/xml#NT-AttValue">defines</link> the syntax of attribute values: An attribute value has to be enclosed <emphasis>either</emphasis> in two single <emphasis>or</emphasis> in two double quotes as defined in <uri xlink:href="http://www.w3.org/TR/xml/#NT-AttValue">http://www.w3.org/TR/xml/#NT-AttValue</uri>.</para> </answer> </qandaentry> <qandaentry xml:id="quoteInAttributValue"> <question> <label>Quotes as part of an attribute's value?</label> <para>Single and double quotes are used to delimit an attribute value. May quotes themselves appear as part of an attribute's value, e.g. like in a person's name <code>Gary "King" Mandelson</code>?</para> </question> <answer> <para>Attribute values may contain double quotes if the attribute's value is enclosed in single quotes and vice versa. 
As a limitation the value of an attribute may not contain literal single quotes and literal double quotes at the same time:</para> <informalfigure xml:id="exampleSingleDoubleQuotes"> <para>Quotes as part of attribute values.</para> <programlisting><?xml version="1.0" encoding="UTF-8"?> <test> <person name='Gary "King" Mandelson'/> <!-- o.k. --> <person name="Gary 'King' Mandelson"/> <!-- o.k. --> <person name="Gary 'King 'S.' "Mandelson"'/> <!-- oops! --> </test></programlisting> </informalfigure> </answer> </qandaentry> </qandadiv> </qandaset> <para>Some constraints being imposed on XML documents by the standard defined so far may be summarized as:</para> <itemizedlist> <listitem> <para>An XML document must have exactly one top level element.</para> </listitem> <listitem> <para>Elements have to be properly nested. An element must not be closed if an <quote>inner</quote> element is still open.</para> </listitem> <listitem> <para>Attribute names within a given element must be unique.</para> </listitem> <listitem> <para>Attribute values <emphasis>must</emphasis> be quoted correctly.</para> </listitem> </itemizedlist> <para>These rules reveal several differences to the HTML standard: In HTML a lot of elements don't have to be closed. For example paragraphs (<tag class="starttag">p</tag>) or images (<tag class="starttag">img src='foo.gif'</tag>) don't have to be closed explicitly. This is due to the fact that HTML used to be defined in accordance with the older <emphasis><emphasis role="bold">S</emphasis>tandard <emphasis role="bold">G</emphasis>eneralized <emphasis role="bold">M</emphasis>arkup <emphasis role="bold">L</emphasis>anguage</emphasis> (SGML) standard.</para> <para>These constraints are part of the definition of a <link xlink:href="http://www.w3.org/TR/xml#sec-well-formed">well formed document</link>. The specification imposes additional constraints for a document to be well-formed. 
Some of these constraints require an understanding of so called entities being described in <xref linkend="chapter_entities"/>.</para> </section> </chapter> <chapter xml:id="dtd"> <title>Beyond well-formedness</title> <section xml:id="motivationDdt"> <title>Motivation</title> <para>So far we are able to create XML documents containing hierarchically structured data. We may nest elements and thus create tree structures of arbitrary depth. The only restrictions being imposed by the XML standard are the constraints of well-formedness. For many purposes in software development this is not sufficient.</para> <para>A company named <productname>Softmail</productname> might implement an email system which uses <link linkend="memoWellAttrib">memo</link> document files as low level data representation serving as a persistence layer. Now a second company named <productname>Hardmail</productname> wants to integrate mails generated by <productname>Softmail</productname>'s system into its own business product. The <productname>Hardmail</productname> software developers might <emphasis>infer</emphasis> the logical structure of <productname>Softmail</productname>'s email representation but the following problems arise:</para> <itemizedlist> <listitem> <para>The logical structure will in practice become more complex: E-mails may contain attachments leading to multi part messages. Additional header information is required for standard Internet mail compliance. This adds additional complexity to the XML structure being mandatory for data representation. Relying only on well-formedness the specification of an internal E-mail format can only be achieved <emphasis>informally</emphasis>. Thus a rule like <quote>Each E-mail must have a subject</quote> may be written down in the specification. 
A software developer will code these rules but probably make mistakes as the set of rules grows.</para> <para>In contrast a RDBMS based solution offers to solve such problems in a declarative manner: A developer may use a <code>NOT NULL</code> constraint on a subject attribute of type <code>VARCHAR</code> thus inhibiting empty subjects.</para> </listitem> <listitem> <para>As <productname>Softmail</productname>'s product evolves its internal E-mail XML format is subject to change due to functional extensions and possibly bug fixes both giving rise to interoperability problems.</para> </listitem> </itemizedlist> <para>Generally speaking well formed XML documents lack grammar constraints as being available for programming languages. In case of RDBMS developers can impose primary-, foreign and <code>CHECK</code> constraints in a <emphasis>declarative</emphasis> manner rather than hard coding them into their applications (A solution bad programmers are in favour of though...). Various XML standards exist for declarative constraint definitions namely:</para> <itemizedlist> <listitem> <para>Document Type Definitions being discussed in <xref linkend="dtdBasic"/>.</para> </listitem> <listitem> <para><link xlink:href="http://www.w3.org/XML/Schema">XML Schema</link></para> </listitem> <listitem> <para><link xlink:href="http://www.relaxng.org">RelaxNG</link></para> </listitem> </itemizedlist> </section> <section xml:id="dtdBasic"> <title>Document type definitions (DTD)</title> <section xml:id="dtdFirstExample"> <title>Structural descriptions for documents</title> <para>As an example we choose documents of type <emphasis>memo</emphasis> as a starting point. 
Documents like the example from <xref linkend="memoWellAttrib"/> may be <emphasis>informally</emphasis> described to be a sequence of the following mandatory items:</para> <figure xml:id="figure_memo_informalconstraints"> <title>Informal constraints on <tag class="element">memo</tag> document instances</title> <itemizedlist> <listitem> <para><emphasis>Exactly one</emphasis> sender.</para> </listitem> <listitem> <para><emphasis>One or more</emphasis> recipients.</para> </listitem> <listitem> <para>Subject</para> </listitem> <listitem> <para>Content</para> </listitem> </itemizedlist> <para>In addition we have:</para> <itemizedlist> <listitem> <para>A date string <emphasis>must</emphasis> be supplied</para> </listitem> <listitem> <para>A priority <emphasis>may</emphasis> be supplied with allowed values to be chosen from the set of values <tag class="attvalue">low</tag>, <tag class="attvalue">medium</tag> or <tag class="attvalue">high</tag>.</para> </listitem> </itemizedlist> </figure> <para>All these fields contain ordinary text to be filled in by a user and shall appear exactly in the defined order. For simplicity we do not care about email address syntax rules being described in <link xlink:href="http://www.w3.org/Protocols/rfc822">RFC based address schemes</link>. We will see how the <emphasis>constraints</emphasis> mentioned above can be modelled in XML by an extension to the concept of well formed documents.</para> </section> <section xml:id="section_memo_machinereadable"> <title>A machine readable description</title> <para>We now introduce an example of a <link xlink:href="http://www.w3.org/TR/xml#dt-doctype">Document Type Definition (DTD)</link> being part of the XML 1.0 standard. Such a DTD allows the specification of additional constraints to both element nodes and their attributes. 
Our set of <link linkend="figure_memo_informalconstraints" revision="">informal constraints</link> on memo documents may now be expressed as:</para> <figure xml:id="figure_memo_dtd"> <title>A DTD to describe memo documents.</title> <programlisting><!ELEMENT memo (from, to+, subject, content)> <co xml:id="memodtd_memodef"/> <!ATTLIST memo <co xml:id="memodtd_memo_attribs"/> date CDATA #REQUIRED priority (low|medium|high) #IMPLIED> <!ELEMENT from (#PCDATA)> <co xml:id="memodtd_elem_from"/> <!ELEMENT to (#PCDATA)> <!ELEMENT subject (#PCDATA)> <!ELEMENT content (#PCDATA)></programlisting> <calloutlist> <callout arearefs="memodtd_memodef"> <para>A <tag class="element">memo</tag> consists of a sender, at least one recipient, a subject and content.</para> </callout> <callout arearefs="memodtd_memo_attribs"> <para>A <tag class="element">memo</tag> has got one required attribute <varname>date</varname> and an optional attribute <varname>priority</varname> being restricted to the three allowed values <tag class="attvalue">low</tag>, <tag class="attvalue">medium</tag> and <tag class="attvalue">high</tag>.</para> </callout> <callout arearefs="memodtd_elem_from"> <para>A <tag class="starttag">from</tag> element consists of ordinary text. This disallows XML markup. For example <code><from>Smith & partner</from></code> is disallowed since XML uses the ampersand (&) to denote the beginning of an entity like <tag class="genentity">auml</tag> for the German a-umlaut (ä). The correct form is <code><from>Smith &amp; partner</from></code> using the predefined entity <tag class="genentity">amp</tag> as an escape sequence for the ampersand.</para> <para>The term <code>#PCDATA</code> is an acronym for <emphasis>P</emphasis><foreignphrase>arsed</foreignphrase> <emphasis>C</emphasis><foreignphrase>haracter</foreignphrase> <emphasis>Data</emphasis>, an abbreviation for a restricted version of ordinary strings. 
Without digging into details a <code>#PCDATA</code> string must not contain any markup code like e.g. <tag class="starttag">msqrt</tag>. This ensures that a string does not interfere with the document's XML markup. Parsed Character Data also means that from the viewpoint of XML the element's content is <emphasis>atomic</emphasis> so it can't be divided into substructures by an XML parser.</para> </callout> </calloutlist> </figure> <para>We notice the non-XML syntax of a DTD. It looks similar to an XML document (<!ELEMENT ...>) but in fact it is not even well-formed due to e.g. the exclamation mark in front of the <code>ELEMENT</code> keyword. <abbrev xlink:href="http://en.wikipedia.org/wiki/Document_Type_Declaration">DTD</abbrev>'s use a different syntax which has been specified in order to describe an XML document's grammar.</para> <para>From the viewpoint of software modeling a DTD is a <emphasis>schema</emphasis>. In the context of XML technologies the term <emphasis>schema</emphasis> refers to <link xlink:href="http://www.w3.org/XML/Schema">XML Schema</link> being an alternative language to describe the structure of XML documents.</para> <para>Readers being familiar with <abbrev xlink:href="http://en.wikipedia.org/wiki/Backus-Naur_form">BNF</abbrev> or <abbrev xlink:href="http://en.wikipedia.org/wiki/Extended_Backus_Naur_form">EBNF</abbrev> will be able to understand the grammatical rules being expressed here.</para> <productionset> <title>A message of type <tag class="starttag">memo</tag></title> <production xml:id="memo.ebnf.memo"> <lhs>Memo Message</lhs> <rhs>'<memo>' <nonterminal def="#memo.ebnf.sender">Sender</nonterminal> [<nonterminal def="#memo.ebnf.recipient">Recipient</nonterminal>]+ <nonterminal def="#memo.ebnf.subject">Subject</nonterminal> <nonterminal def="#memo.ebnf.content">Content</nonterminal> '</memo>'</rhs> </production> <production xml:id="memo.ebnf.sender"> <lhs>Sender</lhs> <rhs>'<from>' <nonterminal def="#memo.ebnf.text"> Text 
</nonterminal> '</from>'</rhs> </production> <production xml:id="memo.ebnf.recipient"> <lhs>Recipient</lhs> <rhs>'<to>' <nonterminal def="#memo.ebnf.text"> Text </nonterminal> '</to>'</rhs> </production> <production xml:id="memo.ebnf.subject"> <lhs>Subject</lhs> <rhs>'<subject>' <nonterminal def="#memo.ebnf.text"> Text </nonterminal> '</subject>'</rhs> </production> <production xml:id="memo.ebnf.content"> <lhs>Content</lhs> <rhs>'<content>' <nonterminal def="#memo.ebnf.text"> Text </nonterminal> '</content>'</rhs> </production> <production xml:id="memo.ebnf.text"> <lhs>Text</lhs> <rhs>[a-zA-Z0-9]* <lineannotation>In real documents this is too restrictive!</lineannotation></rhs> </production> </productionset> <para>In comparison to our informal description of memo documents a DTD offers an added value: The grammar is machine readable and may thus be used by a parser to check whether an XML document obeys the constraints being imposed. So the parser must be instructed to use a DTD in addition to the XML document in question. For this purpose an XML document may define a reference to a DTD:</para> <figure xml:id="memo_external_dtd"> <title>A memo document instance holding a reference to a document external DTD.</title> <programlisting><?xml version="1.0" encoding="UTF-8"?> <!DOCTYPE memo<co xml:id="memo_external_dtd_top_element"/> SYSTEM<co xml:id="memo_external_dtd_system_decl"/> "memo.dtd"<co xml:id="memo_external_dtd_url"/> > <memo date="10.02.2006" priority="high"> <from>M. Goik</from> <to>B. King</to> <to>A. June</to> <subject>Best wishes</subject> <content>Hi all, congratulations to your splendid party</content> </memo></programlisting> <calloutlist> <callout arearefs="memo_external_dtd_top_element"> <para>The element <tag class="element">memo</tag> is chosen to be the top (root) element of the document's tree. It must be defined in the file <filename>memo.dtd</filename>. 
This is really a choice since a DTD defines a <emphasis>set</emphasis> of elements in <emphasis>arbitrary</emphasis> order. There is no such rule as <quote>define before use</quote>. So a DTD does not tell us which element has to appear on top of a document.</para> <para>Suppose a given DTD offers both <tag class="starttag">book</tag> and <tag class="starttag">report</tag> elements. An XML author writing a complex document will choose <tag class="starttag">book</tag> as top level element rather than <tag class="starttag">report</tag> being more appropriate for a small piece of documentation. Consequently it is an XML author's <emphasis>choice</emphasis> which of the elements being defined in a DTD shall appear as <emphasis>the</emphasis> top level element.</para> </callout> <callout arearefs="memo_external_dtd_system_decl"> <para>The <code>SYSTEM</code> keyword states that the DTD rules reside outside the XML document as a separate entity. Though this situation is the most common the grammar rules may also be <link linkend="dtd_and_document">defined inside</link> the XML document itself. For professional use this is not particularly useful but during DTD development it may be an option.</para> </callout> <callout arearefs="memo_external_dtd_url"> <para>The address of the DTD rule set. In the given example it is just a filename but it may as well be a <link xlink:href="http://www.w3.org/Addressing">URL</link> of type <abbrev xlink:href="http://en.wikipedia.org/wiki/File_Transfer_Protocol">ftp</abbrev>, <abbrev xlink:href="http://www.w3.org/Protocols">http</abbrev> and so on, see <xref linkend="memoDtdOnFtp"/>.</para> </callout> </calloutlist> </figure> <para>In presence of a DTD parsing a document is actually a two step process: First the parser will check the document for well-formedness. 
Then the parser will read the referenced DTD (memo.dtd) and check the document for the additional constraints being defined there.</para> <para>In the current example both the DTD and the XML memo document reside as text files in a common file system folder. For general use a DTD is usually kept at a centralized location. The string following the <code>SYSTEM</code> keyword is actually a <emphasis>U</emphasis><foreignphrase>niform</foreignphrase> <emphasis>R</emphasis><foreignphrase>esource</foreignphrase> <emphasis>L</emphasis><foreignphrase>ocator</foreignphrase> <link xlink:href="http://www.w3.org/Addressing">(URL)</link>. Thus our <filename>memo.dtd</filename> may also be supplied via an <abbrev xlink:href="http://www.w3.org/Protocols">http</abbrev> or <abbrev xlink:href="http://en.wikipedia.org/wiki/File_Transfer_Protocol">ftp</abbrev> <link xlink:href="http://www.w3.org/Addressing">URL</link>:</para> <figure xml:id="memoDtdOnFtp"> <title>A DTD reference to an FTP server.</title> <programlisting><?xml version="1.0" encoding="UTF-8"?> <!DOCTYPE memo SYSTEM "ftp://www.hdm-stuttgart.de/memo.dtd"> <memo date="10.02.2006" priority="high"> <from>M. Goik</from> ... </memo></programlisting> </figure> <para>For development purposes we may combine a DTD and a conforming document into a single unit. This is achieved by replacing the <code>SYSTEM "memo.dtd"</code> clause with the DTD itself, enclosed in square brackets:</para> <figure xml:id="dtd_and_document"> <title>DTD and document within the same file</title> <programlisting><?xml version="1.0" encoding="UTF-8"?> <!DOCTYPE memo [<co xml:id="memo_inline_dtd_start"/> <!ELEMENT memo (from, to+, subject, content)> <!ATTLIST memo date CDATA #REQUIRED priority (low|medium|high) #IMPLIED> <!ELEMENT from (#PCDATA)> <!ELEMENT to (#PCDATA)> <!ELEMENT subject (#PCDATA)> <!ELEMENT content (#PCDATA)> ]<co xml:id="memo_inline_dtd_end"/>> <co xml:id="memo_inline_doc_start"/> <memo date="10.02.2006" priority="high"> <from>M. Goik</from> <to>B. 
King</to> <to>A. June</to> <subject>Best wishes</subject> <content>Hi all, congratulations to your splendid party</content> </memo></programlisting> <calloutlist> <callout arearefs="memo_inline_dtd_start"> <para>The DTD definitions start right after the left bracket <quote>[</quote> thus replacing the <code>SYSTEM "memo.dtd"</code> declaration.</para> </callout> <callout arearefs="memo_inline_dtd_end"> <para>The right bracket <quote>]</quote> terminates the DTD declarations. After finishing the <code><!DOCTYPE ... ></code> declaration the document's content starts.</para> </callout> <callout arearefs="memo_inline_doc_start"> <para>Start of document content.</para> </callout> </calloutlist> </figure> <para>Some terms are helpful in the context of <abbrev xlink:href="http://en.wikipedia.org/wiki/Document_Type_Declaration">DTD</abbrev>'s:</para> <variablelist> <varlistentry> <term>Validating / non-validating:</term> <listitem> <para>A non-validating parser only checks a document for well-formedness. If it also checks XML documents for conformance to a DTD it is a <emphasis>validating</emphasis> parser. Caution: Even a non-validating parser needs to read a DTD (if supplied) since it might have to expand <link linkend="section_generalentities">general entity</link> declarations being defined in it.</para> </listitem> </varlistentry> <varlistentry> <term>Valid / invalid documents:</term> <listitem> <para>An XML document referencing a DTD may either be valid or invalid depending on its conformance to the DTD in question.</para> </listitem> </varlistentry> <varlistentry> <term>Document instance:</term> <listitem> <para>An XML memo document may conform to the <link linkend="figure_memo_dtd">memo DTD</link>. 
In this case we call it a <emphasis>document instance</emphasis> of the memo DTD.</para> <para>This situation is quite similar to that in typed programming languages: A <link linkend="gloss_Java"><trademark>Java</trademark></link> <code>class</code> declaration is a blueprint for the <link linkend="gloss_Java"><trademark>Java</trademark></link> runtime system to construct <link linkend="gloss_Java"><trademark>Java</trademark></link> objects in memory. This is done by e.g. a statement <code>String name = new String();</code>. The identifier <code>name</code> will hold a reference to an <emphasis>instance of class String</emphasis>. So in a <link linkend="gloss_Java"><trademark>Java</trademark></link> runtime environment a class declaration plays the same role as a DTD declaration in XML. See also <xref linkend="example_memoJavaClass"/>.</para> </listitem> </varlistentry> </variablelist> <para>For further discussions it is very useful to clearly distinguish element definitions in a DTD from their <emphasis>realizations</emphasis> in a corresponding document instance: Our memo DTD defines an element <tag class="starttag">from</tag> to be of content <code>#PCDATA</code>. According to the DTD exactly one <tag class="starttag">from</tag> element must appear in a document instance. If we were talking about HTML document instances we would prefer to talk about a <tag class="starttag">from</tag> <emphasis>tag</emphasis> rather than a <tag class="starttag">from</tag> <emphasis>element</emphasis>.</para> <para>In this document we will use the term <emphasis>element type</emphasis> to denote an <code><!ELEMENT ...</code> definition in a DTD. Thus we will talk about an element type <tag class="element">subject</tag> being defined in <filename>memo.dtd</filename>.</para> <para>An element type being defined in a <abbrev xlink:href="http://en.wikipedia.org/wiki/Document_Type_Declaration">DTD</abbrev> may have realizations in a document instance. 
For example the document instance shown in <xref linkend="memo_external_dtd"/> has two <emphasis>nodes</emphasis> of element type <tag class="element">to</tag>. Thus we say that the document instance contains two <emphasis>element nodes</emphasis> of type <tag class="element">to</tag>. We will frequently abbreviate this by saying the instance contains two <tag class="starttag">to</tag> element nodes. And we may even omit the term <emphasis>nodes</emphasis> and simply talk about two <tag class="starttag">to</tag> elements. But the careful reader should always distinguish between a single type <code>foo</code> being defined in a <abbrev xlink:href="http://en.wikipedia.org/wiki/Document_Type_Declaration">DTD</abbrev> and the possibly empty set of <tag class="starttag">foo</tag> nodes appearing in valid document instances.</para> <para><abbrev xlink:href="http://en.wikipedia.org/wiki/Document_Type_Declaration">DTD</abbrev>s impose constraints on top of well-formedness:</para> <figure xml:id="wellformedandvalid"> <title>Well-formed and valid documents</title> <mediaobject> <imageobject> <imagedata fileref="Ref/Fig/wellformedandvalid.fig" scale="65"/> </imageobject> </mediaobject> </figure> <qandaset role="exercise"> <title>Validation of memo document instances.</title> <qandadiv> <qandaentry xml:id="example_memoTestValid"> <question> <para>Copy the two files <link xlink:href="Ref/src/Memo.1/message.xml">message.xml</link> and <link xlink:href="Ref/src/Memo.1/memo.dtd">memo.dtd</link> into your eclipse project. Use the Oxygen XML plug-in to check if the document is valid. 
Then apply and undo each of the following changes in turn, checking the document for validity each time:</para> <itemizedlist> <listitem> <para>Omit the <tag class="starttag">from</tag> element.</para> </listitem> <listitem> <para>Change the order of the two sub elements <tag class="starttag">subject</tag> and <tag class="starttag">content</tag>.</para> </listitem> <listitem> <para>Erase the <varname>date</varname> attribute and its value.</para> </listitem> <listitem> <para>Erase the <varname>priority</varname> attribute and its value.</para> </listitem> </itemizedlist> <para>What do you observe?</para> </question> <answer> <para>The <tag class="attribute">priority</tag> attribute is declared as <code>#IMPLIED</code> so it may be omitted. Erasing the <tag class="attribute">priority</tag> attribute thus leaves the document in a valid state. The remaining three edit actions yield an invalid document instance.</para> </answer> </qandaentry> <qandaentry xml:id="example_memoJavaClass"> <question> <label>A memo implementation sketch in Java</label> <para>The aim of this exercise is to clarify the (abstract) relation between XML <abbrev xlink:href="http://en.wikipedia.org/wiki/Document_Type_Declaration">DTD</abbrev>s and sets of <link linkend="gloss_Java"><trademark>Java</trademark></link> classes rather than building a running application. We want to model the <link xlink:href="Ref/src/Memo.1/memo.dtd">memo DTD</link> as a set of <link linkend="gloss_Java"><trademark>Java</trademark></link> classes.</para> </question> <answer> <para>The XML attributes <tag class="attribute">date</tag> and <tag class="attribute">priority</tag> can be mapped to <link linkend="gloss_Java"><trademark>Java</trademark></link> attributes. The same applies to the Memo elements <tag class="element">from</tag>, <tag class="element">subject</tag> and <tag class="element">content</tag> which may be implemented as simple Strings or alternatively as separate classes wrapping the String content.
The latter method of implementation should be preferred if the Memo DTD is expected to grow in complexity. A simple sketch reads:</para>
<programlisting language="java">import java.util.Date;
import java.util.LinkedHashSet;
import java.util.Set;

public class Memo {
  private Date date;
  private Priority priority = Priority.medium;
  private String from, subject, content;
  // A LinkedHashSet rejects duplicates and keeps insertion order
  private Set<String> to = new LinkedHashSet<String>();
  // Accessors not yet implemented
}</programlisting>
<para>The only thing to note here is the implementation of the <tag class="element">to</tag> element: We want to be able to address a <emphasis>set</emphasis> of recipients. Thus we have to disallow duplicates. Note that this is an <emphasis>informal</emphasis> constraint not being handled by our DTD: A Memo document instance <emphasis>may</emphasis> have duplicate content in <tag class="starttag">to</tag> nodes. This is a weakness of <abbrev xlink:href="http://en.wikipedia.org/wiki/Document_Type_Declaration">DTD</abbrev>s: We are unable to impose uniqueness constraints on the content of partial sets of document nodes.</para> <para>On the other hand our set of recipients has to be ordered: In an XML document instance the order of <tag class="starttag">to</tag> nodes is important and has to be preserved in a <link linkend="gloss_Java"><trademark>Java</trademark></link> representation. Thus we choose a <classname>java.util.LinkedHashSet</classname> parametrized with String type to fulfill both requirements: It rejects duplicates and iterates in insertion order, whereas a <classname>java.util.SortedSet</classname> would rearrange the recipients according to their natural (alphabetical) order.</para> <para>Our DTD defines:</para>
<programlisting><!ATTLIST memo ... priority (low|medium|high) #IMPLIED></programlisting>
<para>Starting from <link linkend="gloss_Java"><trademark>Java</trademark></link> 1.5 we may implement this constraint by a type safe enumeration in a file <filename>Priority.java</filename>:</para>
<programlisting language="java">public enum Priority {low, medium, high}</programlisting>
</answer> </qandaentry> </qandadiv> </qandaset> <para>In the following chapters we will extend the memo document type (<code><!DOCTYPE memo ... ></code>) to demonstrate various concepts of <abbrev xlink:href="http://en.wikipedia.org/wiki/Document_Type_Declaration">DTD</abbrev>s and other XML related standards. In parallel a series of exercises deals with building a DTD usable to edit books. This DTD gets extended as our knowledge about XML advances. We start with an initial exercise:</para> <qandaset role="exercise"> <title>A DTD for editing books</title> <qandadiv> <qandaentry xml:id="example_bookDtd"> <question> <para>Write a DTD describing book document instances with the following features:</para> <itemizedlist> <listitem> <para>A book shall have a title to describe the book itself.</para> </listitem> <listitem> <para>A book shall have at least one but possibly a sequence of chapters.</para> </listitem> <listitem> <para>Each chapter shall have a title and at least one paragraph.</para> </listitem> <listitem> <para>The titles and paragraphs shall consist of ordinary text.</para> </listitem> </itemizedlist> </question> <answer> <para>A possible DTD looks like:</para> <figure xml:id="figure_book.dtd_v1"> <title>A first DTD version for book documents</title>
<programlisting><!ELEMENT book (title, chapter+)>
<!ELEMENT chapter (title, para+)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT para (#PCDATA)></programlisting>
</figure> <para>We supply a valid document instance:</para> <informalfigure xml:id="bookInitialInstance">
<programlisting><?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE book SYSTEM "book.dtd">
<book>
  <title>Introduction to Java</title>
  <chapter>
    <title>Introduction</title>
    <para>Java is a programming language</para>
  </chapter>
  <chapter>
    <title>The virtual machine</title>
    <para>We also call it the runtime system.</para>
  </chapter>
  <chapter>
    <title>Annotations</title>
    <para>Annotations provide a means to add meta information.</para>
    <para>This is especially useful for framework authors.</para>
  </chapter>
</book></programlisting>
</informalfigure> </answer> </qandaentry> </qandadiv>
</qandaset> </section> <section xml:id="dtdVsSqlDdl"> <title>Relating <abbrev xlink:href="http://en.wikipedia.org/wiki/Document_Type_Declaration">DTD</abbrev>'s and <acronym xlink:href="http://en.wikipedia.org/wiki/Sql">SQL</acronym> - <abbrev xlink:href="http://en.wikipedia.org/wiki/Data_definition_language">DDL</abbrev></title> <para>XML <abbrev xlink:href="http://en.wikipedia.org/wiki/Document_Type_Declaration">DTD</abbrev>'s and <acronym xlink:href="http://en.wikipedia.org/wiki/Sql">SQL</acronym> - <abbrev xlink:href="http://en.wikipedia.org/wiki/Data_definition_language">DDL</abbrev> are related: They both describe data models and thus integrity constraints. We consider a simple invoice example:</para> <figure xml:id="invoiceIntegrity"> <title>Invoice integrity constraints</title> <mediaobject> <imageobject> <imagedata fileref="Ref/Fig/invoicedata.fig" scale="65"/> </imageobject> </mediaobject> </figure> <para>A relational implementation may look like:</para> <figure xml:id="invoiceSqlDdl"> <title>Relational implementation</title> <mediaobject> <imageobject> <imagedata fileref="Ref/Fig/invoicedataimplement.fig" scale="65"/> </imageobject> </mediaobject> </figure> <para>This data model can be expressed in XML as well:</para> <figure xml:id="invoiceXml"> <title/> <mediaobject> <imageobject> <imagedata fileref="Ref/Fig/invoicewellformed.fig" scale="65"/> </imageobject> </mediaobject> </figure> <qandaset role="exercise"> <title><abbrev xlink:href="http://en.wikipedia.org/wiki/Document_Type_Declaration">DTD</abbrev>'s and <acronym xlink:href="http://en.wikipedia.org/wiki/Sql">SQL</acronym>-DDL</title> <qandadiv> <qandaentry> <question> <para><xref linkend="invoiceXml"/> is a complete implementation of the invoice data model including all integrity constraints of <xref linkend="invoiceSqlDdl"/>. 
Can this be achieved for arbitrary <acronym xlink:href="http://en.wikipedia.org/wiki/Sql">SQL</acronym> schemas?</para> </question> <answer> <para>XML <abbrev xlink:href="http://en.wikipedia.org/wiki/Document_Type_Declaration">DTD</abbrev>s cannot express multiple foreign keys. Adding a second foreign key <coref linkend="invoiceSecondFK"/> in a referencing table <code>Order</code> already breaks <abbrev xlink:href="http://en.wikipedia.org/wiki/Document_Type_Declaration">DTD</abbrev> expressibility:</para>
<programlisting>CREATE TABLE Order (
  orderNo BIGINT NOT NULL PRIMARY KEY,
  customer NUMERIC(5) NOT NULL <emphasis role="bold">REFERENCES Customer</emphasis> <co xml:id="invoiceSecondFK"/>...</programlisting>
<remark>This actually is a deficiency of DTDs rather than of XML. The XML schema standard not only allows multiple foreign key definitions but polymorphic references as well.</remark> </answer> </qandaentry> </qandadiv> </qandaset> </section> <section xml:id="xmlAndJava"> <title>Relating <abbrev xlink:href="http://en.wikipedia.org/wiki/Document_Type_Declaration">DTD</abbrev>s and <link linkend="gloss_Java"><trademark>Java</trademark></link> class descriptions</title> <para>We may also compare XML data constraints to <link linkend="gloss_Java"><trademark>Java</trademark></link>. A <link linkend="gloss_Java"><trademark>Java</trademark></link> class declaration is actually a blueprint for a <trademark xlink:href="http://www.oracle.com/technetwork/java/javase">JRE</trademark> to instantiate compatible objects.
Likewise an XML DTD restricts the set of well-formed documents being regarded as valid:</para> <figure xml:id="fig_XmlAndJava"> <title>XML <abbrev xlink:href="http://en.wikipedia.org/wiki/Document_Type_Declaration">DTD</abbrev>s and <link linkend="gloss_Java"><trademark>Java</trademark></link> class declarations.</title> <mediaobject> <imageobject> <imagedata fileref="Ref/Fig/xmlattribandjava.fig" scale="65"/> </imageobject> </mediaobject> </figure> </section> <section xml:id="section_dtdDetail"> <title><abbrev xlink:href="http://en.wikipedia.org/wiki/Document_Type_Declaration">DTD</abbrev>s in detail</title> <para>We have already seen that elements are building blocks of XML documents. Now we look at the formal rules that govern the way <code><!ELEMENT ...></code> declarations may appear in a DTD. This will lead to the notion of a <emphasis>content model</emphasis>.</para> <para>Then we will shed some light on <code><!ATTLIST ...></code> declarations. We will learn about possible attribute types and default values.</para> <para>Next we explore the <emphasis>physical</emphasis> structure of XML documents. We will see that <abbrev xlink:href="http://en.wikipedia.org/wiki/Document_Type_Declaration">DTD</abbrev>s and document instances may be physically subdivided into <emphasis>entities</emphasis> without touching their logical structure.</para> <para>Since we want to illustrate DTD grammars by <userinput xlink:href="http://en.wikipedia.org/wiki/Ebnf">EBNF</userinput> diagrams we first show some helpful non-terminals starting with the definition of white space.
Apparently this is the same as in most programming languages:</para> <productionset> <title>White Space</title> <production xml:id="w3RecXml_NT-S"> <lhs>S</lhs> <rhs>(#x20 | #x9 | #xD | #xA)+ <lineannotation>space, tabulator, carriage return and line feed</lineannotation></rhs> </production> </productionset> <para>The production rule for <code>Name</code> defines legal identifier names for element names like <tag class="element">memo</tag>. We learn that such an identifier must not begin with a digit. So the rule presented here resembles the grammar constraint on legal identifiers in the <link linkend="gloss_Java"><trademark>Java</trademark></link> programming language. The type <code>NMTOKEN</code> will be needed later when defining element attributes.</para> <productionset> <title>Names and Tokens</title> <production xml:id="w3RecXml_NT-NameChar"> <lhs>NameChar</lhs> <rhs><nonterminal def="#w3RecXml_NT-Letter">Letter</nonterminal> | <nonterminal def="#w3RecXml_NT-Digit">Digit</nonterminal> | '.' 
| '-' | '_' | ':' | <nonterminal def="#w3RecXml_NT-CombiningChar" xlink:href="#w3RecXml_NT-CombiningChar">CombiningChar</nonterminal> | <nonterminal def="#w3RecXml_NT-Extender">Extender</nonterminal></rhs> </production> <production xml:id="w3RecXml_NT-Name"> <lhs>Name</lhs> <rhs>(<nonterminal def="#w3RecXml_NT-Letter">Letter</nonterminal> | '_' | ':') (<nonterminal def="#w3RecXml_NT-NameChar">NameChar</nonterminal>)*</rhs> </production> <production xml:id="w3RecXml_NT-Names"> <lhs>Names</lhs> <rhs><nonterminal def="#w3RecXml_NT-Name">Name</nonterminal> (#x20 <nonterminal def="#w3RecXml_NT-Name">Name</nonterminal>)*</rhs> </production> <production xml:id="w3RecXml_NT-Nmtoken"> <lhs>Nmtoken</lhs> <rhs>(<nonterminal def="#w3RecXml_NT-NameChar">NameChar</nonterminal>)+</rhs> </production> <production xml:id="w3RecXml_NT-Nmtokens"> <lhs>Nmtokens</lhs> <rhs><nonterminal def="#w3RecXml_NT-Nmtoken">Nmtoken</nonterminal> (#x20 <nonterminal def="#w3RecXml_NT-Nmtoken">Nmtoken</nonterminal>)*</rhs> </production> </productionset> <section xml:id="section_contentmodel"> <title>The content model</title> <para>We already saw examples of XML elements being composed of other elements in our <link linkend="figure_memo_dtd">memo.dtd</link>:</para> <programlisting><!ELEMENT memo (from, to+, subject, content)></programlisting> <para>We call the right side the <emphasis>content model</emphasis> of the <tag class="element">memo</tag> element. 
The XML 1.0 specification defines <link xlink:href="http://www.w3.org/TR/xml#dt-eldecl">four</link> different <link xlink:href="http://www.w3.org/TR/2006/REC-xml-20060816/#elemdecls">element type definitions</link>:</para> <productionset xml:id="productionset_element_decl"> <title>Element Type Declaration</title> <production xml:id="w3RecXml_NT-elementdecl"> <lhs>elementdecl</lhs> <rhs>'<!ELEMENT' <nonterminal def="#w3RecXml_NT-S">S</nonterminal> <nonterminal def="#w3RecXml_NT-Name">Name</nonterminal> <nonterminal def="#w3RecXml_NT-S">S</nonterminal> <nonterminal def="#w3RecXml_NT-contentspec">contentspec</nonterminal> <nonterminal def="#w3RecXml_NT-S">S</nonterminal>? '>'</rhs> </production> <production xml:id="w3RecXml_NT-contentspec"> <lhs>contentspec</lhs> <rhs>'EMPTY' | 'ANY' | <nonterminal def="#w3RecXml_NT-Mixed">Mixed</nonterminal> | <nonterminal def="#w3RecXml_NT-children">children</nonterminal></rhs> </production> </productionset> <glosslist> <glossentry> <glossterm><link linkend="section_empty">EMPTY</link></glossterm> <glossdef> <para>The element doesn't have any content at all. This makes sense for elements carrying all their information in attributes as in <tag class="emptytag">img src="foo.gif"</tag>.</para> </glossdef> </glossentry> <glossentry> <glossterm><link linkend="section_any">ANY</link></glossterm> <glossdef> <para>The element in question may contain a sequence of arbitrary elements and ordinary text (<code>#PCDATA</code>).</para> </glossdef> </glossentry> <glossentry> <glossterm><nonterminal def="#w3RecXml_NT-Mixed">Mixed</nonterminal></glossterm> <glossdef> <para>The element may contain an arbitrary sequence from a set of child elements possibly interspersed with ordinary text.</para> </glossdef> </glossentry> <glossentry> <glossterm><nonterminal def="#w3RecXml_NT-children">children</nonterminal></glossterm> <glossdef> <para>The element contains <emphasis>only</emphasis> other elements.
A node of the element type in question may appear as child of itself giving rise to recursion:</para> <programlisting>... <chapter> <chapter> ...</chapter> </chapter></programlisting> </glossdef> </glossentry> </glosslist> <para>All elements being declared are subject to the following validity constraint:</para> <constraintdef> <para>An element type MUST NOT be declared more than once.</para> </constraintdef> <para>Programmers will not be surprised: The above constraint is common to most programming languages. In <link linkend="gloss_Java"><trademark>Java</trademark></link> for example a given local variable may not be redefined:</para> <programlisting language="java">int count = 3; double pi=3.1415926; int count = 2; // Fatal error: A variable must not be // redefined within the given scope</programlisting> <para>However there is no such rule like <quote>Define before use</quote>: Element <emphasis>and</emphasis> attribute definitions may refer to elements being defined <quote>later</quote>:</para> <programlisting><!ATTLIST memo<co xml:id="programlisting_elemattorder_memoatt"/> date CDATA #REQUIRED priority (low|medium|high) #IMPLIED> <!ELEMENT memo<co xml:id="programlisting_elemattorder_memodecl"/> (from, to+, subject, content)> <!ELEMENT from (#PCDATA)> <!ELEMENT to (#PCDATA)> <!ELEMENT subject (#PCDATA)> <!ELEMENT content (#PCDATA)></programlisting> <calloutlist> <callout arearefs="programlisting_elemattorder_memoatt"> <para>Two attributes <varname>date</varname> and <varname>priority</varname> are defined for the element <tag class="starttag">memo</tag> which itself gets defined immediately <emphasis>after</emphasis> this definition.</para> </callout> <callout arearefs="programlisting_elemattorder_memodecl"> <para>The <tag class="element">memo</tag> type definition refers to the element types <tag class="element">from</tag>, <tag class="element">to</tag>, <tag class="element">subject</tag> and <tag class="element">content</tag> all being defined 
afterwards.</para> </callout> </calloutlist> <section xml:id="section_empty"> <title>The <code>EMPTY</code> declaration</title> <para>Element nodes of content type <code>EMPTY</code> are familiar from e.g. HTML:</para>
<programlisting>...
<p>We saw the picture <img src="person.gif"> of the officer.
...</programlisting>
<para>This code fragment shows an image embedded <emphasis>in line</emphasis> with the current text flow. The <tag class="starttag">img</tag> tag has no matching end tag. This is possible in HTML being an SGML application but it is <emphasis>not</emphasis> allowed in XML. Also the omission of <tag class="endtag">p</tag> to close the paragraph is disallowed. In XML either of the two forms has to be chosen:</para> <itemizedlist> <listitem> <para><code><p>We saw the picture <img src="person.gif"></img> of the officer.</p></code></para> </listitem> <listitem> <para><code><p>We saw the picture <img src="person.gif"/> of the officer.</p></code></para> </listitem> </itemizedlist> <para>Using <tag class="emptytag">img ...</tag> as a shorthand for an empty element is legal in XML but disallowed in SGML and thus HTML. This is one of the possible obstacles when migrating from SGML based HTML documents to an XML version of HTML like <link xlink:href="http://www.w3.org/MarkUp">XHTML</link>. From <xref linkend="productionset_element_decl"/> we can infer the corresponding DTD declaration:</para>
<programlisting><!ELEMENT img EMPTY></programlisting>
</section> <section xml:id="section_any"> <title>The <code>ANY</code> declaration</title> <para>The <code>ANY</code> declaration allows every element of a given DTD to appear as a child of the element being defined including the element itself.
It is not possible to exclude certain elements from an <code>ANY</code> rule:</para> <figure xml:id="figure_any_declaration"> <title>The <code>ANY</code> declaration</title> <programlisting><?xml version="1.0" encoding="UTF-8"?> <!DOCTYPE theater [ <!ELEMENT theater ANY <co xml:id="figure_any_declaration_any"/> > <!ELEMENT actor (#PCDATA) <co xml:id="figure_any_declaration_actor"/> > <!ELEMENT show (#PCDATA) <co xml:id="figure_any_declaration_show"/>> ]> <theater> <actor>Peter Sun</actor> some text <co xml:id="figure_any_declaration_doc_text"/> <show>Must go on</show> <theater>Self referencing!</theater> <co xml:id="figure_any_declaration_actor_self_reference"/> <!-- An error: --> <b>Ooops, no such element defined in DTD</b> <co xml:id="figure_any_declaration_actor_undefined"/> </theater></programlisting> <calloutlist> <callout arearefs="figure_any_declaration_any"> <para>A <tag class="element">theater</tag> element may consist of a sequence of arbitrary content. Every child element must be defined in the DTD.</para> </callout> <callout arearefs="figure_any_declaration_actor figure_any_declaration_show"> <para>Two elements <tag class="element">actor</tag> and <tag class="element">show</tag> consisting of mere textual content.</para> </callout> <callout arearefs="figure_any_declaration_doc_text"> <para>Ordinary text may also be part of the <tag class="starttag">theater</tag> element and may appear everywhere.</para> </callout> <callout arearefs="figure_any_declaration_actor_self_reference"> <para>A <tag class="starttag">theater</tag> element may appear as a child of itself. This gives rise to recursion of arbitrary depth.</para> </callout> <callout arearefs="figure_any_declaration_actor_undefined"> <para>There is no element <tag class="starttag">b</tag> defined in the DTD. 
Thus the current XML document is invalid.</para> </callout> </calloutlist> </figure> <para>Remark: The restriction to elements being defined in a DTD is common to other content model types as well. Actually every element being referenced by a definition in the DTD <emphasis>must</emphasis> itself be defined in order for the document to be valid.</para> </section> <section xml:id="section_mixed"> <title>Mixed content</title> <para>Mixed content is similar to the ANY declaration. But the set of elements allowed to appear is restricted. We show an example:</para> <figure xml:id="figure_memo_content_mixed"> <title>Extending the memo content type.</title> <programlisting><?xml version="1.0" encoding="UTF-8"?> <!DOCTYPE memo [ ... <!ELEMENT content (#PCDATA|emphasis|url)*> <!ELEMENT emphasis (#PCDATA)> <!ELEMENT url (#PCDATA)> <!ATTLIST url href CDATA #REQUIRED> ]> ... <content>The <url href="http://w3.org/XML">XML</url> language is <emphasis>easy</emphasis> to learn. However you need some <emphasis>time</emphasis>.</content> ...</programlisting> <caption> <para>This grammar allows to emphasize text passages and to define hypertext links.</para> </caption> </figure> <para>The formatting expectation is <quote>... The <link xlink:href="http://w3.org/XML">XML</link> language is <emphasis>easy</emphasis> to learn. However you need some <emphasis>time</emphasis>. ...</quote>. We may visualize this document instance as a tree:</para> <figure xml:id="extendContModelGraph"> <title>Graphical representation of the extended <code>content</code> model.</title> <mediaobject> <imageobject> <imagedata fileref="Ref/Fig/contentmixed.fig"/> </imageobject> </mediaobject> </figure> <para>More formally the W3C specification defines mixed content models as:</para> <productionset xml:id="productionset_w3RecXml_NT-Mixed"> <title>Mixed-content Declaration</title> <production xml:id="w3RecXml_NT-Mixed"> <lhs>Mixed</lhs> <rhs>'(' <nonterminal def="#w3RecXml_NT-S">S</nonterminal>? 
'#PCDATA' (<nonterminal def="#w3RecXml_NT-S">S</nonterminal>? '|' <nonterminal def="#w3RecXml_NT-S">S</nonterminal>? <nonterminal def="#w3RecXml_NT-Name">Name</nonterminal>)* <nonterminal def="#w3RecXml_NT-S">S</nonterminal>? ')*' | '(' <nonterminal def="#w3RecXml_NT-S">S</nonterminal>? '#PCDATA' <nonterminal def="#w3RecXml_NT-S">S</nonterminal>? ')'</rhs> </production> </productionset> <para>We notice that our simple <code><!ELEMENT from (#PCDATA)></code> is also described by this definition. It is just the special case of a single text node with no element nodes being present.</para> <qandaset role="exercise"> <title>Variations of mixed content models</title> <qandadiv> <qandaentry xml:id="example_allowed_mixed"> <question> <para>You may assume that the element types <tag class="element">emphasize</tag> and <tag class="element">URL</tag> are correctly defined. Are the following definitions allowed?</para> <itemizedlist> <listitem> <para><code><!ELEMENT mix (#PCDATA)*></code></para> </listitem> <listitem> <para><code><!ELEMENT mix (emphasize|#PCDATA)*></code></para> </listitem> <listitem> <para><code><!ELEMENT mix (#PCDATA|URL)></code></para> </listitem> <listitem> <para><code><!ELEMENT mix (emphasize|#PCDATA)+></code></para> </listitem> </itemizedlist> </question> <answer> <programlisting><!ELEMENT mix (#PCDATA)*></programlisting> <para>Allowed according to the production rule in <xref linkend="productionset_w3RecXml_NT-Mixed"/>: The list of element names following <code>#PCDATA</code> may be empty.</para> <programlisting><!ELEMENT mix (emphasize|#PCDATA)*></programlisting> <para>Not allowed. According to the production rule in <xref linkend="productionset_w3RecXml_NT-Mixed"/> the term <code>#PCDATA</code> <emphasis>must</emphasis> be the first token.</para> <programlisting><code><!ELEMENT mix (#PCDATA|URL)></code>, <code><!ELEMENT mix (emphasize|#PCDATA)+></code></programlisting> <para>Both variants are disallowed: As soon as element names are listed the indicator of multiplicity <quote>*</quote> is mandatory and the only legal token to appear. The second variant in addition violates the rule that <code>#PCDATA</code> must be the first token.</para> </answer> </qandaentry> </qandadiv> </qandaset> </section> <section xml:id="section_element_content"> <title>Element content</title> <para>We refer to the first version of our <link linkend="figure_memo_dtd">memo.dtd</link>. The <tag class="element">memo</tag> type declaration reads:</para> <programlisting><!ELEMENT memo (from, to+, subject, content)></programlisting> <para>Basically this states that for valid document instances a <tag class="starttag">memo</tag> node consists of a sequence of other nodes. In this context we denote <tag class="starttag">memo</tag> as <emphasis>parent</emphasis> node. <tag class="element">from</tag>, <tag class="element">to</tag>, <tag class="element">subject</tag> and <tag class="element">content</tag> are called <emphasis>child</emphasis> nodes or <emphasis>children</emphasis> for short.</para> <para>A sequence of elements is a special case of a more general definition of element content in the XML specification. We already used the <quote>+</quote> operator to allow a node to appear multiple times. Actually there are three such operators being defined:</para> <glosslist> <glossentry> <glossterm>?</glossterm> <glossdef> <para>A node may appear once or never.</para> </glossdef> </glossentry> <glossentry> <glossterm>+</glossterm> <glossdef> <para>A node must appear <emphasis>at least</emphasis> once.</para> </glossdef> </glossentry> <glossentry> <glossterm>*</glossterm> <glossdef> <para>A node may appear an arbitrary number of times, possibly not at all.</para> </glossdef> </glossentry> </glosslist> <para>So far we only talked about sequences of element nodes.
We may also define mutually exclusive alternatives:</para> <figure xml:id="operatorContentAlt"> <title>The operator <quote>|</quote> defining exclusive alternatives.</title>
<programlisting>...
<!ELEMENT address (email|telephone|town)<co xml:id="programlisting_alternative_address"/> >
<!ELEMENT email (#PCDATA)>
<!ELEMENT telephone (#PCDATA)>
<!ELEMENT town (#PCDATA)>
...
<address><co xml:id="programlisting_alternative_emailchild"/>
  <email>goik@hdm-stuttgart.de</email>
</address>
...
<address><co xml:id="programlisting_alternative_telephonechild"/>
  <telephone>+49 (0)711-8923-2164</telephone>
</address>
...</programlisting>
<calloutlist> <callout arearefs="programlisting_alternative_address"> <para>An <tag class="element">address</tag> node has <emphasis>either</emphasis> an <tag class="starttag">email</tag> child <emphasis>or</emphasis> a <tag class="starttag">telephone</tag> child <emphasis>or</emphasis> a <tag class="starttag">town</tag> child.</para> </callout> <callout arearefs="programlisting_alternative_emailchild"> <para>An <tag class="starttag">address</tag> node having an <tag class="starttag">email</tag> child.</para> </callout> <callout arearefs="programlisting_alternative_telephonechild"> <para>An <tag class="starttag">address</tag> node having a <tag class="starttag">telephone</tag> child.</para> </callout> </calloutlist> </figure> <para>Now we have collected the basic means for structuring XML documents. We have the three indicators <quote>?</quote>, <quote>+</quote> and <quote>*</quote> which govern the multiplicity of nodes. On the other hand the two operators <quote>,</quote> and <quote>|</quote> allow us to define sequences or mutually exclusive alternatives of element nodes.
The XML standard defines the notion of <emphasis>content particles</emphasis> (<command>cp</command>) which allows these two types of structuring elements to be grouped and nested:</para> <productionset> <title>Element-content Models</title> <production xml:id="w3RecXml_NT-children"> <lhs>children</lhs> <rhs>(<nonterminal def="#w3RecXml_NT-choice">choice</nonterminal> | <nonterminal def="#w3RecXml_NT-seq">seq</nonterminal>) ('?' | '*' | '+')?</rhs> </production> <production xml:id="w3RecXml_NT-cp"> <lhs>cp</lhs> <rhs>(<nonterminal def="#w3RecXml_NT-Name">Name</nonterminal> | <nonterminal def="#w3RecXml_NT-choice">choice</nonterminal> | <nonterminal def="#w3RecXml_NT-seq">seq</nonterminal>) ('?' | '*' | '+')?</rhs> </production> <production xml:id="w3RecXml_NT-choice"> <lhs>choice</lhs> <rhs>'(' <nonterminal def="#w3RecXml_NT-S">S</nonterminal>? <nonterminal def="#w3RecXml_NT-cp">cp</nonterminal> ( <nonterminal def="#w3RecXml_NT-S">S</nonterminal>? '|' <nonterminal def="#w3RecXml_NT-S">S</nonterminal>? <nonterminal def="#w3RecXml_NT-cp">cp</nonterminal> )+ <nonterminal def="#w3RecXml_NT-S">S</nonterminal>? ')'</rhs> </production> <production xml:id="w3RecXml_NT-seq"> <lhs>seq</lhs> <rhs>'(' <nonterminal def="#w3RecXml_NT-S">S</nonterminal>? <nonterminal def="#w3RecXml_NT-cp">cp</nonterminal> ( <nonterminal def="#w3RecXml_NT-S">S</nonterminal>? ',' <nonterminal def="#w3RecXml_NT-S">S</nonterminal>? <nonterminal def="#w3RecXml_NT-cp">cp</nonterminal> )* <nonterminal def="#w3RecXml_NT-S">S</nonterminal>? 
')'</rhs> </production> </productionset> <para>We give two examples:</para> <figure xml:id="pureElementContent"> <title>Examples of pure element content models</title> <glosslist> <glossentry> <glossterm><code><!ELEMENT address (email|(name,street,town,telephone?))></code></glossterm> <glossdef> <para>An <tag class="element">address</tag> is given either by an email or by a postal address plus an optional telephone number.</para> </glossdef> </glossentry> <glossentry> <glossterm><code><!ELEMENT figurelist (title, ((table|image|animation), caption?)+)></code></glossterm> <glossdef> <para>We will call table, image and animation <emphasis>block</emphasis> elements. The <tag class="starttag">figurelist</tag> element defines a list of figures. The whole list starts with an overall title. It is followed by at least one block element, each of them optionally followed by a caption.</para> </glossdef> </glossentry> </glosslist> </figure> <qandaset role="exercise"> <title>Content models and operator priority</title> <qandadiv> <qandaentry xml:id="example_operatorprecedence"> <question> <para>Find and explain the error buried in the following DTD.
After correcting the error construct a valid document instance.</para>
<programlisting><!ELEMENT addresslist (address*) >
<!ELEMENT address (email | town,street) >
<!ELEMENT email (#PCDATA)>
<!ELEMENT town (#PCDATA)>
<!ELEMENT street (#PCDATA)></programlisting>
</question> <answer> <para>The following document uses the DTD:</para>
<programlisting><?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE addresslist SYSTEM "address.dtd">
<addresslist>
  <address>
    <email>bingo@cheat.com</email>
  </address>
  <address>
    <town>Paris</town>
    <street>Avenue Kléber</street>
  </address>
</addresslist></programlisting>
<para>This yields the following parsing error:</para> <programlisting><errortext>A ')' is required in the declaration of element type "address".</errortext></programlisting> <para>Like many other error messages this one is not very enlightening. We examine the content model of the element <tag class="element">address</tag>:</para> <programlisting>email | town,street</programlisting> <para>We have three elements joined by two operators namely alternative and sequence. In contrast to e.g. Boolean algebra the XML standard does not define any operator priority with respect to <quote>|</quote> and <quote>,</quote>. Instead a DTD author must use parentheses to explicitly define the desired priority:</para> <programlisting><!ELEMENT address (email | (town,street)) ></programlisting> <para>We note that the operators <quote>*</quote>, <quote>+</quote> and <quote>?</quote> have precedence over <quote>|</quote> and <quote>,</quote>.
Thus we may write <code>town,street+</code> instead of the clumsy term <code>town,(street)+</code>.</para> </answer> </qandaentry> <qandaentry xml:id="example_book_v2"> <question> <label>Book documents with mixed content and itemized lists</label> <para>Extend the first version of <link linkend="example_bookDtd">book.dtd</link> to support the following features:</para> <itemizedlist> <listitem> <para>Within a <tag class="starttag">chapter</tag> node <tag class="starttag">para</tag> and <tag class="starttag">itemizedlist</tag> elements in arbitrary order shall be allowed.</para> </listitem> <listitem> <para><tag class="starttag">itemizedlist</tag> nodes shall contain at least one <tag class="starttag">listitem</tag>.</para> </listitem> <listitem> <para><tag class="starttag">listitem</tag> nodes shall be composed of one or more <tag class="starttag">para</tag> or nested <tag class="starttag">itemizedlist</tag> elements.</para> </listitem> <listitem> <para>Within a <tag class="starttag">para</tag> we want to be able to emphasize text passages.</para> </listitem> </itemizedlist> <para>The following sample document instance shall be valid:</para> <programlisting><?xml version="1.0" encoding="UTF-8"?> <!DOCTYPE book SYSTEM "book.dtd"> <book> <title>Introduction to Java</title> <chapter> <title>Introduction</title> <para>Java supports <emphasis>lots</emphasis> of concepts:</para> <itemizedlist> <listitem> <para>Single <emphasis>implementation</emphasis> inheritance.</para> </listitem> <listitem> <para>Multiple <emphasis>interface</emphasis> inheritance.</para> <itemizedlist> <listitem><para>Built in types</para></listitem> <listitem><para>User defined types</para></listitem> </itemizedlist> </listitem> </itemizedlist> </chapter> </book></programlisting> </question> <answer> <para>An extended DTD looks like:</para> <figure xml:id="paraListEmphasize"> <title>Version 2 of book.dtd</title> <programlisting><!ELEMENT book (title, chapter+)> <!ELEMENT chapter (title, (para|itemizedlist)+ <co xml:id="figure_book.dtd_v2_chapter"/>)> 
<!ELEMENT title (#PCDATA)> <!ELEMENT para (#PCDATA|emphasis)*<co xml:id="figure_book.dtd_v2_para"/>> <!ELEMENT emphasis (#PCDATA)> <!ELEMENT itemizedlist (listitem+)<co xml:id="figure_book.dtd_v2_itemizedlist"/>> <!ELEMENT listitem ((para|itemizedlist)<co xml:id="figure_book.dtd_v2_listitem"/>+)></programlisting> <caption> <para>This allows emphasized text in <tag class="starttag">para</tag> nodes and adds (possibly nested) <tag class="starttag">itemizedlist</tag> elements.</para> </caption> </figure> <calloutlist> <callout arearefs="figure_book.dtd_v2_chapter"> <para>We hook into <tag class="starttag">chapter</tag> to allow arbitrary sequences of at least one <tag class="starttag">para</tag> or <tag class="starttag">itemizedlist</tag> element node.</para> </callout> <callout arearefs="figure_book.dtd_v2_para"> <para><tag class="starttag">para</tag> nodes now allow mixed content.</para> </callout> <callout arearefs="figure_book.dtd_v2_itemizedlist"> <para>An itemized list contains at least one list item.</para> </callout> <callout arearefs="figure_book.dtd_v2_listitem"> <para>A list item contains a sequence of at least one <tag class="starttag">para</tag> or <tag class="starttag">itemizedlist</tag> node. The latter gives rise to nested lists. We find a similar construct in HTML, namely unordered lists defined by <code><UL><LI>...</code> constructs.</para> </callout> </calloutlist> </answer> </qandaentry> </qandadiv> </qandaset> </section> <section xml:id="comments_processing"> <title>Comments and processing instructions</title> <para>An XML comment uses the syntax <code><!-- This is a comment! I love comments! --></code>. 
Without going into details here comments may appear in many locations both within <abbrev xlink:href="http://en.wikipedia.org/wiki/Document_Type_Declaration">DTD</abbrev>'s and document instances:</para> <programlisting><?xml version="1.0" encoding="UTF-8"?> <!DOCTYPE addresslist [ <!-- An addresslist may contain an arbitrary number of address nodes --> <!ELEMENT addresslist (address)*> <!ELEMENT address (#PCDATA)> ]> <addresslist> <!-- the document author --> <address>goik@hdm-stuttgart.de</address> <address>bingo@problemcompany.com</address> </addresslist></programlisting> <para>Newbies to XML are sometimes confused about so called <emphasis>processing instructions</emphasis> (PI). Similar to XML comments it is possible to embed processing instructions into XML documents. As an example we show an excerpt from the <link xlink:href="http://www.w3.org/TR/2006/REC-xml-20060816/REC-xml-20060816.xml">source file</link> of the XML specification:</para> <programlisting><?xml version='1.0' encoding='UTF-8'?> <!DOCTYPE spec SYSTEM "xmlspec.dtd" [ <!ENTITY base.uri "http://www.w3.org/TR/2006/"> ... ]> <?xml-stylesheet type="text/xsl" href="REC-xml.xsl" <co xml:id="programmlisting_xmlspecsrc_xsltref"/> ?> <co xml:id="programmlisting_xmlspecsrc_pi"/> <spec w3c-doctype="rec" xml:lang="en"> ... <title>Extensible Markup Language (XML)</title> ... </spec></programlisting> <calloutlist> <callout arearefs="programmlisting_xmlspecsrc_xsltref"> <para>A reference to a document external style sheet file. The file <filename>REC-xml.xsl</filename> resides in the same folder as the XML document itself. 
Thus a relative <link xlink:href="http://www.w3.org/Addressing">URL</link> is sufficient.</para> </callout> <callout arearefs="programmlisting_xmlspecsrc_pi"> <para>A processing instruction allowing a web browser to render the XML file appropriately.</para> </callout> </calloutlist> <para>We first note that from a parser's <quote>point of view</quote> both XML comments and processing instructions are ignored. But software applications working with XML documents may inspect both types and interpret their content.</para> <para>The purpose of the processing instruction in the above document is to enable web browsers to render its content in a meaningful way. In contrast to HTML an arbitrary XML document does not carry the semantics needed to create meaningful renderings for end users. A <tag class="element">memo</tag> document may be interesting from a programmer's point of view but an end user will probably prefer either an HTML or a PDF document <emphasis>generated</emphasis> from it. As we shall see in <xref linkend="xsl"/> the file <filename>REC-xml.xsl</filename> contains style sheet information adhering to the XSLT standard. Thus a browser capable of processing XSLT may visualize the XML document directly.</para> </section> <section xml:id="section_cdatasection"> <title><acronym>CDATA</acronym> sections</title> <para>Editing XML documents with text editors is tedious since we have to avoid XML markup characters in <code>#PCDATA</code> or attribute content. 
A computer scientist writing documentation on C++ code might want to express <emphasis>bit shift</emphasis> and <emphasis>address of</emphasis> operators:</para> <programlisting><para>If a < b we set c = & <co xml:id="programlisting_wrongmarkup_amp"/> (a >> <co xml:id="programlisting_wrongmarkup_gt"/> b); </para></programlisting> <calloutlist> <callout arearefs="programlisting_wrongmarkup_amp"> <para>First error: The operator <quote>&</quote> is reserved for <link linkend="chapter_entities">general entity references</link> like <code>&lt;</code>.</para> </callout> <callout arearefs="programlisting_wrongmarkup_gt"> <para>Second issue: The character <quote>></quote> denotes the end of a tag. Strictly speaking it may appear literally in character data unless it is part of the sequence <code>]]></code>, but it is commonly escaped as well. The second real error is the <quote><</quote> in <code>a < b</code> which a parser takes as the start of a tag.</para> </callout> </calloutlist> <para>XML offers 5 predefined replacement entities for this purpose:</para> <table xml:id="xmlStandardEntities"> <title>Replacement entities for XML markup characters</title> <?dbhtml table-width="15%" ?> <?dbfo table-width="15%" ?> <tgroup cols="2"> <colspec colwidth="1*"/> <colspec colwidth="2*"/> <tbody> <row> <entry><</entry> <entry><tag class="genentity">lt</tag></entry> </row> <row> <entry>></entry> <entry><tag class="genentity">gt</tag></entry> </row> <row> <entry>&</entry> <entry><tag class="genentity">amp</tag></entry> </row> <row> <entry>"</entry> <entry><tag class="genentity">quot</tag></entry> </row> <row> <entry>'</entry> <entry><tag class="genentity">apos</tag></entry> </row> </tbody> </tgroup> </table> <para>So without an appropriate editor our poor computer scientist will have to write:</para> <programlisting><para>If a &lt; b we set c = &amp; (a &gt;&gt; b); </para></programlisting> <para>Looks promising, right? Actually the better alternative is to use an XML capable editor which allows an author to type <code>If a < b we set c = & (a >> b);</code>. 
The editor software will present this text to the author and <emphasis>internally</emphasis> save the correct XML code as presented before.</para> <para>If someone is forced to use a pure text editor, <acronym>CDATA</acronym> sections are the second best alternative. A <acronym>CDATA</acronym> section encloses a text string which will not be interpreted by an XML parser. It starts with the reserved sequence <code><![CDATA[</code> and terminates with <quote>]]></quote>. The example given before reads:</para> <programlisting><para>If <![CDATA[a < b we set c = & (a >> b);]]> </para></programlisting> <para>The precise definition is:</para> <productionset> <title><acronym>CDATA</acronym> Sections</title> <production xml:id="w3RecXml_NT-CDSect"> <lhs>CDSect</lhs> <rhs><nonterminal def="#w3RecXml_NT-CDStart">CDStart</nonterminal> <nonterminal def="#w3RecXml_NT-CData">CData</nonterminal> <nonterminal def="#w3RecXml_NT-CDEnd">CDEnd</nonterminal></rhs> </production> <production xml:id="w3RecXml_NT-CDStart"> <lhs>CDStart</lhs> <rhs>'<![CDATA['</rhs> </production> <production xml:id="w3RecXml_NT-CData"> <lhs>CData</lhs> <rhs>(<nonterminal def="#w3RecXml_NT-Char">Char</nonterminal>* - (<nonterminal def="#w3RecXml_NT-Char">Char</nonterminal>* ']]>' <nonterminal def="#w3RecXml_NT-Char">Char</nonterminal>*))</rhs> </production> <production xml:id="w3RecXml_NT-CDEnd"> <lhs>CDEnd</lhs> <rhs>']]>'</rhs> </production> </productionset> <para>Thus inside a <acronym>CDATA</acronym> section only the exact sequence <quote>]]></quote> is disallowed.</para> </section> </section> <section xml:id="section_attributetypes"> <title>Attribute types</title> <para>When discussing the content model type <link linkend="section_empty">EMPTY</link> we already mentioned the possibility of element nodes having attributes like <tag class="emptytag">img src="..."</tag>. 
We discuss two features of element attributes, namely their <emphasis>type</emphasis> and the way default values are specified:</para> <figure xml:id="attribImg"> <title>Attributes of HTML <tag class="emptytag">img</tag> elements.</title> <mediaobject> <imageobject> <imagedata fileref="Ref/Fig/attribInElement.fig" scale="65"/> </imageobject> </mediaobject> </figure> <para>We already observed that content model definitions allow us to define <emphasis>composition</emphasis> rules. Thus a <tag class="starttag">chapter</tag> may consist of a <tag class="starttag">title</tag> node followed by <tag class="starttag">para</tag> and other nodes. This defines hierarchical, tree-like structures. But the <emphasis>actual</emphasis> string content is defined as <code>#PCDATA</code>. We are for example unable to specify that a node's content consists purely of digits. In contrast XML DTD attribute definitions offer a limited set of predefined types to choose from.</para> <section xml:id="section_cdata"> <title><code>CDATA</code></title> <para>An element type may be defined to have attributes of type <code>CDATA</code>:</para> <programlisting><!ATTLIST img <co xml:id="programlisting_img_element"/> src<co xml:id="programlisting_img_att_src"/> CDATA<co xml:id="programlisting_img_att_src_type"/> #REQUIRED<co xml:id="programlisting_img_att_src_default"/> ></programlisting> <calloutlist> <callout arearefs="programlisting_img_element"> <para>Start of the definition of a <emphasis>set</emphasis> of attributes for the element type <tag class="element">img</tag>.</para> </callout> <callout arearefs="programlisting_img_att_src"> <para>Start of the first attribute's definition named <tag class="attribute">src</tag>.</para> </callout> <callout arearefs="programlisting_img_att_src_type"> <para>The attribute <tag class="attribute">src</tag>'s type is <code>CDATA</code>.</para> </callout> <callout arearefs="programlisting_img_att_src_default"> <para>The attribute <tag class="attribute">src</tag> 
is mandatory, see <xref linkend="section_attribute_default"/>.</para> </callout> </calloutlist> <para>We have to be careful here. The term <code>CDATA</code> resembles <code>#PCDATA</code> which was already introduced for content models. Actually these two terms are completely distinct since <code>CDATA</code> refers to attribute values. Consider the following code snippet:</para> <programlisting><para>We may use "quotes" here</para></programlisting> <para>This is completely legal since all characters being used are permitted by the production rule of <code>#PCDATA</code>. But using the same string as an attribute value causes trouble:</para> <programlisting><img src="bold.gif" alt="We may use "quotes" here" /></programlisting> <para>This is indeed not even well formed XML. The two inner quotes embedding the substring <code>quotes</code> interfere with the two outer quotes delimiting the attribute <tag class="attribute">alt</tag>'s value. As we shall see in <xref linkend="example_quotes"/> there is a solution to this problem but the current example shows that the production rules of <code>#PCDATA</code> and <code>CDATA</code> differ.</para> <qandaset role="exercise"> <title>book.dtd and languages</title> <qandadiv> <qandaentry xml:id="example_book.dtd_v3"> <question> <para>We want to extend our DTD from <xref linkend="example_book_v2"/> by allowing an author to define the language used within the document. 
Add an attribute declaration to the top level element <tag class="element">book</tag>.</para> </question> <answer> <para>We simply have to add a single line to our DTD:</para> <programlisting><!ELEMENT book (title, chapter+)> <emphasis role="bold"><!ATTLIST book lang CDATA #IMPLIED ></emphasis> ...</programlisting> <para>This allows us to globally set a language for a document:</para> <programlisting><?xml version="1.0" encoding="UTF-8"?> <!DOCTYPE book SYSTEM "book.dtd"> <book lang="english"> <title>Introduction to Java</title> ...</programlisting> </answer> </qandaentry> </qandadiv> </qandaset> <para>The XML specification defines attribute definitions belonging to element types as:</para> <productionset> <title>Attribute-list Declaration</title> <production xml:id="w3RecXml_NT-AttlistDecl"> <lhs>AttlistDecl</lhs> <rhs>'<!ATTLIST' <nonterminal def="#w3RecXml_NT-S">S</nonterminal> <nonterminal def="#w3RecXml_NT-Name">Name</nonterminal> <nonterminal def="#w3RecXml_NT-AttDef">AttDef</nonterminal>* <nonterminal def="#w3RecXml_NT-S">S</nonterminal>? '>'</rhs> </production> <production xml:id="w3RecXml_NT-AttDef"> <lhs>AttDef</lhs> <rhs><nonterminal def="#w3RecXml_NT-S">S</nonterminal> <nonterminal def="#w3RecXml_NT-Name">Name</nonterminal> <nonterminal def="#w3RecXml_NT-S">S</nonterminal> <nonterminal def="#w3RecXml_NT-AttType">AttType</nonterminal> <nonterminal def="#w3RecXml_NT-S">S</nonterminal> <nonterminal def="#w3RecXml_NT-DefaultDecl">DefaultDecl</nonterminal></rhs> </production> </productionset> <para>The first rule tells us that multiple attributes may be defined for a given element. This is quite <quote>normal</quote> since the same applies for example when attributes are defined within <link linkend="gloss_Java"><trademark>Java</trademark></link> or C++ classes. 
Actually in <link xlink:href="http://www.w3.org/MarkUp">XHTML</link> the <tag class="emptytag">img</tag> element's attribute list is defined as:</para> <programlisting><!ATTLIST img src CDATA #REQUIRED alt CDATA #REQUIRED longdesc CDATA #IMPLIED height CDATA #IMPLIED width CDATA #IMPLIED ... ></programlisting> <para>The second production rule tells us that attribute names like <tag class="attribute">src</tag> must be of <link linkend="w3RecXml_NT-Name">Name</link> production. For example <code>4element</code> would be an illegal name since attribute name strings may contain numbers but not at the beginning. This is quite common in most programming languages and refers to the term of a legal identifier.</para> <para>The second rule also tells us that <code>CDATA</code> is only one among other possible attribute types:</para> <productionset> <title>Attribute Types</title> <production xml:id="w3RecXml_NT-AttType"> <lhs>AttType</lhs> <rhs><nonterminal def="#w3RecXml_NT-StringType">StringType</nonterminal> | <nonterminal def="#w3RecXml_NT-TokenizedType">TokenizedType</nonterminal> | <nonterminal def="#w3RecXml_NT-EnumeratedType">EnumeratedType</nonterminal></rhs> </production> <production xml:id="w3RecXml_NT-StringType"> <lhs>StringType</lhs> <rhs>'CDATA'</rhs> </production> <production xml:id="w3RecXml_NT-TokenizedType"> <lhs>TokenizedType</lhs> <rhs>'ID'| 'IDREF'| 'IDREFS'| 'ENTITY'| 'ENTITIES'| 'NMTOKEN'| 'NMTOKENS'</rhs> </production> </productionset> <para>The discussion of <code>ENTITY</code> types will be deferred till <xref linkend="chapter_entities"/>. 
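As a first impression of the three branches, a sketch using a hypothetical <tag class="element">article</tag> element (the element and attribute names are assumptions made up for illustration and do not belong to any DTD in these notes) might read: <programlisting><!-- Hypothetical element, for illustration only -->
<!ELEMENT article (#PCDATA)>
<!ATTLIST article
    title  CDATA         #REQUIRED
    key    ID            #IMPLIED
    status (draft|final) #IMPLIED ></programlisting> Here <tag class="attribute">title</tag> is of <code>StringType</code>, <tag class="attribute">key</tag> of <code>TokenizedType</code> and <tag class="attribute">status</tag> of <code>EnumeratedType</code>.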
Before discussing the remaining types we mention a topic common to all attribute types:</para> <qandaset role="exercise"> <title>Enclosing quotes</title> <qandadiv> <qandaentry xml:id="example_quotes"> <question> <para>We recall the problem of nested quotes yielding non-well formed XML code:</para> <programlisting><img src="bold.gif" alt="We may use "quotes" here" /></programlisting> <para>The XML specification defines legal attribute value definitions as:</para> <productionset> <title>Literals</title> <production xml:id="w3RecXml_NT-EntityValue"> <lhs>EntityValue</lhs> <rhs>'"' ([^%&"] | <nonterminal def="#w3RecXml_NT-PEReference">PEReference</nonterminal> | <nonterminal def="#w3RecXml_NT-Reference">Reference</nonterminal>)* '"' | "'" ([^%&'] | <nonterminal def="#w3RecXml_NT-PEReference">PEReference</nonterminal> | <nonterminal def="#w3RecXml_NT-Reference">Reference</nonterminal>)* "'"</rhs> </production> <production xml:id="w3RecXml_NT-AttValue"> <lhs>AttValue</lhs> <rhs>'"' ([^<&"] | <nonterminal def="#w3RecXml_NT-Reference">Reference</nonterminal>)* '"' | "'" ([^<&'] | <nonterminal def="#w3RecXml_NT-Reference">Reference</nonterminal>)* "'"</rhs> </production> <production xml:id="w3RecXml_NT-SystemLiteral"> <lhs>SystemLiteral</lhs> <rhs>('"' [^"]* '"') | ("'" [^']* "'")</rhs> </production> <production xml:id="w3RecXml_NT-PubidLiteral"> <lhs>PubidLiteral</lhs> <rhs>'"' <nonterminal def="#w3RecXml_NT-PubidChar">PubidChar</nonterminal>* '"' | "'" (<nonterminal def="#w3RecXml_NT-PubidChar">PubidChar</nonterminal> - "'")* "'"</rhs> </production> <production xml:id="w3RecXml_NT-PubidChar"> <lhs>PubidChar</lhs> <rhs>#x20 | #xD | #xA | [a-zA-Z0-9] | [-'()+,./:=?;!*#@$_%]</rhs> </production> </productionset> <para>Find out how it is possible to set the attribute <tag class="attribute">alt</tag>'s value to the string <code>We may use "quotes" here</code>.</para> </question> <answer> <para>The production rule for attribute values reads:</para> <productionset> 
<productionrecap linkend="w3RecXml_NT-AttValue"/> </productionset> <para>This allows us to use either of two alternatives to delimit attribute values:</para> <glosslist> <glossentry> <glossterm><tag class="starttag">img ... alt="..."/</tag></glossterm> <glossdef> <para><emphasis>Validity constraint:</emphasis> do not use <code>"</code> inside the value string.</para> </glossdef> </glossentry> <glossentry> <glossterm><tag class="starttag">img ... alt='...'/</tag></glossterm> <glossdef> <para><emphasis>Validity constraint:</emphasis> do not use <code>'</code> inside the value string.</para> </glossdef> </glossentry> </glosslist> <para>We may take advantage of the second rule:</para> <programlisting><img src="bold.gif" alt='We may use "quotes" here' /></programlisting> <para>Notice that according to <xref linkend="w3RecXml_NT-AttValue"/> the delimiting quotes must not be mixed. The following code is thus not well formed:</para> <programlisting><img src="bold.gif'/></programlisting> </answer> </qandaentry> </qandadiv> </qandaset> </section> <section xml:id="section_nmtoken"> <title><code>NMTOKEN</code> /<code>NMTOKENS</code></title> <para>Name tokens are essentially strings composed of a restricted character set. A name token must for example not contain any white space. We already mentioned its production rule:</para> <productionset> <productionrecap linkend="w3RecXml_NT-Nmtoken"/> </productionset> <para>This may be used to restrict attribute values. We consider a configuration file containing a list of user accounts:</para> <programlisting><?xml version="1.0" encoding="UTF-8"?> <!DOCTYPE userlist [ <!ELEMENT userlist (account*)> <!ELEMENT account EMPTY> <!ATTLIST account username NMTOKEN #REQUIRED password CDATA #IMPLIED > ]> <userlist> <account username="Joe"/> <account username="Mr. Bean"/> <!-- Whoops, an illegal space!--> </userlist></programlisting> <para>We extend the above example by allowing each user to belong to a <emphasis>set</emphasis> of groups. 
We achieve this by adding an attribute <tag class="attribute">groups</tag> of type <code>NMTOKENS</code>:</para> <programlisting>... <!ATTLIST account username NMTOKEN #REQUIRED groups NMTOKENS #IMPLIED password CDATA #IMPLIED > ]> <userlist> <account username="Joe" groups="admin staff team"/> </userlist></programlisting> <para>This defines a user <code>Joe</code> belonging to the three groups <code>admin</code>, <code>staff</code> and <code>team</code>. Informally we see a list of tokens separated by spaces. This is indeed the formal W3C specification:</para> <productionset> <productionrecap linkend="w3RecXml_NT-Nmtokens"/> </productionset> <para>According to this rule only single spaces (#x20) are legal. Actual parsers nevertheless accept more general whitespace here because attribute values of any type other than <code>CDATA</code> are normalized before validation: Sequences of spaces, tabs, carriage returns and newlines are collapsed into single spaces. Thus such a sequence is also accepted as a separator.</para> </section> <section xml:id="section_name_token_group"> <title>Enumeration values</title> <para>The XML standard allows us to define enumerations by restricting an attribute value to a predefined set of name tokens:</para> <productionset> <title>Enumerated Attribute Types</title> <production xml:id="w3RecXml_NT-EnumeratedType"> <lhs>EnumeratedType</lhs> <rhs><nonterminal def="#w3RecXml_NT-NotationType">NotationType</nonterminal> | <nonterminal def="#w3RecXml_NT-Enumeration">Enumeration</nonterminal></rhs> </production> <production xml:id="w3RecXml_NT-NotationType"> <lhs>NotationType</lhs> <rhs>'NOTATION' <nonterminal def="#w3RecXml_NT-S">S</nonterminal> '(' <nonterminal def="#w3RecXml_NT-S">S</nonterminal>? <nonterminal def="#w3RecXml_NT-Name">Name</nonterminal> (<nonterminal def="#w3RecXml_NT-S">S</nonterminal>? '|' <nonterminal def="#w3RecXml_NT-S">S</nonterminal>? <nonterminal def="#w3RecXml_NT-Name">Name</nonterminal>)* <nonterminal def="#w3RecXml_NT-S">S</nonterminal>? 
')'</rhs> </production> <production xml:id="w3RecXml_NT-Enumeration"> <lhs>Enumeration</lhs> <rhs>'(' <nonterminal def="#w3RecXml_NT-S">S</nonterminal>? <nonterminal def="#w3RecXml_NT-Nmtoken">Nmtoken</nonterminal> (<nonterminal def="#w3RecXml_NT-S">S</nonterminal>? '|' <nonterminal def="#w3RecXml_NT-S">S</nonterminal>? <nonterminal def="#w3RecXml_NT-Nmtoken">Nmtoken</nonterminal>)* <nonterminal def="#w3RecXml_NT-S">S</nonterminal>? ')'</rhs> </production> </productionset> <para>We start with an example of a <emphasis>Name Token Group</emphasis> aka enumeration:</para> <figure xml:id="figure_nametokengroup"> <title>A name token group</title> <programlisting><?xml version="1.0" encoding="UTF-8"?> <!DOCTYPE top [ <!ELEMENT top (chemical*)> <!ELEMENT chemical (#PCDATA)> <!ATTLIST chemical state (solid|liquid|gas) <co xml:id="figure_nametokengroup_att_state"/> #REQUIRED <co xml:id="figure_nametokengroup_att_state_required"/>> ]> <top> <chemical state="gas" <co xml:id="figure_nametokengroup_oxygen_state"/>>Oxygen</chemical> <chemical state="liquid" <co xml:id="figure_nametokengroup_water_state"/>>Water</chemical> <chemical state="superfluous" <co xml:id="figure_nametokengroup_helium_state"/>>Helium</chemical> <!-- Ooops! --> </top></programlisting> <calloutlist> <callout arearefs="figure_nametokengroup_att_state"> <para>The attribute <tag class="attribute">state</tag>'s value may have values from the set {solid, liquid, gas}.</para> </callout> <callout arearefs="figure_nametokengroup_att_state_required"> <para><tag class="attribute">state</tag> is mandatory.</para> </callout> <callout arearefs="figure_nametokengroup_oxygen_state"> <para>A legal value.</para> </callout> <callout arearefs="figure_nametokengroup_water_state"> <para>Another legal value.</para> </callout> <callout arearefs="figure_nametokengroup_helium_state"> <para>The token value <tag class="attvalue">superfluous</tag> does not belong to the set of allowed values. 
The parser flags this error as:</para> <para><code>Attribute "state" with value "superfluous" must have a value from the list "solid liquid gas ".</code></para> </callout> </calloutlist> </figure> <para>The rule defining an <link linkend="w3RecXml_NT-Enumeration">Enumeration</link> has to be supplemented by a validity constraint: The set of legal token values must not contain duplicates. Duplicates would violate the attribute's property of allowing values to be chosen from a <emphasis>set</emphasis>.</para> <qandaset role="exercise"> <title>Restriction of allowed languages</title> <qandadiv> <qandaentry xml:id="example_book.dtd_v4"> <question> <para xml:lang="">We extend our book.dtd version from <xref linkend="example_book.dtd_v3"/>. The attribute <tag class="attribute">lang</tag> currently accepts arbitrary free text. We want to restrict this to allow only values from the set {en,fr,de,it,es}.</para> </question> <answer> <para>We restrict our attribute definition from type <code>CDATA</code> to a name token group:</para> <programlisting><!ATTLIST book lang (en|fr|de|it|es) #IMPLIED ></programlisting> </answer> </qandaentry> </qandadiv> </qandaset> <para>The usage of the <link linkend="w3RecXml_NT-NotationType">NotationType</link> branch of the production rule is quite similar:</para> <figure xml:id="attributeNotation"> <title>A notation attribute</title> <programlisting><!DOCTYPE doc [ <!NOTATION <emphasis role="bold">cpp</emphasis> SYSTEM "The ANSI C++ programming language"> <!NOTATION <emphasis role="bold">perl</emphasis> SYSTEM "The PERL script programming language"> <!NOTATION <emphasis role="bold">sql</emphasis> SYSTEM "SQL 92 database query language"> <!ELEMENT doc (code)*> <!ELEMENT code (#PCDATA)> <!ATTLIST code language NOTATION (<emphasis role="bold">cpp</emphasis>|<emphasis role="bold">perl</emphasis>|<emphasis role="bold">sql</emphasis>) #REQUIRED > ]> <doc> <code language="<emphasis role="bold">cpp</emphasis>">delete[] namelist;</code> <code language="<emphasis role="bold">sql</emphasis>">SELECT * FROM User;</code> 
</doc></programlisting> </figure> <para>The only difference in comparison to a Name Token Group is the keyword <code>NOTATION</code>. There are however additional validity constraints imposed by the XML specification.</para> <para>In the given example the content of <tag class="starttag">code</tag> nodes was declared as <code>#PCDATA</code>. Actually all types of element content except <code>EMPTY</code> may appear.</para> <itemizedlist> <listitem> <para>Values of type <code>NOTATION</code> <emphasis>must</emphasis> match one of the notation names included in the declaration. In the given example this would be either <tag class="attvalue">cpp</tag>, <tag class="attvalue">perl</tag> or <tag class="attvalue">sql</tag>. All notation names in the declaration <emphasis>must</emphasis> be declared.</para> </listitem> <listitem> <para>An element type <emphasis>must not</emphasis> have more than one <code>NOTATION</code> attribute specified. Actually a <code>NOTATION</code> attribute value gives us a <quote>promise</quote> about the expected content of the element node in which it appears. So if the content of a <tag class="starttag">code</tag> node is SQL code it cannot in addition be declared to be of language category type <emphasis>declarative</emphasis>.</para> </listitem> <listitem> <para>For compatibility with SGML an attribute of type <code>NOTATION</code> <emphasis>must not</emphasis> be declared on an element declared <link linkend="section_empty">EMPTY</link>.</para> </listitem> </itemizedlist> </section> <section xml:id="section_id_idref"> <title><code>ID</code> and <code>IDREF / IDREFS</code></title> <para>The pair of attribute types <code>ID</code> and <code>IDREF</code> defines internal references within a given XML document instance. Before considering XML we recall the way document internal references are implemented in HTML. 
A reference originates from a <emphasis>source</emphasis> and leads to a <emphasis>target</emphasis>. In HTML the latter is frequently called an <emphasis>anchor</emphasis>:</para> <figure xml:id="figure_reference_html"> <title>An internal reference within an HTML document</title> <programlisting><!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"> <html> <head><title>Reference example</title></head> <body> <h1>Reference example</h1> <p><a name="foo" <co xml:id="figure_reference_html_anchor"/>></a>This is the target.</p> <p>There may be lots of text in between ...</p> <p>There may be lots of text in between ...</p> <h1>This is a different section</h1> <p>Click <a href="#foo" <co xml:id="figure_reference_html_link1"/>>here</a> to see the target.</p> <h1>This is a third section</h1> <p>Again <a href="#foo" <co xml:id="figure_reference_html_link2"/>>clicking</a> yields the same target.</p> </body> </html></programlisting> </figure> <calloutlist> <callout arearefs="figure_reference_html_anchor"> <para>An anchor <tag class="starttag">a name="foo"</tag> with a given value must appear only once. Thus it is an error if a second tag <tag class="starttag">a name="foo"</tag> appears within the same HTML file since the value <tag class="attvalue">foo</tag> would not be unique.</para> </callout> <callout arearefs="figure_reference_html_link1"> <para>The <quote>#</quote> is a shorthand for a document local reference. A full HTML reference looks like <code>http://someserver.org/docs/intro.html#foo</code> defining a reference to the position indicated by <tag class="starttag">a name="foo"</tag> within the document with path <code>/docs/intro.html</code> on the server <code>someserver.org</code> accessed by the <link xlink:href="http://www.w3.org/Protocols">HTTP</link> protocol. 
Thus <quote><code>#foo</code></quote> points to the local target defined by <tag class="starttag">a name="foo"</tag> in the document itself.</para> </callout> <callout arearefs="figure_reference_html_link2"> <para>A second link to the same destination.</para> </callout> </calloutlist> <para>In a database context we would call <tag class="starttag"><a name="foo"></tag> a <emphasis>primary key value</emphasis>. The element node <tag class="starttag">a href="#foo"</tag> would be considered a <emphasis>foreign key</emphasis> reference which may appear multiple times pointing to the same target.</para> <para>In HTML a node may at the same time be itself a reference target and define a reference to another target:</para> <programlisting><a name="thisTarget" href="linkToOtherTarget">click on me!</a></programlisting> <para>The XML standard adopts a different way to implement document internal references. We give an example:</para> <figure xml:id="figure_intern_reference_xml"> <title>Internal references in XML</title> <programlisting><?xml version="1.0" encoding="UTF-8"?> <!DOCTYPE catalog [ <!ELEMENT catalog (product*) <co xml:id="figure_intern_reference_xml_catalog"/> > <!ELEMENT product (title, para*) <co xml:id="figure_intern_reference_xml_product"/>> <!ELEMENT title (#PCDATA)> <!ELEMENT para (#PCDATA|link)* <co xml:id="figure_intern_reference_xml_para"/> > <!ELEMENT link (#PCDATA)> <!ATTLIST product id ID <co xml:id="figure_intern_reference_xml_att_product_id"/> #IMPLIED> <!ATTLIST link ref IDREF <co xml:id="figure_intern_reference_xml_att_link_ref"/> #REQUIRED> ]> <catalog> <product id="homeTrainer" <co xml:id="figure_intern_reference_xml_define_id_hometrainer"/> > <title>Home trainer</title> <para>Like to torture yourself in front of your TV?</para> </product> <product <co xml:id="figure_intern_reference_xml_product_no_id"/>> <title>Mountain bike</title> <para>If you hate rain look <link ref="homeTrainer" <co 
xml:id="figure_intern_reference_xml_define_ref1_hometrainer"/> >here</link>.</para> </product> </catalog></programlisting> </figure> <calloutlist> <callout arearefs="figure_intern_reference_xml_catalog"> <para>Start of the DTD. A catalog consists of products.</para> </callout> <callout arearefs="figure_intern_reference_xml_product"> <para>A product has a title and optional paragraphs to describe it in detail.</para> </callout> <callout arearefs="figure_intern_reference_xml_para"> <para>A paragraph allows mixed content of text and references to other parts of the document.</para> </callout> <callout arearefs="figure_intern_reference_xml_att_product_id"> <para>A <tag class="starttag">product</tag> node may have an attribute <tag class="attribute">id</tag> with a unique value within the document instance.</para> </callout> <callout arearefs="figure_intern_reference_xml_att_link_ref"> <para>A <tag class="starttag">link</tag> <emphasis>must</emphasis> have an attribute <tag class="attribute">ref</tag> with a value referring to an element with a corresponding attribute value of type <code>ID</code>.</para> </callout> <callout arearefs="figure_intern_reference_xml_define_id_hometrainer"> <para>A product with unique <code>id</code> value <code>homeTrainer</code>.</para> </callout> <callout arearefs="figure_intern_reference_xml_product_no_id"> <para>A product without an <code>id</code> value. 
Thus it cannot be referenced.</para> </callout> <callout arearefs="figure_intern_reference_xml_define_ref1_hometrainer"> <para>A reference to <emphasis>the</emphasis> element node with a defined attribute of type <code>ID</code> and value <code>homeTrainer</code>.</para> </callout> </calloutlist> <para>Based on this example we now present the syntax and validity constraints supplied by the XML specification:</para> <glosslist> <glossentry> <glossterm><code>ID</code></glossterm> <glossdef> <para><itemizedlist> <listitem> <para>Values of type <code>ID</code> <emphasis>must</emphasis> match the <link linkend="w3RecXml_NT-Name">Name</link> production. A name <emphasis>must not</emphasis> appear more than once in an XML document as a value of this type; i.e., <code>ID</code> values <emphasis>must</emphasis> uniquely identify the elements which bear them. In a database context this would be considered a <emphasis>primary key constraint</emphasis>.</para> </listitem> <listitem> <para>An element type <emphasis>must not</emphasis> have more than one <code>ID</code> attribute specified.</para> </listitem> <listitem> <para>An <code>ID</code> attribute <emphasis>must</emphasis> have a declared default of <code>#IMPLIED</code> or <code>#REQUIRED</code>.</para> </listitem> </itemizedlist></para> </glossdef> </glossentry> <glossentry> <glossterm><code>IDREF</code></glossterm> <glossdef> <para>Values of type <code>IDREF</code> <emphasis>must</emphasis> match the <link linkend="w3RecXml_NT-Name">Name</link> production. Each Name <emphasis>must</emphasis> match the value of an <code>ID</code> attribute on some element in the XML document; i.e., <code>IDREF</code> values <emphasis>must</emphasis> match the value of some <code>ID</code> attribute. 
In a database context this would be considered a <emphasis>foreign key constraint</emphasis>.</para> </glossdef> </glossentry> <glossentry> <glossterm><code>IDREFS</code></glossterm> <glossdef> <para>Values of type <code>IDREFS</code> are lists of <code>IDREF</code> values separated by white space:</para> <programlisting><!DOCTYPE gamelist [ <!ELEMENT gamelist (game+, gameCategory+)> <!ELEMENT game (#PCDATA)> <!ATTLIST game id ID #REQUIRED> <!ELEMENT gameCategory (#PCDATA)> <!ATTLIST gameCategory games IDREFS #REQUIRED> ]> <gamelist> <game id='chess'>Chess</game> <game id='poker'>Poker</game> <game id='bj'>Black Jack</game> <gameCategory games="poker bj">Card games</gameCategory> </gamelist></programlisting> <para>Each name in such a list <emphasis>must</emphasis> match the value of some <code>ID</code> attribute. Conceptually the list denotes a <emphasis role="bold">set</emphasis> of targets, so repeating a name adds no information. Note however that the XML specification does not forbid duplicates: the following snippet containing two identical references is redundant, but a validating parser will accept it:</para> <programlisting>... <gameCategory games="poker bj poker">Card games</gameCategory> ...</programlisting> </glossdef> </glossentry> </glosslist> <qandaset role="exercise"> <title>Legal attribute values</title> <qandadiv> <qandaentry xml:id="example_legal_attribute_values"> <question> <para>Complete the following matrix. 
Enter a <quote>+</quote> if the attribute value satisfies the constraint being imposed by the attribute type and a <quote>-</quote> otherwise.</para> <informaltable xml:id="table_legal_attribute_matrix"> <?dbhtml table-width="40%" ?> <?dbfo table-width="40%" ?> <tgroup cols="4"> <colspec colwidth="3*"/> <colspec colwidth="2*"/> <colspec colwidth="2*"/> <colspec colwidth="2*"/> <tbody> <row> <entry/> <entry><code>CDATA</code></entry> <entry><code>NMTOKEN</code></entry> <entry><code>ID</code></entry> </row> <row> <entry><code>_foo</code></entry> <entry/> <entry/> <entry/> </row> <row> <entry><code>too small</code></entry> <entry/> <entry/> <entry/> </row> <row> <entry><code>2three4</code></entry> <entry/> <entry/> <entry/> </row> <row> <entry><code>-man</code></entry> <entry/> <entry/> <entry/> </row> <row> <entry><code>two3four</code></entry> <entry/> <entry/> <entry/> </row> <row> <entry><code>Uhh-oops</code></entry> <entry/> <entry/> <entry/> </row> <row> <entry><code>a+b</code></entry> <entry/> <entry/> <entry/> </row> <row> <entry><code>&</code></entry> <entry/> <entry/> <entry/> </row> </tbody> </tgroup> </informaltable> </question> <answer> <para>We may use the following code to ask a parser:</para> <programlisting><?xml version="1.0" encoding="UTF-8"?> <!DOCTYPE doc [ <!ELEMENT doc (testentry)*> <!ELEMENT testentry EMPTY> <!ATTLIST testentry cd CDATA #REQUIRED nm NMTOKEN #REQUIRED id ID #REQUIRED > ]> <doc> <testentry cd="_foo" nm="_foo" id="_foo"/> <testentry cd="too small" nm="too small" id="too small"/> <testentry cd="2three4" nm="2three4" id="2three4"/> <testentry cd="-man" nm="-man" id="-man"/> <testentry cd="two3four" nm="two3four" id="two3four"/> <testentry cd="Uhh-oops" nm="Uhh-oops" id="Uhh-oops"/> <testentry cd="a+b" nm="a+b" id="a+b"/> </doc></programlisting> <para>This yields:</para> <table xml:id="exerciseAtttypeLegalValue"> <title>Legal attribute values</title> <?dbhtml table-width="40%" ?> <?dbfo table-width="40%" ?> <tgroup cols="4"> <colspec 
colwidth="3*"/> <colspec colwidth="2*"/> <colspec colwidth="2*"/> <colspec colwidth="2*"/> <tbody> <row> <entry/> <entry><code>CDATA</code></entry> <entry><code>NMTOKEN</code></entry> <entry><code>ID</code></entry> </row> <row> <entry><code>_foo</code></entry> <entry>+</entry> <entry>+</entry> <entry>+</entry> </row> <row> <entry><code>too small</code></entry> <entry>+</entry> <entry>-</entry> <entry>-</entry> </row> <row> <entry><code>2three4</code></entry> <entry>+</entry> <entry>+</entry> <entry>-</entry> </row> <row> <entry><code>-man</code></entry> <entry>+</entry> <entry>+</entry> <entry>-</entry> </row> <row> <entry><code>two3four</code></entry> <entry>+</entry> <entry>+</entry> <entry>+</entry> </row> <row> <entry><code>Uhh-oops</code></entry> <entry>+</entry> <entry>+</entry> <entry>+</entry> </row> <row> <entry><code>a+b</code></entry> <entry>+</entry> <entry>-</entry> <entry>-</entry> </row> <row> <entry><code>&</code></entry> <entry>-</entry> <entry>-</entry> <entry>-</entry> </row> </tbody> </tgroup> </table> </answer> </qandaentry> </qandadiv> </qandaset> <qandaset role="exercise"> <title>book.dtd and internal references</title> <qandadiv> <qandaentry xml:id="example_book.dtd_v5"> <question> <para>We want to extend our DTD from <xref linkend="example_book.dtd_v4"/> to allow document internal references by:</para> <itemizedlist> <listitem> <para>Allowing each <tag class="starttag">chapter</tag>, <tag class="starttag">para</tag> and <tag class="starttag">itemizedlist</tag> to become reference targets.</para> </listitem> <listitem> <para>Extending the element <tag class="element">para</tag>'s mixed content model by a new element <tag class="element">link</tag> with an attribute <tag class="attribute">linkend</tag> being a reference to a target.</para> </listitem> </itemizedlist> </question> <answer> <para>We extend our DTD:</para> <programlisting><!ELEMENT book (title, chapter+)> <!ATTLIST book lang (en|fr|de|it|es) #IMPLIED > <!ELEMENT chapter (title, 
(para|itemizedlist)+)> <!ATTLIST chapter id <co xml:id="progamlisting_book_v5_chapter_id"/> ID #IMPLIED > <!ELEMENT title (#PCDATA)> <!ELEMENT para (#PCDATA|emphasis|link <co xml:id="progamlisting_book_v5_mixed_link"/>)*> <!ATTLIST para id <co xml:id="progamlisting_book_v5_para_id"/> ID #IMPLIED > <!ELEMENT emphasis (#PCDATA)> <!ELEMENT link (#PCDATA) <co xml:id="progamlisting_book_v5_link"/>> <!ATTLIST link linkend <co xml:id="progamlisting_book_v5_link_linkend"/> IDREF #REQUIRED > <!ELEMENT itemizedlist (listitem+)> <!ATTLIST itemizedlist id <co xml:id="progamlisting_book_v5_itemizedList_id"/> ID #IMPLIED > <!ELEMENT listitem ((para|itemizedlist)+)></programlisting> <calloutlist> <callout arch="" arearefs="progamlisting_book_v5_chapter_id progamlisting_book_v5_para_id progamlisting_book_v5_itemizedList_id"> <para>Defining an attribute <tag class="attribute">id</tag> of type <code>ID</code> for the elements <tag class="element">chapter</tag>, <tag class="element">para</tag> and <tag class="element">itemizedlist</tag>. This enables an author to define internal reference targets.</para> </callout> <callout arearefs="progamlisting_book_v5_mixed_link"> <para>A link is part of the element <tag class="element">para</tag>'s mixed content model. Thus an author may define internal references along with ordinary text.</para> </callout> <callout arearefs="progamlisting_book_v5_link"> <para>As in HTML a link may contain text. 
When converted to HTML it is expected to be rendered as a hypertext link.</para> </callout> <callout arearefs="progamlisting_book_v5_link_linkend"> <para>The attribute <tag class="attribute">linkend</tag> holds the reference to an internal target being either a <tag class="element">chapter</tag>, a <tag class="element">para</tag> or an <tag class="element">itemizedlist</tag>.</para> </callout> </calloutlist> </answer> </qandaentry> </qandadiv> </qandaset> </section> </section> <section xml:id="section_attribute_default"> <title>Attribute default values</title> <para>We have implicitly introduced attribute default values already. The formal production rule reads:</para> <productionset> <title>Attribute Defaults</title> <production xml:id="w3RecXml_NT-DefaultDecl"> <lhs>DefaultDecl</lhs> <rhs>'#REQUIRED' | '#IMPLIED' | (('#FIXED' <nonterminal def="#w3RecXml_NT-S">S</nonterminal>)? <nonterminal def="#w3RecXml_NT-AttValue">AttValue</nonterminal>)</rhs> </production> </productionset> <para>We have already introduced <code>#REQUIRED</code> and <code>#IMPLIED</code> describing attribute values that <emphasis>must</emphasis> be specified and attribute values that <emphasis>may</emphasis> be specified. The default declaration <code>#FIXED</code> is typically used during DTD development and rarely in production systems. 
In a nutshell it enables a DTD author to define an attribute with a fixed value that cannot be overridden by an author in a document instance.</para> <figure xml:id="attTypeFixed"> <title>The attribute default <code>#FIXED</code></title> <programlisting xml:id="figure_fixed"><?xml version="1.0" encoding="UTF-8"?> <!DOCTYPE configuration [ <!ELEMENT configuration (property*)> <!ELEMENT property EMPTY> <!ATTLIST property version CDATA #FIXED "3.4" <co xml:id="programmlisting_fixed_attfixed"/> key NMTOKEN #REQUIRED value CDATA #IMPLIED > ]> <configuration> <property key="user" value="admin"/> <co xml:id="programmlisting_fixed_unset"/> <property key="password" value="verySecret" version="3.4" <co xml:id="programmlisting_fixed_correctlyset"/> /> <!-- Ooops! --> <property key="ldapHost" value="141.62.1.5" version="3.7" <co xml:id="programmlisting_fixed_illdefined"/>/> </configuration></programlisting> </figure> <calloutlist> <callout arearefs="programmlisting_fixed_attfixed"> <para>For each <tag class="element">property</tag> node the attribute <tag class="attribute">version</tag> with value <tag class="attvalue">3.4</tag> is automatically defined.</para> </callout> <callout arearefs="programmlisting_fixed_unset"> <para>The attribute <tag class="attribute">version</tag> is not explicitly set. Any software acting on the document will see the value <tag class="attvalue">3.4</tag> though.</para> </callout> <callout arearefs="programmlisting_fixed_correctlyset"> <para>The attribute <tag class="attribute">version</tag> is explicitly set to the value <tag class="attvalue">3.4</tag> defined in the DTD.</para> </callout> <callout arearefs="programmlisting_fixed_illdefined"> <para>The attribute <tag class="attribute">version</tag> is explicitly set to the value <tag class="attvalue">3.7</tag> differing from the value <tag class="attvalue">3.4</tag> defined in the DTD. 
A validating parser will complain:</para> <programlisting><errortext>[Xerces] Attribute "version" with value "3.7" must have a value of "3.4".</errortext></programlisting> </callout> </calloutlist> <para>Next we discuss attributes with default value definitions:</para> <figure xml:id="attDefDefault"> <title xml:id="figure_attribute_default">Attribute definitions with default values</title> <programlisting><?xml version="1.0" encoding="UTF-8"?> <!DOCTYPE doc [ <!ELEMENT doc (para*)> <!ELEMENT para (#PCDATA)> <!ATTLIST para language CDATA "english" <co xml:id="programlisting_attribute_default_language"/>> ]> <doc> <para language="french" <co xml:id="programlisting_attribute_default_french"/>>Une maison</para> <para <co xml:id="programlisting_attribute_default_implicit"/>>A house</para> <para language="english" <co xml:id="programlisting_attribute_default_defaultoverride"/>>Another house</para> </doc></programlisting> </figure> <calloutlist> <callout arearefs="programlisting_attribute_default_language"> <para>Declaration of an attribute <tag class="attribute">language</tag> with default value <tag class="attvalue">english</tag>.</para> </callout> <callout arearefs="programlisting_attribute_default_french"> <para>The attribute value may be overridden as long as the content conforms to the <code>CDATA</code> attribute type.</para> </callout> <callout arearefs="programlisting_attribute_default_implicit"> <para>A <tag class="starttag">para</tag> node with implicit value <tag class="attribute">language="english"</tag>.</para> </callout> <callout arearefs="programlisting_attribute_default_defaultoverride"> <para>Explicitly setting the DTD default value.</para> </callout> </calloutlist> <para>So the difference between declaring an attribute <code>#FIXED</code> and declaring it with an ordinary default is that only the latter may be overridden by a value differing from the default supplied in the DTD.</para> </section> </section> <section xml:id="catalogs"> 
<title>Catalogs for <abbrev xlink:href="http://en.wikipedia.org/wiki/Document_Type_Declaration">DTD</abbrev>s</title> <para>So far we have referenced a DTD from a document instance via a <code>SYSTEM</code> identifier:</para> <programlisting><!DOCTYPE book SYSTEM "ftp://someserver.com/book.dtd"> ...</programlisting> <para>As mentioned before the DTD may be accessed from the file system or referenced by different protocols like HTTP. As an example we consider the XML version of the hypertext markup language HTML:</para> <figure xml:id="figure_xhtmlbase"> <title>A simple XHTML document</title> <programlisting><?xml version="1.0" encoding="UTF-8"?> <!DOCTYPE html SYSTEM "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"> <html xmlns="http://www.w3.org/1999/xhtml"> <head><title>A first start</title></head> <body> <h1>A first start</h1> <p>This is a very simple document</p> </body> </html></programlisting> </figure> <para>In this example the DTD can be accessed via HTTP. This seems to be perfect: A parser reads the document and retrieves referenced resources. But what happens if the HTTP server <code>www.w3.org</code> is inaccessible? Or if someone wants to work offline or in a company's intranet with restricted access policies? In all these cases it is desirable to have a local copy of the DTD to become independent from a remote server. The simplest solution is to copy the complete DTD to the host's local file system:</para> <programlisting><?xml version="1.0" encoding="UTF-8"?> <!DOCTYPE html SYSTEM "C:\mystuff\xhtml1-strict.dtd"> <html xmlns="http://www.w3.org/1999/xhtml"> ...</programlisting> <para>This seems to solve the problem of resources being unavailable. But what about interoperability? If we want to exchange documents with other people we cannot expect our partners to supply the DTD at the same location in the file system. For this reason XML supports the concept of <emphasis>public identifiers</emphasis>. 
We extend the current example:</para> <figure xml:id="figure_xhtml_public"> <title>An XHTML document instance with public and system identifier</title> <programlisting><?xml version="1.0" encoding="UTF-8"?> <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"> <html xmlns="http://www.w3.org/1999/xhtml"> <head><title>A first start</title></head> <body> <h1>A first start</h1> <p>This is a very simple document</p> </body> </html></programlisting> </figure> <para>The string <quote>-//W3C//DTD XHTML 1.0 Strict//EN</quote> should uniquely identify the given DTD. Thus a different XHTML DTD version or even a different XML DTD <emphasis>must have</emphasis> a different public identifier. Note that in the above example a <code>SYSTEM</code> identifier <code>http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd</code> must still be present although the keyword <code>SYSTEM</code> is absent.</para> <para>Now a parser may use a <code>PUBLIC</code> identifier to find the DTD even if the resource referenced by the <code>SYSTEM</code> identifier's value is unavailable. This is achieved by so-called DTD catalogs. A catalog maps <code>PUBLIC</code> identifier values to physical resources. It may be conceived as a map:</para> <figure xml:id="publicSystemDict"> <title>A catalog joining public identifiers with physical resources.</title> <programlisting>OVERRIDE YES <co xml:id="figure_emacs_catalog_preferpublic"/> -- prefer public identifiers to system identifiers -- ... -- XHTML 1.0 -- PUBLIC "-//W3C//DTD XHTML 1.0 Frameset//EN" <co xml:id="figure_emacs_catalog_pubid"/> xhtml1-frameset.dtd <co xml:id="figure_emacs_catalog_resource"/> PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" xhtml1-strict.dtd PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" xhtml1-transitional.dtd ... 
-- Docbook 3.1 -- PUBLIC "-//OASIS//DTD DocBook V3.1//EN" docbook.dtd ...</programlisting> </figure> <calloutlist> <callout arearefs="figure_emacs_catalog_preferpublic"> <para>As stated in the subsequent comment public identifiers will have precedence over system identifiers.</para> </callout> <callout arearefs="figure_emacs_catalog_pubid"> <para>A public identifier with value <code>-//W3C//DTD XHTML 1.0 Frameset//EN</code> ...</para> </callout> <callout arearefs="figure_emacs_catalog_resource"> <para>... and the corresponding value <filename>${BASEDIR}/xhtml1-frameset.dtd</filename>.</para> </callout> </calloutlist> <para>The XML specification itself does not prescribe a format for catalog files. Some applications prefer XML formats to store these mappings. We note that in the presence of a <code>PUBLIC</code> identifier an XML application is free to choose either of the two offered DTD files if both are accessible.</para> <qandaset role="exercise"> <title>Relation between public and system identifiers</title> <qandadiv> <qandaentry xml:id="example_public_system"> <question> <para>We recall <xref linkend="figure_xhtml_public"/>. The public identifier uniquely identifies the DTD. Thus the system identifier still being present seems to be superfluous. How does a parser react if we omit it? Read the XML specification and find the corresponding definition.</para> </question> <answer> <para>Omitting the <code>SYSTEM</code> identifier yields a parsing error:</para> <programlisting><errortext>The system identifier must begin with either a single or double quote character.</errortext></programlisting> <para>This message is a bit confusing. Actually the <code>SYSTEM</code> identifier <emphasis>must</emphasis> still be present and a better parser should complain about its absence instead of only remarking on the missing opening quote. 
The production rule indeed states that even for <code>PUBLIC</code> identifiers a system literal is mandatory:</para> <productionset> <title>External Entity Declaration</title> <production xml:id="w3RecXml_NT-ExternalID"> <lhs>ExternalID</lhs> <rhs>'SYSTEM' <nonterminal def="#w3RecXml_NT-S">S</nonterminal> <nonterminal def="#w3RecXml_NT-SystemLiteral">SystemLiteral</nonterminal> <sbr/> | 'PUBLIC' <nonterminal def="#w3RecXml_NT-S">S</nonterminal> <nonterminal def="#w3RecXml_NT-PubidLiteral">PubidLiteral</nonterminal> <nonterminal def="#w3RecXml_NT-S">S</nonterminal> <nonterminal def="#w3RecXml_NT-SystemLiteral">SystemLiteral</nonterminal></rhs> </production> <production xml:id="w3RecXml_NT-NDataDecl"> <lhs>NDataDecl</lhs> <rhs><nonterminal def="#w3RecXml_NT-S">S</nonterminal> 'NDATA' <nonterminal def="#w3RecXml_NT-S">S</nonterminal> <nonterminal def="#w3RecXml_NT-Name">Name</nonterminal></rhs> </production> </productionset> </answer> </qandaentry> <qandaentry xml:id="example_public_dtdlookup"> <question> <label>DTD lookup by PUBLIC identifier</label> <para>Modify the document of the preceding exercise by:</para> <itemizedlist> <listitem> <para>Change the <code>PUBLIC</code> identifier from <code>-//W3C//DTD XHTML 1.0 Strict//EN</code> to <code>-//W3C//DTD XHTML 1.0 Transitional//EN</code>.</para> </listitem> <listitem> <para>Change the <code>SYSTEM</code> identifier to a resource name which cannot be retrieved.</para> </listitem> </itemizedlist> <para>Use the Oxygen plug in to check whether this document instance is still valid. Which DTD is used for validation? Hint: Check the <option>Window->Preferences->oxyGen->XML->XML Catalog</option> menu.</para> </question> <answer> <para>We modify the <code>SYSTEM</code> identifier by omitting the <filename>.dtd</filename> suffix. Thus the DTD cannot be retrieved by this <link xlink:href="http://www.w3.org/Addressing">URL</link> any longer. But we observe that the document remains valid. 
We conclude that the parser found a DTD via the <code>PUBLIC</code> identifier.</para> <para>This conclusion is indeed correct: In the indicated options menu we find that a master catalog file <filename>/usr/share/.../frameworks/catalog.xml</filename> is used for looking up <code>PUBLIC</code> identifiers:</para> <programlisting><?xml version="1.0"?> <!DOCTYPE catalog PUBLIC "-//OASIS//DTD Entity Resolution XML Catalog V1.0//EN" "http://www.oasis-open.org/committees/entity/release/1.0/catalog.dtd"> <catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog"> ... <nextCatalog catalog="xhtml/dtd/xhtmlcatalog.xml" /> <nextCatalog catalog="xhtml11/dtd/xhtmlcatalog.xml" /> <nextCatalog catalog="xhtml11/schema/xhtmlcatalog.xml" /> ... </catalog></programlisting> <para>And in <filename>xhtml/dtd/xhtmlcatalog.xml</filename> we find:</para> <programlisting><?xml version="1.0" encoding="UTF-8"?> <!DOCTYPE catalog PUBLIC "-//OASIS//DTD Entity Resolution XML Catalog V1.0//EN" "http://www.oasis-open.org/committees/entity/release/1.0/catalog.dtd"> <catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog"> ... <public publicId="<emphasis role="bold">-//W3C//DTD XHTML 1.0 Transitional//EN</emphasis>" uri="<emphasis role="bold">xhtml1-transitional.dtd</emphasis>"/> <public publicId="<emphasis role="bold">-//W3C//DTD XHTML 1.0 Strict//EN</emphasis>" uri="<emphasis role="bold">xhtml1-strict.dtd</emphasis>"/> <public publicId="-//W3C//DTD XHTML 1.0 Frameset//EN" uri="xhtml1-frameset.dtd"/> ... </catalog></programlisting> <para>We learn from this example that an OASIS standard describing a catalog file's structure exists.</para> </answer> </qandaentry> </qandadiv> </qandaset> </section> </section> <section xml:id="xhtml"> <title>The XHTML DTD</title> <para>The XHTML standard is completely defined in terms of a family of <abbrev xlink:href="http://en.wikipedia.org/wiki/Document_Type_Declaration">DTD</abbrev>s. 
One member of this family is denoted <emphasis>strict</emphasis> since it departs furthest from <quote>traditional</quote> HTML. We start with a <quote>Hello, World</quote> example:</para> <figure xml:id="htmlHelloRender"> <title>An XHTML Hello, World example and its rendering</title> <programlisting><?xml version="1.0" encoding="UTF-8"?> <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"> <html xmlns="http://www.w3.org/1999/xhtml"> <head> <title>Hello Example</title> </head> <body> <h1>Hello, World ...</h1> </body> </html></programlisting> <mediaobject> <imageobject> <imagedata fileref="Ref/Screen/hello.screen.png"/> </imageobject> </mediaobject> </figure> </section> </chapter> <chapter xml:id="xmlApis"> <title><abbrev xlink:href="http://en.wikipedia.org/wiki/Api">API</abbrev>s for XML document processing</title> <section xml:id="sax"> <title>The Simple API for XML</title> <section xml:id="saxPrinciple"> <title>The principle of a <acronym xlink:href="http://www.saxproject.org">SAX</acronym> application</title> <para>We are already familiar with transformations of XML document instances to other formats. Sometimes the capabilities offered by a given transformation approach do not suffice for a given problem. A general purpose programming language like <link linkend="gloss_Java"><trademark>Java</trademark></link> offers more powerful means to perform advanced manipulations of XML document trees.</para> <para>Before diving into technical details we present an example exceeding the limits of our present transformation capabilities. We want to format an XML catalog document with article descriptions as HTML. 
The price information however shall reside in a database external to the XML document, namely an RDBMS:</para> <figure xml:id="saxRdbmsAccessPrinciple"> <title>Generating HTML from an XML document and an RDBMS.</title> <mediaobject> <imageobject> <imagedata fileref="Ref/Fig/saxxmlrdbms.fig" scale="65"/> </imageobject> </mediaobject> </figure> <para>Our catalog might look like:</para> <figure xml:id="simpleCatalog"> <title>An <link linkend="gloss_XML"><abbrev>XML</abbrev></link> based catalog.</title> <programlisting><catalog> <item orderNo="<emphasis role="bold">3218</emphasis>">Swinging headset</item> <item orderNo="<emphasis role="bold">9921</emphasis>">200W Stereo Amplifier</item> </catalog></programlisting> </figure> <para>The RDBMS may hold some relation with a field <code>orderNo</code> as primary key and a corresponding attribute like <code>price</code>. In a real world application <code>orderNo</code> should probably be an integer typed <code>IDENTITY</code> attribute.</para> <figure xml:id="saxRdbmsSchema"> <title>A relation containing price information.</title> <programlisting>CREATE TABLE Product ( orderNo CHAR(10) PRIMARY KEY ,price Money ); INSERT INTO Product VALUES('<emphasis role="bold">3218</emphasis>', 42.57); INSERT INTO Product VALUES('<emphasis role="bold">9921</emphasis>', 121.50);</programlisting> <caption> <para>Prices depend on order numbers.</para> </caption> </figure> <para>The intended HTML output with order numbers being highlighted looks like:</para> <figure xml:id="saxPriceOut"> <title>HTML generated output.</title> <programlisting><!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"> <html> <head><title>Available products</title></head> <body> <table border="1"> <tbody> <tr> <th><emphasis role="bold">Order number</emphasis></th> <th>Price</th> <th>Product</th> </tr> <tr> <td><emphasis role="bold">3218</emphasis></td> <td>42,57</td> <td>Swinging headset</td> </tr> <tr> <td><emphasis role="bold">9921</emphasis></td> 
<td>121,50</td> <td>200W Stereo Amplifier</td> </tr> </tbody> </table> </body> </html></programlisting> <caption> <para>The resulting HTML document contains content both from our XML document and from the database table <code>Product</code>.</para> </caption> </figure> <para>The intended transformation is beyond the XSLT standard's processing capabilities: XSLT does not enable us to access RDBMS content. However some XSLT processors provide extensions for this task.</para> <para>It is tempting to write a <link linkend="gloss_Java"><trademark>Java</trademark></link> application which might use e.g. <trademark xlink:href="http://electronics.zibb.com/trademark/jdbc/29545026">JDBC</trademark> for database access. But how do we actually read and parse an XML file? Sticking to the <link linkend="gloss_Java"><trademark>Java</trademark></link> standard we might use a <link xlink:href="http://java.sun.com/j2se/1.5.0/docs/api/java/io/FileInputStream.html">FileInputStream</link> instance to read from <code>catalog.xml</code> and write an XML parser ourselves. Fortunately <orgname>SUN</orgname>'s <trademark xlink:href="http://www.oracle.com/technetwork/java/javase">JDK</trademark> already includes an API denoted <acronym xlink:href="http://www.saxproject.org">SAX</acronym>, the <emphasis>S</emphasis>imple <emphasis>A</emphasis>PI for <emphasis>X</emphasis>ML. The <productname xlink:href="http://www.oracle.com/technetwork/java/javase/jdk-7-readme-429198.html">JDK</productname> also includes a corresponding parser implementation. 
In addition there are third party <acronym xlink:href="http://www.saxproject.org">SAX</acronym> parser implementations available like <productname xlink:href="http://xerces.apache.org">Xerces</productname> from the <orgname xlink:href="http://www.apache.org">Apache Foundation</orgname>.</para> <para>The <acronym xlink:href="http://www.saxproject.org">SAX</acronym> API is event based and will be illustrated by the relationship between customers and a software vendor company:</para> <mediaobject> <imageobject> <imagedata fileref="Ref/Fig/updateinfo.fig"/> </imageobject> </mediaobject> <para>After purchasing software customers are asked to register their software. This way the vendor receives the customer's address. Each time a new release is completed all registered customers will receive a notification typically including a <quote>special offer</quote> to upgrade their software. From an abstract point of view the following two actions take place:</para> <variablelist> <varlistentry> <term>Registration</term> <listitem> <para>The customer registers at the company's site, indicating an interest in updated versions.</para> </listitem> </varlistentry> <varlistentry> <term>Notification</term> <listitem> <para>Upon completion of each new software release (considered to be an <emphasis>event</emphasis>) a message is sent to all registered customers.</para> </listitem> </varlistentry> </variablelist> <para>The same principle applies to GUI applications in software development. A key press <emphasis>event</emphasis> for example will be forwarded by an application's <emphasis>event handler</emphasis> to a callback function (sometimes called a <emphasis>handler</emphasis> method) implemented by an application developer. The <acronym xlink:href="http://www.saxproject.org">SAX</acronym> API works the same way: A parser reads an XML document generating events which <emphasis>may</emphasis> be handled by an application. 
During document parsing the XML tree structure gets <quote>flattened</quote> to a sequence of events:</para> <figure xml:id="saxFlattenEvent"> <title>Parsing an XML document creates a corresponding sequence of events.</title> <mediaobject> <imageobject> <imagedata fileref="Ref/Fig/saxmodel.pdf"/> </imageobject> </mediaobject> </figure> <para>An application may register components with the parser:</para> <figure xml:id="figureSax"> <title><acronym xlink:href="http://www.saxproject.org">SAX</acronym> Principle</title> <mediaobject> <imageobject> <imagedata fileref="Ref/Fig/saxapparch.pdf"/> </imageobject> <caption> <para>A <acronym xlink:href="http://www.saxproject.org">SAX</acronym> application consists of a <acronym xlink:href="http://www.saxproject.org">SAX</acronym> parser and an implementation of event handlers specific to the application. The application is developed by implementing the two handlers.</para> </caption> </mediaobject> </figure> <para>An Error Handler is required since the XML stream may contain errors. 
In order to implement a <acronym xlink:href="http://www.saxproject.org">SAX</acronym> application we have to:</para> <orderedlist> <listitem> <para>Instantiate required objects:</para> <itemizedlist> <listitem> <para>Parser</para> </listitem> <listitem> <para>Event Handler</para> </listitem> <listitem> <para>Error Handler</para> </listitem> </itemizedlist> </listitem> <listitem> <para>Register handler instances</para> <itemizedlist> <listitem> <para>register Event Handler to Parser</para> </listitem> <listitem> <para>register Error Handler to Parser</para> </listitem> </itemizedlist> </listitem> <listitem> <para>Start the parsing process by calling the parser's appropriate method.</para> </listitem> </orderedlist> </section> <section xml:id="saxIntroExample"> <title>First steps</title> <para>Our first <acronym xlink:href="http://www.saxproject.org">SAX</acronym> toy application <classname>sax.stat.v1.ElementCount</classname> shall simply count the number of elements it finds in an arbitrary XML document. In addition the <acronym xlink:href="http://www.saxproject.org">SAX</acronym> events shall be written to standard output generating output sketched in <xref linkend="saxFlattenEvent"/>. The application's central implementation reads:</para> <figure xml:id="saxElementCount"> <title>Counting XML elements.</title> <programlisting language="java">package sax.stat.v1; ... 
public class ElementCount { public void parse(final String uri) { try { final SAXParserFactory saxPf = SAXParserFactory.newInstance(); final SAXParser saxParser = saxPf.newSAXParser(); saxParser.parse(uri, eventHandler); } catch (ParserConfigurationException e){ e.printStackTrace(System.err); } catch (org.xml.sax.SAXException e) { e.printStackTrace(System.err); } catch (IOException e){ e.printStackTrace(System.err); } } public int getElementCount() { return eventHandler.getElementCount(); } private final MyEventHandler eventHandler = new MyEventHandler(); }</programlisting> <caption> <para>This application works for arbitrary well-formed XML documents.</para> </caption> </figure> <para>We now explain this application in detail. The first part deals with the instantiation of a parser:</para> <programlisting language="java">try { final SAXParserFactory saxPf = <emphasis role="bold">SAXParserFactory</emphasis>.newInstance(); final SAXParser saxParser = saxPf.newSAXParser(); saxParser.parse(uri, eventHandler); } catch (ParserConfigurationException e){ e.printStackTrace(System.err); } ...</programlisting> <para>In order to keep an application independent of a specific parser implementation the <acronym xlink:href="http://www.saxproject.org">SAX</acronym> API uses the so-called <link xlink:href="http://www.dofactory.com/Patterns/PatternAbstract.aspx">Abstract Factory Pattern</link> instead of simply calling a constructor of a vendor-specific parser class.</para> <para>In order to be useful the parser has to be instructed to do something meaningful when an XML document gets parsed. For this purpose our application supplies an event handler instance:</para> <programlisting language="java">public void parse(final String uri) { try { final SAXParserFactory saxPf = SAXParserFactory.newInstance(); final SAXParser saxParser = saxPf.newSAXParser(); saxParser.parse(uri, <emphasis role="bold">eventHandler</emphasis>); } catch (org.xml.sax.SAXException e) { ... 
private final MyEventHandler <emphasis role="bold">eventHandler = new MyEventHandler()</emphasis>; }</programlisting> <para>What does the event handler actually do? It offers methods to the parser being callable during the parsing process:</para> <programlisting language="java">package sax.stat.v1; ... public class MyEventHandler extends <classname>org.xml.sax.helpers.DefaultHandler</classname> { public void <emphasis role="bold"><emphasis role="bold">startDocument()</emphasis></emphasis><co xml:id="programlisting_eventhandler_startDocument"/> { System.out.println("Opening Document"); } public void <emphasis role="bold">endDocument()</emphasis><co xml:id="programlisting_eventhandler_endDocument"/> { System.out.println("Closing Document"); } public void <emphasis role="bold">startElement(String namespaceUri, String localName, String rawName, Attributes attrs)</emphasis> <co xml:id="programlisting_eventhandler_startElement"/>{ System.out.println("Opening \"" + rawName + "\""); elementCount++; } public void <emphasis role="bold">endElement(String namespaceUri, String localName, String rawName)</emphasis><co xml:id="programlisting_eventhandler_endElement"/>{ System.out.println("Closing \"" + rawName + "\""); } public void <emphasis role="bold">characters(char[] ch, int start, int length)</emphasis><co xml:id="programlisting_eventhandler_characters"/>{ System.out.println("Content \"" + new String(ch, start, length) + '"'); } public int getElementCount() <co xml:id="programlisting_eventhandler_getElementCount"/>{ return elementCount; } private int elementCount = 0; }</programlisting> <calloutlist> <callout arearefs="programlisting_eventhandler_startDocument"> <para>This method gets called exactly once namely when opening the XML document as a whole.</para> </callout> <callout arearefs="programlisting_eventhandler_endDocument"> <para>After successfully parsing the whole document instance this method will finally be called.</para> </callout> <callout 
arearefs="programlisting_eventhandler_startElement"> <para>This method gets called each time a new element is parsed. In the given catalog.xml example it will be called three times: Once when the <tag class="starttag">catalog</tag> appears and once for each of the two <tag class="starttag">item ...</tag> elements. The supplied parameters depend on whether or not namespace processing is enabled.</para> </callout> <callout arearefs="programlisting_eventhandler_endElement"> <para>Called each time an element like <tag class="starttag">item ...</tag> gets closed by its counterpart <tag class="endtag">item</tag>.</para> </callout> <callout arearefs="programlisting_eventhandler_characters"> <para>This method is responsible for the treatment of textual content i.e. handling <code>#PCDATA</code> element content. We will explain its uncommon signature a little bit later.</para> </callout> <callout arearefs="programlisting_eventhandler_getElementCount"> <para><function>getElementCount()</function> is a getter method offering read-only access to the private field <varname>elementCount</varname> which gets incremented in <coref linkend="programlisting_eventhandler_startElement"/> each time an XML element opens.</para> </callout> </calloutlist> <para>The call <code>saxParser.parse(uri, eventHandler)</code> actually initiates the parsing process and tells the parser to:</para> <itemizedlist> <listitem> <para>Open the XML document referenced by the URI argument.</para> </listitem> <listitem> <para>Forward XML events to the event handler instance supplied by the second argument.</para> </listitem> </itemizedlist> <para>A driver class containing a <code>main(...)</code> method may start the whole process and print out the desired number of elements upon completion of a parsing run:</para> <programlisting language="java">package sax.stat.v1; public class ElementCountDriver { public static void main(String argv[]) { ElementCount xmlStats = new ElementCount(); xmlStats.parse("<emphasis 
role="bold">Input/Sax/catalog.xml</emphasis>"); System.out.println("Document contains " + xmlStats.<emphasis role="bold">getElementCount()</emphasis> + " elements"); } }</programlisting> <para>Processing the catalog example instance yields:</para> <programlisting>Opening Document <emphasis role="bold">Opening "catalog"</emphasis> <co xml:id="programlisting_catalog_output"/> Content " " <emphasis role="bold">Opening "item"</emphasis> <co xml:id="programlisting_catalog_item1"/> Content "Swinging headset" Closing "item" Content " " <emphasis role="bold">Opening "item"</emphasis> <co xml:id="programlisting_catalog_item2"/> Content "200W Stereo Amplifier" Closing "item" Content " " Closing "catalog" Closing Document <emphasis role="bold">Document contains 3 elements</emphasis> <co xml:id="programlisting_catalog_elementcount"/></programlisting> <calloutlist> <callout arearefs="programlisting_catalog_output"> <para>Start parsing element <tag class="starttag">catalog</tag>.</para> </callout> <callout arch="" arearefs="programlisting_catalog_item1"> <para>Start parsing element <tag class="starttag">item orderNo="3218"</tag>Swinging headset<tag class="endtag" role="">item</tag>.</para> </callout> <callout arch="" arearefs="programlisting_catalog_item2"> <para>Start parsing element <tag class="starttag">item orderNo="9921"</tag>200W Stereo Amplifier<tag class="endtag" role="">item</tag>.</para> </callout> <callout arearefs="programlisting_catalog_elementcount"> <para>After the parsing process has completed the application outputs the total number of elements counted.</para> </callout> </calloutlist> <para>The output contains some lines of <quote>empty</quote> content. This content is due to whitespace located between elements. For example a newline appears between the <tag class="starttag">catalog</tag> and the first <tag class="starttag">item</tag> element. 
The parser encapsulates this whitespace in a call to the <link xlink:href="http://java.sun.com/j2se/1.5.0/docs/api/org/xml/sax/ContentHandler.html#characters(char[],%20int,%20int)">characters</link> method. In an application such whitespace-only calls will typically be ignored. Machine-generated XML document instances often do not contain any newline characters between elements at all. Instead the whole document is represented as a single line. This impedes human readability, which is not required as long as the processing applications work well. In this case empty content as above will not appear.</para> <para>The <code>characters(char[] ch, int start, int length)</code> method's signature looks somewhat strange regarding <link linkend="gloss_Java"><trademark>Java</trademark></link> conventions. One might expect <code>characters(String s)</code>. But this way the <acronym xlink:href="http://www.saxproject.org">SAX</acronym> API allows efficient parser implementations: A parser may initially allocate a reasonably large <code>char</code> array of, say, 128 bytes, sufficient to hold 64 (<link xlink:href="http://unicode.org">Unicode</link>) characters. If this buffer gets exhausted the parser might allocate a second buffer of double size, thus implementing an <quote>amortized doubling</quote> algorithm:</para> <mediaobject> <imageobject> <imagedata fileref="Ref/Fig/saxcharacter.pdf"/> </imageobject> </mediaobject> <para>In this example the first element content fits in the first buffer. The second content <code>200W Stereo Amplifier</code> and the third content <code>Earphone</code> both fit in the second buffer. Subsequent content may require further buffer allocations. 
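On the application side the three-argument signature may be exploited in the same spirit. The following handler is a minimal sketch: The class name <classname>TextCollector</classname> and its helper method <function>collect</function> are hypothetical and not part of the lecture's code base. It copies each announced slice of the parser's buffer directly into a single <classname>StringBuilder</classname>: <programlisting language="java"><![CDATA[import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class TextCollector extends DefaultHandler {

  // Reused during the whole parsing run: no intermediate String objects.
  private final StringBuilder text = new StringBuilder();

  @Override
  public void characters(char[] ch, int start, int length) {
    // Copy only the announced slice, the remainder of ch belongs to the parser.
    text.append(ch, start, length);
  }

  public String getText() {
    return text.toString();
  }

  /** Parses an XML string and returns its accumulated character content. */
  public static String collect(final String xml) throws Exception {
    final TextCollector handler = new TextCollector();
    SAXParserFactory.newInstance().newSAXParser()
        .parse(new InputSource(new StringReader(xml)), handler);
    return handler.getText();
  }

  public static void main(String[] args) throws Exception {
    // prints: Swinging headset
    System.out.println(collect("<item orderNo=\"3218\">Swinging headset</item>"));
  }
}]]></programlisting> Accumulating the slices also keeps the result correct if a parser chooses to deliver one piece of text in several <function>characters</function> calls. 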
Such a strategy avoids the time-consuming <code>new </code> <link xlink:href="http://java.sun.com/j2se/1.5.0/docs/api/java/lang/String.html">String</link> <code>(...)</code> constructor calls which the more convenient API variant <code>characters(String s)</code> would require.</para> </section> <section xml:id="saxRegistry"> <title>Event- and error handler registration</title> <para>Our first <acronym xlink:href="http://www.saxproject.org">SAX</acronym> application suffers from the following deficiencies:</para> <itemizedlist> <listitem> <para>The error handling is very sparse. It completely relies on exceptions of types like <link xlink:href="http://java.sun.com/j2se/1.5.0/docs/api/org/xml/sax/SAXException.html">SAXException</link> being thrown, which frequently do not supply meaningful error information.</para> </listitem> <listitem> <para>The application is not aware of namespaces. Thus when reading e.g. <abbrev xlink:href="http://www.w3.org/Style/XSL">XSL</abbrev> document instances it cannot distinguish between elements from different namespaces like HTML.</para> </listitem> <listitem> <para>The parser will not validate a document instance against a DTD even if one is present.</para> </listitem> </itemizedlist> <para>We now incrementally add these features to the <acronym xlink:href="http://www.saxproject.org">SAX</acronym> parsing process. <acronym xlink:href="http://www.saxproject.org">SAX</acronym> offers an interface <link xlink:href="http://java.sun.com/j2se/1.5.0/docs/api/org/xml/sax/XMLReader.html">XMLReader</link> to conveniently <emphasis>register</emphasis> event- and error handler instances instead of passing them as a separate argument to the <link xlink:href="http://java.sun.com/j2se/1.5.0/docs/api/javax/xml/parsers/SAXParser.html#parse(java.lang.String,%20org.xml.sax.helpers.DefaultHandler)">parse</link> method. 
We first code an error handler class by implementing the interface <classname>org.xml.sax.ErrorHandler</classname> being part of the <acronym xlink:href="http://www.saxproject.org">SAX</acronym> API:</para> <programlisting language="java">package sax.stat.v2; ... public class MyErrorHandler implements ErrorHandler { <emphasis role="bold">public void warning(SAXParseException e)</emphasis> { System.err.println("[Warning]" + getLocationString(e)); } <emphasis role="bold">public void error(SAXParseException e)</emphasis> { System.err.println("[Error]" + getLocationString(e)); } <emphasis role="bold">public void fatalError(SAXParseException e)</emphasis> throws SAXException{ System.err.println("[Fatal Error]" + getLocationString(e)); } private String getLocationString(SAXParseException e) { return " line " + e.getLineNumber() + ", column " + e.getColumnNumber()+ ":" + e.getMessage(); } }</programlisting> <para>These three methods represent the <classname>org.xml.sax.ErrorHandler</classname> interface. The method <function>getLocationString</function> is used to supply precise parsing error locations by means of line- and column numbers within a document instance. 
If errors or warnings are encountered the parser will call the appropriate public method:</para> <figure xml:id="saxMissItem"> <title>A document which is not well-formed.</title> <programlisting><?xml version="1.0" encoding="UTF-8"?> <catalog> <item orderNo="3218">Swinging headset</item> <item orderNo="9921">200W Stereo Amplifier </catalog></programlisting> <caption> <para>This document is not well-formed since a closing <tag class="endtag">item</tag> tag is missing.</para> </caption> </figure> <para>Our error handler method gets called yielding an informative message:</para> <programlisting>[Fatal Error] line 5, column -1:Expected "</item>" to terminate element starting on line 4.</programlisting> <para>This error output is achieved by <emphasis>registering</emphasis> an instance of <classname>sax.stat.v2.MyErrorHandler</classname> with the parser prior to starting the parsing process. In the following code snippet we also register a content handler instance with the parser and thus separate the parser's configuration from its invocation:</para> <programlisting language="java">package sax.stat.v2; ... 
public class ElementCount { public ElementCount() throws SAXException, ParserConfigurationException{ final SAXParserFactory saxPf = SAXParserFactory.newInstance(); final SAXParser saxParser = saxPf.newSAXParser(); xmlReader = saxParser.getXMLReader(); xmlReader.setContentHandler(eventHandler); <co xml:id="programlisting_assemble_parser_setcontenthandler"/> xmlReader.setErrorHandler(errorHandler); <co xml:id="programlisting_assemble_parser_seterrorhandler"/> } public void parse(final String uri) throws IOException, SAXException{ xmlReader.parse(uri); <co xml:id="programlisting_assemble_parser_invokeparse"/> } public int getElementCount() { return eventHandler.getElementCount(); <co xml:id="programlisting_assemble_parser_getelementcount"/> } private final XMLReader xmlReader; private final MyEventHandler eventHandler = new MyEventHandler(); <co xml:id="programlisting_assemble_parser_createeventhandler"/> private final MyErrorHandler errorHandler = new MyErrorHandler(); <co xml:id="programlisting_assemble_parser_createerrorhandler"/> }</programlisting> <calloutlist> <callout arearefs="programlisting_assemble_parser_setcontenthandler programlisting_assemble_parser_seterrorhandler"> <para>Referring to <xref linkend="figureSax" os=""/> these two calls attach the event- and error handler objects to the parser thus implementing the two arrows from the parser to the application's implementation.</para> </callout> <callout arearefs="programlisting_assemble_parser_invokeparse"> <para>The parser is invoked. 
Note that in this example we only pass a document's URI but no reference to a handler object.</para> </callout> <callout arearefs="programlisting_assemble_parser_getelementcount"> <para>The method <function>getElementCount()</function> is needed to allow a calling object to access the private <varname>eventHandler</varname> object's <function>getElementCount()</function> method.</para> </callout> <callout arearefs="programlisting_assemble_parser_createeventhandler programlisting_assemble_parser_createerrorhandler"> <para>An event handling and an error handling object are created to handle events during the parsing process.</para> </callout> </calloutlist> <para>The careful reader might notice a subtle difference between the content- and the error handler implementation: The class <classname>sax.stat.v2.MyErrorHandler</classname> implements the interface <classname>org.xml.sax.ErrorHandler</classname>. But <classname>sax.stat.v2.MyEventHandler</classname> is derived from <classname>org.xml.sax.helpers.DefaultHandler</classname> which itself implements the <classname>org.xml.sax.ContentHandler</classname> interface. Actually one might as well start from the latter interface, which requires implementing all of its 11 methods. In most circumstances this only complicates the application's code since it is unnecessary to react to events belonging for example to processing instructions. 
For this reason it is good coding practice to use the empty default implementations in <classname>org.xml.sax.helpers.DefaultHandler</classname> and to redefine only those methods corresponding to events actually being handled by the application in question.</para> <qandaset role="exercise"> <title>Reading XML attributes</title> <qandadiv> <qandaentry xml:id="exercise_saxAttrib"> <question> <label>Reading an element's set of attributes.</label> <para>The example document instance does include <tag class="attribute">orderNo</tag> attribute values for each <tag class="starttag">item</tag> element. Our application does not yet show these attribute keys and their corresponding values. Read the documentation for <classname xlink:href="http://java.sun.com/j2se/1.5.0/docs/api/org/xml/sax/Attributes.html">org.xml.sax.Attributes</classname> and extend the given code to use it.</para> </question> <answer> <para>For the given example it would suffice to read the known <tag class="attribute">orderNo</tag> attribute's value. A generic solution may ask for the set of all defined attributes and show their values:</para> <programlisting language="java">package sax; import org.xml.sax.Attributes; import org.xml.sax.helpers.DefaultHandler; public class AttribEventHandler extends DefaultHandler { public void startElement(String namespaceUri, String localName, String rawName, Attributes attrs) { System.out.println("Opening Element " + rawName); for (int i = 0; i < attrs.getLength(); i++){ System.out.println(attrs.getQName(i) + "=" + attrs.getValue(i) + "\n"); } } }</programlisting> </answer> </qandaentry> <qandaentry xml:id="saxRdbms"> <question> <label>SAX processing with RDBMS access.</label> <para>Implement the example given in <xref linkend="saxRdbmsAccessPrinciple"/> to produce the output sketched in <xref linkend="saxPriceOut"/>. 
You may start by implementing <emphasis>and testing</emphasis> the following methods of an RDBMS interfacing class using <trademark xlink:href="http://electronics.zibb.com/trademark/jdbc/29545026">JDBC</trademark>:</para> <programlisting language="java">package sax.rdbms; public class RdbmsAccess { public void connect(final String host, final int port, final String userName, final String password) { // <emphasis role="bold">open connection to a database</emphasis> } public String readPrice(final String articleNumber) { return "0"; // <emphasis role="bold">To be implemented as access to a ResultSet object</emphasis> } public void close() { // <emphasis role="bold">close database connection</emphasis> } }</programlisting> <para>You may find it helpful to write a small testbed for the RDBMS access functionality prior to integrating it into your <acronym xlink:href="http://www.saxproject.org">SAX</acronym> application producing HTML output.</para> </question> <answer> <para>We start by creating a suitable RDBMS table:</para> <programlisting>CREATE SCHEMA AUTHORIZATION midb2 CREATE TABLE Product( orderNo CHAR(10) NOT NULL PRIMARY KEY ,price DECIMAL (9,2) NOT NULL )</programlisting> <para>Next we insert some toy data:</para> <programlisting>INSERT INTO Product VALUES('x-223', 330.20); INSERT INTO Product VALUES('w-124', 110.40);</programlisting> <para>Now we implement our RDBMS access class:</para> <programlisting language="java">package dom.xsl; ... 
public class DbAccess { public void connect(final String jdbcUrl, final String userName, final String password) { try { conn = DriverManager.getConnection(jdbcUrl, userName, password); priceQuery = conn.prepareStatement(sqlPriceQuery); } catch (SQLException e) { System.err.println("Unable to open connection to database:" + e);} } public String readPrice(final String articleNumber) { String result; try { priceQuery.setString(1, articleNumber); final ResultSet rs = priceQuery.executeQuery(); if (rs.next()) { result = rs.getString("price"); } else { result = "No price available for article '" + articleNumber + "'"; } } catch (SQLException e) { result = "Error reading price for article '" + articleNumber + "':" + e; } return result; } public void close() { try { if (conn != null) { conn.close(); } } catch (SQLException e) { System.err.println("Error closing database connection:" + e); } } static { try { Class.forName("com.ibm.db2.jcc.DB2Driver"); } catch (ClassNotFoundException e) { System.err.println("Unable to register Driver:" + e);} } private static final String sqlPriceQuery = "SELECT price FROM Product WHERE orderNo = ?"; private PreparedStatement priceQuery = null; private Connection conn = null; }</programlisting> <para>This access layer may be tested independently from handling catalog instances:</para> <programlisting language="java">package dom.xsl; public class DbAccessDriver { public static void main(String[] args) { final DbAccess dbaccess = new DbAccess(); dbaccess.connect("jdbc:db2://db2.mi.hdm-stuttgart.de:10000/hdm", "midb2", "password"); System.out.println(dbaccess.readPrice("x-223")); System.out.println(dbaccess.readPrice("..aaargh!")); dbaccess.close(); } }</programlisting> <para>If the above test succeeds we may embed the RDBMS access layer into our <acronym xlink:href="http://www.saxproject.org">SAX</acronym> handler:</para> <programlisting language="java">package sax.rdbms; ... 
public class HtmlEventHandler extends DefaultHandler{ public void startDocument() { dbaccess.connect("jdbc:db2://db2.mi.hdm-stuttgart.de:10000/hdm", "midb2", "password"); System.out.println("<html><head><title>Catalog</title></head>"); } public void endDocument() { System.out.println("</html>"); dbaccess.close(); } public void startElement(String namespaceUri, String localName, String rawName, Attributes attrs){ if (rawName.equals("catalog")){ System.out.println("<body><H1>A catalog</H1>" +"<table border='1'><tbody>"); System.out.println("<tr><th>Order number</th>\n" + "<th>Price</th>\n" +" <th>Product</th></tr>"); } else if (rawName.equals("item")){ final String orderNo = attrs.getValue("orderNo"); System.out.print("<tr><td>" + orderNo + "</td>\n<td>" + dbaccess.readPrice(orderNo) + "</td>\n<td>"); } else { System.err.println("Element '" + rawName + "' unknown"); } } public void endElement(String namespaceUri, String localName, String rawName) { if (rawName.equals("catalog")){ System.out.println("</tbody></table></body>"); } else if (rawName.equals("item")){ System.out.println("</td></tr>\n"); } } public void characters(char[] ch, int start, int length) { System.out.print(new String(ch, start, length)); } private DbAccess dbaccess = new DbAccess(); }</programlisting> </answer> </qandaentry> </qandadiv> </qandaset> </section> <section xml:id="saxValidate"> <title><acronym xlink:href="http://www.saxproject.org">SAX</acronym> validation</title> <para>So far we only parsed well-formed document instances. 
Our current parser will also accept document instances which are not valid:</para> <figure xml:id="saxNotValid"> <title>An invalid XML document.</title> <programlisting><?xml version="1.0" encoding="UTF-8"?> <!DOCTYPE catalog [ <!ELEMENT catalog (item) > <!ELEMENT item (#PCDATA) > <!ATTLIST item orderNo NMTOKEN #REQUIRED > ]> <catalog> <item orderNo="3218">Swinging headset</item> <item orderNo="9921">200W Stereo Amplifier</item> </catalog></programlisting> <caption> <para>In contrast to <xref linkend="saxMissItem"/> this document is well-formed. But it is not <emphasis role="bold">valid</emphasis> with respect to the DTD grammar since more than one <tag class="starttag">item</tag> element is present.</para> </caption> </figure> <para>This document instance is well-formed but not valid. The parser will not report any error or warning. In order to enable validation we need to configure our parser:</para> <programlisting language="java">xmlReader.setFeature("http://xml.org/sax/features/validation", true);</programlisting> <para>The string <code>http://xml.org/sax/features/validation</code> serves as a key. Since this is an ordinary string value a parser may or may not implement it. 
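A self-contained sketch of a validating parsing run may look as follows. The class name <classname>ValidationDemo</classname> and its method <function>countValidityErrors</function> are hypothetical and have been chosen for this illustration. We assume a parser reporting validity violations through the error handler's <function>error</function> method and signalling an unavailable feature by one of the two exception types caught here: <programlisting language="java"><![CDATA[import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXNotRecognizedException;
import org.xml.sax.SAXNotSupportedException;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

public class ValidationDemo {

  /** The invalid catalog: two item elements although the DTD allows exactly one. */
  static final String CATALOG =
      "<!DOCTYPE catalog ["
    + " <!ELEMENT catalog (item)>"
    + " <!ELEMENT item (#PCDATA)>"
    + " <!ATTLIST item orderNo NMTOKEN #REQUIRED> ]>"
    + "<catalog>"
    + "<item orderNo=\"3218\">Swinging headset</item>"
    + "<item orderNo=\"9921\">200W Stereo Amplifier</item>"
    + "</catalog>";

  /**
   * Returns the number of validity errors reported by the parser
   * or -1 if the parser cannot be switched to validating mode.
   */
  static int countValidityErrors(final String xml) throws Exception {
    final XMLReader xmlReader =
        SAXParserFactory.newInstance().newSAXParser().getXMLReader();
    try {
      xmlReader.setFeature("http://xml.org/sax/features/validation", true);
    } catch (SAXNotRecognizedException | SAXNotSupportedException e) {
      return -1;
    }
    final int[] errors = {0};
    xmlReader.setErrorHandler(new DefaultHandler() {
      @Override
      public void error(SAXParseException e) {  // validity violations arrive here
        errors[0]++;
      }
    });
    xmlReader.parse(new InputSource(new StringReader(xml)));
    return errors[0];
  }

  public static void main(String[] args) throws Exception {
    System.out.println("Validity errors: " + countValidityErrors(CATALOG));
  }
}]]></programlisting> With validation enabled the second <tag class="starttag">item</tag> element is expected to cause at least one call to <function>error</function>. 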
The <acronym xlink:href="http://www.saxproject.org">SAX</acronym> standard defines two exception classes for dealing with feature-related errors:</para> <variablelist> <varlistentry> <term><link xlink:href="http://java.sun.com/j2se/1.5.0/docs/api/org/xml/sax/SAXNotRecognizedException.html">SAXNotRecognizedException</link></term> <listitem> <para>The feature is not known to the parser.</para> </listitem> </varlistentry> <varlistentry> <term><link xlink:href="http://java.sun.com/j2se/1.5.0/docs/api/org/xml/sax/SAXNotSupportedException.html">SAXNotSupportedException</link></term> <listitem> <para>The feature is known to the parser but the parser either does not support it or does not support the specific value being set.</para> </listitem> </varlistentry> </variablelist> </section> <section xml:id="saxNamespace"> <title>Namespaces</title> <para>In order to make a <acronym xlink:href="http://www.saxproject.org">SAX</acronym> parser application namespace aware we have to activate two <acronym xlink:href="http://www.saxproject.org">SAX</acronym> parsing features:</para> <programlisting language="java">xmlReader = saxParser.getXMLReader(); xmlReader.setFeature("http://xml.org/sax/features/namespaces", true); xmlReader.setFeature("http://xml.org/sax/features/namespace-prefixes", true);</programlisting> <para>This instructs the parser to pass the namespace's name for each element. Namespace prefixes like <code>xsl</code> in <tag class="starttag">xsl:for-each</tag> are also passed and may be used by an application:</para> <programlisting language="java">package sax; ... public class NamespaceEventHandler extends DefaultHandler { ... 
public void startElement(String <emphasis role="bold">namespaceUri</emphasis>, String localName, String rawName, Attributes attrs) { System.out.println("Opening Element rawName='" + rawName + "'\n" + "namespaceUri='" + <emphasis role="bold">namespaceUri</emphasis> + "'\n" + "localName='" + localName + "'\n--------------------------------------------"); } }</programlisting> <para>As an example we take an XSLT script:</para> <programlisting><?xml version="1.0" encoding="utf-8"?> <xsl:stylesheet version="1.0" xmlns:xsl='http://www.w3.org/1999/XSL/Transform' xmlns:fo='http://www.w3.org/1999/XSL/Format'> <xsl:template match="/"> <fo:block>A block</fo:block> <HTML/> </xsl:template> </xsl:stylesheet></programlisting> <para>This XSLT script, regarded as an XML document instance, contains elements belonging to two different namespaces namely <code>http://www.w3.org/1999/XSL/Transform</code> and <code>http://www.w3.org/1999/XSL/Format</code>. The script also contains a <quote>raw</quote> <tag audience="" class="emptytag">HTML</tag> element which has been introduced only for demonstration purposes. Since the script does not declare a default namespace this element does not belong to any namespace. The result reads:</para> <programlisting>Opening Element rawName='xsl:stylesheet' namespaceUri='http://www.w3.org/1999/XSL/Transform' localName='stylesheet' -------------------------------------------- Opening Element rawName='xsl:template' namespaceUri='http://www.w3.org/1999/XSL/Transform' localName='template' -------------------------------------------- Opening Element rawName='fo:block' namespaceUri='http://www.w3.org/1999/XSL/Format' localName='block' -------------------------------------------- Opening Element rawName='HTML' namespaceUri='' localName='HTML' --------------------------------------------</programlisting> <para>Now the parser tells us which namespace a given element node belongs to. 
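An application may base decisions on this value. The following handler is a minimal sketch counting elements inside and outside the XSLT namespace. The class name <classname>NamespaceClassifier</classname> and its members are hypothetical, and for brevity namespace processing is enabled by the <classname>SAXParserFactory</classname>'s <function>setNamespaceAware</function> method rather than by setting parser features: <programlisting language="java"><![CDATA[import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class NamespaceClassifier extends DefaultHandler {

  static final String XSLT_NS = "http://www.w3.org/1999/XSL/Transform";

  /** The XSLT script from the text, condensed to a single string. */
  static final String XSLT =
      "<xsl:stylesheet version='1.0'"
    + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'"
    + " xmlns:fo='http://www.w3.org/1999/XSL/Format'>"
    + "<xsl:template match='/'><fo:block>A block</fo:block><HTML/></xsl:template>"
    + "</xsl:stylesheet>";

  int instructions = 0;  // elements belonging to the XSLT namespace
  int others = 0;        // all remaining elements

  @Override
  public void startElement(String namespaceUri, String localName,
                           String rawName, Attributes attrs) {
    if (XSLT_NS.equals(namespaceUri)) {
      instructions++;
    } else {
      others++;
    }
  }

  /** Parses the given document with namespace processing enabled. */
  static NamespaceClassifier classify(final String xml) throws Exception {
    final SAXParserFactory saxPf = SAXParserFactory.newInstance();
    saxPf.setNamespaceAware(true);
    final NamespaceClassifier handler = new NamespaceClassifier();
    saxPf.newSAXParser().parse(new InputSource(new StringReader(xml)), handler);
    return handler;
  }

  public static void main(String[] args) throws Exception {
    final NamespaceClassifier result = classify(XSLT);
    // prints: 2 XSLT elements, 2 other elements
    System.out.println(result.instructions + " XSLT elements, "
        + result.others + " other elements");
  }
}]]></programlisting> Comparing namespace names rather than prefixes keeps the handler working for documents using a prefix other than <code>xsl</code>. 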
An XSLT engine for example uses this information to build two classes of elements:</para> <itemizedlist> <listitem> <para>Elements belonging to the namespace <code>http://www.w3.org/1999/XSL/Transform</code> like <tag class="emptytag">xsl:value-of select="..."</tag> have to be interpreted as instructions by the processor.</para> </listitem> <listitem> <para>Elements <emphasis role="bold">not</emphasis> belonging to the namespace <code>http://www.w3.org/1999/XSL/Transform</code> like <tag class="emptytag">html</tag> or <tag class="starttag">fo:block</tag> are copied <quote>as is</quote> to the output.</para> </listitem> </itemizedlist> </section> </section> <section xml:id="dom"> <title>The Document Object Model (<acronym xlink:href="http://www.w3.org/DOM">DOM</acronym>)</title> <titleabbrev><acronym xlink:href="http://www.w3.org/DOM">DOM</acronym></titleabbrev> <section xml:id="domBase"> <title>Language independent specification</title> <titleabbrev>Language independence</titleabbrev> <para>XML documents allow for automated content processing. We already discussed the <acronym xlink:href="http://www.saxproject.org">SAX</acronym> API to access XML documents by <link linkend="gloss_Java"><trademark>Java</trademark></link> applications. There are however situations where <acronym xlink:href="http://www.saxproject.org">SAX</acronym> is not appropriate:</para> <itemizedlist> <listitem> <para><acronym xlink:href="http://www.saxproject.org">SAX</acronym> is event based. XML nodes are passed to handler methods one at a time. Sometimes we want to access neighbouring nodes from a context node in our handler methods for example a <tag class="starttag">title</tag> following a <tag class="starttag">chapter</tag> node. <acronym xlink:href="http://www.saxproject.org">SAX</acronym> does not offer any support for this. If we need references to neighbouring nodes we have to create them ourselves during a <acronym xlink:href="http://www.saxproject.org">SAX</acronym> parsing run. 
This is tedious and leads to code which is hard to understand.</para> </listitem> <listitem> <para>Some applications may want to select node sets by <acronym xlink:href="http://www.w3.org/TR/xpath">XPath</acronym> expressions. A <acronym xlink:href="http://www.saxproject.org">SAX</acronym> application does not support this since the document is never available as a whole.</para> </listitem> <listitem> <para>We may want to move subtrees within a document itself (for example exchanging two <tag class="starttag">chapter</tag> nodes) or even transfer them to a different document.</para> </listitem> </itemizedlist> <para>The greatest deficiency of <acronym xlink:href="http://www.saxproject.org">SAX</acronym> is the fact that an XML instance is not represented as a tree-like structure but as a succession of events. The <acronym xlink:href="http://www.w3.org/DOM">DOM</acronym> allows us to represent XML document instances as tree-like structures and thus enables navigational operations between nodes.</para> <para>In order to achieve language <emphasis>and</emphasis> software vendor independence the <acronym xlink:href="http://www.w3.org/DOM">DOM</acronym> approach uses two stages:</para> <itemizedlist> <listitem> <para>The <acronym xlink:href="http://www.w3.org/DOM">DOM</acronym> is formulated in an Interface Definition Language (<abbrev xlink:href="http://en.wikipedia.org/wiki/Interface_description_language">IDL</abbrev>).</para> </listitem> <listitem> <para>In order to use the <acronym xlink:href="http://www.w3.org/DOM">DOM</acronym> API from a concrete programming language a so-called <emphasis>language binding</emphasis> is required. In languages like <link linkend="gloss_Java"><trademark>Java</trademark></link> the language binding will still be a set of (<link linkend="gloss_Java"><trademark>Java</trademark></link>) interfaces. 
Thus for actually coding an application an implementation of these interfaces is needed.</para> </listitem> </itemizedlist> <para>So what exactly is an <abbrev xlink:href="http://en.wikipedia.org/wiki/Interface_description_language">IDL</abbrev>? The programming language <link linkend="gloss_Java"><trademark>Java</trademark></link> already allows pure interface definitions without any implementation. In C++ the same result can be achieved by classes consisting solely of <emphasis>pure virtual</emphasis> member functions. An <abbrev xlink:href="http://en.wikipedia.org/wiki/Interface_description_language">IDL</abbrev> offers extended features to describe such interfaces. For <acronym xlink:href="http://www.w3.org/DOM">DOM</acronym> the <productname xlink:href="http://www.omg.org/gettingstarted/corbafaq.htm">CORBA 2.2</productname> <abbrev xlink:href="http://en.wikipedia.org/wiki/Interface_description_language">IDL</abbrev> has been chosen to describe an XML document programming interface. As a first example we take an excerpt from the <acronym xlink:href="http://www.w3.org/DOM">DOM</acronym>'s <link xlink:href="http://www.w3.org/TR/DOM-Level-3-Core/core.html#ID-1950641247">Node</link> interface definition:</para> <programlisting>interface Node { // NodeType const unsigned short ELEMENT_NODE = 1; const unsigned short ATTRIBUTE_NODE = 2; const unsigned short TEXT_NODE = 3; ... readonly attribute DOMString nodeName; attribute DOMString nodeValue; // raises(DOMException) on setting // raises(DOMException) on retrieval readonly attribute unsigned short nodeType; readonly attribute Node parentNode; ... readonly attribute NodeList childNodes; readonly attribute Node firstChild; ... Node insertBefore(in Node newChild, in Node refChild) raises(DOMException); ...</programlisting> <para>If we want to implement the <abbrev xlink:href="http://en.wikipedia.org/wiki/Interface_description_language">IDL</abbrev> <classname>org.w3c.dom.Node</classname> specification in e.g. 
<link linkend="gloss_Java"><trademark>Java</trademark></link> a language binding has to be defined. This means writing <link linkend="gloss_Java"><trademark>Java</trademark></link> code which closely resembles the <abbrev xlink:href="http://en.wikipedia.org/wiki/Interface_description_language">IDL</abbrev> specification. Obviously this task depends on and is restricted by the constructs offered by the target programming language. The W3C <link xlink:href="http://www.w3.org/TR/DOM-Level-3-Core/java-binding.html">defines</link> the <link linkend="gloss_Java"><trademark>Java</trademark></link> <classname>org.w3c.dom.Node</classname> interface by:</para> <programlisting language="java">package org.w3c.dom; public interface Node { public static final short ELEMENT_NODE = 1; // Node Types public static final short ATTRIBUTE_NODE = 2; public static final short TEXT_NODE = 3; ... public String getNodeName(); public String getNodeValue() throws DOMException; public void setNodeValue(String nodeValue) throws DOMException; public short getNodeType(); public Node getParentNode(); public NodeList getChildNodes(); public Node getFirstChild(); ... public Node insertBefore(Node newChild, Node refChild) throws DOMException; ... }</programlisting> <para>We take <methodname>org.w3c.dom.Node.getChildNodes()</methodname> as an example:</para> <figure xml:id="domRetrieveChildren"> <title>Retrieving child nodes of a given context node</title> <mediaobject> <imageobject> <imagedata fileref="Ref/Fig/domtree.fig" scale="65"/> </imageobject> </mediaobject> </figure> <para>The <classname>org.w3c.dom.Node</classname> interface offers a set of common operations for objects that are part of an XML document. 
But an XML document tree contains different types of nodes such as:</para> <itemizedlist> <listitem> <para>Elements</para> </listitem> <listitem> <para>Attributes</para> </listitem> <listitem> <para>Entities</para> </listitem> </itemizedlist> <para>An XML API may address this issue by offering data types to represent these different kinds of nodes. The <acronym xlink:href="http://www.w3.org/DOM">DOM</acronym> <link linkend="gloss_Java"><trademark>Java</trademark></link> binding defines an inheritance hierarchy of interfaces for this purpose:</para> <figure xml:id="domJavaNodeInterfaces"> <title>Interface inheritance hierarchy in the <acronym xlink:href="http://www.w3.org/DOM">DOM</acronym> <link linkend="gloss_Java"><trademark>Java</trademark></link> binding</title> <mediaobject> <imageobject> <imagedata fileref="Ref/Fig/nodeHierarchy.svg"/> </imageobject> </mediaobject> </figure> <para>Two commonly used <link linkend="gloss_Java"><trademark>Java</trademark></link> implementations of these interfaces are:</para> <variablelist> <varlistentry> <term>Xerces</term> <listitem> <para><orgname xlink:href="http://xml.apache.org/xerces2-j">Apache Software Foundation</orgname></para> </listitem> </varlistentry> <varlistentry> <term>JAXP</term> <listitem> <para><orgname xlink:href="http://java.sun.com/xml/jaxp">Sun Microsystems</orgname></para> </listitem> </varlistentry> </variablelist> <para>Both implementations offer additional interfaces beyond the <acronym xlink:href="http://www.w3.org/DOM">DOM</acronym>'s scope.</para> <para>Going back to the <acronym xlink:href="http://www.w3.org/DOM">DOM</acronym> itself, the specification is divided into <link xlink:href="http://www.w3.org/TR/2004/REC-DOM-Level-3-Core-20040407/introduction.html#DOMArchitecture-h2">modules</link>:</para> <figure xml:id="figureDomModules"> <title><acronym xlink:href="http://www.w3.org/DOM">DOM</acronym> modules.</title> <mediaobject> <imageobject> <imagedata 
fileref="Ref/Screen/dom-architecture.screen.png"/> </imageobject> </mediaobject> </figure> </section> <section xml:id="domCreate"> <title>Creating a new document from scratch</title> <titleabbrev>New document</titleabbrev> <para>If we want to export non-XML content (e.g. from an RDBMS) into XML we may achieve this by the following recipe:</para> <orderedlist> <listitem> <para>Create a document builder instance.</para> </listitem> <listitem> <para>Create an empty <link xlink:href="http://java.sun.com/j2se/1.5.0/docs/api/org/w3c/dom/Document.html">Document</link> instance.</para> </listitem> <listitem> <para>Fill in the desired elements and attributes.</para> </listitem> <listitem> <para>Create a serializer.</para> </listitem> <listitem> <para>Serialize the resulting tree to a stream.</para> </listitem> </orderedlist> <para>An introductory piece of code illustrates these steps:</para> <figure xml:id="simpleDomCreate"> <title>Creation of an XML document instance from scratch.</title> <programlisting language="java">package dom; ... 
public class CreateDoc { public static void main(String[] args) throws Exception { // Create the root element <emphasis role="bold">final Element titel = new Element("titel"); </emphasis> //Set a date <emphasis role="bold">titel.setAttribute("date", "23.02.2000");</emphasis> // Append a text node as child <emphasis role="bold">titel.addContent(new Text("Versuch 1"));</emphasis> // Set formatting for the XML output <emphasis role="bold">final Format outFormat = Format.getPrettyFormat();</emphasis> // Serialize to console <emphasis role="bold">final XMLOutputter printer = new XMLOutputter(outFormat); printer.output(titel, System.out);</emphasis> } }</programlisting> </figure> <para>We get the following result:</para> <programlisting><?xml version="1.0" encoding="UTF-8"?> <titel date="23.02.2000">Versuch 1</titel></programlisting> </section> <section xml:id="domCreateExercises"> <title>Exercises</title> <qandaset role="exercise"> <title>A sub structured <tag class="starttag">title</tag></title> <qandadiv> <qandaentry xml:id="createDocModify"> <question> <label>Creation of an extended XML document instance</label> <para>In order to run the examples given during the lecture the <filename xlink:href="http://www.jdom.org/downloads">jdom2.jar</filename> library must be added to the <envar>CLASSPATH</envar>.</para> <para>The <acronym xlink:href="http://www.w3.org/DOM">DOM</acronym> creating example given before may be used as a starting point. Extend the <acronym xlink:href="http://www.w3.org/DOM">DOM</acronym> tree created in <xref linkend="simpleDomCreate"/> to produce an extended XML document:</para> <programlisting><title> <long>The long version of this title</long> <short>Short version</short> </title></programlisting> </question> <answer> <programlisting language="java">package dom; ... 
public class CreateExtended { /** * @param args * @throws IOException */ public static void main(String[] args) throws IOException { final Element titel = new Element("title"), tLong = new Element("long"), tShort = new Element("short"); <emphasis role="bold">// Append <long> and <short> to parent <title></emphasis> titel.addContent(tLong).addContent(tShort); <emphasis role="bold">// Append text to <long> and <short></emphasis> tLong.addContent(new Text("The long version of this title")); tShort.addContent(new Text("Short version")); <emphasis role="bold">// Set formatting for the XML output</emphasis> Format outFormat = Format.getPrettyFormat(); <emphasis role="bold">// Serialize to console</emphasis> final XMLOutputter printer = new XMLOutputter(outFormat); printer.output(titel, System.out); } }</programlisting> </answer> </qandaentry> </qandadiv> </qandaset> </section> <section xml:id="domParse"> <title>Parsing existing XML documents</title> <titleabbrev>Parsing</titleabbrev> <para>We already used <acronym xlink:href="http://www.saxproject.org">SAX</acronym> to parse an XML document. Rather than handling <acronym xlink:href="http://www.saxproject.org">SAX</acronym> events ourselves these events may be used to construct a <acronym xlink:href="http://www.w3.org/DOM">DOM</acronym> representation of our document. This work is done by an instance of. We use our catalog example from <xref linkend="simpleCatalog"/> as an introductory example.</para> <para>We already noticed the need for an <classname>org.xml.sax.ErrorHandler</classname> object during <acronym xlink:href="http://www.saxproject.org">SAX</acronym> processing. A <acronym xlink:href="http://www.w3.org/DOM">DOM</acronym> parser requires a similar type of object in order to react to parsing errors in a meaningful way. 
In principle a <acronym xlink:href="http://www.w3.org/DOM">DOM</acronym> parser implementor is free to choose an implementation strategy, but most implementations are built on top of a <acronym xlink:href="http://www.saxproject.org">SAX</acronym> parser. For this reason it was natural to choose a <acronym xlink:href="http://www.w3.org/DOM">DOM</acronym> error handling interface which is similar to a <acronym xlink:href="http://www.saxproject.org">SAX</acronym> <classname>org.xml.sax.ErrorHandler</classname>. The following code serves the needs described before:</para> <figure xml:id="domTreeTraversal"> <title>Accessing an XML tree purely by <acronym xlink:href="http://www.w3.org/DOM">DOM</acronym> methods.</title> <programlisting language="java">package dom; ... public class ArticleOrder { <emphasis role="bold"> // Though we are playing DOM here, a <acronym xlink:href="http://www.saxproject.org">SAX</acronym> parser still // assembles our DOM tree.</emphasis> private SAXBuilder builder = new SAXBuilder(); public ArticleOrder() { <emphasis role="bold">// Though an ErrorHandler is not strictly required it allows // for easier localization of XML document errors</emphasis> builder.setErrorHandler(new MySaxErrorHandler(System.out));<co linkends="domSetSaxErrorHandler-co" xml:id="domSetSaxErrorHandler"/> } /** Descending a catalog down to its <item> elements. For each product * its name and order number are written to the output. * @throws ... 
*/ public void process(final String filename) throws JDOMException, IOException { <emphasis role="bold">// Parsing our XML file</emphasis> final Document docInput = builder.build(filename); <emphasis role="bold">// Accessing the document's root element</emphasis> final Element docRoot = docInput.getRootElement(); <emphasis role="bold">// Accessing the <item> children of parent element <catalog></emphasis> final List<Element> items = docRoot.getChildren(); // Element nodes only for (final Element item : items) { System.out.println("Article: " + item.getText() + ", order number: " + item.getAttributeValue("orderNo")); } ...</programlisting> <para>Note <coref linkend="domSetSaxErrorHandler" xml:id="domSetSaxErrorHandler-co"/>: This is our standard <acronym xlink:href="http://www.saxproject.org">SAX</acronym> error handler implementing the <classname>org.xml.sax.ErrorHandler</classname> interface.</para> </figure> <para>Executing this method needs a driver instance providing an input XML filename:</para> <programlisting language="java">package dom; ... 
public class ArticleOrderDriver { public static void main(String[] argv) throws Exception { final ArticleOrder ao = new ArticleOrder(); ao.process("<emphasis role="bold">Input/article.xml</emphasis>"); } }</programlisting> <para>This yields:</para> <programlisting>Article: Swinging headset, order number: 3218 Article: 200W Stereo Amplifier, order number: 9921</programlisting> <para>To illustrate the internal processes we take a look at the sequence diagram:</para> <mediaobject> <imageobject> <imagedata fileref="Ref/Fig/sequenceDomParser.svg"/> </imageobject> </mediaobject> <qandaset role="exercise"> <title>Creating HTML output</title> <qandadiv> <qandaentry xml:id="exercise_domHtmlSimple"> <question> <label>Simple HTML output</label> <para>Instead of exporting simple text output as in <xref linkend="domTreeTraversal"/> we may also create HTML pages like:</para> <programlisting><!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"> <html> <head> <title>Available articles</title> </head> <body> <h1>Available articles</h1> <table> <tbody> <tr> <th align="left">Article Description</th><th>Order Number</th> </tr> <tr> <td align="left"><emphasis role="bold">Swinging headset</emphasis></td><td><emphasis role="bold">3218</emphasis></td> </tr> <tr> <td align="left"><emphasis role="bold">200W Stereo Amplifier</emphasis></td><td><emphasis role="bold">9921</emphasis></td> </tr> </tbody> </table> </body> </html></programlisting> <para>Instead of simply writing <code>...println(<html>\n\t<head>...)</code> statements you are expected to code a more sophisticated solution. We may combine <xref linkend="createDocModify"/> and <xref linkend="domTreeTraversal"/>. The idea is to read the XML catalog instance as a <acronym xlink:href="http://www.w3.org/DOM">DOM</acronym> as before. 
Then construct a <emphasis>second</emphasis> <acronym xlink:href="http://www.w3.org/DOM">DOM</acronym> tree for the desired HTML output and fill in the article information from the first <acronym xlink:href="http://www.w3.org/DOM">DOM</acronym> tree accordingly.</para> </question> <answer> <para>We introduce a class <classname>solve.dom.HtmlTree</classname>:</para> <programlisting language="java">package solve.dom; import java.io.IOException; import java.io.PrintStream; import org.jdom2.DocType; import org.jdom2.Document; import org.jdom2.Element; import org.jdom2.Text; import org.jdom2.output.Format; import org.jdom2.output.XMLOutputter; /** * Holding an HTML DOM to produce output. * @author goik */ public class HtmlTree { private Document htmlOutput; private Element tableBody; public HtmlTree(final String titleText, final String[] tableHeaderFields) { <co linkends="programlisting_catalog2html_htmlskel_co" xml:id="programlisting_catalog2html_htmlskel"/> DocType doctype = new DocType("html", "-//W3C//DTD XHTML 1.0 Strict//EN", "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"); final Element htmlRoot = new Element("html"); <co linkends="programlisting_catalog2html_tablehead_co" xml:id="programlisting_catalog2html_tablehead"/> htmlOutput = new Document(htmlRoot); htmlOutput.setDocType(doctype); // We create an HTML skeleton including an "empty" table final Element head = new Element("head"), body = new Element("body"), table = new Element("table"); htmlRoot.addContent(head).addContent(body); head.addContent(new Element("title").addContent(new Text(titleText))); body.addContent(new Element("h1").addContent(new Text(titleText))); body.addContent(table); tableBody = new Element("tbody"); table.addContent(tableBody); // addContent() returns the parent element, so we keep our own <tr> reference final Element tr = new Element("tr"); tableBody.addContent(tr); for (final String headerField: tableHeaderFields) { tr.addContent(new Element("th").addContent(new Text(headerField))); } } public void appendItem(final String itemName, 
final String orderNo) {<co linkends="programlisting_catalog2html_insertproduct_co" xml:id="programlisting_catalog2html_insertproduct"/> final Element tr = new Element("tr"); tableBody.addContent(tr); tr.addContent(new Element("td").addContent(new Text(itemName))); tr.addContent(new Element("td").addContent(new Text(orderNo))); } public void serialize(PrintStream out){ // Set formatting for the XML output final Format outFormat = Format.getPrettyFormat(); // Serialize to the given stream final XMLOutputter printer = new XMLOutputter(outFormat); try { printer.output(htmlOutput, out); } catch (IOException e) { e.printStackTrace(); System.exit(1); } } /** * @return the table's <tbody> element */ public Element getTable() { return tableBody; } } </programlisting> <calloutlist> <callout arearefs="programlisting_catalog2html_htmlskel" xml:id="programlisting_catalog2html_htmlskel_co"> <para>A basic HTML skeleton is being created:</para> <programlisting><?xml version="1.0" encoding="UTF-8"?> <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"> <html xmlns="http://www.w3.org/1999/xhtml"> <head> <title>Available articles</title> </head> <body> <h1>Available articles</h1> <table> <emphasis role="bold"><tbody></emphasis> <!-- Data to be inserted here in next step --> <emphasis role="bold"></tbody></emphasis> </table> </body> </html></programlisting> <para>The table containing the product's data is empty at this point and thus invalid.</para> </callout> <callout arearefs="programlisting_catalog2html_tablehead" xml:id="programlisting_catalog2html_tablehead_co"> <para>The table's header is appended but the actual data from our two products is still missing:</para> <programlisting>... 
<h1>Available articles</h1> <table> <tbody> <tr> <th>Article Description</th> <th>Order Number</th> <emphasis role="bold"></tr></emphasis><!-- Data to be appended after this row in next step --> <emphasis role="bold"></tbody></emphasis> </table> ...</programlisting> </callout> <callout arearefs="programlisting_catalog2html_insertproduct" xml:id="programlisting_catalog2html_insertproduct_co"> <para>Calling <methodname>solve.dom.HtmlTree.appendItem(String,String)</methodname> once per product completes the creation of our HTML DOM tree:</para> <programlisting>... </tr> <tr> <td>Swinging headset</td> <td>3218</td> </tr> <tr> <td>200W Stereo Amplifier</td> <td>9921</td> </tr> </tbody> ...</programlisting> </callout> </calloutlist> <para>The class <classname>solve.dom.Article2Html</classname> reads the catalog data:</para> <programlisting language="java">package solve.dom; ... public class Article2Html { private final SAXBuilder builder = new SAXBuilder(); private final HtmlTree htmlResult; public Article2Html() { builder.setErrorHandler(new MySaxErrorHandler(System.out)); htmlResult = new HtmlTree("Available articles", new String[] { <co linkends="programlisting_catalog2html_glue_createhtmldom_co" xml:id="programlisting_catalog2html_glue_createhtmldom"/> "Article Description", "Order Number" }); } /** Read an XML catalog instance and insert product names along with their * order numbers into the HTML DOM. Then serialize the HTML tree to a stream. * * @param filename * Name of the XML source file. * @param out * The output stream for HTML serialization. 
* @throws IOException * @throws JDOMException */ public void process(final String filename, final PrintStream out) throws JDOMException, IOException{ final List<Element> items = builder.build(filename).getRootElement().getChildren(); for (final Element item : items) { <co linkends="programlisting_catalog2html_glue_prodloop_co" xml:id="programlisting_catalog2html_glue_prodloop"/> htmlResult.appendItem(item.getText(), item.getAttributeValue("orderNo")); <co linkends="programlisting_catalog2html_glue_insertprod_co" xml:id="programlisting_catalog2html_glue_insertprod"/> } htmlResult.serialize(out); <co linkends="programlisting_catalog2html_glue_serialize_co" xml:id="programlisting_catalog2html_glue_serialize"/> } }</programlisting> <calloutlist> <callout arearefs="programlisting_catalog2html_glue_createhtmldom" xml:id="programlisting_catalog2html_glue_createhtmldom_co"> <para>Create an instance holding an HTML <acronym xlink:href="http://www.w3.org/DOM">DOM</acronym> with a table header containing the strings <emphasis>Article Description</emphasis> and <emphasis>Order Number</emphasis>.</para> </callout> <callout arearefs="programlisting_catalog2html_glue_prodloop" xml:id="programlisting_catalog2html_glue_prodloop_co"> <para>Iterate over all product nodes.</para> </callout> <callout arearefs="programlisting_catalog2html_glue_insertprod" xml:id="programlisting_catalog2html_glue_insertprod_co"> <para>Insert the product's name and order number into the HTML <acronym xlink:href="http://www.w3.org/DOM">DOM</acronym>.</para> </callout> <callout arearefs="programlisting_catalog2html_glue_serialize" xml:id="programlisting_catalog2html_glue_serialize_co"> <para>Serialize the completed HTML <acronym xlink:href="http://www.w3.org/DOM">DOM</acronym> tree to the output stream.</para> </callout> </calloutlist> </answer> </qandaentry> </qandadiv> </qandaset> </section> <section xml:id="domJavaScript"> <title>Using <acronym xlink:href="http://www.w3.org/DOM">DOM</acronym> with 
HTML/JavaScript</title> <para>Due to script language support in a variety of browsers we may also use the <acronym xlink:href="http://www.w3.org/DOM">DOM</acronym> to implement client side event handling. As an example we <link xlink:href="Ref/src/tablesort.html">demonstrate</link> how an HTML table can be made sortable by clicking on a column header. The example code along with the code description can be found at <uri xlink:href="http://www.kryogenix.org/code/browser/sorttable">http://www.kryogenix.org/code/browser/sorttable</uri>.</para> <para>Quite remarkably only a few ingredients are required to enrich an ordinary static HTML table with this functionality:</para> <itemizedlist> <listitem> <para>An external JavaScript library has to be included via <code><script type="text/javascript" src="sorttable.js"></code></para> </listitem> <listitem> <para>Each sortable HTML table needs:</para> <itemizedlist> <listitem> <para>A unique <code>id</code> attribute</para> </listitem> <listitem> <para>A <code>class="sortable"</code> attribute</para> </listitem> </itemizedlist> </listitem> </itemizedlist> </section> <section xml:id="domXpath"> <title>Using <acronym xlink:href="http://www.w3.org/TR/xpath">XPath</acronym></title> <para><xref linkend="domTreeTraversal"/> demonstrated the possibility to traverse trees solely by using <acronym xlink:href="http://www.w3.org/DOM">DOM</acronym> method calls. Though this approach is possible it will in general not lead to stable applications. Real world examples are often based on large XML documents with complex hierarchical structures. With this rather primitive approach, deeply nested method calls are necessary to access the desired sets of nodes. In addition changing a DTD will require rewriting large code portions.</para> <para>As we already know from <abbrev xlink:href="http://www.w3.org/Style/XSL">XSL</abbrev> transformations, <acronym xlink:href="http://www.w3.org/TR/xpath">XPath</acronym> allows us to address node sets inside an XML tree. 
The role of <acronym xlink:href="http://www.w3.org/TR/xpath">XPath</acronym> can be compared to SQL queries when working with relational databases. <acronym xlink:href="http://www.w3.org/TR/xpath">XPath</acronym> may also be used within <link linkend="gloss_Java"><trademark>Java</trademark></link> code. As a first example we show an application that extracts image filenames from XHTML documents. The following example contains three <tag class="starttag">img</tag> elements:</para> <figure xml:id="htmlGallery"> <title>An HTML document containing <code>IMG</code> tags.</title> <programlisting><?xml version="1.0"?> <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"> <html> <head> <title>Picture gallery</title> </head> <body> <h1>Picture gallery</h1> <p>Images may appear inline:<emphasis role="bold"><img src="inline.gif" alt="none"/></emphasis></p> <table> <tbody> <tr> <td>Number one:</td> <td><emphasis role="bold"><img src="one.gif" alt="none"/></emphasis></td> </tr> <tr> <td>Number two:</td> <td><emphasis role="bold"><img src="http://www.hdm-stuttgart.de/favicon.ico" alt="none"/></emphasis></td> </tr> </tbody> </table> </body> </html> </programlisting> </figure> <para>A given HTML document may contain <tag class="starttag">img</tag> elements at <emphasis>arbitrary</emphasis> positions. It is sometimes desirable to check for existence and accessibility of such external objects, since they are necessary for the page's correct rendering. 
A simple XSL script will do the first part of the job, namely extracting the <tag class="starttag">img</tag> elements:</para> <figure xml:id="gallery2imagelist"> <title>An <abbrev xlink:href="http://www.w3.org/Style/XSL">XSL</abbrev> script for image name extraction.</title> <programlisting><xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:html="http://www.w3.org/1999/xhtml"> <xsl:output method="text"/> <xsl:template match="/"> <xsl:for-each select="//html:img"> <xsl:value-of select="@src"/> <xsl:text> </xsl:text> </xsl:for-each> </xsl:template> </xsl:stylesheet></programlisting> </figure> <para>Note the necessity for <code>html</code> namespace inclusion into the <acronym xlink:href="http://www.w3.org/TR/xpath">XPath</acronym> expression in <code><xsl:for-each select="//html:img"></code>. A simple <code>select="//img"</code> results in an empty node set. Executing the <abbrev xlink:href="http://www.w3.org/Style/XSL">XSL</abbrev> script yields a list of the image references contained in the HTML page, i.e. <code>inline.gif one.gif http://www.hdm-stuttgart.de/favicon.ico</code>.</para> <para>Now we want to write a <link linkend="gloss_Java"><trademark>Java</trademark></link> application which checks whether these referenced image files exist and can be accessed with sufficient permissions. A simple approach may pipe the <abbrev xlink:href="http://www.w3.org/Style/XSL">XSL</abbrev> output to our application which then executes the readability checks. Instead we want to incorporate the <acronym xlink:href="http://www.w3.org/TR/xpath">XPath</acronym> based search into the application. 
Ignoring namespaces and trying to mirror the <abbrev xlink:href="http://www.w3.org/Style/XSL">XSL</abbrev> actions as closely as possible, our application will have to search for <link xlink:href="http://java.sun.com/j2se/1.5.0/docs/api/org/w3c/dom/Element.html">Element</link> nodes by the <acronym xlink:href="http://www.w3.org/TR/xpath">XPath</acronym> expression <code>//img</code>:</para> <figure xml:id="domFindImages"> <title>Extracting <tag class="emptytag">img</tag> element image references from an HTML document.</title> <programlisting language="java">package dom.xpath; ... public class DomXpath { private final SAXBuilder builder = new SAXBuilder(); public DomXpath() { builder.setErrorHandler(new MySaxErrorHandler(System.err)); } public void process(final String xhtmlFilename) throws JDOMException, IOException { final Document htmlInput = builder.build(xhtmlFilename);<co linkends="programlisting_java_searchimg_parse_co" xml:id="programlisting_java_searchimg_parse"/> final XPathExpression<Object> xpath = XPathFactory.instance().compile( "//img" ); <co linkends="programlisting_java_searchimg_pf_co" xml:id="programlisting_java_searchimg_pf"/> <co linkends="programlisting_java_searchimg_newxpath_co" xml:id="programlisting_java_searchimg_newxpath"/> final List<Object> images = xpath.evaluate(htmlInput);<co linkends="programlisting_java_searchimg_execquery_co" xml:id="programlisting_java_searchimg_execquery"/> for (Object o: images) { <co linkends="programlisting_java_searchimg_loop_co" xml:id="programlisting_java_searchimg_loop"/> final Element image = (Element ) o;<co linkends="programlisting_java_searchimg_cast_co" xml:id="programlisting_java_searchimg_cast"/> // getAttributeValue() yields the plain string value rather than the Attribute object System.out.print(image.getAttributeValue("src") + " "); } } }</programlisting> <caption> <para>This application searches for <tag class="emptytag">img</tag> elements and shows their <code>src</code> attribute value.</para> </caption> </figure> <calloutlist> <callout 
arearefs="programlisting_java_searchimg_parse" xml:id="programlisting_java_searchimg_parse_co"> <para>Parse an XHTML document instance into a DOM tree.</para> </callout> <callout arearefs="programlisting_java_searchimg_pf" xml:id="programlisting_java_searchimg_pf_co"> <para>Create an <acronym xlink:href="http://www.w3.org/TR/xpath">XPath</acronym> factory.</para> </callout> <callout arearefs="programlisting_java_searchimg_newxpath" xml:id="programlisting_java_searchimg_newxpath_co"> <para>Create an <acronym xlink:href="http://www.w3.org/TR/xpath">XPath</acronym> query instance. This may be used to search for a set of nodes starting from a context node.</para> </callout> <callout arearefs="programlisting_java_searchimg_execquery" xml:id="programlisting_java_searchimg_execquery_co"> <para>Using the document's root node as the context node we search for <tag class="starttag">img</tag> elements appearing at arbitrary positions in our document.</para> </callout> <callout arearefs="programlisting_java_searchimg_loop" xml:id="programlisting_java_searchimg_loop_co"> <para>We iterate over the retrieved list of images.</para> </callout> <callout arearefs="programlisting_java_searchimg_cast" xml:id="programlisting_java_searchimg_cast_co"> <para>Casting to the correct type.</para> </callout> </calloutlist> <para>The result is a list of image filename references:</para> <programlisting>inline.gif one.gif http://www.hdm-stuttgart.de/favicon.ico </programlisting> <qandaset role="exercise"> <title>Legal casting?</title> <qandadiv> <qandaentry> <question> <para>Why is the cast in <coref linkend="programlisting_java_searchimg_cast"/> in <xref linkend="domFindImages"/> guaranteed to never cause a <classname>java.lang.ClassCastException</classname>?</para> </question> <answer> <para>The <acronym xlink:href="http://www.w3.org/TR/xpath">XPath</acronym> <code>//img</code> expression is guaranteed to return only <tag class="starttag">img</tag> elements. 
Thus within our <link linkend="gloss_Java"><trademark>Java</trademark></link> context we are sure to find only <classname>org.jdom2.Element</classname> instances.</para> </answer> </qandaentry> </qandadiv> </qandaset> <qandaset role="exercise"> <title>Verification of referenced images readability</title> <qandadiv> <qandaentry xml:id="exercise_htmlImageVerify"> <question> <para>We want to extend the example given in <xref linkend="domFindImages"/> by testing the existence and checking for readability of referenced images. The following HTML document contains <quote>dead</quote> image references:</para> <programlisting xml:id="domCheckImageAccessibility"><!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"> <html xmlns="http://www.w3.org/1999/xhtml"> ... <body> <h1>External Pictures</h1> <p>A local image reference:<img src="inline.gif" alt="none"/></p> <table> <tbody> <tr> <td>An existing picture:</td> <td><img src="http://www.hdm-stuttgart.de/bilder_navigation/laptop.gif" alt="none"/></td> </tr> <tr> <td>A non-existing picture:</td> <td><img src="<emphasis role="bold">http://www.hdm-stuttgart.de/rotfl.gif</emphasis>" alt="none"/></td> </tr> </tbody> </table> </body> </html></programlisting> <para>Write an application which checks for readability of <abbrev xlink:href="http://www.ietf.org/rfc/rfc1738.txt">URL</abbrev> image references to <emphasis>external</emphasis> Servers starting either with <code>http://</code> or <code>ftp://</code> ignoring other protocol types. Internal image references referring to the <quote>current</quote> server typically look like <code><img src="/images/test.gif"</code>. 
So in order to distinguish these two types of references we may use the XPath built-in function <link xlink:href="http://www.cafeconleche.org/books/bible2/chapters/ch17.html">starts-with()</link> testing for the <code>http</code> or <code>ftp</code> protocol definition part of a <abbrev xlink:href="http://www.ietf.org/rfc/rfc1738.txt">URL</abbrev>. A possible output for the given example is:</para> <programlisting>Received 'sun.awt.image.URLImageSource' from http://www.hdm-stuttgart.de/bilder_navigation/laptop.gif Unable to open 'http://www.hdm-stuttgart.de/rotfl.gif'</programlisting> <para>The following code snippet shows a helpful class method to check for both correctness of <abbrev xlink:href="http://www.ietf.org/rfc/rfc1738.txt">URL</abbrev>s and accessibility of referenced objects:</para> <programlisting language="java">package dom.xpath; ... public class CheckUrl { public static void checkReadability(final String urlRef) { try { final URL url = new URL(urlRef); try { final Object imgCandidate = url.getContent(); if (null == imgCandidate) { System.err.println("Unable to open '" + urlRef + "'"); } else { System.out.println("Received '" + imgCandidate.getClass().getName() + "' from " + urlRef); } } catch (IOException e) { System.err.println("Unable to open '" + urlRef + "'"); } } catch (MalformedURLException e) { System.err.println("Address '" + urlRef + "' is malformed"); } } }</programlisting> </question> <answer> <para>We are interested in the set of images within a given HTML document containing a <link xlink:href="http://www.w3.org/Addressing">URL</link> reference starting either with <code>http://</code> or <code>ftp://</code>. 
This is achieved by the following <acronym xlink:href="http://www.w3.org/TR/xpath">XPath</acronym> expression:</para> <programlisting>//html:img[starts-with(@src, 'http://') or starts-with(@src, 'ftp://')]</programlisting> <para>The application only needs to pass the corresponding <abbrev xlink:href="http://www.ietf.org/rfc/rfc1738.txt">URL</abbrev>'s to the method <link xlink:href="domCheckUrlObjectExistence">CheckUrl.checkReadability()</link>. The rest of the code is identical to the <link linkend="domFindImages">introductory example</link>:</para> <informalfigure xml:id="solutionFintExtImgRef"> <programlisting language="java">package dom.xpath; ... public class CheckExtImage { private final SAXBuilder builder = new SAXBuilder(); public CheckExtImage() { builder.setErrorHandler(new MySaxErrorHandler(System.err)); } public void process(final String xhtmlFilename) throws JDOMException, IOException { final Document htmlInput = builder.build(xhtmlFilename); final XPathExpression<Object> xpath = XPathFactory.instance().compile( "<emphasis role="bold">//img[starts-with(@src, 'http://') or starts-with(@src, 'ftp://')]</emphasis>"); final List<Object> images = xpath.evaluate(htmlInput); for (Object o: images) { final Element image = (Element ) o; <emphasis role="bold">CheckUrl.checkReadability(image.getAttributeValue("src"));</emphasis> } } }</programlisting> </informalfigure> </answer> </qandaentry> </qandadiv> </qandaset> </section> <section xml:id="domXsl"> <title><acronym xlink:href="http://www.w3.org/DOM">DOM</acronym> and <abbrev xlink:href="http://www.w3.org/Style/XSL">XSL</abbrev></title> <para><link linkend="gloss_Java"><trademark>Java</trademark></link> based <link linkend="gloss_XML"><abbrev>XML</abbrev></link> applications may use XSL style sheets for processing. A <acronym xlink:href="http://www.w3.org/DOM">DOM</acronym> tree may for example be transformed into another tree. 
The package <link xlink:href="http://java.sun.com/j2se/1.5.0/docs/api/javax/xml/transform/package-frame.html">javax.xml.transform</link> provides interfaces and classes for this purpose. We consider the following product catalog example:</para> <figure xml:id="climbingCatalog"> <title>A simplified <link linkend="gloss_XML"><abbrev>XML</abbrev></link> product catalog</title> <programlisting><?xml version="1.0" encoding="UTF-8"?> <!DOCTYPE <emphasis role="bold">catalog</emphasis> SYSTEM "<emphasis role="bold">catalog.dtd</emphasis>"> <catalog> <title>Climbing gear</title> <introduction> <para>We offer a great variety of basic stuff for mountaineering such as ropes, harnesses and runners.</para> <para>Our shop is proud on its large number of sleeping bags available.</para> </introduction> <product id="x-223"> <title>Multi freezing bag Nightmare camper</title> <description> <para>You will feel comfortable till minus 20 degrees - At least if you are a penguin or a polar bear.</para> </description> </product> <product id="r-334"> <title>Rope 40m</title> <description> <para>Excellent for indoor climbing.</para> </description> </product> </catalog></programlisting> <para>A corresponding DTD is straightforward:</para> <programlisting><!ELEMENT catalog (title, introduction, product+) > <!ELEMENT introduction (para+) > <!ELEMENT title (#PCDATA) > <!ELEMENT product (title, description) > <!ATTLIST product id ID #REQUIRED price NMTOKEN #IMPLIED> <!ELEMENT description (para+) > <!ELEMENT para (#PCDATA) ></programlisting> </figure> <para>A <abbrev xlink:href="http://www.w3.org/Style/XSL">XSL</abbrev> style sheet may be used to transform this document into the HTML Format:</para> <figure xml:id="catalog2html"> <title>A <abbrev xlink:href="http://www.w3.org/Style/XSL">XSL</abbrev> style sheet for catalog transformation to HTML.</title> <programlisting><?xml version="1.0" encoding="utf-8"?> <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0" 
xmlns="http://www.w3.org/1999/xhtml"> <xsl:template match="/catalog"> <html> <head><title><xsl:value-of select="title"/></title></head> <body style="background-color:#FFFFFF"> <h1><xsl:value-of select="title"/></h1> <xsl:apply-templates select="product"/> </body> </html> </xsl:template> <xsl:template match="product"> <h3><xsl:value-of select="title"/></h3> <xsl:for-each select="description/para"> <p><xsl:value-of select="."/></p> </xsl:for-each> <xsl:if test="@price"> <p> <xsl:text>Price:</xsl:text> <xsl:value-of select="@price"/> </p> </xsl:if> </xsl:template> </xsl:stylesheet></programlisting> </figure> <para>As a preparation for <xref linkend="exercise_catalogRdbms"/> we now demonstrate the usage of <abbrev xlink:href="http://www.w3.org/Style/XSL">XSL</abbrev> within a <link linkend="gloss_Java"><trademark>Java</trademark></link> application. This is done by a <link xlink:href="http://java.sun.com/j2se/1.5.0/docs/api/javax/xml/transform/Transformer.html">Transformer</link> instance:</para> <figure xml:id="xml2xml"> <title>Transforming an XML document instance to HTML by an XSL style sheet.</title> <programlisting language="java">package dom.xsl; ... 
public class Xml2Html { private final SAXBuilder builder = new SAXBuilder(); final XSLTransformer transformer; public Xml2Html(final String xslFilename) throws XSLTransformException { builder.setErrorHandler(new MySaxErrorHandler(System.err)); transformer = new XSLTransformer(xslFilename); } public void transform(final String xmlInFilename, final String resultFilename) throws JDOMException, IOException { final Document inDoc = builder.build(xmlInFilename); Document result = transformer.transform(inDoc); // Set formatting for the XML output final Format outFormat = Format.getPrettyFormat(); // Serialize to console final XMLOutputter printer = new XMLOutputter(outFormat); printer.output(result.getDocument(), System.out); } }</programlisting> </figure> <para>A corresponding driver file is needed to invoke a transformation:</para> <figure xml:id="xml2xmlDriver"> <title>A driver class for the xml2xml transformer.</title> <programlisting language="java">package dom.xsl; ... public class Xml2HtmlDriver { ... public static void main(String[] args) { final String inFilename = "Input/Dom/climbing.xml", xslFilename = "Input/Dom/catalog2html.xsl", htmlOutputFilename = "Input/Dom/climbing.html"; try { final Xml2Html converter = new Xml2Html(xslFilename); converter.transform(inFilename, htmlOutputFilename); } catch (Exception e) { System.err.println("The conversion of '" + inFilename + "' by stylesheet '" + xslFilename + "' to output HTML file '" + htmlOutputFilename + "' failed with the following error:" + e); e.printStackTrace(); } } }</programlisting> </figure> <qandaset role="exercise"> <title>HTML from XML and relational data</title> <qandadiv> <qandaentry xml:id="exercise_catalogRdbms"> <question> <label>Catalogs and RDBMS</label> <para>We want to extend the transformation being described before in <xref linkend="xml2xml"/> by reading price information from a RDBMS. 
Consider the following schema and <code>INSERT</code>s:</para> <programlisting>CREATE TABLE Product( orderNo CHAR(10) ,price NUMERIC(10,2) ); INSERT INTO Product VALUES('x-223', 330.20); INSERT INTO Product VALUES('w-124', 110.40);</programlisting> <para>Adding prices may be implemented the following way:</para> <mediaobject> <imageobject> <imagedata fileref="Ref/Fig/xml2html.fig"/> </imageobject> </mediaobject> <para>You may implement this by following these steps:</para> <orderedlist> <listitem> <para>You may reuse class <classname>sax.rdbms.RdbmsAccess</classname> from <xref linkend="saxRdbms"/>.</para> </listitem> <listitem> <para>Use the previous class to modify <xref linkend="xml2xml"/> by introducing a new method <code>addPrices(final Document catalog)</code> which adds prices to the <acronym xlink:href="http://www.w3.org/DOM">DOM</acronym> tree accordingly. The insertion points may be reached by an <acronym xlink:href="http://www.w3.org/TR/xpath">XPath</acronym> expression.</para> </listitem> </orderedlist> </question> <answer> <para>The additional functionality on top of <xref linkend="xml2xml"/> is represented by a method <methodname>dom.xsl.XmlRdbms2Html.addPrices()</methodname>. This method modifies the <acronym xlink:href="http://www.w3.org/DOM">DOM</acronym> input tree prior to applying the XSL. Prices are inserted based on data received from an RDBMS via <trademark xlink:href="http://electronics.zibb.com/trademark/jdbc/29545026">JDBC</trademark>:</para> <programlisting language="java">package dom.xsl; ... public class XmlRdbms2Html { private final SAXBuilder builder = new SAXBuilder(); DbAccess db = new DbAccess(); final XSLTransformer transformer; Document catalog; final org.jdom2.xpath.XPathExpression<Object> selectProducts = XPathFactory.instance().compile("/catalog/product"); /** * @param xslFilename the stylesheet being used for subsequent * transformations by {@link #transform(String, String)}. 
* * @throws XSLTransformException */ public XmlRdbms2Html(final String xslFilename) throws XSLTransformException { builder.setErrorHandler(new MySaxErrorHandler(System.err)); transformer = new XSLTransformer(xslFilename); } /** * The actual workhorse carrying out the transformation * and adding prices from the database table. * * @param xmlInFilename input file to be transformed * @param resultFilename the result file holding the generated HTML document * @throws JDOMException The transformation may fail for various reasons. * @throws IOException */ public void transform(final String xmlInFilename, final String resultFilename) throws JDOMException, IOException { catalog = builder.build(xmlInFilename); addPrices(); final Document htmlResult = transformer.transform(catalog); // Set formatting for the XML output final Format outFormat = Format.getPrettyFormat(); // Serialize to console final XMLOutputter printer = new XMLOutputter(outFormat); printer.output(htmlResult, System.out); } private void addPrices() { final List<Object> products = selectProducts.evaluate(catalog.getRootElement()); db.connect("jdbc:mysql://localhost:3306/hdm", "hdmuser", "XYZ"); for (Object p: products) { final Element product = (Element ) p; final String productId = product.getAttributeValue("id"); product.setAttribute("price", db.readPrice(productId)); } db.close(); } }</programlisting> <para>The method <code>addPrices(...)</code> utilizes our RDBMS access class:</para> <programlisting language="java">package dom.xsl; ... 
public class DbAccess { public void connect(final String jdbcUrl, final String userName, final String password) { try { conn = DriverManager.getConnection(jdbcUrl, userName, password); priceQuery = conn.prepareStatement(sqlPriceQuery); } catch (SQLException e) { System.err.println("Unable to open connection to database:" + e);} } public String readPrice(final String articleNumber) { String result; try { priceQuery.setString(1, articleNumber); final ResultSet rs = priceQuery.executeQuery(); if (rs.next()) { result = rs.getString("price"); } else { result = "No price available for article '" + articleNumber + "'"; } } catch (SQLException e) { result = "Error reading price for article '" + articleNumber + "':" + e; } return result; } ... }</programlisting> <para>Of course the connection details should be moved to a configuration file.</para> </answer> </qandaentry> </qandadiv> </qandaset> </section> </section> </chapter> <chapter xml:id="xsl"> <title>The Extensible Stylesheet Language XSL</title> <para>XSL is a <link xlink:href="http://www.w3.org/Style/XSL">W3C standard</link> which defines a language to transform XML documents into the following output formats:</para> <itemizedlist> <listitem> <para>Ordinary text e.g in <link xlink:href="http://unicode.org">Unicode</link> encoding.</para> </listitem> <listitem> <para>XML.</para> </listitem> <listitem> <para>HTML</para> </listitem> <listitem> <para>XHTML</para> </listitem> </itemizedlist> <para>Transforming a source XML document into a target XML document may be required if:</para> <itemizedlist> <listitem> <para>The target document expresses similar semantics but uses a different XML dialect i.e. different tag names.</para> </listitem> <listitem> <para>The target document is only a view on the source document. 
We may for example extract the chapter names from a <tag class="starttag">book</tag> document to create a table of contents.</para> </listitem> </itemizedlist> <section xml:id="xsl_helloworld"> <title>A <quote>Hello, world</quote> <abbrev xlink:href="http://www.w3.org/Style/XSL">XSL</abbrev> example</title> <para>We start from an extended version of our <filename>memo.dtd</filename>:</para> <programlisting><!ELEMENT memo (from, to+, subject, content)> <!ATTLIST memo date CDATA #REQUIRED priority (low|medium|high) #IMPLIED> <!ELEMENT from (#PCDATA)> <!ATTLIST from id ID #IMPLIED > <!ELEMENT to (#PCDATA)> <!ATTLIST to id ID #IMPLIED > <!ELEMENT subject (#PCDATA)> <!ELEMENT content (para)+> <!ELEMENT para (#PCDATA|link)*> <!ELEMENT link (#PCDATA) > <!ATTLIST link linkend IDREF #REQUIRED ></programlisting> <para>This DTD allows a memo's document content to be structured into paragraphs. A paragraph may contain links either to the sender or to one of the memo's recipients.</para> <figure xml:id="figure_memoref_instance"> <title>A memo document instance with an internal reference.</title> <programlisting><?xml version="1.0" ?> <!DOCTYPE memo SYSTEM "memo.dtd"> <memo date="9.9.2099" priority="high"> <from id="goik">Martin Goik</from> <to>Adam Hacker</to> <to id="eve">Eve Intruder</to> <subject>Firewall problems</subject> <content> <para>Thanks for your excellent work.</para> <para>Our firewall is definitely broken! This bug has been reported by the <link linkend="goik">sender</link>.</para> </content> </memo></programlisting> </figure> <para>We want to extract the sender's name from an arbitrary <tag class="element">memo</tag> document instance. 
Using <abbrev xlink:href="http://www.w3.org/Style/XSL">XSL</abbrev> this task can be accomplished by a script <filename>memo2sender.xsl</filename>:</para> <programlisting><?xml version="1.0" encoding="UTF-8"?> <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0"> <xsl:output method="text"/> <xsl:template match="/memo"> <xsl:value-of select="from"/> </xsl:template> </xsl:stylesheet></programlisting> <para>Before examining this code more closely we first show its effect. We need a piece of software called an <abbrev xlink:href="http://www.w3.org/Style/XSL">XSL</abbrev> processor. It reads both a <tag>memo</tag> document instance and a style sheet and produces the following output:</para> <programlisting><computeroutput>[goik@mupter Memoref]$ xml2xml message.xml memo2sender.xsl Martin Goik</computeroutput></programlisting> <para>The result is the sender's name <computeroutput>Martin Goik</computeroutput>. We may sketch the transformation principle:</para> <figure xml:id="figure_xsl_principle"> <title>An <abbrev xlink:href="http://www.w3.org/Style/XSL">XSL</abbrev> processor transforming an XML document into a result using a stylesheet</title> <mediaobject> <imageobject> <imagedata fileref="Ref/Fig/xslconvert.fig"/> </imageobject> </mediaobject> </figure> <para>The executable <filename>xml2xml</filename> defined at the MI department is actually a script wrapping the <productname xlink:href="http://saxon.sourceforge.net">Saxon XSLT processor</productname>. We may also use the Eclipse/Oxygen plug-in <!-- goik and <uri xlink:href="src/viewlet/xslt_config/xslt_config_viewlet_swf.html"> and define a transformation scenario</uri> thus --> replacing the shell command by a GUI. 
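A third option is to invoke an XSL processor from a <link linkend="gloss_Java"><trademark>Java</trademark></link> program. The following is only a minimal sketch under stated assumptions: it uses the XSLT processor bundled with the JDK via <code>javax.xml.transform</code> rather than Saxon, it therefore declares <code>version="1.0"</code> in the style sheet, and both the class name <classname>MemoSender</classname> and the shortened inline memo (without a DOCTYPE, so no DTD file is required) are made up for illustration:
<programlisting language="java">import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

/** Hypothetical helper, not part of the lecture sources. */
public class MemoSender {

  // Same logic as memo2sender.xsl, declared as XSLT 1.0 (assumption: JDK built-in processor)
  static final String XSL =
      "<xsl:stylesheet xmlns:xsl='http://www.w3.org/1999/XSL/Transform' version='1.0'>"
    + "<xsl:output method='text'/>"
    + "<xsl:template match='/memo'><xsl:value-of select='from'/></xsl:template>"
    + "</xsl:stylesheet>";

  // Shortened memo instance; the DOCTYPE is omitted on purpose
  static final String MEMO =
      "<memo date='9.9.2099'><from id='goik'>Martin Goik</from>"
    + "<to>Adam Hacker</to><subject>Firewall problems</subject>"
    + "<content><para>Thanks.</para></content></memo>";

  /** Apply the style sheet text to the XML text and return the result as a string. */
  public static String transform(final String xml, final String xsl)
      throws TransformerException {
    final Transformer transformer = TransformerFactory.newInstance()
        .newTransformer(new StreamSource(new StringReader(xsl)));
    final StringWriter out = new StringWriter();
    transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
    return out.toString();
  }

  public static void main(String[] args) throws TransformerException {
    System.out.println(transform(MEMO, XSL)); // prints the sender's name
  }
}</programlisting>
The static method <code>transform()</code> plays the role of the <filename>xml2xml</filename> script: it receives a document instance and a style sheet and returns the generated text.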
Next we examine the <abbrev xlink:href="http://www.w3.org/Style/XSL">XSL</abbrev> example code more closely:</para> <programlisting><xsl:stylesheet <co xml:id="programlisting_helloxsl_stylesheet"/> xmlns:xsl <co xml:id="programlisting_helloxsl_namespace_abbv"/> ="http://www.w3.org/1999/XSL/Transform" version="2.0" <co xml:id="programlisting_helloxsl_xsl_version"/> > <xsl:output method="text" <co xml:id="programlisting_helloxsl_method_text"/>/> <xsl:template <co xml:id="programlisting_helloxsl_template"/> match <co xml:id="programlisting_helloxsl_match"/> ="/memo"> <xsl:value-of <co xml:id="programlisting_helloxsl_value-of"/> select <co xml:base="" xml:id="programlisting_helloxsl_valueof_select_att"/> ="from" /> </xsl:template> </xsl:stylesheet></programlisting> <calloutlist> <callout arearefs="programlisting_helloxsl_stylesheet"> <para>The element stylesheet belongs to the namespace <code>http://www.w3.org/1999/XSL/Transform</code>. This namespace is <emphasis>represented</emphasis> by the literal <literal>xsl</literal>. As an alternative we might also use <tag class="starttag">stylesheet xmlns="http://www.w3.org/1999/XSL/Transform"</tag> instead of <tag class="starttag">xsl:stylesheet ...</tag>. The value of the namespace itself gets defined next.</para> </callout> <callout arearefs="programlisting_helloxsl_namespace_abbv"> <para>The keyword <code>xmlns</code> is reserved by the <link xlink:href="http://www.w3.org/TR/REC-xml-names/">Namespaces in XML</link> specification. In <quote>pure</quote> XML the whole term <code>xmlns:xsl</code> would simply define an attribute. In the presence of a namespace-aware XML parser however the literal <literal>xsl</literal> represents the attribute value <tag class="attvalue">http://www.w3.org/1999/XSL/Transform</tag>. This value <emphasis>must not</emphasis> be changed! Otherwise an XSL converter will fail since it cannot distinguish XSL instructions from other XML elements. 
An element <tag class="starttag">stylesheet</tag> belonging to a different namespace <code>http://someserver.org/SomeNamespace</code> may have to be generated.</para> </callout> <callout arearefs="programlisting_helloxsl_xsl_version"> <para>The <link xlink:href="http://www.w3.org/TR/xslt20">XSL standard</link> is still evolving. The version number identifies the conformance level for the subsequent code.</para> </callout> <callout arearefs="programlisting_helloxsl_method_text"> <para>The <tag class="attribute">method</tag> attribute in the <link xlink:href="http://www.w3.org/TR/xslt20/#element-output"><xsl:output></link> element specifies the type of output to be generated. Depending on this type we may also define indentation depths and/or encoding. Allowed <tag class="attvalue">method</tag> values are:</para> <glosslist> <glossentry> <glossterm>text</glossterm> <glossdef> <para>Ordinary text.</para> </glossdef> </glossentry> <glossentry> <glossterm>html</glossterm> <glossdef> <para><link xlink:href="http://www.w3.org/TR/html4">HTML</link> markup.</para> </glossdef> </glossentry> <glossentry> <glossterm>xhtml</glossterm> <glossdef> <para><link xlink:href="http://www.w3.org/TR/xhtml1">XHTML</link> markup differing from the former by e.g. the closing <quote>/></quote> in <tag><img src="..."/></tag>.</para> </glossdef> </glossentry> <glossentry> <glossterm>xml</glossterm> <glossdef> <para>XML code. 
This is most commonly used to create views on or different dialects of an XML document instance.</para> </glossdef> </glossentry> </glosslist> </callout> <callout arearefs="programlisting_helloxsl_template"> <para>An <tag class="starttag">xsl:template</tag> defines the output that will be created for document nodes being defined by a selector.</para> </callout> <callout arearefs="programlisting_helloxsl_match"> <para>The attribute <tag class="attribute">match</tag> tells us for which nodes of a document instance the given <tag class="starttag">xsl:template</tag> is appropriate. In the given example the value <code>/memo</code> tells us that the template is only responsible for <tag class="element">memo</tag> nodes appearing at top level i.e. being the root element of the document instance.</para> </callout> <callout arch="" arearefs="programlisting_helloxsl_value-of programlisting_helloxsl_valueof_select_att"> <para>A <tag class="element">value-of</tag> element writes content to the <abbrev xlink:href="http://www.w3.org/Style/XSL">XSL</abbrev> process' output. In this example the <code>#PCDATA</code> content from the element <tag class="element">from</tag> will be written to the output.</para> </callout> </calloutlist> </section> <section xml:id="xpath"> <title><link xlink:href="http://www.w3.org/TR/xpath">XPath</link> and node sets</title> <para>The <acronym xlink:href="http://www.w3.org/TR/xpath">XPath</acronym> standard allows us to retrieve node sets from XML documents by predicate-based queries. Thus its role may be compared to <acronym xlink:href="http://en.wikipedia.org/wiki/Sql">SQL</acronym> <code>SELECT</code> ... <code>FROM</code> ...<code>WHERE</code> queries. 
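To make this comparison tangible, the expression <code>/memo/to[@id]</code> selects only those recipients carrying an <tag class="attribute">id</tag> attribute, much like a <code>WHERE</code> clause filters rows. The following minimal sketch evaluates such expressions with the JDK's <code>javax.xml.xpath</code> API; the class name <classname>RecipientsWithId</classname> and the shortened inline memo are assumptions made for illustration and are not part of the lecture sources:
<programlisting language="java">import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Hypothetical helper, not part of the lecture sources. */
public class RecipientsWithId {

  // Shortened memo instance without DOCTYPE, so no DTD file is required
  static final String MEMO =
      "<memo date='9.9.2099'><from id='goik'>Martin Goik</from>"
    + "<to>Adam Hacker</to><to id='eve'>Eve Intruder</to>"
    + "<subject>Firewall problems</subject>"
    + "<content><para>Thanks.</para></content></memo>";

  /** Evaluate an XPath expression and return the text content of each selected node. */
  public static List<String> select(final String xml, final String expression)
      throws XPathExpressionException {
    final XPath xpath = XPathFactory.newInstance().newXPath();
    final NodeList nodes = (NodeList) xpath.evaluate(
        expression, new InputSource(new StringReader(xml)), XPathConstants.NODESET);
    final List<String> result = new ArrayList<>();
    for (int i = 0; i < nodes.getLength(); i++) {
      result.add(nodes.item(i).getTextContent());
    }
    return result;
  }

  public static void main(String[] args) throws XPathExpressionException {
    System.out.println(select(MEMO, "/memo/to"));      // all recipients
    System.out.println(select(MEMO, "/memo/to[@id]")); // only recipients having an id
  }
}</programlisting>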
Some simple examples:</para> <figure xml:id="fig_Xpath"> <title>Simple <acronym xlink:href="http://www.w3.org/TR/xpath">XPath</acronym> queries</title> <mediaobject> <imageobject> <imagedata fileref="Ref/Fig/xpath.fig" scale="65"/> </imageobject> </mediaobject> </figure> <para>We are now interested in a list of all recipients being defined in a <tag class="element">memo</tag> element. We introduce the element <tag class="element">xsl:for-each</tag> which iterates over a result set of nodes:</para> <figure xml:id="programlisting_tolist_xpath"> <title>Iterating over the list of recipient nodes.</title> <programlisting><?xml version="1.0" encoding="UTF-8"?> <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0"> <xsl:output method="text"/> <xsl:template match="/" <co xml:id="programlisting_tolist_match_root"/>> <xsl:for-each select="memo/to" <co xml:id="programlisting_tolist_xpath_memo_to"/> > <xsl:value-of select="." <co xml:id="programlisting_tolist_value_of"/> /> <xsl:text>,</xsl:text> <co xml:id="programlisting_tolist_xsl_text"/> </xsl:for-each> </xsl:template> </xsl:stylesheet></programlisting> </figure> <calloutlist> <callout arearefs="programlisting_tolist_match_root"> <para>This template matches the XML document instance, <emphasis>not</emphasis> the visible <tag class="element"><memo></tag> node.</para> </callout> <callout arearefs="programlisting_tolist_xpath_memo_to"> <para>The <link xlink:href="http://www.w3.org/TR/xpath">XPath</link> expression <tag class="attvalue">memo/to</tag> gets evaluated starting from the invisible top level document node being the context node. 
For the given document instance this will define a result set containing both <tag class="element"><to></tag> recipient nodes, see <xref linkend="figure_memo_xpath_memo_to"/>.</para> </callout> <callout arearefs="programlisting_tolist_value_of"> <para>The dot <quote>.</quote> represents the <code>#PCDATA</code> content of the current <tag class="element">to</tag> element.</para> </callout> <callout arearefs="programlisting_tolist_xsl_text"> <para>A comma is appended. This is not quite correct since it should be absent for the last element.</para> </callout> </calloutlist> <figure xml:id="figure_recipientlist_trailing_comma"> <title>A list of recipients.</title> <para>The <abbrev xlink:href="http://www.w3.org/Style/XSL">XSL</abbrev> presented before yields:</para> <programlisting><computeroutput>Adam Hacker,Eve Intruder</computeroutput><emphasis role="bold">,</emphasis></programlisting> </figure> <para>Right now we do not bother about the trailing <quote>,</quote> after the last recipient. The surrounding <code><xsl:text></code>,<code></xsl:text></code> elements <emphasis>may</emphasis> be omitted. We encourage the reader to leave them in place since they increase readability when a template's body gets more complex. The element <tag class="starttag">xsl:text</tag> is used to append static text to the output. This way we append a separator after each recipient. We now discuss the role of the two attributes <tag class="attribute">match="/"</tag> and <tag class="attribute">select="memo/to"</tag>. Both are examples of so-called <link xlink:href="http://www.w3.org/TR/xpath">XPath</link> expressions. 
They allow us to define <emphasis>node sets</emphasis> being subsets from the set of all nodes from a given document instance.</para> <para>Conceptually <link xlink:href="http://www.w3.org/TR/xpath">XPath</link> expressions may be compared to the <acronym xlink:href="http://en.wikipedia.org/wiki/Sql">SQL</acronym> language, the latter allowing the retrieval of data<emphasis>sets</emphasis> from a relational database. We illustrate the current example by a figure:</para> <figure xml:id="figure_memo_xpath_memo_to"> <title>Selecting node sets from <tag class="element">memo</tag> document instances</title> <mediaobject> <imageobject> <imagedata fileref="Ref/Fig/memoxpath.fig"/> </imageobject> </mediaobject> </figure> <para>This figure needs some explanation. We observe an additional node <quote>above</quote> <tag class="starttag">memo</tag> being represented as <quote>filled</quote>. This node represents the document instance as a whole and has <tag>memo</tag> as its only child. We will rediscover this additional root node when we discuss the <abbrev xlink:href="http://www.w3.org/TR/2004/REC-DOM-Level-3-Core-20040407">DOM</abbrev> application programming interface.</para> <para>As already mentioned the expression <code>memo/to</code> evaluates to a <emphasis>set</emphasis> of nodes. In our example this set consists of two nodes of type <tag class="starttag">to</tag> each of them representing a recipient of the memo. We observe a subtle difference between the two <abbrev xlink:href="http://www.w3.org/TR/xpath">XPath</abbrev> expressions:</para> <glosslist> <glossentry> <glossterm><code>match="/"</code></glossterm> <glossdef> <para>The expression starts with and actually consists of the string <quote>/</quote>. Thus it can be called an <emphasis>absolute</emphasis> <abbrev xlink:href="http://www.w3.org/TR/xpath">XPath</abbrev> expression. 
Like a file specification <filename>C:\dos\myprog.exe</filename> it starts on top level and needs no further context information to get evaluated.</para> <para>An <abbrev xlink:href="http://www.w3.org/Style/XSL">XSL</abbrev> style sheet <emphasis>must</emphasis> have an <link xlink:href="http://www.w3.org/TR/xslt20/#initiating">initial context node</link> to start the transformation. This is achieved by providing exactly one <tag class="starttag">xsl:template</tag> with an absolute <abbrev xlink:href="http://www.w3.org/TR/xpath">XPath</abbrev> value for its <tag class="attribute">match</tag> attribute like <tag class="attvalue">/memo</tag>.</para> </glossdef> </glossentry> <glossentry> <glossterm><code>select="memo/to"</code></glossterm> <glossdef> <para>This expression can be compared to a <emphasis>relative</emphasis> file path specification such as <filename>../images/hdm.gif</filename>. We need to add the base (context) directory in order for a relative file specification to become meaningful. If the base directory is <filename>/home/goik/xml</filename> then this <emphasis>relative</emphasis> file specification will address the file <filename>/home/goik/images/hdm.gif</filename>.</para> <para>Likewise we have to define a <emphasis>context</emphasis> node if we want to evaluate a relative <abbrev xlink:href="http://www.w3.org/TR/xpath">XPath</abbrev> expression. In our example this is the root node. The XSL specification introduces the term <link xlink:href="http://www.w3.org/TR/xslt20/#context">evaluation context</link> for this purpose.</para> </glossdef> </glossentry> </glosslist> <para>In order to explain relative <abbrev xlink:href="http://www.w3.org/TR/xpath">XPath</abbrev> expressions we consider <code>content/para</code> starting from the (unique!) 
<tag class="element">memo</tag> node:</para> <figure xml:id="memoXpathPara"> <title>The node set represented by <code>content/para</code> starting at the context node <tag class="starttag">memo</tag>.</title> <mediaobject> <imageobject> <imagedata fileref="Ref/Fig/memorelativexpath.fig"/> </imageobject> <caption> <para>The dashed lines represent the relative <abbrev xlink:href="http://www.w3.org/TR/xpath">XPath</abbrev> expressions starting from the context node to each of the nodes in the result set.</para> </caption> </mediaobject> </figure> </section> <section xml:id="xsl_important_elements"> <title>Some important <abbrev xlink:href="http://www.w3.org/Style/XSL">XSL</abbrev> elements</title> <section xml:id="xsl_if"> <title><tag class="starttag">xsl:if</tag></title> <para>Sometimes we need conditional processing rules. We might want to create a list of sender and recipients with a defined value for the attribute <tag class="attribute">id</tag>. In the <link linkend="figure_memoref_instance">given example</link> this is only valid for the (unique) sender and the recipient <code><to id="eve">Eve Intruder</to></code>. We assume this set of persons shall be inserted into a relational database table <code>Customer</code> consisting of two <code>NOT NULL</code> columns <code>id</code> and <code>name</code>. Thus both attributes <emphasis>must</emphasis> be specified and we must exclude <tag class="starttag">from</tag> or <tag class="starttag">to</tag> nodes with undefined <tag class="attribute">id</tag> attributes:</para> <figure xml:id="programlisting_memo_export_sql"> <title>Exporting SQL statements.</title> <programlisting>... 
<xsl:variable name="newline" <co xml:id="programlisting_xsl_if_definevar"/>> <!-- A newline \n --> <xsl:text> </xsl:text> </xsl:variable> <xsl:template match="/memo"> <xsl:for-each select="from|to" <co xml:id="programlisting_xsl_if_foreach"/>> <xsl:if <emphasis role="bold">test="@id"</emphasis> <co xml:id="programlisting_xsl_if_test"/>> <xsl:text>INSERT INTO Customer (id, name) VALUES ('</xsl:text> <xsl:value-of select="@id" <co xml:id="programlisting_xsl_if_select_idattrib"/>/> <xsl:text>', '</xsl:text> <xsl:value-of select="." <co xml:id="programlisting_xsl_if_selectcontent"/>/> <xsl:text>')</xsl:text> <xsl:value-of select="$newline" <co xml:id="programlisting_xsl_if_usevar"/>/> </xsl:if> </xsl:for-each> </xsl:template></programlisting> <caption> <para>We want to export data from XML documents to a database server. For this purpose INSERT statements are being crafted from a XML document containing relevant data.</para> </caption> </figure> <calloutlist> <callout arearefs="programlisting_xsl_if_definevar"> <para>Define a file local variable <code>newline</code>. Dealing with text output frequently requires the insertion of newlines. Due to the syntax of the <tag class="element">xsl:text</tag> elements this tends to clutter the code.</para> </callout> <callout arearefs="programlisting_xsl_if_foreach"> <para>Iterate over the set of the sender node and all recipient nodes.</para> </callout> <callout arearefs="programlisting_xsl_if_test"> <para>The attribute value of <tag class="attribute">test</tag> will be <link xlink:href="http://www.w3.org/TR/xslt20/#xsl-if">evaluated</link> as a boolean. In this example it evaluates to <code>true</code> iff the attribute <tag class="attribute">id</tag> is defined for the context node. 
Since we are inside the <tag class="element">xsl:for-each</tag> block all context nodes are either of type <tag class="starttag">from</tag> or <tag class="starttag">to</tag> and thus <emphasis>may</emphasis> have an <tag class="attribute">id</tag> attribute.</para> </callout> <callout arearefs="programlisting_xsl_if_select_idattrib"> <para>The <tag class="attribute">id</tag> attribute's value is copied to the output. The <quote>@</quote> character in <code>select="@id"</code> tells the <abbrev xlink:href="http://www.w3.org/Style/XSL">XSL</abbrev> processor to read the value of an <emphasis>attribute</emphasis> with name <tag class="attribute">id</tag> rather than the content of a nested sub<emphasis>element</emphasis> like in <code><to id="foo"><id>I am nested!</id></to></code>.</para> </callout> <callout arearefs="programlisting_xsl_if_selectcontent"> <para>As stated earlier the dot <quote>.</quote> denotes the current context node. In this example simply the <code>#PCDATA</code> content is copied to the output.</para> </callout> <callout arearefs="programlisting_xsl_if_usevar"> <para>The <quote>$</quote> sign in front of <code>newline</code> tells the <abbrev xlink:href="http://www.w3.org/Style/XSL">XSL</abbrev> processor to access the variable <varname>newline</varname> previously defined in <coref linkend="programlisting_xsl_if_definevar"/> rather than interpreting it as the name of a sub element or an attribute.</para> </callout> </calloutlist> <para>As expected the recipient entry <quote>Adam Hacker</quote> does not appear because no <tag class="attribute">id</tag> attribute is defined in its <tag class="starttag">to</tag> element:</para> <programlisting><computeroutput>INSERT INTO Customer (id, name) VALUES ('goik', 'Martin Goik') INSERT INTO Customer (id, name) VALUES ('eve', 'Eve intruder')</computeroutput></programlisting> <qandaset role="exercise"> <title>The XPath functions position() and last()</title> <qandadiv> <qandaentry 
xml:id="example_position_last"> <question> <para>We return to our recipient list in <xref linkend="figure_recipientlist_trailing_comma"/>. We are interested in a list of recipients avoiding the trailing comma:</para> <programlisting><computeroutput>Adam Hacker,Eve Intruder</computeroutput></programlisting> <para>We may use an <tag class="element">xsl:if</tag> to insert a comma for all but the very last recipient node. This can be achieved by using the <abbrev xlink:href="http://www.w3.org/TR/xpath">XPath</abbrev> functions <link xlink:href="http://www.w3.org/TR/xpath#function-position">position()</link> and <link xlink:href="http://www.w3.org/TR/xpath#function-last">last()</link>. Hint: The comparison operator <quote><</quote> may be used in <abbrev xlink:href="http://www.w3.org/Style/XSL">XSL</abbrev> to compare two integer numbers. However it must be escaped as <code>&lt;</code> in order to be XML compatible.</para> </question> <answer> <para>We have to exclude the comma for the last node of the recipient list. If we have e.g. 10 recipients the function <code>position()</code> will return integer values starting at 1 and ending with 10, whereas <code>last()</code> always returns 10. So for the last node the comparison <code>10 < 10</code> will evaluate to false:</para> <programlisting><xsl:for-each select="memo/to"> <xsl:value-of select="."/> <xsl:if test="position() &lt; last()"> <xsl:text>,</xsl:text> </xsl:if> </xsl:for-each></programlisting> </answer> </qandaentry> <qandaentry xml:id="example_avoid_xsl_if"> <question> <label>Avoiding xsl:if</label> <para>In <xref linkend="programlisting_memo_export_sql"/> we used the <abbrev xlink:href="http://www.w3.org/TR/xpath">XPath</abbrev> expression <quote>from|to</quote> to select the desired sender and recipient nodes. Inside the <tag class="element">xsl:for-each</tag> block we processed only those nodes which have an <tag class="attribute">id</tag> attribute. 
These two steps may be combined into a single <abbrev xlink:href="http://www.w3.org/TR/xpath">XPath</abbrev> expression making the <tag class="element">xsl:if</tag> obsolete.</para> </question> <answer> <para>We simply need a modified <abbrev xlink:href="http://www.w3.org/TR/xpath">XPath</abbrev> expression in the <tag class="element">xsl:for-each</tag>:</para> <programlisting><xsl:for-each select="<emphasis role="bold">from[@id]|to[@id]</emphasis>"> <xsl:text>INSERT INTO Customer (id, name) VALUES ('</xsl:text> <xsl:value-of select="@id"/> <xsl:text>', '</xsl:text> <xsl:value-of select="."/> <xsl:text>')</xsl:text> <xsl:value-of select="$newline"/> </xsl:for-each></programlisting> </answer> </qandaentry> </qandadiv> </qandaset> </section> <section xml:id="xsl_apply_templates"> <title><tag class="starttag">xsl:apply-templates</tag></title> <para>We already used <tag class="element">xsl:for-each</tag> to iterate over a list of element nodes. <abbrev xlink:href="http://www.w3.org/Style/XSL">XSL</abbrev> offers a different possibility for this purpose. The idea is to define the formatting rules at a centralized location. The solution to <xref linkend="example_position_last"/> may thus be rewritten in an equivalent way:</para> <programlisting><xsl:template match="/"> <xsl:apply-templates select="memo/to" <co xml:id="programlisting_apply_templates_apply"/>/> </xsl:template> <xsl:template match="to" <co xml:id="programlisting_apply_templates_match"/>> <xsl:value-of select="."/> <xsl:if test="<emphasis role="bold">position()</emphasis> &lt; <emphasis role="bold">last()</emphasis>"> <xsl:text>,</xsl:text> </xsl:if> </xsl:template></programlisting> <calloutlist> <callout arearefs="programlisting_apply_templates_apply"> <para>Definition of the recipient node list. 
Each element of this list shall be processed further.</para> </callout> <callout arearefs="programlisting_apply_templates_match"> <para>This template <emphasis>may</emphasis> be used by an XSL processor to format nodes of type <tag class="starttag">to</tag>. Since the processor is asked to do exactly this in <xref linkend="programlisting_apply_templates_apply"/> the current template will <emphasis>really</emphasis> be used in this example.</para> </callout> </calloutlist> <para>The procedure outlined above may have the following advantages:</para> <itemizedlist> <listitem> <para>Some elements being central for a DTD may appear at different places. For example a <tag class="starttag">title</tag> element is likely to appear as a child of chapters, sections, tables, figures and so on. It may be sufficient to define a single template with a <code>match="title"</code> attribute which contains all rules being required.</para> </listitem> <listitem> <para>Sometimes the body of a <tag class="starttag">xsl:for-each</tag> ... <tag class="endtag">xsl:for-each</tag> spans multiple screens thus limiting code readability. Factoring out the body into a template may avoid this obstacle.</para> </listitem> </itemizedlist> <para>This method is well known from programming languages: If the code inside a loop is needed multiple times or reaches a painful line count <emphasis>good</emphasis> programmers tend to define a separate method. For example:</para> <programlisting language="java">for (int i = 0; i < 10; i++){
  if (a[i] < b[i]){
    max[i] = b[i];
  } else {
    max[i] = a[i];
  }
  ...
}</programlisting> <para>Inside the loop's body the maximum of two array elements gets computed. This may be needed at several locations and thus it is convenient to centralize this code into a method:</para> <programlisting language="java">// cf. <xsl:template match="...">
static int maximum(int a, int b){
  if (a < b){
    return b;
  } else {
    return a;
  }
}
...
// cf. <xsl:apply-templates select="..."/>
for (int i = 0; i < 10; i++){
  max[i] = maximum(a[i], b[i]);
}</programlisting> <para>So far calling a static method in <link linkend="gloss_Java"><trademark>Java</trademark></link> may be compared to a <tag class="starttag">xsl:apply-templates</tag>. There is however one big difference. In <abbrev xlink:href="http://www.w3.org/Style/XSL">XSL</abbrev> the <quote>method</quote> being called may not exist at all. A <tag class="starttag">xsl:apply-templates</tag> instructs a processor to format a set of nodes. It does not contain information about any rules being defined to do this job:</para> <programlisting><xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0"> <xsl:output method="text"/> <xsl:template match="/memo"> <xsl:apply-templates <emphasis role="bold">select="content"</emphasis>/> </xsl:template> </xsl:stylesheet></programlisting> <para>Since no suitable template supplying rules for <tag class="starttag">content</tag> nodes exists an <abbrev xlink:href="http://www.w3.org/Style/XSL">XSL</abbrev> processor uses a default formatting rule instead:</para> <programlisting><computeroutput>Thanks for your excellent work.Our firewall is definitely broken! This bug has been reported by the sender.</computeroutput></programlisting> <para>We observe that the <code>#PCDATA</code> content strings of the element itself and all (recursive) sub elements get glued together into one string. In most cases this is definitely not intended. Omitting a necessary template is usually a programming error. 
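</para> <para>The default behaviour observed above may be sketched by the following two templates. They roughly mirror the built-in template rules of the XSLT recommendation and are shown for illustration only; a style sheet need not contain them, since a processor behaves as if they were present:</para> <programlisting><!-- Document root and element nodes: descend into all child nodes -->
<xsl:template match="*|/">
  <xsl:apply-templates/>
</xsl:template>

<!-- Text and attribute nodes: copy the string value to the output -->
<xsl:template match="text()|@*">
  <xsl:value-of select="."/>
</xsl:template></programlisting> <para>Applied recursively these two rules visit every element below <tag class="starttag">content</tag> and emit nothing but the text nodes found on the way, which explains the concatenated string. 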
It is thus good programming practice during style sheet development to define a special template catching forgotten rules:</para> <programlisting><xsl:template match="/memo"> <xsl:apply-templates select="content"/> </xsl:template> <xsl:template match="*"> <xsl:message> <xsl:text>Error: No template defined matching element '</xsl:text> <xsl:value-of select="name(.)"/> <xsl:text>'</xsl:text> </xsl:message> </xsl:template></programlisting> <para>The <quote>*</quote> matches any element if there is no <link xlink:href="http://www.w3.org/TR/xslt20/#conflict">better matching</link> rule defined. Since we did not supply any template for <tag class="starttag">content</tag> nodes at all this default template will match nodes of type <tag class="starttag">content</tag>. The function <code>name()</code> is predefined in <abbrev xlink:href="http://www.w3.org/Style/XSL">XSL</abbrev> and returns the element type name of a node. During the formatting process we will now see the following warning message:</para> <programlisting><computeroutput>Error: No template defined matching element 'content'</computeroutput></programlisting> <para>We note that for document nodes <tag class="starttag">xyz</tag><code>foo</code><tag class="endtag">xyz</tag> containing only <code>#PCDATA</code> a simple <tag class="emptytag">xsl:apply-templates select="xyz"</tag> is sufficient: A <abbrev xlink:href="http://www.w3.org/Style/XSL">XSL</abbrev> processor uses its default rule and copies the node's content <code>foo</code> to its output.</para> <qandaset role="exercise"> <title>Extending the export to a RDBMS</title> <qandadiv> <qandaentry xml:id="example_rdbms_person"> <question> <para>We assume that our RDBMS table <code>Customer</code> from <xref linkend="programlisting_memo_export_sql"/> shall be replaced by a table <code>Person</code>. We expect the senders of memo documents to be employees of a given company. Conversely the recipients of memos are expected to be customers. 
Our <code>Person</code> table shall have a <quote>tag</quote> like column named <code>type</code> having exactly two allowed values <code>customer</code> or <code>employee</code> being controlled by a <code>CHECK</code> constraint, see <xref linkend="table_person"/>. Create a style sheet generating the necessary SQL statements from a memo document instance. Hint: Define two different templates for <tag class="starttag">from</tag> and <tag class="starttag">to</tag> nodes.</para> </question> <answer> <para>We define two templates differing only in the static string value for a person's type. The relevant <abbrev xlink:href="http://www.w3.org/Style/XSL">XSL</abbrev> portion reads:<programlisting><xsl:template match="/memo"> <xsl:apply-templates select="from|to"/> </xsl:template> <xsl:template match="from"> <xsl:text>INSERT INTO Person (name, type) VALUES('</xsl:text> <xsl:value-of select="."/> <xsl:text>', <emphasis role="bold">'employee'</emphasis>)</xsl:text> <xsl:value-of select="$newline"/> </xsl:template> <xsl:template match="to"> <xsl:text>INSERT INTO Person (name, type) VALUES('</xsl:text> <xsl:value-of select="."/> <xsl:text>', <emphasis role="bold">'customer'</emphasis>)</xsl:text> <xsl:value-of select="$newline"/> </xsl:template></programlisting></para> </answer> </qandaentry> </qandadiv> </qandaset> <table xml:id="table_person"> <title>The Person table</title> <?dbhtml table-width="30%" ?> <?dbfo table-width="40%" ?> <tgroup cols="2"> <colspec colwidth="3*"/> <colspec colwidth="2*"/> <thead> <row> <entry>name</entry> <entry>type</entry> </row> </thead> <tbody> <row> <entry>Martin Goik</entry> <entry>employee</entry> </row> <row> <entry>Adam Hacker</entry> <entry>customer</entry> </row> <row> <entry>Eve intruder</entry> <entry>customer</entry> </row> </tbody> </tgroup> </table> </section> <section xml:id="xsl_choose"> <title><tag class="starttag">xsl:choose</tag></title> <para>We already described the <tag class="starttag">xsl:if</tag> which can be compared 
to an <code>if(..){...}</code> statement in many programming languages. The <tag class="starttag">xsl:choose</tag> element can be compared to a chain of <code>else if</code> conditions including an optional final <code>else</code> block being reached if all boolean tests fail:</para> <programlisting language="java">if (condition a){
  ... //block a
} else if (condition b){
  ... //block b
} ...
...
else {
  ... //code being reached when all conditions evaluate to false
}</programlisting> <para>We want to generate a list of memo recipient names numbered by Roman numerals up to 10. Higher numbers shall be displayed in ordinary decimal notation:</para> <programlisting><computeroutput>I:Adam Hacker II:Eve intruder III: ... IV: ... ...</computeroutput></programlisting> <para>Though <abbrev xlink:href="http://www.w3.org/Style/XSL">XSL</abbrev> offers <link xlink:href="http://www.w3.org/TR/xslt20/#convert">a better way</link> we may generate these number literals by:</para> <programlisting><xsl:template match="/memo"> <xsl:apply-templates select="to"/> </xsl:template> <xsl:template match="to"> <xsl:choose> <xsl:when test="1 = position()">I</xsl:when> <xsl:when test="2 = position()">II</xsl:when> <xsl:when test="3 = position()">III</xsl:when> <xsl:when test="4 = position()">IV</xsl:when> <xsl:when test="5 = position()">V</xsl:when> <xsl:when test="6 = position()">VI</xsl:when> <xsl:when test="7 = position()">VII</xsl:when> <xsl:when test="8 = position()">VIII</xsl:when> <xsl:when test="9 = position()">IX</xsl:when> <xsl:when test="10 = position()">X</xsl:when> <xsl:otherwise> <xsl:value-of select="position()"/> </xsl:otherwise> </xsl:choose> <xsl:text>:</xsl:text> <xsl:value-of select="."/> <xsl:value-of select="$newline"/> </xsl:template></programlisting> <para>Note that this conversion is incomplete: If the number in question is larger than 10 it will be formatted in ordinary decimal style according to the <tag class="starttag">xsl:otherwise</tag> clause.</para> </section> <section 
xml:id="section_html_book"> <title>A complete HTML formatting example</title> <para>We now present a series of exercises showing how to format <tag class="starttag">book</tag> document instances to XHTML. This is done in a step by step manner each time showing corresponding code snippets for our <filename>memo.dtd</filename>.</para> <section xml:id="section_memo_to_list"> <title>Listing the recipients of a memo</title> <para>In order to generate an XHTML <link xlink:href="http://www.w3.org/TR/html401/struct/lists.html#h-10.2">list</link> of all recipients of a <tag class="starttag">memo</tag> we have to use <tag class="starttag">xsl:output method="xhtml"</tag> and embed the required HTML tags in our <abbrev xlink:href="http://www.w3.org/Style/XSL">XSL</abbrev> style sheet:</para> <programlisting><xsl:output method="xhtml" indent="yes"/> <xsl:template match="/memo"> <html> <head> <title>Recipient list</title> </head> <body> <ul> <xsl:apply-templates select="to"/> </ul> </body> </html> </xsl:template> <xsl:template match="to"> <li> <xsl:value-of select="."/> </li> </xsl:template></programlisting> <para>Processing this style sheet for a <tag class="starttag">memo</tag> document instance yields:</para> <programlisting><?xml version="1.0" encoding="UTF-8"?> <html> <head> <title>Recipient list</title> </head> <body> <ul> <li>Adam Hacker</li> <li>Eve intruder</li> </ul> </body> </html></programlisting> <para>The generated Xhtml code does not contain a reference to a DTD. 
We may supply this reference by modifying our <tag class="emptytag">xsl:output</tag> directive:</para> <programlisting><xsl:output method="xhtml" indent="yes" <emphasis role="bold">doctype-public</emphasis>="-//W3C//DTD XHTML 1.0 Strict//EN" <emphasis role="bold">doctype-system</emphasis>="http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"/></programlisting> <para>This adds a corresponding document type declaration which allows for validating the generated HTML:</para> <programlisting><!DOCTYPE html PUBLIC "<emphasis role="bold">-//W3C//DTD XHTML 1.0 Strict//EN</emphasis>" "<emphasis role="bold">http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd</emphasis>"> <html><head> ...</programlisting> <para>This may be improved further by instructing the XSL formatter to use <uri xlink:href="http://www.w3.org/1999/xhtml">http://www.w3.org/1999/xhtml</uri> as default namespace:</para> <programlisting><?xml version="1.0" encoding="UTF-8"?> <xsl:stylesheet <emphasis role="bold">xmlns="http://www.w3.org/1999/xhtml"</emphasis> xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0"> <xsl:output method="xhtml" indent="yes" doctype-public="-//W3C//DTD XHTML 1.0 Strict//EN" doctype-system="http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"/> <xsl:template match="/"> <html><head> ... </xsl:template> ... </xsl:stylesheet></programlisting> <para>This yields the following output:</para> <programlisting><?xml version="1.0" encoding="UTF-8"?> <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"> <html <emphasis role="bold">xmlns="http://www.w3.org/1999/xhtml"</emphasis>> <head> ... </html></programlisting> <para>The top level element <tag class="element">html</tag> is now declared to belong to the namespace <code>http://www.w3.org/1999/xhtml</code> by means of the declaration <code>xmlns="http://www.w3.org/1999/xhtml"</code>. 
This will be inherited by all inner Xhtml elements.</para> <qandaset role="exercise"> <title>Transforming book instances to Xhtml</title> <qandadiv> <qandaentry xml:id="example_xsl_book_1_dtd"> <question> <para>Create a <abbrev xlink:href="http://www.w3.org/Style/XSL">XSL</abbrev> style sheet to transform instances of the first version of <link endterm="example_bookDtd" linkend="example_bookDtd">book.dtd</link> (<xref linkend="example_bookDtd"/>) into <uri xlink:href="http://www.w3.org/TR/xhtml1/#a_dtd_XHTML-1.0-Strict">Xhtml 1.0 strict</uri>.</para> <para>You should first construct a Xhtml document <emphasis>manually</emphasis> before coding the XSL. After you have a <quote>working</quote> Xhtml example document create a <abbrev xlink:href="http://www.w3.org/Style/XSL">XSL</abbrev> style sheet which transforms arbitrary <filename>book.dtd</filename> document instances into a corresponding Xhtml file.</para> </question> <answer> <programlisting><?xml version="1.0" encoding="UTF-8"?> <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0"> <xsl:output indent="yes" method="xhtml"/> <xsl:template match="/book"> <html> <head> <title><xsl:value-of select="title"/></title> </head> <body> <h1><xsl:value-of select="title"/></h1> <xsl:apply-templates select="chapter"/> </body> </html> </xsl:template> <xsl:template match="chapter"> <h2><xsl:value-of select="title"/></h2> <xsl:apply-templates select="para"/> </xsl:template> <xsl:template match="para"> <p><xsl:value-of select="."/></p> </xsl:template> </xsl:stylesheet></programlisting> </answer> </qandaentry> </qandadiv> </qandaset> </section> <section xml:id="section_xsl_attribute"> <title><tag class="starttag">xsl:attribute</tag></title> <para>Sometimes we want to set attribute values in a generated XML document. 
For example we might want to set the background color <quote>red</quote> if a memo has a priority value of <tag class="attvalue">high</tag>:</para> <programlisting><h1 style="background:red">Firewall problems</h1></programlisting> <para>Regarding our memo example this may be achieved by:</para> <programlisting><xsl:template match="/memo"> <html> ... <body> <xsl:variable name="<emphasis role="bold">messageColor</emphasis>" <co xml:id="programlisting_priority_lolor_vardef"/>> <xsl:choose> <xsl:when test="@priority = 'low'">green</xsl:when> <xsl:when test="@priority = 'medium'">yellow</xsl:when> <xsl:when test="@priority = 'high'">red</xsl:when> </xsl:choose> </xsl:variable> <h1 style="background:{<emphasis role="bold">$messageColor</emphasis>};" <co xml:id="programlisting_priority_lolor_usevar"/>> <xsl:value-of select="subject"/> </h1> </body> </html> </xsl:template></programlisting> <calloutlist> <callout arearefs="programlisting_priority_lolor_vardef"> <para>Definition of a color name depending on the attribute <tag class="attvalue">priority</tag>'s value. The set of possible attribute values (low, medium, high) is mapped to the color names (green, yellow, red).</para> </callout> <callout arearefs="programlisting_priority_lolor_usevar"> <para>The color variable is used to compose the attribute <tag class="attribute">style</tag>'s value. The curly <code>{...}</code> braces are part of the <abbrev xlink:href="http://www.w3.org/Style/XSL">XSL</abbrev> standard's syntax (a so called attribute value template). They are required here to instruct the <abbrev xlink:href="http://www.w3.org/Style/XSL">XSL</abbrev> processor to substitute the local variable <code>messageColor</code>'s value instead of simply copying the literal string <quote><code>$messageColor</code></quote> itself to the output document e.g. 
generating <tag class="starttag">h1 style = "background:$messageColor;"</tag>.</para> </callout> </calloutlist> <para>Instead of constructing an extra variable <abbrev xlink:href="http://www.w3.org/Style/XSL">XSL</abbrev> offers a slightly more compact way for the same purpose. The <tag class="starttag">xsl:attribute</tag> element allows us to define the name of an attribute to be added together with an attribute value specification:</para> <programlisting><xsl:template match="/memo"> <html> ... <h1> <xsl:attribute name="<emphasis role="bold">style</emphasis>"> <xsl:text>background:</xsl:text> <xsl:choose> <xsl:when test="@priority = 'low'">green</xsl:when> <xsl:when test="@priority = 'medium'">yellow</xsl:when> <xsl:when test="@priority = 'high'">red</xsl:when> </xsl:choose> </xsl:attribute> <xsl:value-of select="subject"/> </h1> </body> </html> </xsl:template></programlisting> <qandaset role="exercise"> <title>Adding a table of contents (toc)</title> <qandadiv> <qandaentry xml:id="example_book_toc"> <question> <para>For larger document instances it is convenient to add a table of contents to the generated Xhtml document. <!-- We demonstrate the desired result as an <uri xlink:href="src/viewlet/bookhtmltoc/bookhtmltoc_viewlet_swf.html">animation</uri>.--></para> <para>For this exercise you need a unique string value for each <tag class="starttag">chapter</tag> node. If a <tag class="starttag">chapter</tag>'s <tag class="attribute">id</tag> attribute had been declared as <code>#REQUIRED</code> its value would do this job perfectly. Unfortunately you cannot rely on its existence since it is declared to be <code>#IMPLIED</code> and may thus be absent.</para> <para>XSL offers a standard function for this purpose namely <link xlink:href="http://www.w3.org/TR/xslt20/#generate-id">generate-id(...)</link>. 
In a nutshell this function takes an XML node as an argument (or, being called without arguments, uses the context node) and creates a string value being unique with respect to <emphasis>all</emphasis> other nodes in the document. For a given node the function may be called repeatedly and is guaranteed to always return the same value during the <emphasis>same</emphasis> transformation run. So it suffices to add something like <tag class="starttag">a href="#{generate-id(...)}"</tag> or use it in conjunction with <tag class="starttag">xsl:attribute</tag>.</para> </question> <answer> <para>We use the <code>generate-id()</code> function to create a unique identity string for each chapter node. Since we also want to define links to the table of contents we need another unique string value. It is tempting to simply use a static value like <quote>__toc__</quote> for this purpose. However we cannot be sure that this value does not coincide with one of the <code>generate-id()</code> function return values.</para> <para>A cleaner solution uses the <tag class="starttag">book</tag> node's generated identity string for this purpose. As stated before this value is definitely unique:</para> <programlisting><xsl:template match="/book"> ... 
<body> <h1><xsl:value-of select="title"/></h1> <h2 id="{generate-id(.)}" <co xml:base="" xml:id="programlisting_book_toc_def_toc"/>>Table of contents</h2> <ul> <xsl:for-each select="chapter"> <li> <a href="#{generate-id(.)}" <co xml:base="" xml:id="programlisting_book_toc_ref_chap"/>><xsl:value-of select="title"></xsl:value-of></a> </li> </xsl:for-each> </ul> <xsl:apply-templates select="chapter"/> </body> </html> </xsl:template> <xsl:template match="chapter"> <h2 id="{generate-id(.)}" <co xml:base="" xml:id="programlisting_book_toc_def_chap"/>> <a href="#{generate-id(/book)}" <co xml:base="" xml:id="programlisting_book_toc_ref_toc"/>> <xsl:value-of select="title"/> </a> </h2> <xsl:apply-templates select="para"/> </xsl:template> ...</programlisting> <calloutlist> <callout arearefs="programlisting_book_toc_def_toc"> <para>The current context node is <tag class="starttag">book</tag>. We use it as argument to <code>generate-id()</code> to create a unique identity string.</para> </callout> <callout arearefs="programlisting_book_toc_ref_chap"> <para>The <tag class="starttag">xsl:for-each</tag> iterates over all <tag class="starttag">chapter</tag> nodes. We reference the corresponding target nodes being created in <xref linkend="programlisting_book_toc_def_chap"/>.</para> </callout> <callout arearefs="programlisting_book_toc_def_chap"> <para>Each <tag class="starttag">chapter</tag>'s heading is supplied with a unique identity string being referenced from <xref linkend="programlisting_book_toc_ref_chap"/>.</para> </callout> <callout arearefs="programlisting_book_toc_ref_toc"> <para>Clicking on a chapter's title shall take us back to the table of contents (toc). 
So we create a hypertext link referencing our toc heading's identity string being defined in <xref linkend="programlisting_book_toc_def_toc"/>.</para> </callout> </calloutlist> </answer> </qandaentry> </qandadiv> </qandaset> </section> <section xml:id="section_xsl_mixed"> <title>XSL and mixed content</title> <para>We come back to our memo example from <xref linkend="figure_memo_content_mixed"/> and ask ourselves how to format mixed content. In the example the following part of a document instance was given:</para> <programlisting><content>The <emphasis role="bold"><url href="http://w3.org/XML">XML</url></emphasis> language is <emphasis role="bold"><emphasis>easy</emphasis></emphasis> to learn. However you need some <emphasis role="bold"><emphasis>time</emphasis></emphasis>.</content></programlisting> <para>Embedded element nodes have been set to bold style in order to distinguish them from <code>#PCDATA</code> text nodes. We may also use <xref linkend="figure_memo_content_mixed"/> to help understand the formatting process of mixed content. First we show what our Xhtml output might look like:</para> <programlisting><p>The <emphasis role="bold"><a href="http://w3.org/XML">XML</a></emphasis> language is <emphasis role="bold"><em>easy</em></emphasis> to learn. However you need some <emphasis role="bold"><em>time</em></emphasis>.</p></programlisting> <para>We start with a first version of an <abbrev xlink:href="http://www.w3.org/Style/XSL">XSL</abbrev> template:</para> <programlisting> <xsl:template match="content"> <p> <xsl:value-of select="."/> </p> </xsl:template></programlisting> <para>As mentioned earlier all <code>#PCDATA</code> text nodes of the whole subtree are glued together leading to:</para> <programlisting><p>The XML language is easy to learn. However you need some time.</p></programlisting> <para>Our next attempt is to define templates to format the elements <tag class="starttag">url</tag> and <tag class="starttag">emphasis</tag>:</para> <programlisting>... 
<xsl:template match="content"> <p> <xsl:apply-templates select="emphasis|url"/> </p> </xsl:template> <xsl:template match="url"> <a href="{@href}"><xsl:value-of select="."/></a> </xsl:template> <xsl:template match="emphasis"> <em><xsl:value-of select="."/></em> </xsl:template> ...</programlisting> <para>As expected the sub elements are formatted correctly. Unfortunately the <code>#PCDATA</code> text nodes between the element nodes are lost:</para> <programlisting><p> <a href="http://w3.org/XML">XML</a> <em>easy</em> <em>time</em> </p></programlisting> <para>To correct this transformation script we have to tell the formatting processor to include bare text nodes into the output. The <abbrev xlink:href="http://www.w3.org/TR/xpath">XPath</abbrev> standard defines the node test <link xlink:href="http://www.w3.org/TR/xpath#path-abbrev">text()</link> for this purpose. It selects all text nodes being children of the context node:</para> <programlisting>... <xsl:template match="content"> <p> <xsl:apply-templates select="<emphasis role="bold">text()</emphasis>|emphasis|url"/> </p> </xsl:template> ...</programlisting> <para>This yields the desired output. The text node results are shown in bold style:</para> <programlisting><p><emphasis role="bold">The</emphasis> <a href="http://w3.org/XML">XML</a><emphasis role="bold"> language is </emphasis><em>easy</em><emphasis role="bold"> to learn. However you need some </emphasis><em>time</em><emphasis role="bold">.</emphasis></p></programlisting> <para>Some remarks:</para> <orderedlist> <listitem> <para>The <abbrev xlink:href="http://www.w3.org/TR/xpath">XPath</abbrev> expression <code>select="text()|emphasis|url"</code> corresponds nicely to the content model definition in the DTD:</para> <programlisting><!ELEMENT content (#PCDATA|emphasis|url)*></programlisting> </listitem> <listitem> <para>In most mixed content models <emphasis>all</emphasis> sub elements of e.g. 
<tag class="starttag" role="">content</tag> have to be formatted. During development some of the elements defined in a DTD are likely to be omitted by accident. For this reason the <quote>typical</quote> <abbrev xlink:href="http://www.w3.org/TR/xpath">XPath</abbrev> expression acting on mixed content models is defined to match <emphasis>any</emphasis> sub element node:</para> <programlisting>select="text()|<emphasis role="bold">*</emphasis>"</programlisting> </listitem> <listitem> <para>Regarding <code>select="text()|emphasis|url"</code> we have defined two templates for element nodes <tag class="starttag">emphasis</tag> and <tag class="starttag">url</tag>. What happens to the text nodes selected by <code>text()</code>? These are subject to a default rule: The content of bare text nodes is written to the output. We may however redefine this default rule by adding a template:</para> <programlisting><xsl:template match="text()"> <emphasis role="bold"><span style="color:red"> <xsl:value-of select="."/> </span></emphasis> </xsl:template></programlisting> <para>This yields:</para> <programlisting><p> <emphasis role="bold"><span style="color:red">The </span></emphasis> <a href="http://w3.org/XML">XML</a> <emphasis role="bold"><span style="color:red"> language is </span></emphasis> <em>easy</em> <emphasis role="bold"><span style="color:red"> to learn. However you need some </span></emphasis> <em>time</em> <emphasis role="bold"><span style="color:red">.</span></emphasis> </p></programlisting> <para>In most cases it is not desired to replace all text nodes throughout the whole document. In the current example we might only format text nodes being <emphasis>immediate</emphasis> children of <tag class="starttag">content</tag>. 
This may be achieved by restricting the <abbrev xlink:href="http://www.w3.org/TR/xpath">XPath</abbrev> expression to <tag class="starttag">xsl:template match="content/text()"</tag>.</para> </listitem> </orderedlist> </section> <section xml:id="section_xsl_functionid"> <title>The function <code>id()</code></title> <para>In <abbrev xlink:href="http://www.w3.org/Style/XSL">XSL</abbrev> we sometimes want to look up nodes by an attribute value of type <link linkend="section_id_idref">ID</link>. We consider our product catalog from <xref linkend="figure_intern_reference_xml"/>. The following <abbrev xlink:href="http://www.w3.org/Style/XSL">XSL</abbrev> may be used to create Xhtml documents from <tag class="starttag">catalog</tag> instances:</para> <programlisting xml:lang=""><xsl:template match="/catalog"> <html> <head><title>Product catalog</title></head> <body> <h1>List of Products</h1> <xsl:apply-templates select="product"/> </body> </html> </xsl:template> <xsl:template match="product"> <h2 id="{@id}" <co xml:base="" xml:id="programlisting_catalog2html_v1_defid"/>><xsl:value-of select="title"/></h2> <xsl:apply-templates select="para"/> </xsl:template> <xsl:template match="para"> <p><xsl:apply-templates select="text()|*" <co xml:id="programlisting_catalog2html_v1_mixed"/>/></p> </xsl:template> <xsl:template match="link"> <a href="#{@ref}" <co xml:id="programlisting_catalog2html_v1_refid"/>><xsl:value-of select="."/></a> </xsl:template></programlisting> <calloutlist> <callout arearefs="programlisting_catalog2html_v1_defid"> <para>The <code>ID</code> attribute <tag class="starttag">product id="foo"</tag> is unique within the document instance.
We may thus use it as a unique string value in the generated Xhtml, too.</para> </callout> <callout arearefs="programlisting_catalog2html_v1_mixed"> <para>Mixed content consisting of text and <tag class="starttag">link</tag> nodes.</para> </callout> <callout arearefs="programlisting_catalog2html_v1_refid"> <para>We define a file local Xhtml reference to a product.</para> </callout> </calloutlist> <para>The <tag class="starttag">para</tag> element from the example document instance containing a <tag class="starttag">link ref="homeTrainer"</tag> reference will be formatted as:</para> <programlisting><p>If you hate rain look <a href="#homeTrainer">here</a>.</p></programlisting> <para>Now suppose we want to add the product's title <emphasis>Home trainer</emphasis> here to give the reader an idea about the product without clicking the hypertext link:</para> <programlisting><p>If you hate rain look <a href="#homeTrainer">here</a> <emphasis role="bold">(Home trainer)</emphasis>.</p></programlisting> <para>This title text node is part of the <tag class="starttag">product</tag> node being referenced from the current <tag class="starttag">para</tag>:</para> <figure xml:id="linkIdrefProduct"> <title>A graphical representation of our <tag class="starttag">catalog</tag>.</title> <mediaobject> <imageobject> <imagedata fileref="Ref/Fig/xsl_id.fig"/> </imageobject> <caption> <para>The dashed line shows the <code>IDREF</code> based reference from the <tag class="starttag">link</tag> to the <tag class="starttag">product</tag> node.</para> </caption> </mediaobject> </figure> <para>In <abbrev xlink:href="http://www.w3.org/Style/XSL">XSL</abbrev> we may follow <code>ID</code> references by means of the built in function <link xlink:href="http://www.w3.org/TR/xpath#function-id">id(...)</link>:</para> <programlisting><xsl:template match="link"> <a href="#{@ref}"><xsl:value-of select="."/></a> <xsl:text> (</xsl:text> <xsl:value-of select="<emphasis role="bold">id(@ref)</emphasis>/title"
<co xml:id="programlisting_xsl_id_follow"/>/> <xsl:text>)</xsl:text> </xsl:template></programlisting> <para>Evaluating <code>id(@ref)</code> at <xref linkend="programlisting_xsl_id_follow"/> returns the referenced <tag class="starttag">product</tag> <emphasis>node</emphasis>. We simply take its <tag class="starttag">title</tag> value and embed it into a pair of parentheses. This way the desired text portion <emphasis role="bold">(Home trainer)</emphasis> gets added after the hypertext link.</para> <qandaset role="exercise"> <title>Extending the memo style sheet by mixed content and itemized lists</title> <qandadiv> <qandaentry xml:id="example_book_xsl_mixed"> <question> <para>In <xref linkend="example_book.dtd_v5"/> we constructed a DTD allowing itemized lists and mixed content for <tag class="starttag">book</tag> instances. This DTD also allows <tag class="starttag">emphasis</tag>, <tag class="starttag">table</tag> and <tag class="starttag">link</tag> elements as part of a mixed content definition. Extend the current book2html.xsl to account for these extensions.</para> <para xlink:href="http://www.w3.org/TR/xslt20/#element-copy-of">As we already saw in our memo example itemized lists in Xhtml are represented by the element <tag class="starttag">ul</tag> containing <tag class="starttag">li</tag> elements. Since <tag class="starttag">p</tag> elements are also allowed to appear as children our itemized lists can be easily mapped to Xhtml tags. A <tag class="starttag">link</tag> node may be transformed into an <tag class="starttag">a href="..."</tag> Xhtml node.</para> <para>The table model is a simplified version of the Xhtml table model.
Read the <abbrev xlink:href="http://www.w3.org/Style/XSL">XSL</abbrev> documentation of the element <tag class="emptytag">xsl:copy-of</tag> at <link xlink:href="http://www.w3.org/TR/xslt20/#element-copy-of">copy-of</link> for processing tables.</para> </question> <answer> <para>The full source code of the solution is available at <link xlink:href="Ref/src/Dtd/book/v5/book2html.1.xsl">(Online HTML version) ... book2html.1.xsl</link>. We discuss some important aspects. The following table provides mapping rules from <filename>book.dtd</filename> to Xhtml:</para> <table xml:id="table_book2xhtml_element_mappings"> <title>Mapping elements from <filename>book.dtd</filename> to Xhtml</title> <?dbhtml table-width="50%" ?> <?dbfo table-width="50%" ?> <tgroup cols="2"> <colspec colwidth="3*"/> <colspec colwidth="2*"/> <thead> <row> <entry>book.dtd</entry> <entry>Xhtml</entry> </row> </thead> <tbody> <row> <entry><tag class="starttag">book</tag>/<tag class="starttag">title</tag></entry> <entry><tag class="starttag">h1</tag></entry> </row> <row> <entry><tag class="starttag">chapter</tag>/<tag class="starttag">title</tag></entry> <entry><tag class="starttag">h2</tag></entry> </row> <row> <entry><tag class="starttag">para</tag> (mixed content)</entry> <entry><tag class="starttag">p</tag></entry> </row> <row> <entry><tag class="starttag">link href="foo"</tag></entry> <entry><tag class="starttag">a href="foo"</tag></entry> </row> <row> <entry><tag class="starttag">emphasis</tag></entry> <entry><tag class="starttag">em</tag></entry> </row> <row> <entry><tag class="starttag">itemizedlist</tag></entry> <entry><tag class="starttag">ul</tag></entry> </row> <row> <entry><tag class="starttag">listitem</tag></entry> <entry><tag class="starttag">li</tag></entry> </row> <row> <entry><tag class="starttag">table</tag>, <tag class="starttag">caption</tag>,<tag class="starttag">tr</tag>, <tag class="starttag">td</tag> along with all attributes</entry> <entry>Identity copy</entry> </row> 
</tbody> </tgroup> </table> <para>Since our table model is a subset of the HTML table model we may simply copy corresponding nodes to the output:</para> <programlisting><xsl:template match="table"> <xsl:copy-of select="."/> </xsl:template></programlisting> <para>Next we need rules for itemized lists and paragraphs. Our model already implements lists in a way that closely resembles XHTML lists. Since the structures are compatible we only have to provide a mapping:</para> <programlisting><xsl:template match="para"> <p id="{generate-id(.)}"><xsl:apply-templates select="text()|*" /></p> </xsl:template> <xsl:template match="itemizedlist"> <ul><xsl:apply-templates select="listitem"/></ul> </xsl:template> <xsl:template match="listitem"> <li><xsl:apply-templates select="*"/></li> </xsl:template></programlisting> <para>Since <emphasis>all</emphasis> chapters are reachable via hypertext links from the table of contents we <emphasis>must</emphasis> supply a unique <code>id</code> value <xref linkend="programlisting_book2html_single_chapterid"/> for <emphasis>all</emphasis> of them. Chapters and paragraphs may be referenced by <tag class="starttag">link</tag> elements and thus <emphasis>both</emphasis> need a unique identity value. For simplicity we create both of them via <code>generate-id()</code>. In a more sophisticated solution the strategy would be slightly different:</para> <itemizedlist> <listitem> <para>If a <tag class="starttag">chapter</tag> node does have an <code>id</code> attribute defined then take its value.</para> </listitem> <listitem> <para>If a <tag class="starttag">chapter</tag> node does <emphasis>not</emphasis> have an <code>id</code> attribute defined then use <code>generate-id()</code>.</para> </listitem> <listitem> <para><tag class="starttag">para</tag> nodes only get <code>id</code> values in XHTML if they do have an <code>id</code> attribute defined. This is consistent since these nodes are never referenced from the table of contents.
Thus an identity is only required if the <tag class="starttag">para</tag> node is referenced by a <tag class="starttag">link</tag>. If that is the case the <tag class="starttag">para</tag> surely does have a defined identity value.</para> </listitem> </itemizedlist> <para>We also have to provide a hypertext link <xref linkend="programlisting_book2html_single_toclink"/> to the table of contents:</para> <programlisting><xsl:template match="chapter"> <h2 id="{<emphasis role="bold">generate-id(.)</emphasis>}" <co xml:base="" xml:id="programlisting_book2html_single_chapterid"/>> <a href="#{<emphasis role="bold">generate-id(/book)</emphasis>}" <co xml:base="" xml:id="programlisting_book2html_single_toclink"/>><xsl:value-of select="title"/></a> </h2> <xsl:apply-templates select="para|itemizedlist|table"/> </xsl:template></programlisting> <para>Implementing the <tag class="starttag">link</tag> element is somewhat more complicated. We cannot use the <code>@ref</code> attribute value itself as <tag class="starttag">a href="..."</tag> attribute value since the target's identity string is generated via <code>generate-id()</code>. But we may follow the reference via the <abbrev xlink:href="http://www.w3.org/TR/xpath">XPath</abbrev> <link linkend="section_xsl_functionid">id()</link> function and then use the target's identity value:</para> <programlisting><xsl:template match="link"> <a href="#{generate-id(id(@linkend))}"> <xsl:value-of select="."/> </a> </xsl:template></programlisting> <para>The call to <code>id(@linkend)</code> returns either a <tag class="starttag">chapter</tag> or a <tag class="starttag">para</tag> node since according to the DTD attributes of type <code>ID</code> are only defined for these two elements.
Using this node as input to <code>generate-id()</code> returns the desired identity value for the generated Xhtml.</para> </answer> </qandaentry> </qandadiv> </qandaset> </section> <section xml:id="xslAxis"> <title>XSL axis definitions</title> <para>XSL allows us to traverse a document instance's graph in different directions. We start with a memo document instance:</para> <programlisting><!DOCTYPE memo SYSTEM "memo.dtd"> <memo date="9.9.2099"> <from>Joe</from> <to>Jack</to> <to>Eve</to> <to>Jude</to> <to>Tolstoi</to> <subject>Ignore me!</subject> <content> <para>Dumb text.</para> </content> </memo></programlisting> <para>This instance defines four nodes of type <tag class="starttag">to</tag>. For each of these we want to create a line of text showing also the preceding and the following recipients:</para> <programlisting> <----Jack----> Eve Jude Tolstoi <co xml:id="programlisting_axis_jack"/> Jack <----Eve----> Jude Tolstoi <co xml:id="programlisting_axis_eve"/> Jack Eve <----Jude----> Tolstoi <co xml:id="programlisting_axis_jude"/> Jack Eve Jude <----Tolstoi----> <co xml:id="programlisting_axis_tolstoi"/></programlisting> <calloutlist> <callout arearefs="programlisting_axis_jack"> <para>Jack has no predecessor and 3 successors</para> </callout> <callout arearefs="programlisting_axis_eve"> <para>Eve has 1 predecessor and 2 successors</para> </callout> <callout arearefs="programlisting_axis_jude"> <para>Jude has 2 predecessors and 1 successor</para> </callout> <callout arearefs="programlisting_axis_tolstoi"> <para><personname>Tolstoi</personname> has 3 predecessors and no successor</para> </callout> </calloutlist> <para>XSL supports this type of transformation by supplying <acronym xlink:href="http://www.w3.org/TR/xpath">XPath</acronym> axis definitions. 
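The sibling axes may also be tried outside of a style sheet. The following sketch is not taken from the lecture's sources: It assumes the standard <classname>javax.xml.xpath</classname> API, a hypothetical class name <classname>SiblingAxisDemo</classname> and the memo instance from above with its <code>DOCTYPE</code> declaration removed, so that no <filename>memo.dtd</filename> is required. It counts the recipients on both sides of a given recipient: <programlisting language="java">import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

public class SiblingAxisDemo { // hypothetical class name

   // Memo instance from above, DOCTYPE omitted (assumption: no DTD at hand)
   static final String MEMO =
        "<memo date='9.9.2099'><from>Joe</from>"
      + "<to>Jack</to><to>Eve</to><to>Jude</to><to>Tolstoi</to>"
      + "<subject>Ignore me!</subject>"
      + "<content><para>Dumb text.</para></content></memo>";

   // Count the recipient elements on the given axis relative to the named recipient
   static int countSiblings(final String recipient, final String axis)
         throws XPathExpressionException {
      final XPath xpath = XPathFactory.newInstance().newXPath();
      final String expression =
         "count(/memo/to[. = '" + recipient + "']/" + axis + "::to)";
      final Double count = (Double) xpath.evaluate(
         expression, new InputSource(new StringReader(MEMO)), XPathConstants.NUMBER);
      return count.intValue();
   }

   public static void main(String[] args) throws XPathExpressionException {
      for (final String name : new String[] {"Jack", "Eve", "Jude", "Tolstoi"}) {
         System.out.println(name
            + ": preceding=" + countSiblings(name, "preceding-sibling")
            + ", following=" + countSiblings(name, "following-sibling"));
      }
   }
}</programlisting> For Eve this should report one preceding and two following siblings, in accordance with the callouts above.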
We consider a memo document with 9 <tag class="starttag">to</tag> nodes:</para> <figure xml:id="memo9recipients"> <title>A memo with 9 recipients</title> <mediaobject> <imageobject> <imagedata fileref="Ref/Fig/memofour.fig"/> </imageobject> </mediaobject> </figure> <para>We marked the 4-th recipient to represent the context node. All three <tag class="starttag">to</tag> nodes to the <quote>left</quote> belong to the <emphasis>set</emphasis> of preceding siblings with respect to the context node. Likewise the 5 neighbours to the right are called following siblings. Returning to our <quote>four recipient</quote> example we may create the desired output by:</para> <programlisting><xsl:template match="/"> <xsl:apply-templates select="memo/to"/> </xsl:template> <xsl:template match="to"> <xsl:for-each select="preceding-sibling::to" <co xml:id="programlisting_memo_four_xsl_preceding"/>> <xsl:value-of select="."/> <xsl:text> </xsl:text> </xsl:for-each> <xsl:text> &lt;----</xsl:text> <xsl:value-of select="."/> <co xml:id="programlisting_memo_four_xsl_context"/> <xsl:text>----&gt; </xsl:text> <xsl:for-each select="following-sibling::to"> <co xml:id="programlisting_memo_four_xsl_following"/> <xsl:value-of select="."/> <xsl:text> </xsl:text> </xsl:for-each> <xsl:value-of select="$newline"/> </xsl:template></programlisting> <calloutlist> <callout arearefs="programlisting_memo_four_xsl_preceding"> <para>Iterate on the set of recipients <quote>left</quote> of the context node.</para> </callout> <callout arearefs="programlisting_memo_four_xsl_context"> <para>Taking the context node's value embedded in <code><---- ... 
----></code>.</para> </callout> <callout arearefs="programlisting_memo_four_xsl_following"> <para>Iterate on the set of recipients <quote>right</quote> of the context node.</para> </callout> </calloutlist> <para>More formally the set of preceding siblings is defined to be the set of all nodes having the same parent as the context node and appearing <quote>before</quote> the context node. The notion <quote>before</quote> is meant in the sense of a <link xlink:href="http://en.wikipedia.org/wiki/Depth-first_search">depth-first</link> traversal of the document tree. <abbrev xlink:href="http://www.w3.org/TR/xpath">XPath</abbrev> provides different axis definitions, see <uri xlink:href="http://www.w3.org/TR/xpath#axes">http://www.w3.org/TR/xpath#axes</uri> for details. We provide an illustration here:</para> <figure xml:id="disjointAxeSets"> <title>Disjoint <acronym xlink:href="http://www.w3.org/TR/xpath">XPath</acronym> axis definitions.</title> <mediaobject> <imageobject> <imagedata fileref="Ref/Fig/preceding.fig"/> </imageobject> <caption> <para>The sets defined by ancestor, descendant, following, preceding and self are disjoint. Their union forms the set of all document nodes.</para> </caption> </mediaobject> </figure> <para>Some remarks:<itemizedlist> <listitem> <para>If the context node is already the topmost node i.e. the root node then the sets defined by <code>ancestor</code> and <code>parent</code> are empty.</para> </listitem> <listitem> <para>The <code>parent</code> set <emphasis>always</emphasis> contains zero or one node.</para> </listitem> </itemizedlist></para> </section> <section xml:id="xslChunking"> <title>Splitting documents into chunks</title> <para>Sometimes we want to generate multiple output documents from a single XML source. It may for example be a bad idea to transform a book of 200 printed pages into a <emphasis>single</emphasis> online HTML page. 
Instead we may split each chapter into a separate HTML file and create navigation links between them.</para> <para>We consider a memo document instance. We want to generate one text file for each memo recipient containing just the recipient's name using the <abbrev xlink:href="http://www.w3.org/Style/XSL">XSL</abbrev> element <link xlink:href="http://www.w3.org/TR/xslt20/#element-result-document"><xsl:result-document></link>:</para> <programlisting><xsl:template match="/memo"> <xsl:apply-templates select="to"/> </xsl:template> <xsl:template match="to"> <emphasis role="bold"><xsl:result-document</emphasis> <co xml:id="programlisting_xsl_result_document_main"/> <emphasis role="bold">href="file_{position()}.txt"</emphasis> <co xml:id="programlisting_xsl_result_document_href"/> <emphasis role="bold">method="text"</emphasis> <co xml:id="programlisting_xsl_result_document_method"/>> <xsl:value-of select="."/> <co xml:id="programlisting_xsl_result_document_content"/> <emphasis role="bold"></xsl:result-document></emphasis> </xsl:template></programlisting> <calloutlist> <callout arearefs="programlisting_xsl_result_document_main"> <para>The output from all generating <abbrev xlink:href="http://www.w3.org/Style/XSL">XSL</abbrev> directives will be redirected from standard output to another output channel.</para> </callout> <callout arearefs="programlisting_xsl_result_document_href"> <para>The output will be written to a file named <filename>file_i.txt</filename> with the decimal number <code>i</code> ranging from the value 1 to the number of recipients.</para> </callout> <callout arearefs="programlisting_xsl_result_document_method"> <para>The <code>method</code> attribute may possibly override a value being given in the <tag class="starttag">xsl:output</tag> element. 
We may also redefine <link xlink:href="http://www.w3.org/TR/xslt20/#element-result-document">other attributes</link> from <tag class="starttag">xsl:output</tag> like <code>doctype-public</code>, <code>doctype-system</code> and <code>encoding</code>.</para> </callout> <callout arearefs="programlisting_xsl_result_document_content"> <para>All output being generated in this region gets redirected to the channel specified in <xref linkend="programlisting_xsl_result_document_href"/>.</para> </callout> </calloutlist> <qandaset role="exercise"> <title>Splitting book into chapter files</title> <qandadiv> <qandaentry xml:id="example_book_chunk"> <question> <para>Extend your solution of <xref linkend="example_book_xsl_mixed"/> by writing each <tag class="starttag">chapter</tag>'s content into a separate Xhtml file. In addition create a file <filename>index.html</filename> which contains references to the corresponding <tag class="starttag">chapter</tag> documents. Thus for a document instance with two chapters the overall navigation structure is illustrated by <xref linkend="figure_book_navigation"/>.</para> <para>Implementing the <tag class="starttag">link</tag> tag may cause a problem: An internal link may reference a <tag class="starttag">para</tag>. You need to identify the <tag class="starttag">chapter</tag> node embedding this para. This may be done by using a suitable <abbrev xlink:href="http://www.w3.org/TR/xpath">XPath</abbrev> axis direction.</para> </question> <answer> <para>The full source code of the solution is available at <link xlink:href="Ref/src/Dtd/book/v5/book2chunks.1.xsl">(Online HTML version) ... book2chunks.1.xsl</link>.
First we generate the table of contents as the file <filename>index.html</filename>:</para> <programlisting><xsl:template match="/"> <xsl:result-document href="index.html"> <xsl:apply-templates select="book"/> </xsl:result-document> <xsl:for-each select="book/chapter"> <xsl:result-document href="{generate-id(.)}.html"> <xsl:apply-templates select="."/> </xsl:result-document> </xsl:for-each> </xsl:template> <xsl:template match="book"> <html> <head><title><xsl:value-of select="title"/></title></head> <body> <h1><xsl:value-of select="title"/></h1> <h2>Table of contents</h2> <ul> <xsl:for-each select="<emphasis role="bold">chapter</emphasis>"> <li><a href="{<emphasis role="bold">generate-id(.)</emphasis>}.html"><xsl:value-of select="title"/></a></li> </xsl:for-each> </ul> </body> </html> </xsl:template></programlisting> <para>The <tag class="starttag">link ref="..."</tag> may reference a <tag class="starttag">chapter</tag> or a <tag class="starttag">para</tag>. So we may need to <quote>step up</quote> from a paragraph to the corresponding chapter node:</para> <programlisting><xsl:template match="link"> <xsl:variable name="reftargetNode" select="id(@linkend)"/> <xsl:variable name="reftargetParentChapter" select="$reftargetNode/ancestor-or-self::chapter"/> <a href="{generate-id($reftargetParentChapter)}.html#{ generate-id($reftargetNode)}"> <xsl:value-of select="."/> </a> </xsl:template></programlisting> <para>This is consistent since <emphasis>all</emphasis> <tag class="starttag">p</tag> nodes in the generated Xhtml receive a unique <code>id</code> value regardless of whether the originating <tag class="starttag">para</tag> node has one.</para> </answer> </qandaentry> </qandadiv> </qandaset> <figure xml:id="figure_book_navigation"> <title>A <tag class="starttag">book</tag> document with two chapters</title> <mediaobject> <imageobject> <imagedata fileref="Ref/Fig/booknavigate.fig"/> </imageobject> </mediaobject> </figure> </section> </section> </section> </chapter>
<chapter xml:id="introPersistence"> <title>Accessing Relational Data</title> <section xml:id="persistence"> <title>Persistence in Object Oriented languages</title> <para>Following <xref linkend="Bauer05"/> we may define persistence by:</para> <blockquote> <para>persistence allows an object to outlive the process that created it. The state of the object may be stored to disk and an object with the same state re-created at some point in the future.</para> </blockquote> <para>The notion of <quote>process</quote> refers to operating systems. Let us start with a simple example assuming a <link linkend="gloss_Java"><trademark>Java</trademark></link> class User:</para> <programlisting>public class User { String cname; //The user's common name e.g. 'Joe Bix' String uid; //The user's unique system ID (login name) e.g. 'bix' // getters, setters and other stuff ... }</programlisting> <para>A relational implementation might look like:</para> <programlisting>CREATE TABLE User( cname CHAR(80) ,uid CHAR(10) PRIMARY KEY )</programlisting> <para>Now a <link linkend="gloss_Java"><trademark>Java</trademark></link> application may create instances of class <code>User</code> and save these to a database:</para> <figure xml:id="processObjPersist"> <title>Persistence across process boundaries</title> <mediaobject> <imageobject> <imagedata fileref="Ref/Fig/persistence.fig" scale="65"/> </imageobject> </mediaobject> </figure> <para>Both the <trademark xlink:href="http://www.oracle.com/technetwork/java/javase">JRE</trademark> instances and the RDBMS database server are processes (or sets of processes) typically existing in different address spaces. The two <trademark xlink:href="http://www.oracle.com/technetwork/java/javase">JRE</trademark> processes mentioned here may as well be started in disjoint address spaces.
In fact we might even run two entirely different applications implemented in different programming languages like <abbrev xlink:href="http://www.php.net">PHP</abbrev>.</para> <para>It is important to mention that the two arrows <quote>save</quote> and <quote>load</quote> thus typically denote a communication across machine boundaries.</para> </section> <section xml:id="jdbcIntro"> <title>Introduction to <trademark xlink:href="http://electronics.zibb.com/trademark/jdbc/29545026">JDBC</trademark></title> <section xml:id="jdbcWrite"> <title>Write access, principles</title> <para>Connecting an application to a database means to establish a connection from a client to a database server:</para> <figure xml:id="jdbcClientServer"> <title>Networking between clients and database servers</title> <mediaobject> <imageobject> <imagedata fileref="Ref/Fig/clientserv.fig"/> </imageobject> </mediaobject> </figure> <para>So <trademark xlink:href="http://www.oracle.com/technetwork/java/javase/jdbc">JDBC</trademark> is just one among a whole bunch of protocol implementations connecting database servers and applications. Consequently <trademark xlink:href="http://www.oracle.com/technetwork/java/javase/jdbc">JDBC</trademark> is expected to appear in the lower layer of multi-tier applications. We take a three-tier application as a starting point:</para> <figure xml:id="jdbcThreeTier"> <title>The role of <trademark xlink:href="http://www.oracle.com/technetwork/java/javase/jdbc">JDBC</trademark> in a three-tier application</title> <mediaobject> <imageobject> <imagedata fileref="Ref/Fig/jdbcThreeTier.fig"/> </imageobject> </mediaobject> </figure> <para>We may add an additional layer. 
Web applications are typically built on top of an application server (<productname xlink:href="http://www.ibm.com/software/de/websphere/">WebSphere</productname>, <productname xlink:href="http://glassfish.java.net">Glassfish</productname>, <productname xlink:href="http://www.jboss.org/jbossas">Jboss</productname>,...) providing additional services:</para> <figure xml:id="jdbcFourTier"> <title><trademark xlink:href="http://www.oracle.com/technetwork/java/javase/jdbc">JDBC</trademark> connecting application server and database.</title> <mediaobject> <imageobject> <imagedata fileref="Ref/Fig/jdbcFourTier.fig"/> </imageobject> </mediaobject> </figure> <para>So what is actually required to connect to a database server? A client requires the following parameter values to open a connection:</para> <orderedlist> <listitem xml:id="ItemJdbcProtocol"> <para>The type of database server i.e. <productname xlink:href="http://www.oracle.com/us/products/database">Oracle</productname>, <productname xlink:href="www.ibm.com/software/data/db2">DB2</productname>, <productname xlink:href="http://www-01.ibm.com/software/data/informix">Informix</productname>, <productname xlink:href="http://www.mysql.com">Mysql</productname> etc. This information is needed because of vendor dependent <trademark xlink:href="http://electronics.zibb.com/trademark/jdbc/29545026">JDBC</trademark> protocol implementations.</para> </listitem> <listitem> <para>The server's <link xlink:href="http://en.wikipedia.org/wiki/Domain_Name_System">DNS</link> name or IP number</para> </listitem> <listitem> <para>The database service's port number at the previously defined host.
The database server process listens for connections to this port number.</para> </listitem> <listitem xml:id="itemJdbcDatabaseName"> <para>The database name within the given database server</para> </listitem> <listitem> <para>Optional: A database user's account name and password.</para> </listitem> </orderedlist> <para>Items <xref linkend="ItemJdbcProtocol"/> - <xref linkend="itemJdbcDatabaseName"/> will be encapsulated into a so called <trademark xlink:href="http://electronics.zibb.com/trademark/jdbc/29545026">JDBC</trademark> <link xlink:href="http://en.wikipedia.org/wiki/Uniform_Resource_Locator">URL</link>. We consider a typical example corresponding to the previous parameter list:</para> <figure xml:id="jdbcUrlComponents"> <title>Components of a <trademark xlink:href="http://electronics.zibb.com/trademark/jdbc/29545026">JDBC</trademark> URL</title> <mediaobject> <imageobject> <imagedata fileref="Ref/Fig/jdbcurl.fig" scale="65"/> </imageobject> </mediaobject> </figure> <para>In fact this <trademark xlink:href="http://electronics.zibb.com/trademark/jdbc/29545026">JDBC</trademark> URL example closely resembles other types of URL strings as being defined in <uri xlink:href="http://www.ietf.org/rfc/rfc2396.txt">http://www.ietf.org/rfc/rfc2396.txt</uri>. Look for <code>opaque_part</code> to understand the second <quote>:</quote> in the protocol definition part of a <trademark xlink:href="http://electronics.zibb.com/trademark/jdbc/29545026">JDBC</trademark> URL. 
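The nesting of the two protocol parts may be inspected using the standard class <classname>java.net.URI</classname>. The following sketch is not part of the lecture's sources: The class name <classname>JdbcUrlParts</classname> and the helper method are hypothetical and the URL <code>jdbc:mysql://localhost:3306/hdm</code> is assumed merely as an example value. <classname>java.net.URI</classname> treats everything following the first <quote>:</quote> as an opaque scheme specific part, which in turn may be parsed as an ordinary hierarchical URI: <programlisting language="java">import java.net.URI;

public class JdbcUrlParts { // hypothetical class name

   // Strip the outer "jdbc" scheme and parse the remainder as a URI of its own
   static URI subProtocolUri(final String jdbcUrl) {
      final URI outer = URI.create(jdbcUrl);
      return URI.create(outer.getSchemeSpecificPart());
   }

   public static void main(String[] args) {
      final String url = "jdbc:mysql://localhost:3306/hdm"; // assumed example value

      final URI outer = URI.create(url);
      System.out.println("Outer scheme: " + outer.getScheme());
      System.out.println("Opaque:       " + outer.isOpaque());

      final URI inner = subProtocolUri(url);
      System.out.println("Sub protocol: " + inner.getScheme());
      System.out.println("Host:         " + inner.getHost());
      System.out.println("Port:         " + inner.getPort());
      System.out.println("Database:     " + inner.getPath());
   }
}</programlisting> This only illustrates the URL syntax. A <trademark xlink:href="http://electronics.zibb.com/trademark/jdbc/29545026">JDBC</trademark> driver is free to interpret the part following its sub protocol name in its own way.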
Common examples of <abbrev xlink:href="http://www.ietf.org/rfc/rfc1738.txt">URL</abbrev>s are:</para> <itemizedlist> <listitem> <para><code>http://www.hdm-stuttgart.de/aaa</code></para> </listitem> <listitem> <para><code>http://someserver.com:8080/someResource</code></para> </listitem> <listitem> <para><code>ftp://mirror.mi.hdm-stuttgart.de/Firmen</code></para> </listitem> </itemizedlist> <para>We notice the explicit mention of port number 8080 in the second example; the default <abbrev xlink:href="http://www.w3.org/Protocols">http</abbrev> protocol port number is 80. So if a web server accepts connections at port 80 we do not have to specify this value. A web browser will automatically use this default port.</para> <para>Actually the notion <quote><code>jdbc:mysql</code></quote> denotes a sub protocol implementation namely <orgname>Mysql</orgname>'s implementation of <trademark xlink:href="http://electronics.zibb.com/trademark/jdbc/29545026">JDBC</trademark>. Connecting to an IBM DB2 server would require <code>jdbc:db2</code> for this protocol part.</para> <para>In contrast to <abbrev xlink:href="http://www.w3.org/Protocols">http</abbrev> no standard ports are <quote>officially</quote> assigned for <trademark xlink:href="http://electronics.zibb.com/trademark/jdbc/29545026">JDBC</trademark> protocol variants. Due to vendor specific implementations this does not make any sense.
Thus we <emphasis role="bold">always</emphasis> have to specify the port number when opening <trademark xlink:href="http://electronics.zibb.com/trademark/jdbc/29545026">JDBC</trademark> connections.</para> <para>Writing <trademark xlink:href="http://electronics.zibb.com/trademark/jdbc/29545026">JDBC</trademark> based applications follows a simple scheme:</para> <figure xml:id="jdbcArchitecture"> <title>Architecture of JDBC</title> <mediaobject> <imageobject> <imagedata fileref="Ref/Fig/jdbcarch.fig"/> </imageobject> </mediaobject> </figure> <para>From a programmer's point of view the <classname>java.sql.DriverManager</classname> is a bootstrapping object: Other objects like Statement instances are created from this central and unique object.</para> <para>The first instance being created by the <classname>java.sql.DriverManager</classname> is an object of type <classname>java.sql.Connection</classname>. In <xref linkend="exerciseJdbcWhyInterface"/> we discuss the way vendor specific implementation details are hidden by interfaces. We can distinguish between:</para> <orderedlist> <listitem> <para>Vendor neutral parts of a <trademark xlink:href="http://www.oracle.com/technetwork/java/javase/jdbc">JDBC</trademark> environment. These are the components shipped by Oracle or other organizations providing <link linkend="gloss_Java"><trademark>Java</trademark></link> runtimes. The class <classname>java.sql.DriverManager</classname> belongs to this domain.</para> </listitem> <listitem> <para>Vendor specific parts.
In <xref linkend="jdbcArchitecture"/> this starts with the <classname>java.sql.Connection</classname> object.</para> </listitem> </orderedlist> <para>The <classname>java.sql.Connection</classname> object thus marks the boundary between a <trademark xlink:href="http://www.oracle.com/technetwork/java/javase">JDK</trademark> / <trademark xlink:href="http://www.oracle.com/technetwork/java/javase">JRE</trademark> and a <trademark xlink:href="http://www.oracle.com/technetwork/java/javase/jdbc">JDBC</trademark> Driver implementation from e.g. Oracle or other institutions.</para> <para><xref linkend="jdbcArchitecture"/> does not show details about the relations between <classname>java.sql.Connection</classname>, <classname>java.sql.Statement</classname> and <classname>java.sql.ResultSet</classname> objects. We start by giving a rough description of the tasks and responsibilities these three types have:</para> <glosslist> <glossentry> <glossterm><classname>java.sql.Connection</classname></glossterm> <glossdef> <para>Holding a permanent connection to a database server. Both client and server can contact each other. The database server may for example terminate a transaction if problems like deadlocks occur.</para> </glossdef> </glossentry> <glossentry> <glossterm><classname>java.sql.Statement</classname></glossterm> <glossdef> <para>We have two distinct classes of actions:</para> <orderedlist> <listitem> <para>Instructions to modify data on the database server. These include <code>INSERT</code>, <code>UPDATE</code> and <code>DELETE</code> operations as far as <abbrev>SQL-DML</abbrev> is concerned. <trademark xlink:href="http://www.oracle.com/technetwork/java/javase/jdbc">JDBC</trademark> acts as a means of transport and merely returns integer values back to the client like the number of rows being affected by an UPDATE.</para> </listitem> <listitem> <para>Instructions reading data from the server. This is done by sending SELECT statements. 
It is not sufficient to just return integer values: Instead <trademark xlink:href="http://www.oracle.com/technetwork/java/javase/jdbc">JDBC</trademark> needs to copy complete datasets back to the client to fill containers being accessible by applications. This is being discussed in <xref linkend="jdbcRead"/>.</para> </listitem> </orderedlist> </glossdef> </glossentry> </glosslist> <para>We shed some light on the relationship between these important <trademark xlink:href="http://www.oracle.com/technetwork/java/javase/jdbc">JDBC</trademark> components and their respective creation:<figure xml:id="jdbcObjectCreation"> <title>Important <trademark xlink:href="http://www.oracle.com/technetwork/java/javase/jdbc">JDBC</trademark> instances and relationships.</title> <mediaobject> <imageobject> <imagedata fileref="Ref/Fig/jdbcObjectRelation.fig"/> </imageobject> </mediaobject> </figure></para> </section> <section xml:id="writeAccessCoding"> <title>Write access, coding!</title> <para>So how does it actually work with respect to coding? You may want to read <xref linkend="toolingConfigJdbc"/> before starting your exercises. We first prepare a database table using Eclipse's database tools:</para> <figure xml:id="figSchemaPerson"> <title>A relation <code>Person</code> containing names and email addresses</title> <programlisting><emphasis role="strong">CREATE</emphasis> <emphasis role="strong">TABLE</emphasis> Person ( name CHAR(20) ,email CHAR(20) <emphasis>UNIQUE</emphasis>)</programlisting> </figure> <para>Our actual (toy) <trademark xlink:href="http://electronics.zibb.com/trademark/jdbc/29545026">JDBC</trademark> application will insert a single object ('Jim', 'jim@foo.org') into the <code>Person</code> relation. 
This is simpler than reading data since no client <classname>java.sql.ResultSet</classname> container is needed:</para> <figure xml:id="figJdbcSimpleWrite"> <title>A simple <trademark xlink:href="http://electronics.zibb.com/trademark/jdbc/29545026">JDBC</trademark> application inserting data into a relational table.</title> <programlisting language="java">01 package sda.jdbc.intro.v1; 02 03 import java.sql.Connection; 04 import java.sql.DriverManager; 05 import java.sql.SQLException; 06 import java.sql.Statement; 07 08 public class SimpleInsert { 09 10 public static void main(String[] args) throws SQLException { 11 // Step 1: Open a connection to the database server 12 final Connection conn = DriverManager.getConnection( 13 "jdbc:mysql://localhost:3306/hdm", "hdmuser", "XYZ"); 14 // Step 2: Create a Statement instance 15 final Statement stmt = conn.createStatement(); 16 // Step 3: Execute the desired INSERT 17 final int updateCount = stmt.executeUpdate( 18 "INSERT INTO Person VALUES('Jim', 'jim@foo.org')"); 19 // Step 4: Give feedback to the enduser 20 System.out.println("Successfully inserted " + updateCount + " dataset(s)"); 21 } 22 }</programlisting> </figure> <para>Looks simple? Unfortunately it does not (yet) work:</para> <programlisting>Exception in thread "main" java.sql.SQLException: <emphasis role="bold"> No suitable driver found for jdbc:mysql://localhost:3306/hdm</emphasis> at java.sql.DriverManager.getConnection(DriverManager.java:604) at java.sql.DriverManager.getConnection(DriverManager.java:221) at sda.jdbc.intro.SimpleInsert.main(SimpleInsert.java:12)</programlisting> <para>What's wrong here? In <xref linkend="figureConfigJdbcDriver"/> we needed a <productname xlink:href="http://www.mysql.com">Mysql</productname> <trademark xlink:href="http://electronics.zibb.com/trademark/jdbc/29545026">JDBC</trademark> Driver implementation <filename>mysql-connector-java.jar</filename> as a prerequisite to open connections to a database server. 
This implementation is mandatory for our toy application as well. All we have to do is to add <filename>mysql-connector-java.jar</filename> to our <link linkend="gloss_Java"><trademark>Java</trademark></link> <varname>CLASSPATH</varname> at <emphasis role="bold">runtime</emphasis>.</para> <para>Depending on our <link linkend="gloss_Java"><trademark>Java</trademark></link> environment this is achieved by different means. Eclipse requires the definition of a run configuration as described in <uri xlink:href="http://help.eclipse.org/juno/index.jsp?topic=/org.eclipse.jdt.doc.user/tasks/tasks-java-local-configuration.htm">http://help.eclipse.org/juno/index.jsp?topic=/org.eclipse.jdt.doc.user/tasks/tasks-java-local-configuration.htm</uri>. When defining a run configuration for <classname>sda.jdbc.intro.SimpleInsert</classname> we have to add <filename>mysql-connector-java.jar</filename> to the <varname>Classpath</varname> tab. The following screenshot shows a working configuration:</para> <figure xml:id="figureConfigRunExtJar"> <title>Creating an Eclipse run time configuration containing a <productname xlink:href="http://www.mysql.com">Mysql</productname> <trademark xlink:href="http://electronics.zibb.com/trademark/jdbc/29545026">JDBC</trademark> Driver Jar marked red.</title> <screenshot> <mediaobject> <imageobject> <imagedata fileref="Ref/Screen/runConfigJarAnnot.screen.png" scale="70"/> </imageobject> </mediaobject> </screenshot> </figure> <para>This time execution works as expected:</para> <programlisting>Successfully inserted 1 dataset(s)</programlisting> <qandaset role="exercise"> <title>Exception on inserting objects</title> <qandadiv> <qandaentry> <question> <para>A second invocation of <classname>sda.jdbc.intro.v1.SimpleInsert</classname> yields the following runtime error:</para> <programlisting>Exception in thread "main" com.mysql.jdbc.exceptions.jdbc4.MySQLIntegrityConstraintViolationException: <emphasis role="bold">Duplicate entry 
'jim@foo.org' for key 'email'</emphasis> ... at com.mysql.jdbc.StatementImpl.executeUpdate(StatementImpl.java:1617) at sda.jdbc.intro.SimpleInsert.main(SimpleInsert.java:17)</programlisting> </question> <answer> <para>This expected error is easy to understand: The exception's message text <emphasis role="bold">Duplicate entry 'jim@foo.org' for key 'email'</emphasis> informs us about a UNIQUE key constraint violation with respect to the attribute <code>email</code> in our schema definition in <xref linkend="figSchemaPerson"/>. We cannot add a second entry with the same value <code>'jim@foo.org'</code>.</para> </answer> </qandaentry> </qandadiv> </qandaset> <para>It is worth mentioning that the <productname xlink:href="http://www.mysql.com">Mysql</productname> driver implementation does not have to be available at compile time. JDBC relies on interfaces rather than concrete classes. Concrete classes are needed only at runtime.</para> <para>On the other hand when working with Eclipse we need a separate run configuration for each runnable <link linkend="gloss_Java"><trademark>Java</trademark></link> application. This becomes tedious after some time. 
So you may want to follow the author and just add <filename>mysql-connector-java.jar</filename> to your compile time <envar>CLASSPATH</envar>.</para> <para>We now discuss some important methods defined in the relevant <trademark xlink:href="http://electronics.zibb.com/trademark/jdbc/29545026">JDBC</trademark> interfaces:</para> <glosslist> <glossentry> <glossterm><classname>java.sql.Connection</classname></glossterm> <glossdef> <itemizedlist> <listitem> <para><link xlink:href="http://docs.oracle.com/javase/7/docs/api/java/sql/Connection.html#createStatement()">createStatement()</link></para> </listitem> <listitem> <para><link xlink:href="http://docs.oracle.com/javase/7/docs/api/java/sql/Connection.html#setAutoCommit(boolean)">setAutoCommit()</link>, <link xlink:href="http://docs.oracle.com/javase/7/docs/api/java/sql/Connection.html#getAutoCommit()">getAutoCommit()</link></para> </listitem> <listitem> <para><link xlink:href="http://docs.oracle.com/javase/7/docs/api/java/sql/Connection.html#getWarnings()">getWarnings()</link></para> </listitem> <listitem> <para><link xlink:href="http://docs.oracle.com/javase/7/docs/api/java/sql/Connection.html#isClosed()">isClosed()</link>, <link xlink:href="http://docs.oracle.com/javase/7/docs/api/java/sql/Connection.html#isValid(int)">isValid(int timeout)</link></para> </listitem> <listitem> <para><link xlink:href="http://docs.oracle.com/javase/7/docs/api/java/sql/Connection.html#rollback()">rollback()</link>, <link xlink:href="http://docs.oracle.com/javase/7/docs/api/java/sql/Connection.html#commit()">commit()</link></para> </listitem> <listitem> <para><link xlink:href="http://docs.oracle.com/javase/7/docs/api/java/sql/Connection.html#close()">close()</link></para> </listitem> </itemizedlist> </glossdef> </glossentry> <glossentry> <glossterm><classname>java.sql.Statement</classname></glossterm> <glossdef> <itemizedlist> <listitem> <para><link 
xlink:href="http://docs.oracle.com/javase/7/docs/api/java/sql/Statement.html#executeUpdate(java.lang.String)">executeUpdate(String sql)</link></para> </listitem> <listitem> <para><link xlink:href="http://docs.oracle.com/javase/7/docs/api/java/sql/Statement.html#getConnection()">getConnection()</link></para> </listitem> <listitem> <para><link xlink:href="http://docs.oracle.com/javase/7/docs/api/java/sql/Statement.html#getResultSet()">getResultSet()</link></para> </listitem> <listitem> <para><link xlink:href="http://docs.oracle.com/javase/7/docs/api/java/sql/Statement.html#close()">close()</link> and <link xlink:href="http://docs.oracle.com/javase/7/docs/api/java/sql/Statement.html#isClosed()">isClosed()</link></para> </listitem> </itemizedlist> </glossdef> </glossentry> </glosslist> <qandaset role="exercise"> <title><trademark xlink:href="http://electronics.zibb.com/trademark/jdbc/29545026">JDBC</trademark> and transactions</title> <qandadiv> <qandaentry> <question> <para>How does the method <link xlink:href="http://docs.oracle.com/javase/7/docs/api/java/sql/Connection.html#setAutoCommit(boolean)">setAutoCommit()</link> relate to <link xlink:href="http://docs.oracle.com/javase/7/docs/api/java/sql/Connection.html#commit()">commit()</link> and <link xlink:href="http://docs.oracle.com/javase/7/docs/api/java/sql/Connection.html#rollback()">rollback()</link>?</para> </question> <answer> <para>A connection's default state is <code>autocommit == true</code>. 
This means that individual SQL statements are executed as separate transactions.</para> <para>If we want to group two or more statements into a transaction we have to:</para> <orderedlist> <listitem> <para>Call <code>connection.setAutoCommit(false)</code></para> </listitem> <listitem> <para>From now on subsequent SQL statements will implicitly become part of a transaction until one of the following three events occurs:</para> <orderedlist numeration="loweralpha"> <listitem> <para><code>connection.commit()</code></para> </listitem> <listitem> <para><code>connection.rollback()</code></para> </listitem> <listitem> <para>The transaction gets aborted by the database server. This may for example happen in case of a deadlock conflict with a second transaction.</para> </listitem> </orderedlist> <para>Note that the first two events are initiated by our client software. The third one is carried out by the database server.</para> </listitem> </orderedlist> </answer> </qandaentry> </qandadiv> </qandaset> <qandaset role="exercise"> <title>Closing <trademark xlink:href="http://electronics.zibb.com/trademark/jdbc/29545026">JDBC</trademark> connections</title> <qandadiv> <qandaentry> <question> <para>Why is it very important to call the <code>close()</code> method for <classname>java.sql.Connection</classname> and / or <classname>java.sql.Statement</classname> instances?</para> </question> <answer> <para>A <trademark xlink:href="http://electronics.zibb.com/trademark/jdbc/29545026">JDBC</trademark> connection ties up network resources (socket connections). These may be used up if e.g. 
new connections get established within a loop without being closed.</para> <para>The situation is comparable to memory leaks when using programming languages lacking a garbage collector.</para> </answer> </qandaentry> </qandadiv> </qandaset> <qandaset role="exercise"> <title>Aborted transactions</title> <qandadiv> <qandaentry> <question> <para>In an earlier exercise we mentioned the possibility of a transaction abort issued by the database server. Which responsibility arises for an application programmer? Hint: How may an implementation become aware of such a transaction abort?</para> </question> <answer> <para>If a database server aborts a transaction a <classname>java.sql.SQLException</classname> will be thrown. An application must be aware of this possibility and thus implement a sensible <code>catch(...)</code> clause accordingly.</para> </answer> </qandaentry> </qandadiv> </qandaset> <qandaset role="exercise"> <title>Interfaces and classes in <trademark xlink:href="http://electronics.zibb.com/trademark/jdbc/29545026">JDBC</trademark></title> <qandadiv> <qandaentry xml:id="exerciseJdbcWhyInterface"> <question> <para>The <trademark xlink:href="http://electronics.zibb.com/trademark/jdbc/29545026">JDBC</trademark> standard mostly defines interfaces such as <classname>java.sql.Connection</classname> and <classname>java.sql.Statement</classname>. Why are these not defined as classes? Moreover why is <classname>java.sql.DriverManager</classname> defined as a class rather than an interface?</para> <para>You may want to supply code examples to support your argument.</para> </question> <answer> <para><xref linkend="jdbcArchitecture"/> tells us about the vendor independent architecture of <trademark xlink:href="http://electronics.zibb.com/trademark/jdbc/29545026">JDBC</trademark>. 
Oracle for example may implement a class <code>com.oracle.jdbc.OracleConnection</code>:</para> <programlisting annotations="nojavadoc">package com.oracle.jdbc; import java.sql.Connection; import java.sql.Statement; import java.sql.SQLException; public class OracleConnection implements Connection { ... public Statement createStatement(int resultSetType, int resultSetConcurrency) throws SQLException { // Implementation omitted here due to // limited personal hacking capabilities ... } ... }</programlisting> <para>If a programmer only uses the <trademark xlink:href="http://electronics.zibb.com/trademark/jdbc/29545026">JDBC</trademark> interfaces rather than a vendor's classes it is much easier to make the resulting application work with different databases from other vendors. This way a company's implementation is not exposed to our own <link linkend="gloss_Java"><trademark>Java</trademark></link> code.</para> <para>Regarding the special role of <classname>java.sql.DriverManager</classname> we notice the need for a starting point: We have to create an initial instance of some class. In theory (<emphasis role="bold">BUT NOT IN PRACTICE!!!</emphasis>) the following (ugly code) might be possible:</para> <programlisting>package my.personal.application; import java.sql.Connection; import java.sql.Statement; import java.sql.SQLException; import com.oracle.jdbc.OracleConnection; /* hypothetical vendor class from above */ public class SomeClass { public void someMethod(){ Connection conn = <emphasis role="bold">new OracleConnection()</emphasis>; // bad idea! ... } ... 
}</programlisting> <para>The problem with this approach is the explicit constructor call: Whenever we want to use another database we have two possibilities:</para> <itemizedlist> <listitem> <para>Rewrite our code.</para> </listitem> <listitem> <para>Introduce some sort of switch statement to provide a fixed number of databases beforehand:</para> <programlisting>public void someMethod(final String vendor){ final Connection conn; switch(vendor) { case "ORACLE": conn = new OracleConnection(); break; case "DB2": conn = new Db2Connection(); break; default: conn = null; break; } ... }</programlisting> <para>Adding a new database still requires code rewriting.</para> </listitem> </itemizedlist> </answer> </qandaentry> </qandadiv> </qandaset> <qandaset role="exercise"> <title>Driver dispatch mechanism</title> <qandadiv> <qandaentry> <question> <para>In exercise <xref linkend="exerciseJdbcWhyInterface"/> we saw a hypothetical way to select a vendor's implementation class by using a switch clause. How is this <code>switch</code> clause's logic actually realized in a <trademark xlink:href="http://electronics.zibb.com/trademark/jdbc/29545026">JDBC</trademark> based application? (<quote>behind the scenes</quote>)</para> <para>Hint: Read the documentation of <classname>java.sql.DriverManager</classname>.</para> </question> <answer> <para>Prior to opening a <classname>java.sql.Connection</classname> a <trademark xlink:href="http://electronics.zibb.com/trademark/jdbc/29545026">JDBC</trademark> driver registers itself with the <classname>java.sql.DriverManager</classname> class. For this purpose the standard defines the static method <link xlink:href="http://docs.oracle.com/javase/7/docs/api/java/sql/DriverManager.html#registerDriver(java.sql.Driver)">registerDriver(Driver)</link>. 
On success the <classname>java.sql.DriverManager</classname> adds the driver to an internal registry, which we may picture as a dictionary:</para> <informaltable border="1"> <col width="20%"/> <col width="30%"/> <tr> <th>protocol</th> <th>driver instance</th> </tr> <tr> <td>jdbc:mysql</td> <td>mysqlDriver instance</td> </tr> <tr> <td>jdbc:oracle</td> <td>oracleDriver instance</td> </tr> <tr> <td>...</td> <td>...</td> </tr> </informaltable> <para>So whenever the method <link xlink:href="http://docs.oracle.com/javase/7/docs/api/java/sql/DriverManager.html#getConnection(java.lang.String,%20java.lang.String,%20java.lang.String)">getConnection()</link> is called the <classname>java.sql.DriverManager</classname> will scan the <trademark xlink:href="http://electronics.zibb.com/trademark/jdbc/29545026">JDBC</trademark> URL and isolate the protocol part. If we start with <code>jdbc:mysql://someserver.com:3306/someDatabase</code> this is just <code>jdbc:mysql</code>. The value is then looked up in the above table of registered drivers to choose an appropriate driver instance. If no matching driver is registered, <classname>java.sql.DriverManager</classname> throws the <quote>No suitable driver found</quote> <classname>java.sql.SQLException</classname> we encountered earlier. This way our hypothetical switch, including its default branch, is actually implemented.</para> </answer> </qandaentry> </qandadiv> </qandaset> </section> <section xml:id="propertiesFile"> <title>Connection properties</title> <para>So far our application depicted in <xref linkend="figJdbcSimpleWrite"/> suffers from both missing error handling and hard-coded parameters.</para> <para>Professional applications must be configurable. Changing the password currently requires source code modification and recompilation. 
<link linkend="gloss_Java"><trademark>Java</trademark></link> offers a standard procedure to externalize parameters like <varname>username</varname>, <varname>password</varname> and the <trademark xlink:href="http://electronics.zibb.com/trademark/jdbc/29545026">JDBC</trademark> connection URL appearing in <xref linkend="figJdbcSimpleWrite"/>: We may move these parameters to so-called properties files:</para> <figure xml:id="propertyExternalization"> <title>Externalize a single string <code>"User name"</code> to a separate file <filename>message.properties</filename>.</title> <mediaobject> <imageobject> <imagedata fileref="Ref/Fig/externalize.fig"/> </imageobject> </mediaobject> </figure> <para>The current figure shows the externalization of just a single property. The file <filename>message.properties</filename> contains key-value pairs. The key <code>PropHello.uname</code> maps to the value <code>User name</code>. Multiple strings may be externalized to the same properties file.</para> <para>Eclipse offers tool support for externalization. Simply choose Source → Externalize Strings from the context menu. This activates a wizard to define property keys, rename the generated helper class and finally create the actual <filename>message.properties</filename> file.</para> <qandaset role="exercise"> <title>Moving <trademark xlink:href="http://electronics.zibb.com/trademark/jdbc/29545026">JDBC</trademark> <abbrev xlink:href="http://www.ietf.org/rfc/rfc1738.txt">URL</abbrev> and credentials to a property file</title> <qandadiv> <qandaentry> <question> <para>Start executing the code given in <xref linkend="figJdbcSimpleWrite"/>. 
Then extend this example by externalizing all <trademark xlink:href="http://electronics.zibb.com/trademark/jdbc/29545026">JDBC</trademark> related connection parameters to a <filename>jdbc.properties</filename> file like:</para> <programlisting>SimpleInsert.jdbcUrl=jdbc:mysql://localhost:3306/hdm
SimpleInsert.password=XYZ
SimpleInsert.username=hdmuser</programlisting> <para>As stated earlier the Eclipse wizard assists you by generating both the properties file and a helper class reading that file at runtime.</para> </question> <answer> <para>The current exercise is mostly related to tooling. From our <link linkend="gloss_Java"><trademark>Java</trademark></link> code the context menu allows us to choose the desired wizard:</para> <informalfigure> <mediaobject> <imageobject> <imagedata fileref="Ref/Screen/externalize.screen.png"/> </imageobject> </mediaobject> </informalfigure> <para>We may now:</para> <itemizedlist> <listitem> <para>Select the strings to be externalized.</para> </listitem> <listitem> <para>Supply key names. In the subsequent screenshot this task has already been started by manually replacing the default <code>SimpleInsert.1</code> by <code>SimpleInsert.jdbc</code>.</para> </listitem> <listitem> <para>Redefine other parameters like prefix, properties file name etc. In the following screenshot only the first of three keys has been manually renamed to the sensible value <varname>SimpleInsert.jdbc</varname>.</para> </listitem> </itemizedlist> <informalfigure> <mediaobject> <imageobject> <imagedata fileref="Ref/Screen/externalize2.screen.png"/> </imageobject> </mediaobject> </informalfigure> <para>The wizard also generates a class <classname>sda.jdbc.intro.v1.DbProps</classname> to actually access our properties:</para> <programlisting language="java">package sda.jdbc.intro.v1; ... 
public class DbProps { private static final String BUNDLE_NAME = "sda.jdbc.intro.v1.database"; private static final ResourceBundle RESOURCE_BUNDLE = ResourceBundle .getBundle(BUNDLE_NAME); private DbProps() { } public static String getString(String key) { try { return RESOURCE_BUNDLE.getString(key); } catch (MissingResourceException e) { return '!' + key + '!'; } } }</programlisting> <para>Our <trademark xlink:href="http://electronics.zibb.com/trademark/jdbc/29545026">JDBC</trademark> related code now contains three references to external properties:</para> <programlisting language="java">package sda.jdbc.intro.v1; ... public class SimpleInsert { public static void main(String[] args) throws SQLException { // Step 1: Open a connection to the database server final Connection conn = DriverManager.getConnection ( <emphasis role="bold">DbProps.getString("PersistenceHandler.jdbcUrl"), </emphasis> <emphasis role="bold">DbProps.getString("PersistenceHandler.username")</emphasis>, <emphasis role="bold">DbProps.getString("PersistenceHandler.password")</emphasis>); // Step 2: Create a Statement instance final Statement stmt = conn.createStatement(); // Step 3: Execute the desired INSERT final int updateCount = stmt.executeUpdate( "INSERT INTO Person VALUES('Jim', 'jim@foo.org')"); // Step 4: Give feedback to the enduser System.out.println("Successfully inserted " + updateCount + " dataset(s)"); } }</programlisting> <para>The key prefix <code>PersistenceHandler</code> (rather than <code>SimpleInsert</code>) refers to the class <classname>sda.jdbc.intro.v1.PersistenceHandler</classname>, which is the subject of a later exercise.</para> </answer> </qandaentry> </qandadiv> </qandaset> </section> <section xml:id="sectSimpleInsertGui"> <title>A first GUI sketch</title> <para>So far all data records transferred to the database server are still hard-coded in our application. In practice a user wants to enter data of persons to be submitted to the database.</para> <para>We now guide you through developing a first version of a simple GUI for this task. 
A more <link linkend="figureDataInsert2">elaborate version</link> will be presented in a follow-up exercise. The screenshot illustrates the intended application behaviour:</para> <figure xml:id="simpleInsertGui"> <title>A simple GUI to insert data into a database server.</title> <mediaobject> <imageobject> <imagedata fileref="Ref/Screen/simpleInsertGui.screen.png"/> </imageobject> <caption> <para>After clicking <quote>Insert</quote> a message is presented to the user. This message may as well indicate a failure.</para> </caption> </mediaobject> </figure> <para>Implementing Swing GUI applications requires knowledge as taught in e.g. <link xlink:href="http://www.hdm-stuttgart.de/studenten/stundenplan/vorlesungsverzeichnis/vorlesung_detail?vorlid=5212221">113300 Entwicklung von Web-Anwendungen</link>. If you do not (yet) feel comfortable writing <productname xlink:href="http://docs.oracle.com/javase/tutorial/uiswing/index.html">Swing</productname> applications you may want to read <uri xlink:href="http://www.javamex.com/tutorials/swing">http://www.javamex.com/tutorials/swing</uri> and <emphasis role="bold">really</emphasis> understand the examples presented therein.</para> <qandaset role="exercise"> <title>GUI for inserting Person data to a database server</title> <qandadiv> <qandaentry> <question> <para>Write a GUI application as outlined in <xref linkend="simpleInsertGui"/>. You may proceed as follows:</para> <orderedlist> <listitem> <para>Write a dummy GUI without any database functionality. Only present the two labels and input fields and the Insert button.</para> </listitem> <listitem> <para>Add a <classname>java.awt.event.ActionListener</classname> which generates a SQL INSERT statement when clicking the Insert button. Return this string to the user as shown in the message window of <xref linkend="simpleInsertGui"/>.</para> <para>At this point you still do not need a database connection. 
The message shown to the user is just a fake, so the GUI <emphasis role="bold">appears</emphasis> to be working.</para> </listitem> <listitem> <para>Establish a <classname>java.sql.Connection</classname> and create a <classname>java.sql.Statement</classname> instance when launching your application. Use the latter in your <classname>java.awt.event.ActionListener</classname> to actually insert datasets into your database.</para> </listitem> </orderedlist> </question> <answer> <para>The complete implementation resides in <classname>sda.jdbc.intro.v01.InsertPerson</classname>:</para> <programlisting language="java">package sda.jdbc.intro.v01; import ... public class InsertPerson extends JFrame { ... public InsertPerson () throws SQLException{ super ("Add a person's data"); setDefaultCloseOperation(JFrame.EXIT_ON_CLOSE); final JPanel databaseFieldPanel = new JPanel(); databaseFieldPanel.setLayout(new GridLayout(0,2)); add(databaseFieldPanel, BorderLayout.CENTER); databaseFieldPanel.add(new JLabel("Name:")); final JTextField nameField = new JTextField(15); databaseFieldPanel.add(nameField); databaseFieldPanel.add(new JLabel("E-mail:")); final JTextField emailField = new JTextField(15); databaseFieldPanel.add(emailField); final JButton insertButton = new JButton("Insert"); add(insertButton, BorderLayout.SOUTH); final Connection conn = DriverManager.getConnection( "jdbc:mysql://localhost:3306/hdm", "hdmuser", "XYZ"); final Statement stmt = conn.createStatement(); insertButton.addActionListener(new ActionListener() { // Linking the GUI to the database server. We assume an open // connection and a correctly initialized Statement instance @Override public void actionPerformed(ActionEvent event) { final String sql = "INSERT INTO Person VALUES('" + nameField.getText()+ "', '" + emailField.getText() + "')"; // We have to catch this Exception because an ActionListener's signature // prohibits the existence of a "throws" clause. 
try { final int updateCount = stmt.executeUpdate(sql); JOptionPane.showMessageDialog(null, "Successfully executed \n'" + sql + "'\nand inserted " + updateCount + " dataset"); } catch (SQLException e) { e.printStackTrace(); } } }); pack(); } }</programlisting> </answer> </qandaentry> </qandadiv> </qandaset> </section> <section xml:id="jdbcExceptions"> <title>Handling possible exceptions</title> <para>Our current code lacks any kind of error handling: Exceptions will not be caught at all and invariably lead to program termination. This is of course inadequate regarding professional software. In case of problems we have to:</para> <itemizedlist> <listitem> <para>Gracefully recover or shut down our application. We may for example show a pop up window <quote>Terminating due to an internal error</quote>.</para> </listitem> <listitem> <para>Enable the customer to supply the development team with helpful information. The user may for example be asked to submit a log file in case of errors.</para> </listitem> </itemizedlist> <para>In addition the solution <classname>sda.jdbc.intro.v01.InsertPerson</classname> contains an ugly mix of GUI components and database related code. We take a first step to decouple these two distinct concerns:</para> <qandaset role="exercise" xml:id="exercicseGuiStateful"> <title>Handling the database layer</title> <qandadiv> <qandaentry> <question> <para>Implement a class <code>PersistenceHandler</code> to be later used as a component of our next step GUI application prototype. This class should have the following methods:</para> <programlisting language="java">... /** * Handle database communication. There are two * distinct internal states <q>disconnected</q> and <q>connected</q>, see * {@link #isConnected()}. These two states may be toggled by invoking * {@link #connect()} and {@link #disconnect()} respectively. 
* * The following snippet illustrates the intended usage: * <pre> public static void main(String[] args) { final PersistenceHandler ph = new PersistenceHandler(); if (ph.connect()) { if (!ph.add("Jim", "jim@foo.com")) { System.err.println("Insert Error:" + ph.getErrorMessage()); } } else { System.err.println("Connect error:" + ph.getErrorMessage()); } }</pre> * * @author goik */ public class PersistenceHandler { ... /** * New instances are in <q>disconnected</q> state. See {@link #isConnected()} */ public PersistenceHandler() {/* only present here to supply Javadoc comment */} /** * Inserting a (name, email) record into the database server. In case of * errors corresponding messages may subsequently be retrieved by calling * {@link #getErrorMessage()}. * * <dt><b>Precondition:</b></dt> <dd>must be in * <q>connected</q> state, see {@link #isConnected()}</dd> * * @param name * A person's name * @param email * A person's email address * * @return true if the current data record has been successfully inserted * into the database server. false in case of error(s). */ public boolean add(final String name, final String email){ ... } /** * Retrieving error messages in case a call to {@link #add(String, String)}, * {@link #connect()}, or {@link #disconnect()} yields an error. * * @return the error explanation corresponding to the latest failed * operation, null if no error yet occurred. */ public String getErrorMessage() { return ...; } /** * Open a connection to a database server. * * <dt><b>Precondition:</b></dt> * <dd>must be in <q>disconnected</q> state, see {@link #isConnected()}</dd> * * <dt><b>Precondition:</b></dt> * <dd>The following properties must be set: * <pre>PersistenceHandler.jdbcUrl=jdbc:mysql://localhost:3306/hdm PersistenceHandler.password=XYZ PersistenceHandler.username=foo</pre> * </dd> * * @return true if connecting was successful */ public boolean connect () { ... 
} /** * Close a connection to a database server and clean up JDBC related resources * * Error messages in case of failure may subsequently be retrieved by * calling {@link #getErrorMessage()}. * * <dt><b>Precondition:</b></dt> * <dd>must be in <q>connected</q> state, see {@link #isConnected()}</dd> * * @return true if disconnecting was successful, false in case error(s) occur. */ public boolean disconnect() { ... } /** * An instance can either be in <q>connected</q> or <q>disconnected</q> state. The * state can be toggled by invoking {@link #connect()} or * {@link #disconnect()} respectively. * * @return true if connected, false otherwise */ public boolean isConnected() { return ...; } }</programlisting> <para>Notice the two internal states <quote>disconnected</quote> and <quote>connected</quote>:</para> <figure xml:id="figPersistenceHandlerStates"> <title>Possible states and transitions for instances of <code>PersistenceHandler</code>.</title> <mediaobject> <imageobject> <imagedata fileref="Ref/Fig/persistHandlerStates.fig"/> </imageobject> </mediaobject> </figure> <para>According to the above documentation a newly created <code>PersistenceHandler</code> instance should be in disconnected state. As being shown in the <link linkend="gloss_Java"><trademark>Java</trademark></link> class description you may test your implementation without any GUI code. If you are already familiar with unit testing this might be a good start as well.</para> </question> <answer> <para>We show a possible implementation of <classname>sda.jdbc.intro.v1.PersistenceHandler</classname>:</para> <programlisting language="java">package sda.jdbc.intro.v1; ... public class PersistenceHandler { Connection conn = null; Statement stmt = null; String errorMessage = null; /** * New instances are in <q>disconnected</q> state. See {@link #isConnected()} */ public PersistenceHandler() {/* only present here to supply Javadoc comment */} /** * Inserting a (name, email) record into the database server. 
In case of * errors corresponding messages may subsequently be retrieved by calling * {@link #getErrorMessage()}. * * <dt><b>Precondition:</b></dt> <dd>must be in * <q>connected</q> state, see {@link #isConnected()}</dd> * * @param name * A person's name * @param email * A person's email address * * @return true if the current data record has been successfully inserted * into the database server. false in case of error(s). */ public boolean add(final String name, final String email){ final String sql = "INSERT INTO Person VALUES('" + name + "', '" + email + "')"; try { stmt.executeUpdate(sql); return true; } catch (SQLException e) { errorMessage = "Unable to execute '" + sql + "': '" + e.getMessage() + "'"; return false; } } /** * Retrieving error messages in case a call to {@link #add(String, String)}, * {@link #connect()}, or {@link #disconnect()} yields an error. * * @return the error explanation corresponding to the latest failed * operation, null if no error yet occurred. */ public String getErrorMessage() { return errorMessage; } /** * Open a connection to a database server. 
* * <dt><b>Precondition:</b></dt> * <dd>must be in <q>disconnected</q> state, see {@link #isConnected()}</dd> * * <dt><b>Precondition:</b></dt> * <dd>The following properties must be set: * <pre>PersistenceHandler.jdbcUrl=jdbc:mysql://localhost:3306/hdm PersistenceHandler.password=XYZ PersistenceHandler.username=foo</pre> * </dd> * * @return true if connecting was successful */ public boolean connect () { try { conn = DriverManager.getConnection( DbProps.getString("PersistenceHandler.jdbcUrl"), DbProps.getString("PersistenceHandler.username"), DbProps.getString("PersistenceHandler.password")); try { stmt = conn.createStatement(); return true; } catch (SQLException e) { errorMessage = "Connection opened but Statement creation failed:\"" + e.getMessage() + "\"."; try { conn.close(); } catch (SQLException ee) { /* report the failure of close() itself, not the outer exception */ errorMessage += " Closing connection failed:\"" + ee.getMessage() + "\"."; } conn = null; } } catch (SQLException e) { errorMessage = "Unable to open connection:\"" + e.getMessage() + "\"."; } return false; } /** * Close a connection to a database server and clean up JDBC related resources * * Error messages in case of failure may subsequently be retrieved by * calling {@link #getErrorMessage()}. * * <dt><b>Precondition:</b></dt> * <dd>must be in <q>connected</q> state, see {@link #isConnected()}</dd> * * @return true if disconnecting was successful, false in case error(s) occur. 
*/ public boolean disconnect() { boolean resultStatus = true; final StringBuffer messageCollector = new StringBuffer(); try { stmt.close(); } catch (SQLException e) { resultStatus = false; messageCollector.append("Unable to close Statement:\"" + e.getMessage() + "\"."); } stmt = null; try { conn.close(); } catch (SQLException e) { resultStatus = false; messageCollector.append("Unable to close connection:\"" + e.getMessage() + "\"."); } conn = null; if (!resultStatus) { errorMessage = messageCollector.toString(); } return resultStatus; } /** * An instance can either be in <q>connected</q> or <q>disconnected</q> state. The * state can be toggled by invoking {@link #connect()} or * {@link #disconnect()} respectively. * * @return true if connected, false otherwise */ public boolean isConnected() { return null != conn; } }</programlisting> </answer> </qandaentry> </qandadiv> </qandaset> <para>We may now complete the next enhancement step of our GUI database client.</para> <qandaset role="exercise"> <title>Connection on user action</title> <qandadiv> <qandaentry xml:id="exerciseGuiWriteTakeTwo"> <question> <label>An application writing records to a database server</label> <para>Our aim is to enhance the first GUI prototype described in <xref linkend="simpleInsertGui"/>. The application shall start in disconnected state. Prior to entering data the user shall be guided to open a connection. The following video illustrates the desired user interface:</para> <figure xml:id="figureDataInsert2"> <title>A GUI frontend for adding personal data to a server.</title> <mediaobject> <videoobject> <videodata fileref="Ref/Video/dataInsert.mp4"/> </videoobject> </mediaobject> </figure> <para>If a user closes the main window while still connected, a disconnect from the database server shall be enforced. For this purpose we must handle the event occurring when the user clicks on the closing button within the window decoration. 
An exit handler method is required to terminate a potentially open database connection.</para> </question> <answer> <para>Our implementation uses the class <classname>sda.jdbc.intro.v1.PersistenceHandler</classname> for handling all database communication. The GUI needs to visualize the two different states <quote>disconnected</quote> and <quote>connected</quote>. In <quote>disconnected</quote> state the whole input pane for entering datasets and clicking the <quote>Insert</quote> button is locked, so the user is forced to actively open a database connection.</para> <para>Notice also the <classname>java.awt.event.WindowAdapter</classname> implementation executed when closing the application's main window. The <methodname>java.awt.event.WindowAdapter.windowClosing(java.awt.event.WindowEvent)</methodname> method disconnects any existing database connection thus freeing resources.</para> <programlisting language="java">package sda.jdbc.intro.v1; import ... public class InsertPerson extends JFrame { private static final long serialVersionUID = 6815975741605247675L; final PersistenceHandler persistenceHandler = new PersistenceHandler(); final JTextField nameField = new JTextField(15), emailField = new JTextField(20); final JButton toggleConnectButton = new JButton(), insertButton = new JButton("Insert"); final JPanel databaseFieldPanel = new JPanel(); private void setGuiConnectionState(final boolean state) { if (state) { toggleConnectButton.setText("Disconnect"); } else { toggleConnectButton.setText("Connect"); } for (final Component c: databaseFieldPanel.getComponents()){ c.setEnabled(state); } } public static void main(String[] args) throws SQLException { InsertPerson app = new InsertPerson(); app.setVisible(true); } public InsertPerson (){ super ("Add a person's data"); setSize(500, 500); addWindowListener(new WindowAdapter() { // In case a user closes our application window while still being connected // we have to close the database connection. 
@Override public void windowClosing(WindowEvent e) { super.windowClosing(e); if (persistenceHandler.isConnected() && !persistenceHandler.disconnect()) { System.exit(1); } else { System.exit(0); } } }); Box top = Box.createHorizontalBox(); add(top, BorderLayout.NORTH); top.add(toggleConnectButton); toggleConnectButton.addActionListener(new ActionListener() { @Override public void actionPerformed(ActionEvent e) { if (persistenceHandler.isConnected()) { if (persistenceHandler.disconnect()){ setGuiConnectionState(false); } else { JOptionPane.showMessageDialog(null, persistenceHandler.getErrorMessage()); } } else { if (persistenceHandler.connect()){ setGuiConnectionState(true); } else { JOptionPane.showMessageDialog(null, persistenceHandler.getErrorMessage()); } } } }); databaseFieldPanel.setLayout(new GridLayout(0,2)); add(databaseFieldPanel); databaseFieldPanel.add(new JLabel("Name:")); databaseFieldPanel.add(nameField); databaseFieldPanel.add(new JLabel("E-mail:")); databaseFieldPanel.add(emailField); insertButton.addActionListener(new ActionListener() { @Override public void actionPerformed(ActionEvent e) { if (persistenceHandler.add(nameField.getText(), emailField.getText())) { nameField.setText(""); emailField.setText(""); JOptionPane.showMessageDialog(null, "Successfully inserted dataset"); } else { JOptionPane.showMessageDialog(null, persistenceHandler.getErrorMessage()); } } }); databaseFieldPanel.add(Box.createGlue()); databaseFieldPanel.add(insertButton); setGuiConnectionState(false); pack(); } }</programlisting> </answer> </qandaentry> </qandadiv> </qandaset> </section> <section xml:id="jdbcSecurity"> <title><trademark xlink:href="http://electronics.zibb.com/trademark/jdbc/29545026">JDBC</trademark> and security</title> <section xml:id="jdbcSecurityNetwork"> <title>Network sniffing</title> <para>Sniffing <trademark xlink:href="http://electronics.zibb.com/trademark/jdbc/29545026">JDBC</trademark> network traffic is one possibility for intruders to compromise 
database applications. This requires access to at least one of the following:</para> <itemizedlist> <listitem> <para>Server host</para> </listitem> <listitem> <para>Client host</para> </listitem> <listitem> <para>Intermediate hub, switch or router</para> </listitem> </itemizedlist> <figure xml:id="figJdbcSniffing"> <title>Sniffing a <trademark xlink:href="http://electronics.zibb.com/trademark/jdbc/29545026">JDBC</trademark> connection by an intruder.</title> <mediaobject> <imageobject> <imagedata fileref="Ref/Fig/jdbcSniffing.fig"/> </imageobject> </mediaobject> </figure> <para>We demonstrate a possible attack by analyzing the network traffic between our application shown in <xref linkend="figJdbcSimpleWrite"/> and the <productname xlink:href="http://www.mysql.com">Mysql</productname> database server. Prior to starting the application we set up <productname xlink:href="http://www.wireshark.org">Wireshark</productname> for filtered capturing:</para> <itemizedlist> <listitem> <para>Connecting to the <varname>loopback</varname> (lo) interface only. This is sufficient since our client connects to <varname>localhost</varname>.</para> </listitem> <listitem> <para>Capturing only <acronym xlink:href="http://en.wikipedia.org/wiki/Transmission_Control_Protocol">TCP</acronym> packets having port number 3306 and discarding all others.</para> </listitem> </itemizedlist> <para>This yields the following capture, shortened for the sake of brevity:</para> <programlisting>[... 5.5.24-0ubuntu0.12.04.1.%...X*e?I1ZQ...................e,F[yoA5$T[N.mysql_native_password. A...........!.......................hdmuser <co xml:id="tcpCaptureUsername"/>......U.>S.%..~h...!.xhdm............j..../* ... INSERT INTO Person VALUES('Jim', 'jim@foo.org') <co xml:id="tcpCaptureSqlInsert"/>6... 
.&.#23000Duplicate entry 'jim@foo.org' for key 'email' <co xml:id="tcpCaptureErrmsg"/></programlisting> <calloutlist> <callout arearefs="tcpCaptureUsername"> <para>The <varname>username</varname> initiating the connection to the database server.</para> </callout> <callout arearefs="tcpCaptureSqlInsert"> <para>The <code>INSERT ...</code> statement.</para> </callout> <callout arearefs="tcpCaptureErrmsg"> <para>The resulting error message being sent back to the client.</para> </callout> </calloutlist> <para>Something seems to be missing here: The user's password. Our code in <xref linkend="figJdbcSimpleWrite"/> contains the password <quote><varname>XYZ</varname></quote> in clear text. But even using the search function of <productname xlink:href="http://www.wireshark.org">Wireshark</productname> does not show any such string within the above capture. The <productname xlink:href="http://www.mysql.com">Mysql</productname> documentation however <link xlink:href="http://dev.mysql.com/doc/refman/5.0/en/security-against-attack.html">reveals</link> that everything but the password is transmitted in clear text. So all we might identify is a hash of <code>XYZ</code>.</para> <para>So regarding our (current) <productname xlink:href="http://www.mysql.com">Mysql</productname> implementation the impact of this attack type is somewhat limited but still severe: All data transmitted between client and server may be disclosed. This typically includes sensitive data as well. Possible solutions:</para> <itemizedlist> <listitem> <para>Create an encrypted tunnel between client and server, such as 
<link xlink:href="http://www.debianadmin.com/howto-use-ssh-local-and-remote-port-forwarding.html">ssh port forwarding</link> or <link xlink:href="http://de.wikipedia.org/wiki/Virtual_Private_Network">VPN</link>.</para> </listitem> <listitem> <para>Many database vendors <link xlink:href="http://dev.mysql.com/doc/refman/5.1/de/connector-j-reference-using-ssl.html">supply SSL</link> or similar <trademark xlink:href="http://electronics.zibb.com/trademark/jdbc/29545026">JDBC</trademark> protocol encryption extensions. This requires additional configuration procedures like setting up server side certificates. Moreover, similar to the http/https protocols, encryption generally slows down data traffic.</para> </listitem> </itemizedlist> <para>Of course this is only relevant if the transport layer is considered to be insecure. If both server and client reside within the same trusted infrastructure no action has to be taken. We also note that this kind of problem is not limited to <trademark xlink:href="http://electronics.zibb.com/trademark/jdbc/29545026">JDBC</trademark>. In fact all protocols lacking encryption are subject to this type of attack.</para> </section> <section xml:id="sqlInjection"> <title>SQL injection</title> <para>Before diving into technical details we shed some light on the possible impact of the common attack type described in this section. Our example is the well known Heartland Payment Systems data breach:</para> <figure xml:id="figHeartlandSecurityBreach"> <title>Summary about possible SQL injection impact based on the Heartland security breach</title> <mediaobject> <imageobject> <imagedata fileref="Ref/Fig/heartland.fig"/> </imageobject> </mediaobject> </figure> <para>Why should we be concerned with SQL injection? The introduction of <xref linkend="bibClarke09"/> gives a compelling argument:</para> <blockquote> <para>Many people say they know what SQL injection is, but all they have heard about or experienced are trivial examples. 
SQL injection is one of the most devastating vulnerabilities to impact a business, as it can lead to exposure of all of the sensitive information stored in an application's database, including handy information such as usernames, passwords, names, addresses, phone numbers, and credit card details.</para> </blockquote> <para>Due to limited resources this lecture only deals with trivial examples of the kind mentioned above. One possible way SQL injection attacks work is by inserting SQL code into fields designed for end user input:</para> <figure xml:id="figSqlInject"> <title>SQL injection triggered by ordinary user input.</title> <mediaobject> <imageobject> <imagedata fileref="Ref/Fig/sqlinject.fig"/> </imageobject> </mediaobject> </figure> <qandaset role="exercise"> <title>Attack from the dark side</title> <qandadiv> <qandaentry xml:id="sqlInjectDropTable"> <question> <para>Use the application from <xref linkend="exerciseGuiWriteTakeTwo"/> and <xref linkend="figSqlInject"/> to launch a SQL injection attack. We provide some hints:</para> <orderedlist> <listitem> <para>The <productname xlink:href="http://www.mysql.com">Mysql</productname> <trademark xlink:href="http://electronics.zibb.com/trademark/jdbc/29545026">JDBC</trademark> driver implementation already provides precautions to hamper SQL injection attacks. In its default configuration a sequence of SQL commands separated by semicolons (<quote>;</quote>) will not be executed but flagged as a SQL syntax error. We take an example:</para> <programlisting>INSERT INTO Person VALUES (...);DROP TABLE Person</programlisting> <para>In order to execute these so-called multi-statement queries we explicitly have to enable a <productname xlink:href="http://www.mysql.com">Mysql</productname> property. 
This may be achieved by extending our <trademark xlink:href="http://electronics.zibb.com/trademark/jdbc/29545026">JDBC</trademark> URL:</para> <programlisting>jdbc:mysql://localhost:3306/hdm?<emphasis role="bold">allowMultiQueries=true</emphasis></programlisting> <para>The <productname xlink:href="http://www.mysql.com">Mysql</productname> manual <link xlink:href="http://dev.mysql.com/doc/refman/5.1/en/connector-j-reference-configuration-properties.html">contains</link> a remark regarding this parameter:</para> <remark>Notice that this has the potential for SQL injection if using plain java.sql.Statements and your code doesn't sanitize input correctly.</remark> <para>In other words: You have been warned!</para> </listitem> <listitem> <para>You may now use either of the two input fields <quote>name</quote> or <quote>email</quote> to inject arbitrary SQL code.</para> </listitem> </orderedlist> </question> <answer> <para>We construct a suitable string to be injected in order to drop our <code>Person</code> table:</para> <programlisting>Jim', 'jim@c.com');DROP TABLE Person;INSERT INTO Person VALUES('Joe</programlisting> <para>Entering this string into the name field effectively kills our <code>Person</code> relation. As the error message shows, two INSERT statements are separated by a DROP TABLE statement. So after executing the first INSERT our database server drops the whole table. Finally the second INSERT statement fails, giving rise to an error message no end user will ever understand:</para> <figure xml:id="figSqlInjectDropPerson"> <title>Dropping the <code>Person</code> table by SQL injection</title> <mediaobject> <imageobject> <imagedata fileref="Ref/Screen/sqlInject.screen.png"/> </imageobject> </mediaobject> </figure> <para>According to the message text the table <code>Person</code> gets dropped as expected. Thus the subsequent (second) <code>INSERT</code> action is bound to fail.</para> <para>In practice this result may be avoided. The database user will (hopefully!) 
not have sufficient permissions to drop the whole table. Malicious modifications by INSERT, UPDATE or DELETE statements are still possible.</para> </answer> </qandaentry> </qandadiv> </qandaset> </section> <section xml:id="sanitizeUserInput"> <title>Sanitizing user input</title> <para>There are at least two general ways to deal with the disastrous result of <xref linkend="sqlInjectDropTable"/>:</para> <itemizedlist> <listitem> <para>Completely keep the database server from interpreting user input. This is probably the best way and will be discussed in <xref linkend="sectPreparedStatements"/>.</para> </listitem> <listitem> <para>Let the application check and process user input. Dangerous user input may either be modified prior to being embedded in SQL statements or be rejected completely.</para> </listitem> </itemizedlist> <para>The first method is definitely superior in most cases. There are however cases where the restrictions it implies are too severe. We may for example choose dynamically which tables shall be accessed. In this case an SQL statement's structure rather than just its predicates is affected by user input. There are at least two standard procedures dealing with this problem:</para> <glosslist> <glossentry> <glossterm>Input Filtering</glossterm> <glossdef> <para>In the simplest case we check a user's input by regular expressions. An example is an input field in a login window representing a system user name. Legal input may allow letters and digits only. Special characters, whitespace etc. are typically prohibited. The input must have a minimum length of one character. A maximum length may be imposed as well. So we may choose the regular expression <code>[A-Za-z0-9]+</code> to check valid user names.</para> </glossdef> </glossentry> <glossentry> <glossterm><foreignphrase>Whitelisting</foreignphrase></glossterm> <glossdef> <para>In many cases input fields only allow a restricted set of values. Consider an input field for names of planets. 
An application may keep a dictionary table to validate user input:</para> <informaltable border="1"> <col width="10%"/> <col width="5%"/> <tr> <td>Mercury</td> <td>1</td> </tr> <tr> <td>Venus</td> <td>2</td> </tr> <tr> <td>Earth</td> <td>3</td> </tr> <tr> <td>...</td> <td>...</td> </tr> <tr> <td>Neptune</td> <td>8</td> </tr> <tr> <td><emphasis role="bold">Default:</emphasis></td> <td><emphasis role="bold">0</emphasis></td> </tr> </informaltable> <para>So if a user enters a valid planet name a corresponding number representing this particular planet will be sent to the database. If the user enters an invalid string an error message may be raised.</para> <para>In a GUI this is often better accomplished by presenting the list of planets to choose from. In this case a user has no chance to enter invalid or even malicious code.</para> </glossdef> </glossentry> </glosslist> <para>So we have an <quote>interceptor</quote> sitting between user input fields and SQL generating code:</para> <figure xml:id="figInputFiltering"> <title>Validating user input prior to dynamically composing SQL statements.</title> <mediaobject> <imageobject> <imagedata fileref="Ref/Fig/filtering.fig"/> </imageobject> </mediaobject> </figure> <qandaset role="exercise"> <title>Using regular expressions in <link linkend="gloss_Java"><trademark>Java</trademark></link></title> <qandadiv> <qandaentry> <question> <para>This exercise is a preparation for <xref linkend="exercisefilterUserInput"/>. The aim is to become familiar with regular expressions and to use them in <link linkend="gloss_Java"><trademark>Java</trademark></link>. 
If you are not yet familiar with regular expressions / pattern matching you may want to read one of:</para> <itemizedlist> <listitem> <para><link xlink:href="http://www.aivosto.com/vbtips/regex.html">Regular expressions - An introduction</link></para> </listitem> <listitem> <para><link xlink:href="http://www.codeproject.com/Articles/939/An-Introduction-to-Regular-Expressions">An Introduction to Regular Expressions</link></para> </listitem> <listitem> <para><link xlink:href="http://www.regular-expressions.info/tutorial.html">Regular Expression Tutorial</link></para> </listitem> </itemizedlist> <para>Complete the implementation of the following skeleton:</para> <programlisting language="java">... import java.util.regex.Matcher; import java.util.regex.Pattern; public static void main(String[] args) { final String [] wordList = new String [] {"Eric", "126653BBb", "_login","some text"}; final String [] regexpList = new String[] {"[A-K].*", "[^0-9]+.*", "_[a-z]+", ""}; for (final String word: wordList) { for (final String regexp: regexpList) { testMatch(word, regexp); } } } /** * Matching a given word by a regular expression. A log message is * written to stdout. * * Hint: The implementation is based on the explanation given in the * introduction to {@link Pattern} * * @param word This string will be matched by the subsequent argument. * @param regexp The regular expression tested to match the previous argument. * @return true if regexp matches word, false otherwise. */ public static boolean testMatch(final String word, final String regexp) { .../* to be implemented by <emphasis role="bold">**YOU**</emphasis> */ }</programlisting> <para>As noted in the <link linkend="gloss_Javadoc"><trademark>Javadoc</trademark></link> comment above you may want to read the documentation of class <classname>java.util.regex.Pattern</classname>. The intended output of the above application is:</para> <programlisting>The expression '[A-K].*' matches 'Eric' The expression '[^0-9]+.*' ... 
...</programlisting> </question> <answer> <para>A possible implementation is given by <classname>sda.regexp.RegexpPrimer</classname>.</para> </answer> </qandaentry> </qandadiv> </qandaset> <qandaset role="exercise"> <title>Input validation by regular expressions</title> <qandadiv> <qandaentry xml:id="exercisefilterUserInput"> <question> <para>The application of <xref linkend="sqlInjectDropTable"/> proved to be vulnerable to SQL injection. Sanitize the values of both user input fields to prevent such behaviour.</para> <itemizedlist> <listitem> <para>Find appropriate regular expressions to check both username and email. Some hints:</para> <glosslist> <glossentry> <glossterm>username</glossterm> <glossdef> <para>Regarding SQL injection the <quote>;</quote> character is among the most critical. You may want to exclude certain special characters. This does no harm since their presence in a user's name is more likely to be a typo than meaningful input.</para> </glossdef> </glossentry> <glossentry> <glossterm>email</glossterm> <glossdef> <para>There are tons of <quote>ultimate</quote> regular expressions available to check email addresses. Remember that the present task is to avoid SQL injection rather than to reject <quote>wrong</quote> email addresses. So find a reasonable one which may be too permissive regarding RFC email syntax rules but sufficient to secure your application.</para> <para>A concise definition of an email's syntax is given in <link xlink:href="http://tools.ietf.org/html/rfc5322#section-3.4.1">RFC5322</link>. Its implementation is beyond the scope of the current lecture. Moreover it is questionable whether E-mail clients and mail transfer agents implement strict RFC compliance.</para> </glossdef> </glossentry> </glosslist> <para>Both regular expressions must cover the whole user input from the beginning to the end. This can be achieved by using <code>^ ... 
$</code>.</para> </listitem> <listitem> <para>The <link linkend="gloss_Java"><trademark>Java</trademark></link> standard class <classname>javax.swing.InputVerifier</classname> may help you validate user input.</para> </listitem> <listitem> <para>The following screenshot may provide an idea for GUI realization and user interaction in case of errors. Of course the submit button's action should be disabled in case of erroneous input. The user should receive a helpful error message instead.</para> <figure xml:id="figInsertValidate"> <title>Error message presented to the user.</title> <mediaobject> <imageobject> <imagedata fileref="Ref/Screen/insertValidate.screen.png"/> </imageobject> <caption> <para>In the current example the trailing <quote>;</quote> within the E-Mail field is invalid.</para> </caption> </mediaobject> </figure> </listitem> </itemizedlist> </question> <answer> <para>Extending <classname>javax.swing.InputVerifier</classname> allows us to build a generic class to filter user text input by arbitrary regular expressions:</para> <programlisting language="java">package sda.jdbc.intro.v1.sanitize; ... public class RegexpVerifier extends InputVerifier { final Pattern syntaxPattern; final JLabel validationLabel; private boolean inputValid = false; private final String errMsg; ... 
public RegexpVerifier (final String regex, final JLabel validationLabel, final String errMsg) { this.validationLabel = validationLabel; this.errMsg = errMsg; syntaxPattern = Pattern.compile(regex); } @Override public boolean verify(JComponent input) { if (input instanceof JTextField) { final String userInput = ((JTextField) input).getText(); if (syntaxPattern.matcher(userInput).find()) { validationLabel.setText(""); inputValid = true; } else { validationLabel.setText(errMsg); inputValid = false; } } return inputValid; } public boolean inputIsValid () { return inputValid; } }</programlisting> <para>Instances of <classname>sda.jdbc.intro.v1.sanitize.RegexpVerifier</classname> <coref linkend="emailVerifier"/> <coref linkend="nameVerifier"/> may now be used to validate our two input data fields <coref linkend="setNameValidation"/> <coref linkend="setEmailValidation"/>. We put emphasis on the changes with respect to <classname>sda.jdbc.intro.v1.InsertPerson</classname>:</para> <programlisting language="java">package sda.jdbc.intro.v1.sanitize; ... public class InsertPerson extends JFrame { final JTextField nameField = new JTextField(15); final JLabel nameFieldValidationLabel <co xml:id="nameVerifier"/> = new JLabel(); final RegexpVerifier nameFieldVerifier = new RegexpVerifier( "^[^;'\"]+$", nameFieldValidationLabel, "No special characters"); final JTextField emailField = new JTextField(20); final JLabel emailFieldValidationLabel <co xml:id="emailVerifier"/> = new JLabel(); final RegexpVerifier emailFieldVerifier = new RegexpVerifier("^[\\w\\-\\.\\_]+@[\\w\\-\\.]*[a-zA-Z]{2,4}$", emailFieldValidationLabel, "email not valid"); ... public static void main(String[] args) throws SQLException { InsertPerson app = new InsertPerson(); app.setVisible(true); } public InsertPerson (){ ... 
databaseFieldPanel.add(nameField); <emphasis role="bold">nameFieldValidationLabel.setForeground(Color.RED); databaseFieldPanel.add(nameFieldValidationLabel); nameField.setInputVerifier(nameFieldVerifier);</emphasis> <co xml:id="setNameValidation"/> databaseFieldPanel.add(new JLabel("E-mail:")); databaseFieldPanel.add(emailField); <emphasis role="bold">databaseFieldPanel.add(emailFieldValidationLabel); emailFieldValidationLabel.setForeground(Color.RED); emailField.setInputVerifier(emailFieldVerifier);</emphasis> <co xml:id="setEmailValidation"/> insertButton.addActionListener(new ActionListener() { @Override public void actionPerformed(ActionEvent e) { <emphasis role="bold">if (!nameFieldVerifier.inputIsValid() || !emailFieldVerifier.inputIsValid()) { JOptionPane.showMessageDialog(null, "Invalid input value(s)"); }</emphasis> else { ...</programlisting> </answer> </qandaentry> </qandadiv> </qandaset> </section> <section xml:id="sectPreparedStatements"> <title><classname>java.sql.PreparedStatement</classname> objects</title> <para>Sanitizing user input is an essential means to secure an application. The <trademark xlink:href="http://electronics.zibb.com/trademark/jdbc/29545026">JDBC</trademark> standard however provides a superior mechanism for protecting applications against SQL injection attacks. We first take a closer look at how our application currently sends SQL statements to a database server:</para> <figure xml:id="sqlTransport"> <title>SQL statements in <link linkend="gloss_Java"><trademark>Java</trademark></link> applications get parsed at the database server</title> <mediaobject> <imageobject> <imagedata fileref="Ref/Fig/sqlTransport.fig"/> </imageobject> </mediaobject> </figure> <para>This architecture raises two questions:</para> <orderedlist> <listitem> <para>What happens in case identical SQL statements are executed repeatedly? 
This may happen inside a loop when thousands of records with identical structure are being sent to a database.</para> </listitem> <listitem> <para>Is this architecture adequate with respect to security concerns?</para> </listitem> </orderedlist> <para>The first question is related to performance: Repeatedly parsing statements which are identical except for the values contained within is a waste of resources. We consider the transfer of records between different databases:</para> <programlisting>INSERT INTO Person VALUES ('Jim', 'jim@q.org') INSERT INTO Person VALUES ('Eve', 'eve@y.org') INSERT INTO Person VALUES ('Pete', 'p@rr.com') ...</programlisting> <para>In this case it does not make sense to repeatedly parse structurally identical SQL statements. Using single <code>INSERT</code> statements with multiple data records may not be an option when the number of records grows.</para> <para>The second question is related to our current security topic: The database server's interpreter may be so <quote>kind</quote> as to interpret an attacker's malicious code as well.</para> <para>Both topics are addressed by <classname>java.sql.PreparedStatement</classname> objects. Basically these objects allow for separating an SQL statement's structure from the parameter values contained within. The scenario given in <xref linkend="sqlTransport"/> may be implemented as:</para> <figure xml:id="sqlTransportPrepare"> <title>Using <classname>java.sql.PreparedStatement</classname> objects.</title> <mediaobject> <imageobject> <imagedata fileref="Ref/Fig/sqlTransportPrepare.fig"/> </imageobject> </mediaobject> </figure> <para>Prepared statements are an example of parameterized SQL statements which exist in various programming languages. When using <classname>java.sql.PreparedStatement</classname> instances we actually have three distinct phases:</para> <orderedlist> <listitem> <para xml:id="exerciseGuiWritePrepared">Creating an instance of <classname>java.sql.PreparedStatement</classname>. 
The SQL statement, possibly containing placeholders, gets parsed.</para> </listitem> <listitem> <para>Setting all placeholder values. This does not involve any further SQL syntax parsing.</para> </listitem> <listitem> <para>Executing the statement.</para> </listitem> </orderedlist> <para>Steps 2 and 3 may be repeated as often as desired without re-parsing the SQL statement, thus saving resources on the database server side.</para> <para>Our introductory toy application <xref linkend="figJdbcSimpleWrite"/> may be rewritten using <classname>java.sql.PreparedStatement</classname> objects:</para> <programlisting language="java">package sda.jdbc.intro.v1; ... public class SimpleInsert { public static void main(String[] args) throws SQLException { final Connection conn = DriverManager.getConnection (... // Step 2: Create a PreparedStatement instance final PreparedStatement pStmt = conn.prepareStatement( "INSERT INTO Person VALUES(<emphasis role="bold">?, ?</emphasis>)");<co xml:id="listPrepCreate"/> // Step 3a: Fill in desired attribute values pStmt.setString(1, "Jim");<co xml:id="listPrepSet1"/> pStmt.setString(2, "jim@foo.org");<co xml:id="listPrepSet2"/> // Step 3b: Execute the desired INSERT final int updateCount = pStmt.executeUpdate();<co xml:id="listPrepExec"/> // Step 4: Give feedback to the end user System.out.println("Successfully inserted " + updateCount + " dataset(s)"); } }</programlisting> <calloutlist> <callout arearefs="listPrepCreate"> <para>An instance of <classname>java.sql.PreparedStatement</classname> is created.
Notice the two question marks representing two placeholders for string values to be inserted in the next step.</para> </callout> <callout arearefs="listPrepSet1 listPrepSet2"> <para>Fill in the two placeholder values defined at <coref linkend="listPrepCreate"/>.</para> <caution> <para>While most programmers index a list of n elements from 0 to n-1, <trademark xlink:href="http://electronics.zibb.com/trademark/jdbc/29545026">JDBC</trademark> counts from 1 to n. Working with <trademark xlink:href="http://electronics.zibb.com/trademark/jdbc/29545026">JDBC</trademark> would have been too easy otherwise.</para> </caution> </callout> <callout arearefs="listPrepExec"> <para>Execute the beast! Notice the empty parameter list. No SQL is required since we already prepared it in <coref linkend="listPrepCreate"/>.</para> </callout> </calloutlist> <para>The problem of SQL injection disappears when all user-supplied values are passed as parameters to <classname>java.sql.PreparedStatement</classname> instances. An attacker may safely enter offending strings like:</para> <programlisting>Jim', 'jim@c.com');DROP TABLE Person;INSERT INTO Person VALUES('Joe</programlisting> <para>The above string will be taken <quote>as is</quote> and thus simply becomes part of the database server's content.</para> <qandaset role="exercise"> <title>Prepared Statements to keep the barbarians at the gate</title> <qandadiv> <qandaentry xml:id="exerciseSqlInjectPrepare"> <question> <para>In <xref linkend="sqlInjectDropTable"/> we found our implementation in <xref linkend="exerciseGuiWriteTakeTwo"/> to be vulnerable to SQL injection. Rather than sanitizing user input you shall now use <classname>java.sql.PreparedStatement</classname> objects to secure the application.</para> </question> <answer> <para>Due to our separation of GUI and persistence handling we only need to re-implement <classname>sda.jdbc.intro.sqlinject.PersistenceHandler</classname>.
We have to replace <classname>java.sql.Statement</classname> by <classname>java.sql.PreparedStatement</classname> instances. A possible implementation is <classname>sda.jdbc.intro.v1.prepare.PersistenceHandler</classname>. We may now safely enter offending strings like:</para> <programlisting>Jim', 'jim@c.com');DROP TABLE Person;INSERT INTO Person VALUES('Joe</programlisting> <para>This time the input value is taken <quote>as is</quote> and yields the following error message:</para> <informalfigure> <mediaobject> <imageobject> <imagedata fileref="Ref/Screen/sqlInjectPrepare.screen.png"/> </imageobject> </mediaobject> </informalfigure> <para>The offending string exceeds the length of the attribute <code>name</code> within the database table <code>Person</code>. We may enlarge the attribute's length to allow for the <code>INSERT</code> operation:</para> <programlisting>CREATE TABLE Person ( name CHAR(<emphasis role="bold">80</emphasis>) <emphasis role="bold">-- a little bit longer --</emphasis> ,email CHAR(20) UNIQUE );</programlisting> </answer> </qandaentry> </qandadiv> </qandaset> <para>We could have followed a test-driven development approach, writing tests before actually implementing our application. In the following exercise we do it the other way round and add tests to an existing implementation. The idea is to assure software quality when fixing bugs or extending an application.</para> <para>The subsequent exercise requires the <productname xlink:href="http://testng.org/doc/eclipse.html#eclipse-installation">TestNG</productname> plugin for Eclipse to be installed. This should already be the case both in the MI exercise classrooms and in the VirtualBox image provided at <uri xlink:href="ftp://mirror.mi.hdm-stuttgart.de/ubuntu/VirtualBox/lubuntu.vdi">ftp://mirror.mi.hdm-stuttgart.de/ubuntu/VirtualBox/lubuntu.vdi</uri>.
If you use a private Eclipse installation you may want to follow <xref linkend="testngInstall"/>.</para> <qandaset role="exercise"> <title>Testing <classname>sda.jdbc.intro.v1.PersistenceHandler</classname> using <productname xlink:href="http://testng.org">TestNG</productname></title> <qandadiv> <qandaentry> <question> <para>Read <xref linkend="chapUnitTesting"/>. Then test:</para> <itemizedlist> <listitem> <para>Proper behaviour when opening and closing connections.</para> </listitem> <listitem> <para>Proper behaviour when inserting data.</para> </listitem> <listitem> <para>Expected behaviour when entering duplicate values violating integrity constraints. Look for error messages as well.</para> </listitem> </itemizedlist> <para>You may write code to initialize the database state appropriately prior to starting the tests.</para> </question> <answer> <para>A possible <productname xlink:href="http://testng.org">TestNG</productname> test class is <classname>sda.jdbc.intro.v1.prepare.PersistenceHandlerTest</classname>.</para> </answer> </qandaentry> </qandadiv> </qandaset> </section> </section> <section xml:id="jdbcRead"> <title>Read Access</title> <para>So far we've sent records to a database server. Applications however need both directions: pushing data to a server and receiving data as well. The overall process looks like:</para> <figure xml:id="jdbcReadWrite"> <title>Server / client object's life cycle</title> <mediaobject> <imageobject> <imagedata fileref="Ref/Fig/jdbcReadWrite.fig"/> </imageobject> </mediaobject> </figure> <para>So far we've only covered the second (<code>UPDATE</code>) part of this picture. Reading objects from a database server into a client's (transient) address space requires a container object to hold the data in question.
Though <link linkend="gloss_Java"><trademark>Java</trademark></link> offers standard container interfaces like <classname>java.util.List</classname> the <trademark xlink:href="http://electronics.zibb.com/trademark/jdbc/29545026">JDBC</trademark> standard has created separate specifications like <classname>java.sql.ResultSet</classname>. Instances of <classname>java.sql.ResultSet</classname> will hold transient copies of (database) objects. The next figure outlines the basic approach:</para> <figure xml:id="figJdbcRead"> <title>Reading data from a database server.</title> <mediaobject> <imageobject> <imagedata fileref="Ref/Fig/jdbcread.fig" scale="65"/> </imageobject> </mediaobject> </figure> <para>We take an example. Suppose our database contains a table of our friends' nicknames and their respective birth dates:</para> <table border="1" xml:id="figRelationFriends"> <caption>Names and birth dates of friends.</caption> <tr> <td><programlisting>CREATE TABLE Friends ( id INTEGER NOT NULL PRIMARY KEY ,nickname char(10) ,birthdate DATE );</programlisting></td> <td><programlisting>INSERT INTO Friends VALUES (1, 'Jim', '1991-10-10') ,(2, 'Eve', '2003-05-24') ,(3, 'Mick','2001-12-30') ;</programlisting></td> </tr> </table> <para>Following the outline in <xref linkend="figJdbcRead"/> we may access our data by:</para> <figure xml:id="listingJdbcRead"> <title>Accessing relational data</title> <programlisting language="java">package sda.jdbc.intro; ... 
public class SimpleRead { public static void main(String[] args) throws SQLException { // Step 1: Open a connection to the database server final Connection conn = DriverManager.getConnection ( DbProps.getString("PersistenceHandler.jdbcUrl"), DbProps.getString("PersistenceHandler.username"), DbProps.getString("PersistenceHandler.password")); // Step 2: Create a Statement instance final Statement stmt = conn.createStatement(); <emphasis role="bold">// Step 3: Creating the client side JDBC container holding our data records</emphasis> <emphasis role="bold">final ResultSet data = stmt.executeQuery("SELECT * FROM Friends");</emphasis> <co linkends="listingJdbcRead-1" xml:id="listingJdbcRead-1-co"/> <emphasis role="bold">// Step 4: Dataset iteration while (data.next()) {</emphasis> <co linkends="listingJdbcRead-2" xml:id="listingJdbcRead-2-co"/> <emphasis role="bold">System.out.println(data.getInt("id")</emphasis> <co linkends="listingJdbcRead-3" xml:id="listingJdbcRead-3-co"/> <emphasis role="bold">+ ", " + data.getString("nickname")</emphasis> <co linkends="listingJdbcRead-3" xml:id="listingJdbcRead-4-co"/> <emphasis role="bold">+ ", " + data.getString("birthdate"));</emphasis> <co linkends="listingJdbcRead-3" xml:id="listingJdbcRead-5-co"/> } } }</programlisting> </figure> <para>The marked code segments above show the differences with respect to our data insertion application <classname>sda.jdbc.intro.SimpleInsert</classname>.
Some remarks are in order:</para> <calloutlist> <callout arearefs="listingJdbcRead-1-co" xml:id="listingJdbcRead-1"> <para>As mentioned in the introduction to this section the <trademark xlink:href="http://electronics.zibb.com/trademark/jdbc/29545026">JDBC</trademark> standard comes with its own container interface rather than <classname>java.util.List</classname> or similar.</para> </callout> <callout arearefs="listingJdbcRead-2-co" xml:id="listingJdbcRead-2"> <para>Calling <link xlink:href="http://docs.oracle.com/javase/7/docs/api/java/sql/ResultSet.html#next()">next()</link> prior to actually accessing data on the client side is mandatory! The first call to <link xlink:href="http://docs.oracle.com/javase/7/docs/api/java/sql/ResultSet.html#next()">next()</link> moves the internal cursor to the first element of our dataset if it is not empty. Follow the link address and <emphasis role="bold">read</emphasis> the documentation.</para> </callout> <callout arearefs="listingJdbcRead-3-co listingJdbcRead-4-co listingJdbcRead-5-co" xml:id="listingJdbcRead-3"> <para>The access methods have to be chosen according to matching types. An overview of database/<link linkend="gloss_Java"><trademark>Java</trademark></link> type mappings is given in <uri xlink:href="http://docs.oracle.com/javase/1.3/docs/guide/jdbc/getstart/mapping.html">http://docs.oracle.com/javase/1.3/docs/guide/jdbc/getstart/mapping.html</uri>.</para> </callout> </calloutlist> <qandaset role="exercise"> <title>Getter methods and type conversion</title> <qandadiv> <qandaentry> <question> <para>Apart from type mappings the <trademark xlink:href="http://electronics.zibb.com/trademark/jdbc/29545026">JDBC</trademark> access methods like <link xlink:href="http://docs.oracle.com/javase/7/docs/api/java/sql/ResultSet.html#getString(int)">getString()</link> may also be used for type conversion.
Modify <xref linkend="listingJdbcRead"/> as follows:</para> <itemizedlist> <listitem> <para>Read the database attribute <code>id</code> using <link xlink:href="http://docs.oracle.com/javase/7/docs/api/java/sql/ResultSet.html#getString(java.lang.String)">getString(String)</link>.</para> </listitem> <listitem> <para>Read the database attribute <code>nickname</code> using <link xlink:href="http://docs.oracle.com/javase/7/docs/api/java/sql/ResultSet.html#getInt(java.lang.String)">getInt(String)</link>.</para> </listitem> </itemizedlist> <para>What do you observe?</para> </question> <answer> <para>Modifying our iteration loop:</para> <programlisting>// Step 4: Dataset iteration while (data.next()) { System.out.println(data.<emphasis role="bold">getString</emphasis>("id") <co linkends="jdbcReadWrongType-1" xml:id="jdbcReadWrongType-1-co"/> + ", " + data.<emphasis role="bold">getInt</emphasis>("nickname") <co linkends="jdbcReadWrongType-2" xml:id="jdbcReadWrongType-2-co"/> + ", " + data.getString("birthdate")); }</programlisting> <para>We observe:</para> <calloutlist> <callout arearefs="jdbcReadWrongType-1-co" xml:id="jdbcReadWrongType-1"> <para>Calling <link xlink:href="http://docs.oracle.com/javase/7/docs/api/java/sql/ResultSet.html#getString(int)">getString()</link> for a database attribute of type <code>INTEGER</code> does not cause any trouble: The value gets silently converted to a string value.</para> </callout> <callout arearefs="jdbcReadWrongType-2-co" xml:id="jdbcReadWrongType-2"> <para>Calling <link xlink:href="http://docs.oracle.com/javase/7/docs/api/java/sql/ResultSet.html#getInt(java.lang.String)">getInt(String)</link> for the database attribute of type <code>CHAR</code> yields an (expected) exception:</para> </callout> </calloutlist> <programlisting>Exception in thread "main" java.sql.SQLException: Invalid value for getInt() - 'Jim' at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:1073) ...</programlisting> <para>We may however provide <quote>compatible</quote> data records:</para>
<programlisting>DELETE FROM Friends; INSERT INTO Friends VALUES (1, <emphasis role="bold">'31'</emphasis>, '1991-10-10');</programlisting> <para>This time our application executes perfectly well:</para> <programlisting>1, 31, 1991-10-10</programlisting> <para>Conclusion: The <trademark xlink:href="http://electronics.zibb.com/trademark/jdbc/29545026">JDBC</trademark> driver performs a conversion from a string type to an integer similar to the <link xlink:href="http://docs.oracle.com/javase/7/docs/api/java/lang/Integer.html#parseInt(java.lang.String)">parseInt(String)</link> method.</para> <para>The next series of exercises aims at a more powerful implementation of our person data insertion application in <xref linkend="exerciseInsertLoginCredentials"/>.</para> </answer> </qandaentry> </qandadiv> </qandaset> <qandaset role="exercise"> <title>Handling NULL values.</title> <qandadiv> <qandaentry> <question> <para>The attribute <code>nickname</code> in our database table <code>Friends</code> allows <code>NULL</code> values:</para> <programlisting>INSERT INTO Friends VALUES (1, 'Jim', '1991-10-10') ,(2, <emphasis role="bold"> NULL</emphasis>, '2003-5-24') ,(3, 'Mick', '2001-12-30');</programlisting> <para>Starting our current application yields:</para> <programlisting>1, Jim, 1991-10-10 2, null, 2003-05-24 3, Mick, 2001-12-30</programlisting> <para>This might be confused with a person having the nickname <quote>null</quote>.
Instead we would like to have:</para> <programlisting>1, Jim, 1991-10-10 2, -Name unknown- , 2003-05-24 3, Mick, 2001-12-30</programlisting> <para>Extend the current code of <classname>sda.jdbc.intro.SimpleRead</classname> to produce the above result in case of <code>NULL</code> nickname values.</para> <para>Hint: Read the documentation of <link xlink:href="http://docs.oracle.com/javase/7/docs/api/java/sql/ResultSet.html#wasNull()">wasNull()</link>.</para> </question> <answer> <para>A possible implementation is given in <classname>sda.jdbc.intro.v1.SimpleRead</classname>.</para> </answer> </qandaentry> </qandadiv> </qandaset> <qandaset role="exercise"> <title>A user authentication <quote>strategy</quote></title> <qandadiv> <qandaentry xml:id="exerciseInsecureAuth"> <question> <para>Our current application for entering <code>Person</code> records lacks authentication: A user simply connects to the database using credentials hard coded in a properties file. A programmer suggests implementing authentication based on the following extension of the <code>Person</code> table:</para> <programlisting>CREATE TABLE Person ( name CHAR(80) NOT NULL ,email CHAR(20) NOT NULL UNIQUE ,login CHAR(10) UNIQUE -- login names must be unique -- ,password CHAR(20) );</programlisting> <para>On clicking <quote>Connect</quote> a user may enter a login name and password, <quote>fred</quote> and <quote>12345678</quote> in the following example:</para> <figure xml:id="figLogin"> <title>Login credentials for database connection</title> <mediaobject> <imageobject> <imagedata fileref="Ref/Screen/login.screen.png" scale="90"/> </imageobject> </mediaobject> </figure> <para>Based on these input values the following SQL query is executed by a <classname>java.sql.Statement</classname> object:</para> <programlisting>SELECT * FROM Person WHERE login='<emphasis role="bold">fred</emphasis>' and password = '<emphasis role="bold">12345678</emphasis>'</programlisting> <para>Since the
login attribute is <code>UNIQUE</code> we are sure to receive either 0 or 1 dataset. Our programmer proposes to grant login if the query returns at least one dataset.</para> <para>Discuss this implementation sketch with a colleague. Do you think this is a sensible approach? <emphasis role="bold">Write down</emphasis> your results.</para> </question> <answer> <para>The approach is essentially unusable due to severe security implications. Since it is based on <classname>java.sql.Statement</classname> rather than on <classname>java.sql.PreparedStatement</classname> objects it is vulnerable to SQL injection attacks. A user may enter the following password value in the GUI:</para> <programlisting>sd' OR '1' = '1</programlisting> <para>Based on the login name <quote>fred</quote> the following SQL string is crafted:</para> <programlisting>SELECT * FROM Person WHERE login='fred' and password = 'sd' OR <emphasis role="bold">'1' = '1'</emphasis>;</programlisting> <para>Since <code>AND</code> binds more tightly than <code>OR</code> and the WHERE clause's last component always evaluates to true, all objects from the <code>Person</code> relation are returned thus permitting login.</para> <para>The implementation approach suffers from a second deficiency: The passwords are stored in clear text. If an attacker gains access to the <code>Person</code> table he'll immediately retrieve the passwords of all users. This problem can be solved by storing hash values of passwords rather than the clear text values themselves.</para> </answer> </qandaentry> </qandadiv> </qandaset> <qandaset role="exercise" xml:id="passwordHashes"> <title>Passwords and hash values</title> <qandadiv> <qandaentry xml:id="exerciseHashTraining"> <question> <para>In exercise <xref linkend="exerciseInsecureAuth"/> we discarded the idea of clear text passwords in favour of password hashes. To defend against rainbow table attacks, so-called salted hashes are superior.
You should read <uri xlink:href="https://www.heckrothindustries.co.uk/articles/an-introduction-to-password-hashes">https://www.heckrothindustries.co.uk/articles/an-introduction-to-password-hashes</uri> for an overview. The article contains further references at the bottom of the page.</para> <para>With respect to an implementation <uri xlink:href="http://stackoverflow.com/questions/2860943/suggestions-for-library-to-hash-passwords-in-java#11038230">http://stackoverflow.com/questions/2860943/suggestions-for-library-to-hash-passwords-in-java</uri> provides a simple example of:</para> <itemizedlist> <listitem> <para>Creating a salted hash from a given password string.</para> </listitem> <listitem> <para>Verifying whether a hash string matches a given clear text password.</para> </listitem> </itemizedlist> <para>The example uses an external library. On <productname xlink:href="http://www.ubuntu.com">Ubuntu</productname> Linux this may be installed by issuing <command>aptitude</command> <option>install</option> <option>libcommons-codec-java</option>. After successful installation the file <filename>/usr/share/java/commons-codec-1.5.jar</filename> may be appended to your <envar>CLASSPATH</envar>.</para> <para>You may as well use <uri xlink:href="http://crackstation.net/hashing-security.htm#javasourcecode">http://crackstation.net/hashing-security.htm#javasourcecode</uri> as a starting point. This example works standalone without needing an external library.
Note: This example produces different (incompatible) hash values.</para> <para>Create a simple <code>main()</code> method to experiment with the two class methods.</para> </question> <answer> <para>Starting from <uri xlink:href="http://stackoverflow.com/questions/2860943/suggestions-for-library-to-hash-passwords-in-java#11038230">http://stackoverflow.com/questions/2860943/suggestions-for-library-to-hash-passwords-in-java</uri> we create a slightly modified class <classname>sda.jdbc.intro.auth.HashProvider</classname> offering both hash providing <coref linkend="hashProviderMethod"/> and verifying <coref linkend="hashVerifyMethod"/> methods:</para> <programlisting language="java">package sda.jdbc.intro.auth; ... public class HashProvider { ... /** Computes a salted PBKDF2 hash of given plaintext password suitable for storing in a database. */ public static <emphasis role="bold">String getSaltedHash</emphasis> <co xml:id="hashProviderMethod"/>(char [] password) { byte[] salt; try { salt = SecureRandom.getInstance("SHA1PRNG").generateSeed(saltLen); // store the salt with the password return Base64.encodeBase64String(salt) + "$" + hash(password, salt); } catch (NoSuchAlgorithmException e) { e.printStackTrace(); } System.exit(1); return null; } /** Checks whether given plaintext password corresponds to a stored salted hash of the password. */ public static <emphasis role="bold">boolean check</emphasis> <co xml:id="hashVerifyMethod"/>(char[] password, String stored){ String[] saltAndPass = stored.split("\\$"); if (saltAndPass.length != 2) return false; String hashOfInput = hash(password, Base64.decodeBase64(saltAndPass[0])); return hashOfInput.equals(saltAndPass[1]); } ...}</programlisting> <para>We may test the two class methods <methodname>sda.jdbc.intro.auth.HashProvider.getSaltedHash(char[])</methodname> and <methodname>sda.jdbc.intro.auth.HashProvider.check(char[],String)</methodname> by a separate driver class.
Notice the <quote>$</quote> sign <coref linkend="saltPwhashSeparator"/> separating salt and password hash:</para> <programlisting language="java">package sda.jdbc.intro.auth; public class TestHashProvider { public static void main(String [] args) throws Exception { final char [] clearText = {'s', 'e', 'c'}; final String hash = <emphasis role="bold">HashProvider.getSaltedHash(clearText)</emphasis>; System.out.println("Hash:" + hash); if (HashProvider.check(clearText, <co xml:id="saltPwhashSeparator"/> "<emphasis role="bold">HwX2DkuYiwp7xogm3AGndza8DKRVvCMntxRvCrCGFPw=</emphasis>$<emphasis role="bold">6Ix11yHNB4uPZuF2IQYxVV/MYragJwTDE33OIFR9a24=</emphasis>")) { System.out.println("hash matches"); } else { System.out.println("hash does not match"); ...</programlisting> </answer> </qandaentry> </qandadiv> </qandaset> <qandaset role="exercise" xml:id="exercise_GuiEnterPersonAuth"> <title>GUI authentication: The real McCoy</title> <qandadiv> <qandaentry xml:id="exerciseInsertLoginCredentials"> <question> <para>We now implement a refined version to enter <code>Person</code> records based on the solutions of two related exercises:</para> <glosslist> <glossentry> <glossterm><xref linkend="exercisefilterUserInput"/></glossterm> <glossdef> <para>Avoiding SQL injection by sanitizing user input</para> </glossdef> </glossentry> <glossentry> <glossterm><xref linkend="exerciseSqlInjectPrepare"/></glossterm> <glossdef> <para>Avoiding SQL injection by using <classname>java.sql.PreparedStatement</classname> objects.</para> </glossdef> </glossentry> </glosslist> <para>A better solution should combine both techniques. Non-vulnerability is a basic requirement. Checking an e-mail address for minimal conformance is an added value.</para> <para>In order to address authentication the relation <code>Person</code> has to be extended appropriately. The GUI needs two additional fields for login name and password as well.
The following video demonstrates the intended behaviour:</para> <figure xml:id="videoConnectAuth"> <title>Intended usage behaviour for insertion of data records.</title> <mediaobject> <videoobject> <videodata fileref="Ref/Video/connectauth.mp4"/> </videoobject> </mediaobject> </figure> <para>Don't forget to use password hashes like those from <xref linkend="exerciseHashTraining"/>. Due to their length you may want to consider the data type <code>TEXT</code>.</para> </question> <answer> <para>In contrast to earlier versions it now makes sense to add some internal container structures. First we note that each GUI input field requires:</para> <itemizedlist> <listitem> <para>A label like <quote>Enter password</quote>.</para> </listitem> <listitem> <para>A corresponding field object to hold user entered input.</para> </listitem> <listitem> <para>A validator checking for correctness of entered data.</para> </listitem> <listitem> <para>A label or text field for warning messages in case of invalid user input.</para> </listitem> </itemizedlist> <para>We start by grouping the label <coref linkend="uiuLabel"/>, the input field's verifier <coref linkend="uiuVerifier"/> and the error message label <coref linkend="uiuErrmsg"/> in <classname>sda.jdbc.intro.auth.UserInputUnit</classname>:</para> <programlisting>package sda.jdbc.intro.auth; ... public class UserInputUnit { final JLabel label; <co xml:id="uiuLabel"/> final InputVerifierNotify verifier; <co xml:id="uiuVerifier"/> final JLabel errorMessage; <co xml:id="uiuErrmsg"/> public UserInputUnit(final String guiText, final InputVerifierNotify verifier) { this.label = new JLabel(guiText); this.verifier = verifier; errorMessage = new JLabel(); } ...</programlisting> <para>The actual GUI text field is defined <coref linkend="verfierGuiField"/> in class <classname>sda.jdbc.intro.auth.InputVerifierNotify</classname>:</para> <programlisting language="java">package sda.jdbc.intro.auth; ...
public abstract class InputVerifierNotify extends InputVerifier { protected final String errorMessage; public final JLabel validationLabel; public final JTextField field; <co xml:id="verfierGuiField"/> public InputVerifierNotify(final JTextField field, final String errorMessage) { ...</programlisting> <para>We need two field verifier classes derived from <classname>sda.jdbc.intro.auth.InputVerifierNotify</classname>:</para> <glosslist> <glossentry> <glossterm><classname>sda.jdbc.intro.auth.RegexpVerifier</classname></glossterm> <glossdef> <para>This one is well known from earlier versions and is used to validate text input fields by regular expressions.</para> </glossdef> </glossentry> <glossentry> <glossterm><classname>sda.jdbc.intro.auth.EqualValueVerifier</classname></glossterm> <glossdef> <para>This verifier class is responsible for checking that our two password fields have identical values.</para> </glossdef> </glossentry> </glosslist> <para>All these components get assembled in <classname>sda.jdbc.intro.auth.InsertPerson</classname>. We remark some important points:</para> <programlisting>package sda.jdbc.intro.auth; ... public class InsertPerson extends JFrame { ... // GUI attributes for user input final UserInputUnit name = <co linkends="listingInsertUserAuth-1" xml:id="listingInsertUserAuth-1-co"/> new UserInputUnit( "Name", new RegexpVerifier(new JTextField(15), "^[^;'\"]+$", "No special characters allowed")); // We need a reference to the password field to avoid // casting from JTextField later. private final JPasswordField passwordField = new JPasswordField(10); <co linkends="listingInsertUserAuth-2" xml:id="listingInsertUserAuth-2-co"/> private final UserInputUnit password = new UserInputUnit( "Password", new RegexpVerifier(passwordField, "^.{6,20}$", "length from 6 to 20 characters")); ...
private final UserInputUnit passwordRepeat = new UserInputUnit( "repeat pass.", new EqualValueVerifier <co linkends="listingInsertUserAuth-3" xml:id="listingInsertUserAuth-3-co"/> (new JPasswordField(10), passwordField, "Passwords do not match")); private final UserInputUnit [] userInputUnits = <co linkends="listingInsertUserAuth-4" xml:id="listingInsertUserAuth-4-co"/> {name, email, login, password, passwordRepeat}; ... private void userLoginDialog() {...} ... public InsertPerson (){ ... databaseFieldPanel.setLayout(new GridLayout(0, 3)); //Third column for validation label add(databaseFieldPanel); for (UserInputUnit unit: userInputUnits) { <co linkends="listingInsertUserAuth-5" xml:id="listingInsertUserAuth-5-co"/> databaseFieldPanel.add(unit.label); databaseFieldPanel.add(unit.verifier.field); databaseFieldPanel.add(unit.verifier.validationLabel); } insertButton.addActionListener(new ActionListener() { @Override public void actionPerformed(ActionEvent e) { if (inputValuesAllValid()) { if (persistenceHandler.add( <co linkends="listingInsertUserAuth-6" xml:id="listingInsertUserAuth-6-co"/> name.getText(), email.getText(), login.getText(), passwordField.getPassword())) { clearMask(); ...} private void clearMask() { <co linkends="listingInsertUserAuth-7" xml:id="listingInsertUserAuth-7-co"/> for (UserInputUnit unit: userInputUnits) { unit.verifier.field.setText(""); unit.verifier.clear(); } } private boolean inputValuesAllValid() {<co linkends="listingInsertUserAuth-8" xml:id="listingInsertUserAuth-8-co"/> for (UserInputUnit unit: userInputUnits) { if (!unit.verifier.verify(unit.verifier.field)){ return false; } } return true; } }</programlisting> <calloutlist> <callout arearefs="listingInsertUserAuth-1-co" xml:id="listingInsertUserAuth-1"> <para>All GUI related stuff for entering a user's name</para> </callout> <callout arearefs="listingInsertUserAuth-2-co" xml:id="listingInsertUserAuth-2"> <para>Password fields need special treatment: <code>getText()</code> is 
superseded by <code>getPassword()</code>. In order to avoid casts from <classname>javax.swing.JTextField</classname> to <classname>javax.swing.JPasswordField</classname> we simply keep an extra reference.</para> </callout> <callout arearefs="listingInsertUserAuth-3-co" xml:id="listingInsertUserAuth-3"> <para>In order to check both password fields for identical values we need a different validator <classname>sda.jdbc.intro.auth.EqualValueVerifier</classname> expecting both password fields in its constructor.</para> </callout> <callout arearefs="listingInsertUserAuth-4-co" xml:id="listingInsertUserAuth-4"> <para>All 5 user input elements get grouped in an array. This allows for iterations like in <coref linkend="listingInsertUserAuth-7-co"/> or <coref linkend="listingInsertUserAuth-8-co"/>.</para> </callout> <callout arearefs="listingInsertUserAuth-5-co" xml:id="listingInsertUserAuth-5"> <para>Adding all GUI elements to the base pane in a loop.</para> </callout> <callout arearefs="listingInsertUserAuth-6-co" xml:id="listingInsertUserAuth-6"> <para>Providing user entered values to the persistence provider.</para> </callout> <callout arearefs="listingInsertUserAuth-7-co" xml:id="listingInsertUserAuth-7"> <para>Whenever a dataset has been successfully sent to the database we have to clear our GUI so that another record can be entered.</para> </callout> <callout arearefs="listingInsertUserAuth-8-co" xml:id="listingInsertUserAuth-8"> <para>Thanks to our grouping, aggregating the validation states of the individual GUI input fields becomes easy.</para> </callout> </calloutlist> </answer> </qandaentry> </qandadiv> </qandaset> <qandaset role="exercise"> <title>Architectural security considerations</title> <qandadiv> <qandaentry> <question> <para>In <xref linkend="exercise_GuiEnterPersonAuth"/> we achieved end user credential protection. How about the overall application security?
Provide improvement proposals if appropriate.</para> </question> <answer> <para>Connecting the client to our database server depends solely on the credentials <coref linkend="databaseUserHdmPassword"/> stored in a properties file <filename>database.properties</filename>:</para> <programlisting>PersistenceHandler.jdbcUrl=jdbc:mysql://localhost:3306/hdm PersistenceHandler.username=hdmuser <co xml:id="databaseUserHdmUsername"/> PersistenceHandler.password=<emphasis role="bold">XYZ</emphasis> <co xml:id="databaseUserHdmPassword"/></programlisting> <para>This properties file is user accessible and contains the password in clear text. Arbitrary applications connecting to the database server using this account have all permissions granted to <code>hdmuser</code> <coref linkend="databaseUserHdmUsername"/>. In order for our application to work correctly the set of granted permissions must at least include inserting datasets. Thus a new user, e.g. <code>smith</code>, may be inserted together with credentials. Afterwards the original application can be started by logging in as <code>smith</code>.</para> <para>Conclusion: The current application architecture is seriously flawed with respect to security.</para> <para>Rather than using a common database account <code>hdmuser</code> we may configure per-user accounts on the database server having individual user credentials. This way user credentials are no longer stored in our <code>Person</code> table but are managed by the database server's user management and privilege facilities. This completely avoids storing credentials on the client side.</para> </answer> </qandaentry> </qandadiv> </qandaset> </section> </section> </chapter> <chapter xml:id="chapUnitTesting"> <title>Unit testing with <productname xlink:href="http://testng.org">TestNG</productname></title> <para>This chapter presents a very short introduction to the basic usage of unit testing. 
We start with a simple stack implementation:</para> <programlisting language="java">package sda.unittesting; public class MyStack { int [] data = new int[5]; int numElements = 0; public void push(final int n) { data[numElements] = n; numElements++; } public int pop() { numElements--; return data[numElements]; } public int top() { return data[numElements - 1]; } public boolean empty() { return 0 == numElements; } }</programlisting> <para>Readers familiar with stacks will immediately notice a deficiency in the above code: This stack is actually bounded. It only allows us to store a maximum of five integer values.</para> <para>The following application allows us to functionally test our <classname>sda.unittesting.MyStack</classname> implementation with respect to the usual stack behaviour:</para> <programlisting language="java" linenumbering="numbered">package sda.unittesting; public class MyStackFuncTest { private static void assertTrue(boolean status) { if (!status) { throw new RuntimeException("Assert failed"); } } public static void main(String[] args) { final MyStack stack = new MyStack(); // Test 1: A new MyStack instance should not contain any elements. 
assertTrue(stack.empty()); // Test 2: Adding and removal stack.push(4); assertTrue (!stack.empty()); assertTrue (4 == stack.top()); assertTrue (4 == stack.pop()); assertTrue (stack.empty()); // Test 3: Trying to add more than five values stack.push(1);stack.push(2);stack.push(3);stack.push(4); stack.push(5); stack.push(6); assertTrue(6 == stack.pop()); } }</programlisting> <para>Execution yields a runtime exception which is due to the attempted insert operation <code>stack.push(6)</code>:</para> <programlisting>Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 5 at sda.unittesting.MyStack.push(MyStack.java:8) at sda.unittesting.MyStackFuncTest.main(MyStackFuncTest.java:20)</programlisting> <para>The execution result is easy to understand since our <classname>sda.unittesting.MyStack</classname> implementation only allows storing five values.</para> <para>Our testing application is fine so far. It does however lack some features:</para> <itemizedlist> <listitem> <para>Automatic initialization before starting tests and finalization at the end.</para> </listitem> <listitem> <para>Our test is monolithic: We used comments to document different tests. This knowledge is implicit and thus invisible to testing frameworks. Test results (failure/success) cannot be assigned to individual tests such as test 1 or test 2.</para> </listitem> <listitem> <para>Aggregation and visualization of test results</para> </listitem> <listitem> <para>Dependencies between individual tests</para> </listitem> <listitem> <para>Ability to enable and disable tests according to a project's maturity level. In our example test 3 might be disabled until an unbounded implementation is completed.</para> </listitem> </itemizedlist> <para>Testing frameworks like <productname xlink:href="http://junit.org">JUnit</productname> or <productname xlink:href="http://testng.org">TestNG</productname> provide means for efficient and flexible test organization. 
Using <productname xlink:href="http://testng.org">TestNG</productname> our current test application including only test 1 and test 2 reads:</para> <programlisting language="java">package sda.unittesting; import org.testng.annotations.Test; public class MyStackTestSimple { final MyStack stack = new MyStack(); @Test public void empty() { assert(stack.empty()); } @Test public void pushPopEmpty() { assert (stack.empty()); stack.push(4); assert (!stack.empty()); assert (4 == stack.top()); assert (4 == stack.pop()); assert (stack.empty()); } }</programlisting> <para>We notice the absence of a <function>main()</function> method. Our testing framework uses the above code for test definitions. In contrast to our homebrew solution the individual tests are now defined in a machine readable fashion. This allows for sophisticated statistics. Executing inside <productname xlink:href="http://testng.org">TestNG</productname> produces the following results:</para> <programlisting>PASSED: empty PASSED: pushPopEmpty =============================================== Default test Tests run: 2, Failures: 0, Skips: 0 =============================================== =============================================== Default suite Total tests run: 2, Failures: 0, Skips: 0 ===============================================</programlisting> <para>Both tests run successfully. So why did we omit test 3 which is bound to fail? We now add it to the test suite:</para> <programlisting language="java">package sda.unittesting; ... public class MyStackTestSimple1 { ... @Test public void empty() { assert(stack.empty()); ... @Test public void push6() { stack.push(1); stack.push(2); stack.push(3); stack.push(4); stack.push(5); stack.push(6); assert (6 == stack.pop()); } ...</programlisting> <para>As expected test 3 fails. 
But the result shows test 2 failing as well:</para> <programlisting>PASSED: empty FAILED: push6 java.lang.ArrayIndexOutOfBoundsException: 5 at sda.unittesting.MyStack.push(MyStack.java:8) at sda.unittesting.MyStackTestSimple1.push6(MyStackTestSimple1.java:30) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ... FAILED: pushPopEmpty java.lang.AssertionError at sda.unittesting.MyStackTestSimple1.pushPopEmpty(MyStackTestSimple1.java:15) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ... =============================================== Default test Tests run: 3, Failures: 2, Skips: 0 ===============================================</programlisting> <para>This unexpected result is due to the execution order of the three individual tests. Within our class <classname>sda.unittesting.MyStackTestSimple1</classname> the three tests appear in the sequence test 1, test 2 and test 3. This however is just the order of source code. The testing framework will not infer any order and thus execute our three tests in <emphasis role="bold">arbitrary</emphasis> order. The execution log shows the actual order:</para> <orderedlist> <listitem> <para>Test <quote><code>empty</code></quote></para> </listitem> <listitem> <para>Test <quote><code>push6</code></quote></para> </listitem> <listitem> <para>Test <quote><code>pushPopEmpty</code></quote></para> </listitem> </orderedlist> <para>So the second test will raise an exception and leave the stack filled with the maximum possible five elements. 
Thus it is not empty and the <quote><code>pushPopEmpty</code></quote> test fails as well.</para> <para>If we want to avoid this type of error we may:</para> <itemizedlist> <listitem> <para>Declare tests within separate (test class) definitions</para> </listitem> <listitem> <para>Define dependencies, e.g. test X may only be executed after test Y.</para> </listitem> </itemizedlist> <para>The <productname xlink:href="http://testng.org">TestNG</productname> framework offers a feature which allows the definition of test groups and dependencies between them. We use this feature to refine our test definition:</para> <programlisting language="java">package sda.unittesting; ... public class MyStackTest { ... @Test (<emphasis role="bold">groups = "basic"</emphasis>) public void empty() { assert(stack.empty()); } @Test (<emphasis role="bold">groups = "basic"</emphasis>) public void pushPopEmpty() { ... } @Test (<emphasis role="bold">dependsOnGroups = "basic"</emphasis>) public void push6() { ... }</programlisting> <para>The first two tests now belong to the same test group <quote>basic</quote>. The <emphasis role="bold"><code>dependsOnGroups = "basic"</code></emphasis> declaration guarantees that our <code>push6</code> test is launched only after all tests of this group have been executed. So we get the expected result:</para> <programlisting>PASSED: empty PASSED: pushPopEmpty FAILED: push6 java.lang.ArrayIndexOutOfBoundsException: 5 at sda.unittesting.MyStack.push(MyStack.java:8) at sda.unittesting.MyStackTest.push6(MyStackTest.java:30) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ... =============================================== Default test Tests run: 3, Failures: 1, Skips: 0 ===============================================</programlisting> <para>In fact the order between the first two tests might be critical as well. The <quote><code>pushPopEmpty</code></quote> test leaves our stack in an empty state. 
If this were not the case, reversing the execution order of <quote><code>pushPopEmpty</code></quote> and <quote><code>empty</code></quote> would cause an error as well.</para> <para><abbrev xlink:href="http://en.wikipedia.org/wiki/Integrated_development_environment">IDE</abbrev>s like Eclipse provide elements for test result visualization. Our last test run is summarized as:</para> <screenshot> <info> <title><productname xlink:href="http://testng.org">TestNG</productname> result presentation in Eclipse</title> </info> <mediaobject> <imageobject> <imagedata fileref="Ref/Screen/eclipseTestngResult.screen.png" scale="75"/> </imageobject> </mediaobject> </screenshot> <para>We can drill down from a result of type failure to its occurrence within the corresponding code.</para> </chapter> <chapter xml:id="fo"> <title>Generating printed output</title> <titleabbrev>Print</titleabbrev> <section xml:id="foIntro"> <title>Online and print versions</title> <titleabbrev>online / print</titleabbrev> <para>We already learned how to transform XML documents into HTML by means of an <abbrev xlink:href="http://www.w3.org/Style/XSL">XSL</abbrev> style sheet processor. In principle we may create printed output by using an HTML browser's print function. However the result will not meet reasonable typographical standards. A list of commonly required features for printed output includes:</para> <variablelist> <varlistentry> <term>Line breaks</term> <listitem> <para>Text paragraphs have to be divided into lines. To achieve best results the processor must implement the hyphenation rules of the language in question in order to automatically hyphenate long words. This is especially important for text columns of limited width as found in newspapers.</para> </listitem> </varlistentry> <varlistentry> <term>Page breaks</term> <listitem> <para>Since printed pages are limited in height the content has to be broken into pages. 
This may be difficult to achieve:</para> <itemizedlist> <listitem> <para>Large images being indivisible may have to be deferred to the following page leaving large amounts of empty space.</para> </listitem> <listitem> <para>Long tables may have to be subdivided into smaller blocks. Thus it may be required to define sets of additional footers like <quote>to be continued on the next page</quote> and additional table headers containing column descriptions on subsequent pages.</para> </listitem> </itemizedlist> </listitem> </varlistentry> <varlistentry> <term>Page references</term> <listitem> <para>Document internal references via <link xlink:href="http://www.w3.org/TR/xml#id">ID</link> / <link xlink:href="http://www.w3.org/TR/xml#idref">IDREF</link> pairs may be represented as page references like <quote>see page 32</quote>.</para> </listitem> </varlistentry> <varlistentry> <term>Left and right pages</term> <listitem> <para>Books usually have a different layout for <quote>left</quote> and <quote>right</quote> pages. Page numbers usually appear on the left side of a <quote>left</quote> page and vice versa.</para> <para>Very often the head of each page contains additional information e.g. a chapter's name on each <quote>left</quote> page head and the actual section's name on each <quote>right</quote> page's head.</para> <para>In addition chapters usually start on a <quote>right</quote> page. Sometimes a chapter's starting page has special layout features e.g. 
a missing description in the page's head which will only be given on subsequent pages.</para> </listitem> </varlistentry> <varlistentry> <term>Footnotes</term> <listitem> <para>Footnotes have to be numbered on a per page basis and have to appear on the current page.</para> </listitem> </varlistentry> </variablelist> </section> <section xml:id="foStart"> <title>A simple <abbrev xlink:href="http://www.w3.org/TR/2001/REC-xsl-20011015/slice6.html#fo-section">FO</abbrev> document</title> <titleabbrev>Simple <abbrev xlink:href="http://www.w3.org/TR/2001/REC-xsl-20011015/slice6.html#fo-section">FO</abbrev></titleabbrev> <para>A renderer for printed output from XML content also needs instructions on how to format the different elements. A common way to define these formatting properties is the <emphasis>Formatting Objects</emphasis> (<abbrev xlink:href="http://www.w3.org/TR/2001/REC-xsl-20011015/slice6.html#fo-section">FO</abbrev>) standard. <abbrev xlink:href="http://www.w3.org/TR/2001/REC-xsl-20011015/slice6.html#fo-section">FO</abbrev> documents may be compared to HTML documents. An HTML document has to be rendered by a piece of software called a browser in order to be viewed as an image. Likewise <abbrev xlink:href="http://www.w3.org/TR/2001/REC-xsl-20011015/slice6.html#fo-section">FO</abbrev> documents have to be rendered by a piece of software called a formatting objects processor which typically yields PostScript or PDF output. 
As a starting point we take a simple example:</para> <figure xml:id="foHelloWorld"> <title>The simplest <abbrev xlink:href="http://www.w3.org/TR/2001/REC-xsl-20011015/slice6.html#fo-section">FO</abbrev> document</title> <programlisting><?xml version="1.0" encoding="utf-8"?> <fo:root xmlns:fo="http://www.w3.org/1999/XSL/Format"> <fo:layout-master-set> <!-- Define a simple page layout --> <fo:simple-page-master master-name="simplePageLayout" page-width="60mm" page-height="100mm"> <fo:region-body/> </fo:simple-page-master> </fo:layout-master-set> <!-- Print a set of pages using the previously defined layout --> <fo:page-sequence master-reference="simplePageLayout"> <fo:flow flow-name="xsl-region-body"> <emphasis role="bold"><fo:block>Hello, World ...</fo:block></emphasis> </fo:flow> </fo:page-sequence> </fo:root></programlisting> </figure> <para>PDF generation is initiated by executing an <abbrev xlink:href="http://www.w3.org/TR/2001/REC-xsl-20011015/slice6.html#fo-section">FO</abbrev> processor. At the MI department the script <code>fo2pdf</code> invokes <orgname>RenderX</orgname>'s <productname xlink:href="http://www.renderx.com">xep</productname> processor:</para> <programlisting>fo2pdf -fo hello.fo -pdf hello.pdf</programlisting> <para>This creates a PDF file which may be printed or previewed by e.g. <productname xlink:href="http://www.adobe.com">Adobe</productname>'s Acrobat Reader or Evince under Linux. For a list of command line options see <productname xlink:href="http://www.renderx.com/reference.html">xep's documentation</productname>.</para> </section> <section xml:id="layoutParam"> <title>Page layout</title> <para>The result of our <quote>Hello, World ...</quote> code is not very impressive. In order to develop more elaborate examples we have to understand the underlying layout model defined in a <link xlink:href="http://www.w3.org/TR/xsl/#fo_simple-page-master">fo:simple-page-master</link> element. 
First of all <abbrev xlink:href="http://www.w3.org/TR/2001/REC-xsl-20011015/slice6.html#fo-section">FO</abbrev> allows us to subdivide a physical page into different regions:</para> <figure xml:id="foRegionList"> <title>Regions defined in a page.</title> <mediaobject> <imageobject> <imagedata fileref="Ref/Fig/regions.fig"/> </imageobject> </mediaobject> </figure> <para>The most important area in this model is denoted by <link xlink:href="http://www.w3.org/TR/xsl/#fo_region-body">fo:region-body</link>. Other regions like <link xlink:href="http://www.w3.org/TR/xsl/#fo_region-before">fo:region-before</link> are typically used as containers for meta information such as chapter headings and page numbering. We take a closer look at the <link xlink:href="http://www.w3.org/TR/xsl/#fo_region-body">fo:region-body</link> area and supply an example of parameterization:</para> <figure xml:id="foParamRegBody"> <title>A complete <abbrev xlink:href="http://www.w3.org/TR/2001/REC-xsl-20011015/slice6.html#fo-section">FO</abbrev> parameterization of a physical page and the <link xlink:href="http://www.w3.org/TR/xsl/#fo_region-body">fo:region-body</link>.</title> <programlisting><?xml version="1.0" encoding="utf-8"?> <fo:root xmlns:fo="http://www.w3.org/1999/XSL/Format" font-size="6pt"> <fo:layout-master-set> <co xml:id="programlisting_fobodyreg_masterset"/> <fo:simple-page-master master-name="<emphasis role="bold">simplePageLayout</emphasis>" <co xml:id="programlisting_fobodyreg_simplepagelayout"/> page-width = "50mm" page-height = "80mm" margin-top = "5mm" margin-bottom = "20mm" margin-left = "5mm" margin-right = "10mm"> <fo:region-body <co xml:id="programlisting_fobodyreg_regionbody"/> margin-top = "10mm" margin-bottom = "5mm" margin-left = "10mm" margin-right = "5mm"/> </fo:simple-page-master> </fo:layout-master-set> <fo:page-sequence master-reference="<emphasis role="bold">simplePageLayout</emphasis>"> <co xml:id="programlisting_fobodyreg_pagesequence"/> <fo:flow 
flow-name="xsl-region-body"> <co xml:id="programlisting_fobodyreg_flow"/> <fo:block space-after="2mm">Dumb text .. dumb text.</fo:block> <co xml:id="programlisting_fobodyreg_block"/> <fo:block space-after="2mm">Dumb text .. dumb text.</fo:block> <coref linkend="programlisting_fobodyreg_block"/> <fo:block space-after="2mm">Dumb text .. dumb text.</fo:block> <coref linkend="programlisting_fobodyreg_block"/> <fo:block space-after="2mm">Dumb text .. dumb text.</fo:block> <coref linkend="programlisting_fobodyreg_block"/> </fo:flow> </fo:page-sequence> </fo:root></programlisting> </figure> <calloutlist> <callout arearefs="programlisting_fobodyreg_masterset"> <para>As the name suggests multiple layout definitions can appear here. In this example only one layout is defined.</para> </callout> <callout arearefs="programlisting_fobodyreg_simplepagelayout"> <para>Each layout definition carries a key attribute <code>master-name</code> which is unique with respect to all defined layouts appearing in <emphasis>the</emphasis> <tag class="starttag">fo:layout-master-set</tag>. We may thus call it a <emphasis>primary key</emphasis> attribute. The current layout definition's key has the value <code>simplePageLayout</code>. The length specifications appearing here are visualized in <xref linkend="paramRegBodyVisul"/> and correspond to the white rectangle.</para> </callout> <callout arearefs="programlisting_fobodyreg_regionbody"> <para>Each layout definition <emphasis>must</emphasis> have a body region, i.e. the region in which the document's main text flow appears. A layout definition <emphasis>may</emphasis> also define top, bottom and side regions as we will see <link linkend="paramHeadFoot">later</link>. 
The body region is shown with a pink background in <xref linkend="paramRegBodyVisul"/>.</para> </callout> <callout arearefs="programlisting_fobodyreg_pagesequence"> <para>An <abbrev xlink:href="http://www.w3.org/TR/2001/REC-xsl-20011015/slice6.html#fo-section">FO</abbrev> document may have multiple page sequences, for example one per chapter of a book. Each page sequence <emphasis>must</emphasis> reference an <emphasis>existing</emphasis> layout definition via its <code>master-reference</code> attribute. So we may regard this attribute as a foreign key targeting the set of all defined layout definitions.</para> </callout> <callout arearefs="programlisting_fobodyreg_flow"> <para>A flow allows us to define in which region output shall appear. In the current example only one layout exists, and it defines a single region of type body able to receive text output.</para> </callout> <callout arearefs="programlisting_fobodyreg_block"> <para>A <tag class="starttag">fo:block</tag> element may be compared to a paragraph element <tag class="starttag">p</tag> in HTML. The attribute <link xlink:href="http://www.w3.org/TR/xsl/#space-after">space-after</link>="2mm" adds a space of 2 mm after each <link xlink:href="http://www.w3.org/TR/xsl/#fo_block">fo:block</link> container.</para> </callout> </calloutlist> <para>The result looks like this:</para> <figure xml:id="paramRegBodyVisul"> <title>Parameterizing page- and region view port. All length dimensions are in mm.</title> <mediaobject> <imageobject> <imagedata fileref="Ref/Fig/overlay.fig"/> </imageobject> </mediaobject> </figure> </section> <section xml:id="headFoot"> <title>Headers and footers</title> <titleabbrev>Header/footer</titleabbrev> <para>Referring to <xref linkend="foRegionList"/> we now want to add fixed headers and footers which are frequently used for page numbers. In a textbook each page might have the current chapter's name in its header. 
This name should not change as long as the text below <link xlink:href="http://www.w3.org/TR/xsl/#fo_region-body">fo:region-body</link> still belongs to the same chapter. In <abbrev xlink:href="http://www.w3.org/TR/2001/REC-xsl-20011015/slice6.html#fo-section">FO</abbrev> this is achieved by:</para> <itemizedlist> <listitem> <para>Encapsulating each chapter's content in a <link xlink:href="http://www.w3.org/TR/xsl/#fo_page-sequence">fo:page-sequence</link> of its own.</para> </listitem> <listitem> <para>Defining the desired header text below <link xlink:href="http://www.w3.org/TR/xsl/#fo_static-content">fo:static-content</link> in the area defined by <link xlink:href="http://www.w3.org/TR/xsl/#fo_region-before">fo:region-before</link>.</para> </listitem> </itemizedlist> <para>The notion <link xlink:href="http://www.w3.org/TR/xsl/#fo_static-content">fo:static-content</link> refers to the fact that the content is constant (static) within the given page sequence. The new version reads:</para> <figure xml:id="paramHeadFoot"> <title>Parameterizing header and footer.</title> <programlisting><?xml version="1.0" encoding="utf-8"?> <fo:root xmlns:fo="http://www.w3.org/1999/XSL/Format" font-size="6pt"> <fo:layout-master-set> <fo:simple-page-master master-name="simplePageLayout" page-width = "50mm" page-height = "80mm" margin-top = "5mm" margin-bottom = "20mm" margin-left = "5mm" margin-right = "10mm"> <fo:region-body margin-top = "10mm" margin-bottom = "5mm" <co xml:id="programlisting_head_foot_bodydef"/> margin-left = "10mm" margin-right = "5mm"/> <fo:region-before extent="5mm"/> <co xml:id="programlisting_head_foot_beforedef"/> <fo:region-after extent="5mm"/> <co xml:id="programlisting_head_foot_afterdef"/> </fo:simple-page-master> </fo:layout-master-set> <fo:page-sequence master-reference="simplePageLayout"> <fo:static-content flow-name="xsl-region-before"> <co xml:id="programlisting_head_foot_beforeflow"/> <fo:block font-weight="bold" 
font-size="8pt">Headertext</fo:block> </fo:static-content> <fo:static-content flow-name="xsl-region-after"> <co xml:id="programlisting_head_foot_afterflow"/> <fo:block> <fo:page-number/> </fo:block> </fo:static-content> <fo:flow flow-name="xsl-region-body"> <fo:block space-after="8mm">Dumb text .. dumb text.</fo:block> <fo:block space-after="8mm">Dumb text .. dumb text.</fo:block> <fo:block space-after="8mm">More text .. more text.</fo:block> <fo:block space-after="8mm">More text .. more text.</fo:block> <fo:block space-after="8mm">More text .. more text.</fo:block> </fo:flow> </fo:page-sequence> </fo:root></programlisting> </figure> <calloutlist> <callout arearefs="programlisting_head_foot_bodydef"> <para>Defining the body region.</para> </callout> <callout arearefs="programlisting_head_foot_beforedef programlisting_head_foot_afterdef"> <para>Defining two regions at the top and bottom of each page. The <code>extent</code> attribute denotes the height of these regions. <emphasis>Caveat</emphasis>: The attribute <code>extent</code>'s value gets subtracted from the <code>margin-top</code> or <code>margin-bottom</code> value being defined in the corresponding <tag class="starttag">fo:region-body</tag> element. So if we consider for example the <tag>fo:region-before</tag> we have to obey:</para> <para>extent <= margin-top</para> <para>Otherwise we may not even see any output.</para> </callout> <callout arearefs="programlisting_head_foot_beforeflow"> <para>A <code>fo:static-content</code> denotes text portions which are decoupled from the <quote>usual</quote> text flow. For example as a book's chapter advances over multiple pages we expect the constant chapter's title to appear on top of each page. In the current example the static string <code>Headertext</code> will appear on each page's top for the whole <tag class="starttag">fo:page-sequence</tag> in which it is defined. 
Notice the <code>flow-name="xsl-region-before"</code> reference to the region defined in <coref linkend="programlisting_head_foot_beforedef"/>.</para> </callout> <callout arearefs="programlisting_head_foot_afterflow"> <para>We do the same here for the page's footer. Instead of static text we output <tag>fo:page-number</tag> yielding the current page's number.</para> <para>This time <code>flow-name="xsl-region-after"</code> references the region definition in <coref linkend="programlisting_head_foot_afterdef"/>. Actually the attribute <code>flow-name</code> is restricted to the following five values corresponding to all possible region definitions within a layout:</para> <informaltable> <?dbhtml table-width="50%" ?> <?dbfo table-width="50%" ?> <tgroup cols="2"> <colspec align="left" colwidth="1*"/> <colspec align="left" colwidth="1*"/> <tbody> <row> <entry><tag class="starttag">fo:region-body</tag></entry> <entry>xsl-region-body</entry> </row> <row> <entry><tag class="starttag">fo:region-before</tag></entry> <entry>xsl-region-before</entry> </row> <row> <entry><tag class="starttag">fo:region-after</tag></entry> <entry>xsl-region-after</entry> </row> <row> <entry><tag class="starttag">fo:region-start</tag></entry> <entry>xsl-region-start</entry> </row> <row> <entry><tag class="starttag">fo:region-end</tag></entry> <entry>xsl-region-end</entry> </row> </tbody> </tgroup> </informaltable> </callout> </calloutlist> <para>This results in two pages with page numbers 1 and 2:</para> <mediaobject> <imageobject> <imagedata fileref="Ref/Fig/headfoot.fig"/> </imageobject> </mediaobject> <para>The free chapter of <xref linkend="bibHarold04"/> contains additional information on extended <link xlink:href="http://www.cafeconleche.org/books/bible2/chapters/ch18.html#d1e2250">layout definitions</link>. 
The <orgname xlink:href="http://w3.org">W3C</orgname>, as the maintainer of the FO standard, defines the elements <link xlink:href="http://www.w3.org/TR/xsl/#fo_layout-master-set">fo:layout-master-set</link>, <link xlink:href="http://www.w3.org/TR/xsl/#fo_simple-page-master">fo:simple-page-master</link> and <link xlink:href="http://www.w3.org/TR/xsl/#fo_page-sequence">fo:page-sequence</link>.</para> </section> <section xml:id="foContainer"> <title>Important Objects</title> <section xml:id="fo_block"> <title><code>fo:block</code></title> <para>The FO standard borrows a lot from the CSS standard. Most formatting objects may have <link xlink:href="http://www.w3.org/TR/xsl/#section-N19349-Description-of-Property-Groups">CSS-like properties</link> with similar semantics; some properties have been added. We take a <link xlink:href="http://www.w3.org/TR/xsl/#fo_block">fo:block</link> container as an example:</para> <figure xml:id="blockInline"> <title>A <link xlink:href="http://www.w3.org/TR/xsl/#fo_block">fo:block</link> with a <link xlink:href="http://www.w3.org/TR/xsl/#fo_inline">fo:inline</link> descendant.</title> <mediaobject> <imageobject> <imagedata fileref="Ref/Fig/blockprop.fo.pdf"/> </imageobject> </mediaobject> <programlisting>... <fo:block font-weight='bold' border-bottom-style='dashed' border-style='solid' border='1mm'>A lot of attributes and <fo:inline background-color='black' color='white'>inverted</fo:inline> text.</fo:block> ...</programlisting> </figure> <para>The <link xlink:href="http://www.w3.org/TR/xsl/#fo_inline">fo:inline</link> descendant serves as a means to change the <quote>current</quote> property set. 
In HTML/CSS this may be achieved by using the <code>SPAN</code> tag:</para> <programlisting><!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"> <html> <head> <title>Blocks/spans and CSS</title> </head> <body> <h1>Blocks/spans and CSS</h1> <p style="font-weight: bold; border: 1mm; border-style: solid; border-bottom-style: dashed;" >A lot of attributes and <span style="color: white;background-color: black;" >inverted</span> text.</p> </body> </html></programlisting> <para>Although the properties are encapsulated in a <code>style</code> attribute we find a one-to-one correspondence between FO and CSS in this case. The HTML rendering works as expected:<mediaobject> <imageobject> <imagedata fileref="Ref/Screen/mozparaspancss.screen.png"/> </imageobject> </mediaobject></para> </section> <section xml:id="fo_list"> <title>Lists</title> <para>The simplest type of list is the unlabeled (itemized) list as expressed by the <code>UL</code>/<code>LI</code> tags in HTML. FO allows a much more detailed parametrization regarding indents and distances between labels and item content. Relevant elements are <link xlink:href="http://www.w3.org/TR/xsl/#fo_list-block">fo:list-block</link>, <link xlink:href="http://www.w3.org/TR/xsl/#fo_list-item">fo:list-item</link> and <link xlink:href="http://www.w3.org/TR/xsl/#fo_list-item-body">fo:list-item-body</link>. The drawback is a more complex setup for <quote>default</quote> lists:</para> <figure xml:id="listItemize"> <title>An itemized list and result.</title> <programlisting>... 
<fo:list-block provisional-distance-between-starts="2mm"> <fo:list-item> <fo:list-item-label end-indent="label-end()"> <fo:block>&#8226;</fo:block> </fo:list-item-label> <fo:list-item-body start-indent="body-start()"> <fo:block>Flowers</fo:block> </fo:list-item-body> </fo:list-item> <fo:list-item> <fo:list-item-label end-indent="label-end()"> <fo:block>&#8226;</fo:block> </fo:list-item-label> <fo:list-item-body start-indent="body-start()"> <fo:block>Animals</fo:block> </fo:list-item-body> </fo:list-item> </fo:list-block> ...</programlisting> <mediaobject> <imageobject> <imagedata align="left" fileref="Ref/Fig/itemize.fo.pdf"/> </imageobject> </mediaobject> </figure> <para>The result looks somewhat primitive in relation to the amount of source code it necessitates. The power of these constructs shows up when trying to format nested lists of possibly different types like enumerations or definition lists under the requirement of typographical excellence. More complex examples are presented in <link xlink:href="http://www.cafeconleche.org/books/bible2/chapters/ch18.html#d1e4979">Xmlbible book</link> of <xref linkend="bibHarold04"/>.</para> </section> <section xml:id="leaderRule"> <title>Leaders and rules</title> <titleabbrev>Leaders/rules</titleabbrev> <para>Sometimes adjustable horizontal space between two neighbouring objects has to be filled e.g. in a book's table of contents. The <link xlink:href="http://www.w3.org/TR/xsl/#fo_leader">fo:leader</link> serves this purpose:</para> <figure xml:id="leaderToc"> <title>Two simulated entries in a table of contents.</title> <programlisting>... 
<fo:block text-align-last='justify'>Valid XML<fo:leader leader-pattern="dots"/> page 7</fo:block> <fo:block text-align-last='justify'>XSL <fo:leader leader-pattern='dots'/> page 42</fo:block> ...</programlisting> <mediaobject> <imageobject> <imagedata align="left" fileref="Ref/Fig/leader.fo.pdf"/> </imageobject> </mediaobject> </figure> <para>The attribute value <link xlink:href="http://www.w3.org/TR/xsl/#text-align-last">text-align-last</link> = <code>'justify'</code> forces the <link xlink:href="http://www.w3.org/TR/xsl/#fo_block">fo:block</link> to extend to the available width of the current <link xlink:href="http://www.w3.org/TR/xsl/#fo_region-body">fo:region-body</link> area. The <link xlink:href="http://www.w3.org/TR/xsl/#fo_leader">fo:leader</link> inserts the necessary amount of content of the type specified by <link xlink:href="http://www.w3.org/TR/xsl/#leader-pattern">leader-pattern</link> to fill the gap between its neighbouring components. This principle can be extended to multiple objects:</para> <figure xml:id="leaderMulti"> <title>Four entries separated by equal amounts of dotted space.</title> <programlisting><fo:block text-align-last='justify'>A<fo:leader leader-pattern="dots"/>B<fo:leader leader-pattern="dots"/>C<fo:leader leader-pattern="dots"/>D</fo:block></programlisting> <mediaobject> <imageobject> <imagedata fileref="Ref/Fig/leadermulti.fo.pdf"/> </imageobject> </mediaobject> </figure> <para>An <link xlink:href="http://www.w3.org/TR/xsl/#fo_leader">fo:leader</link> may also be used to draw horizontal lines to separate objects. In this case there are no neighbouring components within the <quote>current</quote> line in which the <link xlink:href="http://www.w3.org/TR/xsl/#fo_leader">fo:leader</link> appears.
This is frequently used to draw a border between <code>xsl-region-body</code> and <code>xsl-region-before</code> and/or <code>xsl-region-after</code>:</para> <figure xml:id="leaderSeparate"> <title>A horizontal line separator between header and body of a page.</title> <programlisting>... <fo:page-sequence master-reference="simplePageLayout"> <fo:static-content flow-name="xsl-region-before"> <fo:block text-align-last='justify'>FO<fo:leader/>page 5</fo:block> <fo:block text-align-last='justify'> <fo:leader leader-pattern="rule" leader-length="100%"/> </fo:block> </fo:static-content> <fo:flow flow-name="xsl-region-body"> <fo:block>Some body text ...</fo:block> </fo:flow> </fo:page-sequence>...</programlisting> <mediaobject> <imageobject> <imagedata align="left" fileref="Ref/Fig/separate.fo.pdf"/> </imageobject> </mediaobject> </figure> <para>Note the empty leader <code><</code> <link xlink:href="http://www.w3.org/TR/xsl/#fo_leader">fo:leader</link> <code>/></code> between the <quote> <code>FO</code> </quote> and the <quote>page 5</quote> text node. It inserts horizontal whitespace so that the page number is pushed to the header's right edge. This is in accordance with the <link xlink:href="http://www.w3.org/TR/xsl/#leader-pattern">leader-pattern</link> attribute's default value <code>space</code>.</para> </section> <section xml:id="pageNumbering"> <title>Page numbers</title> <para>We already saw an example of page numbering via <link xlink:href="http://www.w3.org/TR/xsl/#fo_page-number">fo:page-number</link> in <xref linkend="paramHeadFoot"/>. Sometimes a different style for page numbering is desired. The default page numbering style may be changed by means of the <link xlink:href="http://www.w3.org/TR/xsl/#fo_page-sequence">fo:page-sequence</link> element's attribute <link xlink:href="http://www.w3.org/TR/xsl/#format">format</link>.
For a closer explanation the <link xlink:href="http://www.w3.org/TR/2007/REC-xslt20-20070123/#convert">W3C XSLT standards documentation</link> may be consulted:</para> <figure xml:id="pageNumberingRoman"> <title>Roman style page numbers.</title> <programlisting>... <fo:page-sequence format="i" master-reference="simplePageLayout"> <fo:static-content flow-name="xsl-region-after"> <fo:block text-align-last='justify'> <fo:leader leader-pattern="rule" leader-length="100%"/> </fo:block> <fo:block font-weight="bold"> <fo:page-number/> </fo:block> </fo:static-content> <fo:flow flow-name="xsl-region-body"> <fo:block>Some text...</fo:block> <fo:block>More text, more text, more text.</fo:block> <fo:block>More text, more text, more text.</fo:block> <fo:block>Enough text.</fo:block> </fo:flow> </fo:page-sequence> ...</programlisting> <mediaobject> <imageobject> <imagedata align="left" fileref="Ref/Fig/pageStack.fig"/> </imageobject> </mediaobject> </figure> </section> <section xml:id="foMarker"> <title>Marker</title> <figure xml:id="dictionary"> <title>A dictionary with running page headers.</title> <programlisting>...
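<!-- Reading aid (a sketch, not part of the original listing): each fo:marker in
     the flow below stores the initial letter of its dictionary entry under the
     class name "alpha". A marker is the first child of its parent and is not
     rendered in place. The two fo:retrieve-marker elements in the header copy
     the marker content of the first and of the last entry starting on the
     current page, which gives a running header of the form
     first letter, hyphen, last letter. -->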
<fo:page-sequence master-reference="simplePageLayout"> <fo:static-content flow-name="xsl-region-before"> <fo:block font-weight="bold"> <fo:retrieve-marker retrieve-class-name="alpha" retrieve-position="first-starting-within-page" />-<fo:retrieve-marker retrieve-position="last-starting-within-page" retrieve-class-name="alpha"/> </fo:block> <fo:block text-align-last='justify'> <fo:leader leader-pattern="rule" leader-length="100%"/></fo:block> </fo:static-content> <fo:flow flow-name="xsl-region-body"> <fo:block> <fo:marker marker-class-name="alpha">A </fo:marker>Ant</fo:block> <fo:block> <fo:marker marker-class-name="alpha">B </fo:marker>Bug</fo:block> <fo:block> <fo:marker marker-class-name="alpha">L </fo:marker>Lion</fo:block> <fo:block> <fo:marker marker-class-name="alpha">N </fo:marker>Nose</fo:block> <fo:block> <fo:marker marker-class-name="alpha">P </fo:marker>Peg</fo:block> </fo:flow> </fo:page-sequence> ...</programlisting> <mediaobject> <imageobject> <imagedata align="left" fileref="Ref/Fig/dictionaryStack.fig"/> </imageobject> </mediaobject> </figure> </section> <section xml:id="foIntRef"> <title>Internal references</title> <titleabbrev>References</titleabbrev> <para>Regarding printed documents we may define two categories of document internal references:</para> <variablelist> <varlistentry> <term><emphasis>Page number references</emphasis></term> <listitem> <para>This is the <quote>classical</quote> type of a reference e.g. in books. An author refers the reader to a distant location by writing <quote>... see further explanation in section 4.5 on page 234</quote>. A book's table of contents assigning page numbers to topics is another example. This way the implementation of a reference relies solely on the features a printed document offers.</para> </listitem> </varlistentry> <varlistentry> <term><emphasis>Hypertext references</emphasis></term> <listitem> <para>This way of implementing references utilizes features of (online) viewers for printable documents. 
For example PDF viewers like <productname xlink:href="http://www.adobe.com">Adobe's Acrobat reader</productname> or the evince application are able to follow hypertext links in a fashion known from HTML browsers. This viewer feature is based on hypertext capabilities defined in Adobe's PDF de-facto standard.</para> </listitem> </varlistentry> </variablelist> <para>Of course the second type of references is limited to people who use an online viewer application instead of reading a document from physical paper.</para> <para>We now show the implementation of <abbrev xlink:href="http://www.w3.org/TR/2001/REC-xsl-20011015/slice6.html#fo-section">FO</abbrev> based page references. As already discussed for <link xlink:href="http://www.w3.org/TR/xml#id">ID</link> / <link xlink:href="http://www.w3.org/TR/xml#idref">IDREF</link> pairs we need a link destination (anchor) and a link source. The <abbrev xlink:href="http://www.w3.org/TR/2001/REC-xsl-20011015/slice6.html#fo-section">FO</abbrev> standard uses the same anchor implementation as in XML for <link xlink:href="http://www.w3.org/TR/xml#id">ID</link> typed attributes: <abbrev xlink:href="http://www.w3.org/TR/2001/REC-xsl-20011015/slice6.html#fo-section">FO</abbrev> objects <emphasis>may</emphasis> have an attribute <link xlink:href="http://www.w3.org/TR/xsl/#id">id</link> with a document wide unique value. The <abbrev xlink:href="http://www.w3.org/TR/2001/REC-xsl-20011015/slice6.html#fo-section">FO</abbrev> element <link xlink:href="http://www.w3.org/TR/xsl/#fo_page-number-citation">fo:page-number-citation</link> is used to actually create a page reference via its attribute <link xlink:href="http://www.w3.org/TR/xsl/#ref-id">ref-id</link>:</para> <figure xml:id="refJavaXml"> <title>Two blocks mutually referencing each other's page.</title> <programlisting>... <fo:flow flow-name='xsl-region-body'> <fo:block id='xml'>Java section see page <fo:page-number-citation ref-id='java'/>.
</fo:block> <fo:block id='java'>XML section see page <fo:page-number-citation ref-id='xml'/>. </fo:block> </fo:flow> ...</programlisting> <mediaobject> <imageobject> <imagedata align="left" fileref="Ref/Fig/pagerefStack.fig"/> </imageobject> </mediaobject> </figure> <para>NB: Be careful when defining <link xlink:href="http://www.w3.org/TR/xsl/#id">id</link> attributes for objects being descendants of <link xlink:href="http://www.w3.org/TR/xsl/#fo_static-content">fo:static-content</link> nodes. Such objects typically appear on multiple pages and are therefore not unique anchors. A reference carrying such an id value thus actually refers to n >= 1 objects on n different pages. Typically a user agent will choose the first object of this set when the link is clicked. So in effect the parent <link xlink:href="http://www.w3.org/TR/xsl/#fo_page-sequence">fo:page-sequence</link> is chosen as the effective link target.</para> <para>The element <link xlink:href="http://www.w3.org/TR/xsl/#fo_basic-link">fo:basic-link</link> creates PDF hypertext links.
We extend the previous example:</para> <figure xml:id="refJavaXmlHyper"> <title>Two blocks with mutual page- and hypertext references.</title> <programlisting><fo:flow flow-name='xsl-region-body'> <fo:block id='xml'>Java section see <fo:basic-link color="blue" internal-destination="java">page <fo:page-number-citation ref-id='java'/>.</fo:basic-link></fo:block> <fo:block id='java'>XML section see <fo:basic-link color="blue" internal-destination="xml">page <fo:page-number-citation ref-id='xml'/>.</fo:basic-link></fo:block> </fo:flow></programlisting> <mediaobject> <imageobject> <imagedata fileref="Ref/Fig/pagerefhyperStack.fig"/> </imageobject> </mediaobject> </figure> </section> <section xml:id="pdfBookmarks"> <title>PDF bookmarks</title> <titleabbrev>Bookmarks</titleabbrev> <para>The PDF specification allows the definition of so-called bookmarks offering explorer-like navigation:</para> <mediaobject> <imageobject> <imagedata fileref="Ref/Screen/pdfbookmarks.screen.png"/> </imageobject> </mediaobject> <para>PDF bookmarks are <link xlink:href="http://www.w3.org/TR/2006/REC-xsl11-20061205/#d0e14206">part of the XSL-FO 1.1</link> standard. Some <abbrev xlink:href="http://www.w3.org/TR/2001/REC-xsl-20011015/slice6.html#fo-section">FO</abbrev> processors still continue to use proprietary solutions for bookmark creation with respect to the older <abbrev xlink:href="http://www.w3.org/TR/2001/REC-xsl-20011015/slice6.html#fo-section">FO</abbrev> 1.0 standard.
For details of bookmark extensions by <orgname>RenderX</orgname>'s processor see <link xlink:href="http://www.renderx.com/tutorial.html#PDF_Bookmarks">XEP's documentation</link>.</para> </section> </section> <section xml:id="xml2fo"> <title>Constructing <abbrev xlink:href="http://www.w3.org/TR/2001/REC-xsl-20011015/slice6.html#fo-section">FO</abbrev> from XML documents</title> <titleabbrev><abbrev xlink:href="http://www.w3.org/TR/2001/REC-xsl-20011015/slice6.html#fo-section">FO</abbrev> from XML</titleabbrev> <para>So far we have learnt some basic <abbrev xlink:href="http://www.w3.org/TR/2001/REC-xsl-20011015/slice6.html#fo-section">FO</abbrev> elements. As with HTML we typically generate FO code from other sources rather than crafting it by hand. The general picture is:</para> <figure xml:id="htmlFoProduction"> <title>Different target formats from common source.</title> <mediaobject> <imageobject> <imagedata fileref="Ref/Fig/crossmedia.fig" scale="65"/> </imageobject> <caption> <para>We may generate both online and printed documentation from a common source. This requires style sheets for each of the desired destination formats.</para> </caption> </mediaobject> </figure> <para>We discussed the <abbrev xlink:href="http://www.w3.org/TR/2001/REC-xsl-20011015/slice6.html#fo-section">FO</abbrev> standard as an input format for printable output production by a renderer. In this respect an <abbrev xlink:href="http://www.w3.org/TR/2001/REC-xsl-20011015/slice6.html#fo-section">FO</abbrev> document is similar to HTML, which is a format to be rendered by a web browser for visual (screen oriented) output production. The transformation from an XML source (e.g. a memo document) to <abbrev xlink:href="http://www.w3.org/TR/2001/REC-xsl-20011015/slice6.html#fo-section">FO</abbrev> is still missing. As for HTML we may use <abbrev xlink:href="http://www.w3.org/Style/XSL">XSL</abbrev> as a transformation means.
We generate the sender's surname from a memo document instance:</para> <figure xml:id="memo2fosurname"> <title>Generating a sender's surname for printing.</title> <programlisting><?xml version="1.0" encoding="utf-8"?> <xsl:stylesheet version="1.0" xmlns:fo="http://www.w3.org/1999/XSL/Format" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"> <xsl:output method="xml" indent="yes"/> <xsl:template match="/"> <fo:root> <fo:layout-master-set> <fo:simple-page-master master-name="simplePageLayout" page-width="294mm" page-height="210mm" margin="5mm"> <fo:region-body margin="15mm"/> </fo:simple-page-master> </fo:layout-master-set> <fo:page-sequence master-reference="simplePageLayout"> <fo:flow flow-name="xsl-region-body"> <fo:block font-size="20pt"> <xsl:text>Sender:</xsl:text> <fo:inline font-weight='bold'> <xsl:value-of select="memo/from/surname"/> </fo:inline> </fo:block> </fo:flow> </fo:page-sequence> </fo:root> </xsl:template> </xsl:stylesheet></programlisting> </figure> <para>A suitable XML document instance reads:</para> <figure xml:id="memoMessage"> <title>A <code>memo</code> document instance.</title> <programlisting><?xml version="1.0" ?> <!DOCTYPE memo SYSTEM "memo.dtd"> <memo> <from> <name>Martin</name> <surname>Goik</surname> </from> <to> <name>Adam</name> <surname>Hacker</surname> </to> <to> <name>Eve</name> <surname>Intruder</surname> </to> <date year="2005" month="1" day="6"/> <subject>Firewall problems</subject> <content> <para>Thanks for your excellent work.</para> <para>Our firewall is definitely broken!</para> </content> </memo></programlisting> </figure> <para>Some remarks:</para> <orderedlist> <listitem> <para>The <link xlink:href="http://www.w3.org/TR/2007/REC-xslt20-20070123/#element-stylesheet">xsl:stylesheet</link> element contains a namespace definition for the target FO document's namespace, namely:</para> <programlisting>xmlns:fo="http://www.w3.org/1999/XSL/Format"</programlisting> <para>This is required to use elements like <link
xlink:href="http://www.w3.org/TR/xsl/#fo_block">fo:block</link> belonging to the FO namespace.</para> </listitem> <listitem> <para>The option value <code>indent="yes"</code> in <link xlink:href="http://www.w3.org/TR/2007/REC-xslt20-20070123/#element-output">xsl:output</link> is usually set to <code>"no"</code> in a production environment to avoid whitespace related problems.</para> </listitem> <listitem> <para>The generation of a print format like PDF is actually a two-step process. To generate <filename>message.pdf</filename> from <filename>message.xml</filename> by a stylesheet <filename>memo2fo.xsl</filename> we need the following calls:</para> <variablelist> <varlistentry> <term><emphasis>XML document instance to FO</emphasis></term> <listitem> <programlisting>xml2xml message.xml memo2fo.xsl -o message.fo</programlisting> </listitem> </varlistentry> <varlistentry> <term><emphasis>FO to PDF</emphasis></term> <listitem> <programlisting>fo2pdf -fo message.fo -pdf message.pdf</programlisting> </listitem> </varlistentry> </variablelist> <mediaobject> <imageobject> <imagedata fileref="Ref/Fig/xml2fo2pdf.fig"/> </imageobject> </mediaobject> <para>When debugging of the intermediate <abbrev xlink:href="http://www.w3.org/TR/2001/REC-xsl-20011015/slice6.html#fo-section">FO</abbrev> file is not required both steps may be combined into a single call:</para> <programlisting>fo2pdf -xml message.xml -xsl memo2fo.xsl -pdf message.pdf</programlisting> </listitem> </orderedlist> </section> <section xml:id="foCatalog"> <title>Formatting a catalog.</title> <titleabbrev>A catalog</titleabbrev> <para>We now take the <link linkend="climbingCatalog">climbing catalog example</link> with prices being added and incrementally create a series of PDF versions improving from one version to another.</para> <qandaset role="exercise"> <title>A first PDF version of the catalog</title> <qandadiv> <qandaentry xml:id="idCatalogStart"> <question> <para>Write an <abbrev xlink:href="http://www.w3.org/Style/XSL">XSL</abbrev> script to generate a starting version <filename
xlink:href="Ref/src/Dom/climbenriched.start.pdf">climbenriched.start.pdf</filename>.</para> </question> <answer> <programlisting><?xml version="1.0" encoding="utf-8"?> <xsl:stylesheet version="1.0" xmlns:fo="http://www.w3.org/1999/XSL/Format" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"> <xsl:output method="xml" indent="yes"/> <xsl:template match="/"> <fo:root font-size="10pt"> <fo:layout-master-set> <fo:simple-page-master master-name="productPage" page-width="80mm" page-height="110mm" margin="5mm"> <fo:region-body margin="15mm"/> <fo:region-before extent="10mm"/> </fo:simple-page-master> </fo:layout-master-set> <xsl:apply-templates select="catalog/product" /> </fo:root> </xsl:template> <xsl:template match="product"> <fo:page-sequence master-reference="productPage"> <fo:static-content flow-name="xsl-region-before"> <fo:block font-weight="bold"> <xsl:value-of select="title"/> </fo:block> </fo:static-content> <fo:flow flow-name="xsl-region-body"> <xsl:apply-templates select="description/para"/> <fo:block>Price:<xsl:value-of select="@price"/></fo:block> <fo:block>Order no:<xsl:value-of select="@id"/></fo:block> </fo:flow> </fo:page-sequence> </xsl:template> <xsl:template match="para"> <fo:block space-after="10px"> <xsl:value-of select="."/> </fo:block> </xsl:template> </xsl:stylesheet></programlisting> </answer> </qandaentry> <qandaentry xml:id="idCatalogProduct"> <question> <label>Header, page numbers and table formatting</label> <para>Extend <xref linkend="idCatalogStart"/> by adding page numbers. The order number and prices shall be formatted as tables. Add a ruler to each page's head. 
The result should look like <filename xlink:href="Ref/src/Dom/climbenriched.product.pdf">climbenriched.product.pdf</filename></para> </question> <answer> <para>Solution see <filename xlink:href="Ref/src/Dom/catalog2fo.product.xsl">catalog2fo.product.xsl</filename>.</para> </answer> </qandaentry> <qandaentry xml:id="idCatalogToc"> <question> <label>A table of contents.</label> <para>Each product description's page number shall appear in a table of contents together with the product's <code>title</code> as in <filename xlink:href="Ref/src/Dom/climbenriched.toc.pdf">climbenriched.toc.pdf</filename>.</para> </question> <answer> <para>Solution see <filename xlink:href="Ref/src/Dom/catalog2fo.toc.xsl">catalog2fo.toc.xsl</filename>.</para> </answer> </qandaentry> <qandaentry xml:id="idCatalogToclink"> <question> <label>A table of contents with hypertext links.</label> <para>The table of contents' entries may offer hypertext features to supporting browsers as in <filename xlink:href="Ref/src/Dom/climbenriched.toclink.pdf">climbenriched.toclink.pdf</filename>. In addition include the document's <tag class="starttag">introduction</tag>.</para> </question> <answer> <para>Solution see <filename xlink:href="Ref/src/Dom/catalog2fo.toclink.xsl">catalog2fo.toclink.xsl</filename>.</para> </answer> </qandaentry> <qandaentry xml:id="idCatalogFinal"> <question> <label>A final version.</label> <para>Add the following features:</para> <orderedlist> <listitem> <para>Number the table of contents starting with page i, ii, iii, iv and so on. Start the product descriptions with page 1. On each page's footer a text <quote>page xx of yy</quote> shall be displayed. 
This requires the definition of an anchor <code>id</code> on the <abbrev xlink:href="http://www.w3.org/TR/2001/REC-xsl-20011015/slice6.html#fo-section">FO</abbrev> document's last page.</para> </listitem> <listitem> <para>Add PDF bookmarks by using <orgname>XEP</orgname>'s <abbrev xlink:href="http://www.w3.org/TR/2001/REC-xsl-20011015/slice6.html#fo-section">FO</abbrev> extensions. This requires the namespace declaration <code>xmlns:rx="http://www.renderx.com/XSL/Extensions"</code> in the XSLT script's header.</para> </listitem> </orderedlist> <para>The result may look like <filename xlink:href="Ref/src/Dom/climbenriched.final.pdf">climbenriched.final.pdf</filename>. N.B.: It may take some effort to achieve this result. This effort is left to the <emphasis>interested</emphasis> participants.</para> </question> <answer> <para>Solution see <filename xlink:href="Ref/src/Dom/catalog2fo.toclink.xsl">catalog2fo.toclink.xsl</filename>.</para> </answer> </qandaentry> </qandadiv> </qandaset> </section> </chapter> <chapter xml:id="chapter_entities"> <title>Entities</title> <para>Entities target the <emphasis>physical</emphasis> structure of <abbrev xlink:href="http://en.wikipedia.org/wiki/Document_Type_Declaration">DTD</abbrev>s and document instances. Both <abbrev xlink:href="http://en.wikipedia.org/wiki/Document_Type_Declaration">DTD</abbrev>s and XML document instances may be <emphasis>physically</emphasis> composed of smaller pieces:</para> <itemizedlist> <listitem> <para><abbrev xlink:href="http://en.wikipedia.org/wiki/Document_Type_Declaration">DTD</abbrev>s often reuse standard components. For example many <abbrev xlink:href="http://en.wikipedia.org/wiki/Document_Type_Declaration">DTD</abbrev>s adopted the HTML table model. 
Entities offer an elegant way to include such building blocks into other <abbrev xlink:href="http://en.wikipedia.org/wiki/Document_Type_Declaration">DTD</abbrev>s.</para> </listitem> <listitem> <para>A book may <emphasis>logically</emphasis> consist of 10 chapters. We may use entities to represent a book by a single master document plus 10 separate XML documents representing each chapter.</para> </listitem> </itemizedlist> <para>In correspondence with these two examples we first note that two different types of entities exist:</para> <glosslist> <glossentry> <glossterm>Parameter entities</glossterm> <glossdef> <para>May only be used within <abbrev xlink:href="http://en.wikipedia.org/wiki/Document_Type_Declaration">DTD</abbrev>s but not in document instances.</para> </glossdef> </glossentry> <glossentry> <glossterm>General entities</glossterm> <glossdef> <para>May be used both in <abbrev xlink:href="http://en.wikipedia.org/wiki/Document_Type_Declaration">DTD</abbrev>s and in document instances.</para> </glossdef> </glossentry> </glosslist> <para>Both types of entities exist in two flavors, <quote>internal</quote> and <quote>external</quote>, depending on whether they are defined within a document itself or in an external document being referenced.</para> <section xml:id="section_parameterentity"> <title xml:id="section_parameterentities">Parameter entities</title> <para>We consider the following DTD:</para> <figure xml:id="figure_nonmodular_doc"> <title>A DTD <filename>doc.dtd</filename> describing document instances consisting of paragraphs and figures</title> <programlisting><!ELEMENT doc (para|figure)* <co xml:id="programlisting_figure1_doc"/>> <!ELEMENT para (#PCDATA) > <!ELEMENT figure (caption, image) <co xml:id="programlisting_figure1_figure"/>> <!ELEMENT caption (#PCDATA) <co xml:id="programlisting_figure1_caption"/>> <!ELEMENT image EMPTY > <!ATTLIST image src CDATA #REQUIRED <co xml:id="programlisting_figure1_image_src"/>></programlisting> </figure>
<calloutlist> <callout arearefs="programlisting_figure1_doc"> <para>A document consists of an arbitrary sequence of paragraphs and figures.</para> </callout> <callout arearefs="programlisting_figure1_figure"> <para>A figure has a caption describing the image's content and an <tag class="starttag">image</tag> node. The formatting expectation may be defined as an image with a caption being placed below.</para> </callout> <callout arearefs="programlisting_figure1_caption"> <para>A textual description of the corresponding image.</para> </callout> <callout arearefs="programlisting_figure1_image_src"> <para>The attribute <tag class="attribute">src</tag> contains a URI to image data.</para> </callout> </calloutlist> <para>An <filename>example.xml</filename> document instance looks like:</para> <programlisting><!DOCTYPE doc SYSTEM "doc.dtd"> <doc> <para>A paragraph</para> <figure> <caption>A nice image</caption> <image src="image.png"/> </figure> </doc></programlisting> <para>In a <quote>real</quote> DTD a <tag class="element">figure</tag> element will have more complexity. An author of a different DTD describing a fashion catalog may want to reuse the <tag class="element">figure</tag> element as a component.
This may be achieved by moving all <tag class="element">figure</tag> related definitions into a separate file <filename>figure.mod</filename>:</para> <figure xml:id="figureEntityDef"> <title>The <tag class="element">figure</tag> element implemented in an independent DTD module <filename>figure.mod</filename></title> <programlisting><!ELEMENT figure (caption, image) > <!ELEMENT caption (#PCDATA) > <!ELEMENT image EMPTY > <!ATTLIST image src CDATA #REQUIRED ></programlisting> </figure> <para>Now we may include this module in a master DTD:</para> <figure xml:id="figure_doc_master"> <title>The master DTD which includes the <code>figure.mod</code> module</title> <programlisting><!ENTITY % <co xml:id="figure_doc_master_pentity"/>figure.mod <co xml:id="figure_doc_master_identifier"/>SYSTEM <co xml:id="figure_doc_master_keyword_system"/>"figure.mod" <co xml:id="figure_doc_master_entity_filename"/>> %figure.mod; <co xml:id="figure_doc_master_include"/> <!ELEMENT doc (para|figure)* > <!ELEMENT para (#PCDATA) ></programlisting> <calloutlist> <callout arearefs="figure_doc_master_pentity"> <para>The percent sign <quote>%</quote> defines the following identifier to be a <emphasis>parameter</emphasis> entity. 
Without this character it would define a <link linkend="section_generalentities">general</link> entity.</para> </callout> <callout arearefs="figure_doc_master_identifier"> <para>The entity to be defined will be represented by the local identifier <code>figure.mod</code>.</para> </callout> <callout arearefs="figure_doc_master_keyword_system"> <para>The <code>SYSTEM</code> keyword states that the following content is a reference to an <emphasis>external</emphasis> object.</para> </callout> <callout arearefs="figure_doc_master_entity_filename"> <para><filename>figure.mod</filename> is just the filename of a DTD module containing all definitions of the <tag class="element">figure</tag> element.</para> </callout> <callout arearefs="figure_doc_master_include"> <para>The identifier <code>figure.mod</code> represents the declarations contained in the parameter entity. We have to <emphasis>include</emphasis> them in the current DTD in order to make them part of it. In C/C++ the term <code>%figure.mod;</code> would read <code>#include "figure.mod"</code>.</para> </callout> </calloutlist> </figure> <para>This file functions as a complete replacement for the non modular DTD presented at the <link linkend="figure_nonmodular_doc">beginning</link>. This way <filename>figure.mod</filename> acts as a <quote>building block</quote> that may be reused in other <abbrev xlink:href="http://en.wikipedia.org/wiki/Document_Type_Declaration">DTD</abbrev>s as well. We note that using an entity in an XML DTD is a two step process:</para> <itemizedlist> <listitem> <para>Declaration of an entity.</para> </listitem> <listitem> <para><quote>Use</quote> of a declared entity.</para> </listitem> </itemizedlist> <para>Many programming languages combine these two steps into one.
Examples are:</para> <glosslist> <glossentry> <glossterm>C/C++:</glossterm> <glossdef> <para><code>#include "stdio.h"</code></para> </glossdef> </glossentry> <glossentry> <glossterm><link linkend="gloss_Java"><trademark>Java</trademark></link>:</glossterm> <glossdef> <para><code>import de.hdm_stuttgart.xml;</code></para> </glossdef> </glossentry> </glosslist> <para>On the other hand there are similarities concerning the way entities are handled. If we take C/C++ as an example we observe the following situation: A compiler reads a <quote>master</quote> file and includes (possibly recursively) sets of other files. This part of the compilation process is carried out by a separate program called a preprocessor which may be invoked independently. As an example we take a <quote>master</quote> file <filename>main.c</filename> written in the programming language C:</para> <programlisting language="c">/* no #include <stdio.h> for simplicity */ #include "maximum.h" void main(char **args){ printf("The maximum of %d and %d is %d", 3, 5, <emphasis role="bold">max(3,5)</emphasis>); }</programlisting> <para>The referenced file <filename>maximum.h</filename> being included contains a single line defining the macro <code>max(...)</code> appearing in the <code>printf</code> statement:</para> <programlisting language="c">#define <emphasis role="bold">max(a, b)</emphasis> ( (a)>(b) ? (a) : (b) )</programlisting> <para>Despite some warning messages we may compile and execute <code>main.c</code>:</para> <programlisting><computeroutput>[goik@mupter ~]$ cc -o main main.c ... warnings omitted ... [goik@mupter ~]$ ./main The maximum of 3 and 5 is 5</computeroutput></programlisting> <para>Now we may also execute the C preprocessor separately:</para> <programlisting>[goik@mupter ~]$ cpp -P main.c void main(char **args){ printf("The maximum of %d and %d is %d", 3, 5, <emphasis role="bold">( (3)>(5) ?
(3) : (5) )</emphasis>); }</programlisting> <para>We observe that the preprocessor has resolved the dependency from <filename>main.c</filename> to <filename>maximum.h</filename> by replacing the macro call <code>max(3,5)</code> inline with <code>( (3)>(5) ? (3) : (5) )</code>. This output is then read by the <quote>real</quote> compiler to create an executable binary file <code>main</code>.</para> <figure xml:id="cppCompilerTwoStep"> <title>Two processing steps building an executable from a C file</title> <mediaobject> <imageobject> <imagedata fileref="Ref/Fig/cpp.fig"/> </imageobject> </mediaobject> </figure> <para>An XML parser validating a document will do the same both regarding the document instance itself and any entities which have to be resolved. The first step before any real parsing is executed by the <emphasis>entity resolver</emphasis> which can be compared to a C preprocessor. We reconsider our figure DTD example:</para> <figure xml:id="entityResolv"> <title>The entity resolving process. The dashed arrows show <code>SYSTEM</code> references to external entities.</title> <mediaobject> <imageobject> <imagedata fileref="Ref/Fig/entityresolve.fig"/> </imageobject> </mediaobject> </figure> <para>The actual XML validating parser will examine the output <filename>resolve.xml</filename> from the entity resolver.</para> <para>As we noted in the introduction to this chapter entities may also be of type internal. This means they are defined within a document itself rather than residing in an external object.
We consider the following example:</para> <programlisting><!ENTITY % <emphasis role="bold">url</emphasis> "CDATA" <co xml:id="programlisting_internparam_urlent"/>> <!ELEMENT doc (para|figure)* > <!ELEMENT para (#PCDATA) > <!ELEMENT figure (caption, image) > <!ELEMENT caption (#PCDATA) > <!ELEMENT image EMPTY > <!ATTLIST image src %<emphasis role="bold">url</emphasis>;<co xml:id="programlisting_internparam_urluse"/> #REQUIRED ></programlisting> <calloutlist> <callout arearefs="programlisting_internparam_urlent"> <para>An internal parameter entity <tag class="paramentity">url</tag> is defined. Since the <code>SYSTEM</code> keyword is absent the definition is taken <quote>as is</quote>.</para> </callout> <callout arearefs="programlisting_internparam_urluse"> <para>The internal entity <tag class="paramentity">url</tag> is used. The entity resolver will replace this term by the string <code>CDATA</code>.</para> </callout> </calloutlist> <para>From a practical point of view we might argue that the given code does not make sense. Actually the entity <tag class="paramentity">url</tag> does a kind of <quote>copy/paste</quote> action. There seems to be no benefit since the parser still sees the attribute type <code>CDATA</code> and will thus still accept invalid <link xlink:href="http://www.w3.org/Addressing">URLs</link> like <code>http://c:\mydir\</code>.</para> <para>The actual gain is readability: In a DTD attributes of <emphasis>desired</emphasis> type <link xlink:href="http://www.w3.org/Addressing">URL</link> appear frequently. In the scope of DTDs there is no appropriate data type describing the <link xlink:href="http://www.ietf.org/rfc/rfc1738.txt">formal rules</link> a <link xlink:href="http://www.w3.org/Addressing">URL</link> has to obey. 
But at least the reader will notice the <emphasis>intention</emphasis> that the attribute <tag class="attribute">src</tag> of the element <tag class="element">image</tag> shall contain a <link xlink:href="http://www.w3.org/Addressing">URL</link>.</para> <para>In the next example we want to extend our <filename>book.dtd</filename> by allowing simplified HTML tables:</para> <table border="1" xml:id="example_table_col_rowspan"> <caption>A table caption</caption> <?target dbhtml table-width="50%"?> <?target dbfo table-width="50%"?> <tr> <td rowspan="2">A cell spanning two rows</td> <td>a single cell</td> </tr> <tr> <td>another single cell</td> </tr> <tr> <td colspan="2">A cell spanning two columns</td> </tr> </table> <qandaset role="exercise"> <title>book.dtd and tables</title> <qandadiv> <qandaentry xml:id="example_docbook_v5"> <question> <para>The <link linkend="example_table_col_rowspan">example table</link> presented before may be defined by the following code snippet:</para> <programlisting>... <table border="1" <co xml:id="programlisting_table_col_rowspan_attborder"/> > <caption>A table caption</caption> <tr> <td rowspan="2" <co xml:id="programlisting_table_col_rowspan_attrowspan"/>>A cell spanning two rows</td> <td>a single cell</td> </tr> <tr> <td>another single cell</td> </tr> <tr> <td colspan="2" <co xml:id="programlisting_table_col_rowspan_attcolspan"/>>A cell spanning two columns</td> </tr> </table> ...</programlisting> <calloutlist> <callout arearefs="programlisting_table_col_rowspan_attborder"> <para>We want a table with borders. In an HTML rendered version the number indicates the line width in pixels.
In this example we expect a line width of one pixel.</para> </callout> <callout arearefs="programlisting_table_col_rowspan_attrowspan"> <para>The cell will span two rows.</para> </callout> <callout arearefs="programlisting_table_col_rowspan_attcolspan"> <para>The cell will span two columns.</para> </callout> </calloutlist> <para>Define a DTD table module <filename>table.mod</filename> and include it into the <filename>book.dtd</filename> via an external parameter entity.</para> </question> <answer> <para>The table model definitions in <filename>table.mod</filename> read:</para> <programlisting><?xml version="1.0" encoding="UTF-8"?> <!ELEMENT table (caption, tr+)> <!ATTLIST table border NMTOKEN #IMPLIED > <!ELEMENT caption (#PCDATA) > <!ELEMENT tr (td+) > <!ELEMENT td (#PCDATA) > <!ATTLIST td colspan NMTOKEN #IMPLIED rowspan NMTOKEN #IMPLIED ></programlisting> <para>This may be included into our <filename>book.dtd</filename> via:</para> <programlisting><!ENTITY % table.mod SYSTEM "table.mod" > %table.mod; <!ELEMENT book (title, chapter+)> ...</programlisting> <para>The complete source code is available <link xlink:href="Ref/src/Dtd/book/v5/book.dtd">here</link> . 
A document instance reads:</para> <programlisting><!DOCTYPE book SYSTEM "book.dtd"> <book lang="en"> <title>Introduction to Java</title> <chapter id="introJava"> <title>Introduction</title> <para id="notUsed">Documentation on <link linkend="introJava">types</link></para> <table border="1"> <caption>A table caption</caption> <tr> <td rowspan="2">A cell spanning two rows</td> <td>a single cell</td> </tr> <tr> <td>another single cell</td> </tr> <tr> <td colspan="2">A cell spanning two columns</td> </tr> </table> </chapter> </book></programlisting> </answer> </qandaentry> </qandadiv> </qandaset> </section> <section xml:id="section_generalentities"> <title>General entities</title> <para>Parameter entities may appear only within the scope of <abbrev xlink:href="http://en.wikipedia.org/wiki/Document_Type_Declaration">DTD</abbrev>s. They must not appear in document instances. This motivates the introduction of general entities. We start with an example of a copyright notice:</para> <programlisting><?xml version="1.0" encoding="UTF-8"?> <para>All rights, including copyright are owned or controlled for these purposes by the company.</para> <para>For further information, see Section Two of the Member Agreement.</para></programlisting> <para>We notice that this code is not even well-formed XML: it has two <tag class="element">para</tag> nodes at top level.</para> <para>We assume that the company in question produces a great number of documents. These two paragraphs shall be kept at a centralized location to be included into all publications. For this purpose the document shall be accessible from <filename>ftp://internal.com/copyright.xml</filename> in the company's intranet.
Starting with our previously introduced <code>doc.dtd</code> we may embed and use this copyright document:</para> <programlisting><?xml version="1.0" encoding="UTF-8"?> <!DOCTYPE doc SYSTEM "doc.dtd" [ <co xml:id="programlisting_copyright_internal"/> <!ENTITY copyrightnotice <co xml:id="programlisting_copyright_entitydef"/> SYSTEM "ftp://internal.com/copyright.xml"> ]<co xml:id="programlisting_copyright_endsubset"/>> <doc> <para>A paragraph</para> <figure> <caption>A nice image</caption> <image src="image.png"/> </figure> &copyrightnotice; <co xml:id="programlisting_copyright_entityuse"/> </doc></programlisting> <calloutlist> <callout arearefs="programlisting_copyright_internal"> <para>The left bracket <quote>[</quote> marks the beginning of the document's <emphasis>internal DTD subset</emphasis>.</para> </callout> <callout arearefs="programlisting_copyright_entitydef"> <para>An external general entity <tag class="genentity">copyrightnotice</tag> is declared. The <link xlink:href="http://www.w3.org/Addressing">URL</link> following the <code>SYSTEM</code> keyword defines a reference to the external definitions.</para> </callout> <callout arearefs="programlisting_copyright_endsubset"> <para>Internal subset definitions end here.</para> </callout> <callout arearefs="programlisting_copyright_entityuse"> <para>The entity <tag class="genentity">copyrightnotice</tag> is used.
The entity resolver will expand it to the actual content of <filename>ftp://internal.com/copyright.xml</filename>.</para> </callout> </calloutlist> <para>The careful reader will have already guessed that from an XML processing application's viewpoint this is equivalent to:</para> <programlisting><?xml version="1.0" encoding="UTF-8"?> <!DOCTYPE doc SYSTEM "doc.dtd"> <doc> <para>A paragraph</para> <figure> <caption>A nice image</caption> <image src="image.png"/> </figure> <para>All rights, including copyright are owned or controlled for these purposes by the company.</para> <para>For further information, see Section Two of the Member Agreement.</para> </doc></programlisting> <para>We now have to clarify the term <quote>internal subset</quote> in the context of DTDs and start with:</para> <programlisting><?xml version="1.0" encoding="UTF-8"?> <!DOCTYPE doc SYSTEM "doc.dtd" [ <!ENTITY copyrightnotice SYSTEM "ftp://internal.com/copyright.xml"> ]>...</programlisting> <para>The XML standard allows markup declarations to appear both in <filename>doc.dtd</filename> itself and within the range delimited by the square brackets <code>[...]</code>. Markup declarations appearing in <filename>doc.dtd</filename> belong to the so-called <emphasis>external subset</emphasis> reflecting the fact that they reside outside the <quote>current</quote> document instance. Any markup declarations appearing within <code>[ ... ]</code> are considered to belong to the document instance's <emphasis>internal subset</emphasis>. We are now able to review some of our introductory XML examples: Our <tag class="element">memo</tag> document instance from <xref linkend="dtd_and_document"/> has no external subset at all. The markup declarations are completely defined in the internal subset of the document instance.
As stated earlier this only makes sense for development or demonstration purposes.</para> <para>Under some circumstances the internal subset may even be used to extend content model or attribute definitions of the underlying DTD, thus leading to non-portable document instances. This is possible if the DTD provides <quote>hooks</quote> intended to be used as entry points for extensions.</para> <para>In the above example we might have defined the entity <tag class="genentity">copyrightnotice</tag> in the external subset i.e. within <filename>doc.dtd</filename>. We conclude this section by showing a meaningful use case for an internal general entity:</para> <qandaset role="exercise"> <title>Avoiding title duplication</title> <qandadiv> <qandaentry xml:id="example_xhtml_duplicate_title"> <question> <para>We recall the sample XHTML document given in <xref linkend="figure_xhtmlbase"/>. The <tag class="starttag">title</tag> and the <tag class="starttag">h1</tag> node both contain the same content <quote>A first start</quote>.
Use an entity to define this content to be used at the two different positions.</para> </question> <answer> <para>We define an entity being used at the two locations in question:</para> <programlisting><?xml version="1.0" encoding="UTF-8"?> <!DOCTYPE html SYSTEM "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"[ <!ENTITY mytitle "A first start" <co xml:id="programlisting_xhtml_duplicate_title_entity"/>> ]> <html xmlns="http://www.w3.org/1999/xhtml"> <head><title>&mytitle;<co xml:id="programlisting_xhtml_duplicate_title_entity_first"/></title></head> <body> <h1>&mytitle;<co xml:id="programlisting_xhtml_duplicate_title_entity_second"/></h1> <p>This is a very simple document</p> </body> </html></programlisting> <calloutlist> <callout arearefs="programlisting_xhtml_duplicate_title_entity"> <para>Definition of an internal general entity <tag class="genentity">mytitle</tag>.</para> </callout> <callout arearefs="programlisting_xhtml_duplicate_title_entity_first"> <para>First usage.</para> </callout> <callout arearefs="programlisting_xhtml_duplicate_title_entity_second"> <para>Second usage</para> </callout> </calloutlist> </answer> </qandaentry> <qandaentry xml:id="example_chapter_entities"> <question> <label>Dividing a book.dtd document instance into chapters.</label> <para>General entities may be used to physically split documents into smaller parts. Create a <tag class="starttag">book</tag> document instance <filename>master.xml</filename> with two chapters. Define an <code>IDREF</code> reference from the second to the first chapter. Now create two XML files <filename>chap1.xml</filename> and <filename>chap2.xml</filename> and move the content of the two chapters from <filename>master.xml</filename> into these files. Then include them into the master document as external general entities. 
What happens to the reference from the second to the first chapter?</para> </question> <answer> <para>Our master document reads:</para> <programlisting><?xml version="1.0" encoding="UTF-8"?> <!DOCTYPE book SYSTEM "book.dtd"[ <!ENTITY chap1 SYSTEM "chap1.xml"> <!ENTITY chap2 SYSTEM "chap2.xml"> ]> <book> <title>Master document example</title> &chap1; &chap2; </book></programlisting> <para>The first general entity <filename>chap1.xml</filename> contains:</para> <programlisting><?xml version="1.0" encoding="UTF-8"?> <chapter id="firstChapter"> <title>This is the first chapter</title> <para>We add some text here.</para> </chapter></programlisting> <para>Notice that the <tag class="starttag">chapter</tag> node contains an attribute <tag class="attribute">id</tag> with value <tag class="attvalue">firstChapter</tag>. The second file <filename>chap2.xml</filename> reads:</para> <programlisting><?xml version="1.0" encoding="UTF-8"?> <chapter> <title>This is the second chapter</title> <para>This is a <link linkend="firstChapter">reference</link>.</para> </chapter></programlisting> <para>The paragraph contains an <code>IDREF</code> based reference to the first chapter being defined as a general entity. The master document is a valid XML file with respect to our <filename>book.dtd</filename> grammar. We expect this result since entities are only a means to <emphasis>physically</emphasis> divide an XML file into smaller <quote>chunks</quote> without changing the logical structure at all.</para> </answer> </qandaentry> </qandadiv> </qandaset> </section> <section xml:id="section_notation"> <title>Notations and unparsed entities</title> <para>An unparsed entity is conceptually part of an XML document but will be ignored by the parser. Images are a common example of unparsed entities.
The most simple way is to reference XML document external images by attributes:</para> <programlisting><graphic image="printer.gif"/></programlisting> <para>Many editors simply use this method which apparently suffers from some deficiencies:</para> </section> </chapter> <appendix> <title>W3C production rules</title> <productionset> <title>Characters</title> <production xml:id="w3RecXml_NT-Letter"> <lhs>Letter</lhs> <rhs><nonterminal def="#w3RecXml_NT-BaseChar">BaseChar</nonterminal> | <nonterminal def="#w3RecXml_NT-Ideographic">Ideographic</nonterminal></rhs> </production> <production xml:id="w3RecXml_NT-BaseChar"> <lhs>BaseChar</lhs> <rhs>[#x0041-#x005A] | [#x0061-#x007A] | [#x00C0-#x00D6] | [#x00D8-#x00F6] | [#x00F8-#x00FF] | [#x0100-#x0131] | [#x0134-#x013E] |...(values omitted here, see W3C documentation)</rhs> </production> <production xml:id="w3RecXml_NT-Ideographic"> <lhs>Ideographic</lhs> <rhs>[#x4E00-#x9FA5] | #x3007 | [#x3021-#x3029]</rhs> </production> <production xml:id="w3RecXml_NT-CombiningChar"> <lhs>CombiningChar</lhs> <rhs>[#x0300-#x0345] | ...(values omitted here)</rhs> </production> <production xml:id="w3RecXml_NT-Digit"> <lhs>Digit</lhs> <rhs>[#x0030-#x0039] | [#x0660-#x0669] | [#x06F0-#x06F9] | [#x0966-#x096F] | [#x09E6-#x09EF] | [#x0A66-#x0A6F] | [#x0AE6-#x0AEF] | [#x0B66-#x0B6F] | [#x0BE7-#x0BEF] | [#x0C66-#x0C6F] | [#x0CE6-#x0CEF] | [#x0D66-#x0D6F] | [#x0E50-#x0E59] | [#x0ED0-#x0ED9] | [#x0F20-#x0F29]</rhs> </production> <production xml:id="w3RecXml_NT-Extender"> <lhs>Extender</lhs> <rhs>#x00B7 | #x02D0 | #x02D1 | #x0387 | #x0640 | #x0E46 | #x0EC6 | #x3005 | [#x3031-#x3035] | [#x309D-#x309E] | [#x30FC-#x30FE]</rhs> </production> </productionset> </appendix> </part> <part xml:id="persistenceStrategies"> <title annotations="ws/eclipse/HibIntro/target/classes">Persistence strategies and application development</title> <chapter xml:id="orm"> <title>Object Relational Mapping</title> <remark>Mapping tools should be used only by someone familiar 
with relational technology. O-R mapping is not meant to save developers from understanding mapping problems or to hide them altogether. It is meant for those who have an understanding of the issues and know what they need, but who don't want to have to write thousands of lines of code to deal with a problem that has already been solved <xref linkend="bibKeith09"/>.</remark> <section xml:id="configureEclipseMaven"> <title>Configuring a Maven based Eclipse <link linkend="gloss_Java"><trademark>Java</trademark></link> project with Hibernate</title> <para>We will use Maven for several purposes:</para> <figure xml:id="fig_reasonsUsingMaven"> <title>Reasons for using Maven</title> <mediaobject> <imageobject> <imagedata fileref="Ref/Fig/mavenIntro.fig" scale="65"/> </imageobject> </mediaobject> </figure> <para>The following figure illustrates the problem of managing transitive dependencies in projects:</para> <figure xml:id="fig_transitiveDependencies"> <title>Transitive dependencies</title> <mediaobject> <imageobject> <imagedata fileref="Ref/Fig/transitiveDep.fig" scale="65"/> </imageobject> </mediaobject> </figure> <section xml:id="sect_mavenConfigEclipseProject"> <title>Create a Maven based project in Eclipse</title> <para>The following section requires the Eclipse Maven plugin to be installed.
This may be accomplished by installing the <productname xlink:href="http://www.jboss.org/tools">Jboss Tools</productname> via <guimenu>Help</guimenu> <guisubmenu>Eclipse Marketplace</guisubmenu> which will install Maven as a dependency.</para> <orderedlist> <listitem> <para>We start Eclipse and choose the <quote>new project</quote> wizard.</para> <informalfigure> <mediaobject> <imageobject> <imagedata fileref="Ref/Screen/CreateMaven/1.png" scale="80"/> </imageobject> </mediaobject> </informalfigure> </listitem> <listitem> <para>Filtering <quote>maven</quote> yields our desired project type</para> <informalfigure> <mediaobject> <imageobject> <imagedata fileref="Ref/Screen/CreateMaven/2.png" scale="80"/> </imageobject> </mediaobject> </informalfigure> </listitem> <listitem> <para>Just accept the defaults</para> <informalfigure> <mediaobject> <imageobject> <imagedata fileref="Ref/Screen/CreateMaven/3.png" scale="80"/> </imageobject> </mediaobject> </informalfigure> </listitem> <listitem> <para>We select <quote>maven-archetype-quickstart</quote> to choose a plain <link linkend="gloss_Java"><trademark>Java</trademark></link> project</para> <informalfigure> <mediaobject> <imageobject> <imagedata fileref="Ref/Screen/CreateMaven/4.png" scale="80"/> </imageobject> </mediaobject> </informalfigure> </listitem> <listitem> <para>The chosen Group Id will become our project's name.</para> <informalfigure> <mediaobject> <imageobject> <imagedata fileref="Ref/Screen/CreateMaven/5.png" scale="80"/> </imageobject> </mediaobject> </informalfigure> </listitem> <listitem> <para>We end up with a <link linkend="gloss_Java"><trademark>Java</trademark></link> project already being enabled for <productname xlink:href="http://www.junit.org">Junit</productname> testing.</para> <informalfigure> <mediaobject> <imageobject> <imagedata fileref="Ref/Screen/CreateMaven/6.png" scale="80"/> </imageobject> </mediaobject> </informalfigure> </listitem> </orderedlist> <para> But wait: We are about to work 
with (Mysql) databases. Thus we need at least a <trademark xlink:href="http://www.oracle.com/technetwork/java/javase/jdbc">JDBC</trademark> driver. Maven assists us if we define an appropriate dependency as we will see in the following section.</para> </section> <section xml:id="sect_mavenAddMysqlJdbcConnector"> <title>Adding a <productname xlink:href="http://www.mysql.com">Mysql</productname> <trademark>JDBC</trademark> driver</title> <para>We might just download a <trademark xlink:href="http://www.oracle.com/technetwork/java/javase/jdbc">JDBC</trademark> implementation jar file like <filename>mysql-connector-java-5.1.16.jar</filename> manually and add it to our eclipse environment. If we want to share our project with other people or work on it on different workstations this jar file must be available on each system we are working with.</para> <para>One solution might be to integrate it into our project completely (e.g. in a <filename>lib</filename> folder) and put the whole project under version control (<productname xlink:href="http://git-scm.com/">git</productname>, <productname xlink:href="http://subversion.apache.org">svn</productname>). On the other hand this just bloats our project with external (library) dependencies.</para> <para>Maven helps us to easily manage external dependencies. 
The idea is to keep them in centralized repositories for download and add meta information like a package name, a package group name and a version number for retrieval:</para> <orderedlist> <listitem> <para>Searching for <quote>mysql</quote> in a maven repository yields the <link linkend="gloss_Java"><trademark>Java</trademark></link> <trademark xlink:href="http://www.oracle.com/technetwork/java/javase/jdbc">JDBC</trademark> connector:</para> <informalfigure> <mediaobject> <imageobject> <imagedata fileref="Ref/Screen/CreateMaven/mysql1.png" scale="80"/> </imageobject> </mediaobject> </informalfigure> </listitem> <listitem> <para>We choose the most recent version:</para> <informalfigure> <mediaobject> <imageobject> <imagedata fileref="Ref/Screen/CreateMaven/mysql2.png" scale="80"/> </imageobject> </mediaobject> </informalfigure> </listitem> <listitem> <para>Again we copy the dependency snippet ...</para> <informalfigure> <mediaobject> <imageobject> <imagedata fileref="Ref/Screen/CreateMaven/mysql3.png" scale="80"/> </imageobject> </mediaobject> </informalfigure> </listitem> <listitem> <para>... and add it to our <filename>pom.xml</filename> file's dependency section:</para> <informalfigure> <mediaobject> <imageobject> <imagedata fileref="Ref/Screen/CreateMaven/mysql4.png" scale="80"/> </imageobject> </mediaobject> </informalfigure> </listitem> <listitem> <para>Did we actually succeed? Right-clicking on our project <guimenu>Build path</guimenu> <guisubmenu>Configure Build Path</guisubmenu> and choosing the <guisubmenu>Libraries</guisubmenu> tab we see our <envar>CLASSPATH</envar> being extended:</para> <informalfigure> <mediaobject> <imageobject> <imagedata fileref="Ref/Screen/CreateMaven/mysql5.png" scale="80"/> </imageobject> </mediaobject> </informalfigure> <para>Notice the location of the <productname xlink:href="http://www.mysql.com">Mysql</productname> jar below the <filename>.m2</filename> Maven folder in the user's home directory. 
If we share our project this location will change to e.g. <filename>c:\users\foo\.m2\...</filename> due to different system default paths.</para> </listitem> </orderedlist> </section> <section xml:id="sect_mavenAddHibernate"> <title>Adding Hibernate dependencies</title> <para>Our goal is to start using Hibernate for a console based project. Searching the Maven repository for hibernate-core provides a suitable artifact:</para> <programlisting><dependency> <groupId>org.hibernate</groupId> <artifactId>hibernate-core</artifactId> <version>4.1.9.Final</version> </dependency> </programlisting> </section> <section xml:id="sect_createHibernateConfiguration"> <title>Creating a Hibernate configuration</title> <para>Hibernate is intended to provide persistence services saving transient <link linkend="gloss_Java"><trademark>Java</trademark></link> instances to a database. For this purpose Hibernate needs:</para> <itemizedlist> <listitem> <para>The type of database (Oracle, DB2, Mysql,...)</para> </listitem> <listitem> <para>JDBC driver class name.</para> </listitem> <listitem> <para>JDBC connection parameters</para> <itemizedlist> <listitem> <para>Server name</para> </listitem> <listitem> <para>port</para> </listitem> <listitem> <para>user</para> </listitem> <listitem> <para>password</para> </listitem> </itemizedlist> </listitem> <listitem> <para>A list of classes to be mapped</para> </listitem> <listitem> <para>Parameters defining the log level, whether generated SQL code shall be logged etc.</para> </listitem> </itemizedlist> <para>Hibernate offers an XML based configuration syntax. 
We show a toy example of a <filename>hibernate.cfg.xml</filename> configuration file mapping just one class <classname>hibintro.v1.model.User</classname> to a Mysql database server:</para> <figure xml:id="hibernateConfigurationFile"> <title>A basic Hibernate configuration file <filename>hibernate.cfg.xml</filename>.</title> <programlisting><?xml version="1.0" encoding="UTF-8"?> <!DOCTYPE hibernate-configuration PUBLIC "-//Hibernate/Hibernate Configuration DTD 3.0//EN" "http://hibernate.sourceforge.net/hibernate-configuration-3.0.dtd"> <hibernate-configuration> <session-factory > <property name="hibernate.connection.driver_class">com.mysql.jdbc.Driver</property> <property name="hibernate.connection.password">XYZ</property> <property name="hibernate.connection.url">jdbc:mysql://localhost:3306/hdm</property> <property name="hibernate.connection.username">hdmuser</property> <property name="hibernate.dialect">org.hibernate.dialect.MySQL5InnoDBDialect</property> <property name="hibernate.show_sql">true</property> <property name="hibernate.format_sql">true</property> <property name="hibernate.hbm2ddl.auto">update</property> <mapping class="hibintro.v1.model.User"/> </session-factory> </hibernate-configuration></programlisting> </figure> <para>This file may be edited with a simple text editor. The <productname xlink:href="http://www.jboss.org/tools">Jboss Tools</productname> Eclipse plugin provides a configuration editor simplifying this task. It may be installed on top of Eclipse <link xlink:href="http://www.jboss.org/tools/download">in several ways</link>.
The following video shows some of its features.</para> <mediaobject> <videoobject> <videodata fileref="Ref/Video/hibernateConfig.mp4"/> </videoobject> </mediaobject> </section> </section> <section xml:id="sect_hibernateBasics"> <title>A round trip working with objects</title> <para>Hibernate may be regarded as a persistence provider to <link linkend="gloss_JPA"><abbrev>JPA</abbrev></link>:</para> <figure xml:id="jpaPersistProvider"> <title><link linkend="gloss_JPA"><abbrev>JPA</abbrev></link> persistence provider</title> <mediaobject> <imageobject> <imagedata fileref="Ref/Fig/persistProvider.fig"/> </imageobject> </mediaobject> </figure> <para>Having configured Hibernate we may now start working with <link linkend="gloss_Java"><trademark>Java</trademark></link> objects. To do so we need an appropriate session object to run transactions. Starting from the Hibernate documentation we code the following helper method:</para> <programlisting>package hibintro.util; import org.hibernate.SessionFactory; import org.hibernate.cfg.Configuration; import org.hibernate.service.ServiceRegistryBuilder; public class HibernateUtil { /** * @param hibernateConfigFileName The filename defaults to <code>hibernate.cfg.xml</code>. * @return Session factory instance to be used for actual session creation by caller. */ public static SessionFactory createSessionFactory(final String hibernateConfigFileName) { Configuration configuration = new Configuration(); configuration.configure(hibernateConfigFileName); ServiceRegistryBuilder serviceRegistryBuilder = new ServiceRegistryBuilder().applySettings(configuration .getProperties()); return configuration .buildSessionFactory(serviceRegistryBuilder.buildServiceRegistry()); } }</programlisting> <para>The following class <classname>hibintro.v1.model.User</classname> will be used as a starting example to be mapped to a database. 
Notice the <classname>javax.persistence.Entity</classname> <link xlink:href="http://docs.oracle.com/javase/tutorial/java/javaOO/annotations.html">annotation</link> <coref linkend="entityAnnotation"/>:</para> <figure xml:id="mappingUserInstances"> <title>Mapping <classname>hibintro.v1.model.User</classname> instances to a database.</title> <programlisting>package hibintro.v1.model; ... <emphasis role="bold">@Entity</emphasis> <co xml:id="entityAnnotation"/> public class User { <emphasis role="bold">//The user's unique login name e.g. "goik"</emphasis> String uid; public String getUid() {return uid;} public void setUid(String uid) {this.uid = uid;} <emphasis role="bold">// The user's common name e.g. "Martin Goik"</emphasis> String cname; public String getCname() {return cname;} public void setCname(String cname) {this.cname = cname;} <emphasis role="bold">// Hibernate requires a default constructor</emphasis> public User() {} public User(String uid, String cname) { super(); this.uid = uid; this.cname = cname; } }</programlisting> </figure> <para>With respect to <xref linkend="hibernateConfigurationFile"/> we notice our class <classname>hibintro.v1.model.User</classname> being referenced:</para> <programlisting><?xml version="1.0" encoding="UTF-8"?> <!DOCTYPE hibernate-configuration ... <mapping class="<emphasis role="bold">hibintro.v1.model.User</emphasis>"/> </session-factory> </hibernate-configuration></programlisting> <para>This line tells Hibernate to actually map <classname>hibintro.v1.model.User</classname> to a (Mysql) database.</para> <section xml:id="persistingObjects"> <title>Persisting objects</title> <para>Persisting transient objects may be achieved in various ways. 
In <xref linkend="jdbcIntro"/> we introduced the <trademark xlink:href="http://www.oracle.com/technetwork/java/javase/jdbc">JDBC</trademark> <abbrev xlink:href="http://en.wikipedia.org/wiki/Api">API</abbrev> connecting <link linkend="gloss_Java"><trademark>Java</trademark></link> applications and relational database systems. We stored and retrieved object values.</para> <para>In larger projects these tasks become increasingly tedious. It is thus desirable to automate them while still using <trademark xlink:href="http://www.oracle.com/technetwork/java/javase/jdbc">JDBC</trademark> as a low level transport layer. This is shown in <xref linkend="jdbcFourTier"/>. That figure already mentions Hibernate as a possible persistence service provider.</para> <para>The following sections start with a single class <classname>hibintro.v1.model.User</classname>:</para> <figure xml:id="fig_BasicUser"> <title>A basic <code>User</code> class.</title> <mediaobject> <imageobject> <imagedata fileref="Ref/Fig/classUser.fig"/> </imageobject> </mediaobject> </figure> <para>Object relational mapping (ORM) denotes the process of mapping instances of classes to relational table data. In our current example we may draw a simple implementation sketch:</para> <figure xml:id="mappingProperties2attributes"> <title>Mapping properties to attributes.</title> <mediaobject> <imageobject> <imagedata fileref="Ref/Fig/mapUser.fig"/> </imageobject> </mediaobject> </figure> <para>This is far too simplistic.
What about integrity constraints?</para> <figure xml:id="mappingIntegrityConstraints"> <title>Annotating integrity constraints</title> <mediaobject> <imageobject> <imagedata fileref="Ref/Fig/mapUserIntegrity.fig"/> </imageobject> </mediaobject> </figure> <para>We start with the following <classname>hibintro.v1.model.User</classname> class lacking integrity constraints completely:</para> <programlisting>package hibintro.v1.model; import javax.persistence.Entity; @Entity public class User { String uid; public String getUid() {return uid;} public void setUid(String uid) {this.uid = uid;} String cname; public String getCname() {return cname;} public void setCname(String cname) {this.cname = cname;} /** * Hibernate/JPA require a default constructor. It has to be implemented * explicitly if any non-default constructor has been defined. */ public User() {} /** * @param uid See {@link #getUid()}. * @param cname See {@link #getCname()}. */ public User(String uid, String cname) { this.uid = uid; this.cname = cname; } }</programlisting> <para>Persisting objects with Hibernate requires a <classname>org.hibernate.Session</classname> instance <coref linkend="sessionInstance"/>. It happens between the start <coref linkend="startTransaction"/> and commit <coref linkend="commitTransaction"/> of a transaction being derived from that session:</para> <programlisting>package hibintro.v1.run; ...
public class PersistSingleUser { public static void main(String[] args) { final <classname>org.hibernate.Session</classname> session <co xml:id="sessionInstance"/>= HibernateUtil.createSessionFactory("hibernate.cfg.xml").openSession(); final <classname>org.hibernate.Transaction</classname> transaction = session.beginTransaction();<co xml:id="startTransaction"/> final <classname>hibintro.v1.model.User</classname> u = new User("goik", "Martin Goik"); session.save(u); transaction.commit(); <co xml:id="commitTransaction"/> } }</programlisting> <para>Executing the above code yields a runtime exception:</para> <programlisting>Exception in thread "main" java.lang.ExceptionInInitializerError at myhibernate.intro.run.PersistUser.main(PersistUser.java:14) Caused by: org.hibernate.AnnotationException: <emphasis role="bold">No identifier specified for entity: myhibernate.intro.model.User</emphasis> ... at myhibernate.intro.util.HibernateUtil.buildConfiguration(HibernateUtil.java:17) at myhibernate.intro.util.HibernateUtil.<clinit>(HibernateUtil.java:9)</programlisting> <para>This runtime error is a little bit cryptic. The missing <quote>identifier</quote> refers to the absence of a primary key definition already mentioned in <xref linkend="mappingIntegrityConstraints"/>. We define a key by annotating the <code>uid</code> property with a <classname>javax.persistence.Id</classname> annotation <coref linkend="primaryKeyDefinition"/>:</para> <programlisting>package hibintro.v1.model; import javax.persistence.Entity; <emphasis role="bold">import javax.persistence.Id;</emphasis> ... @Entity public class User {... <emphasis role="bold">@Id</emphasis> <co xml:id="primaryKeyDefinition"/> public String getUid() { return uid; } ...</programlisting> <para>The careful reader will have noticed that we've annotated the getter method rather than the property <code>uid</code> itself. Hibernate / <link linkend="gloss_JPA"><abbrev>JPA</abbrev></link> can work both ways. 
Annotating a getter, however, offers additional support, e.g. when logging for debugging purposes is required.</para> <para>This time we are successful. Since we enabled the logging of SQL statements in <xref linkend="hibernateConfigurationFile"/> Hibernate shows us the corresponding <code>INSERT</code> statement:</para> <programlisting>Hibernate: insert into User (cname, uid) values (?, ?)</programlisting> <para>Notice the (?,?) part of our log: This indicates the internal usage of <trademark xlink:href="http://www.oracle.com/technetwork/java/javase/jdbc">JDBC</trademark> <classname>java.sql.PreparedStatement</classname> instances. Hibernate generates the following <code>CREATE TABLE</code> statement:</para> <figure xml:id="fig_createTableV1User"> <title>Database schema mapping instances of <classname>hibintro.v1.model.User</classname>.</title> <programlisting>CREATE TABLE User ( uid VARCHAR(255) NOT NULL PRIMARY KEY, cname VARCHAR(255) ) </programlisting> </figure> </section> <section xml:id="loadingObjectsByPrimaryKey"> <title>Loading objects by primary key</title> <para>Having persisted a single <classname>hibintro.v1.model.User</classname> instance by means of <classname>hibintro.v1.run.PersistSingleUser</classname> we may now load the database object. The easiest way is based on both the requested object's type <coref linkend="specLoadType"/> and its primary key value <coref linkend="specLoadPrimaryKey"/>:</para> <figure xml:id="loadByClassAndPrimaryKey"> <title>Loading a single object by a primary key value.</title> <programlisting>package hibintro.v1.run; ... public class RetrieveSingleUser { ... 
final Transaction transaction = session.beginTransaction(); final User u = (User) session.load(<emphasis role="bold">User.class</emphasis> <co xml:id="specLoadType"/>, "<emphasis role="bold">goik</emphasis>" <co xml:id="specLoadPrimaryKey"/>); if (null == u ) { System.out.println("No such user 'goik'"); } else { System.out.println("Found user '" + u.getCname() + "'"); } transaction.commit();...</programlisting> </figure> <para>This retrieves the expected result. Buried in other log messages we find the following SQL <quote>background</quote> statement:</para> <programlisting>... INFO: HHH000232: Schema update complete Hibernate: <emphasis role="bold">select user0_.uid as uid0_0_, user0_.cname as cname0_0_ from User user0_ where user0_.uid=?</emphasis> Found user 'Martin Goik'</programlisting> <qandaset role="exercise"> <title>Choosing the correct method</title> <qandadiv> <qandaentry> <question> <para>Actually the code in <xref linkend="loadByClassAndPrimaryKey"/> is not quite correct. Execute it with a non-existing primary key value e.g. <quote>goik2</quote>. What do you observe? Can you explain that behaviour?</para> <para>Read the documentation of the <classname>org.hibernate.Session</classname>.<code>load()</code> method and correct the code snippet.</para> </question> <answer> <para>If there is no corresponding database object we receive an <classname>org.hibernate.ObjectNotFoundException</classname> <coref linkend="loadUserObjectNotFoundException"/>:</para> <programlisting>Hibernate: select user0_.uid as uid0_0_, user0_.cname as cname0_0_ from User user0_ where user0_.uid=? Exception in thread "main" org.hibernate.ObjectNotFoundException: <co xml:id="loadUserObjectNotFoundException"/>No row with the given identifier exists: [hibintro.v1.model.User#goik2] ... 
at org.hibernate.proxy.pojo.javassist.JavassistLazyInitializer.invoke(JavassistLazyInitializer.java:185) at hibintro.v1.model.User_$$_javassist_0.getCname(User_$$_javassist_0.java) at hibintro.v1.run.RetrieveSingleUser.main(<emphasis role="bold">RetrieveSingleUser.java:35</emphasis>)<co xml:id="exceptionOnGetCname"/> </programlisting> <para>According to <coref linkend="exceptionOnGetCname"/> the exception is triggered by the <code>getCname()</code> call rather than by <code>load()</code> itself. The documentation of <code>load()</code> tells us that method calls may be delegated to proxy objects, which are implemented by means of byte code instrumentation. If however no matching database object exists, calling a method on the proxy instance yields an <classname>org.hibernate.ObjectNotFoundException</classname>.</para> <para>The documentation also tells us to use the corresponding <methodname>org.hibernate.Session.get(Class,Serializable)</methodname> method which actually returns <code>null</code> in case a primary key value does not exist:</para> <programlisting>... final User u = (User) session.get(User.class, "goik2"); if (null == u ) { System.out.println("No such user having key value 'goik2'"); ...</programlisting> </answer> </qandaentry> </qandadiv> </qandaset> </section> <section xml:id="loadingObjectsByQuery"> <title>Loading objects by queries</title> <para>Often we are interested in a (sub)set of results. We populate our database with additional <classname>hibintro.v1.model.User</classname> instances:</para> <programlisting>package hibintro.v1.run; ... public class PersistUsers { ... final Transaction transaction = session.beginTransaction(); final User[] users = {new User("wings", "Fred Wings"), new User("eve", "Eve Briggs")}; for (final User u : users ) {session.save(u);} transaction.commit(); ...</programlisting> <para>Now we'd like to retrieve these objects. 
Hibernate offers the <emphasis role="bold">H</emphasis>ibernate <emphasis role="bold">Q</emphasis>uery <emphasis role="bold">L</emphasis>anguage (<abbrev xlink:href="http://docs.jboss.org/hibernate/orm/4.1/manual/en-US/html/ch16.html">HQL</abbrev>) for object queries. As we will see <abbrev xlink:href="http://docs.jboss.org/hibernate/orm/4.1/manual/en-US/html/ch16.html">HQL</abbrev> extends <acronym xlink:href="http://en.wikipedia.org/wiki/Sql">SQL</acronym> with respect to polymorphic queries. The current example does not use inheritance leaving us with a simple <abbrev xlink:href="http://docs.jboss.org/hibernate/orm/4.1/manual/en-US/html/ch16.html">HQL</abbrev> query <coref linkend="hqlFromUser"/> in <classname>hibintro.v1.run.RetrieveAll</classname>:</para> <figure xml:id="retrieveAllUserByHql"> <title>Retrieving <classname>hibintro.v1.model.User</classname> instances by <abbrev xlink:href="http://docs.jboss.org/hibernate/orm/4.1/manual/en-US/html/ch16.html">HQL</abbrev>.</title> <programlisting>package hibintro.v1.run; ... public class RetrieveAll { ... final Query searchUsers = session.createQuery("<emphasis role="bold">from User</emphasis>");<co xml:id="hqlFromUser"/> final List&lt;User&gt; users = (List&lt;User&gt;) searchUsers.list(); for (final User u: users) { System.out.println("uid=" + u.getUid() + ", " + u.getCname()); }</programlisting> </figure> <para>Being used to <acronym xlink:href="http://en.wikipedia.org/wiki/Sql">SQL</acronym> we notice the absence of a <code>SELECT</code> clause in <coref linkend="hqlFromUser"/>: The rationale is a focus on objects rather than on attribute sets. 
Thus our <abbrev xlink:href="http://docs.jboss.org/hibernate/orm/4.1/manual/en-US/html/ch16.html">HQL</abbrev> query returns a list of <classname>hibintro.v1.model.User</classname> instances:</para> <programlisting>uid=eve, Eve Briggs uid=goik, Martin Goik uid=wings, Fred Wings</programlisting> <qandaset role="exercise"> <title><abbrev xlink:href="http://docs.jboss.org/hibernate/orm/4.1/manual/en-US/html/ch16.html">HQL</abbrev> and <acronym xlink:href="http://en.wikipedia.org/wiki/Sql">SQL</acronym>.</title> <qandadiv> <qandaentry> <question> <para>We may actually retrieve attributes rather than objects. For this purpose our query resembles standard <acronym xlink:href="http://en.wikipedia.org/wiki/Sql">SQL</acronym> <coref linkend="hqlWithSelect"/>:</para> <programlisting>final Query searchUsers = session.createQuery("<emphasis role="bold">select uid, cname from User</emphasis>" <co xml:id="hqlWithSelect"/>); final Object queryResult <co xml:id="queryResultFromSelect"/>= searchUsers.list();</programlisting> <para>Use the <methodname>Class.getSimpleName()</methodname> reflection method to iteratively analyze the <code>queryResult</code> <coref linkend="queryResultFromSelect"/> instance's structure. This guides you in finding suitable casts to add code similar to <xref linkend="retrieveAllUserByHql"/> in order to write the users' attribute values to standard output.</para> </question> <answer> <para>A possible implementation reads:</para> <programlisting>package hibintro.v1.run; ... public class GetUsersAsAttributes { ... 
final Query searchUsers = session.createQuery("<emphasis role="bold">select uid, cname from User</emphasis>"); final Object queryResult = searchUsers.list(); System.out.println("queryResult type:" + queryResult.getClass().getSimpleName()); <co xml:id="typeOfHqlResult"/> @SuppressWarnings("unchecked") final List&lt;Object&gt; usersAttributes = (List&lt;Object&gt;) queryResult; for (final Object o: usersAttributes) { System.out.println("result set element type:" + o.getClass().getSimpleName()); <co xml:id="typeOfEmbeddedObjects"/> final Object attributes[] = (Object []) o; for (Object attribute: attributes) { System.out.println("attribute value:" + attribute); } }...</programlisting> <para>Actually the two lines <coref linkend="typeOfHqlResult"/> and <coref linkend="typeOfEmbeddedObjects"/> are only needed during the development process to discover the result set's object structure.</para> </answer> </qandaentry> </qandadiv> </qandaset> <para>The careful reader may already expect <abbrev xlink:href="http://docs.jboss.org/hibernate/orm/4.1/manual/en-US/html/ch16.html">HQL</abbrev> to offer additional features, namely predicate based queries. 
Following <classname>hibintro.v1.run.SelectUser</classname> we may restrict our result set by an <acronym xlink:href="http://en.wikipedia.org/wiki/Sql">SQL</acronym> style <code>WHERE</code> clause:</para> <programlisting> final List&lt;User&gt; users = (List&lt;User&gt;) session.createQuery( "<emphasis role="bold">from User u where u.cname like '%e%'</emphasis>").list(); for (final User u: users) { System.out.println("Found user '" + u.getCname() + "'"); }</programlisting> <para>This time we receive a true subset of <classname>hibintro.v1.model.User</classname> instances:</para> <programlisting>Found user 'Eve Briggs' Found user 'Fred Wings'</programlisting> </section> <section xml:id="criteriaBasedQueries"> <title>Criteria based queries</title> <para>Selecting objects by <abbrev xlink:href="http://docs.jboss.org/hibernate/orm/4.1/manual/en-US/html/ch16.html">HQL</abbrev> queries technically means parsing <abbrev xlink:href="http://docs.jboss.org/hibernate/orm/4.1/manual/en-US/html/ch16.html">HQL</abbrev> and transforming it into some sort of abstract syntax tree. We may instead create corresponding structures directly by using <trademark>Hibernate</trademark>'s criteria API.</para> </section> </section> <section xml:id="mappingSingleClasses"> <title>Mapping single entities and database tables</title> <section xml:id="transientProperties"> <title>Transient properties</title> <para>We take a closer look at <xref linkend="mappingUserInstances"/> assuming that instances of <classname>hibintro.v1.model.User</classname> need an additional <emphasis role="bold">GUI related</emphasis> property <code>selected</code> <coref linkend="propertyIsSelected"/>:</para> <programlisting>package hibintro.v2; @Entity public class User { ... boolean <emphasis role="bold">selected</emphasis> <co xml:id="propertyIsSelected"/> = false; public boolean isSelected() { return selected; } public void setSelected(boolean selected) { this.selected = selected; } ... 
}</programlisting> <para>Hibernate produces the following <abbrev xlink:href="http://en.wikipedia.org/wiki/Data_definition_language">DDL</abbrev> statement containing an attribute <code>selected</code> <coref linkend="attributeSelected"/>:</para> <programlisting>CREATE TABLE User ( uid VARCHAR(255) NOT NULL PRIMARY KEY, cname VARCHAR(255), <emphasis role="bold">selected</emphasis> <co xml:id="attributeSelected"/> BIT NOT NULL ) </programlisting> <para>If we just annotate a Java class with a <classname>javax.persistence.Entity</classname> annotation all properties of the class in question will be mapped. The Hibernate framework of course cannot distinguish between transient and persistent properties on its own. If we want a property to be transient we have to add a <classname>javax.persistence.Transient</classname> annotation to the corresponding getter method:</para> <informaltable border="1"> <colgroup width="6%"/> <colgroup width="94%"/> <tr> <td valign="top"><emphasis role="bold">Java</emphasis></td> <td valign="top"><programlisting>package hibintro.v3; ... @Entity public class User { ... 
boolean selected = false; @Transient <co xml:id="transientAnnotation"/> public boolean isSelected() { return selected; } public void setSelected(boolean selected) { this.selected = selected; }...</programlisting></td> </tr> <tr> <td valign="top"><emphasis role="bold">SQL</emphasis></td> <td><programlisting>CREATE TABLE User ( uid VARCHAR(255) NOT NULL PRIMARY KEY, cname VARCHAR(255) ) </programlisting></td> </tr> </informaltable> <para>The <classname>javax.persistence.Transient</classname> annotation inhibits the mapping of our property <code>selected</code>.</para> <caution> <para>When loading a <classname>hibintro.v3.User</classname> instance from a database the transient property's value is of course entirely determined by the constructor.</para> </caution> </section> <section xml:id="sect_mappingNullValues"> <title>Properties and NULL values</title> <para>In <xref linkend="fig_createTableV1User"/> the primary key <code>uid</code> property's value must not be <code>NULL</code>. This is an immediate consequence of the <classname>javax.persistence.Id</classname> annotation and the fact that databases don't allow <code>NULL</code> values for primary key attributes.</para> <para>The <code>cname</code> property however may be <code>null</code>. Sometimes we want to ensure that the corresponding database attribute is always set, at least carrying an empty string value. This can be achieved by adding a <classname>javax.persistence.Column</classname><code>(nullable = false)</code> annotation:</para> <informaltable border="1"> <colgroup width="6%"/> <colgroup width="94%"/> <tr> <td valign="top"><emphasis role="bold">Java</emphasis></td> <td valign="top"><programlisting>package hibintro.v4; ... 
@Entity public class User { String cname; <emphasis role="bold">@Column(nullable = false)</emphasis> public String getCname() { return cname; } ...</programlisting></td> </tr> <tr> <td valign="top"><emphasis role="bold">SQL</emphasis></td> <td><programlisting>CREATE TABLE User ( uid VARCHAR(255) NOT NULL PRIMARY KEY, cname VARCHAR(255) <emphasis role="bold">NOT NULL</emphasis> <co xml:id="cnameDatabaseNotNull"/> )</programlisting></td> </tr> </informaltable> <para>This results in a corresponding database constraint <coref linkend="cnameDatabaseNotNull"/>. Attempting to store instances with null values now fails:</para> <informaltable border="1"> <colgroup width="6%"/> <colgroup width="94%"/> <tr> <td valign="top"><emphasis role="bold">Java</emphasis></td> <td valign="top"><programlisting>package hibintro.v4; ... public class PersistSingleUser { final Transaction transaction = session.beginTransaction(); { final User u = new User("goik", null); session.save(u); } transaction.commit(); ...</programlisting></td> </tr> <tr> <td valign="top"><emphasis role="bold">Log</emphasis></td> <td><programlisting>Hibernate: insert into User (cname, uid) values (?, ?) ... WARN: SQL Error: 1048, SQLState: 23000 Feb 13, 2013 9:38:32 PM org.hibernate.engine.jdbc.spi.SqlExceptionHelper logExceptions ERROR: Column 'cname' cannot be null Exception in thread "main" org.hibernate.exception.ConstraintViolationException: Column 'cname' cannot be null ... 
<emphasis role="bold">Caused by: com.mysql.jdbc.exceptions.jdbc4.MySQLIntegrityConstraintViolationException: Column 'cname' cannot be null</emphasis> ...</programlisting></td> </tr> </informaltable> <para>The exception is thrown by the <trademark xlink:href="http://www.oracle.com/technetwork/java/javase/jdbc">JDBC</trademark> driver as the result of a database constraint violation. It is not raised by the Hibernate framework itself prior to attempting the insert.</para> </section> <section xml:id="mappingKeys"> <title>Defining keys</title> <para>Frequently we need more than just a primary key. Starting from <classname>hibintro.v4.User</classname> we may want to add a property <code>uidNumber</code>. This is a common requirement: On UNIX type operating systems for example each user has both a unique login name (like <quote>goik</quote>) and a unique numerical value (like <quote>123</quote>). We choose our primary key to be numeric <coref linkend="uidNumberIsPrimaryKey"/> and the login name to become a second candidate key <coref linkend="uidIsUnique"/>:</para> <programlisting>package hibintro.v5; ... @Entity @Table(uniqueConstraints={@UniqueConstraint(columnNames={"uid"})}) <co xml:id="uidIsUnique"/> public class User { int uidNumber; @Id <co xml:id="uidNumberIsPrimaryKey"/> public int getUidNumber() { return uidNumber; } public void setUidNumber(int uidNumber) { this.uidNumber = uidNumber; } String uid; public String getUid() { return uid; } public void setUid(String uid) { this.uid = uid; } ...</programlisting> <para>Notice the slight difference: The property <code>uid</code> may need a <code>@</code><code><classname>javax.persistence.Column</classname>(nullable=false)</code> annotation to become a candidate key. This is <emphasis>not</emphasis> automatically inferred by the <classname>javax.persistence.UniqueConstraint</classname> definition <coref linkend="uidIsUnique"/>. 
In contrast the property <code>uidNumber</code> is not being referenced by the preceding <classname>javax.persistence.Table</classname> annotation but annotated by <classname>javax.persistence.Id</classname>. Hence <code>nullable=false</code> is not needed.</para> <para>This is in accordance with <abbrev xlink:href="http://en.wikipedia.org/wiki/Data_definition_language">DDL</abbrev>: Attributes composing a primary key must not allow <code>NULL</code> values but attributes only appearing in UNIQUE declarations may become <code>NULL</code>.</para> <para>The <abbrev xlink:href="http://en.wikipedia.org/wiki/Data_definition_language">DDL</abbrev> reads:</para> <programlisting>CREATE TABLE User ( uidNumber INT NOT NULL PRIMARY KEY, cname VARCHAR(255) NOT NULL, uid VARCHAR(255) NOT NULL UNIQUE )</programlisting> </section> <section xml:id="sect_ComposedKeys"> <title>Composed keys</title> <para>Composed candidate keys are sometimes referred to as business keys. The underlying logic defines which objects are considered to be identical based on their values.</para> <para>As an example, we consider a company having several departments. Regarding projects the following business rules shall apply:</para> <figure xml:id="projectBusinessRules"> <title>Business rules for projects</title> <orderedlist> <listitem> <para>Each department must have a unique name.</para> </listitem> <listitem> <para>A project's name must be unique within the set of all projects belonging to the same department.</para> </listitem> <listitem> <para>A project must be assigned to exactly one department.</para> </listitem> </orderedlist> </figure> <para>Right now we defer considerations of the n:1 relationship between projects and departments to a later chapter. Instead we focus on project instances and represent departments just by their integer id values which will later become foreign keys.</para> <para>In addition each project receives a unique integer id value as well. 
This is in accordance with the <quote>best practice</quote> rule of defining a <link xlink:href="http://en.wikipedia.org/wiki/Surrogate_key">surrogate key</link> <coref linkend="projectPrimaryKeyDefinition"/> to be used as (primary) object identifier. This immutable key will then become the target in foreign key definitions:</para> <informaltable border="1"> <colgroup width="6%"/> <colgroup width="94%"/> <tr> <td valign="top"><emphasis role="bold">Java</emphasis></td> <td valign="top"><programlisting>package hibintro.v6; ... @Entity @Table(uniqueConstraints={@UniqueConstraint(columnNames={"name", "department"})}) <co xml:id="projectBusinessKey"/> public class Project { int id; @Id <co xml:id="projectPrimaryKeyDefinition"/> public int getId() {return id;} protected void setId(int id) {this.id = id;} String name; @Column(nullable=false) public String getName() {return name;} public void setName(String name) {this.name = name;} int department; @Column(nullable=false) public int getDepartment() {return department;} public void setDepartment(int department) {this.department = department;} ...</programlisting></td> </tr> <tr> <td valign="top"><emphasis role="bold">Sql</emphasis></td> <td><programlisting>CREATE TABLE Project ( id int(11) NOT NULL PRIMARY KEY <coref linkend="projectPrimaryKeyDefinition"/>, department int(11) NOT NULL, name varchar(255) NOT NULL, UNIQUE KEY name (name,department) <coref linkend="projectBusinessKey"/> )</programlisting></td> </tr> </informaltable> <calloutlist> <callout arearefs="projectPrimaryKeyDefinition"> <para>Defining the surrogate primary key.</para> </callout> <callout arearefs="projectBusinessKey"> <para>Defining a business key composed of a project's <code>name</code> and <code>department</code> number. 
This implements our second business rule in <xref linkend="projectBusinessRules"/>.</para> </callout> </calloutlist> <qandaset role="exercise"> <title><link linkend="gloss_JPA"><abbrev>JPA</abbrev></link> requirements.</title> <qandadiv> <qandaentry> <question> <para>The setter <code>void</code> <methodname annotations="nojavadoc">setId(int)</methodname> in <classname>hibintro.v6.Project</classname> has protected access. Explain this choice.</para> </question> <answer> <para>From an application developer's point of view the setter should be absent: The <code>id</code> property is immutable and should not be modified at all.</para> <para>When loading an instance from a database a persistence provider however has to set its value. Hibernate uses the reflection API to override the restriction being imposed by the <code>protected</code> modifier. So why not declare it private? Doing so may cause our IDE to flag a warning about an unused private method.</para> <para>So choosing <code>protected</code> is a compromise: An application developer cannot modify the property (unless deriving a class) and our persistence provider can still set its value to the database's primary key attribute value.</para> </answer> </qandaentry> </qandadiv> </qandaset> </section> <section xml:id="nonUniqueIndexes"> <title>Indexes (non-unique)</title> <para>From the viewpoint of software modelling non-unique indexes are not part of the business logic but refer to database optimization. Consequently <link linkend="gloss_JPA"><abbrev>JPA</abbrev></link> versions prior to 2.1 have no support for non-unique indexes.</para> <para>On the other hand performance matters. Hibernate and other persistence providers offer vendor specific <link linkend="gloss_JPA"><abbrev>JPA</abbrev></link> extensions. We may find it useful to access <classname>hibintro.v5.User</classname> instances having a specific <code>cname</code> quickly. 
This can be achieved by adding a Hibernate specific <code>org.hibernate.annotations.</code><classname>org.hibernate.annotations.Table</classname> index generating annotation <coref linkend="hibernateExtensionIndex"/> which works on top of <link linkend="gloss_JPA"><abbrev>JPA</abbrev></link>'s <code>javax.persistence.</code><classname>javax.persistence.Table</classname>:</para> <informaltable border="1"> <colgroup width="6%"/> <colgroup width="94%"/> <tr> <td valign="top"><emphasis role="bold">Java</emphasis></td> <td valign="top"><programlisting>package hibintro.v7; ... @Entity @Table(uniqueConstraints={@UniqueConstraint(columnNames={"uid"})}) <emphasis role="bold">@org.hibernate.annotations.Table(</emphasis> <co xml:id="hibernateExtensionIndex"/> <emphasis role="bold">appliesTo="User", indexes = {@Index(name = "findCname", columnNames = {"cname"})})</emphasis> public class User { ... String cname; @Column(nullable = false) public String getCname() { return cname;} public void setCname(String cname) {this.cname = cname;} ...</programlisting></td> </tr> <tr> <td valign="top"><emphasis role="bold">Sql</emphasis></td> <td><programlisting>CREATE TABLE User ( uidNumber INT NOT NULL PRIMARY KEY, cname VARCHAR(255) NOT NULL, uid VARCHAR(255) NOT NULL UNIQUE ); CREATE INDEX findCname ON User (cname ASC);</programlisting></td> </tr> </informaltable> </section> <section xml:id="sect_RenameTablesAndAttributes"> <title>Renaming tables and attributes</title> <para>So far we assumed that we map classes to database tables having identical names: A <link linkend="gloss_Java"><trademark>Java</trademark></link> class <code>User</code> is being mapped to a relational table with identical name <code>User</code>. Sometimes a renaming is desired. We may for example want to access a legacy database by a newly implemented <link linkend="gloss_Java"><trademark>Java</trademark></link> application. 
Choosing meaningful names may conflict with decisions taken when the original database design took place.</para> <para>In the following example we change the database table's name from its default <code>User</code> to <code>Person</code> <coref linkend="renameUserToPerson"/>. The properties <code>uidNumber</code> and <code>cname</code> are mapped to attribute names <code>numericUid</code> <coref linkend="renameUidNumberToNumericUid"/> and <code>fullName</code> <coref linkend="renameCnameToFullName"/> respectively:</para> <informaltable border="1"> <colgroup width="6%"/> <colgroup width="94%"/> <tr> <td valign="top"><emphasis role="bold">Java</emphasis></td> <td valign="top"><programlisting>package hibintro.v8; ... @Entity @Table(name="Person") <co xml:id="renameUserToPerson"/> public class User { int uidNumber; @Id @Column(name="numericUid") <co xml:id="renameUidNumberToNumericUid"/> public int getUidNumber() {return uidNumber;} public void setUidNumber(int uidNumber) {this.uidNumber = uidNumber;} String uid; @Column(nullable=false) public String getUid() {return uid;} public void setUid(String uid) {this.uid = uid;} String cname; @Column(nullable = false, name="fullName") <co xml:id="renameCnameToFullName"/> public String getCname() {return cname;} public void setCname(String cname) {this.cname = cname;} ...</programlisting></td> </tr> <tr> <td valign="top"><emphasis role="bold">Sql</emphasis></td> <td><programlisting>CREATE TABLE Person <coref linkend="renameUserToPerson"/> ( numericUid <coref linkend="renameUidNumberToNumericUid"/> int(11) NOT NULL PRIMARY KEY, fullName <coref linkend="renameCnameToFullName"/> varchar(255) NOT NULL, uid varchar(255) NOT NULL )</programlisting></td> </tr> </informaltable> </section> <section xml:id="sectChangeDefaultTypeMapping"> <title>Changing the default type mapping</title> <para>Sometimes we are interested in changing <link linkend="gloss_JPA"><abbrev>JPA</abbrev></link>'s default type mapping strategy. 
For example <trademark xlink:href="http://www.mysql.com/about/legal/trademark.html">MySQL</trademark> versions prior to 5.0 lack an appropriate type representing boolean values. It was therefore quite common to map boolean properties to <code>CHAR(1)</code> with possible values being <code>'Y'</code> and <code>'N'</code>. Hibernate will map boolean values to <code>tinyint(1)</code>. Supporting older software may require tweaking the standard mapping.</para> <para>Unfortunately <link linkend="gloss_JPA"><abbrev>JPA</abbrev></link> itself does not offer any interface for this purpose. The persistence provider may offer a solution though. Hibernate for example allows remapping types <coref linkend="remapBooleanChar"/>. We assume our <classname>hibintro.v9.User</classname> class to have a <code>boolean</code> property <code>active</code>:</para> <informaltable border="1"> <colgroup width="6%"/> <colgroup width="94%"/> <tr> <td valign="top"><emphasis role="bold">Java</emphasis></td> <td valign="top"><programlisting>package hibintro.v9; ... public class User { ... 
public void setCname(String cname) {this.cname = cname;} boolean active = false; @Type(type="yes_no") <co xml:id="remapBooleanChar"/> public boolean isActive() {return active;} public void setActive(boolean active) {this.active = active;} }</programlisting></td> </tr> <tr> <td valign="top"><emphasis role="bold">Sql</emphasis></td> <td><programlisting>CREATE TABLE User ( uidNumber int(11) NOT NULL PRIMARY KEY, active char(1) NOT NULL, cname varchar(255) DEFAULT NULL, uid varchar(255) NOT NULL )</programlisting></td> </tr> </informaltable> <para>Readers being interested in more sophisticated strategies like mapping user defined data types to database types are advised to read the <link xlink:href="http://docs.jboss.org/hibernate/orm/4.1/manual/en-US/html/ch05.html#mapping-types">manual section on Hibernate types</link>.</para> </section> </section> <section xml:id="inheritance"> <title>Inheritance</title> <para>Mapping inheritance hierarchies to relational databases means bridging the gap between object <link xlink:href="http://en.wikipedia.org/wiki/Object-relational_impedance_mismatch">oriented and relational models</link>. We start with a slightly modified example from <xref linkend="Bauer05"/>:</para> <figure xml:id="fig_BillingDetails"> <title>Modelling payment.</title> <mediaobject> <imageobject> <imagedata fileref="Ref/Fig/billing.fig"/> </imageobject> <caption> <para>Simplified Billing details example derived from <xref linkend="Bauer05"/>. Notice <classname>inherit.v1.BillingDetails</classname> being an abstract parent class of two concrete classes <classname>inherit.v1.CreditCard</classname> and <classname>inherit.v1.BankAccount</classname>. The attribute <code>number</code> applies both to bank account and credit card payments.</para> </caption> </mediaobject> </figure> <para>Since the relational model lacks inheritance completely we have to implement a database schema ourselves. 
We subsequently explore three main approaches each having its own advantages and disadvantages.</para> <section xml:id="sect_InheritTablePerClassHierarchie"> <title>Single table per class hierarchy</title> <para>This approach may be considered the simplest: We just create one database table for storing instances of arbitrary classes belonging to the inheritance hierarchy in question:</para> <figure xml:id="fig_TablePerClassHierarchyData"> <title>A single relation mapping.</title> <mediaobject> <imageobject> <imagedata fileref="Ref/Fig/billingData.fig"/> </imageobject> <caption> <para>Fitting both <classname>inherit.v1.CreditCard</classname> and <classname>inherit.v1.BankAccount</classname> instances into a single relation.</para> </caption> </mediaobject> </figure> <para>The relation may be created by the following <abbrev xlink:href="http://en.wikipedia.org/wiki/Data_definition_language">DDL</abbrev>:</para> <figure xml:id="fig_TablePerClassHierarchyMapping"> <title>Mapping the inheritance hierarchy.</title> <mediaobject> <imageobject> <imagedata fileref="Ref/Fig/billingSql.fig"/> </imageobject> </mediaobject> </figure> <para>We take a closer look at the generated relation:</para> <informaltable border="1"> <colgroup width="6%"/> <colgroup width="94%"/> <tr> <td valign="top"><emphasis role="bold">Java</emphasis></td> <td valign="top"><programlisting>package inherit.v1; ... @Entity @Inheritance(strategy=InheritanceType.<emphasis role="bold">SINGLE_TABLE</emphasis>) <co linkends="billingMapSingleTableCallout" xml:id="billingMapSingleTable"/> @DiscriminatorColumn(name="dataType", discriminatorType=DiscriminatorType.STRING) <co linkends="billingMapSingleTableDiscriminatorCallout" xml:id="billingMapSingleTableDiscriminator"/> abstract class BillingDetails { @Id @GeneratedValue <co linkends="billingMapSingleTableIdGeneratedCallout" xml:id="billingMapSingleTableIdGenerated"/> public Long getId() ... 
@Column(nullable = false, length = 32) public final String getNumber() ... @Temporal(TemporalType.TIMESTAMP) @Column(nullable = false) public Date getCreated() ...</programlisting><programlisting>package inherit.v1; ... @Entity @DiscriminatorValue(value = "Credit card" <co xml:id="billingMapSingleTableDiscriminatorCredit"/>) public class CreditCard extends BillingDetails { ... //Nothing JPA related happens here</programlisting><programlisting>package inherit.v1; ... @Entity @DiscriminatorValue(value = "Bank account" <co xml:id="billingMapSingleTableDiscriminatorBank"/>) public class BankAccount extends BillingDetails { ... //Nothing JPA related happens here</programlisting></td> </tr> <tr> <td valign="top"><emphasis role="bold">Sql</emphasis></td> <td><programlisting continuation="continues">CREATE TABLE BillingDetails <co linkends="billingMapSingleTableCallout" xml:id="BillingDetailsGeneratedRelationName"/> ( dataType varchar(31) NOT NULL, id bigint(20) NOT NULL AUTO_INCREMENT PRIMARY KEY, number varchar(255) NOT NULL, <co linkends="billingMapSingleTableBaseNotNull" xml:id="billingMapSingleTableCalloutNumberNotNull"/> created datetime NOT NULL, <co linkends="billingMapSingleTableBaseNotNull" xml:id="billingMapSingleTableCalloutCreatedNotNull"/> cardType int(11) DEFAULT NULL, <co linkends="billingMapSingleTableDerivedNull" xml:id="billingMapSingleTableCardTypeNull"/> expiration datetime DEFAULT NULL, <co linkends="billingMapSingleTableDerivedNull" xml:id="billingMapSingleTableExpirationNull"/> bankName varchar(255) DEFAULT NULL, <co linkends="billingMapSingleTableDerivedNull" xml:id="billingMapSingleTableBankNameNull"/> swiftcode varchar(255) DEFAULT NULL <co linkends="billingMapSingleTableDerivedNull" xml:id="billingMapSingleTableSwiftCodeNull"/> )</programlisting></td> </tr> </informaltable> <calloutlist> <callout arearefs="billingMapSingleTable" xml:id="billingMapSingleTableCallout"> <para>All classes of the inheritance hierarchy will be mapped to a single table.
Unless stated otherwise the <link linkend="gloss_JPA"><abbrev>JPA</abbrev></link> provider will choose the root class' name (<code>BillingDetails</code>) as default value for the generated relation's name <coref linkend="BillingDetailsGeneratedRelationName"/>.</para> </callout> <callout arearefs="billingMapSingleTableDiscriminator" xml:id="billingMapSingleTableDiscriminatorCallout"> <para>The <link linkend="gloss_JPA"><abbrev>JPA</abbrev></link> provider needs a column to distinguish the different types of database objects. We have chosen the values of the discriminator attribute <code>dataType</code> to be simple strings. Due to the definitions in <coref linkend="billingMapSingleTableDiscriminatorCredit"/> and <coref linkend="billingMapSingleTableDiscriminatorBank"/> database object types are identified by either of the two values:</para> <itemizedlist> <listitem> <para><code>Credit card</code>: object will be mapped to <classname>inherit.v1.CreditCard</classname>.</para> </listitem> <listitem> <para><code>Bank account</code>: object will be mapped to <classname>inherit.v1.BankAccount</classname>.</para> </listitem> </itemizedlist> <para>In a production system the <classname>javax.persistence.DiscriminatorType</classname> setting will typically favour <classname>javax.persistence.DiscriminatorType</classname><code>.INTEGER</code> over <classname>javax.persistence.DiscriminatorType</classname><code>.STRING</code> unless the application in question has to deal with a legacy database schema.</para> </callout> <callout arearefs="billingMapSingleTableIdGenerated" xml:id="billingMapSingleTableIdGeneratedCallout"> <para>This one is unrelated to inheritance: Our primary key values will be auto generated by the database server e.g.
by <code>SEQUENCE</code> or <code>IDENTITY</code> mechanisms if available.</para> </callout> <callout arearefs="billingMapSingleTableCalloutNumberNotNull billingMapSingleTableCalloutCreatedNotNull" xml:id="billingMapSingleTableBaseNotNull"> <para>Only the base class' attributes may exclude <code>NULL</code> values.</para> </callout> <callout arearefs="billingMapSingleTableCardTypeNull billingMapSingleTableExpirationNull billingMapSingleTableBankNameNull billingMapSingleTableSwiftCodeNull" xml:id="billingMapSingleTableDerivedNull"> <para>All derived classes' attributes must allow <code>NULL</code> values.</para> </callout> </calloutlist> <para>We may now insert instances of <classname>inherit.v1.BankAccount</classname> or <classname>inherit.v1.CreditCard</classname>:</para> <figure xml:id="insertCreditBank"> <title>Inserting payment information</title> <programlisting>package inherit.v1; ... public class Persist { public static void main(String[] args) throws ParseException { ... final Transaction transaction = session.beginTransaction(); { final CreditCard creditCard = new CreditCard("4412 8334 4512 9416", 1, "05/18/15"); session.save(creditCard); final BankAccount bankAccount = new BankAccount("1107 2 31", "Lehman Brothers", "BARCGB22"); session.save(bankAccount); } transaction.commit(); ...</programlisting> </figure> <section xml:id="sect_InheritTablePerClassHierarchieLoad"> <title>Database object retrieval</title> <para>As in <xref linkend="retrieveAllUserByHql"/>, objects stored by <xref linkend="insertCreditBank"/> may be queried using <abbrev xlink:href="http://docs.jboss.org/hibernate/orm/4.1/manual/en-US/html/ch16.html">HQL</abbrev>.</para> <informaltable border="1"> <colgroup width="6%"/> <colgroup width="94%"/> <tr> <td valign="top"><emphasis role="bold">Java</emphasis></td> <td valign="top"><programlisting>package inherit.v1; ... public class RetrieveCredit { public static void main(String[] args) { ...
final Transaction transaction = session.beginTransaction(); final Query searchCreditPayments = session.createQuery("<emphasis role="bold">from inherit.v1.CreditCard</emphasis>"); <co xml:id="hqlQueryCreditCard"/> final List<CreditCard> creditCardList = (List<CreditCard>) searchCreditPayments.list(); for (final CreditCard c: creditCardList) { System.out.println(c); } ...</programlisting></td> </tr> <tr> <td valign="top"><emphasis role="bold">Sql</emphasis></td> <td><programlisting continuation="continues">INFO: HHH000232: Schema update complete Hibernate: select creditcard0_.id as id0_, creditcard0_.created as created0_, creditcard0_.number as number0_, creditcard0_.cardType as cardType0_, creditcard0_.expiration as expiration0_ from BillingDetails creditcard0_ where creditcard0_.<emphasis role="bold">dataType</emphasis> <co xml:id="hqlQueryCreditCard_dataType"/>='<emphasis role="bold">Credit card</emphasis>' <emphasis role="bold">CreditCard: number=4412 8334 4512 9416, created 2013-02-19 13:09:22.0, cardType=1, expiration=2015-05-18 00:00:00.0</emphasis> <co xml:id="hqlQueryCreditCardResultSet"/></programlisting></td> </tr> </informaltable> <para>Some remarks: Our query asks for instances of <classname>inherit.v1.CreditCard</classname> <coref linkend="hqlQueryCreditCard"/>. This gets implemented as an <acronym xlink:href="http://en.wikipedia.org/wiki/Sql">SQL</acronym> <code>SELECT</code> choosing datasets whose discriminator attribute <code>dataType</code> <coref linkend="hqlQueryCreditCard_dataType"/> equals <quote><code>Credit card</code></quote>.
The current result set contains just one element <coref linkend="hqlQueryCreditCardResultSet"/> in accordance with <xref linkend="insertCreditBank"/>.</para> <para>Retrieving both <classname>inherit.v1.CreditCard</classname> and <classname>inherit.v1.BankAccount</classname> instances is accomplished by querying for the common base class <classname>inherit.v1.BillingDetails</classname>:</para> <informaltable border="1"> <colgroup width="6%"/> <colgroup width="94%"/> <tr> <td valign="top"><emphasis role="bold">Java</emphasis></td> <td valign="top"><programlisting>package inherit.v1; ... public class RetrieveAll { ... final Query searchBilling = session.createQuery("from <emphasis role="bold">inherit.v1.BillingDetails</emphasis>"); @SuppressWarnings("unchecked") final List<BillingDetails> billingDetailsList = (List<BillingDetails>) searchBilling.list(); for (final BillingDetails c: billingDetailsList) { System.out.println(c); } ...</programlisting></td> </tr> <tr> <td valign="top"><emphasis role="bold">Sql</emphasis></td> <td><programlisting continuation="continues">INFO: HHH000232: Schema update complete Hibernate: select billingdet0_.id as id0_, ... billingdet0_.dataType as dataType0_ from BillingDetails billingdet0_ CreditCard: number=4412 8334 4512 9416, created 2013-02-19 13:09:22.0, <co xml:id="resultSetHeterogeneous"/> cardType=1, expiration=2015-05-18 00:00:00.0 BankAccount: number=1107 2 31, created 2013-02-19 13:09:22.0, bankName=Lehman Brothers, swiftcode=BARCGB22</programlisting></td> </tr> </informaltable> <para>This is the first example of a polymorphic query yielding a heterogeneous result set <coref linkend="resultSetHeterogeneous"/>.</para> </section> <section xml:id="sect_InheritTablePerClassHierarchieNullProblem"> <title>Null values</title> <para>Our current mapping strategy limits our means of specifying data integrity constraints. It is no longer possible to disallow <code>null</code> values for properties belonging to derived classes.
We might want to disallow <code>null</code> values in the <code>bankName</code> property. Hibernate will generate a corresponding database attribute <coref linkend="require_bankNameNotNullDb"/>:</para> <informaltable border="1"> <colgroup width="6%"/> <colgroup width="94%"/> <tr> <td valign="top"><emphasis role="bold">Java</emphasis></td> <td valign="top"><programlisting>package inherit.v2; ... @Entity @DiscriminatorValue(value = "Bank account") public class BankAccount extends BillingDetails { String bankName; @Column(<emphasis role="bold">nullable=false</emphasis>) <co xml:id="require_bankNameNotNull"/> public String getBankName() {return bankName;} ...</programlisting></td> </tr> <tr> <td valign="top"><emphasis role="bold">Sql</emphasis></td> <td><programlisting continuation="continues">CREATE TABLE BillingDetails ( id bigint(20) NOT NULL AUTO_INCREMENT PRIMARY KEY, bankName varchar(255) <emphasis role="bold">NOT NULL</emphasis>, <co xml:id="require_bankNameNotNullDb"/> ...</programlisting></td> </tr> </informaltable> <para>Looks good? Unfortunately the attempt to persist both objects <coref linkend="saveBankAccount"/> yields a runtime exception <coref linkend="saveBankAccountException"/>. According to the log it is already raised by the credit card insert, which does not supply a <code>bankName</code> value:</para> <informaltable border="1"> <colgroup width="6%"/> <colgroup width="94%"/> <tr> <td valign="top"><emphasis role="bold">Java</emphasis></td> <td valign="top"><programlisting>package inherit.v2; ... public class Persist { ... final CreditCard creditCard = new CreditCard("4412 8334 4512 9416", 1, "05/18/15"); session.save(creditCard); final BankAccount bankAccount = new BankAccount("1107 2 31", "Lehman Brothers", "BARCGB22"); session.save(bankAccount) <co xml:id="saveBankAccount"/>; ...</programlisting></td> </tr> <tr> <td valign="top"><emphasis role="bold">Sql</emphasis></td> <td><programlisting continuation="continues">...
Feb 19, 2013 10:28:00 AM org.hibernate.tool.hbm2ddl.SchemaUpdate execute INFO: HHH000232: Schema update complete Hibernate: insert into BillingDetails (created, number, cardType, expiration, dataType) values (?, ?, ?, ?, 'Credit card') Feb 19, 2013 10:28:00 AM org.hibernate.engine.jdbc.spi.SqlExceptionHelper logExceptions WARN: SQL Error: 1364, SQLState: HY000 Feb 19, 2013 10:28:00 AM org.hibernate.engine.jdbc.spi.SqlExceptionHelper logExceptions <emphasis role="bold">ERROR: Field 'bankName' doesn't have a default value Exception in thread "main" org.hibernate.exception.GenericJDBCException: Field 'bankName' doesn't have a default value</emphasis> <co xml:id="saveBankAccountException"/> ... at inherit.v2.Persist.main(Persist.java:28) Caused by: java.sql.SQLException: Field 'bankName' doesn't have a default value</programlisting></td> </tr> </informaltable> <para>Conclusion: A table per class hierarchy mapping does not allow specifying <code>NOT NULL</code> constraints for properties of derived classes.</para> <qandaset role="exercise"> <title>Mapping figures</title> <qandadiv> <qandaentry> <question> <para>Map the following model to a database:</para> <figure xml:id="modelFigureInheritance"> <title>Figure subclasses</title> <mediaobject> <imageobject> <imagedata fileref="Ref/Fig/figureInherit.fig"/> </imageobject> </mediaobject> </figure> <para>The two properties <code>xCenter</code> and <code>yCenter</code> in the abstract base class <code>Figure</code> represent the coordinates of the concrete figure's center of gravity. In a drawing application this would be considered the placement of the respective object.</para> <para>The abstract method <code>getArea()</code> is meant to be implemented without interfering with your database mapping. Choose an integer discriminator.
Test your application by storing and loading objects.</para> </question> <answer> <para>The main difference compared to the current <classname>inherit.v1.BillingDetails</classname> example is the <classname>javax.persistence.Transient</classname> annotation of the <code>area</code> property in <classname>inherit.v3.Figure</classname>, <classname>inherit.v3.Circle</classname> and <classname>inherit.v3.Rectangle</classname>. The storage and retrieval applications <classname>inherit.v3.Persist</classname>, <classname>inherit.v3.RetrieveRectangles</classname> and <classname>inherit.v3.RetrieveAll</classname> are straightforward.</para> </answer> </qandaentry> </qandadiv> </qandaset> </section> </section> <section xml:id="joinedSubclass"> <title>Joined subclasses</title> <para>The basic idea is to generate a normalized schema implementing inheritance relationships by foreign keys:</para> <figure xml:id="joindSubclassMapping"> <title>Joined subclass mapping.</title> <mediaobject> <imageobject> <imagedata fileref="Ref/Fig/billingMapJoined.fig"/> </imageobject> </mediaobject> </figure> <para>The inheritance strategy of joined subclasses <coref linkend="strategyJoinedSubclass"/> is defined in the abstract base class <classname>inherit.joined.v1.BillingDetails</classname>:</para> <programlisting>package inherit.joined.v1; ... @Entity @Inheritance(strategy=InheritanceType.JOINED) <co xml:id="strategyJoinedSubclass"/> public abstract class BillingDetails { ... }</programlisting> <para>The derived classes need to provide an implementation hint in order to identify the required foreign key <coref linkend="referenceParenntClass"/> to the parent class <classname>inherit.joined.v1.BillingDetails</classname>:</para> <programlisting>package inherit.joined.v1; ...
@Entity @PrimaryKeyJoinColumn(name="parent" <co xml:id="referenceParenntClass"/>, referencedColumnName="id") public class CreditCard extends BillingDetails { int cardType; @Column(nullable=false) <co xml:id="tpcNotNullCardType"/> public int getCardType() {return cardType;} public void setCardType(int cardType) {this.cardType = cardType;} Date expiration; @Column(nullable=false) <co xml:id="tpcNotNullexpiration"/> public Date getExpiration() {return expiration;} public void setExpiration(Date expiration) {this.expiration = expiration;} ... }</programlisting> <para>Notice the ability to exclude null values in <coref linkend="tpcNotNullCardType"/> and <coref linkend="tpcNotNullexpiration"/>.</para> <section xml:id="joinedSubclassRetrieve"> <title>Retrieving Objects</title> <para>On the database server side object retrieval becomes a more expensive operation: A query for instances of our inheritance hierarchy's root class <classname>inherit.joined.v1.BillingDetails</classname> <coref linkend="joinedQueryBillingDetails"/> results in joining all three tables <code>BillingDetails</code> <coref linkend="joinFromBillingDetails"/>, <code>BankAccount</code> <coref linkend="joinFromBankAccount"/> and <code>CreditCard</code> <coref linkend="joinFromCreditCard"/>:</para> <informaltable border="1"> <colgroup width="6%"/> <colgroup width="94%"/> <tr> <td valign="top"><emphasis role="bold">Java</emphasis></td> <td valign="top"><programlisting>package inherit.joined.v1; ... public class RetrieveAll { ...
final Query searchBilling = session.createQuery("<emphasis role="bold">from inherit.joined.v1.BillingDetails</emphasis>" <co xml:id="joinedQueryBillingDetails"/>); ...</programlisting></td> </tr> <tr> <td valign="top"><emphasis role="bold">Sql</emphasis></td> <td><programlisting continuation="continues">Hibernate: select billingdet0_.id as id0_, billingdet0_.created as created0_, billingdet0_.number as number0_, billingdet0_1_.bankName as bankName1_, billingdet0_1_.swiftcode as swiftcode1_, billingdet0_2_.cardType as cardType2_, billingdet0_2_.expiration as expiration2_, case when billingdet0_1_.id is not null then 1 when billingdet0_2_.id is not null then 2 when billingdet0_.id is not null then 0 end as clazz_ from <emphasis role="bold">BillingDetails</emphasis> billingdet0_ <co xml:id="joinFromBillingDetails"/> left outer join <emphasis role="bold">BankAccount</emphasis> billingdet0_1_ <co xml:id="joinFromBankAccount"/> on billingdet0_.id=billingdet0_1_.id left outer join <emphasis role="bold">CreditCard</emphasis> billingdet0_2_ <co xml:id="joinFromCreditCard"/> on billingdet0_.id=billingdet0_2_.id </programlisting></td> </tr> </informaltable> <qandaset role="exercise"> <title><link linkend="gloss_JPA"><abbrev>JPA</abbrev></link> constraints and database integrity.</title> <qandadiv> <qandaentry> <question> <para>Explain all integrity constraints of the Hibernate generated schema. Is it able to implement the correct constraints on database level corresponding to the inheritance related <link linkend="gloss_Java"><trademark>Java</trademark></link> objects?
Conversely: Are there possible database states which do not correspond to the domain model's object constraints?</para> </question> <answer> <para>We take a look at the database schema:</para> <programlisting>CREATE TABLE BillingDetails ( id bigint(20) NOT NULL AUTO_INCREMENT PRIMARY KEY <co linkends="inheritJoinSqlJava-1" xml:id="inheritJoinSqlJava-1-co"/>, created datetime NOT NULL, number varchar(32) NOT NULL ); CREATE TABLE CreditCard ( id bigint(20) NOT NULL PRIMARY KEY <co linkends="inheritJoinSqlJava-2" xml:id="inheritJoinSqlJava-2-co"/> REFERENCES <co linkends="inheritJoinSqlJava-3" xml:id="inheritJoinSqlJava-3-co"/> BillingDetails, cardType int(11) NOT NULL, expiration datetime NOT NULL ); CREATE TABLE BankAccount ( id bigint(20) NOT NULL PRIMARY KEY <co linkends="inheritJoinSqlJava-4" xml:id="inheritJoinSqlJava-4-co"/> REFERENCES <co linkends="inheritJoinSqlJava-4" xml:id="inheritJoinSqlJava-5-co"/> BillingDetails, bankName varchar(255) NOT NULL, swiftcode varchar(255) NOT NULL );</programlisting> <calloutlist> <callout arearefs="inheritJoinSqlJava-1-co" xml:id="inheritJoinSqlJava-1"> <para>The table implementing the root class <classname>inherit.joined.v1.BillingDetails</classname> of the inheritance hierarchy will be referenced both by <code>CreditCard</code> and <code>BankAccount</code> datasets and thus requires a key to become addressable.
Moreover the corresponding <classname>inherit.joined.v1.BillingDetails</classname> class requires this attribute to be the primary key anyway.</para> </callout> <callout arearefs="inheritJoinSqlJava-2-co" xml:id="inheritJoinSqlJava-2"> <para>Each <code>CreditCard</code> specific set of attributes belongs to exactly one <code>BillingDetails</code> instance and hence the id within our table <code>CreditCard</code> must be unique.</para> </callout> <callout arearefs="inheritJoinSqlJava-3-co" xml:id="inheritJoinSqlJava-3"> <para>As stated in <coref linkend="inheritJoinSqlJava-2-co"/> each <code>CreditCard</code> dataset must refer to its parent <code>BillingDetails</code> instance.</para> </callout> <callout arearefs="inheritJoinSqlJava-4-co inheritJoinSqlJava-5-co" xml:id="inheritJoinSqlJava-4"> <para>These constraints are the <code>BankAccount</code> counterparts of <coref linkend="inheritJoinSqlJava-2-co"/> and <coref linkend="inheritJoinSqlJava-3-co"/>.</para> </callout> </calloutlist> <para>The <code>NOT NULL</code> constraints implement their counterpart properties in the corresponding <link linkend="gloss_Java"><trademark>Java</trademark></link> objects.</para> <para>The mapping does not cover one important integrity constraint of our domain model: The base class <classname>inherit.joined.v1.BillingDetails</classname> is abstract. Thus each entry in the database must refer either to an <classname>inherit.joined.v1.CreditCard</classname> or an <classname>inherit.joined.v1.BankAccount</classname> instance.
But the above database schema allows datasets to appear in the <code>BillingDetails</code> table which are referenced by neither <code>BankAccount</code> nor <code>CreditCard</code> datasets.</para> <para>So the current database schema actually refers to a domain model having a <emphasis role="bold">concrete</emphasis> base class <code>BillingDetails</code>.</para> </answer> </qandaentry> </qandadiv> </qandaset> <qandaset role="exercise"> <title>Implementing figures by joined subclasses</title> <qandadiv> <qandaentry> <question> <para>Implement the model given in <xref linkend="modelFigureInheritance"/> by joined subclasses.</para> </question> <answer> <para>See <classname>inherit.joined.v2.Figure</classname>.</para> </answer> </qandaentry> </qandadiv> </qandaset> </section> </section> <section xml:id="inheritTablePerConcrete"> <title>Table per concrete class</title> <para>Not covered here.</para> </section> </section> <section xml:id="mappingRelatedEntities"> <title>Mapping related entities</title> <section xml:id="primaryKeyRevisit"> <title>Primary keys revisited</title> <para>Following <xref linkend="Bauer05"/> (p.88) we list important properties of primary keys with respect to <quote>best practices</quote> on top of their relational counterparts:</para> <itemizedlist> <listitem> <para>A primary key's values never change.</para> </listitem> <listitem> <para>Primary key values should not have a business meaning.</para> </listitem> <listitem> <para>Primary keys should be chosen to have proper indexing support with respect to the database product in question.</para> </listitem> </itemizedlist> <para>With respect to persistence we have to distinguish three different concepts of an object's identity:</para> <glosslist> <glossentry> <glossterm>Java Object identity</glossterm> <glossdef> <para>The operator <code>==</code> checks whether two references point to the same object in memory.</para> </glossdef> </glossentry> <glossentry> <glossterm>Java Object equality</glossterm> <glossdef>
<para>The <methodname>Object.equals(Object)</methodname> method checks whether two possibly distinct objects are to be considered equal.</para> </glossdef> </glossentry> <glossentry> <glossterm>Database identity</glossterm> <glossdef> <para>Two datasets (tuples) are identical if all primary key attributes have the same value.</para> <para>In other words: Two distinct database objects differ at least in one primary key attribute.</para> </glossdef> </glossentry> </glosslist> <section xml:id="objectEqualityByPrimaryKey"> <title>Defining object equality by primary key</title> <para>Since JPA entities require a</para> </section> </section> <section xml:id="entityValueTypes"> <title>Entity and value types</title> <para>From the viewpoint of <link linkend="gloss_ORM">ORM</link> we distinguish two types of database objects:</para> <glosslist> <glossentry> <glossterm>Entity type</glossterm> <glossdef> <para>Objects of this type have their own database identity and may exist independently of other (database) entities.</para> </glossdef> </glossentry> <glossentry> <glossterm>Value type</glossterm> <glossdef> <para>An object of value type has no database identity. It will appear in a database as a component of a parent entity type. Its lifecycle is completely dependent on its parent.</para> </glossdef> </glossentry> </glosslist> </section> <section xml:id="sect_MappingEmbeddedClass"> <title>Mapping a single embedded class</title> <para/> </section> </section> <section xml:id="sect_hibernateValidation"> <title>Hibernate validation</title> <para/> </section> </chapter> </part> <xi:include href="bibliography.xml" xpointer="element(/1)"/> </book>